Unix Knowledge base

SG Service guard

1. Main configuration files:

/etc/cmcluster.conf - contains binary & library paths, and path to main "rc startup" file
/etc/cmcluster/cmclnodelist - contains the list of nodes in the cluster
/etc/cmcluster/cluster_config.ascii - cluster configuration file. Edit, then compile it.
/etc/cmcluster/package_name/package_config.ascii - package configuration file. Edit, then compile it.
/etc/cmcluster/package_name/package.cntl - package control script. This script activates VG's.
/etc/cmcluster/package_name/pkg_control_script.log - package control script log
In "/etc/cmcluster/package_name/" you will also usually find the package "stop" and "start" scripts.

=> A few Examples of config files:

# cat /etc/cmcluster.conf

SGCONF=/etc/cmcluster
SGSBIN=/usr/sbin
SGLBIN=/usr/lbin
SGLIB=/usr/lib
SGRUN=/var/adm/cmcluster
SGAUTOSTART=/etc/rc.config.d/cmcluster
SGFFLOC=/opt/cmcluster/cmff
CMSNMPD_LOG_FILE=/var/adm/SGsnmpsuba.log

# cat /etc/rc.config.d/cmcluster

AUTOSTART_CMCLD=1
NODE_TOC_BEHAVIOR="reboot"

# cat /etc/lvmrc

AUTO_VG_ACTIVATE=0
RESYNC="SERIAL"

{
and vg "sync" activation routines
}

# cat /etc/cmcluster/cmclnodelist

pri-node root
pri-node.company.com root
sec-node root
sec-node.company.com root

# cat /etc/cmcluster/cluster.ascii

# **********************************************************************
# ********* HIGH AVAILABILITY CLUSTER CONFIGURATION FILE ***************
# ***** For complete details about cluster parameters and how to *******
# ***** set them, consult the Serviceguard manual. *********************
# **********************************************************************

CLUSTER_NAME MYCLUSTER

QS_HOST 162.17.16.3
QS_POLLING_INTERVAL 120000000
QS_TIMEOUT_EXTENSION 2000000

NODE_NAME pri-node
NETWORK_INTERFACE lan0
STATIONARY_IP 162.17.16.2
NETWORK_INTERFACE lan1
NETWORK_INTERFACE lan2
NETWORK_INTERFACE lan3
HEARTBEAT_IP 10.10.120.1

NODE_NAME sec-node
NETWORK_INTERFACE lan0
STATIONARY_IP 162.17.16.82
NETWORK_INTERFACE lan1
NETWORK_INTERFACE lan2
NETWORK_INTERFACE lan3
HEARTBEAT_IP 10.10.120.2

HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 6000000

other parameters not listed...

VOLUME_GROUP /dev/cluvg01
VOLUME_GROUP /dev/cluvg02

# cat /etc/cmcluster/package_name/package.conf

# **********************************************************************
# ****** HIGH AVAILABILITY PACKAGE CONFIGURATION FILE (template) *******
# **********************************************************************
# ******* Note: This file MUST be edited before it can be used. ********
# * For complete details about package parameters and how to set them, *
# * consult the Serviceguard manual.
# **********************************************************************

PACKAGE_NAME mypackage
PACKAGE_TYPE FAILOVER

NODE_NAME pri-node
NODE_NAME sec-node

AUTO_RUN YES
NODE_FAIL_FAST_ENABLED NO

RUN_SCRIPT /etc/cmcluster/mypackage/start_my_package.sh
HALT_SCRIPT /etc/cmcluster/mypackage/stop_my_package.sh
RUN_SCRIPT_TIMEOUT NO_TIMEOUT
HALT_SCRIPT_TIMEOUT NO_TIMEOUT
other parameters not listed...

After editing those conf files, check and compile them:

# cmcheckconf -C /etc/cmcluster/cluster.ascii # check cluster conf file
# cmapplyconf -C /etc/cmcluster/cluster.ascii # apply cluster conf file

# cmcheckconf -P /etc/cmcluster/package_name/package.conf # check package conf file
# cmapplyconf -P /etc/cmcluster/package_name/package.conf # apply package conf file
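
The check-then-apply order above can be wrapped so that a file is never applied when the check fails.
This is only a minimal sketch, not a Serviceguard tool: the function name "sg_check_and_apply" and the
SG_RUN variable are made up for this example, and only the cmcheckconf / cmapplyconf options shown above
are used. With SG_RUN=echo (the default here) the commands are printed instead of executed.

```shell
# Hypothetical helper: check a Serviceguard ASCII file, apply it only if the check passes.
# SG_RUN is an assumption of this sketch: "echo" = dry run, empty = really execute.
SG_RUN=${SG_RUN-echo}

sg_check_and_apply() {
    # $1 = -C (cluster file) or -P (package file), $2 = path to the ASCII file
    flag=$1
    conf=$2
    case $flag in
        -C|-P) ;;
        *) echo "usage: sg_check_and_apply -C|-P file" >&2; return 2 ;;
    esac
    # Stop here when the check reports a problem.
    $SG_RUN cmcheckconf "$flag" "$conf" || return 1
    $SG_RUN cmapplyconf "$flag" "$conf"
}
```

Example: "sg_check_and_apply -C /etc/cmcluster/cluster.ascii" prints the two commands in dry-run mode.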
2. Main Service Guard commands:

=> Viewing cluster and package status:

# cmviewcl -v

Shows you the detailed status of the cluster, nodes, packages and services.

=> start the cluster:

# cmruncl -v # start entire cluster
# cmruncl -v -n nodename # if only one node is available

=> start cluster on one node:

# cmrunnode -v
# cmrunnode -v othernode # start a single node

This command will start the specified node to join an already running cluster.

=> Running a package

# cmrunpkg [ -n nodename ] packagename

This will run the package on the current node or on the node specified.
Logs will be written in /etc/cmcluster//.log.

=> halt the cluster:

# cmhaltcl -v
# cmhaltcl -f

This forces the packages to halt and then halts ServiceGuard operations on all nodes
that are currently running in the cluster.

=> stop cluster on one node:

# cmhaltnode -v
# cmhaltnode -v othernode

This command will halt ServiceGuard operations on the specified node. If any packages are running
on that node, the node will not be halted.

# cmhaltnode -f nodename

Force the node to halt even if there are packages or group members running on it.

=> Halting a package:

# cmhaltpkg packagename

This will halt the package. Logs will be written in /etc/cmcluster/packagename/.log.

=> enable or disable switching attributes for a package:

# cmmodpkg -e OR -d packagename

Enabling a package to run on a particular node.
After a package has failed on one node, that node is disabled. This means the package
will not be able to run on that node. The following command will enable the package to run on the specified node.

# cmmodpkg -e -n [nodename] packagename

=> Disabling a package from running on a particular node:

# cmmodpkg -d -n nodename packagename

This command will disable the package from running on the specified node.
3. Failover: Move a Package:

A "package" is the unit that Service Guard handles, for example in a failover operation.

The package has a name to identify it and a "virtual IP address", which can be owned by one of the nodes.
In DNS the package name is registered with its "virtual IP", so that clients can always access the application,
no matter which node the package runs on.

So, suppose we have the nodes "black" and "white", and the package "pkg1".
On both nodes, configuration files are present, like those examples shown in section 1.

So, in configuration files, the associated Volume Group(s) are listed, the package name, the nodes, timing variables,
and "start" and "stop" scripts to start and stop the application which is associated with this package.

So, suppose the package currently runs on "white". Now, let's do a failover to "black":

-- Step 1. Halt the package at "white".
-- You halt a Serviceguard package when you want to stop the package, but you want the node to continue running in the cluster.
-- Then you must manually start it at "black".

# cmhaltpkg pkg1

-- Step 2. Start the package at "black".
-- After starting the package using "cmrunpkg", you then must also enable package switching.
-- This is necessary when a package has previously been halted on some node, since halting the package disables switching.

# cmrunpkg -n black pkg1
# cmmodpkg -e pkg1
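
The two steps above can be put into one small wrapper. This is a sketch under assumptions: the function
name "sg_move_package" and the SG_RUN variable are invented for this example, and it uses only the
cmhaltpkg, cmrunpkg and cmmodpkg calls shown above. By default it only prints the commands (dry run).

```shell
# Hypothetical wrapper around the manual failover steps.
# SG_RUN is an assumption of this sketch: "echo" = dry run, empty = really execute.
SG_RUN=${SG_RUN-echo}

sg_move_package() {
    # $1 = package name, $2 = node that should run the package
    pkg=$1
    target=$2
    [ -n "$pkg" ] && [ -n "$target" ] || {
        echo "usage: sg_move_package package node" >&2
        return 2
    }
    $SG_RUN cmhaltpkg "$pkg" || return 1              # Step 1: halt the package
    $SG_RUN cmrunpkg -n "$target" "$pkg" || return 1  # Step 2: start it on the target node
    $SG_RUN cmmodpkg -e "$pkg"                        # re-enable package switching
}

# Dry run for the example in this section: move pkg1 to node "black".
sg_move_package pkg1 black
```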

However, in some cases the sequence above is not enough. At many sites, the correct Volume Group operations have not been
implemented in the package stop and start scripts.
So, in our example we might be forced to perform the following extra steps:

on white: "vgchange -a n vgname" after step 1, halting the package on "white"
on black: "vgchange -a y vgname" before step 2, starting the package on "black"

However, many HP articles say that Service Guard expects a VG to be activated in "exclusive mode". In this case,
the appropriate command would be:

# vgchange -a e vgname

It depends a bit on how the VG was created. All disks in the VG have LVM "metadata", which include volume group activation mode bits.
The most general ones are:

- 00=standard activation mode (-a y). This default setting is normal for a VG in a non-clustered setting.
- 01=exclusive activation mode (-a e). This is the value that Serviceguard usually uses for operation.

So the "-a e" activation mode is the correct one. Nevertheless, at many sites the default "-a y" is used.

See also section 5.
4. Serviceguard daemons:

The main Cluster Management Daemon is a process called "cmcld".
One of its main duties is to send and receive heartbeat packets across all designated heartbeat networks.
Other tasks involve management of packages, node memberships, coordinating other cluster daemons, etc.

It is up to Serviceguard to activate a cluster Volume Group on a node that needs access to the data. In order to disable volume
group activation at boot time, we need to modify the startup script "/etc/lvmrc".
The first part of this process is as follows:

AUTO_VG_ACTIVATE=1
Changed to …
AUTO_VG_ACTIVATE=0
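
The change can also be scripted. A minimal sketch, assuming the line looks exactly like the one shown
above; the function name "disable_auto_vg_activate" and the ".orig" backup suffix are choices of this
example. Try it on a copy of /etc/lvmrc first.

```shell
# Sketch: set AUTO_VG_ACTIVATE=0 in an lvmrc-style file and keep a backup copy.
disable_auto_vg_activate() {
    file=$1
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    cp "$file" "$file.orig" || return 1
    # Plain sed reading the backup, because "sed -i" is not available on every Unix.
    sed 's/^AUTO_VG_ACTIVATE=1/AUTO_VG_ACTIVATE=0/' "$file.orig" > "$file"
}
```

Example: "disable_auto_vg_activate /tmp/lvmrc.copy", then compare the result with the ".orig" file.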

Make sure every node has the "/etc/cmcluster/cmclnodelist" file in place.

Here are the OS MC ServiceGuard Components:
/usr/lbin/cmclconfd --ServiceGuard Configuration Daemon (gathers cluster info ie network and vol grp info started in /etc/inetd.conf)
/usr/lbin/cmcld --ServiceGuard Cluster Daemon (determines cluster membership. Package Mgr, Cluster Mgr, and Network Mgr run as parts of cmcld.)
/usr/lbin/cmlogd --ServiceGuard Syslog Log Daemon (used by cmcld to write syslog messages.)
/usr/lbin/cmlvmd --Cluster Logical Volume Manager Daemon (keeps track of Volume group info.)
/usr/lbin/cmomd --Cluster Object Manager Daemon - logs to /var/opt/cmom/cmomd.log (provides info to client about the cluster. /etc/inetd.conf.)
/usr/lbin/cmsnmpd --Cluster SNMP subagent (optionally running) (produces MIB for snmp)
/usr/lbin/cmsrvassistd --ServiceGuard Service Assistant Daemon (fork and exec scripts for the cluster.)
/usr/lbin/cmtaped --ServiceGuard Shared Tape Daemon (keeps track of shared tape devices.)
Each of these daemons also logs to the /var/adm/syslog/syslog.log file.

Information about the starting and halting of each package is found in the package’s
control script log. This log provides the history of the operation of the package control script.
It is found at /etc/cmcluster/pkgname/pkgname.cntl.log or /etc/cmcluster/package_name/control_script.log.

You can also find messages in /var/adm/syslog/syslog.log which indicate what has occurred and whether or not
the package has halted or started.
5. VG operations with Service Guard:

=> Quick recipe for adding a VG:
Scan for new disks, if necessary (ioscan, or reboot etc..)
Create PV's
Create new vgs & lvs
Export vg to map file
Import vg at failover node
Deactivate vg (vgchange -a n vgname)
Make vg cluster aware (vgchange -c y vgname )
Activate vg exclusively (vgchange -a e vgname )
Mount new lvs manually with mount command
Take copy of /etc/cmcluster/pkg/pkg.cntl file
Edit /etc/cmcluster/pkg/pkg.cntl & add new vg, lv details
Copy pkg control files to all failover nodes
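
The three vgchange steps in the middle of the recipe above (deactivate, make cluster aware, activate
exclusively) could be chained as follows. Only a sketch: the function name "sg_vg_make_exclusive" and the
SG_RUN variable are made up here, and the default is a dry run that just prints the commands.

```shell
# Hypothetical helper for the vgchange part of the recipe.
# SG_RUN is an assumption of this sketch: "echo" = dry run, empty = really execute.
SG_RUN=${SG_RUN-echo}

sg_vg_make_exclusive() {
    vg=$1
    [ -n "$vg" ] || { echo "usage: sg_vg_make_exclusive vgname" >&2; return 2; }
    $SG_RUN vgchange -a n "$vg" || return 1   # deactivate the VG
    $SG_RUN vgchange -c y "$vg" || return 1   # mark it as cluster aware
    $SG_RUN vgchange -a e "$vg"               # activate it in exclusive mode
}
```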
=> VG operations:

-> Marking a Volume Group for Serviceguard:

Marking a VG for Serviceguard: # vgchange -c y VGName
Marking a VG as non-Serviceguard: # vgchange -c n VGName

The "vgchange -c y VGName" command marks a volume group as part of a cluster.
The "vgchange -a n VGName" deactivates a VG in the usual way.

The "marking for SG" is applied automatically by the "cmapplyconf" command when the volume group
is listed in the cluster-wide ASCII file.

->VG Activation options:
Standard Volume Groups Activation: # vgchange -a y VGName
Standard Volume Groups Deactivation: # vgchange -a n VGName
Exclusive Volume Group Activation: # vgchange -a e VGName
Exclusive Volume Group Deactivation: # vgchange -a n VGName
Shared mode Volume Group activation: # vgchange -a s VGName
/****************************************************************************/
/* Document     : UNIX command examples, mainly based on Solaris, AIX, HP   */
/*                and ofcourse, also Linux.                                 */
/* Doc. Version : 115                                                       */
/* File         : unix.txt                                                  */
/* Purpose      : some examples for the Oracle, DB2, SQLServer DBA          */
/* Date         : 07-07-2009                                                */
/* Compiled by  : Albert van der Sel                                        */
/* Best use     : Use find/search in your editor to find a string, command, */
/*                or any identifier                                         */
/****************************************************************************/




############################################
SECTION 1. COMMANDS TO RETRIEVE SYSTEM INFO:
############################################


==========================
1. HOW TO GET SYSTEM INFO:
==========================


1.1 Short version:
==================

See section 1.2 for more detailed commands and options.

Memory:
-------
AIX:     bootinfo -r
         lsattr -E -l mem0
         lsattr -E -l sys0 -a realmem
         svmon -G
         vmstat -v
         vmo -L
         lparstat -i
         or use a tool such as "topas" or "nmon" (these are utilities)

Linux:   cat /proc/meminfo
         dmesg | grep "Physical"
         free   (the free command)
HP:      getmem
         print_manifest | grep -i memory
         dmesg | grep -i phys 
         echo "selclass qualifier memory;info;wait;infolog"|cstm     
         wc -c /dev/mem
         or use a tool such as "glance", for example by entering "glance -m" at the prompt (a utility)
Solaris: prtconf | grep "Memory size"        # total memory
         prtmem
         memps -m
Tru64:   vmstat -P | grep "Total Physical Memory"
         uerf | grep memory
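
A list like the one above can be turned into a small dispatcher keyed on "uname -s". A minimal sketch:
the function name "mem_cmd_for" is made up here and only one command per OS is taken from the list.
It relies on Solaris reporting "SunOS" and Tru64 reporting "OSF1" for "uname -s".

```shell
# Print one memory command from the list above for a given "uname -s" value.
mem_cmd_for() {
    case $1 in
        AIX)    echo 'bootinfo -r' ;;
        Linux)  echo 'cat /proc/meminfo' ;;
        HP-UX)  echo 'dmesg | grep -i phys' ;;
        SunOS)  echo 'prtconf | grep "Memory size"' ;;
        OSF1)   echo 'vmstat -P | grep "Total Physical Memory"' ;;
        *)      echo "unknown OS: $1" >&2; return 1 ;;
    esac
}

# Show which command would be used on the current system.
mem_cmd_for "$(uname -s)"
```

To really run the command instead of printing it: eval "$(mem_cmd_for "$(uname -s)")"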


Swap:
-----

AIX:           lsps -a
               lsps -s
               pstat -s
               
HP:            swapinfo -a
Solaris:       swap -l
               prtswap -l
Linux:         swapon -s
               cat /proc/swaps
               cat /proc/meminfo


cpu:
----

HP:       ioscan -kfnC processor		
	  getconf CPU_VERSION		
	  getconf CPU_CHIP_TYPE		
	  model	

AIX:      lparstat (-i)       
          prtconf | grep proc
          pmcycles -m
          lsattr -El procx (x is 0,2, etc..)
          lscfg | grep proc
          pstat -S
          mpstat

Linux:    cat /proc/cpuinfo

Solaris:  psrinfo -v
          prtconf
          psrset -p 
          prtdiag




OS version:
-----------

HP:      uname -a

Linux:   cat /proc/version 
      
Solaris: uname -a
         cat /etc/release   (or other way to view that file, like "more /etc/release")
Tru64:   /usr/sbin/sizer -v

AIX:     oslevel -r   (only high-level version)
         oslevel -s   (shows Version, SP, TL level)
         oslevel -qs  (shows complete history)
         lslpp -h bos.rte

AIX Example:

# oslevel -s
5300-08-03-0831


# oslevel -qs
Known Service Packs
-------------------
5300-08-03-0831
5300-08-02-0822
5300-08-01-0819
5300-08-00-0000
5300-07-05-0831
5300-07-04-0818
5300-07-03-0811
5300-07-02-0806
5300-07-01-0748
5300-06-08-0831
5300-06-07-0818
5300-06-06-0811
5300-06-05-0806
5300-06-04-0748
5300-06-03-0732
5300-06-02-0727
5300-06-01-0722
5300-05-CSP-0000
5300-05-06-0000
5300-05-05-0000
5300-05-04-0000
5300-05-03-0000
5300-05-02-0000
5300-05-01-0000
5300-04-CSP-0000
5300-04-03-0000
5300-04-02-0000
5300-04-01-0000
5300-03-CSP-0000
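
The level strings printed by "oslevel -s" can be split into their four fields in a script. A minimal
sketch; the function name "parse_oslevel" is made up, and reading the fields as base level, Technology
Level, Service Pack and build date (year and week) is the usual interpretation, stated here as an assumption.

```shell
# Split an "oslevel -s" string such as 5300-08-03-0831 on the dashes.
parse_oslevel() {
    old_ifs=$IFS
    IFS=-
    set -- $1          # word splitting on "-" gives the four fields
    IFS=$old_ifs
    echo "base=$1 tl=$2 sp=$3 build=$4"
}

parse_oslevel 5300-08-03-0831
```

On AIX itself the call would be: parse_oslevel "$(oslevel -s)"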



AIX firmware:
lsmcode -c               display the system firmware level and service processor
lsmcode -r -d scraid0    display the adapter microcode levels for a RAID adapter scraid0
lsmcode -A               display the microcode level for all supported devices
prtconf                  shows many setting including memory, firmware, serial# etc..




  Notes about Power 4 or 5 lpars: 
  -------------------------------

  For AIX: The uname -L command identifies a partition on a system with multiple LPARS. The LPAR id  
  can be useful for writing shell scripts that customize system settings such as IP address or hostname. 

  The output of the command looks like: 

  # uname -L
  1 lpar01 
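
  The two fields of this output can be split into shell variables. A minimal sketch that works on a
  line in the format shown above; the function name "parse_lpar" is made up, and on a real system the
  exact field layout of your uname flags should be checked first.

```shell
# Split a line such as "1 lpar01" into lpar_number and lpar_name.
parse_lpar() {
    set -- $1          # word splitting on whitespace
    lpar_number=$1
    lpar_name=$2
}

parse_lpar "1 lpar01"
echo "number=$lpar_number name=$lpar_name"
```

  On AIX the call would look like: parse_lpar "$(uname -L)"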

  The output of uname -L varies by maintenance level. For consistent output across maintenance levels,
  add a -s flag. For example, the output can be used to assign the partition number to the variable
  "lpar_number" and the partition name to "lpar_name".

  For HP-UX:
  Use commands like "parstatus" or "getconf PARTITION_IDENT" to get npar information.



patches:
--------

AIX:     Is a certain fix (APAR) installed?
         instfix -ik APAR_number
         instfix -a -ivk APAR_number
         
         To determine your platform firmware level, at the command prompt, type:

         lscfg -vp | grep -p Platform

         The last six digits of the ROM level represent the platform firmware date in the format, YYMMDD.


HP:      /usr/sbin/swlist -l patch
         swlist | grep patch
Linux:   rpm -qa
Solaris: showrev -p
         pkginfo -i package_name
Tru64:   /usr/sbin/dupatch -track -type kit




Netcards:
---------

AIX:	 lsdev -Cc adapter
         lsdev -Cc adapter | grep ent
	 lsdev -Cc if
         lsattr -E -l ent1
         ifconfig -a
Solaris: prtconf -D    /    prtconf -pv   /     prtconf | grep "card"
         prtdiag | grep "card"
         svcs -x
         ifconfig -a (up plumb)



Quickly find out who is using most memory:
------------------------------------------

See section marked &&& (use find/search on &&&)




Network sniffing:
-----------------

Here are a few short descriptions, and examples, of useful network trace / dump commands.


-- Solaris: 

snoop command examples:

For example, if we want to observe traffic between systems alpha and beta  we can use the following command: 
# snoop alpha,beta
To enable data captures from the snoop output without losing packets while writing to the screen, send the snoop output to a file. For example:
# snoop -o /tmp/snooper -V 128.50.1.250
To snoop a specific port:
# snoop port xxx


-- AIX:

tcpdump command examples:

# tcpdump port 23
# tcpdump -i en0 
A good way to use tcpdump is to save the network trace to a file with the -w flag and then analyze the trace by using different
filtering options together with the -r flag. The following example shows how to run a basic tcpdump network trace,
saving the output in a file with the -w flag (on an Ethernet network interface):
# tcpdump -w /tmp/tcpdump.en0 -i en0

To limit the number of traced packets, use the -c flag and specify the number, such as in the following example
that traces the first 128 packets (on a token-ring network interface):
# tcpdump -c 128 -w /tmp/tcpdump.tr0 -i tr0
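
The "save with -w, analyze with -r" approach described above could look like this. A sketch only: the
function name "trace_and_filter" and the TRACE_RUN variable are made up for this example, and by default
the tcpdump commands are printed rather than executed.

```shell
# Capture a fixed number of packets to a file, then read the file back with a filter.
# TRACE_RUN is an assumption of this sketch: "echo" = dry run, empty = really execute.
TRACE_RUN=${TRACE_RUN-echo}

trace_and_filter() {
    # $1 = interface, $2 = packet count, $3 = trace file, rest = filter expression
    ifc=$1
    count=$2
    file=$3
    shift 3
    $TRACE_RUN tcpdump -c "$count" -w "$file" -i "$ifc" || return 1
    $TRACE_RUN tcpdump -r "$file" "$@"
}

# Dry run: trace 128 packets on en0, then show only the telnet traffic from the file.
trace_and_filter en0 128 /tmp/tcpdump.en0 port 23
```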

iptrace command examples:

To start the iptrace daemon with the System Resource Controller (SRC),
# startsrc -s iptrace -a "/tmp/nettrace"

To stop the iptrace daemon with SRC enter the following:
# stopsrc -s iptrace

To record packets coming in and going out to any host on every interface, enter the command in the following format:
# iptrace /tmp/nettrace

The recorded packets are received on and sent from the local host. All
packet flow between the local host and all other hosts on any interface is
recorded. The trace information is placed into the /tmp/nettrace file.

To record packets received on an interface from a specific remote host,
enter the command in the following format:
# iptrace -i en0 -p telnet -s airmail /tmp/telnet.trace

The packets to be recorded are received on the en0 interface, from remote
host airmail, over the telnet port. The trace information is placed into the
/tmp/telnet.trace file.

To record packets coming in and going out from a specific remote host,
enter the command in the following format:
# iptrace -i en0 -s airmail -b /tmp/telnet.trace

The packets to be recorded are received on the en0 interface, from remote
host airmail. The trace information is placed into the /tmp/telnet.trace file.


-- HPUX:

nettl command:

Initialize the tracing/logging facility:
# nettl -start
Logging is enabled for all subsystems as determined by the /etc/nettlgen.conf file. Log messages are sent 
to a log file whose name is determined by adding the suffix .LOG000 to the log file name specified
in the /etc/nettlgen.conf configuration file. 

To stop the tracing facility:
# nettl -stop

Turn on inbound and outbound PDU tracing for the transport and session (OTS/9000) subsystems
and send binary trace messages to file /var/adm/trace.TRC000. 
# nettl -traceon pduin pduout -entity transport session \
     -file /var/adm/trace 

Session using nettl and the formatter netfmt:
1. Capture packets
nettl -tn all -e ns_ls_ip -tm 99999 -size 1024 -f some-raw-capture-file

2. Reproduce problem.

3. Turn off trace: nettl -tf -e all

4. Create formatter filter file. Example:
filter tcp_sport 6699
filter tcp_dport 6699

5. Filter the packets:
5.1 "Long" display
netfmt -Nlnc filter-file -f some-raw.capture > formatted.out
5.2 "One-liner" display
netfmt -Nln1Tc filter-file -f some-raw.capture > one-liner.out
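
The filter file of step 4 can be generated for any port. A minimal sketch; the function name
"write_port_filter" is made up, and the two "filter" lines are the ones from the example above.

```shell
# Write a netfmt filter file that matches one TCP port as source or destination.
write_port_filter() {
    # $1 = filter file to create, $2 = TCP port number
    printf 'filter tcp_sport %s\nfilter tcp_dport %s\n' "$2" "$2" > "$1"
}
```

Example: "write_port_filter filter-file 6699" recreates the filter file from step 4.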


-- Restart inetd, nfs:
-- -------------------

Starting and stopping NFS:			
--------------------------
			
On all unixes, a number of daemons should be running in order for NFS to be functional, like for example			
the rpc.* processes, biod, nfsd and others.			
			
Once nfs is running, and in order to actually "share" or "export" your filesystem on your server, so remote clients 			
are able to mount the nfs mount, in most cases you should edit the "/etc/exports" file.			
			
-- AIX:			
The following subsystems are part of the nfs group: nfsd, biod, rpc.lockd, rpc.statd, and rpc.mountd. 			
The nfs subsystem (group) is under control of the "resource controller", so starting and stopping nfs			
is actually easy			
			
# startsrc -g nfs			
# stopsrc -g nfs			
			
Or use smitty.			
			
-- Redhat Linux:			
# /sbin/service nfs restart			
# /sbin/service nfs start			
# /sbin/service nfs stop			
			
-- On some other Linux distros			
# /etc/init.d/nfs start 			
# /etc/init.d/nfs stop			
# /etc/init.d/nfs restart			
			
-- Solaris:			
If the nfs daemons aren't running, then you will need to run:			
# /etc/init.d/nfs.server start 			
			
-- HP-UX:			
Issue the following command on the NFS server to start all the necessary NFS processes (HP): 			
# /sbin/init.d/nfs.server start 			
 			
Or if your machine is only a client:			
# cd /sbin/init.d			
# ./nfs.client start			
			
			
Restart or refresh inetd after you have edited "inetd.conf":			
------------------------------------------------------------
			
After you have edited "/etc/inetd.conf", for example, to enable or disable some service,			
you need to restart, or refresh inetd, to read the new configuration information.			
To make inetd reread the config file:
			
-- AIX:			
# refresh -s inetd			
			
-- HPUX:			
# /usr/sbin/inetd -c 			
			
-- Solaris:			
# /etc/init.d/inetd stop			
# /etc/init.d/inetd start			
# pkill -HUP inetd		# The command will restart the inetd and reread the configuration.	
			
-- RedHat / Linux			
# service xinetd restart			
or			
# /etc/init.d/inetd restart			



1.2 More Detail:
================

1.2.1 Show memory in Solaris:
=============================

prtconf:
--------
Use this command to obtain detailed system information about your Sun Solaris installation
# /usr/sbin/prtconf

# prtconf -v 
Displays the size of the system memory and reports information about peripheral devices 

Use this command to see the amount of memory:
# /usr/sbin/prtconf | grep "Mem" 

sysdef -i reports on several system resource limits. Other parameters can be checked on a running system 
using adb -k :

# adb -k /dev/ksyms /dev/mem
parameter-name/D
^D (to exit) 

Other commands:
---------------

# prtmem
# memps -m



1.2.2 Show memory in AIX: 
=========================

>> Show Total memory:
---------------------

# bootinfo -r
# lsattr -El sys0 -a realmem 
# prtconf   (you can grep it on memory)


>> Show Details of memory:
--------------------------

You can have a more detailed and comprehensive look at AIX memory by using "vmstat -v" and "vmo -L" or "vmo -a":

For example:

# vmstat -v
               524288 memory pages
               493252 lruable pages
                67384 free pages
                    7 memory pools
               131820 pinned pages
                 80.0 maxpin percentage
                 20.0 minperm percentage
                 80.0 maxperm percentage
                 25.4 numperm percentage
               125727 file pages
                  0.0 compressed percentage
                    0 compressed pages
                 25.4 numclient percentage
                 80.0 maxclient percentage
               125575 client pages
                    0 remote pageouts scheduled
                14557 pending disk I/Os blocked with no pbuf
              6526890 paging space I/Os blocked with no psbuf
                18631 filesystem I/Os blocked with no fsbuf
                    0 client filesystem I/Os blocked with no fsbuf
                49038 external pager filesystem I/Os blocked with no fsbuf
                    0 Virtualized Partition Memory Page Faults
                 0.00 Time resolving virtualized partition memory page faults
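
The page counts in this output can be converted to MB in a script. A minimal sketch, assuming that the
counters are in 4 KB pages; the function name "vmstat_v_mb" is made up for this example.

```shell
# Read "vmstat -v" output on stdin and print one counter converted to MB.
# $1 is the exact label text, for example "memory pages" or "free pages".
vmstat_v_mb() {
    awk -v label="$1" '
        {
            n = $1            # first field is the counter
            $1 = ""           # drop it, leaving only the label text
            sub(/^ +/, "")    # strip the separator left behind
            if ($0 == label) printf "%d\n", n * 4 / 1024
        }'
}
```

On AIX: vmstat -v | vmstat_v_mb "memory pages"
For the output above this prints 2048 (524288 pages of 4 KB).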


The vmo command really gives lots of output. In the following example only a small fraction of the output is shown:

# vmo -L

..
lrubucket                 128K   128K   128K   64K           4KB pages         D
--------------------------------------------------------------------------------
maxclient%                80     80     80     1      100    % memory          D
     maxperm%
     minperm%
--------------------------------------------------------------------------------
maxfree                   1088   1088   1088   8      200K   4KB pages         D
     minfree
     memory_frames
--------------------------------------------------------------------------------
maxperm                   394596        394596                                 S
--------------------------------------------------------------------------------
maxperm%                  80     80     80     1      100    % memory          D
     minperm%
     maxclient%
--------------------------------------------------------------------------------
maxpin                    424179        424179                                 S
..
..


>> To further look at your virtual memory and its causes, you can use a combination of: 
---------------------------------------------------------------------------------------
  
# ipcs -bm               (shared memory) 
# lsps -a                (paging) 
# vmo -a  or vmo -L      (virtual memory options) 
# svmon -G               (basic memory allocations) 
# svmon -U               (virtual memory usage by user)
# svmon -P
# vmstat -v 

     To print out the memory usage statistics for the users root and steve
     taking into account only working segments, type:

     svmon -U root steve -w

     To print out the top 10 users of the paging space, type:

     svmon -U -g -t 10

     To print out the memory usage statistics for the user steve, including the
     list of the process identifiers, type:

     svmon -U steve -l
     svmon -U emcdm -l

# vmo -o npswarn=value
# schedo -o pacefork=15

Note: sysdumpdev -e
Although the sysdumpdev command is used to show or alter the dumpdevice for a system dump,
you can also use it to show how much real memory is used.

The command
# sysdumpdev -e
provides an estimated dump size taking into account the current memory (not pagingspace) currently 
in use by the system.

Note: the rmss command:

The rmss (Reduced-Memory System Simulator) command is used to ascertain the effects of reducing the amount 
of available memory on a system without the need to physically remove memory from the system. It is useful 
for system sizing, as you can install more memory than is required and then use rmss to reduce it. 
Using other performance tools, the effects of the reduced memory can be monitored. The rmss command has 
the ability to run a command multiple times using different simulated memory sizes and produce statistics 
for all of those memory sizes.

The rmss command resides in /usr/bin and is part of the bos.perf.tools fileset, which is installable 
from the AIX base installation media.

Syntax rmss -p -c <MB> -r 
Options 
  -p  Print the current value 
  -c MB  Change the simulated memory size to MB (in Mbytes)
  -r  Restore all memory to use

Example: find out how much memory you have online
rmss -p  
Example: Change available memory to 256 Mbytes
rmss -c 256  
Example: Undo the above 
rmss -r 

Warning:

rmss can damage performance very seriously 
Don't go below 25% of the machine's memory
Never forget to finish with rmss -r 
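
The 25% rule can be checked before rmss is called. A sketch only: the function names "rmss_floor_mb"
and "rmss_check" are made up, the total memory size has to be supplied by the caller, and the rmss
command is printed, not executed.

```shell
# 25% of the total memory in MB, rounded up.
rmss_floor_mb() {
    echo $(( ($1 + 3) / 4 ))
}

# Print the rmss command only if the requested size respects the 25% floor.
rmss_check() {
    # $1 = total memory in MB, $2 = requested simulated size in MB
    floor=$(rmss_floor_mb "$1")
    if [ "$2" -lt "$floor" ]; then
        echo "refusing: $2 MB is below the 25% floor of $floor MB" >&2
        return 1
    fi
    echo "rmss -c $2"
}

rmss_check 1024 256
```

On a real system, follow every such test with "rmss -r", for example from a trap handler in the script.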

The pstat command:
------------------

The pstat command displays many system tables, such as the process table, inode table, or processor status table.
It interprets the contents of the various system tables and writes them to standard output.

Use the pstat command from the AIX 5.2 command prompt. See the command reference for details and examples, 
or use the syntax summary in the table below. 

Flags 
-a		Displays entries in the process table  
-A		Displays all entries in the kernel thread table  
-f		Displays the file table  
-i		Displays the i-node table and the i-node data block addresses 
-p		Displays the process table 
-P		Displays runnable kernel thread table entries only 
-s		Displays information about the swap or paging space usage 
-S		Displays the status of the processors 
-t		Displays the tty structures 
-u ProcSlot	Displays the user structure of the process in the designated slot of the process table. An error message is generated if you attempt to display a swapped out process. 
-T		Displays the system variables. These variables are briefly described in var.h 
-U ThreadSlot	Displays the user structure of the kernel thread in the designated slot of the kernel thread table. An error message is generated if you attempt to display a swapped out kernel thread. 



&&&
----------------------------------------------------------------------------------
Note 1: How to get a "reasonable" view on memory consumption of a process in UNIX:
----------------------------------------------------------------------------------

Using just the command line, or some free utils.


In general this is not so easy to answer, because of the "sub components" you might distinguish
in memory occupation. For example, do you mean RSS, real, shared, virtual, paging, including all libraries loaded, etc.?

-- Some people like to use the ps command with some special flags, like
   ps -vg
   ps auxw   # or  ps auxw | sort -r +3 |head -10 (top users)

   But those commands do not seem very satisfactory, and their output is not "complete".

-- There are some great common utilities like topas, nmon, top etc.., or tools specific to a certain Unix, like SMC for Solaris.
   No bad word on those tools, because they are great. But some people think that they are not satisfactory 
   on the subject of memory consumption of a process (although they show a lot of other interesting information).

-- Some other ways might be:

# procmap pid      (in e.g. AIX)
# pmap -x pid      (in e.g. Solaris)

Those tools also show a "total" memory usage, which is a good indicator.

For example:
   
# pmap -x $$

492328: -ksh
 Address  Kbytes     RSS    Anon  Locked Mode   Mapped File
00010000     192     192       -       - r-x--  ksh
00040000       8       8       8       - rwx--  ksh
00042000      40      40       8       - rwx--    [ heap ]
FF180000     680     680       -       - r-x--  libc.so.1
FF23A000      24      24       -       - rwx--  libc.so.1
FF240000       8       8       8       - rwx--  libc.so.1
FF280000     576     576       -       - r-x--  libnsl.so.1
FF310000      40      40       -       - rwx--  libnsl.so.1
FF31A000      24      16       -       - rwx--  libnsl.so.1
FF350000      16      16       -       - r-x--  libmp.so.2
FF364000       8       8       -       - rwx--  libmp.so.2
FF380000      40      40       -       - r-x--  libsocket.so.1
FF39A000       8       8       -       - rwx--  libsocket.so.1
FF3A0000       8       8       -       - r-x--  libdl.so.1
FF3B0000       8       8       8       - rwx--    [ anon ]
FF3C0000     152     152       -       - r-x--  ld.so.1
FF3F6000       8       8       8       - rwx--  ld.so.1
FFBFC000      16      16       8       - rw---    [ stack ]
-------- ------- ------- ------- -------
total Kb    1856    1848      48       -

This gives you a reasonable idea on memory consumption of a pid.
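
To pick just the RSS total out of such a listing in a script, the "total Kb" line can be filtered with awk.
A minimal sketch that assumes the column layout shown above (RSS is the second number on the total line);
the function name "pmap_rss_kb" is made up for this example.

```shell
# Read "pmap -x" output on stdin and print the total RSS in Kbytes.
pmap_rss_kb() {
    awk '$1 == "total" && $2 == "Kb" { print $4 }'
}
```

Usage on Solaris: pmap -x $$ | pmap_rss_kb
For the listing above this prints 1848.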

You can also try:

# svmon -G
# svmon -U
# svmon -P -t 10     (top 10 processes)
# svmon -U steve -l  (memory stats for user steve)

But svmon is an AIX tool; it is not available on other unixes.

The following might also be helpful (not on all unixes):

# ls -l /proc/{pid}/as
# prstat -a -s rss

And ps can give some info as well

# ps -ef | egrep -v "STIME|$LOGNAME" | sort +3 -r | head -n 15
# ps au





1.2.3 Show memory in Linux:
===========================

# /usr/sbin/dmesg | grep "Physical:"
# cat /proc/meminfo
# free -m

The ipcs, vmstat, iostat and similar commands are, of course, more or less the same
in Linux as they are in Solaris or AIX.
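
The /proc/meminfo values are reported in kB. As a rough sketch, the function below (the name
mem_summary_mb is made up for this note) converts the MemTotal and MemFree lines to whole MB.
It takes an optional file argument so it can be tried on a saved copy of /proc/meminfo.

```shell
# Print total and free memory in MB from a /proc/meminfo-style file.
# The file argument defaults to /proc/meminfo; the values in it are in kB.
mem_summary_mb() {
    awk '$1 == "MemTotal:" { total = $2 }
         $1 == "MemFree:"  { free = $2 }
         END { printf "total=%dMB free=%dMB\n", total / 1024, free / 1024 }' "${1:-/proc/meminfo}"
}

# Usage:
#   mem_summary_mb
#   mem_summary_mb /tmp/meminfo.saved
```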



1.2.4 Show aioservers in AIX:
=============================

# lsattr -El aio0
autoconfig available STATE to be configured at system restart True
fastpath   enable    State of fast path                       True
kprocprio  39        Server PRIORITY                          True
maxreqs    4096      Maximum number of REQUESTS               True
maxservers 10        MAXIMUM number of servers per cpu        True
minservers 1         MINIMUM number of servers                True

# pstat -a | grep -c aios
20

# ps -k | grep aioserver
  331962      -  0:15 aioserver
  352478      -  0:14 aioserver
  450644      -  0:12 aioserver
  454908      -  0:10 aioserver
  565292      -  0:11 aioserver
  569378      -  0:10 aioserver
  581660      -  0:11 aioserver
  585758      -  0:17 aioserver
  589856      -  0:12 aioserver
  593954      -  0:15 aioserver
  598052      -  0:17 aioserver
  602150      -  0:12 aioserver
  606248      -  0:13 aioserver
  827642      -  0:14 aioserver
  991288      -  0:14 aioserver
  995388      -  0:11 aioserver
 1007616      -  0:12 aioserver
 1011766      -  0:13 aioserver
 1028096      -  0:13 aioserver
 1032212      -  0:13 aioserver

What are aioservers in AIX5?:

With I/O on filesystems, for example when a database is involved, you may want to tune the number
of aioservers (asynchronous I/O servers).

AIX 5L supports asynchronous I/O (AIO) for database files created both on file system partitions and on raw devices. 
AIO on raw devices is implemented fully into the AIX kernel, and does not require database processes 
to service the AIO requests. When using AIO on file systems, the kernel processes (aioserver) 
control each request from the time a request is taken off the queue until it completes. These kernel 
processes are also used with I/O with virtual shared disks (VSDs) and HSDs with FastPath disabled. By default, 
FastPath is enabled. The number of aioserver processes determines the number of AIO requests that can be executed 
in the system concurrently, so it is important to tune the number of aioserver processes when using file systems 
to store Oracle Database data files. 

- Use one of the following commands to set the number of servers. This applies only when using asynchronous I/O 
on file systems rather than raw devices: 

# smit aio 

# chdev -P -l aio0 -a maxservers='128' -a minservers='20' 

- To set asynchronous IO to "Available":
# chdev -l aio0 -P -a autoconfig=available

You need to restart the Server:
# shutdown -Fr


1.2.5 aio on Linux distro's:
============================

On some Linux distro's, Oracle 9i/10g supports asynchronous I/O but it is disabled by default because 
some Linux distributions do not have libaio by default. For Solaris, the following configuration is not required 
- skip down to the section on enabling asynchronous I/O.

On Linux, the Oracle binary needs to be relinked to enable asynchronous I/O. The first thing to do is shut down 
the Oracle server. After Oracle has shut down, do the following steps to relink the binary:

su - oracle
cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk async_on
make -f ins_rdbms.mk ioracle



1.2.6 The ipcs and ipcrm commands:
==================================

The "ipcs" command is really a "listing" command. But if you need to intervene
in memory structures, for example to "clear" or remove a shared memory segment
because a faulty or crashed application left semaphores, memory identifiers, or queues in place,
you can use the "ipcrm" command to remove those structures.

Example ipcrm command usage:
----------------------------

Suppose an application crashed and cannot be started again. The following might help,
if you happen to know which IPC key it used.
Suppose the app used 47500 as the IPC key. Convert this decimal number to hex,
which is, in this example, B98C.

Now do the following:

# ipcs -bm | grep -i B98C

This might give you, for example, the shared memory identifier "50855977".
Now clear the segment: 

# ipcrm -m 50855977

It might also be that a semaphore and/or queue is still "left over".
In that case, list them and remove them with commands like the following example:

# ipcs -q
# ipcs -s

# ipcrm -s 2228248    (remove semaphore)
# ipcrm -q 5111883    (remove queue)
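
The decimal-to-hex step in the example above can be left to printf. The sketch below uses two
made-up helper names (ipc_key_hex and ipcs_match_key). It assumes that the key shows up in
the ipcs listing as a hex string, possibly in lower case and with a 0x prefix and zero padding,
which is why the match ignores case.

```shell
# Convert a decimal IPC key to upper-case hex, e.g. 47500 -> B98C.
ipc_key_hex() {
    printf '%X\n' "$1"
}

# Filter ipcs output (read on stdin) for the given decimal key.
# The match is case-insensitive, so 0x0000b98c matches B98C.
ipcs_match_key() {
    grep -i "$(ipc_key_hex "$1")"
}

# Usage:
#   ipcs -bm | ipcs_match_key 47500
```

Check the matched line by eye before running ipcrm, because a short hex string can also match
an unrelated identifier or size on another line.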


Note: in some cases the "slibclean" command can be used to clear unused modules in kernel and library memory.
Just give as root the command:

# slibclean

Other Example:
--------------

If you run the following command to remove a shared memory segment and you get this error:

# ipcrm -m 65537
ipcrm: 0515-020 shmid(65537) was not found.

However, if you run the ipcs command, you still see the segment there:

# ipcs | grep 65537
m 65537 0x00000000 DCrw------- root system

If you look carefully, you will notice the "D" in the fourth column. The "D" means:

D  The associated shared memory segment has been removed. It disappears when the last process attached 
to the segment detaches from it.

So, to clear the shared memory segment, find the process which is still associated with the segment:

# ps -ef | grep process_owner

where process_owner is the name of the owner using the shared segment 

Now kill the process found from the ps command above

# kill -9 pid

Running another ipcs command will show the shared memory segment no longer exists:

# ipcs | grep 65537 




1.2.7 Show patches, version, systeminfo:
========================================

Solaris:
========

showrev:
--------

#showrev
Displays system summary information.

#showrev -p
Reports which patches are installed 

sysdef and dmesg:
-----------------

The following commands also display configuration information:
# sysdef
# dmesg


versions:
---------

==> To check your Solaris version:
# uname -a or uname -m
# cat /etc/release 
# isainfo -v

==> To check your AIX version:

# oslevel
# oslevel -r    tells you which maintenance level you have.

>> To find the known recommended maintenance levels:
# oslevel -rq

>> To find all filesets lower than a certain maintenance level:
# oslevel -rl 5200-06

>> To find all filesets higher than a certain maintenance level:
# oslevel -rg 5200-05

>> To list all known recommended maintenance and technology levels on the system, type:

# oslevel -q -s
Known Service Packs
-------------------
5300-05-04
5300-05-03
5300-05-02
5300-05-01
5300-05-00
5300-04-CSP
5300-04-03
5300-04-02
5300-04-01
5300-03-CSP

>> Example:
5300-02 is TL 02
5300-02-04 is TL 02 and SP 04
5300-02-CSP is TL 02 and CSP for TL 02 
(and there won't be anymore SPs because when you see a CSP it is because the next TL has been released.  
In this case it would be TL 03).
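
The naming scheme above is easy to take apart in a script. The following is only a sketch; the
function name describe_oslevel is made up, and it assumes the base-TL-SP layout of the
"oslevel -s" strings shown in this section.

```shell
# Split an "oslevel -s" style string into base level, TL and SP/CSP.
describe_oslevel() {
    echo "$1" | awk -F- '{
        out = "base " $1 ", TL " $2
        if (NF >= 3 && $3 == "CSP") out = out ", CSP"
        else if (NF >= 3)           out = out ", SP " $3
        print out
    }'
}

# Usage:
#   describe_oslevel 5300-02-04      prints: base 5300, TL 02, SP 04
#   describe_oslevel 5300-02-CSP     prints: base 5300, TL 02, CSP
#   describe_oslevel "$(oslevel -s)"
```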

>> How can I determine which fileset updates are missing from a particular AIX level?
To determine which fileset updates are missing from 5300-04, for example, run the following command:

# oslevel -rl 5300-04 

>> What SP (Service Pack) is installed on my system?
To see which SP is currently installed on the system, run the oslevel -s command. Sample output for an 
AIX 5L Version 5.3 system, with TL4, and SP2 installed would be:

# oslevel -s
5300-04-02
			 
>> Is a CSP (Concluding Service Pack) installed on my system?
To see if a CSP is currently installed on the system, run the oslevel -s command. 
Sample output for an AIX 5L Version 5.3 system, with TL3, and CSP installed would be:

# oslevel -s
5300-03-CSP
 



==> To check your HP machine:

# model
9000/800/rp7410


==> Machine info on AIX:

How do I find out the chip type, system name, node name, model number etc.? 

The uname command provides details about your system. 

uname -p  Displays the chip type of the system. For example, powerpc. 
uname -r  Displays the release number of the operating system. 
uname -s  Displays the system name. For example, AIX. 
uname -n  Displays the name of the node.  
uname -a  Displays the system name, nodename,Version, Machine id. 
uname -M  Displays the system model name. For example, IBM, 7046-B50. 
uname -v  Displays the operating system version 
uname -m  Displays the machine ID number of the hardware running the system. 
uname -u  Displays the system ID number.  

Architecture:
-------------

To see if you have a CHRP machine, log into the machine as the root user, and run the following command:

# lscfg | grep Architecture               or use:
# lscfg -pl sysplanar0 | more

The bootinfo -p command also shows the architecture of the pSeries, RS/6000

# bootinfo -p
chrp


1.2.8 Check whether you have a 32 bit or 64 bit version:
========================================================

- Solaris:

# isainfo -kv

If /usr/bin/isainfo cannot be found, then the OS only 
supports 32-bit process address spaces. (Solaris 7 
was the first version that could run 64-bit binaries 
on certain SPARC-based systems.) 
So a ksh-based test might look something like

if [ -x /usr/bin/isainfo ]; then
bits=`/usr/bin/isainfo -b`
else
bits=32
fi

- AIX:

Command:        /bin/lslpp -l bos.64bit     ...to see if bos.64bit is installed & committed.           
        -or-    /bin/locale64               ...error message if on 32bit machine such as:           
                                               Could not load program /bin/locale64: 
                                               Cannot run a 64-bit program on a 32-bit machine.      

Or use:

# bootinfo -K         displays the current kernel wordsize of "32" or "64"
# bootinfo -y         tells if hardware is 64-bit capable
# bootinfo -p         If it returns the string 32 it is only capable of running the 
                      32-bit kernel. If it returns the string chrp the machine is 
                      capable of running the 64-bit kernel or the 32-bit kernel.
Or use:

# /usr/bin/getconf HARDWARE_BITMODE

This command should return the following output:

64



Note:
-----

  HOW TO CHANGE KERNEL MODE OF IBM AIX 5L (5.1)
  ---------------------------------------------
 
  The AIX 5L has pre-configured kernels. These are listed below for Power 
  processors:

     /usr/lib/boot/unix_up    32 bit uni-processor
     /usr/lib/boot/unix_mp    32 bit multi-processor kernel 
     /usr/lib/boot/unix_64    64 bit multi-processor kernel

  Switching between kernel modes means using different kernels. This is simply
  done by pointing the location that is referenced by the system to these kernels.
  Use symbolic links for this purpose. During boot AIX system runs the kernel
  in the following locations:

     /unix
     /usr/lib/boot/unix

  The base operating system 64-bit runtime fileset is bos.64bit. Installing bos.64bit also installs 
  the /etc/methods/cfg64 file. The /etc/methods/cfg64 file provides the option of enabling or disabling 
  the 64-bit environment via SMIT, which updates the /etc/inittab file with the load64bit line. 
  (Simply adding the load64bit line does not enable the 64-bit environment).

  The command lslpp -l bos.64bit reveals if this fileset is installed. The bos.64bit fileset 
  is on the AIX media; however, installing the bos.64bit fileset does not ensure that you will be able 
  to run 64-bit software. If the bos.64bit fileset is installed on 32-bit hardware, you should be able 
  to compile 64-bit software, but you cannot run 64-bit programs on 32-bit hardware.

  The syscalls64 extension must be loaded in order to run a 64-bit executable. This is done from 
  the load64bit entry in the inittab file. You must load the syscalls64 extension even when running 
  a 64-bit kernel on 64-bit hardware.

  To determine if the 64-bit kernel extension is loaded, at the command line, enter genkex |grep 64.
  Information similar to the following displays: 
  149bf58 a3ec /usr/lib/drivers/syscalls64.ext


  To change the kernel mode follow steps below:

     1. Create symbolic link from /unix and /usr/lib/boot/unix to the location 
        of the desired kernel.
     2. Create boot image.
     3. Reboot AIX.

  Below lists the detailed actions to change kernel mode:

  To change to 32 bit uni-processor mode:

     # ln -sf /usr/lib/boot/unix_up  /unix
     # ln -sf /usr/lib/boot/unix_up  /usr/lib/boot/unix
     # bosboot -ad /dev/ipldevice
     # shutdown -r

  To change to 32 bit multi-processor mode:
  
     # ln -sf /usr/lib/boot/unix_mp  /unix
     # ln -sf /usr/lib/boot/unix_mp  /usr/lib/boot/unix
     # bosboot -ad /dev/ipldevice
     # shutdown -r

  To change to 64 bit multi-processor mode:

     # ln -sf /usr/lib/boot/unix_64  /unix
     # ln -sf /usr/lib/boot/unix_64  /usr/lib/boot/unix
     # bosboot -ad /dev/ipldevice
     # shutdown -r
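
  The three command sequences above differ only in the kernel file name. As a sketch, the
  function below (kernel_switch_cmds is a made-up name) prints the commands for a given mode
  without running them, so they can be reviewed first. It assumes the kernel paths listed
  earlier in this note.

```shell
# Print (do not run) the commands to switch to the given kernel mode.
# Mode is one of: up (32-bit uni), mp (32-bit multi), 64 (64-bit multi).
kernel_switch_cmds() {
    case "$1" in
        up|mp|64) ;;
        *) echo "usage: kernel_switch_cmds up|mp|64" >&2; return 1 ;;
    esac
    k=/usr/lib/boot/unix_$1
    echo "ln -sf $k /unix"
    echo "ln -sf $k /usr/lib/boot/unix"
    echo "bosboot -ad /dev/ipldevice"
    echo "shutdown -r"
}

# Usage:
#   kernel_switch_cmds 64
```

  Printing instead of executing is a deliberate choice here: relinking /unix and rebooting is
  not something you want to trigger by accident.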

  IMPORTANT NOTE: If you are changing the kernel mode to 32-bit and you will run 
  9.2 on this server, the following line should be included in /etc/inittab:

     load64bit:2:wait:/etc/methods/cfg64 >/dev/console 2>&1 # Enable 64-bit execs

  This allows 64-bit applications to run on the 32-bit kernel. Note that this 
  line is also mandatory if you are using the 64-bit kernel.


In AIX 5.2, the 32-bit kernel is installed by default. The 64-bit kernel, along with JFS2 
(enhanced journaled file system), can be enabled at installation time.


Checking if other unixes are in 32 or 64 mode:
----------------------------------------------

- Digital UNIX/Tru64:    This OS is only available in 64bit form.    

- HP-UX(Available in 64bit starting with HP-UX 11.0):  
  Command: /bin/getconf KERNEL_BITS    ...returns either 32 or 64   

- SGI:  This OS is only available in 64bit form.  

- The remaining supported UNIX platforms are only available in 32bit form.   


scinstall:
----------

# scinstall -pv 
Displays Sun Cluster software release and package version information 


1.2.9 Info about CPUs:
======================

Solaris:
--------

# psrinfo -v
Shows the number of processors and their status.

# psrinfo -v|grep "Status of processor"|wc -l
Shows number of cpu's

Linux:
------

# cat /proc/cpuinfo
# cat /proc/cpuinfo | grep processor|wc -l

Especially with Linux, the /proc directory contains special "files" that either extract information from 
or send information to the kernel.
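
A plain "grep processor" also counts lines that merely contain the word somewhere, for example
in a model name. The sketch below (cpu_count is a made-up name) anchors the match at the start
of the line. It assumes the usual layout where each logical CPU has a line starting with
"processor".

```shell
# Count logical CPUs from a /proc/cpuinfo-style file (default: /proc/cpuinfo).
# Only lines that START with "processor" are counted.
cpu_count() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Usage:
#   cpu_count
```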

HP-UX:
------

# ioscan -kfnC processor
# /usr/sbin/ioscan -kf | grep processor
# grep processor /var/adm/syslog/syslog.log
# /usr/contrib/bin/machinfo   (Itanium)

There are several ways to find the number of CPUs:

1. sam -> performance monitor -> processor
2. print_manifest (if ignite-ux installed)
3. machinfo (11.23 HP versions)
4. ioscan -fnC processor
5. echo "processor_count/D" | adb /stand/vmunix /dev/kmem
6. top command to get cpu count

The "getconf" command can give you a lot of interesting info. The parameters are:

           ARG_MAX               BC_BASE_MAX               BC_DIM_MAX
           BC_SCALE_MAX          BC_STRING_MAX             CHARCLASS_NAME_MAX
           CHAR_BIT              CHAR_MAX                  CHAR_MIN
           CHILD_MAX             CLK_TCK                   COLL_WEIGHTS_MAX
           CPU_CHIP_TYPE         CS_MACHINE_IDENT          CS_PARTITION_IDENT
           CS_PATH               CS_MACHINE_SERIAL         EXPR_NEST_MAX
           HW_CPU_SUPP_BITS      HW_32_64_CAPABLE          INT_MAX
           INT_MIN               KERNEL_BITS               LINE_MAX
           LONG_BIT              LONG_MAX                  LONG_MIN
           MACHINE_IDENT         MACHINE_MODEL             MACHINE_SERIAL
           MB_LEN_MAX            NGROUPS_MAX               NL_ARGMAX
           NL_LANGMAX            NL_MSGMAX                 NL_NMAX
           NL_SETMAX             NL_TEXTMAX                NZERO
           OPEN_MAX              PARTITION_IDENT           PATH
           _POSIX_ARG_MAX        _POSIX_JOB_CONTROL        _POSIX_NGROUPS_MAX
           _POSIX_OPEN_MAX       _POSIX_SAVED_IDS          _POSIX_SSIZE_MAX
           _POSIX_STREAM_MAX     _POSIX_TZNAME_MAX         _POSIX_VERSION
           POSIX_ARG_MAX         POSIX_CHILD_MAX           POSIX_JOB_CONTROL
           POSIX_LINK_MAX        POSIX_MAX_CANON           POSIX_MAX_INPUT
           POSIX_NAME_MAX        POSIX_NGROUPS_MAX         POSIX_OPEN_MAX
           POSIX_PATH_MAX        POSIX_PIPE_BUF            POSIX_SAVED_IDS
           POSIX_SSIZE_MAX       POSIX_STREAM_MAX          POSIX_TZNAME_MAX
           POSIX_VERSION         POSIX2_BC_BASE_MAX        POSIX2_BC_DIM_MAX
           POSIX2_BC_SCALE_MAX   POSIX2_BC_STRING_MAX      POSIX2_C_BIND
           POSIX2_C_DEV          POSIX2_C_VERSION          POSIX2_CHAR_TERM
           POSIX_CHILD_MAX       POSIX2_COLL_WEIGHTS_MAX   POSIX2_EXPR_NEST_MAX
           POSIX2_FORT_DEV       POSIX2_FORT_RUN           POSIX2_LINE_MAX
           POSIX2_LOCALEDEF      POSIX2_RE_DUP_MAX         POSIX2_SW_DEV
           POSIX2_UPE            POSIX2_VERSION            SC_PASS_MAX
           SC_XOPEN_VERSION      SCHAR_MAX                 SCHAR_MIN
           SHRT_MAX              SHRT_MIN                  SSIZE_MAX

Example:

# getconf CPU_VERSION


sample function in shell script:

get_cpu_version() 
{
   # Map the numeric CPU_VERSION value reported by getconf to a readable name.
   case `getconf CPU_VERSION` in
      # ???) echo "Itanium[TM] 2" ;;
      768) echo "Itanium[TM] 1" ;;
      532) echo "PA-RISC 2.0" ;;
      529) echo "PA-RISC 1.2" ;;
      528) echo "PA-RISC 1.1" ;;
      523) echo "PA-RISC 1.0" ;;
        *) return 1 ;;     # unknown value
   esac
   return 0
}


AIX:
----

# pmcycles -m
Cpu 0 runs at 1656 MHz
Cpu 1 runs at 1656 MHz
Cpu 2 runs at 1656 MHz
Cpu 3 runs at 1656 MHz


# lscfg | grep proc

More cpu information on AIX:

# lsattr -El procx        (where x is the number of the cpu)
type powerPC_POWER5     Processor type     False
frequency 1656000000    Processor speed    False
..
..
where False means that the value cannot be changed through an AIX command.


# lparstat              (only for latest AIX versions)
# lparstat -i


To view CPU scheduler tunable parameters, use the schedo command:

# schedo -a

In AIX 5L on Power5, you can switch between Simultaneous Multithreading (SMT) and Single Threading (ST)
with the smtctl command, as follows:
# smtctl -m off		will set SMT mode to disabled
# smtctl -m on		will set SMT mode to enabled
# smtctl -W boot	makes the SMT mode change effective on the next boot
# smtctl -W now		makes the SMT mode change effective now, but it will not persist across reboots

When you want to keep the setting across reboots, you must use the bosboot command
in order to create a new boot image.


1.2.10 Other stuff:
===================

runlevel:
---------
To show the init runlevel:
# who -r 


Top users:
----------

To get a quick impression about the top 10 users in the system at this time:

ps auxw | sort -r +3 |head -10    -Shows top 10 memory usage by process
ps auxw | sort -r +2 |head -10    -Shows top 10 CPU usage by process
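
The "sort -r +3" form is the old key syntax, and newer sort implementations may reject it.
The sketch below does the same with the POSIX -k option and a numeric sort. The function name
top_by_column is made up, and it assumes "ps aux" style output with one header line, where
column 3 is %CPU and column 4 is %MEM.

```shell
# Read "ps aux" style output on stdin and print the top N rows,
# sorted numerically (highest first) on the given column number.
top_by_column() {
    col=$1; n=${2:-10}
    # Drop the header line, then sort on that one column only.
    sed 1d | sort -rn -k "$col,$col" | head -n "$n"
}

# Usage:
#   ps auxw | top_by_column 4 10     (top 10 by %MEM)
#   ps auxw | top_by_column 3 10     (top 10 by %CPU)
```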


More accuracy in memory usage with the ps command: ps -vg

ps -vg:
-------

Using "ps vg" gives a per process tally of memory usage for each running process. Several fields give memory usage 
in different units, but these numbers do not tell the whole story on where all the memory goes. 

First of all, the man page for ps does not give an accurate description of the memory related fields. 
Here is a better description: 

RSS - This tells how much RAM resident memory is currently being used for the text and data segments 
for a particular process in units of kilobytes. (this value will always be a multiple of 4 since memory is allocated in 4 KB pages). 

%MEM - This is the fraction of RSS divided by the total size of RAM for a particular process. 
Since RSS is some subset of the total resident memory usage for a process, the %MEM value will also be lower than actual. 

TRS - This tells how much RAM resident memory is currently being used for the text segment for a particular process 
in units of kilobytes. This will always be less than or equal to RSS. 

SIZE - This tells how much paging space is allocated for this process for the text and data segments in units 
of kilobytes. If the executable file is on a local filesystem, the page space usage for text is zero. 
If the executable is on an NFS filesystem, the page space usage will be nonzero. This number may be greater 
than RSS, or it may not, depending on how much of the process is paged in. The reason RSS can be larger is that 
RSS counts text whereas SIZE does not. 

TSIZ - This field is absolutely bogus because it is not a multiple of 4 and does not correlate to any of the other fields. 

These fields only report on a process's text and data segments. Segment sizes which cannot be interrogated at this time are: 

- Text portion of shared libraries (segment 13).
- Files that are in use. Open files are cached in memory as individual segments. 
- Shared data segments created with shmat. 
- Kernel segments such as kernel segment 0, kernel extension segments, 
  and virtual memory management segments. 

In summary, ps is not a very good tool to measure system memory usage. It can give you some idea where some 
of the memory goes, but it leaves too many questions unanswered about the total usage. 





shared memory:
--------------
To check shared memory segment, semaphore array, and message queue limits, issue the ipcs -l command
(on systems whose ipcs supports -l as a "limits" option, such as Linux). 
# ipcs -l

The following tools are available for monitoring the performance of your UNIX-based system. 

pfiles:
-------
/usr/proc/bin/pfiles
This shows the open files for this process, which helps you diagnose whether you are having problems 
caused by files not getting closed.

lsof:
-----

This utility lists open files for running UNIX processes, like pfiles. However, lsof gives more 
useful information than pfiles. You can find lsof at ftp://vic.cc.purdue.edu/pub/tools/unix/lsof/.

Example of lsof usage:

You can see CIO (concurrent IO) in the FILE-FLAG column if you run lsof +fg, e.g.:
 
tarunx01:/home/abielewi:# /p570build/LSOF/lsof-4.76/usr/local/bin/lsof +fg /baanprd/oradat

COMMAND     PID     USER   FD   TYPE              FILE-FLAG DEVICE   SIZE/OFF NODE NAME
oracle   434222   oracle   16u  VREG     R,W,CIO,DSYN,LG;CX   39,1    6701056  866 /baanprd/oradat (/dev/bprdoradat)
oracle   434222   oracle   17u  VREG     R,W,CIO,DSYN,LG;CX   39,1    6701056  867 /baanprd/oradat (/dev/bprdoradat)
oracle   442384   oracle   15u  VREG     R,W,CIO,DSYN,LG;CX   39,1 1174413312  875 /baanprd/oradat (/dev/bprdoradat)
oracle   442384   oracle   16u  VREG     R,W,CIO,DSYN,LG;CX   39,1  734011392  877 /baanprd/oradat (/dev/bprdoradat)
oracle   450814   oracle   15u  VREG     R,W,CIO,DSYN,LG;CX   39,1 1174413312  875 /baanprd/oradat (/dev/bprdoradat)
oracle   450814   oracle   16u  VREG     R,W,CIO,DSYN,LG;CX   39,1 1814044672  876 /baanprd/oradat (/dev/bprdoradat)
oracle   487666   oracle   15u  VREG     R,W,CIO,DSYN,LG;CX   39,1 1174413312  875 /baanprd/oradat (/dev/bprdoradat)
 
You should also see O_CIO in your file open calls if you run truss,
e.g.:
 
open("/opt/oracle/rcat/oradat/redo01.log",
O_RDWR|O_CIO|O_DSYNC|O_LARGEFILE) = 18
 



VMSTAT SOLARIS:
---------------
# vmstat 
This command is ideal for monitoring paging rate, which can be found under the page in (pi) and page out (po) columns. 
Other important columns are the amount of free swap space (swap) and free memory (free). 
This command is useful for determining if something is suspended or just taking a long time.

Example:

 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr m0 m1 m3 m4   in   sy   cs us sy id
 0 0 0 2163152 1716720 157 141 1179 1 1 0 0 0  0  0  0  680 1737  855 10  3 87
 0 0 0 2119080 1729352 0  1  0  0  0  0  0  0  0  1  0  345  658  346  1  1 98
 0 0 0 2118960 1729232 0 167 0  0  0  0  0  0  0  0  0  402 1710  812  4  2 94
 0 0 0 2112992 1723264 0 1261 0 0  0  0  0  0  0  0  0 1026 5253 1848 10  5 85
 0 0 0 2112088 1722352 0 248 0  0  0  0  0  0  0  0  0  505 2822 1177  5  2 92
 0 0 0 2116288 1726544 4 80  0  0  0  0  0  0  0  0  0  817 4015 1530  6  4 90
 0 0 0 2117744 1727960 4  2 30  0  0  0  0  0  0  0  0  473 1421  640  2  2 97


procs/r: Run queue length. 
procs/b: Processes blocked while waiting for I/O. 
procs/w: Idle processes which have been swapped. 
memory/swap: Free, unreserved swap space (Kb). 
memory/free: Free memory (Kb). (Note that this will grow until it reaches lotsfree, at which point 
            the page scanner is started. See "Paging" for more details.) 
page/re: Pages reclaimed from the free list. (If a page on the free list still contains data needed 
         for a new request, it can be remapped.) 
page/mf: Minor faults (page in memory, but not mapped). (If the page is still in memory, a minor fault 
         remaps the page. It is comparable to the vflts value reported by sar -p.) 
page/pi: Paged in from swap (Kb/s). (When a page is brought back from the swap device, the process 
         will stop execution and wait. This may affect performance.) 
page/po: Paged out to swap (Kb/s). (The page has been written and freed. This can be the result of 
         activity by the pageout scanner, a file close, or fsflush.) 
page/fr: Freed or destroyed (Kb/s). (This column reports the activity of the page scanner.) 
page/de: Anticipated short-term memory shortfall (Kb). 
page/sr: Scan rate (pages). Note that this number is not reported as a "rate," but as a total number of pages scanned. 
disk/s#: Disk activity for disk # (I/O's per second). 
faults/in: Interrupts (per second). 
faults/sy: System calls (per second). 
faults/cs: Context switches (per second). 
cpu/us: User CPU time (%). 
cpu/sy: Kernel CPU time (%). 
cpu/id: Idle + I/O wait CPU time (%). 

When analyzing vmstat output, there are several metrics to which you should pay attention. For example, 
keep an eye on the CPU run queue column. The run queue should never exceed the number of CPUs on the server. 
If you do notice the run queue exceeding the amount of CPUs, it's a good indication that your server 
has a CPU bottleneck.
To get an idea of the RAM usage on your server, watch the page in (pi) and page out (po) columns 
of vmstat's output. By tracking common virtual memory operations such as page outs, you can infer 
the times that the Oracle database is performing a lot of work. Even though UNIX page ins must correlate 
with the vmstat's refresh rate to accurately predict RAM swapping, plotting page ins can tell you 
when the server is having spikes of RAM usage.
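
The run queue check described above can be scripted. The following sketch (runq_over is a
made-up name) counts how many vmstat samples have a run queue larger than the number of CPUs
you pass in. It assumes that "r" is the first column, as in the sample output above. Keep in
mind that the first line vmstat prints is an average since boot, not a current sample.

```shell
# Read vmstat output on stdin and count the samples whose run queue
# (first column, "r") exceeds the given number of CPUs.
# Header lines are skipped because their first field is not a number.
runq_over() {
    awk -v ncpu="$1" '$1 ~ /^[0-9]+$/ && $1 > ncpu { n++ } END { print n + 0 }'
}

# Usage (12 samples, 5 seconds apart, on a 4-CPU server):
#   vmstat 5 12 | runq_over 4
```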

Once captured, it's very easy to take the information about server performance directly from the 
Oracle tables and plot them in a trend graph. Rather than using an expensive statistical package 
such as SAS, you can use Microsoft Excel. Copy and paste the data from the tables into Excel. 
After that, you can use the Chart Wizard to create a line chart that will help you view server 
usage information and discover trends.


# VMSTAT AIX:
-------------

This is virtually equal to the usage of vmstat under solaris.

vmstat can be used to give multiple statistics on the system. For CPU-specific work, try the following command:

# vmstat -t 1 3 

This will take 3 samples, 1 second apart, with timestamps (-t). You can, of course, change the parameters 
as you like. The output is shown below. 

      kthr     memory             page              faults        cpu        time
      ----- ----------- ------------------------ ------------ ----------- --------
       r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa hr mi se
       0  0 45483   221   0   0   0   0    1   0 224  326 362 24  7 69  0 15:10:22
       0  0 45483   220   0   0   0   0    0   0 159   83  53  1  1 98  0 15:10:23
       2  0 45483   220   0   0   0   0    0   0 145  115  46  0  9 90  1 15:10:24


In this output some of the things to watch for are: 

"avm", which is Active Virtual Memory.
Ideally, under normal conditions, the largest avm value should in general be smaller than the amount of RAM.
If avm is smaller than RAM, and excessive paging still occurs, that could be due to RAM being filled
with file pages.

avm x 4K = number of bytes
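
As a quick sketch of that conversion (avm_to_mb is a made-up name, and a 4 KB page size is
assumed, as stated above):

```shell
# Convert an avm value (number of 4 KB pages) to whole megabytes.
avm_to_mb() {
    echo $(( $1 * 4 / 1024 ))
}

# Usage:
#   avm_to_mb 45483      prints 177
```

So the avm of 45483 in the sample above is about 177 MB of active virtual memory, which you can
then compare with the amount of RAM in the machine.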


If columns r (run queue) and b (blocked) start going up, especially above 10, this usually is an indication 
that you have too many processes competing for CPU. 

If cs (context switches) goes very high compared to the number of processes, then you may need to tune 
the system with vmtune. 

In the cpu section, us (user time) indicates the time being spent in programs. If Java is 
at the top of the list in tprof, then you need to tune the Java application. 

In the cpu section, if sy (system time) is higher than expected, and you still have id (idle) time left, 
this may indicate lock contention. Check the tprof for lock related calls in the kernel time. You may want 
to try multiple instances of the JVM. It may also be possible to find deadlocks in a javacore file. 

In the cpu section, if wa (I/O wait) is high, this may indicate a disk bottleneck, and you should use 
iostat and other tools to look at the disk usage. 

Non-zero values in the pi and po (page in/out) columns may indicate that you are paging and need more memory. 
It may be possible that you have the stack size set too high for some of your JVM instances. 
It could also mean that you have allocated a heap larger than the amount of memory on the system. Of course, 
you may also have other applications using memory, or file pages may be taking up too much of the memory.


Other example:
--------------

# vmstat 1

System configuration: lcpu=2 mem=3920MB

kthr    memory                page              faults          cpu    
-----  -----------    ------------------------ ------------  -----------
r  b    avm   fre    re  pi  po  fr   sr  cy  in   sy  cs   us sy id wa
0  0  229367 332745   0   0   0   0    0   0   3  198  69    0  0 99  0
0  0  229367 332745   0   0   0   0    0   0   3   33  66    0  0 99  0
0  0  229367 332745   0   0   0   0    0   0   2   33  68    0  0 99  0
0  0  229367 332745   0   0   0   0    0   0  80  306 100    0  1 97  1
0  0  229367 332745   0   0   0   0    0   0   1   20  68    0  0 99  0
0  0  229367 332745   0   0   0   0    0   0   2   36  64    0  0 99  0
0  0  229367 332745   0   0   0   0    0   0   2   33  66    0  0 99  0
0  0  229367 332745   0   0   0   0    0   0   2   21  66    0  0 99  0
0  0  229367 332745   0   0   0   0    0   0   1  237  64    0  0 99  0
0  0  229367 332745   0   0   0   0    0   0   2   19  66    0  0 99  0
0  0  229367 332745   0   0   0   0    0   0   6   37  76    0  0 99  0
 


The most important fields to look at here are:

r -- The average number of runnable kernel threads over whatever sampling interval you have chosen. 
b -- The average number of kernel threads that are in the virtual memory waiting queue over your sampling interval. r should always be higher than b; if it is not, it usually means you have a CPU bottleneck. 
fre -- The size of your memory free list. Do not worry so much if the amount is really small. More importantly, determine if there is any paging going on if this amount is small. 
pi -- Pages paged in from paging space. 
po -- Pages paged out to paging space. 
CPU section:
us 
sy 
id 
wa 

Let's look at the last section, which also comes up in most other CPU monitoring tools, albeit with different headings:

us -- user time 
sy -- system time 
id -- idle time 
wa -- waiting on I/O 



# IOSTAT:
---------
This command is useful for monitoring I/O activities. You can use the read and write rate to estimate the 
amount of time required for certain SQL operations (if they are the only activity on the system). 
This command is also useful for determining if something is suspended or just taking a long time. 

Basic syntax is: iostat <options> interval count

options  - specify the devices for which information is needed, such as disk, 
           cpu or terminal (-d, -c, -t, or combined as -tdc). The -x option gives extended statistics.

interval - the time period in seconds between two samples. iostat 4 reports every 4 seconds.

count    - the number of samples to take. iostat 4 5 reports at 4-second intervals, 5 times.

Example:

$ iostat -xtc 5 2
                          extended disk statistics       tty         cpu
     disk r/s  w/s Kr/s Kw/s wait actv svc_t  %w  %b  tin tout us sy wt id
     sd0   2.6 3.0 20.7 22.7 0.1  0.2  59.2   6   19   0   84  3  85 11 0
     sd1   4.2 1.0 33.5  8.0 0.0  0.2  47.2   2   23
     sd2   0.0 0.0  0.0  0.0 0.0  0.0   0.0   0    0
     sd3  10.2 1.6 51.4 12.8 0.1  0.3  31.2   3   31
 
disk    name of the disk
r/s     reads per second
w/s     writes per second
Kr/s    kilobytes read per second
Kw/s    kilobytes written per second
wait    average number of transactions waiting for service (queue length)
actv    average number of transactions actively being serviced 
        (removed from the queue but not yet completed)
svc_t   average service time, in milliseconds
%w      percent of time there are transactions waiting for service (queue non-empty)
%b      percent of time the disk is busy (transactions in progress)

The values to look for in the iostat output are:

Reads/writes per second (r/s, w/s) 
Percentage busy (%b) 
Service time (svc_t) 

If a disk shows consistently high reads/writes, its percentage busy (%b) is greater than 5 percent, 
and its average service time (svc_t) is greater than 30 milliseconds, then action needs to be taken.
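
The rule of thumb above can be sketched as a small filter over iostat -x lines. This is only an illustration: the function name disks_needing_attention is hypothetical, the field positions are assumed from the Solaris header shown in the example, and "consistently high reads/writes" is not quantified in the text, so the sketch checks only the two numeric thresholds (%b and svc_t).

```python
# Thresholds taken from the rule of thumb above.
BUSY_PCT_LIMIT = 5
SVC_T_MS_LIMIT = 30.0

def disks_needing_attention(lines):
    """Return names of disks whose %b and svc_t both exceed the limits."""
    flagged = []
    for line in lines:
        f = line.split()
        # Expected layout: disk r/s w/s Kr/s Kw/s wait actv svc_t %w %b [...]
        if len(f) < 10:
            continue
        try:
            svc_t = float(f[7])
            busy = float(f[9])
        except ValueError:
            continue  # header line, not data
        if busy > BUSY_PCT_LIMIT and svc_t > SVC_T_MS_LIMIT:
            flagged.append(f[0])
    return flagged

sample = """\
     disk r/s  w/s Kr/s Kw/s wait actv svc_t  %w  %b  tin tout us sy wt id
     sd0   2.6 3.0 20.7 22.7 0.1  0.2  59.2   6   19   0   84  3  85 11 0
     sd1   4.2 1.0 33.5  8.0 0.0  0.2  47.2   2   23
     sd2   0.0 0.0  0.0  0.0 0.0  0.0   0.0   0    0
     sd3  10.2 1.6 51.4 12.8 0.1  0.3  31.2   3   31
""".splitlines()

print(disks_needing_attention(sample))
```

In practice the check should be applied to several samples over time, since a single interval says little about whether the load is consistent.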


# netstat 
This command lets you know the network traffic on each node, and the number of error packets encountered. 
It is useful for isolating network problems. 

Example:

To list all internet sockets, including the listening services, you can use the command

# netstat -a -f inet




1.2.11 Some other utilities for Solaris:
========================================

# top
For example:

load averages:  0.66,  0.54,  0.56   11:14:48
187 processes: 185 sleeping, 2 on cpu
CPU states:     % idle,     % user,     % kernel,     % iowait,     % swap
Memory: 4096M real, 1984M free, 1902M swap in use, 2038M swap free

  PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
 2795 oraclown   1  59    0  265M  226M sleep   0:13  4.38% oracle
 2294 root      11  59    0 8616K 7672K sleep  10:54  3.94% bpbkar
13907 oraclown  11  59    0  271M  218M cpu2    4:02  2.23% oracle
14138 oraclown  12  59    0  270M  230M sleep   9:03  1.76% oracle
 2797 oraclown   1  59    0  189M  151M sleep   0:01  0.96% oracle
 2787 oraclown  11  59    0  191M  153M sleep   0:06  0.69% oracle
 2799 oraclown   1  59    0  190M  151M sleep   0:02  0.45% oracle
 2743 oraclown  11  59    0  191M  155M sleep   0:25  0.35% oracle
 2011 oraclown  11  59    0  191M  149M sleep   2:50  0.27% oracle
 2007 oraclown  11  59    0  191M  149M sleep   2:22  0.26% oracle
 2009 oraclown  11  59    0  191M  149M sleep   1:54  0.20% oracle
 2804 oraclown   1  51    0 1760K 1296K cpu2    0:00  0.19% top
 2013 oraclown  11  59    0  191M  148M sleep   0:36  0.14% oracle
 2035 oraclown  11  59    0  191M  149M sleep   2:44  0.13% oracle
  114 root      10  59    0 5016K 4176K sleep  23:34  0.05% picld

Process ID
This column shows the process ID (pid) of each process. The process ID is a positive number, 
usually less than 65536. It is used for identification during the life of the process. 
Once a process has exited or been killed, the process ID can be reused. 

Username
This column shows the name of the user who owns the process. The kernel stores this information 
as a uid, and top uses an appropriate table (/etc/passwd, NIS, or NIS+) to translate this uid into a name. 

Threads
This column displays the number of threads for the current process. This column is present only 
in the Solaris 2 port of top.
For Solaris, this number is actually the number of lightweight processes (lwps) created by the 
threads package to handle the threads. Depending on current resource utilization, there may not 
be one lwp for every thread. Thus this number is actually less than or equal to the total number 
of threads created by the process. 

Nice
This column reflects the "nice" setting of each process. A process's nice value is inherited from its parent. 
Most user processes run at a nice of 0, indicating normal priority. Users have the option of starting 
a process with a positive nice value to allow the system to reduce the priority given to that process. 
This is normally done for long-running cpu-bound jobs to keep them from interfering with 
interactive processes. The Unix command "nice" controls setting this value. Only root can set 
a nice value lower than the current value. Nice values can be negative. On most systems they range from -20 to 20.
The nice value influences the priority value calculated by the Unix scheduler. 

Size
This column shows the total amount of memory allocated by each process. This is virtual memory 
and is the sum total of the process's text area (program space), data area, and dynamically 
allocated area (or "break"). When a process allocates additional memory with the system call "brk", 
this value will increase. This is done indirectly by the C library function "malloc". 
The number in this column does not reflect the amount of physical memory currently in use by the process. 

Resident Memory
This column reflects the amount of physical memory currently allocated to each process. 
This is also known as the "resident set size" or RSS. A process can have a large amount 
of virtual memory allocated (as indicated by the SIZE column) but still be using very little physical memory. 

Process State
This column reflects the last observed state of each process. State names vary from system to system. 
These states are analogous to those that appear in the process states line: the second line of the display. 
The more common state names are listed below.
cpu   - Assigned to a CPU and currently running 
run   - Currently able to run 
sleep - Awaiting an external event, such as input from a device 
stop  - Stopped by a signal, as with control Z 
swap  - Virtual address space swapped out to disk 
zomb  - Exited, but parent has not called "wait" to receive the exit status 

CPU Time
This column displays the accumulated CPU time for each process. This is the amount of time 
that any cpu in the system has spent actually running this process. The standard format shows 
two digits indicating minutes, a colon, then two digits indicating seconds. 
For example, the display "15:32" indicates fifteen minutes and thirty-two seconds. 
When a time value is greater than or equal to 1000 minutes, it is displayed as hours with the suffix H. 
For example, the display "127.4H" indicates 127 hours plus four tenths of an hour (24 minutes). 
When the number of hours exceeds 999.9, the "H" suffix is dropped so that the display 
continues to fit in the column. 
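
The display rules above can be written down as a short formatter. This is a sketch, not top's actual code: the function name format_cpu_time is hypothetical, and truncating (rather than rounding) to tenths of an hour is an assumption.

```python
def format_cpu_time(total_seconds):
    """Format accumulated CPU time using the display rules described above."""
    total_seconds = int(total_seconds)
    minutes, seconds = divmod(total_seconds, 60)
    if minutes < 1000:
        return f"{minutes}:{seconds:02d}"
    # Tenths of an hour, truncated (the rounding mode is an assumption).
    tenths = total_seconds // 360
    hours, tenth = divmod(tenths, 10)
    if tenths <= 9999:            # up to 999.9 hours
        return f"{hours}.{tenth}H"
    return str(hours)             # "H" dropped so the value still fits

print(format_cpu_time(15 * 60 + 32))   # fifteen minutes, thirty-two seconds
print(format_cpu_time(458640))         # 127.4 hours
print(format_cpu_time(3600000))        # 1000 hours
```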

CPU Percentage
This column shows the percentage of the cpu that each process is currently consuming. 
By default, top sorts its output by this column.
Some versions of Unix will track cpu percentages in the kernel, as the figure is used in the calculation 
of a process's priority. On those versions, top will use the figure as calculated by the kernel. 
Other versions of Unix do not perform this calculation, and top must determine the percentage explicitly 
by monitoring the changes in cpu time.
On most multiprocessor machines, the number displayed in this column is a percentage of the total 
available cpu capacity. Therefore, a single threaded process running on a four processor system will never 
use more than 25% of the available cpu cycles. 
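
A minimal sketch of the "monitor the changes in cpu time" approach, assuming the percentage is taken against total capacity (interval length times number of processors) as described above. The function name cpu_percent and its parameters are hypothetical and only serve the illustration.

```python
def cpu_percent(cpu_seconds_before, cpu_seconds_after, interval_seconds, ncpus):
    """Share of total CPU capacity used during the interval, in percent."""
    if interval_seconds <= 0 or ncpus <= 0:
        raise ValueError("interval_seconds and ncpus must be positive")
    used = cpu_seconds_after - cpu_seconds_before
    return 100.0 * used / (interval_seconds * ncpus)

# A single-threaded process that ran flat out for a 5 second interval
# on a four-processor machine: 5 CPU-seconds out of 20 available.
print(cpu_percent(100.0, 105.0, 5.0, 4))
```

The same process on a single-processor machine would show 100%, which is why figures from machines with different processor counts are not directly comparable.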

Command
This column displays the name of the executable image that each process is running. 
In most cases this is the base name of the file that was invoked with the most recent kernel "exec" call. 
On most systems, this name is maintained separately from the zeroth argument. A program that changes 
its zeroth argument will not affect the output of this column. 



# modinfo
The modinfo command provides information about the modules currently loaded by the kernel.

The /etc/system  file:
Available for Solaris Operating Environment, the /etc/system file contains definitions for kernel configuration limits 
such as the maximum number of users allowed on the system at a time, the maximum number of processes per user, 
and the inter-process communication (IPC) limits on size and number of resources. These limits are important because 
they affect DB2 performance on a Solaris Operating Environment machine. See the Quick Beginnings information 
for further details. 

# more /etc/path_to_inst
To see the mapping between the kernel's abbreviated instance names and the physical device names,
view the /etc/path_to_inst file.

# uptime
uptime - show how long the system has been up

/export/home/oraclown>uptime
 11:32am  up  4:19,  1 user,  load average: 0.40, 1.17, 0.90


1.2.12 proc tools for Solaris:
==============================

The proc tools are so called because they retrieve information from the /proc virtual filesystem.
They are:

/usr/proc/bin/pflags  [-r] pid...
/usr/proc/bin/pcred   pid...
/usr/proc/bin/pmap    [-rxlF] pid...
/usr/proc/bin/pldd    [-F] pid...
/usr/proc/bin/psig    pid...
/usr/proc/bin/pstack  [-F] pid...
/usr/proc/bin/pfiles  [-F] pid...
/usr/proc/bin/pwdx    [-F] pid...
/usr/proc/bin/pstop   pid...
/usr/proc/bin/prun    pid...
/usr/proc/bin/pwait   [-v] pid...
/usr/proc/bin/ptree   [-a] [[pid| user]...]
/usr/proc/bin/ptime   command [arg...]
/usr/proc/bin/pattr   [-x ] [pid...]
/usr/proc/bin/pclear  [pid...]
/usr/proc/bin/plabel  [pid...]
/usr/proc/bin/ppriv   [-a] [pid...]



-- pfiles:
reports all the files which are opened by a given pid

-- pldd 
lists all the dynamic libraries linked to the process

-- pwdx 
gives the current working directory of the process

-- ptree
The ptree utility prints the process trees containing the specified pids or users, with child processes 
indented from their respective parent processes. An argument of all digits is taken to be a process-ID, 
otherwise it is assumed to be a user login name. The default is all processes.


Use it like 


# ptree <PID>


Or use it with params, which enables you to produce different listings

The following example prints the process tree (including children of process 0) for processes which match the command name ssh: 

$ ptree -a `pgrep ssh`
        1     /sbin/init
          100909 /usr/lib/ssh/sshd
            569150 /usr/lib/ssh/sshd
              569157 /usr/lib/ssh/sshd
                569159 -ksh
                  569171 bash
                    569173 /bin/ksh
                      569193 bash 
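
Because ptree expresses parentage purely through indentation, a listing like the one above can be turned back into a parent map with a small stack. The sketch below is illustrative only: the function name parse_ptree is hypothetical, and it assumes that every child line is indented further than its parent, as in the sample.

```python
def parse_ptree(lines):
    """Map each PID to its parent PID, using indentation as the depth."""
    parents = {}
    stack = []  # (indent, pid) for the current chain of ancestors
    for line in lines:
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        pid = int(line.split()[0])
        # Drop entries that are not ancestors of this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        parents[pid] = stack[-1][1] if stack else None
        stack.append((indent, pid))
    return parents

sample = """\
        1     /sbin/init
          100909 /usr/lib/ssh/sshd
            569150 /usr/lib/ssh/sshd
              569157 /usr/lib/ssh/sshd
                569159 -ksh
""".splitlines()

parents = parse_ptree(sample)
print(parents[569159])   # parent of the ksh login shell
```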

  ----------------------------------------------------------------------
  Remark: many Linux distros provide a similar command, "pstree".
  As in

  ubuntu$ pstree -pl
  init(1)---NetworkManager(5427)
          +-NetworkManagerD(5441)
          +-acpid(5210)
          +-apache2(6966)---apache2(2890)
          |               +-apache2(2893)
          |               +-apache2(7163)
          |               +-apache2(7165)
          |               +-apache2(7166)
          |               +-apache2(7167)
          |               +-apache2(7168)
          +-atd(6369)
          +-avahi-daemon(5658)---avahi-daemon(5659)
          +-bonobo-activati(7816)---{bonobo-activati}(7817)
         etc..
         ..

  ------------------------------------------------------------------------

Back to Solaris again:

Suppose you did a pfiles on an Apache process:

# pfiles 13789

13789: /apps11i/erpdev/10GAS/Apache/Apache/bin/httpd -d /apps11i/erpdev/10G
Current rlimit: 1024 file descriptors
0: S_IFIFO mode:0000 dev:350,0 ino:114723 uid:65060 gid:54032 size:301
O_RDWR
1: S_IFREG mode:0640 dev:307,28001 ino:612208 uid:65060 gid:54032 size:386
O_WRONLY|O_APPEND|O_CREAT
/apps11i/erpdev/10GAS/opmn/logs/HTTP_Server~1
2: S_IFIFO mode:0000 dev:350,0 ino:143956 uid:65060 gid:54032 size:0
O_RDWR
3: S_IFREG mode:0600 dev:307,28001 ino:606387 uid:65060 gid:54032 size:1056768
O_RDWR|O_CREAT
/apps11i/erpdev/10GAS/Apache/Apache/logs/mm.19389.mem
4: S_IFREG mode:0600 dev:307,28001 ino:606383 uid:65060 gid:54032 size:0
O_RDWR|O_CREAT
5: S_IFREG mode:0600 dev:307,28001 ino:621827 uid:65060 gid:54032 size:1056768
O_RDWR|O_CREAT
6: S_IFDOOR mode:0444 dev:351,0 ino:58 uid:0 gid:0 size:0
O_RDONLY|O_LARGEFILE FD_CLOEXEC door to nscd[421]
/var/run/name_service_door
7: S_IFIFO mode:0000 dev:350,0 ino:143956 uid:65060 gid:54032 size:0
O_RDWR
8: S_IFCHR mode:0666 dev:342,0 ino:47185924 uid:0 gid:3 rdev:90,0
O_RDONLY
/devices/pseudo/kstat@0:kstat
etc..
..
..
O_RDWR|O_CREAT
/apps11i/erpdev/10GAS/Apache/Apache/logs/dms_metrics.19389.shm.sem
21: S_IFREG mode:0600 dev:307,28001 ino:603445 uid:65060 gid:54032 size:17408
O_RDONLY FD_CLOEXEC
/apps11i/erpdev/10GAS/rdbms/mesg/ocius.msb
23: S_IFSOCK mode:0666 dev:348,0 ino:60339 uid:0 gid:0 size:0
O_RDWR
SOCK_STREAM
SO_SNDBUF(49152),SO_RCVBUF(49152),IP_NEXTHOP(0.0.192.0)
sockname: AF_INET 3.56.189.4 port: 45395
peername: AF_INET 3.56.189.4 port: 12501
256: S_IFREG mode:0444 dev:85,0 ino:234504 uid:0 gid:3 size:1616
O_RDONLY|O_LARGEFILE
/etc/inet/hosts
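
When a process has many descriptors open, it can help to summarise the pfiles listing by file type. The following sketch counts descriptors per S_IF* type; the function name count_fd_types is hypothetical, and the regular expression assumes that each descriptor entry starts with "<fd>: S_IF...", as in the output above.

```python
import re
from collections import Counter

# A descriptor entry starts with "<fd>: S_IF<TYPE> ...".
FD_LINE = re.compile(r"^\s*(\d+):\s+(S_IF[A-Z]+)\b")

def count_fd_types(lines):
    """Count open descriptors by file type (S_IFREG, S_IFIFO, ...)."""
    counts = Counter()
    for line in lines:
        m = FD_LINE.match(line)
        if m:
            counts[m.group(2)] += 1
    return counts

sample = """\
13789: /apps11i/erpdev/10GAS/Apache/Apache/bin/httpd -d /apps11i/erpdev/10G
Current rlimit: 1024 file descriptors
0: S_IFIFO mode:0000 dev:350,0 ino:114723 uid:65060 gid:54032 size:301
O_RDWR
1: S_IFREG mode:0640 dev:307,28001 ino:612208 uid:65060 gid:54032 size:386
O_WRONLY|O_APPEND|O_CREAT
6: S_IFDOOR mode:0444 dev:351,0 ino:58 uid:0 gid:0 size:0
23: S_IFSOCK mode:0666 dev:348,0 ino:60339 uid:0 gid:0 size:0
""".splitlines()

counts = count_fd_types(sample)
print(dict(counts))
```

The first line of the listing (PID and command) is not counted, because it does not carry an S_IF* type after the colon.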


Running pldd on the same process gave this result:

# pldd 13789

13789: /apps11i/erpdev/10GAS/Apache/Apache/bin/httpd -d /apps11i/erpdev/10G
/apps11i/erpdev/10GAS/lib32/libdms2.so
/lib/libpthread.so.1
/lib/libsocket.so.1
/lib/libnsl.so.1
/lib/libdl.so.1
/lib/libc.so.1
/platform/sun4u-us3/lib/libc_psr.so.1
/lib/libmd5.so.1
/platform/sun4u/lib/libmd5_psr.so.1
/lib/libscf.so.1
/lib/libdoor.so.1
/lib/libuutil.so.1
/lib/libgen.so.1
/lib/libmp.so.2
/lib/libm.so.2
/lib/libresolv.so.2
/apps11i/erpdev/10GAS/Apache/Apache/libexec/mod_onsint.so
/lib/librt.so.1
/apps11i/erpdev/10GAS/lib32/libons.so
/lib/libkstat.so.1
/lib/libaio.so.1
/apps11i/erpdev/10GAS/Apache/Apache/libexec/mod_mmap_static.so
/apps11i/erpdev/10GAS/Apache/Apache/libexec/mod_vhost_alias.so
/apps11i/erpdev/10GAS/Apache/Apache/libexec/mod_env.so
..
..
etc

/usr/lib/libsched.so.1
/apps11i/erpdev/10GAS/lib32/libclntsh.so.10.1
/apps11i/erpdev/10GAS/lib32/libnnz10.so
/apps11i/erpdev/10GAS/Apache/Apache/libexec/mod_wchandshake.so
/apps11i/erpdev/10GAS/Apache/Apache/libexec/mod_oc4j.so
/apps11i/erpdev/10GAS/Apache/Apache/libexec/mod_dms.so
/apps11i/erpdev/10GAS/Apache/Apache/libexec/mod_rewrite.so
/apps11i/erpdev/10GAS/Apache/oradav/lib/mod_oradav.so
/apps11i/erpdev/10GAS/Apache/modplsql/bin/modplsql.so 


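-- pmap
pmap -x shows the address space map of a process with extended details (resident, anonymous and locked 
memory per mapping). Here $$ expands to the PID of the current shell:

# pmap -x $$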

492328: -ksh
 Address  Kbytes     RSS    Anon  Locked Mode   Mapped File
00010000     192     192       -       - r-x--  ksh
00040000       8       8       8       - rwx--  ksh
00042000      40      40       8       - rwx--    [ heap ]
FF180000     680     680       -       - r-x--  libc.so.1
FF23A000      24      24       -       - rwx--  libc.so.1
FF240000       8       8       8       - rwx--  libc.so.1
FF280000     576     576       -       - r-x--  libnsl.so.1
FF310000      40      40       -       - rwx--  libnsl.so.1
FF31A000      24      16       -       - rwx--  libnsl.so.1
FF350000      16      16       -       - r-x--  libmp.so.2
FF364000       8       8       -       - rwx--  libmp.so.2
FF380000      40      40       -       - r-x--  libsocket.so.1
FF39A000       8       8       -       - rwx--  libsocket.so.1
FF3A0000       8       8       -       - r-x--  libdl.so.1
FF3B0000       8       8       8       - rwx--    [ anon ]
FF3C0000     152     152       -       - r-x--  ld.so.1
FF3F6000       8       8       8       - rwx--  ld.so.1
FFBFC000      16      16       8       - rw---    [ stack ]
-------- ------- ------- ------- -------
total Kb    1856    1848      48       -
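
The "total Kb" line is simply the column sums of the mapping lines, which can be verified with a few lines of code. The sketch below is an illustration under the assumption that the columns appear in the order shown above (Address, Kbytes, RSS, Anon, ...); the names sum_pmap and _num are hypothetical.

```python
def _num(field):
    """pmap prints "-" where a column does not apply; treat it as 0."""
    return 0 if field == "-" else int(field)

def sum_pmap(lines):
    """Sum the Kbytes, RSS and Anon columns of `pmap -x` mapping lines."""
    totals = {"kbytes": 0, "rss": 0, "anon": 0}
    for line in lines:
        f = line.split()
        # Mapping lines start with a hexadecimal address followed by numbers.
        if len(f) < 6 or not f[1].isdigit():
            continue
        try:
            int(f[0], 16)
        except ValueError:
            continue
        totals["kbytes"] += _num(f[1])
        totals["rss"] += _num(f[2])
        totals["anon"] += _num(f[3])
    return totals

sample = """\
 Address  Kbytes     RSS    Anon  Locked Mode   Mapped File
00010000     192     192       -       - r-x--  ksh
00040000       8       8       8       - rwx--  ksh
00042000      40      40       8       - rwx--    [ heap ]
FF31A000      24      16       -       - rwx--  libnsl.so.1
-------- ------- ------- ------- -------
total Kb    1856    1848      48       -
""".splitlines()

totals = sum_pmap(sample)
print(totals)
```

Only four of the mappings are included in the sample, so the sums are smaller than the totals in the full listing; the header, separator and "total Kb" lines are skipped.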




1.2.13 Well-known tools for AIX:
================================

1. commands:
------------

CPU		Memory Subsystem	I/O Subsystem		Network Subsystem
---------------------------------------------------------------------------------
vmstat		vmstat			iostat			netstat
iostat		lsps			vmstat			ifconfig
ps		svmon			lsps			tcpdump
sar		filemon			filemon
tprof		ipcs			lvmstat

nmon and topas can be used to monitor those subsystems in general.

2. topas:
---------

topas is a useful graphical interface that will give you immediate results of what is going on in the system. 
When you run it without any command-line arguments, the screen looks like this: 


Topas Monitor for host:    aix4prt              EVENTS/QUEUES    FILE/TTY
Mon Apr 16 16:16:50 2001   Interval:  2         Cswitch    5984  Readch     4864
                                                Syscall   15776  Writech   34280
Kernel   63.1   |##################          |  Reads         8  Rawin         0
User     36.8   |##########                  |  Writes     2469  Ttyout        0
Wait      0.0   |                            |  Forks         0  Igets         0
Idle      0.0   |                            |  Execs         0  Namei         4
                                                Runqueue   11.5  Dirblk        0
Network  KBPS   I-Pack  O-Pack   KB-In  KB-Out  Waitqueue   0.0
lo0     213.9   2154.2  2153.7   107.0   106.9
tr0      34.7     16.9    34.4     0.9    33.8  PAGING           MEMORY
                                                Faults     3862  Real,MB    1023
Disk    Busy%     KBPS     TPS KB-Read KB-Writ  Steals     1580  % Comp     27.0
hdisk0    0.0      0.0     0.0     0.0     0.0  PgspIn        0  % Noncomp  73.9
                                                PgspOut       0  % Client    0.5
Name         PID CPU% PgSp Owner                PageIn        0
java       16684 83.6 35.1 root                 PageOut       0  PAGING SPACE
java       12192 12.7 86.2 root                 Sios          0  Size,MB     512
lrud        1032  2.7  0.0 root                                  % Used      1.2
aixterm    19502  0.5  0.7 root                 NFS (calls/sec)  % Free     98.7
topas       6908  0.5  0.8 root                 ServerV2       0
ksh        18148  0.0  0.7 root                 ClientV2       0   Press:
gil         1806  0.0  0.0 root                 ServerV3       0   "h" for help
 

The information on the bottom left side shows the most active processes; here, java is consuming 83.6% of CPU. 
The middle right area shows the total physical memory (1 GB in this case) and Paging space (512 MB), 
as well as the amount being used. So you get an excellent overview of what the system is doing 
in a single screen, and then you can select the areas to concentrate on based on the information shown here.


Note: about waits:
------------------

Don't get caught up in this whole wait I/O thing. A single-CPU system 
with 1 I/O outstanding and no other runnable threads (i.e. idle) will 
have 100% wait I/O. There was a big discussion a couple of years ago on 
removing the kernel tick, as it has confused many, many techs. 

So, if you have only one or a few CPUs, then you are going to have high wait I/O 
figures; it does not necessarily mean your disk subsystem is slow. 
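
The effect described in this note can be made concrete with a toy model. The sketch below is a deliberately simplified assumption about how ticks are accounted (an otherwise idle tick counts as "wait" whenever an I/O is outstanding); it is not the actual AIX accounting code, and the function name cpu_breakdown is hypothetical.

```python
def cpu_breakdown(ticks):
    """Classify clock ticks into busy / wait / idle percentages.

    Each tick is a (has_runnable_thread, io_outstanding) pair. An otherwise
    idle tick is counted as "wait" when at least one I/O is outstanding;
    this simplified rule is an assumption of the sketch.
    """
    if not ticks:
        raise ValueError("no ticks to classify")
    counts = {"busy": 0, "wait": 0, "idle": 0}
    for runnable, io_outstanding in ticks:
        if runnable:
            counts["busy"] += 1
        elif io_outstanding:
            counts["wait"] += 1
        else:
            counts["idle"] += 1
    total = len(ticks)
    return {k: 100.0 * v / total for k, v in counts.items()}

# One CPU, nothing runnable, a single I/O outstanding for the whole sample.
print(cpu_breakdown([(False, True)] * 100))

# Same I/O, but the CPU has other work half of the time.
print(cpu_breakdown([(True, True)] * 50 + [(False, True)] * 50))
```

In the first case the model reports 100% wait although the disk handles just one request; in the second, the same I/O load shows up as only 50% wait because the CPU had other work to do.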



3. trace:
---------

trace captures a sequential flow of time-stamped system events. The trace is a valuable tool for observing 
system and application execution. While many of the other tools provide high level statistics such as 
CPU and I/O utilization, the trace facility helps expand the information as to where the events happened, 
which process is responsible, when the events took place, and how they are affecting the system. 
Two post processing tools that can extract information from the trace are utld (in AIX 4) and curt 
(in AIX 5). These provide statistics on CPU utilization and process/thread activity. The third post 
processing tool is splat which stands for Simple Performance Lock Analysis Tool. This tool is used to analyze 
lock activity in the AIX kernel and kernel extension for simple locks.

4. nmon:
--------

nmon is a free software tool that gives much of the same information as topas, but saves the information 
to a file in Lotus 123 and Excel format. The download site is 
http://www.ibm.com/developerworks/eserver/articles/analyze_aix/. 
The information that is collected includes CPU, disk, network, adapter statistics, kernel counters, 
memory and the "top" process information. 

5. tprof:
---------

tprof is one of the AIX legacy tools that provides a detailed profile of CPU usage for every 
AIX process ID and name. It has been completely rewritten for AIX 5.2, and the example below uses 
the AIX 5.1 syntax. You should refer to AIX 5.2 Performance Tools update: Part 3 for the new syntax. 

The simplest way to invoke this command is to use:  

# tprof -kse -x "sleep 10" 
# tprof -ske -x "sleep 30"


At the end of ten seconds, or 30 seconds, a new file __prof.all, or sleep.prof, is generated that contains 
information about what commands are using CPU on the system. Searching for FREQ, the information looks something 
like the example below:


              Process   FREQ  Total Kernel   User Shared  Other
              =======    ===  ===== ======   ==== ======  =====
               oracle    244  10635   3515   6897    223      0
                 java    247   3970    617      0   2062   1291
                 wait     16   1515   1515      0      0      0
    ...
              =======    ===  ===== ======   ==== ======  =====
                Total   1060  19577   7947   7252   3087   1291

 

This example shows that over half the CPU time is associated with the oracle application and that Java 
is using about 3970/19577 or 1/5 of the CPU. The wait usually means idle time, but can also include 
the I/O wait portion of the CPU usage.
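
The shares quoted above follow directly from the Total column. A short worked calculation, using the figures from the sample report (the function name cpu_shares is hypothetical):

```python
def cpu_shares(totals_by_process, grand_total):
    """Return each process's share of the total tick count, in percent."""
    return {name: 100.0 * ticks / grand_total
            for name, ticks in totals_by_process.items()}

# "Total" column from the tprof sample above.
totals = {"oracle": 10635, "java": 3970, "wait": 1515}
shares = cpu_shares(totals, 19577)

for name, pct in shares.items():
    print(f"{name:8s} {pct:5.1f}%")
```

This gives roughly 54% for oracle, 20% for java and 8% for wait.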


svmon:
------

The svmon command captures a snapshot of the current state of memory.
Use it with the -G switch to get global statistics for the whole system.

svmon is the most useful tool at your disposal when monitoring a Java process, especially native heap. 
The article "When segments collide" gives examples of how to use svmon -P <pid> -m to monitor the 
native heap of a Java process on AIX. But there is another variation, svmon -P <pid> -m -r, that is very 
effective in identifying native heap fragmentation. The -r switch prints the address range in use, so it gives 
a more accurate view of how much of each segment is in use. 
As an example, look at the partially edited output below: 

   Pid Command          Inuse      Pin     Pgsp  Virtual 64-bit Mthrd LPage
   10556 java            681613     2316     2461   501080      N     Y     N

    Vsid      Esid Type Description              LPage  Inuse   Pin Pgsp Virtual
   22ac4         9 mmap mapped to sid b1475          -      0     0    -     - 
   21047         8 mmap mapped to sid 30fe5          -      0     0    -     - 
   126a2         a mmap mapped to sid 91072          -      0     0    -     - 
   7908c         7 mmap mapped to sid 6bced          -      0     0    -     - 
   b2ad6         b mmap mapped to sid b1035          -      0     0    -     - 
   b1475         - work                              -  65536     0  282 65536 
   30fe5         - work                              -  65536     0  285 65536 
   91072         - work                              -  65536     0   54 65536 
   6bced         - work                              -  65536     0  261 65536 
   b1035         - work                              -  45054     0    0 45054 
                   Addr Range: 0..45055
   e0f9f         5 work shmat/mmap                   -  48284     0    3 48284 
   19100         3 work shmat/mmap                   -  46997     0  463 47210 
   c965a         4 work shmat/mmap                   -  46835     0  281 46953 
   7910c         6 work shmat/mmap                   -  37070     0    0 37070 
                   Addr Range: 0..50453
   e801d         d work shared library text          -   9172     0    0  9220 
                   Addr Range: 0..30861
   a0fb7         f work shared library data          -    105     0    1   106 
                   Addr Range: 0..2521
   21127         2 work process private              -     50     2    1    51 
                   Addr Range: 65300..65535
   a8535         1 pers code,/dev/q109waslv:81938    -     11     0    -     - 
                   Addr Range: 0..11


Other example:

# svmon -G -i 2 5    # sample five times at two second intervals

memory                 in use                     pin         pg space
size  inuse free pin   work  pers   clnt   work   pers  clnt  size  inuse
16384 16250 134  2006  10675 2939   2636   2006   0     0     40960  12674
16384 16250 134  2006  10675 2939   2636   2006   0     0     40960  12674
16384 16250 134  2006  10675 2939   2636   2006   0     0     40960  12674
16384 16250 134  2006  10675 2939   2636   2006   0     0     40960  12674
16384 16250 134  2006  10675 2939   2636   2006   0     0     40960  12674

In this example, the total memory size is 16384 pages. Multiply this number by 4096 (bytes per page)
to see the total real memory size. In this case the total memory is 64 MB.
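
The same conversion applies to the other page counts in the report. A small worked example, assuming the 4096-byte page size used in the text (the helper name pages_to_mb is hypothetical):

```python
PAGE_SIZE = 4096  # bytes per page, as used in the text above

def pages_to_mb(pages, page_size=PAGE_SIZE):
    """Convert a page count from svmon into megabytes."""
    return pages * page_size / (1024 * 1024)

print(pages_to_mb(16384))   # memory size
print(pages_to_mb(16250))   # memory in use
print(pages_to_mb(134))     # memory free
```

So of the 64 MB of real memory, about 63.5 MB is in use and roughly 0.5 MB is on the free list.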



filemon:
--------

filemon can be used to identify the files that are being used most actively. This tool gives a very 
comprehensive view of file access, and can be useful for drilling down once vmstat/iostat confirm disk 
to be a bottleneck.

Example:

# filemon -o /tmp/filemon.log; sleep 60; trcstop

The generated log file is quite large. Some sections that may be useful are:

Most Active Files
    ------------------------------------------------------------------------
      #MBs  #opns   #rds   #wrs  file                 volume:inode
    ------------------------------------------------------------------------

      25.7     83   6589      0  unix                 /dev/hd2:147514
      16.3      1   4175      0  vxe102               /dev/mailv1:581
      16.3      1      0   4173  .vxe102.pop          /dev/poboxv:62
      15.8      1      1   4044  tst1                 /dev/mailt1:904
       8.3   2117   2327      0  passwd               /dev/hd4:8205
       3.2    182    810      1  services             /dev/hd4:8652
    ...
    ------------------------------------------------------------------------
    Detailed File Stats
    ------------------------------------------------------------------------

    FILE: /var/spool/mail/v/vxe102  volume: /dev/mailv1 (/var/spool2/mail/v)  inode: 581
    opens:                  1
    total bytes xfrd:       17100800
    reads:                  4175    (0 errs)
      read sizes (bytes):   avg  4096.0 min    4096 max    4096 sdev     0.0
      read times (msec):    avg   0.543 min   0.011 max  78.060 sdev   2.753
    ...

curt:
-----

curt Command
Purpose
The CPU Utilization Reporting Tool (curt) command converts an AIX trace file into a number of statistics related 
to CPU utilization and either process, thread or pthread activity. These statistics ease the tracking of 
specific application activity. curt works with both uniprocessor and multiprocessor AIX Version 4 and AIX Version 5 
traces.

Syntax
curt -i inputfile [-o outputfile] [-n gennamesfile] [-m trcnmfile] [-a pidnamefile] [-f timestamp] 
                  [-l timestamp] [-ehpstP]

Description
The curt command takes an AIX trace file as input and produces a number of statistics related to 
processor (CPU) utilization and process/thread/pthread activity. It will work with both uniprocessor and 
multiprocessor AIX traces if the processor clocks are properly synchronized.


genkld:
-------

genkld Command

Purpose
The genkld command extracts the list of shared objects currently loaded onto the system and displays the address, 
size, and path name for each object on the list. 


Syntax
genkld 

Description
For shared objects loaded onto the system, the kernel maintains a linked list consisting of data structures 
called loader entries. A loader entry contains the name of the object, its starting address, and its size. 
This information is gathered and reported by the genkld command. 

Implementation Specifics
This command is valid only on the POWER-based platform. 

Examples
To obtain a list of loaded shared objects, enter: 

# genkld

..
        d0791c00   18ab27 /usr/lib/librtl.a[shr.o]
        d0194500     7e07 /usr/lib/libbsd.a[shr.o]
        d019d0f8     3d39 /usr/lib/libbind.a[shr.o]
        d0237100    1eac0 /usr/lib/libwlm.a[shr.o]
        d01d5100    1fff9 /usr/lib/libC.a[shr.o]
        d02109e0    262b2 /usr/lib/libC.a[shrcore.o]
        d01f6c60    190dc /usr/lib/libC.a[ansicore_32.o]
        d01b0000    24cfd /usr/lib/boot/bin/libcfg_chrp
        d010a000    367ad /usr/lib/libpthreads.a[shr_xpg5.o]
        d0142000     3cee /usr/lib/libpthreads.a[shr_comm.o]
        d017f100    1172a /usr/lib/libcfg.a[shr.o]
        d016c100    128b2 /usr/lib/libodm.a[shr.o]
        d014c100     b12d /usr/lib/libi18n.a[shr.o]
        d0158100    13b41 /usr/lib/libiconv.a[shr4.o]
        d01410f8      846 /usr/lib/libcrypt.a[shr.o]
..

etc..
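
Since each line of the listing carries the address, size and path of one loader entry, the output is easy to post-process, for example to find the largest loaded object. The sketch below assumes that both the address and the size columns are hexadecimal (the sample contains values such as 18ab27); the function name parse_genkld is hypothetical.

```python
def parse_genkld(lines):
    """Parse 'address size path' lines into (address, size, path) tuples."""
    entries = []
    for line in lines:
        f = line.split()
        if len(f) != 3:
            continue
        try:
            # Both columns are assumed to be hexadecimal.
            addr, size = int(f[0], 16), int(f[1], 16)
        except ValueError:
            continue
        entries.append((addr, size, f[2]))
    return entries

sample = """\
        d0791c00   18ab27 /usr/lib/librtl.a[shr.o]
        d0194500     7e07 /usr/lib/libbsd.a[shr.o]
        d01410f8      846 /usr/lib/libcrypt.a[shr.o]
""".splitlines()

entries = parse_genkld(sample)
largest = max(entries, key=lambda e: e[1])
print(largest[2], hex(largest[1]))
```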





1.2.14 Not so well known tools for AIX: the proc tools:
=======================================================


--proctree 
Displays the process tree containing the specified process IDs or users. To display the ancestors 
and all the children of process 21166, enter: 

# proctree 21166
11238    /usr/sbin/srcmstr
  21166    /usr/sbin/rsct/bin/IBM.AuditRMd 


To display the ancestors and children of process 21166, including children of process 0, enter: 

# proctree -a 21166 
1    /etc/init
   11238    /usr/sbin/srcmstr
      21166    /usr/sbin/rsct/bin/IBM.AuditRMd 



-- procstack 
Displays the hexadecimal addresses and symbolic names for each of the stack frames of the current thread 
in processes. To display the current stack of process 15052, enter: 

# procstack 15052
15052 : /usr/sbin/snmpd
d025ab80  select   (?, ?, ?, ?, ?) + 90
100015f4  main   (?, ?, ?) + 1814
10000128  __start   () + 8c
 
Currently, procstack displays garbage or wrong information for the top stack frame, and possibly for the 
second top stack frame. Sometimes it will erroneously display "No frames found on the stack," and sometimes 
it will display: deadbeef ???????? (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ...) The fix for this problem had not 
been released at the writing of this article. When the fix becomes available, you need to download the 
APAR IY48543 for 5.2. For AIX 5.3 it all should work OK.

-- procmap 
Displays a process address map. To display the address space of process 13204, enter: 

# procmap  13204 
13204 : /usr/sbin/biod 6
10000000	  3K	read/exec	biod
20000910	  0K	read/write	biod
d0083100	 79K	read/exec	/usr/lib/libiconv.a
20013bf0	 41K	read/write	/usr/lib/libiconv.a
d007a100	 34K	read/exec	/usr/lib/libi18n.a
20011378	  4K	read/write	/usr/lib/libi18n.a
d0074000	 11K	read/exec	/usr/lib/nls/loc/en_US
d0077130	  8K	read/write	/usr/lib/nls/loc/en_US
d00730f8	  2K	read/exec	/usr/lib/libcrypt.a
f03c7508	  0K	read/write	/usr/lib/libcrypt.a
d01d4e20    1997K	read/exec 	/usr/lib/libc.a
f0337e90     570K	read/write	/usr/lib/libc.a 


-- procldd 
Displays a list of libraries loaded by a process. To display the list of dynamic libraries loaded by 
process 11928, enter:

# procldd 11928
 11928 : -sh
 /usr/lib/nls/loc/en_US
 /usr/lib/libcrypt.a
 /usr/lib/libc.a 


-- procflags 
Displays a process's tracing flags, and the pending and held signals. To display the tracing flags of 
process 28138, enter: 

# procflags 28138
28138 : /usr/sbin/rsct/bin/IBM.HostRMd
data model = _ILP32 flags = PR_FORK
/64763: flags = PR_ASLEEP | PR_NOREGS
/66315: flags = PR_ASLEEP | PR_NOREGS
/60641: flags = PR_ASLEEP | PR_NOREGS
/66827: flags = PR_ASLEEP | PR_NOREGS
/7515: flags = PR_ASLEEP | PR_NOREGS
/70439: flags = PR_ASLEEP | PR_NOREGS
/66061: flags = PR_ASLEEP | PR_NOREGS
/69149: flags = PR_ASLEEP | PR_NOREGS 


-- procsig 
Lists the signal actions for a process. To list all the signal actions defined for process 30552, enter: 

# procsig 30552
30552 : -ksh
HUP caught
INT caught
QUIT caught
ILL caught
TRAP caught
ABRT caught
EMT caught
FPE caught
KILL default RESTART
BUS caught


-- proccred 
Prints a process' credentials. To display the credentials of process 25632, enter: 

# proccred  25632
25632: e/r/suid=0  e/r/sgid=0 


-- procfiles 
Prints a list of open file descriptors. To display status and control information on the file descriptors 
opened by process 20138, enter: 

# procfiles -n 20138
20138 : /usr/sbin/rsct/bin/IBM.CSMAgentRMd
  Current rlimit: 2147483647 file descriptors
   0: S_IFCHR mode:00 dev:10,4 ino:4178 uid:0 gid:0 rdev:2,2
      O_RDWR  name:/dev/null
   2: S_IFREG mode:0311 dev:10,6 ino:250 uid:0 gid:0 rdev:0,0
      O_RDWR size:0   name:/var/ct/IBM.CSMAgentRM.stderr
   4: S_IFREG mode:0200 dev:10,6 ino:255 uid:0 gid:0 rdev:0,0 


-- procwdx 
Prints the current working directory for a process. To display the current working directory 
of process 11928, enter: 

# procwdx 11928
11928 :  /home/guest 


-- procstop 
Stops a process. To stop process 7500 on the PR_REQUESTED event, enter:

# procstop 7500

-- procrun 
Restarts a process. To restart process 30192 that was stopped on the PR_REQUESTED event, enter:

# procrun 30192

-- procwait 
Waits for all of the specified processes to terminate. To wait for process 12942 to exit and display 
the status, enter:

# procwait -v 12942
12942 : terminated, exit status 0 



1.2.15 Other monitoring:
========================


Nagios: open source Monitoring for most unix systems:
-----------------------------------------------------

Nagios is an open source host, service and network monitoring program. 

Latest versions: 2.5 (stable) 

Overview 
 
Nagios is a host and service monitor designed to inform you of network problems before your clients, 
end-users or managers do. It has been designed to run under the Linux operating system, but works fine 
under most *NIX variants as well. The monitoring daemon runs intermittent checks on hosts and services you specify 
using external "plugins" which return status information to Nagios. When problems are encountered, 
the daemon can send notifications out to administrative contacts in a variety of different ways 
(email, instant message, SMS, etc.). Current status information, historical logs, and reports can all 
be accessed via a web browser. 
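
To make the plugin idea concrete: a Nagios plugin is just a program that prints one status line and 
exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The following is a minimal Python sketch 
of a disk usage check. The thresholds, the message wording and the function names are assumptions 
chosen for illustration, not part of the Nagios distribution.

```python
import shutil

# Exit codes that Nagios plugins use to report state.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(used_pct, warn=80.0, crit=90.0):
    """Map a disk-usage percentage to an (exit code, status line) pair."""
    if used_pct >= crit:
        return CRITICAL, "DISK CRITICAL - %.1f%% used" % used_pct
    if used_pct >= warn:
        return WARNING, "DISK WARNING - %.1f%% used" % used_pct
    return OK, "DISK OK - %.1f%% used" % used_pct


def main(path="/"):
    """Check one filesystem, print the status line, return the exit code."""
    try:
        usage = shutil.disk_usage(path)
        code, text = evaluate(100.0 * usage.used / usage.total)
    except OSError as exc:
        code, text = UNKNOWN, "DISK UNKNOWN - %s" % exc
    print(text)
    return code

# A real plugin script would finish with: sys.exit(main())
```

Keeping the threshold logic in evaluate(), separate from the measurement, makes the check easy to test 
without touching a real filesystem.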
 
System Requirements 

The only requirements for running Nagios are a machine running Linux (or a UNIX variant) and a C compiler. 
You will probably also want to have TCP/IP configured, as most service checks will be performed over the network. 

You are not required to use the CGIs included with Nagios. However, if you do decide to use them, 
you will need to have the following software installed... 


- A web server (preferably Apache) 
- Thomas Boutell's gd library version 1.6.3 or higher (required by the statusmap and trends CGIs) 



rstat: Monitoring Machine Utilization with rstat:
-------------------------------------------------

rstat stands for Remote System Statistics service.

Ports exist for most Unix versions, such as Linux, Solaris and AIX.

-- rstat on Linux, Solaris:

rstat is an RPC client program to get and print statistics from any machine running the rpc.rstatd daemon, 
its server-side counterpart. The rpc.rstatd daemon has been used for many years by tools such as Sun's perfmeter 
and the rup command. The rstat program is simply a new client for an old daemon. The fact that the rpc.rstatd daemon 
is already installed and running on most Solaris and Linux machines is a huge advantage over other tools 
that require the installation of custom agents. 

The rstat client compiles and runs on Solaris and Linux as well and can get statistics from any machine running 
a current rpc.rstatd daemon, such as Solaris, Linux, AIX, and OpenBSD. The rpc.rstatd daemon is started 
from /etc/inetd.conf on Solaris. It is similar to vmstat, but has some advantages over vmstat:

You can get statistics without logging in to the remote machine, including over the Internet. 

It includes a timestamp. 

The output can be plotted directly by gnuplot. 

The fact that it runs remotely means that you can use a single central machine to monitor the performance 
of many remote machines. It also has a disadvantage in that it does not give the useful scan rate measurement 
of memory shortage, the sr column in vmstat. rstat will not work across most firewalls because it relies on 
port 111, the RPC port, which is usually blocked by firewalls.

To use rstat, simply give it the name or IP address of the machine you wish to monitor. Remember that rpc.rstatd 
must be running on that machine. The rup command is extremely useful here because with no arguments, 
it simply prints out a list of all machines on the local network that are running the rstatd daemon. 
If a machine is not listed, you may have to start rstatd manually. 

To start rpc.rstatd under Red Hat Linux, run as root: 

# /etc/rc.d/init.d/rstatd start

On Solaris, first try running the rstat client because inetd is often already configured to automatically 
start rpc.rstatd on request. If the client fails with the error "RPC: Program not registered," 
make sure you have this line in your /etc/inet/inetd.conf and kill -HUP your inetd process to get it to 
re-read inetd.conf, as follows:

rstatd/2-4 tli rpc/datagram_v wait root /usr/lib/netsvc/rstat/rpc.rstatd rpc.rstatd

Then you can monitor that machine like this: 

% rstat enkidu 
2001 07 10 10 36 08  0   0   0 100    0    27   54     1     0    0   12  0.1 

This command will give you a one-second average and then it will exit. If you want to continuously monitor, 
give an interval in seconds on the command line. Here's an example of one line of output every two seconds: 

% rstat enkidu 2 
2001 07 10 10 36 28  0   0   1  98    0     0    7     2     0    0   61  0.0 
2001 07 10 10 36 30  0   0   0 100    0     0    0     2     0    0   15  0.0 
2001 07 10 10 36 32  0   0   0 100    0     0    0     2     0    0   15  0.0 
2001 07 10 10 36 34  0   0   0 100    0     5   10     2     0    0   19  0.0 
2001 07 10 10 36 36  0   0   0 100    0     0   46     2     0    0  108  0.0 
^C 

To get a usage message, the output format, the version number, and where to go for updates, just type rstat 
with no parameters:

% rstat
usage: rstat machine [interval]
output:
yyyy mm dd hh mm ss usr wio sys idl pgin pgout intr ipkts opkts coll  cs load
docs and src at http://patrick.net/software/rstat/rstat.html

Notice that the column headings line up with the output data.
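
Because every rstat line has the same 18 whitespace-separated fields, it is simple to post-process. 
The following is a minimal Python sketch that turns one output line into a timestamp plus a dictionary 
keyed by the column names from the usage message. The function name parse_rstat_line is made up for 
this illustration.

```python
from datetime import datetime

# Column names after the six timestamp fields, as printed by "rstat".
COLUMNS = ["usr", "wio", "sys", "idl", "pgin", "pgout",
           "intr", "ipkts", "opkts", "coll", "cs", "load"]


def parse_rstat_line(line):
    """Split one rstat output line into (datetime, {column: value})."""
    fields = line.split()
    if len(fields) != 6 + len(COLUMNS):
        raise ValueError("unexpected rstat line: %r" % line)
    stamp = datetime(*(int(f) for f in fields[:6]))
    values = {name: float(v) for name, v in zip(COLUMNS, fields[6:])}
    return stamp, values


stamp, values = parse_rstat_line(
    "2001 07 10 10 36 08  0   0   0 100    0    27   54     1     0    0   12  0.1")
print(stamp.isoformat(), values["idl"], values["load"])
```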


-- AIX:

In order to get rstat working on AIX, you may need to configure rstatd.

As root 

1. Edit /etc/inetd.conf
Uncomment or add entry for rstatd
Eg
rstatd sunrpc_udp udp wait root /usr/sbin/rpc.rstatd rstatd 100001 1-3

2. Edit /etc/services
Uncomment or add entry for rstatd
Eg
rstatd 100001/udp

3. Refresh services
refresh -s inetd

4. Start rstatd
/usr/sbin/rpc.rstatd



1.2.16 UNIX ERROR CODES:
========================

It is always handy to have a list of the error codes from the errno.h header file.
The lowest-numbered codes (1 through 34) are the same across Unix versions; the higher numbers
differ per platform, as a comparison of the Linux and AIX lists below shows.

Note that this is only a small class of errors and codes: 
it covers ONLY the interaction of a process with the system. 

The errors you might see at boot time, or the messages an error logging daemon 
writes to a logfile, are a very different story.


from the errno.h file:


>>> Errcodes Linux (generic):


#define EPERM            1      /* Operation not permitted */
#define ENOENT           2      /* No such file or directory */
#define ESRCH            3      /* No such process */
#define EINTR            4      /* Interrupted system call */
#define EIO              5      /* I/O error */
#define ENXIO            6      /* No such device or address */
#define E2BIG            7      /* Arg list too long */
#define ENOEXEC          8      /* Exec format error */
#define EBADF            9      /* Bad file number */
#define ECHILD          10      /* No child processes */
#define EAGAIN          11      /* Try again */
#define ENOMEM          12      /* Out of memory */
#define EACCES          13      /* Permission denied */
#define EFAULT          14      /* Bad address */
#define ENOTBLK         15      /* Block device required */
#define EBUSY           16      /* Device or resource busy */
#define EEXIST          17      /* File exists */
#define EXDEV           18      /* Cross-device link */
#define ENODEV          19      /* No such device */
#define ENOTDIR         20      /* Not a directory */
#define EISDIR          21      /* Is a directory */
#define EINVAL          22      /* Invalid argument */
#define ENFILE          23      /* File table overflow */
#define EMFILE          24      /* Too many open files */
#define ENOTTY          25      /* Not a typewriter */
#define ETXTBSY         26      /* Text file busy */
#define EFBIG           27      /* File too large */
#define ENOSPC          28      /* No space left on device */
#define ESPIPE          29      /* Illegal seek */
#define EROFS           30      /* Read-only file system */
#define EMLINK          31      /* Too many links */
#define EPIPE           32      /* Broken pipe */
#define EDOM            33      /* Math argument out of domain of func */
#define ERANGE          34      /* Math result not representable */
#define EDEADLK         35      /* Resource deadlock would occur */
#define ENAMETOOLONG    36      /* File name too long */
#define ENOLCK          37      /* No record locks available */
#define ENOSYS          38      /* Function not implemented */
#define ENOTEMPTY       39      /* Directory not empty */
#define ELOOP           40      /* Too many symbolic links encountered */
#define EWOULDBLOCK     EAGAIN  /* Operation would block */
#define ENOMSG          42      /* No message of desired type */
#define EIDRM           43      /* Identifier removed */
#define ECHRNG          44      /* Channel number out of range */
#define EL2NSYNC        45      /* Level 2 not synchronized */
#define EL3HLT          46      /* Level 3 halted */
#define EL3RST          47      /* Level 3 reset */
#define ELNRNG          48      /* Link number out of range */
#define EUNATCH         49      /* Protocol driver not attached */
#define ENOCSI          50      /* No CSI structure available */
#define EL2HLT          51      /* Level 2 halted */
#define EBADE           52      /* Invalid exchange */
#define EBADR           53      /* Invalid request descriptor */
#define EXFULL          54      /* Exchange full */
#define ENOANO          55      /* No anode */
#define EBADRQC         56      /* Invalid request code */
#define EBADSLT         57      /* Invalid slot */
#define EDEADLOCK       EDEADLK
#define EBFONT          59      /* Bad font file format */
#define ENOSTR          60      /* Device not a stream */
#define ENODATA         61      /* No data available */
#define ETIME           62      /* Timer expired */
#define ENOSR           63      /* Out of streams resources */
#define ENONET          64      /* Machine is not on the network */
#define ENOPKG          65      /* Package not installed */
#define EREMOTE         66      /* Object is remote */
#define ENOLINK         67      /* Link has been severed */
#define EADV            68      /* Advertise error */
#define ESRMNT          69      /* Srmount error */
#define ECOMM           70      /* Communication error on send */
#define EPROTO          71      /* Protocol error */
#define EMULTIHOP       72      /* Multihop attempted */
#define EDOTDOT         73      /* RFS specific error */
#define EBADMSG         74      /* Not a data message */
#define EOVERFLOW       75      /* Value too large for defined data type */
#define ENOTUNIQ        76      /* Name not unique on network */
#define EBADFD          77      /* File descriptor in bad state */
#define EREMCHG         78      /* Remote address changed */
#define ELIBACC         79      /* Can not access a needed shared library */
#define ELIBBAD         80      /* Accessing a corrupted shared library */
#define ELIBSCN         81      /* .lib section in a.out corrupted */
#define ELIBMAX         82      /* Attempting to link in too many shared libraries */
#define ELIBEXEC        83      /* Cannot exec a shared library directly */
#define EILSEQ          84      /* Illegal byte sequence */
#define ERESTART        85      /* Interrupted system call should be restarted */
#define ESTRPIPE        86      /* Streams pipe error */
#define EUSERS          87      /* Too many users */
#define ENOTSOCK        88      /* Socket operation on non-socket */
#define EDESTADDRREQ    89      /* Destination address required */
#define EMSGSIZE        90      /* Message too long */
#define EPROTOTYPE      91      /* Protocol wrong type for socket */
#define ENOPROTOOPT     92      /* Protocol not available */
#define EPROTONOSUPPORT 93      /* Protocol not supported */
#define ESOCKTNOSUPPORT 94      /* Socket type not supported */
#define EOPNOTSUPP      95      /* Operation not supported on transport endpoint */
#define EPFNOSUPPORT    96      /* Protocol family not supported */
#define EAFNOSUPPORT    97      /* Address family not supported by protocol */
#define EADDRINUSE      98      /* Address already in use */
#define EADDRNOTAVAIL   99      /* Cannot assign requested address */
#define ENETDOWN        100     /* Network is down */
#define ENETUNREACH     101     /* Network is unreachable */
#define ENETRESET       102     /* Network dropped connection because of reset */
#define ECONNABORTED    103     /* Software caused connection abort */
#define ECONNRESET      104     /* Connection reset by peer */
#define ENOBUFS         105     /* No buffer space available */
#define EISCONN         106     /* Transport endpoint is already connected */
#define ENOTCONN        107     /* Transport endpoint is not connected */
#define ESHUTDOWN       108     /* Cannot send after transport endpoint shutdown */
#define ETOOMANYREFS    109     /* Too many references: cannot splice */
#define ETIMEDOUT       110     /* Connection timed out */
#define ECONNREFUSED    111     /* Connection refused */
#define EHOSTDOWN       112     /* Host is down */
#define EHOSTUNREACH    113     /* No route to host */
#define EALREADY        114     /* Operation already in progress */
#define EINPROGRESS     115     /* Operation now in progress */
#define ESTALE          116     /* Stale NFS file handle */
#define EUCLEAN         117     /* Structure needs cleaning */
#define ENOTNAM         118     /* Not a XENIX named type file */
#define ENAVAIL         119     /* No XENIX semaphores available */
#define EISNAM          120     /* Is a named type file */
#define EREMOTEIO       121     /* Remote I/O error */
#define EDQUOT          122     /* Quota exceeded */
#define ENOMEDIUM       123     /* No medium found */
#define EMEDIUMTYPE     124     /* Wrong medium type */


The list above should actually be enough, but we shall list the same for AIX:


>>> errcodes AIX:


#define EPERM   1       /* Operation not permitted              */
#define ENOENT  2       /* No such file or directory            */
#define ESRCH   3       /* No such process                      */
#define EINTR   4       /* interrupted system call              */
#define EIO     5       /* I/O error                            */
#define ENXIO   6       /* No such device or address            */
#define E2BIG   7       /* Arg list too long                    */
#define ENOEXEC 8       /* Exec format error                    */
#define EBADF   9       /* Bad file descriptor                  */
#define ECHILD  10      /* No child processes                   */
#define EAGAIN  11      /* Resource temporarily unavailable     */
#define ENOMEM  12      /* Not enough space                     */
#define EACCES  13      /* Permission denied                    */
#define EFAULT  14      /* Bad address                          */
#define ENOTBLK 15      /* Block device required                */
#define EBUSY   16      /* Resource busy                        */
#define EEXIST  17      /* File exists                          */
#define EXDEV   18      /* Improper link                        */
#define ENODEV  19      /* No such device                       */
#define ENOTDIR 20      /* Not a directory                      */
#define EISDIR  21      /* Is a directory                       */
#define EINVAL  22      /* Invalid argument                     */
#define ENFILE  23      /* Too many open files in system        */
#define EMFILE  24      /* Too many open files                  */
#define ENOTTY  25      /* Inappropriate I/O control operation  */
#define ETXTBSY 26      /* Text file busy                       */
#define EFBIG   27      /* File too large                       */
#define ENOSPC  28      /* No space left on device              */
#define ESPIPE  29      /* Invalid seek                         */
#define EROFS   30      /* Read only file system                */
#define EMLINK  31      /* Too many links                       */
#define EPIPE   32      /* Broken pipe                          */
#define EDOM    33      /* Domain error within math function    */
#define ERANGE  34      /* Result too large                     */
#define ENOMSG  35      /* No message of desired type           */
#define EIDRM   36      /* Identifier removed                   */
#define ECHRNG  37      /* Channel number out of range          */
#define EL2NSYNC 38     /* Level 2 not synchronized             */
#define EL3HLT  39      /* Level 3 halted                       */
#define EL3RST  40      /* Level 3 reset                        */
#define ELNRNG  41      /* Link number out of range             */
#define EUNATCH 42      /* Protocol driver not attached         */
#define ENOCSI  43      /* No CSI structure available           */
#define EL2HLT  44      /* Level 2 halted                       */
#define EDEADLK 45      /* Resource deadlock avoided            */
#define ENOTREADY       46      /* Device not ready             */
#define EWRPROTECT      47      /* Write-protected media        */
#define EFORMAT         48      /* Unformatted media            */
#define ENOLCK          49      /* No locks available           */
#define ENOCONNECT      50      /* no connection                */
#define ESTALE          52      /* no filesystem                */
#define EDIST           53      /* old, currently unused AIX errno*/
#define EINPROGRESS     55      /* Operation now in progress */
#define EALREADY        56      /* Operation already in progress */
#define ENOTSOCK        57      /* Socket operation on non-socket */
#define EDESTADDRREQ    58      /* Destination address required */
#define EDESTADDREQ     EDESTADDRREQ /* Destination address required */
#define EMSGSIZE        59      /* Message too long */
#define EPROTOTYPE      60      /* Protocol wrong type for socket */
#define ENOPROTOOPT     61      /* Protocol not available */
#define EPROTONOSUPPORT 62      /* Protocol not supported */
#define ESOCKTNOSUPPORT 63      /* Socket type not supported */
#define EOPNOTSUPP      64      /* Operation not supported on socket */
#define EPFNOSUPPORT    65      /* Protocol family not supported */
#define EAFNOSUPPORT    66      /* Address family not supported by protocol family */
#define EADDRINUSE      67      /* Address already in use */
#define EADDRNOTAVAIL   68      /* Can't assign requested address */
#define ENETDOWN        69      /* Network is down */
#define ENETUNREACH     70      /* Network is unreachable */
#define ENETRESET       71      /* Network dropped connection on reset */
#define ECONNABORTED    72      /* Software caused connection abort */
#define ECONNRESET      73      /* Connection reset by peer */
#define ENOBUFS         74      /* No buffer space available */
#define EISCONN         75      /* Socket is already connected */
#define ENOTCONN        76      /* Socket is not connected */
#define ESHUTDOWN       77      /* Can't send after socket shutdown */
#define ETIMEDOUT       78      /* Connection timed out */
#define ECONNREFUSED    79      /* Connection refused */
#define EHOSTDOWN       80      /* Host is down */
#define EHOSTUNREACH    81      /* No route to host */
#define ERESTART        82      /* restart the system call */
#define EPROCLIM        83      /* Too many processes */
#define EUSERS          84      /* Too many users */
#define ELOOP           85      /* Too many levels of symbolic links      */
#define ENAMETOOLONG    86      /* File name too long                     */
#define EDQUOT          88      /* Disc quota exceeded */
#define ECORRUPT        89      /* Invalid file system control data */
#define EREMOTE         93      /* Item is not local to host */
#define ENOSYS          109     /* Function not implemented  POSIX */
#define EMEDIA          110     /* media surface error */
#define ESOFT           111     /* I/O completed, but needs relocation */
#define ENOATTR         112     /* no attribute found */
#define ESAD            113     /* security authentication denied */
#define ENOTRUST        114     /* not a trusted program */
#define ETOOMANYREFS    115     /* Too many references: can't splice */
#define EILSEQ          116     /* Invalid wide character */
#define ECANCELED       117     /* asynchronous i/o cancelled */
#define ENOSR           118     /* temp out of streams resources */
#define ETIME           119     /* I_STR ioctl timed out */
#define EBADMSG         120     /* wrong message type at stream head */
#define EPROTO          121     /* STREAMS protocol error */
#define ENODATA         122     /* no message ready at stream head */
#define ENOSTR          123     /* fd is not a stream */
#define ECLONEME        ERESTART /* this is the way we clone a stream ... */
#define ENOTSUP         124     /* POSIX threads unsupported value */
#define EMULTIHOP       125     /* multihop is not allowed */
#define ENOLINK         126     /* the link has been severed */
#define EOVERFLOW       127     /* value too large to be stored in data type */
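
Since the numbers above 34 differ between platforms, scripts should use the symbolic names rather than 
hard-coded values. The following is a minimal Python sketch using the standard errno and os modules to 
look up the name and message for a code on whatever system it runs on. The helper name describe and 
the demo path are made up for this illustration.

```python
import errno
import os


def describe(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# The numeric values printed here depend on the platform for the higher codes.
for code in (errno.EPERM, errno.ENOENT, errno.EDEADLK, errno.ETIMEDOUT):
    print(code, *describe(code))

# Catching a real failure: the exception carries the errno value.
try:
    open("/nonexistent/path/for/demo")
except OSError as exc:
    print(exc.errno == errno.ENOENT, errno.errorcode[exc.errno])
```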







==================================
2. NFS and Mount command examples:
==================================


Let's start with something that might be of interest right now:


Examples of mounting a DVD or CDROM:
===================================

AIX:
----
# mount -r -v cdrfs /dev/cd0 /cdrom


Solaris:
--------
# mount -r -F hsfs /dev/dsk/c0t6d0s2 /cdrom


HPUX:
-----

# mount -F cdfs -o rr /dev/dsk/c1t2d0 /cdrom


SuSE Linux:
-----------
# mount -t iso9660 /dev/cdrom /cdrom
# mount -t iso9660 /dev/cdrom /media/cdrom


Redhat Linux:
-------------
# mount -t iso9660 /dev/cdrom /media/cdrom

Other commands on Linux:
------------------------

On some Linux systems with a SCSI CD-ROM drive, you might try the SCSI device name instead:

# mount /dev/sr0 /mount_point
# mount -t iso9660 /dev/sr0 /mount_point
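
The per-platform differences above come down to the filesystem type flag and its options. The following 
is a minimal Python sketch that collects them in one table, for use in a portable admin script. The key 
names are the values "uname -s" reports on each platform; the function name cd_mount_command is made up 
for this illustration, and the device name must still be supplied per system.

```python
# mount arguments for an ISO 9660 CD-ROM, taken from the examples above.
MOUNT_ARGS = {
    "AIX":   ["-r", "-v", "cdrfs"],
    "SunOS": ["-r", "-F", "hsfs"],
    "HP-UX": ["-F", "cdfs", "-o", "rr"],
    "Linux": ["-t", "iso9660"],
}


def cd_mount_command(system, device, mount_point="/cdrom"):
    """Build the mount command (as an argument list) for the given platform."""
    try:
        args = MOUNT_ARGS[system]
    except KeyError:
        raise ValueError("no CD-ROM mount recipe for %r" % system)
    return ["mount"] + args + [device, mount_point]


print(" ".join(cd_mount_command("AIX", "/dev/cd0")))
print(" ".join(cd_mount_command("Linux", "/dev/cdrom", "/media/cdrom")))
```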


Now we return to a discussion of "mounting" and NFS.


2.1 NFS:
========

We will discuss the most important features of NFS by showing how it is implemented on 
Solaris, Redhat and SuSE Linux. Most of this applies to HP-UX and AIX as well.


2.1.1 NFS and Redhat Linux:
---------------------------

Linux uses a combination of kernel-level support and continuously running daemon processes to provide 
NFS file sharing. NFS support must be enabled in the Linux kernel for this to function. 
NFS uses Remote Procedure Calls (RPC) to route requests between clients and servers, meaning that the 
portmap service must be enabled and active at the proper runlevels for NFS communication to occur. 
Working with portmap, various other processes ensure that a particular NFS connection is allowed and may 
proceed without error: 

rpc.mountd  - The running process that receives the mount request from an NFS client and checks to see 
              if it matches with a currently exported file system. 
rpc.nfsd    - The process that implements the user-level part of the NFS service. It works with the Linux kernel 
              to meet the dynamic demands of NFS clients, such as providing additional server threads for 
              NFS clients to use. 
rpc.lockd   - A daemon that is not necessary with modern kernels. NFS file locking is now done by the kernel. 
              It is included with the nfs-utils package for users of older kernels that do not include this 
              functionality by default. 
rpc.statd   - Implements the Network Status Monitor (NSM) RPC protocol. This provides reboot notification 
              when an NFS server is restarted without being gracefully brought down. 
rpc.rquotad - An RPC server that provides user quota information for remote users. 

Not all of these programs are required for NFS service. The only services that must be enabled are rpc.mountd, 
rpc.nfsd, and portmap. The other daemons provide additional functionality and should only be used if your server 
environment requires them. 


NFS version 2 uses the User Datagram Protocol (UDP) to provide a stateless network connection between 
the client and server. NFS version 3 can use UDP or TCP running over IP. The stateless UDP connection 
minimizes network traffic, as the NFS server sends the client a cookie after the client is authorized 
to access the shared volume. This cookie is a random value stored on the server's side and is passed 
along with RPC requests from the client. The NFS server can be restarted without affecting the clients 
and the cookie will remain intact. 

NFS only performs authentication when a client system attempts to mount a remote file system. To limit access, 
the NFS server first employs TCP wrappers. TCP wrappers reads the /etc/hosts.allow and /etc/hosts.deny files 
to determine if a particular client should be permitted or prevented access to the NFS server.  
After the client is allowed past TCP wrappers, the NFS server refers to its configuration file, 
"/etc/exports", to determine whether the client has enough privileges to mount any of the exported file systems. 
After granting access, any file and directory operations are sent to the server using remote procedure calls. 

 Warning 
  NFS mount privileges are granted specifically to a client, not a user. If you grant a client machine access 
  to an exported file system, any users of that machine will have access to the data. 

When configuring the /etc/exports file, be extremely careful about granting read-write permissions 
(rw) to a remote host. 
 
-- NFS and portmap
NFS relies upon remote procedure calls (RPC) to function. portmap is required to map RPC requests to the 
correct services. RPC processes notify portmap when they start, revealing the port number they are monitoring 
and the RPC program numbers they expect to serve. The client system then contacts portmap on the server with 
a particular RPC program number. portmap then redirects the client to the proper port number to communicate 
with its intended service. 
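
The register-then-look-up flow described above can be pictured as a simple table. The following is a 
toy in-memory model in Python, not the real portmap wire protocol; the class name ToyPortmap and its 
methods are made up for this illustration. The program numbers and ports are examples only.

```python
class ToyPortmap:
    """In-memory stand-in for portmap's table of registered RPC services."""

    def __init__(self):
        self._table = {}

    def register(self, program, version, proto, port):
        # An RPC service announces which port it listens on at startup.
        self._table[(program, version, proto)] = port

    def lookup(self, program, version, proto):
        # A port of 0 means nothing is registered for this request.
        return self._table.get((program, version, proto), 0)


pm = ToyPortmap()
pm.register(100003, 3, "udp", 2049)   # nfs
pm.register(100005, 3, "udp", 1027)   # mountd

print(pm.lookup(100003, 3, "udp"))    # client asks where NFS v3 over UDP lives
print(pm.lookup(100021, 4, "tcp"))    # lock manager was never registered
```

The second lookup shows why a service that failed to start cannot be reached: there is no entry for 
portmap to hand back to the client.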

Because RPC-based services rely on portmap to make all connections with incoming client requests, 
portmap must be available before any of these services start. If, for some reason, the portmap service 
unexpectedly quits, restart portmap and any services running when it was started. 

The portmap service can be used with the host access files (/etc/hosts.allow and /etc/hosts.deny) to control 
which remote systems are permitted to use RPC-based services on your machine. Access control rules for portmap 
will affect all RPC-based services. Alternatively, you can specify each of the NFS RPC daemons to be affected 
by a particular access control rule. The man pages for rpc.mountd and rpc.statd contain information regarding 
the precise syntax of these rules. 

-- portmap Status
As portmap provides the coordination between RPC services and the port numbers used to communicate with them, 
it is useful to be able to get a picture of the current RPC services using portmap when troubleshooting. 
The rpcinfo command shows each RPC-based service with its port number, RPC program number, version, 
and IP protocol type (TCP or UDP). 
To make sure the proper NFS RPC-based services are enabled for portmap, rpcinfo -p can be useful: 

# rpcinfo -p

   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp   1024  status
    100024    1   tcp   1024  status
    100011    1   udp    819  rquotad
    100011    2   udp    819  rquotad
    100005    1   udp   1027  mountd
    100005    1   tcp   1106  mountd
    100005    2   udp   1027  mountd
    100005    2   tcp   1106  mountd
    100005    3   udp   1027  mountd
    100005    3   tcp   1106  mountd
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100021    1   udp   1028  nlockmgr
    100021    3   udp   1028  nlockmgr
    100021    4   udp   1028  nlockmgr
 

The -p option probes the portmapper on the specified host or defaults to localhost if no specific host is listed. 
Other options are available from the rpcinfo man page. 
From the output above, various NFS services can be seen running. If one of the NFS services does not start up 
correctly, portmap will be unable to map RPC requests from clients for that service to the correct port. 
In many cases, restarting NFS as root (/sbin/service nfs restart) will cause those services to correctly 
register with portmap and begin working. 

# /sbin/service nfs restart
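
Checking the rpcinfo listing by eye gets tedious on many servers. The following is a minimal Python 
sketch that parses "rpcinfo -p" style output and reports which of the services NFS needs are not 
registered. It assumes the five-column layout shown above and that the required daemons appear under 
the names portmapper, mountd and nfs; the function names are made up for this illustration.

```python
# Names under which portmap, rpc.mountd and rpc.nfsd appear in the listing.
REQUIRED = {"portmapper", "mountd", "nfs"}


def registered_services(rpcinfo_output):
    """Return the set of service names found in rpcinfo -p output."""
    services = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows: program vers proto port name; skip header and blanks.
        if len(fields) == 5 and fields[0].isdigit():
            services.add(fields[4])
    return services


def missing_nfs_services(rpcinfo_output):
    """Return a sorted list of required services that are not registered."""
    return sorted(REQUIRED - registered_services(rpcinfo_output))


# A shortened listing in which rpc.nfsd has not registered.
sample = """\
   program vers proto   port
    100000    2   tcp    111  portmapper
    100024    1   udp   1024  status
    100005    3   udp   1027  mountd
"""
print(missing_nfs_services(sample))
```

On a live system the text could come from subprocess.run(["rpcinfo", "-p"], capture_output=True, 
text=True).stdout instead of the sample string.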

-- NFS Server Configuration Files
Configuring a system to share files and directories using NFS is straightforward. Every file system being 
exported to remote users via NFS, as well as the access rights relating to those file systems, 
is located in the /etc/exports file. This file is read by the exportfs command to give rpc.mountd and rpc.nfsd 
the information necessary to allow the remote mounting of a file system by an authorized host. 

The exportfs command allows you to selectively export or unexport directories without restarting the various 
NFS services. When exportfs is passed the proper options, the file systems to be exported are written to 
/var/lib/nfs/xtab. Since rpc.mountd refers to the xtab file when deciding access privileges to a file system, 
changes to the list of exported file systems take effect immediately. 

Various options are available when using exportfs: 


-r - Causes all directories listed in /etc/exports to be exported by constructing a new export list in 
     /var/lib/nfs/xtab. This option effectively refreshes the export list with any changes that have been 
     made to /etc/exports. 

-a - Causes all directories to be exported or unexported, depending on the other options passed to exportfs. 

-o   options - Allows the user to specify directories to be exported that are not listed in /etc/exports. 
     These additional file system shares must be written in the same way they are specified in /etc/exports. 
     This option is used to test an exported file system before adding it permanently to the list of file systems 
     to be exported. 

-i - Tells exportfs to ignore /etc/exports; only options given from the command line are used to define 
     exported file systems. 

-u - Unexports directories from being mounted by remote users. The command exportfs -ua effectively suspends 
     NFS file sharing while keeping the various NFS daemons up. To allow NFS sharing to continue, type exportfs -r. 

-v - Verbose operation, where the file systems being exported or unexported are displayed in greater detail 
     when the exportfs command is executed. 

If no options are passed to the exportfs command, it displays a list of currently exported file systems. 

Changes to /etc/exports can also be read by reloading the NFS service with the service nfs reload command. 
This keeps the NFS daemons running while re-exporting the /etc/exports file. 

-- /etc/exports
The /etc/exports file is the standard for controlling which file systems are exported to which hosts, 
as well as specifying particular options that control everything. Blank lines are ignored, comments can be made 
using #, and long lines can be wrapped with a backslash (\). Each exported file system should be on its own line. 
Lists of authorized hosts placed after an exported file system must be separated by space characters. 
Options for each of the hosts must be placed in parentheses directly after the host identifier, without any spaces 
separating the host and the first parenthesis. 

In its simplest form, /etc/exports only needs to know the directory to be exported and the hosts 
permitted to use it: 

/some/directory bob.domain.com
/another/exported/directory 192.168.0.3
 

After re-exporting /etc/exports with the "/sbin/service nfs reload" command, the bob.domain.com host will be 
able to mount /some/directory and 192.168.0.3 can mount /another/exported/directory. Because no options 
are specified in this example, several default NFS preferences take effect.

In order to override these defaults, you must specify an option that takes its place. For example, if you do 
not specify rw, then that export will only be shared read-only. Each default for every exported file system 
must be explicitly overridden. Additionally, other options are available where no default value is in place. 
These include the ability to disable sub-tree checking, allow access from insecure ports, and allow insecure 
file locks (necessary for certain early NFS client implementations). See the exports man page for details 
on these lesser used options. 

When specifying hostnames, you can use the following methods: 

single host - Where one particular host is specified with a fully qualified domain name, hostname, or IP address. 

wildcards   - Where a * or ? character is used to take into account a grouping of fully qualified domain names 
              that match a particular string of letters. Wildcards are not to be used with IP addresses; however, 
              they may accidentally work if reverse DNS lookups fail. 

However, be careful when using wildcards with fully qualified domain names, as they tend to be more exact 
than you would expect. For example, the use of *.domain.com as wildcard will allow sales.domain.com to access 
the exported file system, but not bob.sales.domain.com. To match both possibilities, as well as 
sam.corp.domain.com, you would have to provide *.domain.com *.*.domain.com. 

IP networks - Allows the matching of hosts based on their IP addresses within a larger network. For example, 
              192.168.0.0/28 will allow the first 16 IP addresses, from 192.168.0.0 to 192.168.0.15, 
              to access the exported file system but not 192.168.0.16 and higher. 

netgroups   - Permits an NIS netgroup name, written as @<group-name>, to be used. This effectively puts the 
              NIS server in charge of access control for this exported file system, where users can be added 
              and removed from an NIS group without affecting /etc/exports. 
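
The /28 example above can be checked with a little shell arithmetic. This is only a sketch: the
function names ip_to_int and in_network are made up for illustration and are not part of any NFS
tool, and it assumes a shell with 64-bit arithmetic and plain decimal octets.

```shell
#!/bin/sh
# ip_to_int ADDRESS: convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_network NETWORK/PREFIX ADDRESS: print "yes" if ADDRESS lies inside the
# network, otherwise "no". (Hypothetical helper, for illustration only.)
in_network() {
    net=${1%/*}
    bits=${1#*/}
    # Netmask with the top $bits bits set, e.g. /28 gives 255.255.255.240.
    mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
    net_int=$(ip_to_int "$net")
    addr_int=$(ip_to_int "$2")
    if [ $(( $net_int & $mask )) -eq $(( $addr_int & $mask )) ]; then
        echo yes
    else
        echo no
    fi
}

in_network 192.168.0.0/28 192.168.0.15   # last of the first 16 addresses
in_network 192.168.0.0/28 192.168.0.16   # first address outside the range
```

The two calls print "yes" and "no", matching the 192.168.0.0 to 192.168.0.15 range described above.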


Warning 
  The way in which the /etc/exports file is formatted is very important, particularly concerning the use of 
  space characters. Remember to always separate exported file systems from hosts and hosts from one another 
  with a space character. However, there should be no other space characters in the file unless they are used 
  in comment lines. 

  For example, the following two lines do not mean the same thing: 

 /home bob.domain.com(rw)
 /home bob.domain.com (rw)
 

  The first line allows only users from bob.domain.com read-write access to the /home directory. 
  The second line allows users from bob.domain.com to mount the directory read-only (the default), but the rest 
  of the world can mount it read-write. Be careful where space characters are used in /etc/exports. 
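
A quick way to catch the space mistake described in this warning is to scan the file for option
lists that stand alone. The sketch below is an assumption-laden helper (check_exports is a made-up
name); it ignores backslash-continued lines and only looks at a "(" that starts the third or a
later field, because a "(" directly after the directory can be an intentional export to all hosts.

```shell
#!/bin/sh
# check_exports FILE: report option lists that are separated from their host
# by a space, such as "/home bob.domain.com (rw)".
check_exports() {
    awk '
        /^[ \t]*#/ || NF == 0 { next }     # skip comments and blank lines
        {
            # Field 1 is the directory, so start looking at field 3.
            for (i = 3; i <= NF; i++)
                if (substr($i, 1, 1) == "(")
                    printf "line %d: options %s are separated from host %s\n", NR, $i, $(i-1)
        }
    ' "$1"
}

# Demo on a scratch file; on a real server the argument would be /etc/exports.
tmp=$(mktemp)
printf '%s\n' '/home bob.domain.com(rw)' '/home bob.domain.com (rw)' > "$tmp"
check_exports "$tmp"
rm -f "$tmp"
```

Only the second demo line is reported, since that is the one that opens /home read-write to every host.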
 

-- NFS Client Configuration Files - What to do on a client?

Any NFS share made available by a server can be mounted using various methods. Of course, the share can be 
manually mounted, using the mount command, to acquire the exported file system at a particular mount point. 
However, this requires that the root user type the mount command every time the system restarts. 
In addition, the root user must remember to unmount the file system when shutting down the machine. 
Two methods of configuring NFS mounts include modifying the /etc/fstab or using the autofs service. 

> /etc/fstab
Placing a properly formatted line in the /etc/fstab file has the same effect as manually mounting the 
exported file system. The /etc/fstab file is read by the /etc/rc.d/init.d/netfs script at system startup. 
The proper file system mounts, including NFS, are put into place. 

A sample /etc/fstab line to mount an NFS export looks like the following: 

<server>:</path/of/dir> </local/mnt/point> nfs <options> 0 0
 
The <server> is the hostname, IP address, or fully qualified domain name of the server exporting 
the file system. The </path/of/dir> tells the server which export to mount. 
The </local/mnt/point> specifies where on the local file system to mount the exported directory. 
This mount point must exist before /etc/fstab is read or the mount will fail. The nfs option specifies 
the type of file system being mounted. 

The <options> area specifies how the file system is to be mounted. For example, if the options 
area states rw,suid on a particular mount, the exported file system will be mounted read-write and the 
user and group ID set by the server will be used. Note, parentheses are not to be used here.  
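
As an illustration of the format above, the following sketch prints an /etc/fstab entry from its
parts. The function name nfs_fstab_line is hypothetical, "defaults" is assumed as the fallback
option string, and nothing is written to /etc/fstab; the output would still have to be reviewed
and appended by hand.

```shell
#!/bin/sh
# nfs_fstab_line SERVER REMOTE_DIR MOUNT_POINT [OPTIONS]
# Print one /etc/fstab entry for an NFS mount. No file is modified.
nfs_fstab_line() {
    server=$1
    remote=$2
    mnt=$3
    opts=${4:-defaults}
    # Unlike /etc/exports, the options field takes no parentheses.
    case $opts in
        *"("*|*")"*)
            echo "options must not contain parentheses: $opts" >&2
            return 1 ;;
    esac
    # The mount point must exist before the entry is used.
    [ -d "$mnt" ] || echo "warning: mount point $mnt does not exist yet" >&2
    printf '%s:%s %s nfs %s 0 0\n' "$server" "$remote" "$mnt" "$opts"
}

nfs_fstab_line bob.domain.com /some/directory /tmp rw,suid
```

The call prints "bob.domain.com:/some/directory /tmp nfs rw,suid 0 0"; /tmp is used as the mount point here only so the demo runs anywhere.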



2.1.2 NFS and SuSE Linux:
-------------------------

-- Importing File Systems with YaST

Any user authorized to do so can mount NFS directories from an NFS server into his own file tree. 
This can be achieved most easily using the YaST module `NFS Client'. Just enter the host name of the NFS server, 
the directory to import, and the mount point at which to mount this directory locally. 
All this is done after clicking `Add' in the first dialog.


-- Importing File Systems Manually

File systems can easily be imported manually from an NFS server. The only prerequisite is a running 
RPC port mapper, which can be started by entering the command 
# rcportmap start 

as root. Once this prerequisite is met, remote file systems exported on the respective machines 
can be mounted in the file system just like local hard disks using the command mount with the following syntax: 

# mount host:remote-path local-path

If user directories from the machine sun, for example, should be imported, the following command can be used: 

# mount sun:/home /home
 

-- Exporting File Systems with YaST

With YaST, turn a host in your network into an NFS server - a server that exports directories and files 
to all hosts granted access to it. This could be done to provide applications to all coworkers of a group 
without installing them locally on each and every host. To install such a server, start YaST and select 
`Network Services' -> `NFS Server' 

Next, activate `Start NFS Server' and click `Next'. In the upper text field, enter the directories to export. 
Below, enter the hosts that should have access to them. 
There are four options that can be set for each host: single host, netgroups, wildcards, and IP networks. 
A more thorough explanation of these options is provided by man exports. `Exit' completes the configuration. 


-- Exporting File Systems Manually

If you do not want to use YaST, make sure the following systems run on the NFS server: 

RPC portmapper (portmap) 
RPC mount daemon (rpc.mountd) 
RPC NFS daemon (rpc.nfsd) 

For these services to be started by the scripts "/etc/init.d/portmap" and "/etc/init.d/nfsserver" 
when the system is booted, enter the commands 

# insserv /etc/init.d/nfsserver    and 
# insserv /etc/init.d/portmap. 

Also define which file systems should be exported to which host in the configuration file "/etc/exports". 

For each directory to export, one line is needed to set which machines may access that directory 
with what permissions. All subdirectories of this directory are automatically exported as well. 
Authorized machines are usually specified with their full names (including domain name), but it is possible 
to use wild cards like * or ? (which expand the same way as in the Bash shell). If no machine is specified here, 
any machine is allowed to import this file system with the given permissions. 

Set permissions for the file system to export in brackets after the machine name. The most important options are: 

ro 		File system is exported with read-only permission (default).  
rw 		File system is exported with read-write permission.  
root_squash 	This makes sure the user root of the given machine does not have root permissions 
                on this file system. This is achieved by assigning user ID 65534 to users with user ID 0 (root). 
                This user ID should be set to nobody (which is the default).  
no_root_squash 	Does not assign user ID 0 to user ID 65534, keeping the root permissions valid.  
link_relative	Converts absolute links (those beginning with /) to a sequence of ../. 
                This is only useful if the entire file system of a machine is mounted (default).  
link_absolute	Symbolic links remain untouched.  
map_identity	User IDs are exactly the same on both client and server (default).  
map_daemon	Client and server do not have matching user IDs. This tells nfsd to create a conversion table 
                for user IDs. The ugidd daemon is required for this to work.  

/etc/exports is read by mountd and nfsd. If you change anything in this file, restart mountd and nfsd 
for your changes to take effect. This can easily be done with "rcnfsserver restart". 


Example SuSE /etc/exports

#
# /etc/exports
#
/home            sun(rw)   venus(rw)
/usr/X11         sun(ro)   venus(ro)
/usr/lib/texmf   sun(ro)   venus(rw)
/                earth(ro,root_squash)
/home/ftp        (ro)
# End of exports
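
To see at a glance who may mount what, an exports file in this format can be flattened into one
"directory host options" line per entry. This is a rough sketch, not a full parser: list_exports
is a made-up name, continuation lines are not handled, and "*" and "-" are just display
conventions chosen here for "any machine" and "no options given".

```shell
#!/bin/sh
# list_exports FILE: print "directory host options" for every export entry.
list_exports() {
    awk '
        /^[ \t]*#/ || NF == 0 { next }        # skip comments and blank lines
        NF == 1 { print $1, "*", "-"; next }  # no machine given: any machine
        {
            for (i = 2; i <= NF; i++) {
                host = $i
                opts = "-"
                p = index($i, "(")
                if (p > 0) {
                    host = substr($i, 1, p - 1)
                    opts = substr($i, p + 1, length($i) - p - 1)
                }
                if (host == "") host = "*"    # "(ro)" alone applies to everyone
                print $1, host, opts
            }
        }
    ' "$1"
}

# Demo with two lines taken from the example above.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
/usr/lib/texmf   sun(ro)   venus(rw)
/home/ftp        (ro)
EOF
list_exports "$tmp"
rm -f "$tmp"
```

For the /home/ftp line the sketch prints "/home/ftp * ro", which makes the world-readable export easy to spot.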




2.2 Mount command:
==================

The standard form of the mount command is 

mount -F typefs device mountdir (solaris, HP-UX)
mount -t typefs device mountdir (many other unix's)

This tells the kernel to attach the file system found on "device" (which is of type "typefs") 
at the directory "mountdir". 
The previous contents (if any), owner, and mode of "mountdir" become invisible, 
and as long as this file system remains mounted, 
the pathname "mountdir" refers to the root of the file system on "device". 

The syntax is:
mount [options] [type] [device] [mountpoint]


-- mounting a remote filesystem:

syntax: mount -F nfs <options> <-o specific options> -O  <server>:<filesystem> <local_mount_point>

# mount -F nfs hpsrv:/data /data
# mount -F nfs -o hard,intr thor:/data  /data


- standard mounts are determined by files like /etc/fstab (HP-UX, Linux), /etc/filesystems (AIX) or /etc/vfstab (Solaris).


2.2.1 Where are the standard mounts defined?
============================================

In Solaris:
===========

- standard mounts are determined by /etc/vfstab.
- NFS shares (the file systems this host exports) are defined in /etc/dfs/dfstab. Here you will find share commands. 
- currently mounted filesystems are listed in /etc/mnttab

In Linux:
=========

- on most Linux distros, standard mounts are determined by "/etc/fstab".

In AIX:
=======

- standard mounts and properties are determined by the file "/etc/filesystems".

In HP-UX:
=========

The file /etc/fstab lists all of the filesystems that are mounted at boot time.
The filesystems that are OS related are / , /var, /opt , /tmp, /usr , /stand

The filesystem that is special is /stand, this is where your kernel is built and resides. 
Notice that the filesystem type is "hfs". HPUX kernels MUST reside on an hfs filesystem
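
The per-platform list above can be folded into a small lookup for use in portable scripts. This is
only a sketch: mount_table_for is a hypothetical name, and it assumes the usual "uname -s" strings
(SunOS for Solaris, Linux, AIX, HP-UX).

```shell
#!/bin/sh
# mount_table_for OSNAME: print the file that defines the standard mounts.
# OSNAME is the string printed by "uname -s".
mount_table_for() {
    case $1 in
        SunOS)  echo /etc/vfstab ;;
        Linux)  echo /etc/fstab ;;
        AIX)    echo /etc/filesystems ;;
        HP-UX)  echo /etc/fstab ;;
        *)      echo "unknown system: $1" >&2
                return 1 ;;
    esac
}

# Show the file for the machine this runs on, if it is in the table.
mount_table_for "$(uname -s)" || echo "no entry for this system"
```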



An example of /etc/vfstab:
--------------------------

starboss:/etc $ more vfstab
#device         device          mount           FS      fsck    mount   mount
#to mount       to fsck         point           type    pass    at boot options
#
fd      -       /dev/fd fd      -       no      -
/proc   -       /proc   proc    -       no      -
/dev/md/dsk/d1  -       -       swap    -       no      -
/dev/md/dsk/d0  /dev/md/rdsk/d0 /       ufs     1       no      logging
/dev/md/dsk/d4  /dev/md/rdsk/d4 /usr    ufs     1       no      logging
/dev/md/dsk/d3  /dev/md/rdsk/d3 /var    ufs     1       no      logging
/dev/md/dsk/d7  /dev/md/rdsk/d7 /export ufs     2       yes     logging
/dev/md/dsk/d5  /dev/md/rdsk/d5 /usr/local      ufs     2       yes     logging
/dev/dsk/c2t0d0s0 /dev/rdsk/c2t0d0s0    /export2        ufs     2       yes     logging
swap - /tmp tmpfs - yes size=512m
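
Because the vfstab columns are whitespace separated, awk can pick out fields directly. The sketch
below lists the mount points whose "mount at boot" column (field 6) is "yes"; boot_mounts is a
made-up helper name, and the demo uses a scratch copy of three lines from the example rather than
the real /etc/vfstab.

```shell
#!/bin/sh
# boot_mounts FILE: print the mount point (field 3) of every vfstab entry
# whose "mount at boot" field (field 6) is "yes".
boot_mounts() {
    awk '!/^#/ && NF >= 7 && $6 == "yes" { print $3 }' "$1"
}

tmp=$(mktemp)
cat > "$tmp" <<'EOF'
#device         device          mount           FS      fsck    mount   mount
/dev/md/dsk/d0  /dev/md/rdsk/d0 /       ufs     1       no      logging
/dev/md/dsk/d7  /dev/md/rdsk/d7 /export ufs     2       yes     logging
swap - /tmp tmpfs - yes size=512m
EOF
boot_mounts "$tmp"     # prints /export and /tmp
rm -f "$tmp"
```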


mount adds an entry to /etc/mnttab, umount deletes it.
Mounting applies to local filesystems, or to remote filesystems via NFS.


Local mount example:


mount -F ufs -o logging /dev/dsk/c0t0d0s3 /mnt

At Remote server: 
share, shareall, or add entry in /etc/dfs/dfstab
# share -F nfs /var/mail  

Unmount a mounted FS

First check who is using it
# fuser -c mountpoint
# umount mountpoint




2.2.2 Mounting a NFS filesystem in HP-UX:
=========================================

Mounting Remote File Systems 
You can use either SAM or the mount command to mount file systems located on a remote system.

Before you can mount file systems located on a remote system, NFS software must be installed and 
configured on both local and remote systems. Refer to Installing and Administering NFS for information.

For information on mounting NFS file systems using SAM, see SAM's online help.

To mount a remote file system using HP-UX commands,

You must know the name of the host machine and the file system's directory on the remote machine.
Establish communication over a network between the local system (that is, the "client") and the 
remote system. (The local system must be able to reach the remote system via whatever hosts database is in use.) 
(See named(1M) and hosts(4).) If necessary, test the connection with /usr/sbin/ping; see ping(1M).

Make sure the file /etc/exports on the remote system lists the file systems that you wish to make available 
to clients (that is, to "export") and the local systems that you wish to mount the file systems.

For example, to allow machines called rolf and egbert to remotely mount the /usr file system, edit the file 
/etc/exports on the remote machine and include the line:
 
/usr rolf egbert 
 
Execute /usr/sbin/exportfs -a on the remote system to export all directories in /etc/exports to clients.

For more information, see exportfs(1M).
 
 NOTE: If you wish to invoke exportfs -a at boot time, make sure the NFS configuration file /etc/rc.config.d/nfsconf 
 on the remote system contains the following settings: NFS_SERVER=1 and START_MOUNTD=1. 
 The client's /etc/rc.config.d/nfsconf file must contain NFS_CLIENT=1. Then issue the following command 
 to run the script: 
 /sbin/init.d/nfs.server start  
 
Mount the file system on the local system, as in:
 
# mount -F nfs remotehost:/remote_dir /local_dir 



Just a bunch of mount command examples:
---------------------------------------

# mount
# mount -a
# mountall -l
# mount -t type device dir                  
# mount -F pcfs /dev/dsk/c0t0d0p0:c /pcfs/c 
# mount /dev/md/dsk/d7 /u01
# mount sun:/home /home
# mount -t nfs 137.82.51.1:/share/sunos/local /usr/local
# mount /dev/fd0 /mnt/floppy
# mount -o ro /dev/dsk/c0t6d0s1 /mnt/cdrom
# mount -V cdrfs -o ro /dev/cd0  /cdrom



2.2.3 Solaris mount command:
============================

The unix mount command is used to mount a filesystem, and it attaches disks, and directories logically 
rather than physically. It takes a minimum of two arguments:

1) the name of the special device which contains the filesystem
2) the name of an existing directory on which to mount the file system

Once the file system is mounted, the directory becomes the mount point. All the file systems will now be usable 
as if they were subdirectories of the file system they were mounted on. The table of currently mounted file systems 
can be found by examining the mounted file system information file. This is provided by a file system that is usually 
mounted on /etc/mnttab.


Mounting a file system causes three actions to occur:

1. The superblock for the mounted file system is read into memory
2. An entry is made in the /etc/mnttab file
3. An entry is made in the inode for the directory on which the file system is mounted which marks the directory 
as a mount point

The /etc/mountall command mounts all filesystems as described in the /etc/vfstab file.
Note that /etc/mount and /etc/mountall commands can only be executed by the superuser.

OPTIONS

-F FSType
   Used to specify the FSType on which to operate. The FSType must be specified or must be determinable from
   /etc/vfstab, or by consulting /etc/default/fs or /etc/dfs/fstypes.

-a [ mount_points. . . ]
   Perform mount or umount operations in parallel, when possible.

If mount points are not specified, mount will mount all file systems whose /etc/vfstab "mount at boot"
field is "yes". If mount points are specified, then /etc/vfstab "mount at boot" field will be ignored.

If mount points are specified, umount will only umount those mount points. If none is specified, then umount
will attempt to unmount all file systems in /etc/mnttab, with the exception of certain system
required file systems: /, /usr, /var, /var/adm, /var/run, /proc, /dev/fd and /tmp.

-f Forcibly unmount a file system.
   Without this option, umount does not allow a file system to be unmounted if a file on the file system is
   busy. Using this option can cause data loss for open files; programs which access files after the file
   system has been unmounted will get an error (EIO).

-p Print the list of mounted file systems in the /etc/vfstab format. Must be the only option specified.

-v Print the list of mounted file systems in verbose format. Must be the only option specified.

-V Echo the complete command line, but do not execute the command. umount generates a command line by using the
   options and arguments provided by the user and adding to them information derived from /etc/mnttab. This
   option should be used to verify and validate the command line.

generic_options
Options that are commonly supported by most FSType-specific command modules. The following options are
available:

-m Mount the file system without making an entry in /etc/mnttab.

-g Globally mount the file system. On a clustered system, this globally mounts the file system on
   all nodes of the cluster. On a non-clustered system this has no effect.

-o Specify FSType-specific options in a comma separated (without spaces) list of suboptions
   and keyword-attribute pairs for interpretation by the FSType-specific module of the command.
   (See mount_ufs(1M))

-O Overlay mount. Allow the file system to be mounted over an existing mount point, making
   the underlying file system inaccessible. If a mount is attempted on a pre-existing mount point
   without setting this flag, the mount will fail, producing the error "device busy".

-r Mount the file system read-only.


Example mount:

mount -F ufs -o logging /dev/dsk/c0t0d0s3 /mnt


Example mountpoints and disks:
------------------------------

Mountpoint     Device            Size   Purpose
/              /dev/md/dsk/d1    100    Unix root filesystem
/usr           /dev/md/dsk/d3    1200   Unix usr filesystem
/var           /dev/md/dsk/d4    200    Unix var filesystem
/home          /dev/md/dsk/d5    200    Unix opt filesystem
/opt           /dev/md/dsk/d6    4700   Oracle_Home
/u01           /dev/md/dsk/d7    8700   Oracle datafiles
/u02           /dev/md/dsk/d8    8700   Oracle datafiles
/u03           /dev/md/dsk/d9    8700   Oracle datafiles
/u04           /dev/md/dsk/d10   8700   Oracle datafiles
/u05           /dev/md/dsk/d110  8700   Oracle datafiles
/u06           /dev/md/dsk/d120  8700   Oracle datafiles
/u07           /dev/md/dsk/d123  8650   Oracle datafiles

Suppose you have only 1 disk of about 72GB, 2GB RAM:

Entire disk= Slice 2

/        Slice 0, partition  about 2G
swap     Slice 1, partition  about 4G
/export  Slice 3, partition  about 50G, maybe you link it to /u01
/var     Slice 4, partition  about 2G
/opt     Slice 5, partition  about 10G if you plan to install apps here
/usr     Slice 6, partition  about 2G
/u01     Slice 7, partition  optional, standard it's /home
         Depending on how you configure /export, size could be around 20G


find . -name dfctowdk\*.zip | while read file; do pkzip25 -extract -translate=unix ->


2.2.4 mount command on AIX:
===========================

Typical examples:

# mount -o soft 10.32.66.75:/data/nim /mnt
# mount -o soft abcsrv:/data/nim /mnt
# mount -o soft n580l03:/data/nim /mnt


Note 1:
-------

mount [ -f ] [ -n Node ] [ -o Options ] [ -p ] [ -r ] [ -v VfsName ] [ -t Type | [ Device | Node:Directory ] 
      Directory | all | -a ] [ -V [generic_options] special_mount_points ]

If you specify only the Directory parameter, the mount command takes it to be the name of the directory or file on which 
a file system, directory, or file is usually mounted (as defined in the /etc/filesystems file). The mount command looks up 
the associated device, directory, or file and mounts it. This is the most convenient way of using the mount command, 
because it does not require you to remember what is normally mounted on a directory or file. You can also specify only 
the device. In this case, the command obtains the mount point from the /etc/filesystems file.

The /etc/filesystems file should include a stanza for each mountable file system, directory, or file. This stanza should 
specify at least the name of the file system and either the device on which it resides or the directory name. 
If the stanza includes a mount attribute, the mount command uses the associated values. It recognizes five values 
for the mount attributes: automatic, true, false, removable, and readonly. 

The mount all command causes all file systems with the mount=true attribute to be mounted in their normal places. 
This command is typically used during system initialization, and the corresponding mounts are referred to as 
automatic mounts. 

Example mount command on AIX:
-----------------------------

$ mount

  node       mounted        mounted over    vfs       date        options
-------- ---------------  ---------------  ------ ------------ ---------------
         /dev/hd4         /                jfs2   Jun 06 17:15 rw,log=/dev/hd8
         /dev/hd2         /usr             jfs2   Jun 06 17:15 rw,log=/dev/hd8
         /dev/hd9var      /var             jfs2   Jun 06 17:15 rw,log=/dev/hd8
         /dev/hd3         /tmp             jfs2   Jun 06 17:15 rw,log=/dev/hd8
         /dev/hd1         /home            jfs2   Jun 06 17:16 rw,log=/dev/hd8
         /proc            /proc            procfs Jun 06 17:16 rw
         /dev/hd10opt     /opt             jfs2   Jun 06 17:16 rw,log=/dev/hd8
         /dev/fslv00      /XmRec           jfs2   Jun 06 17:16 rw,log=/dev/hd8
         /dev/fslv01      /tmp/m2          jfs2   Jun 06 17:16 rw,log=/dev/hd8
         /dev/fslv02      /software        jfs2   Jun 06 17:16 rw,log=/dev/hd8
         /dev/oralv       /opt/app/oracle  jfs2   Jun 06 17:25 rw,log=/dev/hd8
         /dev/db2lv       /db2_database    jfs2   Jun 06 19:54 rw,log=/dev/loglv00
         /dev/fslv03      /bmc_home        jfs2   Jun 07 12:11 rw,log=/dev/hd8
         /dev/homepeter   /home/peter      jfs2   Jun 13 18:42 rw,log=/dev/hd8
         /dev/bmclv       /bcict/stage     jfs2   Jun 15 15:21 rw,log=/dev/hd8
         /dev/u01         /u01             jfs2   Jun 22 00:22 rw,log=/dev/loglv01
         /dev/u02         /u02             jfs2   Jun 22 00:22 rw,log=/dev/loglv01
         /dev/u05         /u05             jfs2   Jun 22 00:22 rw,log=/dev/loglv01
         /dev/u03         /u03             jfs2   Jun 22 00:22 rw,log=/dev/loglv01
         /dev/backuo      /backup_ora      jfs2   Jun 22 00:22 rw,log=/dev/loglv02
         /dev/u02back     /u02back         jfs2   Jun 22 00:22 rw,log=/dev/loglv03
         /dev/u01back     /u01back         jfs2   Jun 22 00:22 rw,log=/dev/loglv03
         /dev/u05back     /u05back         jfs2   Jun 22 00:22 rw,log=/dev/loglv03
         /dev/u04back     /u04back         jfs2   Jun 22 00:22 rw,log=/dev/loglv03
         /dev/u03back     /u03back         jfs2   Jun 22 00:22 rw,log=/dev/loglv03
         /dev/u04         /u04             jfs2   Jun 22 10:25 rw,log=/dev/loglv01



Example /etc/filesystems file:

/var:
        dev             = /dev/hd9var
        vfs             = jfs2
        log             = /dev/hd8
        mount           = automatic
        check           = false
        type            = bootfs
        vol             = /var
        free            = false

/tmp:
        dev             = /dev/hd3
        vfs             = jfs2
        log             = /dev/hd8
        mount           = automatic
        check           = false
        vol             = /tmp
        free            = false


/opt:
        dev             = /dev/hd10opt
        vfs             = jfs2
        log             = /dev/hd8
        mount           = true
        check           = true
        vol             = /opt
        free            = false
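
The stanza layout is easy to query with awk, for instance to see which file systems "mount all"
would pick up (mount = true) as opposed to those marked automatic. This is a sketch under
assumptions: stanzas_with_mount is a hypothetical helper, lines starting with "*" are treated as
comments, and the demo reads a scratch copy rather than the real /etc/filesystems.

```shell
#!/bin/sh
# stanzas_with_mount FILE VALUE: print the stanza names (mount points) whose
# "mount" attribute equals VALUE, for example "true" or "automatic".
stanzas_with_mount() {
    awk -v want="$2" '
        /^[ \t]*\*/ { next }                  # comment lines start with *
        /^[^ \t].*:[ \t]*$/ {                 # stanza header such as "/var:"
            name = $1
            sub(/:$/, "", name)
            next
        }
        $1 == "mount" && $2 == "=" && $3 == want { print name }
    ' "$1"
}

tmp=$(mktemp)
cat > "$tmp" <<'EOF'
/var:
        dev             = /dev/hd9var
        vfs             = jfs2
        mount           = automatic

/opt:
        dev             = /dev/hd10opt
        vfs             = jfs2
        mount           = true
EOF
stanzas_with_mount "$tmp" true        # prints /opt
stanzas_with_mount "$tmp" automatic   # prints /var
rm -f "$tmp"
```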

Example of the relation of Logical Volumes and mountpoints:

/dev/lv01 = /u01
/dev/lv02 = /u02
/dev/lv03 = /u03
/dev/lv04 = /data
/dev/lv00 = /spl




2.2.5 Some other commands related to mounts:
===========================================

fsstat command:
---------------

On some unixes, the fsstat command is available. It provides filesystem statistics.
It can take a lot of switches, thus be sure to check the man pages.

On Solaris, the following example shows the statistics for each file operation for "/" (using the -f option):

$ fsstat -f /
Mountpoint: /
 operation  #ops  bytes
      open 8.54K
     close  9.8K
      read 43.6K  65.9M
     write 1.57K  2.99M
     ioctl 2.06K
     setfl     4
   getattr 40.3K
   setattr    38
    access 9.19K
    lookup  203K
    create   595
    remove    56
      link     0
    rename     9
     mkdir    19
     rmdir     0
   readdir 2.02K  2.27M
   symlink     4
  readlink 8.31K
     fsync   199
  inactive 2.96K
       fid     0
    rwlock 47.2K
  rwunlock 47.2K
      seek 29.1K
       cmp 42.9K
    frlock 4.45K
     space     8
    realvp 3.25K
   getpage  104K
   putpage 2.69K
       map 13.2K
    addmap 34.4K
    delmap 33.4K
      poll   287
      dump     0
  pathconf    54
    pageio     0
   dumpctl     0
   dispose 23.8K
getsecattr   697
setsecattr     0
   shrlock     0
   vnevent     0



fuser command:
--------------

AIX:

Purpose
Identifies processes using a file or file structure. 

Syntax
fuser [ -c | -d | -f ] [ -k ] [ -u ] [ -x ] [ -V ] File ... 


Description
The fuser command lists the process numbers of local processes that use the local or remote files 
specified by the File parameter. For block special devices, the command lists the processes that use 
any file on that device.


Flags

-c Reports on any open files in the file system containing File. 
-d Implies the use of the -c and -x flags. Reports on any open files which have been unlinked from the file system 
   (deleted from the parent directory). When used in conjunction with the -V flag, it also reports the inode number 
   and size of the deleted file.  
-f Reports on open instances of File only. 
-k Sends the SIGKILL signal to each local process. Only the root user can kill a process of another user.  
-u Provides the login name for local processes in parentheses after the process number. 
-V Provides verbose output. 
-x Used in conjunction with -c or -f, reports on executable and loadable objects in addition to the standard fuser output. 


To list the process numbers of local processes using the /etc/passwd file, enter: 
# fuser /etc/passwd

To list the process numbers and user login names of processes using the /etc/filesystems file, enter: 
# fuser -u /etc/filesystems

To terminate all of the processes using a given file system, enter: 
# fuser -k -x -u /dev/hd1
or
# fuser -kxuc /home

Either command lists the process number and user name, and then terminates each process that is using 
the /dev/hd1 (/home) file system. Only the root user can terminate processes that belong to another user. 
You might want to use this command if you are trying to unmount the /dev/hd1 file system and a process 
that is accessing the /dev/hd1 file system prevents this.

To list all processes that are using a file which has been deleted from a given file system, enter: 
# fuser -d /usr



Examples on Linux distros:

- To kill all processes accessing the file system /home in any way.
# fuser  -km /home 

- invokes something if no other process is using /dev/ttyS1.       
if fuser -s /dev/ttyS1; then :; else something; fi 

- shows all processes at the (local) TELNET port.       
# fuser telnet/tcp 

A similar command is the lsof command.


2.2.6 Starting and stopping NFS:
================================

Short note on stopping and starting NFS. See other sections for more detail.

On all unixes, a number of daemons should be running in order for NFS to be functional, like for example
the rpc.* processes, biod, nfsd and others.

Once nfs is running, and in order to actually "share" or "export" your filesystem on your server, so remote clients 
are able to mount the nfs mount, in most cases you should edit the "/etc/exports" file.
See other sections in this document (search on exportfs) on how to accomplish this.

-- AIX:

The following subsystems are part of the nfs group: nfsd, biod, rpc.lockd, rpc.statd, and rpc.mountd. 
The nfs subsystem (group) is under control of the "resource controller", so starting and stopping nfs
is actually easy

# startsrc -g nfs
# stopsrc -g nfs

Or use smitty.


-- Redhat Linux:
# /sbin/service nfs restart
# /sbin/service nfs start
# /sbin/service nfs stop

-- On some other Linux distros
# /etc/init.d/nfs start 
# /etc/init.d/nfs stop
# /etc/init.d/nfs restart


-- Solaris:
If the nfs daemons aren't running, then you will need to run:
# /etc/init.d/nfs.server start 


-- HP-UX:
Issue the following command on the NFS server to start all the necessary NFS processes (HP): 
# /sbin/init.d/nfs.server start 
 
Or if your machine is only a client:

# cd /sbin/init.d
# ./nfs.client start




===========================================
3. Change ownership file/dir, adding users:
===========================================

3.1 Changing ownership:
-----------------------

chown -R user[:group] file/dir        (SVR4)
chown -R user[.group] file/dir        (bsd)

(-R recursive dirs)

Examples:
chown -R oracle:oinstall /opt/u01
chown -R oracle:oinstall /opt/u02
chown -R oracle:oinstall /opt/u03
chown -R oracle:oinstall /opt/u04

-R means all subdirs also.

chown rjanssen file.txt             - Make user rjanssen the owner of file.txt. 


#  groupadd dba
#  useradd oracle
#  mkdir /usr/oracle
#  mkdir /usr/oracle/9.0
#  chown -R oracle:dba /usr/oracle
#  touch /etc/oratab
#  chown oracle:dba /etc/oratab


Note: Not owner message:
------------------------

>>> Solaris:

It is possible to turn the chown command on or off (i.e., allow it to be used or disallow its use) on a system by 
altering the /etc/system file. The /etc/system file, along with the files in /etc/default, should be thought of as 
"system policy files" -- files that allow the systems administrator to determine such things as whether 
root can login over the network, whether su commands are logged, and whether a regular user can change ownership of his own files. 

On a system disallowing a user to change ownership of his files (this is now the default), the value of rstchown is set to 1. 
Think of this as saying "restrict chown is set to TRUE". You might see a line like this in /etc/system (or no rstchown value at all): 

set rstchown=1 

On a system allowing chown by regular users, this value will be set to 0 as shown here: 

set rstchown=0 

Whenever the /etc/system file is changed, the system will have to be rebooted for the changes to take effect. 
Since there is no daemon process associated with commands such as chown, there is no process that one could send 
a hangup (HUP) to effect the change in policy "on the fly". 

Why might system administrators restrict access to the chown command? For a system on which disk quotas are enforced,
 they might not want to allow files to be "assigned" by one user to another user's quota. More importantly, 
for a system on which accountability is deemed important, system administrators will want to know who 
created each file on a system - whether to track down a potential system abuse or simply to ask if a file that is 
occupying space in a shared directory or in /tmp can be removed. 

When a system disallows use of the chown command, you can expect to see dialog like this: 

% chown wallace myfile
chown: myfile: Not owner 

Though it would be possible to disallow "chowning" of files by changing permissions on /usr/bin/chown, 
such a change would not slow down most Unix users. They would simply copy the /usr/bin/chown file to their own directory 
and make their copy executable. Designed to be extensible, Unix will happily comply. Making the change in the /etc/system 
file blocks any chown operation from taking effect, regardless of where the executable is stored, who owns it, 
and what it is called. If usage of chown is restricted in /etc/system, only the superuser can change ownership of files. 


 


3.2 Add a user in Solaris:
--------------------------

Examples:

# useradd -u 3000 -g other -d /export/home/tempusr -m -s /bin/ksh -c "temporary user" tempusr
# useradd -u 1002 -g dba -d /export/home/avdsel -m -s /bin/ksh -c "Albert van der Sel" avdsel
# useradd -u 1001 -g oinstall -G dba -d /export/home/oraclown -m -s /bin/ksh -c "Oracle owner" oraclown
# useradd -u 1005 -g oinstall -G dba -d /export/home/brighta -m -s /bin/ksh -c "Bright Alley" brighta

# useradd -u 300 -g staff -G staff -d /home/emc -m -s /usr/bin/ksh -c "EMC user" emc

A password cannot be specified using the useradd command. 
Use passwd to give the user a password:

# passwd tempusr

UID must be unique and is typically a number between 100 and 60002
GID is a number between 0 and 60002
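
Because the UID must be unique, it can help to check for clashes before picking a number for "useradd -u".
The following is only a sketch: duplicate_uids is a hypothetical helper name, and the sample file is
self-made data in passwd format, not taken from a real system.

```shell
# Print every UID that occurs more than once in a passwd-format file.
# duplicate_uids is a hypothetical helper name, not a standard command.
duplicate_uids() {
    awk -F: '{ print $3 }' "$1" | sort -n | uniq -d
}

# Self-made sample in passwd format; tempusr and clash share UID 3000.
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:Super-User:/:/sbin/sh
tempusr:x:3000:1:temporary user:/export/home/tempusr:/bin/ksh
avdsel:x:1002:100:Albert van der Sel:/export/home/avdsel:/bin/ksh
clash:x:3000:1:duplicate uid:/export/home/clash:/bin/ksh
EOF

duplicate_uids "$sample"
```

On a live system you would run "duplicate_uids /etc/passwd"; no output means no UID is used twice.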

Or use the graphical "admintool" or smc, the solaris management console.


-- Profiles a user can use to set the environment:

1. Korn Shell ksh:
------------------

When the POSIX or Korn Shell is your login shell, it looks for the following files and executes them, if they exist:

/etc/profile
This default system file is executed by the shell program and sets up default environment variables.

.profile
If this file exists in your home directory, it is executed next at login.

Any time the POSIX or Korn Shell is invoked (this includes login time), it looks for the file referenced by the following shell variable, 
and executes it, if it exists:

ENV
When you invoke the shell, it looks for a shell variable called ENV which is usually set in your .profile. ENV is evaluated and if it is set 
to an existing file, that file is executed. By convention, ENV is usually set to .kshrc but may be set to any file name.

These files provide the means for customizing the shell environment to fit your needs.


2. Bourne Shell sh:
-------------------

When the Bourne shell is your login shell, it looks for the following files and executes them, if they exist:

/etc/profile

.profile in the home directory, for example "/home/user1/.profile"


3.3 Add a user in AIX:
----------------------

You can also use the useradd command, just as in Solaris.
Or use the native "mkuser" command.

# mkuser albert

The mkuser command does not create password information for a user. It initializes the password field 
with an * (asterisk). Later, this field is set with the passwd or pwdadm command. 
New accounts are disabled until the passwd or pwdadm commands are used to add authentication 
information to the /etc/security/passwd file.

You can use the Users application in Web-based System Manager to change user characteristics. You could also 
use the System Management Interface Tool (SMIT) "smit mkuser" fast path to run this command.

The /usr/lib/security/mkuser.default file contains the default attributes for new users. 
This file is an ASCII file that contains user stanzas. These stanzas have attribute default values 
for users created by the mkuser command. Each attribute has the Attribute=Value form. If an attribute 
has a value of $USER, the mkuser command substitutes the name of the user. The end of each attribute pair 
and stanza is marked by a new-line character.

There are two stanzas, user and admin, that can contain all defined attributes except the id and admin attributes. 
The mkuser command generates a unique id attribute. The admin attribute depends on whether the -a flag is used with 
the mkuser command.

A typical user stanza looks like the following:

user:
   pgroup = staff
   groups = staff
   shell = /usr/bin/ksh
   home = /home/$USER
   auth1 = SYSTEM

# mkuser [ -de | -sr ] [-attr Attributes=Value [ Attribute=Value... ] ] Name
# mkuser [ -R load_module ] [ -a ] [ Attribute=Value ... ] Name



To create the davis user account with the default values in the /usr/lib/security/mkuser.default file, type: 
# mkuser davis

To create the davis account with davis as an administrator, type: 
# mkuser -a davis

Only the root user or users with the UserAdmin authorization can create davis as an administrative user.

To create the davis user account and set the su attribute to a value of false, type: 
# mkuser su=false davis

To create the davis user account that is identified and authenticated through the LDAP load module, type: 
# mkuser -R LDAP davis


To add davis to the groups finance and accounting, enter: 
chuser groups=finance,accounting davis 

-- Add a user with the smit utility:
-- ---------------------------------
Start SMIT by entering

smit <Enter>

  From the Main Menu, make the following selections:

  -Security and Users 
    -Users 
      -Add a User to the System

The utility displays a form for adding new user information. Use the <Up-arrow> and <Down-arrow> keys to move through 
the form. Do not use <Enter> until you are finished and ready to exit the screen.
Fill in the appropriate fields of the Create User form (as listed in Create User Form) and press <Enter>.
The utility exits the form and creates the new user.


-- Using SMIT to Create a Group:
-- -----------------------------
Use the following procedure to create a group.

Start SMIT by entering the following command:

smit <Enter>

The utility displays the Main Menu.

  From the Main Menu, make the following selections:

  -Security and Users 
    -Groups 
      -Add a Group to the System

The utility displays a form for adding new group information. 
Type the group name in the Group Name field and press <Enter>.
The group name must be eight characters or fewer.
The utility creates the new group, automatically assigns the next available GID, and exits the form.

Primary Authentication method of system:
----------------------------------------

To check whether root has a primary authentication method of SYSTEM, use the following command:
# lsuser -a auth1 root

If needed, change the value by using
# chuser auth1=SYSTEM root


3.4 Add a user in HP-UX:
------------------------

-- Example 1:

Add user john to the system with all of the default attributes.

# useradd john

Add the user john to the system with a UID of 222 and a primary group
of staff.

# useradd -u 222 -g staff john

-- Example 2:

=> Add a user called guestuser as per following requirements
=> Primary group member of guests 
=> Secondary group member of www and accounting
=> Shell must be /usr/bin/bash3
=> Home directory must be /home/guestuser

# useradd -g guests -G www,accounting -d /home/guestuser -s /usr/bin/bash3 -m guestuser
# passwd guestuser



3.5 Add a user in Linux Redhat:
-------------------------------

You can use tools like useradd or groupadd to create new users and groups from the shell prompt. 
But an easier way to manage users and groups is through the graphical application, User Manager. 

Users are described in the /etc/passwd file
Groups are stored on Red Hat Linux in the /etc/group file. 

Or invoke the Gnome Linuxconf GUI Tool by typing "linuxconf". In Red Hat Linux, linuxconf is found in the 
/bin directory.






================================
4. Change filemode, permissions:
================================

Permissions are given to:
u = user
g = group
o = other/world
a = all

file/directory permissions (or also called "filemodes") are:
r = read
w = write
x = execute

special modes are:
X = set execute only if the file is a directory or already has execute set for some user (particularly handy, see below)
s = set setuid/setgid bit
t = set sticky bit



Examples:
---------

readable by all, everyone
% chmod a+r essay.001

to remove read write and execute permissions on the file biglist for the group and others
% chmod go-rwx biglist 

make executable:
% chmod +x mycommand

set mode:
% chmod 644 filename

    rwxrwxrwx=777
    rw-rw-rw-=666
    rw-r--r--=644 corresponds to umask 022
    r-xr-xr-x=555
    rwxrwxr-x=775

1 = execute
2 = write
4 = read 

note that the total is 7
execute and read are: 1+4=5
read and write are: 2+4=6
read, write and exec: 1+2+4=7
and so on 

directories must be executable (searchable) for anyone who needs to enter them or reach the files inside them... 

so a file with, say 640, means, the owner can read and write (4+2=6), the group can read (4) 
and everyone else has no permission to use the file (0). 
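
The arithmetic above can be sketched as a small shell function. mode_to_octal is a hypothetical helper name;
it handles only the nine plain rwx characters, not the setuid, setgid or sticky bits.

```shell
# Convert a 9-character permission string such as rw-r--r-- to octal.
# mode_to_octal is a hypothetical helper name, not a standard command.
mode_to_octal() {
    perms=$1
    result=""
    # Walk over the user, group and other triplets (columns 1-3, 4-6, 7-9).
    for start in 1 4 7; do
        triplet=$(printf '%s' "$perms" | cut -c"$start"-$((start + 2)))
        digit=0
        case $triplet in r??) digit=$((digit + 4)) ;; esac   # read    = 4
        case $triplet in ?w?) digit=$((digit + 2)) ;; esac   # write   = 2
        case $triplet in ??x) digit=$((digit + 1)) ;; esac   # execute = 1
        result="$result$digit"
    done
    printf '%s\n' "$result"
}

mode_to_octal rwxr-xr-x    # prints 755
mode_to_octal rw-r-----    # prints 640
```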

chmod -R a+X .
This command would set the executable bit (for all users) of all directories and executables 
below the current directory that presently have an execute bit set. Very helpful when you want to set 
all your binary files executable for everyone other than you without having to set the executable bit 
of all your conf files, for instance. *wink* 
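
A small sketch of what "a+X" does, run in a scratch directory. The file names are made up; show_mode is
a hypothetical helper that prints only the ten mode characters of "ls -ld".

```shell
demo=$(mktemp -d)
mkdir "$demo/tree" "$demo/tree/subdir"
touch "$demo/tree/app.conf" "$demo/tree/run.sh"
chmod 700 "$demo/tree" "$demo/tree/subdir" "$demo/tree/run.sh"
chmod 600 "$demo/tree/app.conf"

# X adds execute only to directories and to files that already
# have an execute bit set for somebody.
chmod -R a+X "$demo/tree"

# Hypothetical helper: print only the mode characters of an entry.
show_mode() { ls -ld "$1" | cut -c1-10; }

show_mode "$demo/tree/subdir"      # drwx--x--x
show_mode "$demo/tree/run.sh"      # -rwx--x--x
show_mode "$demo/tree/app.conf"    # -rw-------  (untouched)
```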

chmod -R g+w .
This command would set all the contents below the current directory writable by your current group. 

chmod -R go-rwx .
This command would remove permissions for group and world users without changing the bits for the file owner. 
Now you don't have to worry that 'find . -type f -exec chmod 600 {} \;' will change your binary files 
non-executable. Further, you don't need to run an additional command to chmod your directories. 

chmod u+s /usr/bin/run_me_setuid
This command would set the setuid bit of the file. It's simply easier than remembering which number to use 
when wanting to setuid/setgid, IMHO. 




========================
5. About the sticky bit:
========================


- This info is valid for most Unix OS including Solaris and AIX:
----------------------------------------------------------------

A 't' or 'T' as the last character of the "ls -l" mode characters
indicates that the "sticky" (save text image) bit is set. See ls(1) for
an explanation of the distinction between 't' and 'T'.

The sticky bit has a different meaning, depending on the type of file it
is set on...

sticky bit on directories
-------------------------
[From chmod(2)]
If the mode bit S_ISVTX (sticky bit) is set on a directory, files
inside the directory may be renamed or removed only by the owner of
the file, the owner of the directory, or the superuser (even if the
modes of the directory would otherwise allow such an operation).

[Example]
drwxrwxrwt  104 bin        bin          14336 Jun  7 00:59 /tmp

On a directory, the owner of the directory (or root) can turn the sticky bit on or off; on some systems, such as 
Solaris, only root can set it on a regular file. The sticky bit applies to anyone who accesses the directory. 
The syntax for setting the sticky bit on a directory /foo is as follows: 

chmod +t /foo 


sticky bit on regular files
---------------------------
[From chmod(2)]
If an executable file is prepared for sharing, mode bit S_ISVTX prevents
the system from abandoning the swap-space image of the program-text
portion of the file when its last user terminates.  Then, when the next
user of the file executes it, the text need not be read from the file
system but can simply be swapped in, thus saving time.

[From HP-UX Kernel Tuning and Performance Guide]
Local paging. When applications are located remotely, set the "sticky bit"
on the applications binaries, using the chmod +t command. This tells the
system to page the text to the local disk. Otherwise, it is "retrieved"
across the network. Of course, this would only apply when there is actual
paging occurring. More recently, there is a kernel parameter,
page_text_to_local, which when set to 1, will tell the kernel to page all
NFS executable text pages to local swap space.

[Example]
-r-xr-xr-t   6 bin        bin         24111111111664 Nov 14  2000 /usr/bin/vi


Solaris:
--------

The sticky bit on a directory is a permission bit that protects files within that directory. 
If the directory has the sticky bit set, only the owner of the file, the owner of the directory, 
or root can delete the file. The sticky bit prevents a user from deleting other users' files from 
public directories, such as uucppublic:

castle% ls -l /var/spool/uucppublic
drwxrwxrwt   2 uucp     uucp         512 Sep 10 18:06 uucppublic
castle%

When you set up a public directory on a TMPFS temporary file system, make sure that you set the sticky bit manually. 

You can set sticky bit permissions by using the chmod command to assign the octal value 1 as the first number 
in a series of four octal values. Use the following steps to set the sticky bit on a directory:

1.  If you are not the owner of the file or directory, become superuser. 
2.  Type chmod <1nnn> <filename> and press Return. 
3.  Type ls -l <filename> and press Return to verify that the permissions of the file have changed. 
The following example sets the sticky bit permission on the pubdir directory:

castle% chmod 1777 pubdir
castle% ls -l pubdir
drwxrwxrwt   2 winsor    staff    512 Jul 15 21:23 pubdir
castle%
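
As a sketch, the same can be tried in a scratch directory. It also shows the difference between 't' and 'T'
in the ls output: lowercase when the execute bit for others is set as well, uppercase when it is not.
The directory names are made up.

```shell
demo=$(mktemp -d)
mkdir "$demo/pubdir" "$demo/private"

chmod 1777 "$demo/pubdir"     # sticky bit plus rwx for everyone
chmod 1770 "$demo/private"    # sticky bit, but no execute bit for others

ls -ld "$demo/pubdir"  | cut -c1-10    # drwxrwxrwt
ls -ld "$demo/private" | cut -c1-10    # drwxrwx--T
```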



================
6. About SETUID:
================

Each process has three user ID's: 
the real user ID (ruid)
the effective user ID (euid) and
the saved user ID (suid)

The real user ID identifies the owner of the process, the effective uid is used in most
access control decisions, and the saved uid stores a previous user ID so that it
can be restored later.
Similarly, a process has three group ID's.

When a process is created by fork, it inherits the three uid's from the parent process.
When a process executes a new file by exec..., it keeps its three uid's unless the
set-user-ID bit of the new file is set, in which case the effective uid and saved uid
are assigned the user ID of the owner of the new file.
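
A quick way to look at two of these IDs for the current shell is the id command. This is only a sketch;
the saved uid cannot be shown with id, and the variable names are our own.

```shell
real_uid=$(id -u -r)       # real user ID
effective_uid=$(id -u)     # effective user ID

echo "ruid=$real_uid euid=$effective_uid"
if [ "$real_uid" = "$effective_uid" ]; then
    echo "real and effective UID are the same for this shell"
else
    echo "effective UID differs from real UID"
fi
```

In an ordinary login shell both numbers are the same; they differ only inside a process started from a setuid file.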


When setuid (set-user identification) permission is set on an executable file, a process that runs this file 
is granted access based on the owner of the file (usually root), rather than the user who created the process. 
This permission enables a user to access files and directories that are normally available only to the owner.

The setuid permission is shown as an s in the file permissions. 
For example, the setuid permission on the passwd command enables a user to change passwords 
by assuming the permissions of the root ID:

castle% ls -l /usr/bin/passwd
-r-sr-sr-x   3 root     sys        96796 Jul 15 21:23 /usr/bin/passwd
castle%

You set setuid permissions by using the chmod command to assign the octal value 4 as the first number 
in a series of four octal values. Use the following steps to set setuid permissions:

1.  If you are not the owner of the file or directory, become superuser. 
2.  Type chmod <4nnn> <filename> and press Return. 
3.  Type ls -l <filename> and press Return to verify that the permissions of the file have changed. 

The following example sets setuid permission on the myprog file:

#chmod 4555 myprog
#ls -l myprog
-r-sr-xr-x   1 winsor    staff    12796 Jul 15 21:23 myprog
#
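
The octal and the symbolic form can be compared in a scratch directory. A sketch only; the file is an
empty placeholder, not a real program, and the variable names are our own.

```shell
demo=$(mktemp -d)
touch "$demo/myprog"

chmod 4555 "$demo/myprog"                     # octal: setuid + r-xr-xr-x
with_suid=$(ls -l "$demo/myprog" | cut -c1-10)
echo "$with_suid"                             # -r-sr-xr-x

chmod u-s "$demo/myprog"                      # symbolic: drop setuid again
without_suid=$(ls -l "$demo/myprog" | cut -c1-10)
echo "$without_suid"                          # -r-xr-xr-x
```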


The setgid (set-group identification) permission is similar to setuid, except that the effective group ID 
for the process is changed to the group owner of the file and a user is granted access based on permissions 
granted to that group. The /usr/bin/mail program has setgid permissions:

castle% ls -l /usr/bin/mail
-r-x--s--x   1 bin      mail       64376 Jul 15 21:27 /usr/bin/mail
castle%

When setgid permission is applied to a directory, files subsequently created in the directory belong to the group 
the directory belongs to, not to the group the creating process belongs to. Any user who has write permission 
in the directory can create a file there; however, the file does not belong to the group of the user, 
but instead belongs to the group of the directory.

You can set setgid permissions by using the chmod command to assign the octal value 2 as the first number 
in a series of four octal values. Use the following steps to set setgid permissions:

1.  If you are not the owner of the file or directory, become superuser. 
2.  Type chmod <2nnn> <filename> and press Return. 
3.  Type ls -l <filename> and press Return to verify that the permissions of the file have changed. 
The following example sets setgid permission on the myprog2 file:

#chmod 2551 myprog2
#ls -l myprog2
-r-xr-s--x   1 winsor    staff  26876 Jul 15 21:23 myprog2
#


=========================
7. Find command examples:
=========================

Introduction 
The find command allows the Unix user to process a set of files and/or directories in a file subtree. 

You can specify the following: 

where to search (pathname) 
what type of file to search for (-type: directories, data files, links) 
how to process the files        (-exec: run a process against a selected file) 
the name of the file(s)         (-name) 
perform logical operations on selections (-o and -a) 
Search for file with a specific name in a set of files (-name) 


EXAMPLES
--------

# find . -name "rc.conf" -print 

This command will search in the current directory and all sub directories for a file named rc.conf. 
Note: The -print option will print out the path of any file that is found with that name. In general -print will 
print out the path of any file that meets the find criteria. 

# find . -name "rc.conf" -exec chmod o+r '{}' \; 

This command will search in the current directory and all sub directories. All files named rc.conf will be processed 
by the chmod o+r command. The argument '{}' inserts each found file into the chmod command line. 
The \; argument indicates the exec command line has ended. 
The end result of this command is that all rc.conf files have the other permissions set to read access
(if the operator is the owner of the file). 
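
The effect can be tried out safely in a scratch tree. This is a sketch with made-up file names; only the
files named rc.conf should gain read access for others.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
touch "$demo/rc.conf" "$demo/a/b/rc.conf" "$demo/a/other.conf"
chmod 600 "$demo/rc.conf" "$demo/a/b/rc.conf" "$demo/a/other.conf"

# One chmod process is started per file found.
find "$demo" -name "rc.conf" -exec chmod o+r '{}' \;

ls -l "$demo/rc.conf"       | cut -c1-10    # -rw----r--
ls -l "$demo/a/b/rc.conf"   | cut -c1-10    # -rw----r--
ls -l "$demo/a/other.conf"  | cut -c1-10    # -rw-------
```

Ending the -exec clause with "+" instead of "\;" passes many file names to a single chmod, which is faster on large trees.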


How to find text in a set of files:
-----------------------------------

# find . -exec grep "www.athabasca" '{}' \; -print 

This command will search in the current directory and all sub directories. 
All files that contain the string will have their path printed to standard output. 

# find .  -exec grep "CI_ADJ_TYPE" {} \; -print

This command searches all files in all subdirectories for the text CI_ADJ_TYPE.


How to find files of certain size:
----------------------------------

# find / -xdev -size +2048 -ls | sort -r +6 
# find . -xdev -size +2048 -ls | sort -r +6 

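These commands find all files larger than 1 MB (2048 blocks of 512 bytes): the first one searches the root 
file system, the second one the current directory. The -xdev option keeps find from crossing into other file systems.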
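
A sketch of the block arithmetic behind -size, using two made-up files in a scratch directory. Without a
suffix, the number given to -size counts 512-byte blocks.

```shell
demo=$(mktemp -d)
# A 2 MB file (4096 blocks of 512 bytes) and a 10 KB file (20 blocks).
dd if=/dev/zero of="$demo/big.dat"   bs=1024 count=2048 2>/dev/null
dd if=/dev/zero of="$demo/small.dat" bs=1024 count=10   2>/dev/null

blocks=2048
echo "threshold: $((blocks * 512)) bytes"    # 1048576 bytes = 1 MB

# Only files larger than 2048 blocks are listed.
large=$(find "$demo" -type f -size +"$blocks" -print)
echo "$large"
```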


How to find files between dates:
--------------------------------

thread 1:
---------

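olddate="200407010001"
newdate="200407312359"
touch -t $olddate ./tmpoldfile
touch -t $newdate ./tmpnewfile
find /path/to/directory -type f -newer ./tmpoldfile ! -newer ./tmpnewfile

Plain "-newer" compares modification times. Some find versions also accept a suffixed form (for example 
"-neweram" in GNU find), which compares the access time of each file against the modification time of the reference file.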


thread 2:
---------

Touch 2 files, start_date and stop_date, like this:
$ touch -t 200603290000.00 start_date
$ touch -t 200603290030.00 stop_date

Ok, start_date is 03/29/06 midnight, stop_date is 03/29/06 30 minutes after midnight. You might want to do a ls -al to check.

On to find, you can find -newer and then ! -newer, like this:
$ find /dir -newer start_date ! -newer stop_date -print

Combine that with ls -l, you get:
$ find /dir -newer start_date ! -newer stop_date -print0 | xargs -0 ls -l

(Or you can try -exec to execute ls -l. I am not sure of the format, so you have to muck around a little bit)
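
A sketch of the whole procedure in scratch directories, including the -exec form of "ls -l". The three
test file names are made up. The reference files are kept outside the searched tree, because otherwise
stop_date itself would match (it is newer than start_date and not newer than itself).

```shell
refs=$(mktemp -d)    # reference files live outside the searched tree
tree=$(mktemp -d)

touch -t 200603290000.00 "$refs/start_date"
touch -t 200603290030.00 "$refs/stop_date"

touch -t 200603282300.00 "$tree/too_old"     # before the window
touch -t 200603290015.00 "$tree/in_range"    # inside the window
touch -t 200603290100.00 "$tree/too_new"     # after the window

matches=$(find "$tree" -type f -newer "$refs/start_date" ! -newer "$refs/stop_date" -print)
echo "$matches"

# The same selection with a long listing through -exec:
find "$tree" -type f -newer "$refs/start_date" ! -newer "$refs/stop_date" -exec ls -l {} \;
```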


thread 3:
---------

ls -lrtR | awk '{print $6$7"\t"$9}'|grep Nov


thread 4:
---------

1) between 2 dates (say 15 Aug 08 to 31 Aug 08)

touch -t 200808150000 /tmp/start
touch -t 200808310000 /tmp/finish
find / -size +10k -newer /tmp/start -a ! -newer /tmp/finish


2) later than a specified date (say 25 Aug 08)

touch -t 200808250000 /tmp/ref
find / -size +10k -newer /tmp/ref



Other examples:
---------------
# find . -name file -print
# find / -name $1 -exec ls -l {} \;

# find / -user nep -exec ls -l {} \; >nepfiles.txt
In English: search from the root directory for any files owned by nep 
and execute an ls -l on the file when any are found. 
Capture all output in nepfiles.txt.

In order to protect the asterisk from being expanded by the shell, 
it is necessary to use a backslash to escape the asterisk, as in:
# find $HOME -name \*.txt -print

# find / -atime +30 -print
This prints files that have not been accessed in the last 30 days

# find / -atime +100 -size +500000c -print
The find search criteria can be combined. This command will locate and list all files 
that were last accessed more than 100 days ago, and whose size exceeds 500,000 bytes.

# find /opt/bene/process/logs -name 'ALBRACHT*'  -mtime +90 -exec rm {} \;

# find /example /new/example -exec grep -l 'Where are you' {} \;
# find / \( -name a.out -o -name '*.o' \) -atime +7 -exec rm {} \;
# find . -name '*.trc' -mtime +3 -exec rm {} \;
# find / -fsonly hfs -print
# cd /; find . ! -path ./Disk -only -print | cpio -pdxm /Disk
# cd /; find . -path ./Disk -prune -o -print | cpio -pdxm /Disk
# cd /;  find . -xdev -print | cpio -pdm /Disk
# find . -type f -print | xargs chmod 444
# find . -type d -print | xargs chmod 555
# find . -atime +1 -name '*' -exec rm -f {} \; 
# find /tmp -atime +1 -name '*' -exec rm -f {} \; 
# find /usr/tmp -atime +1 -name '*' -exec rm -f {} \; 
# find / -name core -exec rm -f {} \; 
# find . -name "*.dbf" -mtime -2 -exec ls {} \;


* Search and list all files from current directory and down for the string ABC:
find ./ -name "*" -exec grep -H ABC {} \;
find ./ -type f -print | xargs grep -H "ABC" /dev/null
egrep -r ABC *
* Find all files of a given type from current directory on down:
find ./ -name "*.conf" -print
* Find all user files larger than 5Mb:
find /home -size +5000000c -print
* Find all files owned by a user (defined by user id number. see /etc/passwd) on the system: (could take a very long time)
find / -user 501 -print
* Find all files created or updated in the last five minutes: (Great for finding effects of make install)
find / -cmin -5
* Find all files owned by group 20 and change them to group 102: (execute as root)
find / -group 20 -exec chown :102 {} \;
* Find all suid and setgid executables:
find / \( -perm -4000 -o -perm -2000 \) -type f -exec ls -ldb {} \;
find / -type f -perm +6000 -ls        (older GNU find; newer versions use -perm /6000 instead of +6000)


Example:
--------

cd /database/oradata/pegacc/archive
archdir=`pwd`
if [ "$archdir" = "/database/oradata/pegacc/archive" ]
   then
      find . -name "*.dbf" -mtime +5 -exec rm {} \;
   else
      echo "error in onderhoud PEGACC archives" >> /opt/app/oracle/admin/log/archmaint.log
fi
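
The point of the check above is that find must never start deleting from an unexpected working directory,
for example when the cd failed. A sketch of the same idea as a reusable function follows; purge_old is a
hypothetical helper name, and the demo files are made up.

```shell
# purge_old DIR PATTERN DAYS
# Remove files under DIR whose name matches PATTERN and that were
# modified more than DAYS days ago. Hypothetical helper name.
purge_old() {
    (
        # Run in a subshell so the caller's working directory is kept,
        # and give up when the directory cannot be entered.
        cd "$1" || exit 1
        find . -name "$2" -mtime +"$3" -exec rm {} \;
    )
}

# Demo in a scratch directory: one old and one fresh archive file.
demo=$(mktemp -d)
touch -t 200001010000 "$demo/old_arch.dbf"
touch "$demo/new_arch.dbf"

purge_old "$demo" '*.dbf' 5
ls "$demo"
```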


Example:
--------

The following example shows how to find files larger than 400 blocks in the current directory:

# find . -size +400 -print


REAL COOL EXAMPLE:
------------------

This example could even help in recovery of a file:

In some rare cases a strangely-named file will show itself in your directory and appear to be 
un-removable with the rm command. Here the use of ls -li and find with its -inum [inode] 
primary will do the job. 
Let's say that ls -l shows your irremovable file as 

-rw-------  1 smith  smith  0 Feb  1 09:22 ?*?*P

Type: 

ls -li

to get the index node, or inode. 

153805 -rw-------  1 smith  smith  0 Feb  1 09:22 ?*?*P

The inode for this file is 153805. Use find -inum [inode] to make sure that the file is correctly identified. 


% find . -inum 153805 -print
./?*?*P

Here, we see that it is. Then use the -exec functionality to do the remove. 
  
% find . -inum 153805 -print -exec /bin/rm {} \;

Note that if this strangely named file were not of zero-length, it might contain accidentally misplaced 
and wanted data. Then you might want to determine what kind of data the file contains and move the file 
to some temporary directory for further investigation, for example: 

% find . -inum 153805 -print -exec /bin/mv {} unknown.file \;

Will rename the file to unknown.file, so you can easily inspect it. 
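
The procedure can be rehearsed in a scratch directory. A sketch only; the odd file name is made up, and the
inode number will of course differ on your system.

```shell
demo=$(mktemp -d)
odd_name='?*?^P odd'
: > "$demo/$odd_name"              # create an empty, strangely named file

# ls -i prints the inode number in the first column.
inode=$(ls -i "$demo/$odd_name" | awk '{ print $1 }')
echo "inode: $inode"

find "$demo" -inum "$inode" -print              # check what would be hit
find "$demo" -inum "$inode" -exec rm {} \;      # then remove it

remaining=$(find "$demo" -type f -print)
echo "files left: ${remaining:-none}"
```

Inode numbers are only unique within one file system, so keep the search path as narrow as possible.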


COOL EXAMPLE: Using find and cpio to create really good backups:
----------------------------------------------------------------

Suppose you have a lot of subdirs and files in "/dir1/dira"
Now you want to copy, or backup, this to "/dir2/dirb"
And not only just the files and subdirs, BUT ALSO all filemodes (permissions), ownership information, acl's etc..

Then DO NOT USE "cp -R" or something similar. Instead use "find" in combination with the "cpio" backup command.

# cd /dir1/dira
# find . | cpio -pvdm /dir2/dirb


Note: difference between mtime and atime:
-----------------------------------------

In using the find command where you want to delete files older than a certain date, you can use
commands like
find . -name "*.log" -mtime +30 -exec rm {} \;   or
find . -name "*.dbf" -atime +30 -exec rm {} \;

Why should you choose, or not choose, between atime and mtime?

It is important to distinguish between a file or directory's change time (ctime), access time (atime), 
and modify time (mtime).

ctime -- In UNIX, it is not possible to tell the actual creation time of a file. The ctime--change time--
         is the time when changes were made to the file or directory's inode (owner, permissions, etc.). 
         The ctime is also updated when the contents of a file change. It is needed by the dump command 
         to determine if the file needs to be backed up. You can view the ctime with the ls -lc command.

atime -- The atime--access time--is the time when the data of a file was last accessed. Displaying the contents 
         of a file or executing a shell script will update a file's atime, for example. 

mtime -- The mtime--modify time--is the time when the actual contents of a file were last modified. 
         This is the time displayed in a long directory listing (ls -l).

That's why backup utilities use the mtime when performing incremental backups:
when the utility reads the data of a file that is to be included in a backup, it does not 
affect the file's modification time, but it does affect the file's access time. 

So for most practical purposes, if you want to delete logfiles (or other files) older than a certain
date, it's best to use the mtime attribute.
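
The difference shows up as soon as the two times of a file are far apart. In this sketch (made-up file name,
scratch directory) only the modification time is set back, so the file is "old" for -mtime but not for -atime.

```shell
demo=$(mktemp -d)
echo "log line" > "$demo/app.log"

# Backdate only the modification time; the access time stays current.
touch -m -t 200001010000 "$demo/app.log"

by_mtime=$(find "$demo" -name "*.log" -mtime +30 -print)
by_atime=$(find "$demo" -name "*.log" -atime +30 -print)

echo "matched by -mtime +30: ${by_mtime:-nothing}"
echo "matched by -atime +30: ${by_atime:-nothing}"
```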

How to make those times visible?

"ls -l"   shows mtime
"ls -lc"  shows ctime
"ls -lu"  shows atime

"istat filename" (AIX) will show all three.

pago-am1:/usr/local/bb>istat bb18b3.tar.gz
Inode 20 on device 10/9 File
Protection: rw-r--r--   
Owner: 100(bb)          Group: 100(bb)
Link count:   1         Length 427247 bytes

Last updated:   Tue Aug 14 11:01:46 2001
Last modified:  Thu Jun 21 07:36:32 2001
Last accessed:  Thu Nov 01 20:38:46 2001





===================
7. Crontab command:
===================

Cron is used to schedule or periodically run all sorts of executable programs or shell scripts,
like backup runs, housekeeping jobs etc..
The cron daemon (cron or crond) makes it all happen.

Who has access to cron is on most unixes determined by the "cron.allow" and "cron.deny" files.
Every allowed user can have its own "crontab" file.
The crontab of root is typically used for system administrative jobs.

On most unixes the relevant files can be found in:
/var/spool/cron/crontabs     or 
/var/adm/cron                or 
/etc/cron.d

For example, on Solaris the /var/adm/cron/cron.allow and /var/adm/cron/cron.deny files control 
which users can use the crontab command. 

Most common usage:

- if you just want a listing:     crontab -l
- if you want to edit and change: crontab -e

crontab [ -e | -l | -r | -v | File ]
 
-e: edit, submit  -r remove, -l list

A crontab file contains entries for each cron job. Entries are separated by newline characters. 
Each crontab file entry contains six fields separated by spaces or tabs in the following form:

 
  minute  hour  day_of_month  month  weekday  command

  0       0     *             8       *       /u/harry/bin/maintenance
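
To see how such an entry breaks down into its six fields, the shell's read builtin can split it the same
way. This is just an illustrative sketch (cron itself does the parsing for real); the variable names
follow the header line above.

```shell
entry='0 0 * 8 * /u/harry/bin/maintenance'

# read splits on spaces and tabs; the last variable receives the
# rest of the line, so a command with arguments stays in one piece.
read -r minute hour day_of_month month weekday command <<EOF
$entry
EOF

printf 'minute=%s hour=%s day_of_month=%s month=%s weekday=%s\n' \
    "$minute" "$hour" "$day_of_month" "$month" "$weekday"
printf 'command=%s\n' "$command"
```

Read this way, the sample entry runs the maintenance command at 00:00 on every day of month 8 (August).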


Notes:
------

Note 1: start and stop cron:
----------------------------

-- Solaris and some other unixes:

The proper way to stop and restart cron are:

# /etc/init.d/cron stop
# /etc/init.d/cron start

In Solaris 10 you could use the following command as well:
# svcadm refresh cron
# svcadm restart cron

-- Other way to restart cron:

In most unixes, cron is started by init and there is a record in the /etc/inittab file
which makes that happen. Check if your system has indeed a record of cron in the inittab file.
The type of start should be "respawn", which means that should the
superuser kill the cron daemon (kill -9 on its process ID), it is simply restarted again.
Again, preferably, there should be a stop and start script to restart cron.

Especially on AIX, there is no neat way to restart cron: no standard method is available, neither via the 
System Resource Controller (startsrc command) nor via a script. Just kill the cron daemon and it will be respawned.

-- On many linux distros:
 
to restart the cron daemon, you could do either a "service crond restart" or a "service 
crond reload". 


Note 2:
-------

Create a cronjobs file
You can do this on your local computer in Notepad or you can create the file directly on 
your Virtual Server using your favorite UNIX text editor (pico, vi, etc). 
Your file should contain the following entries: 

    MAILTO="USER@YOUR-DOMAIN.NAME"
    0 1 1 1-12/3 *   /usr/local/bin/vnukelog


This will run the command "/usr/local/bin/vnukelog" (which clears all of your log files) at 
1 AM on the first day of the first month of every quarter, or January, April, July, and October (1-12/3). 
Obviously, you will need to substitute a valid e-mail address in the place of "USER@YOUR-DOMAIN.NAME". 

If you have created this file on your local computer, 
FTP the file up to your Virtual Server and store it in your home directory under the name 
"cronjobs" (you can actually use any name you would like). 


Register your cronjobs file with the system
After you have created your cronjobs file (and have uploaded it to your Virtual Server if applicable), 
you need to Telnet to your server and register the file with the cron system daemon. To do this, simply type: 
    crontab cronjobs 

Or if you used a name other than "cronjobs", substitute the name you selected for the occurrence of "cronjobs" above. 


Note 3:
-------
# use /bin/sh to run commands, no matter what /etc/passwd says
SHELL=/bin/sh
# mail any output to `paul', no matter whose crontab this is
MAILTO=paul
#
# run at five minutes past every hour from 06:00 through 18:00, every day
5 6-18 * * *       /opt/app/oracle/admin/scripts/grepora.sh
# run at 2:15pm on the first of every month -- output mailed to paul
15 14 1 * *     $HOME/bin/monthly
# run at 10 pm on weekdays, annoy Joe
0 22 * * 1-5   mail -s "It's 10pm" joe%Joe,%%Where are your kids?%
23 0-23/2 * * * echo "run 23 minutes after midn, 2am, 4am ..., everyday"
5 4 * * sun     echo "run at 5 after 4 every sunday"

2>&1 means:

It means that standard error is redirected to the same place as standard output.
Standard error can also be redirected to a different file, like:

ls > toto.txt 2> error.txt

If your shell is csh or tcsh, you redirect standard output and standard error
together like this:

ls >& toto.txt

Csh and tcsh cannot redirect standard error separately.
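
The redirections can be tried out safely in a scratch directory. A minimal sketch for sh/ksh/bash (the function "both" and the file names are made up for this example):

```shell
tmpdir=$(mktemp -d)

# A small function that writes one line to stdout and one line to stderr.
both() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Standard output and standard error go to two different files.
both > "$tmpdir/out.txt" 2> "$tmpdir/err.txt"

# Standard error follows standard output into the same file.
both > "$tmpdir/all.txt" 2>&1

# Order matters: here 2>&1 is processed first, so stderr goes to the pipe
# and only stdout reaches the file.
both 2>&1 > "$tmpdir/only_out.txt" | cat > "$tmpdir/leaked.txt"
```

Redirections are processed from left to right, which is why "2>&1" has to come after "> file" when both streams should end up in that file.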

Note 4:
-------

thread

Q:

> Isn't there a way to refresh cron to pick up changes made using 
> crontab -e? I made the changes but the specified jobs did not run. 
> I'm thinking I need to refresh cron to pick up the changes. Is this 
> true? Thanks. 

A:

Crontab -e should do that for you, that's the whole point of using 
it rather than editing the file yourself. 
Why do you think the job didn't run? 
Post the crontab entry and the script. Give details of the version of 
Tru64 and the patch level. 
Then perhaps we can help you to figure out the real cause of the problem. 
Hope this helps 

A:

I have seen the following problem when editing the cron file for another 
user: 

crontab -e idxxxxxx 

This changed the control file, 
when I verified with crontab -l the contents was correctly shown, 
but the cron daemon did not execute the new contents. 

To solve the problem, I needed to follow the following commands: 

su - idxxxxxx 
crontab -l |crontab 

This seems to work ... since then I prefer the following 

su - idxxxxxx 
crontab -e 

which seems to work also ... 


Note 5:
-------

On AIX it has been observed that if the "daemon=" attribute of a user is set to false,
this user cannot use crontab, even if the account is listed in cron.allow.

You need to set the attribute to "daemon=true".

* daemon        Defines whether the user can execute programs using the system
*               resource controller (SRC). Possible values: true or false.

Note 6:
-------

If you want to quickly test the crontab of a user:

su - user
and put the following in the crontab of that user:

* * * * *  date >>/tmp/elog

After checking the /tmp/elog file, which gets a new date line every minute, don't forget
to remove the crontab entry shown above.


Note 7: the at and atq commands:
--------------------------------

On many unix systems the scheduling commands "at" and "atq" are available.
With "at" you can schedule commands, and with "atq" you can view your own scheduled tasks (or, as root, those of other users).

atq - Display the jobs queued to run at specified times


For example, on Solaris:

The at command is used to schedule jobs for execution at a later time. Unlike crontab, which schedules a job to happen at regular intervals, 
a job submitted with at executes once, at the designated time.

To submit an at job, type at followed by the time that you would like the program to execute. You'll see the at> prompt displayed and it's here 
that you enter the at commands. When you are finished entering the at command, press control-d to exit the at prompt 
and submit the job as shown in the following example:

# at 07:45am today
at> who > /tmp/log
at> <Press Control-d>

job 912687240.a at Thu Jun 6 07:14:00

When you submit an at job, it is assigned a job identification number, which becomes its filename along with the .a extension. 
The file is stored in the /var/spool/cron/atjobs directory. In much the same way as it schedules crontab jobs, 
the cron daemon controls the scheduling of at files.





===========================
8. Job control, background:
===========================

To put a sort job (or other job) in background:
# sort < foo > bar &

To show jobs:
# jobs

To show processes:
# ps
# ps -ef | grep ora

Job in foreground -> background:
Ctrl-Z (suspend)
# bg      or      # bg %jobid

Job in background -> foreground:
# fg %jobid

Stop a process:
# kill -9 3535   (3535 is the pid, process id)

To stop a background process, you may try this:
# kill -QUIT 3421
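
Signal 9 (KILL) cannot be caught, so the process gets no chance to clean up. A gentler pattern is to send the default TERM signal first. A minimal sketch, using sleep as a stand-in for a real background job:

```shell
# Start a stand-in job in the background; $! holds the pid of the last background job.
sleep 30 &
pid=$!

# Ask the process to terminate (TERM is the default signal of kill).
kill -TERM "$pid"

# wait returns the exit status of the job; for a job ended by a signal
# this is 128 + the signal number.
status=0
wait "$pid" || status=$?
echo "job $pid ended with status $status"
```

Only if the process is still there after a TERM (check with "kill -0 pid" or ps) is it time to fall back to kill -9.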



-- Kill all processes of a specific users:
-- --------------------------------------- 

To kill all processes of a specific user, enter: 
# ps -u [user-id] -o pid | grep -v PID | xargs kill -9 

Another way: 
Use who to check out your current users and their terminals. Kill all processes related to a specific terminal:
# fuser -k /dev/pts[#] 

Yet another method: 
Su to the user-id you wish to kill all processes of and enter:
# su - [user-id] -c "kill -9 -1"

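Or su - to that userid, and use the killall command, which is available on most unix'es, like for example AIX.
Be careful: killall behaves differently per platform. On AIX it cancels all processes you started (except the calling process), on Linux it kills processes by name, and on Solaris it terminates all active processes and is meant for use during shutdown.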
# killall


So in order to kill all processes of a user:

# kill -9 -1            # not on all unixes

or

# killall               # not on all unixes



The nohup command:
------------------

When working with the UNIX operating system, there will be times when you will want to run commands that are immune 
to log outs or unplanned login session terminations.  This is especially true for UNIX system administrators.  
The UNIX command for handling this job is the nohup (no hangup) command.

Normally when you log out, or your session terminates unexpectedly, the system sends the SIGHUP signal to the processes 
you have started, which terminates them. Starting a command with nohup counters this by making that command ignore 
the SIGHUP signal.

The syntax for nohup is:
nohup command [arguments]
 
You may optionally add an ampersand to the end of the command line to run the job in the background:
nohup command [arguments] &

If you do not redirect output from a process kicked off with nohup, both standard output (stdout) and 
standard error (stderr) are sent to a file named nohup.out.  This file will be created in $HOME (your home directory) 
if it cannot be created in the working directory.  Real-time monitoring of what is being written to nohup.out 
can be accomplished with the "tail -f nohup.out" command.

Although the nohup command is extremely valuable to UNIX system administrators, it is also a must-know tool 
for others who run lengthy or critical processes on UNIX systems 

The nohup command runs the command specified by the Command parameter and any related Arg parameters, 
ignoring all hangup (SIGHUP) signals. Use the nohup command to run programs in the background after logging off. 
To run a nohup command in the background, add an & (ampersand) to the end of the command.

If the standard output of the nohup command is a terminal, the output is appended to the nohup.out file 
in the current directory. If the nohup.out file is not writable in the current directory, the output is redirected 
to the $HOME/nohup.out file. If neither file can be created nor opened for appending, the command specified 
by the Command parameter is not invoked. If the standard error is a terminal, all output written by the 
named command to its standard error is redirected to the same file descriptor as the standard output.

To run a command in the background after you log off, enter: 
$ nohup find / -print &

After you enter this command, the following is displayed: 
670
$ Sending output to nohup.out
The process ID number changes to that of the background process started by & (ampersand). The message Sending 
output to nohup.out informs you that the output from the find / -print command is in the nohup.out file. 
You can log off after you see these messages, even if the find command is still running. 
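
If you prefer your own log file over nohup.out, redirect the output yourself. A minimal sketch (the sh -c command is only a stand-in for a long-running job, and the log file is a temporary file created for this example):

```shell
log=$(mktemp)

# Redirect stdout and stderr explicitly, so nohup does not create nohup.out.
nohup sh -c 'sleep 1; echo "job finished"' > "$log" 2>&1 &
jobpid=$!

# In a real session you could log off here; in this sketch we simply wait for the job.
wait "$jobpid"
cat "$log"
```

Depending on the nohup implementation, the log may also contain a short notice from nohup itself (for example that input is ignored).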

Example of ps -ef on an AIX5 system:

[LP 1]root@ol16u209:ps -ef
     UID   PID  PPID   C    STIME    TTY  TIME CMD
    root     1     0   0   Oct 17      -  0:00 /etc/init
    root  4198     1   0   Oct 17      -  0:00 /usr/lib/errdemon
    root  5808     1   0   Oct 17      -  1:15 /usr/sbin/syncd 60
  oracle  6880     1   0 10:27:26      -  0:00 ora_lgwr_SPLDEV1
    root  6966     1   0   Oct 17      -  0:00 /usr/ccs/bin/shlap
    root  7942 43364   0   Oct 17      -  0:00 sendmail: accepting connections
 alberts  9036  9864   0 20:41:49      -  0:00 sshd: alberts@pts/0
    root  9864 44426   0 20:40:21      -  0:00 sshd: alberts [priv]
    root 27272 36280   1 20:48:03  pts/0  0:00 ps -ef
  oracle 27856     1   0 10:27:26      -  0:01 ora_smon_SPLDEV1
  oracle 31738     1   0 10:27:26      -  0:00 ora_dbw0_SPLDEV1
  oracle 31756     1   0 10:27:26      -  0:00 ora_reco_SPLDEV1
 alberts 32542  9036   0 20:41:49  pts/0  0:00 -ksh
 maestro 33480 34394   0 05:59:45      -  0:00 /prj/maestro/maestro/bin/batchman -parm 32000
    root 34232 33480   0 05:59:45      -  0:00 /prj/maestro/maestro/bin/jobman
 maestro 34394 45436   0 05:59:45      -  0:00 /prj/maestro/maestro/bin/mailman -parm 32000 -- 2002 OL16U209 CONMAN UNIX 6.
    root 34708     1   0 13:55:51   lft0  0:00 /usr/sbin/getty /dev/console
  oracle 35364     1   0 10:27:26      -  0:01 ora_cjq0_SPLDEV1
  oracle 35660     1   0 10:27:26      -  0:04 ora_pmon_SPLDEV1
    root 36280 32542   0 20:45:06  pts/0  0:00 -ksh
    root 36382 43364   0   Oct 17      -  0:00 /usr/sbin/rsct/bin/IBM.ServiceRMd
    root 36642 43364   0   Oct 17      -  0:00 /usr/sbin/rsct/bin/IBM.CSMAgentRMd
    root 36912 43364   0   Oct 17      -  0:03 /usr/opt/ifor/bin/i4lmd -l /var/ifor/logdb -n clwts
    root 37186 43364   0   Oct 17      -  0:00 /etc/ncs/llbd
    root 37434 43364   0   Oct 17      -  0:17 /usr/opt/ifor/bin/i4llmd -b -n wcclwts -l /var/ifor/llmlg
    root 37738 37434   0   Oct 17      -  0:00 /usr/opt/ifor/bin/i4llmd -b -n wcclwts -l /var/ifor/llmlg
    root 37946     1   0   Oct 17      -  0:00 /opt/hitachi/HNTRLib2/bin/hntr2mon -d
  oracle 38194     1   0   Oct 17      -  0:00 /prj/oracle/product/9.2.0.3/bin/tnslsnr LISTENER -inherit
    root 38468 43364   0   Oct 17      -  0:00 /usr/sbin/rsct/bin/IBM.AuditRMd
    root 38716     1   0   Oct 17      -  0:00 /usr/bin/itesmdem itesrv.ini /etc/IMNSearch/search/
  imnadm 39220     1   0   Oct 17      -  0:00 /usr/IMNSearch/httpdlite/httpdlite -r /etc/IMNSearch/httpdlite/httpdlite.con
    root 39504 36912   0   Oct 17      -  0:00 /usr/opt/ifor/bin/i4lmd -l /var/ifor/logdb -n clwts
    root 39738 43364   0   Oct 17      -  0:01 /usr/DynamicLinkManager/bin/dlmmgr
    root 40512 43364   0   Oct 17      -  0:01 /usr/sbin/rsct/bin/rmcd -r
    root 40784 43364   0   Oct 17      -  0:00 /usr/sbin/rsct/bin/IBM.ERrmd
    root 41062     1   0   Oct 17      -  0:00 /usr/sbin/cron
     was 41306     1   0   Oct 17      -  2:10 /prj/was/java/bin/java -Xmx256m -Dwas.status.socket=32776 -Xms50m -Xbootclas
  oracle 42400     1   0 10:27:26      -  0:02 ora_ckpt_SPLDEV1
    root 42838     1   0   Oct 17      -  0:00 /usr/sbin/uprintfd
    root 43226 43364   0   Oct 17      -  0:00 /usr/sbin/nfsd 3891
    root 43364     1   0   Oct 17      -  0:00 /usr/sbin/srcmstr
    root 43920 43364   0   Oct 17      -  0:00 /usr/sbin/aixmibd
    root 44426 43364   0   Oct 17      -  0:00 /usr/sbin/sshd -D
    root 44668 43364   0   Oct 17      -  0:00 /usr/sbin/portmap
    root 44942 43364   0   Oct 17      -  0:00 /usr/sbin/snmpd
    root 45176 43364   0   Oct 17      -  0:00 /usr/sbin/snmpmibd
 maestro 45436     1   0   Oct 17      -  0:00 /prj/maestro/maestro/bin/netman
    root 45722 43364   0   Oct 17      -  0:00 /usr/sbin/inetd
    root 45940 43364   0   Oct 17      -  0:00 /usr/sbin/muxatmd
    root 46472 43364   0   Oct 17      -  0:00 /usr/sbin/hostmibd
    root 46780 43364   0   Oct 17      -  0:00 /etc/ncs/glbd
    root 46980 43364   0   Oct 17      -  0:00 /usr/sbin/qdaemon
    root 47294     1   0   Oct 17      -  0:00 /usr/local/sbin/syslog-ng -f /usr/local/etc/syslog-ng.conf
    root 47484 43364   0   Oct 17      -  0:00 /usr/sbin/rpc.lockd
  daemon 48014 43364   0   Oct 17      -  0:00 /usr/sbin/rpc.statd
    root 48256 43364   0   Oct 17      -  0:00 /usr/sbin/rpc.mountd
    root 48774 43364   0   Oct 17      -  0:00 /usr/sbin/biod 6
    root 49058 43364   0   Oct 17      -  0:00 /usr/sbin/writesrv
[LP 1]root@ol16u209:


Another example of ps -ef on an AIX5 system:
# ps -ef

     UID     PID    PPID   C    STIME    TTY  TIME CMD
    root       1       0   0   Jan 23      -  0:33 /etc/init
    root   69706       1   0   Jan 23      -  0:00 /usr/lib/errdemon
    root   81940       1   0   Jan 23      -  0:00 /usr/sbin/srcmstr
    root   86120       1   2   Jan 23      - 236:39 /usr/sbin/syncd 60
    root   98414       1   0   Jan 23      -  0:00 /usr/ccs/bin/shlap64
    root  114802   81940   0   Jan 23      -  0:32 /usr/sbin/rsct/bin/IBM.CSMAgentRMd
    root  135366   81940   0   Jan 23      -  0:00 /usr/sbin/sshd -D
    root  139446   81940   0   Jan 23      -  0:07 /usr/sbin/rsct/bin/rmcd -r
    root  143438       1   0   Jan 23      -  0:00 /usr/sbin/uprintfd
    root  147694       1   0   Jan 23      -  0:26 /usr/sbin/cron
    root  155736       1   0   Jan 23      -  0:00 /usr/local/sbin/syslog-ng -f /usr/local/etc/syslog-ng.conf
    root  163996   81940   0   Jan 23      -  0:00 /usr/sbin/rsct/bin/IBM.ERrmd
    root  180226   81940   0   Jan 23      -  0:00 /usr/sbin/rsct/bin/IBM.ServiceRMd
    root  184406   81940   0   Jan 23      -  0:00 /usr/sbin/qdaemon
    root  200806       1   0   Jan 23      -  0:08 /opt/hitachi/HNTRLib2/bin/hntr2mon -d
    root  204906   81940   0   Jan 23      -  0:00 /usr/sbin/rsct/bin/IBM.AuditRMd
    root  217200       1   0   Jan 23      -  0:00 ./mflm_manager
    root  221298   81940   0   Jan 23      -  1:41 /usr/DynamicLinkManager/bin/dlmmgr
    root  614618       1   0   Apr 03   lft0  0:00 -ksh
 reserve 1364024 1548410   0 07:10:10  pts/0  0:00 -ksh
    root 1405140 1626318   1 08:01:38  pts/0  0:00 ps -ef
    root 1511556  614618   2 07:45:52   lft0  0:41 tar -cf /dev/rmt1.1 /spl
 reserve 1548410 1613896   0 07:10:10      -  0:00 sshd: reserve@pts/0
    root 1613896  135366   0 07:10:01      -  0:00 sshd: reserve [priv]
    root 1626318 1364024   1 07:19:13  pts/0  0:00 -ksh



Some more examples:

# nohup somecommand > preferred-name 2>&1 & sleep 1; tail -f preferred-name

# nohup make bzImage & 
# tail -f nohup.out

# nohup make modules 1> modules.out 2> modules.err & 
# tail -f modules.out 



==========================================
9. Backup commands, TAR, and Zipped files:
==========================================


For SOLARIS as well as AIX, and many other unix'es, the following commands can be used:
tar, cpio, dd, gzip/gunzip, compress/uncompress, backup and restore.


Very important:
If you back up to tape, make sure you know which device names of your tape device belong
to the "rewinding" class and which to the "nonrewinding" class.


9.1 tar: Short for "Tape Archiver":
===================================

Some examples should explain the usage of "tar" to create backups, or to create 
easy to transport .tar files.

Create a backup to tape device 0hc of file sys01.dbf
# tar -cvf /dev/rmt/0hc /u01/oradata/sys01.dbf
# tar -rvf /dev/rmt/0hc /u02/oradata/data_01.dbf

-c create 
-r append 
-x extract
-v verbose
-t list

Extract the contents of example.tar and display the files as they are extracted.
# tar -xvf example.tar  

Create a tar file named backup.tar from the contents of the directory /home/ftp/pub
# tar -cf backup.tar /home/ftp/pub  

list contents of example.tar to the screen
# tar -tvf example.tar  

To restore the file /home/bcalkins/.profile from the archive:
- First we do a backup: 
# tar -cvf /dev/rmt/0 /home/bcalkins
- And later we do a restore:
# tar -xvf /dev/rmt/0 /home/bcalkins/.profile

If you archive with absolute paths, you can only restore to the same ("a like") destination directory.
If you archive with relative paths, you can restore into any directory.
So preferably use tar with relative pathnames: for example, if you want to back up /home/bcalkins,
change to that directory and use

# tar -cvf backup_oracle_201105.tar ./*
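
The effect of relative paths can be tried out in a scratch directory. A minimal sketch (all paths live under a temporary directory and are made up for this example). It archives "." instead of "./*", which also picks up dot files such as .profile:

```shell
work=$(mktemp -d)
mkdir -p "$work/home/bcalkins/docs" "$work/restore"
echo "export EDITOR=vi" > "$work/home/bcalkins/.profile"
echo "notes" > "$work/home/bcalkins/docs/notes.txt"

# Back up with relative paths: cd into the directory and archive ".".
( cd "$work/home/bcalkins" && tar -cf "$work/backup.tar" . )

# Restore a single file into a different directory.
( cd "$work/restore" && tar -xf "$work/backup.tar" ./.profile )

# List the archive: all member names are relative (they start with "./").
tar -tf "$work/backup.tar"
```

Because the member names are relative, the restore lands in whatever directory you are in when you run tar -x.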


To extract the directory conv:

# tar -xvf /dev/rmt0 /u02/oradata/conv

Example:
--------

mt -f /dev/rmt1  rewind
mt -f /dev/rmt1.1 fsf 6
tar -xvf /dev/rmt1.1 /data/download/expdemo.zip



Most common error messages with tar:
------------------------------------

-- 0511-169: A directory checksum error on media: MediaName not equal to Number

Possible Causes
From the command line, you issued the tar command to extract files from an archive that was not created 
with the tar command.

-- 0511-193: An error occurred while reading from the media

Possible Causes
You issued the tar command to read an archive from a tape device that has a different block size 
than when the archive was created.

Solution:

# chdev -l rmt0 -a block_size=0

-- File too large:




Extra note of tar command on AIX:
---------------------------------

If you need to backup multiple large mountpoints to a large tape, you might think you
can use something like:

tar -cvf /dev/rmt1 /spl
tar -rvf /dev/rmt1 /prj
tar -rvf /dev/rmt1 /opt
tar -rvf /dev/rmt1 /usr
tar -rvf /dev/rmt1 /data
tar -rvf /dev/rmt1 /backups
tar -rvf /dev/rmt1 /u01/oradata
tar -rvf /dev/rmt1 /u02/oradata
tar -rvf /dev/rmt1 /u03/oradata
tar -rvf /dev/rmt1 /u04/oradata
tar -rvf /dev/rmt1 /u05/oradata

Actually, on AIX this is not OK. The tape rewinds after each tar command, so effectively
you end up with ONLY the last backup statement.

You should use the non-rewinding class instead, like for example:

tar -cf /dev/rmt1.1 /spl
tar -cf /dev/rmt1.1 /apps
tar -cf /dev/rmt1.1 /prj
tar -cf /dev/rmt1.1 /software
tar -cf /dev/rmt1.1 /opt
tar -cf /dev/rmt1.1 /usr
tar -cf /dev/rmt1.1 /data
tar -cf /dev/rmt1.1 /backups
#tar -cf /dev/rmt1.1 /u01/oradata
#tar -cf /dev/rmt1.1 /u02/oradata
#tar -cf /dev/rmt1.1 /u03/oradata
#tar -cf /dev/rmt1.1 /u04/oradata
#tar -cf /dev/rmt1.1 /u05/oradata

Use this table to decide on which class to use:

The following table shows the names of the rmt special files and their characteristics.

Special_File   Rewind_on_Close   Retension_on_Open   Density_Setting
/dev/rmt*      Yes               No                  #1
/dev/rmt*.1    No                No                  #1
/dev/rmt*.2    Yes               Yes                 #1
/dev/rmt*.3    No                Yes                 #1
/dev/rmt*.4    Yes               No                  #2
/dev/rmt*.5    No                No                  #2
/dev/rmt*.6    Yes               Yes                 #2
/dev/rmt*.7    No                Yes                 #2



To restore an item from a logical tape, use commands as in the following example:

mt -f /dev/rmt1  rewind
mt -f /dev/rmt1.1 fsf 2     (skips 2 archives, so the tape is positioned at the beginning of archive 3)

or, counted again from a rewound tape:

mt -f /dev/rmt1.1 fsf 7     (positions the tape at the beginning of archive 8)

Now you can use a command like for example:

tar -xvf /dev/rmt1.1 /backups/oradb/sqlnet.log

Another example:

mt -f /dev/rmt1  rewind
mt -f /dev/rmt1.1 fsf 8
tar -xvf /dev/rmt1.1 /u01/oradata/spltrain/temp01.dbf
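
The rule behind these examples is that the N-th archive on the tape is reached by skipping N-1 archives after a rewind. A small sketch that only prints the commands, so it can be run without a tape drive (print_restore_plan is a hypothetical helper; the device names are the ones used in the examples above):

```shell
# print_restore_plan N PATH
# Prints the commands needed to restore PATH from the N-th archive on the tape
# (N counts from 1). Nothing is executed.
print_restore_plan() {
    skip=$(( $1 - 1 ))
    echo "mt -f /dev/rmt1 rewind"
    if [ "$skip" -gt 0 ]; then
        echo "mt -f /dev/rmt1.1 fsf $skip"
    fi
    echo "tar -xvf /dev/rmt1.1 $2"
}

# Assumes /u01/oradata was written as the 9th archive on the tape.
plan=$(print_restore_plan 9 /u01/oradata/spltrain/temp01.dbf)
echo "$plan"
```

For the 9th archive this yields the same "fsf 8" as in the example above.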


Tapedrives on Solaris:
----------------------

Tape devices on Solaris are named like /dev/rmt/0 or /dev/rmt/1
The default is /dev/rmt/0. Tape drive support is configured in the "/kernel/drv/st.conf" file.
If you need to add support for a tape device, you need to modify this file.

First tape device name: /dev/rmt/0
Second tape device name: /dev/rmt/1

You can also append letters to the device name to specify density and rewind behavior, using the following format:
/dev/rmt/ZX

Z is tape drive number such as 0,1..n 
X can be any one of following (as supported by your device, read the manual of your tape device & controller to see if all of them supported or not): 
l - Low density 
m - Medium density 
h - High density 
u - Ultra density 
c - Compressed density 
n - No rewinding 
For example, to specify the first drive with high density and no rewinding, use device /dev/rmt/0hn.


First drive, rewinding 
 /dev/rmt/0 
 
First drive, nonrewinding 
 /dev/rmt/0n 
 
Second drive, rewinding 
 /dev/rmt/1 
 
Second drive, nonrewinding 
 /dev/rmt/1n 
 




Example Backupscript on AIX:
----------------------------

#!/usr/bin/ksh

# BACKUP-SCRIPT SPL SERVER PSERIES 550
# THIS IS THE PRIMARY BACKUP, TO THE TAPE ROBOT RMT1.
# NOTE: ALONGSIDE THIS BACKUP, THERE IS ALSO A BACKUP OF THE
# /backup DISK TO THE INTERNAL TAPE DRIVE RMT0.

# BECAUSE IT IS NOT YET FULLY CLEAR WHETHER THE APPLICATIONS MUST BE STOPPED
# BEFORE THE BACKUP, THIS SCRIPT IS STILL UNDER REVISION.

# VERSION: 0.1
# DATE   : 27-12-2005
# PURPOSE OF THE SCRIPT:
#   - STOP THE APPLICATIONS
#   - THEN BACK UP TO TAPE
#   - START THE APPLICATIONS

# CHECK BEFOREHAND THAT THE TAPE LIBRARY IS LOADED VIA "/opt/backupscripts/load_lib.sh"

BACKUPLOG=/opt/backupscripts/backup_to_rmt1.log
export BACKUPLOG

DAYNAME=`date +%a`;export DAYNAME
DAYNO=`date +%d`;export DAYNO


########################################
# 1. RECORD START TIME IN A LOG        #
########################################

echo "-----------------" >> ${BACKUPLOG}
echo "Start Backup 550:" >> ${BACKUPLOG}
date >> ${BACKUPLOG}


########################################
# 2. STOP APPLICATIONS                 #
########################################


#STOP ALL ORACLE DATABASES
su - oracle -c "/opt/backupscripts/stop_oracle.sh"
sleep 30 

#STOP WEBSPHERE
cd /prj/was/bin
./stopServer.sh server1 -username admin01 -password vga88nt
sleep 30 

#SHUTDOWN ETM instances:
su - cissys -c '/spl/SPLDEV1/bin/splenviron.sh -e SPLDEV1 -c "spl.sh -t stop"'
sleep 2
su - cissys -c '/spl/SPLDEV2/bin/splenviron.sh -e SPLDEV2 -c "spl.sh -t stop"'
sleep 2
su - cissys -c '/spl/SPLCONF/bin/splenviron.sh -e SPLCONF -c "spl.sh -t stop"'
sleep 2
su - cissys -c '/spl/SPLPLAY/bin/splenviron.sh -e SPLPLAY -c "spl.sh -t stop"'
sleep 2
su - cissys -c '/spl/SPLTST3/bin/splenviron.sh -e SPLTST3 -c "spl.sh -t stop"'
sleep 2
su - cissys -c '/spl/SPLTST1/bin/splenviron.sh -e SPLTST1 -c "spl.sh -t stop"'
sleep 2
su - cissys -c '/spl/SPLTST2/bin/splenviron.sh -e SPLTST2 -c "spl.sh -t stop"'
sleep 2
su - cissys -c '/spl/SPLDEVP/bin/splenviron.sh -e SPLDEVP -c "spl.sh -t stop"'
sleep 2
su - cissys -c '/spl/SPLPACK/bin/splenviron.sh -e SPLPACK -c "spl.sh -t stop"'
sleep 2
su - cissys -c '/spl/SPLDEVT/bin/splenviron.sh -e SPLDEVT -c "spl.sh -t stop"'
sleep 2



#STOP SSH DAEMON
stopsrc -s sshd
sleep 2

date >> /opt/backupscripts/running.log
who >> /opt/backupscripts/running.log

########################################
# 3. BACKUP COMMANDS                   #
########################################


case $DAYNAME in
Tue) tapeutil -f /dev/smc0 move 256 4116
tapeutil -f /dev/smc0 move 4101 256     
;;
Wed) tapeutil -f /dev/smc0 move 256 4117
tapeutil -f /dev/smc0 move 4100 256    
;;
Thu) tapeutil -f /dev/smc0 move 256 4118
tapeutil -f /dev/smc0 move 4099 256      
;;
Fri) tapeutil -f /dev/smc0 move 256 4119
tapeutil -f /dev/smc0 move 4098 256      
;;
Sat) tapeutil -f /dev/smc0 move 256 4120
tapeutil -f /dev/smc0 move 4097 256    
;;
Mon) tapeutil -f /dev/smc0 move 256 4121
tapeutil -f /dev/smc0 move 4096 256
;;
esac

sleep 50

 
echo "Starting the backup itself" >> ${BACKUPLOG}
mt -f /dev/rmt1 rewind
tar -cf /dev/rmt1.1 /spl
tar -cf /dev/rmt1.1 /apps
tar -cf /dev/rmt1.1 /prj
tar -cf /dev/rmt1.1 /software
tar -cf /dev/rmt1.1 /opt
tar -cf /dev/rmt1.1 /usr
tar -cf /dev/rmt1.1 /data
tar -cf /dev/rmt1.1 /backups
tar -cf /dev/rmt1.1 /u01/oradata
tar -cf /dev/rmt1.1 /u02/oradata
tar -cf /dev/rmt1.1 /u03/oradata
tar -cf /dev/rmt1.1 /u04/oradata
tar -cf /dev/rmt1.1 /u05/oradata
tar -cf /dev/rmt1.1 /u06/oradata
tar -cf /dev/rmt1.1 /u07/oradata
tar -cf /dev/rmt1.1 /u08/oradata
tar -cf /dev/rmt1.1 /home
tar -cf /dev/rmt1.1 /backups3

sleep 10

# TEMPORARY ACTION
date >> /opt/backupscripts/running.log
ps -ef | grep pmon >> /opt/backupscripts/running.log
ps -ef | grep BBL >> /opt/backupscripts/running.log
ps -ef | grep was >> /opt/backupscripts/running.log
who >> /opt/backupscripts/running.log
defragfs /prj

# END TEMPORARY ACTION


########################################
# 4. START APPLICATIONS                #
########################################

#START SSH DAEMON
startsrc -s sshd
sleep 2

#START ALL ORACLE DATABASES
su - oracle -c "/opt/backupscripts/start_oracle.sh"
sleep 30

#START ETM instances:
su - cissys -c '/spl/SPLDEV1/bin/splenviron.sh -e SPLDEV1 -c "spl.sh -t start"'
sleep 2
su - cissys -c '/spl/SPLDEV2/bin/splenviron.sh -e SPLDEV2 -c "spl.sh -t start"'
sleep 2
su - cissys -c '/spl/SPLCONF/bin/splenviron.sh -e SPLCONF -c "spl.sh -t start"'
sleep 2
su - cissys -c '/spl/SPLPLAY/bin/splenviron.sh -e SPLPLAY -c "spl.sh -t start"'
sleep 2
su - cissys -c '/spl/SPLTST3/bin/splenviron.sh -e SPLTST3 -c "spl.sh -t start"'
sleep 2
su - cissys -c '/spl/SPLTST1/bin/splenviron.sh -e SPLTST1 -c "spl.sh -t start"'
sleep 2
su - cissys -c '/spl/SPLTST2/bin/splenviron.sh -e SPLTST2 -c "spl.sh -t start"'
sleep 2
su - cissys -c '/spl/SPLDEVP/bin/splenviron.sh -e SPLDEVP -c "spl.sh -t start"'
sleep 2
su - cissys -c '/spl/SPLPACK/bin/splenviron.sh -e SPLPACK -c "spl.sh -t start"'
sleep 2
su - cissys -c '/spl/SPLDEVT/bin/splenviron.sh -e SPLDEVT -c "spl.sh -t start"'
sleep 2


#START WEBSPHERE
cd /prj/was/bin
./startServer.sh server1 -username admin01 -password vga88nt

sleep 30


########################################
# 5. RECORD END TIME IN A LOG          #
########################################

#Record the tape number and end time in the log:

tapeutil -f /dev/smc0 inventory | head -88 | tail -2  >> ${BACKUPLOG}

echo "End backup 550:" >> ${BACKUPLOG}
date >> ${BACKUPLOG}



Some examples about day vars:
-----------------------------

DAYNAME=`date +%a`;export DAYNAME
echo $DAYNAME
Thu


DAYNO=`date +%d`;export DAYNO
echo $DAYNO
29

weekday=`date +%a%A`; export weekday
echo $weekday
ThuThursday

weekday=`date +%a-%A`
echo $weekday
Thu-Thursday

       %a
            Displays the locale's abbreviated weekday name.
       %A
            Displays the locale's full weekday name.
       %b
            Displays the locale's abbreviated month name.
       %B
            Displays the locale's full month name.
       %c
            Displays the locale's appropriate date and time representation. This is the default.
       %C
            Displays the first two digits of the four-digit year as a decimal number (00-99). A year is divided by 100 and truncated to an integer.
       %d
            Displays the day of the month as a decimal number (01-31). In a two-digit field, a 0 is used as leading space fill.
       %D
            Displays the date in the format equivalent to %m/%d/%y.
       %e
            Displays the day of the month as a decimal number (1-31). In a two-digit field, a blank space is used as leading space fill.
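
Such day variables are typically used in a case statement, as in the backup script above. A minimal sketch (the variable DAYTYPE is made up for this example); LC_ALL=C is set for the date call so that the abbreviations are always the English Mon..Sun, whatever the locale of the system:

```shell
# Use the C locale so the abbreviations are always the English Mon..Sun.
DAYNAME=$(LC_ALL=C date +%a)
DAYNO=$(date +%d)

case "$DAYNAME" in
    Sat|Sun) DAYTYPE=weekend ;;
    *)       DAYTYPE=weekday ;;
esac

echo "Today is $DAYNAME (day $DAYNO of the month), a $DAYTYPE"
```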



9.2 compress and uncompress:
============================

# compress -v bigfile.exe
Would compress bigfile.exe and rename that file to bigfile.exe.Z.

# uncompress *.Z            
would uncompress the files *.Z


9.3 gzip:
=========

To compress a file using gzip, execute the following command: 

# gzip filename.tar 

This will become filename.tar.gz

To decompress:

# gzip -d filename.tar.gz
# gunzip filename.tar.gz
# gzip -d users.dbf.gz
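
By default gzip replaces the original file with the compressed one. If you want to keep the original and verify the result, something like the following can be used; this is a sketch in a scratch directory, and the file content is made up:

```shell
work=$(mktemp -d)
printf 'line 1\nline 2\n' > "$work/users.dbf"

# -c writes to stdout, so the original file is kept next to the compressed copy.
gzip -c "$work/users.dbf" > "$work/users.dbf.gz"

# -t tests the integrity of the compressed file without extracting it.
gzip -t "$work/users.dbf.gz"

# -dc decompresses to stdout; compare the result with the original.
gzip -dc "$work/users.dbf.gz" | cmp - "$work/users.dbf" && echo "round trip OK"
```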


9.4 bzip2:
==========

# bzip2 filename.tar
This will become filename.tar.bz2


9.5 dd:
=======

Solaris:
--------

# dd if=<input file> of=<output file> <option=value>

to duplicate a tape:
# dd if=/dev/rmt/0 of=/dev/rmt/1

to clone a disk with the same geometry:
# dd if=/dev/rdsk/c0t1d0s2  of=/dev/rdsk/c0t4d0s2 bs=128k

AIX:
----

The same command syntax applies to IBM AIX. Here is an example on an AIX pSeries machine with a floppy drive:

clone a diskette:

# dd if=/dev/fd0 of=/tmp/ddcopy
# dd if=/tmp/ddcopy of=/dev/fd0

Note: 

On Linux distros the device associated to the floppy drive is also /dev/fd0
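
If you want to get a feeling for dd without touching a real device, you can practice on ordinary files. A minimal sketch (the image file stands in for a diskette or disk; all names are made up):

```shell
work=$(mktemp -d)

# Create a 4 KB stand-in for a device (4 blocks of 1024 bytes).
dd if=/dev/zero of="$work/disk.img" bs=1024 count=4 2>/dev/null

# Copy it with a different block size; the result is byte-for-byte identical.
dd if="$work/disk.img" of="$work/ddcopy" bs=512 2>/dev/null

size=$(wc -c < "$work/ddcopy" | tr -d ' ')
echo "copied $size bytes"
cmp "$work/disk.img" "$work/ddcopy" && echo "images are identical"
```

The block size (bs) only influences how the data is read and written, not the content of the copy.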
 

9.6 cpio:
=========

solaris:
--------

cpio <mode><option>
copy-out: cpio -o
copy-in : cpio -i
pass    : cpio -p


#  cd /var/bigspace
#  gunzip -c Linux9i_Disk1.cpio.gz | cpio -idmv
#  gunzip -c Linux9i_Disk2.cpio.gz | cpio -idmv
#  gunzip -c Linux9i_Disk3.cpio.gz | cpio -idmv

#  cpio -idmv < 9204_solaris_release.cpio

# cd /work
# find . -print | cpio -ocB > /dev/rmt/0

# cd /work
# cpio -icvdB < /dev/rmt/0     

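i copy-in (extract) mode, o copy-out (create) mode
d will create directories as needed
m retains the original modification times of the files
c will create header information in ascii format for portability
v verbose
B blocks input/output at 5120 bytes per record (often used with tape devices)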

AIX:
----

AIX uses the same syntax. Usually, you should use the following command:

# cpio -idmv < filename.cpio


Copying directories with cpio:
------------------------------

cpio is very good at cloning directories, or making backups, because it copies files and directories
including their ownership and permissions.

Example:
--------

Just cd to the directory that you want to clone and use a command similar to the following examples.

# find . -print | cpio -pdl /u/disk11/jdoe/fiber

# find . -print | cpio -pdm /a/dev

# find . -print | cpio -pdl /home/jim/newdir

# find . -print | cpio -pdmv /backups2/CONV2-0212

# find . -print | cpio -pdmv /backups2/SPLcobAS40

# find . -print | cpio -pdmv /backups2/SPLcobAS40sp2

# find . -print | cpio -pdmv /backups2/runtime/SPLTST2

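The p in the flags stands for pass-through: cpio copies the files named on standard input directly into
the target directory. d creates directories as needed, m keeps the modification times, v lists the files
as they are copied, and l links files instead of copying them where possible.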

cd /spl/SPLDEV1
find . -print | cpio -pdmv /spl/SPLDEVT
find . -print | cpio -pdmv /backups2/data

# find . -print | cpio -pdmv /data/documentum/dmadmin/backup_1008/dba_cluster
# find . -print | cpio -pdmv /data/documentum/dmadmin/backup_1008/dmw_et3
# find . -print | cpio -pdmv /data/documentum/dmadmin/backup_1008/dmw_et
# find . -print | cpio -pdmv /data/documentum/dmadmin/backup_1508/dmw_eu
find . -print | cpio -pdmv /data/emcdctm/home2

find . -print | cpio -pdmv /data/documentum/dmadmin/backup_1809/dmw_et
find . -print | cpio -pdmv /data/documentum/dmadmin/backup_1809/dmw_et3




find . -print | cpio -pdmv /data/documentum/dmadmin/appl/l13appl
find . -print | cpio -pdmv /data/documentum/dmadmin/appl/l14appl
find . -print | cpio -pdmv /data/documentum/dmadmin/backup_3110/dmw_et
find . -print | cpio -pdmv /appl/emcdctm/dba_save_311007



Example:
--------

Use cpio copy-pass to copy a directory structure to another location:

# find path -depth -print | cpio -pamVd /new/parent/dir


Example:
--------

Become superuser or assume an equivalent role.
Change to the appropriate directory.

# cd filesystem1

Copy the directory tree from filesystem1 to filesystem2 by using a combination of the find and cpio commands. 

# find . -print -depth | cpio -pdm filesystem2


Example:
--------

Copying directories
Both cpio and tar may be used to copy directories while preserving ownership, permissions, and directory structure.

cpio example:
cd fromdir
find . | cpio -pdumv todir

tar example:
cd fromdir; tar cf - . | (cd todir; tar xfp -)

tar example over a compressed ssh tunnel:
tar cvf - fromdir | gzip -9c | ssh user@host 'cd todir; gzip -cd | tar xpf -'
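
The cpio and tar variants can be wrapped in one small function. This is only a sketch: clone_dir is a hypothetical helper name, it uses cpio pass-through mode when cpio is installed and otherwise falls back to the tar pipe, and the example directories live under a temporary directory:

```shell
# clone_dir SRC DST
# Copies the tree under SRC into DST, keeping permissions and modification times.
clone_dir() {
    mkdir -p "$2" || return 1
    dst=$(cd "$2" && pwd)          # absolute path, because we cd into SRC below
    if command -v cpio >/dev/null 2>&1; then
        ( cd "$1" && find . -depth -print | cpio -pdm "$dst" 2>/dev/null )
    else
        ( cd "$1" && tar cf - . ) | ( cd "$dst" && tar xpf - )
    fi
}

work=$(mktemp -d)
mkdir -p "$work/fromdir/sub"
echo "hello" > "$work/fromdir/sub/a.txt"
clone_dir "$work/fromdir" "$work/todir"
ls -R "$work/todir"
```

Ownership is only preserved when the copy is run as root; as an ordinary user the copied files belong to you.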


Errors:
-------

Errors sometimes found with cpio:

cpio: 0511-903
cpio: 0511-904

1. Try using the -c option: cpio -imdcv < filename.cpio
 


9.7 the pax command:
====================

Same for AIX and SOLARIS.

The pax utility supports several archive formats, including tar and cpio.

The syntax for the pax command is as follows:

pax <mode> <options>

-r: Read mode. When -r is specified, pax extracts the filenames and directories found in the archive.
    The archive is read from disk or tape. If an extracted file is a directory, the hierarchy
    is extracted as well. The extracted files are created relative to the current directory.

None: List mode. When neither -r nor -w is specified, pax displays the filenames and directories
      found in the archive file. The list is written to standard output.

-w: Write mode. If you want to create an archive, you use -w.
    Pax writes the contents of the file to the standard output in an archive format specified
    by the -x option.

-rw: Copy mode. When both -r and -w are specified, pax copies the specified files to
     the destination directory.


most important options:
-a = append to the end of an existing archive
-b = block size, multiple of 512 bytes
-c = you can specify filepatterns
-f = specifies the pathname of the input or output archive
-p <string> = one or more of the characters a, e, m, o, p:
              a does not preserve file access time
              e preserve everything: user id, group id, filemode bits, etc..
              m does not preserve file modification times
              o preserve uid and gid
              p preserve filemode bits
-x <format> = specifies the archive format. 
              
Examples:

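To copy the contents of the current directory to tape, use -w mode and -f, and name "." as the file to archive
(without a file operand, pax reads the list of files to archive from standard input):
# pax -w -f /dev/rmt0 .

To list a verbose table of contents stored on tape rmt0, use list mode (neither -r nor -w) and -f
# pax -v -f /dev/rmt0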


9.8 pkzip25:
============

PKZIP Usage: 

Usage: pkzip25 [command] [options] zipfile [@list] [files...] 

Examples: 

     View .ZIP file contents: pkzip25 zipfile 

     Create a .ZIP file: pkzip25 -add zipfile file(s)... 

     Extract files from .ZIP: pkzip25 -extract zipfile 

These are only basic examples of PKZIP's capability 

About "-extract" switch:

extract  
extract files from a .ZIP file. It's a configurable switch.

-- all - all files in .ZIP file  
-- freshen - only files in the .ZIP file that exist in the target directory and that are "newer" than those files 
   will be extracted  
-- update - files in the .ZIP file which already exist in the target directory and that are "newer" than those files 
   as well as files that are "not" in the target directory will be extracted  

default = all

Example:

# pkzip25 -ext=up save.zip



9.9 SOLARIS: ufsdump and ufsrestore:
====================================

Level 0 is a full backup; levels 1-9 are incremental backups.

Examples:
---------

# ufsdump 0ucf /dev/rmt/0 /users
# ufsdump 0ucf sparc1:/dev/rmt/0 /export/home

# ufsrestore xf /dev/rmt/0 filename          # extract the named file
# ufsrestore rf sparc1:/dev/rmt/0            # restore the whole dump from a remote tape


9.10 AIX: mksysb:
================


The mksysb command creates an installable image of the rootvg. In other words, mksysb creates
a backup of the operating system (that is, the root volume group). 
You can use this backup to reinstall a system to its original state after it has been corrupted. 
If you create the backup on tape, the tape is bootable and includes the installation programs 
needed to install from the backup.

To generate a system backup and create an /image.data file (generated by the mkszfile command) to a tape device 
named /dev/rmt0, type: 
# mksysb -i /dev/rmt0

To generate a system backup and create an /image.data file with map files (generated by the mkszfile command) 
to a tape device named /dev/rmt1, type: 
# mksysb -m /dev/rmt1


To generate a system backup with a new /image.data file, but exclude the files in directory /home/user1/tmp, 
create the file "/etc/exclude.rootvg" containing the line /home/user1/tmp/, and type: 
# mksysb -i -e /dev/rmt1

This command will backup the /home/user1/tmp directory but not the files it contains.
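
A minimal sketch of maintaining the exclude file from a script. The function
add_exclude is a hypothetical helper, and the example works on a scratch file
so that it can be tried safely; on a real system the file would be
/etc/exclude.rootvg.

```shell
# add_exclude FILE PATTERN: hypothetical helper that appends PATTERN to an
# exclude file unless an identical line is already present.
add_exclude() {
    file=$1
    pattern=$2
    touch "$file"
    if ! grep -qxF -- "$pattern" "$file"; then
        printf '%s\n' "$pattern" >> "$file"
    fi
}

# Try it on a scratch file first; on a real system the file would be
# /etc/exclude.rootvg, followed by: mksysb -i -e /dev/rmt1
scratch=${TMPDIR:-/tmp}/exclude.rootvg.$$
add_exclude "$scratch" /home/user1/tmp/
add_exclude "$scratch" /home/user1/tmp/    # second call adds nothing
cat "$scratch"
rm -f "$scratch"
```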

To generate a system backup file named /userimage/node1 and a new /image.data file for that image, type: 
# mksysb -i /userimage/node1

There will be four images on the mksysb tape, and the fourth image will contain ONLY rootvg JFS or JFS2
mounted file systems. The target tape drive must be local to create a bootable tape. 

The following is a description of mksysb's four images. 

  +---------------------------------------------------------+
  |  Bosboot  |  Mkinsttape  |  Dummy TOC  |    rootvg      |
  |   Image   |     Image    |    Image    |     data       |
  |-----------+--------------+-------------+----------------|
  |<----------- Block size 512 ----------->| Blksz defined  |
  |                                        | by the device  |
  +---------------------------------------------------------+ 




Special notes:
--------------

Note 1: mksysb problem
----------------------

Question:
I'm attempting to restore a mksysb tape to a system that only has 18GB of drive space available for the Rootvg. 
Does the mksysb try to restore these mirrored LVs, or does it just make one copy? 
If it is trying to rebuild the mirror, is there a way that I can get around that? 

Answer:
I had the same problem and resolved it with the following steps:
1) Create a new image.data file by running mkszfile.
2) Change the image.data file as follows:
a) cd /
b) vi image.data
c) In each lv_data stanza of this file, halve the value on the copies
line (i.e. change copies = 2 to copies = 1).
Also, change the list of Physical Volumes from "hdisk0 hdisk1" to "hdisk0".
d) Save this file.
3) Create another mksysb from the command line that will use the newly edited image.data file:
mksysb /dev/rmt0 (Do not use smit and do not run with the -i flag;
both will generate a new image.data file.)
4) Use this new mksysb to restore your system on other box without mirroring. 
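
The edit in step 2 could also be scripted instead of done by hand in vi. The
following is only a sketch: unmirror_image_data is a hypothetical filter, and
the field names COPIES= and LV_SOURCE_DISK_LIST= are assumptions about how the
lines appear in image.data, so compare them with your own file first. It reads
standard input and writes standard output, so the original file is not touched.

```shell
# unmirror_image_data KEEP_DISK: hypothetical filter, stdin -> stdout.
# Inside lv_data stanzas it sets the copies value to 1 and reduces the disk
# list to KEEP_DISK. The field names COPIES= and LV_SOURCE_DISK_LIST= are
# assumptions about the image.data layout; check them against your own file.
unmirror_image_data() {
    awk -v keep="$1" '
        # A stanza header is a name followed by a colon at the start of a line.
        /^[a-z_]+:[ \t]*$/ { in_lv = ($1 == "lv_data:") }
        in_lv && /COPIES=/ { sub(/COPIES=[ \t]*[0-9]+/, "COPIES= 1") }
        in_lv && /LV_SOURCE_DISK_LIST=/ {
            sub(/LV_SOURCE_DISK_LIST=.*/, "LV_SOURCE_DISK_LIST= " keep)
        }
        { print }
    '
}

# Typical use: write to a new file and review it before replacing /image.data.
#   unmirror_image_data hdisk0 < /image.data > /tmp/image.data.new
```

Lines outside lv_data stanzas pass through unchanged.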


Note 2: How to restore specific files from a mksysb tape:								
---------------------------------------------------------
							
$ tctl fsf 3								
$ restore -xvf /dev/rmt0.1 ./your/file/name								
								
For example, if you need to get the vi command back, put the mksysb tape in the tape drive 
(in this case, /dev/rmt0) and do the following:								
								
cd /                         # get to the root directory								
tctl -f /dev/rmt0 rewind     # rewind the tape								
tctl -f /dev/rmt0.1 fsf 3    # skip three files to reach the fourth, no rewind
restore -xqf /dev/rmt0.1 -s 1 ./usr/bin/vi    # extract the vi binary, no rewind								
								
Further explanation of why you must use fsf 3 (forward space over 3 files):
The format of the tape is as follows:								
1. A BOS boot image								
2. A BOS install image								
3. A dummy Table Of Contents								
4. The system backup of the rootvg								
								
So if you just need to restore some files, first forward the tape pointer to position 3, counting from 0.				
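
The tape layout above can be written down as a small lookup. This is a sketch
only: mksysb_fsf_count and the short image names it accepts are hypothetical,
and the commands are printed rather than executed.

```shell
# mksysb_fsf_count IMAGE: hypothetical helper; prints how many files to skip
# (the Count for "tctl fsf") to reach the named image on a mksysb tape.
mksysb_fsf_count() {
    case "$1" in
        bosboot)    echo 0 ;;
        mkinsttape) echo 1 ;;
        toc)        echo 2 ;;
        rootvg)     echo 3 ;;
        *)          echo "unknown image: $1" >&2; return 1 ;;
    esac
}

# Dry run: print the positioning commands instead of executing them.
count=$(mksysb_fsf_count rootvg)
echo "tctl -f /dev/rmt0 rewind"
echo "tctl -f /dev/rmt0.1 fsf $count"
```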


Note 3: How to restore specific files from a mksysb FILE
--------------------------------------------------------

See also note 2

view: restore -Tvqf [mksysb file] 
To restore: restore -xvqf [mksysb file] [file name] 


Note 4: How to restore a directory from a mksysb FILE
------------------------------------------------------


Simply use the restore command:


restore -xvdf <mksysb.image> ./your/directory 


The dot at the front of the path is important. 
The "-d" flag indicates that this is a directory and everything in it should 
be restored. If you omit that, you'll restore an empty directory. 


The directory will be restored underneath whatever directory you're in. So 
if you're in your home directory it might create: 
/home/azhou/your/directory. 


With a mksysb image on disk you don't have any positioning to do, like with 
a tape. 
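
Because the leading dot is easy to forget, a script can normalize the path
before calling restore. A minimal sketch follows; to_restore_path is a
hypothetical helper, not an AIX command.

```shell
# to_restore_path PATH: hypothetical helper that rewrites a path into the
# "./relative" form in which names are stored in a mksysb image.
to_restore_path() {
    case "$1" in
        ./*) printf '%s\n' "$1" ;;      # already in the right form
        /*)  printf '.%s\n' "$1" ;;     # absolute path: prefix a dot
        *)   printf './%s\n' "$1" ;;    # bare relative path: prefix ./
    esac
}

to_restore_path /usr/bin/vi        # prints: ./usr/bin/vi
to_restore_path your/directory     # prints: ./your/directory
```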


Note 5: Performing a mksysb migration with CD installation
----------------------------------------------------------

You can perform a mksysb migration with a CD installation of AIX 5.3.

Step 1. Prepare your system for installation:


Prepare for migrating to the AIX 5.3 BOS by completing the following steps:

- Insert the AIX Volume 1 CD into the CD-ROM device. 
- Shut down the target system. If your machine is currently running, power it off by following these steps:
    Log in as the root user. 
    Type shutdown -F. 
  If your system does not automatically power off, place the power switch in the Off (0) position. 
  Attention: You must not turn on the system unit until instructed to do so.

- Turn on all attached external devices. External devices include the following:
   Terminals 
   CD-ROM drives 
   DVD-ROM drives 
   Tape drives 
   Monitors 
   External disk drives 

Turning on the external devices first is necessary so that the system unit can identify each peripheral device 
during the startup (boot) process. 

- If your MKSYSB_MIGRATION_DEVICE is a tape, insert the tape for the mksysb in the tape drive. 
If your MKSYSB_MIGRATION_DEVICE is a CD or DVD, and there is an additional CD or DVD drive on the system 
(other than the one being used to boot AIX), insert the mksysb CD or DVD in the drive to avoid being 
prompted to swap media. 

- Insert your customized bosinst.data supplemental diskette in the diskette drive. If the system does not 
have a diskette drive, use the network installation method for mksysb migration. 


Step 2. Boot from your installation media:


The following steps migrate your current version of the operating system to AIX 5.3. 
If you are using an ASCII console that was not defined in your previous system, you must define it. 
For more information about defining ASCII consoles, see Step 3. Setting up an ASCII terminal.

Turn the system unit power switch from Off (0) to On (|). 

When the system beeps twice, press F5 on the keyboard (or 5 on an ASCII terminal). If you have a graphics display, 
you will see the keyboard icon on the screen when the beeps occur. If you have an ASCII terminal 
(also called a tty terminal), you will see the word "keyboard" when the beeps occur. 
Note: If your system does not boot using the F5 key (or the 5 key on an ASCII terminal), refer to your 
hardware documentation for information about how to boot your system from an AIX product CD.

The system begins booting from the installation media. The mksysb migration installation proceeds 
as an unattended installation (non-prompted) unless the MKSYSB_MIGRATION_DEVICE is the same CD or DVD drive 
as the one being used to boot and install the system. In this case, the user is prompted to switch 
the product CD for the mksysb CD or DVD(s) to restore the image.data and the /etc/filesystems file. 
After this happens the user is prompted to reinsert the product media and the installation continues. 
When it is time to restore the mksysb image, the same procedure repeats. 

The BOS menus do not currently support mksysb migration, so they cannot be loaded. In a traditional migration, 
if there are errors that can be fixed by prompting the user for information through the menus, 
the BOS menus are loaded. If such errors or problems are encountered during mksysb migration, 
the installation asserts and an error stating that the migration cannot continue displays. 
Depending on the error that caused the assertion, information specific to the error might be displayed. 
If the installation asserts, the LED shows "088".


Note 6: create a mksysb tape MANUALLY
-------------------------------------


THIS NOTE DESCRIBES AN UNSUPPORTED METHOD, AND HAS NOT BEEN VERIFIED.

Here we do not mean the "mksysb -i /dev/rmtx" method, but...:

Question:
I have to clone a standalone 6H1 equipped with a 4mm tape, from
another 6H1 which is node of an SP and which does not own a tape !
The consequence is that my source mksysb is a file that is recorded in
/spdata/sys1/install/aixxxx/images

How will I copy this file to a tape to create the correct mksysb tape
that could be used to restore on my target machine ?

Answer:
Use the following method, provided the two servers are at the same
AIX level and kernel type (32/64 bits, jfs or jfs2).

- Both servers must communicate over an IP network and have a .rhosts
file set up (for using rsh).

cp /var/adm/ras/bosinst.data /bosinst.data
mkszfile

copy these files (bosinst.data and image.data) under "/" on the remote
system

on the server with the tape drive:

tctl -f /dev/rmt0 status
if the block size is not 512:

# chdev -l rmt0 -a block_size=512
tctl -f /dev/rmt0 rewind
bosboot -a -d /dev/rmt0.1 

(creates the boot image as the first file of the mksysb tape)

mkinsttape /dev/rmt0.1 (creates the second file on the
mksysb tape with image.data, bosinst.data, and other files like drivers and
commands)

echo " Dummy tape TOC" | dd of=/dev/rmt0.1 conv=sync bs=512 > /dev/null
2>&1 (creates the third file, the "dummy toc")


create a named pipe:

mknod /tmp/pipe p

and run the mksysb like this:

dd if=/tmp/pipe | rsh "server_hostname" dd of=/dev/rmt0.1 &
mksysb /tmp/pipe

This last command creates the fourth file, containing "rootvg" in backup/restore
format.


Note 7: Creating a root volume group backup on CD or DVD with the ISO9660 format
--------------------------------------------------------------------------------

Follow this procedure to create a root volume group backup on CD or DVD with the ISO9660 format.

You can use Web-based System Manager or SMIT to create a root volume group backup on CD or DVD with the 
ISO9660 format, as follows:

Use the Web-based System Manager Backup and Restore application and select System backup wizard method. 
This method lets you create bootable or non-bootable backups on CD-R, DVD-R, or DVD-RAM media. 
OR

To create a backup to CD, use the smit mkcd fast path. 
To create a backup to DVD, use the smit mkdvd fast path and select ISO9660 (CD format). 

The following procedure shows you how to use SMIT to create a system backup to CD. 
(The SMIT procedure for creating a system backup to an ISO9660 DVD is similar to the CD procedure.) 
Type the smit mkcd fast path. The system asks whether you are using an existing mksysb image. 
Type the name of the CD-R device. (This can be left blank if the Create the CD now? field is set to no.) 
If you are creating a mksysb image, select yes or no for the mksysb creation options, Create map files? 
and Exclude files?. Verify the selections, or change as appropriate. 
The mkcd command always calls the mksysb command with the flags to extend /tmp.

You can specify an existing image.data file or supply a user-defined image.data file. See step 16.

Enter the file system in which to store the mksysb image. This can be a file system that you created in the rootvg, 
in another volume group, or in NFS-mounted file systems with read-write access. If this field is left blank, 
the mkcd command creates the file system, if the file system does not exist, and removes it when the command completes. 

Enter the file systems in which to store the CD or DVD file structure and final CD or DVD images. These can be 
file systems you created in the rootvg, in another volume group, or in NFS-mounted file systems. If these fields 
are left blank, the mkcd command creates these file systems, and removes them when the command completes, 
unless you specify differently in later steps in this procedure. 

If you did not enter any information in the file systems' fields, you can select to have the mkcd command either 
create these file systems in the rootvg, or in another volume group. If the default of rootvg is chosen 
and a mksysb image is being created, the mkcd command adds the file systems to the exclude file and calls 
the mksysb command with the -e exclude files option. 

In the Do you want the CD or DVD to be bootable? field, select yes to have a boot image created on the 
CD or DVD. If you select no, you must boot from a product CD at the same version.release.maintenance level, 
and then select to install the system backup from the system backup CD. 

If you change the Remove final images after creating CD? field to no, the file system for the CD images 
(that you specified earlier in this procedure) remains after the CD has been recorded. 

If you change the Create the CD now? field to no, the file system for the CD images (that you specified earlier 
in this procedure) remains. The settings that you selected in this procedure remain valid, but the CD is not 
created at this time. 

If you intend to use an Install bundle file, type the full path name to the bundle file. The mkcd command copies 
the file into the CD file system. You must have the bundle file already specified in the BUNDLES field, 
either in the bosinst.data file of the mksysb image or in a user-specified bosinst.data file. When this 
option is used to have the bundle file placed on the CD, the location in the BUNDLES field of the bosinst.data 
file must be as follows: 
/../usr/sys/inst.data/user_bundles/bundle_file_name

To place additional packages on the CD or DVD, enter the name of the file that contains the packages list 
in the File with list of packages to copy to CD field. The format of this file is one package name per line. 
If you are planning to install one or more bundles after the mksysb image is restored, follow the directions 
in the previous step to specify the bundle file. You can then use this option to have packages listed 
in the bundle available on the CD. If this option is used, you must also specify the location of installation 
images in the next step.

Enter the location of installation images that are to be copied to the CD file system (if any) in the Location 
of packages to copy to CD field. This field is required if additional packages are to be placed on the CD 
(see the previous step). The location can be a directory or CD device. 

You can specify the full path name to a customization script in the Customization script field. If given, 
the mkcd command copies the script to the CD file system. You must have the CUSTOMIZATION_FILE field already set 
in the bosinst.data file in the mksysb image or else use a user-specified bosinst.data file with the 
CUSTOMIZATION_FILE field set. The mkcd command copies this file to the RAM file system. Therefore, the path 
in the CUSTOMIZATION_FILE field must be as follows: 
/../filename

You can use your own bosinst.data file, rather than the one in the mksysb image, by typing the full path name 
of your bosinst.data file in the User supplied bosinst.data file field. 
To turn on debugging for the mkcd command, set Debug output? to yes. The debug output goes to the smit.log. 
You can use your own image.data file, rather than the image.data file in the mksysb image, by typing the 
full path name of your image.data file for the User supplied image.data file field. 


Note 8: 0301-150 bosboot: Invalid or no boot device specified!
--------------------------------------------------------------


== Technote:

APAR status
Closed as program error.

Error description 

On a system, that does not have tape support
installed, running mkszfile will show the
following error:
0301-150 bosboot: Invalid or no boot device
specified.

Local fix 
Install device support for scsi tape devices.

Problem summary 
Error message when creating backup if devices.scsi.tape.rte
not installed even if the system does not have a tape drive.

Problem conclusion 
Redirect message to /dev/null.

Temporary fix 
Ignore message.

Comments 
APAR information 
APAR number IY52551 IY95261
Reported component name AIX 5L POWER V5 
Reported component ID 5765E6200 
Reported release 520 
Status CLOSED PER 
PE NoPE 
HIPER NoHIPER 
Submitted date 2004-01-12 
Closed date 2004-01-12 
Last modified date 2004-02-27 


== Technote:

APAR status
Closed as program error.

Error description 
If /dev/ipldevice is missing, mkszfile will show the
bosboot usage statement.

  0301-150 bosboot: Invalid or no boot device
           specified!
Local fix 
Problem summary 
If /dev/ipldevice is missing, mkszfile will show the
bosboot usage statement.

  0301-150 bosboot: Invalid or no boot device
           specified!

Problem conclusion 
Do not run bosboot against /dev/ipldevice.

Temporary fix 
Comments 

APAR information 
APAR number IY95261 
Reported component name AIX 5.3 
Reported component ID 5765G0300 
Reported release 530 
Status CLOSED PER 
PE NoPE 
HIPER NoHIPER 
Submitted date 2007-02-22 
Closed date 2007-02-22 
Last modified date 2007-06-06 


Fix information 
Fixed component name AIX 5.3 
Fixed component ID 5765G0300 


== thread:

Q:

> 
> Someone out there knows the fix for this one; if you get a moment, would you 
> mind giving me the fix? 
> 
> 
> # mksysb -i /dev/rmt0 
> 
> /dev/ipldevice not found 
> 

A:

The ipldevice file has probably been deleted from your /dev directory, or 
points to the wrong entry. The '/dev/ipldevice' file is (re)created at boot time, 
in the 2nd phase. For additional information look into the /sbin/rc.boot script... 
The ipldevice entry is a hard link. It usually points to /dev/rhdiskN, assuming 
that the boot device is hdiskN. 
Check your system and you should get something similar to: 
find /dev -links 2 -ls 
.... 
8305 0 crw------- 2 root system 14, 1 Feb 20 2005 /dev/rhdisk0 
8305 0 crw------- 2 root system 14, 1 Feb 20 2005 /dev/ipldevice 
... 
(The first column of the output is the inode number) 

So, you can recreate the wrong or missing ipldevice file. 
'bootinfo -b' reports the physical boot device name. 
For example: 
ln -f /dev/rhdisk0 /dev/ipldevice 

I hope this will solve your bosboot problem. 
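
The inode comparison described in this answer can be scripted. This is a
minimal sketch: same_inode is a hypothetical helper, and it assumes both paths
are on the same file system (as they are under /dev), because inode numbers
are only unique within one file system.

```shell
# same_inode A B: succeeds when A and B are hard links to the same inode.
# Hypothetical helper; it compares the inode numbers printed by "ls -di".
same_inode() {
    a=$(ls -di "$1" 2>/dev/null | awk '{print $1}')
    b=$(ls -di "$2" 2>/dev/null | awk '{print $1}')
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# On an AIX system the check would be (assuming hdisk0 is the boot disk):
#   same_inode /dev/rhdisk0 /dev/ipldevice || echo "ipldevice missing or wrong"
```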


Q:

I was installing Atape driver and noticed bosboot failure when installp 
calls bosboot with /dev/ipldevice. Messages below: 

0503-409 installp: bosboot verification starting... 
0503-497 installp: An error occurred during bosboot verification 
processing. 

Inspection of /dev showed no ipldevice file 

I was able to easily recreate the /dev/ipldevice using 

ln /dev/rhdisk0 /dev/ipldevice 

then successfully install the Atape driver software. 

After reboot /dev/ipldevice is missing again???. 

Environment is p5 520 AIX 5.3 ML1 
mirrored internal drives hdisk0 and hdisk1 in rootvg 

I have 5.3 ML2 (but have not applied yet) 
I don't see any APAR's in ML2 regarding /dev/ipldevice problems.

A:

Are you using EMC disk? There is a known problem with the later 
Powerpath versions where the powerpath startup script removes the 
/dev/ipldevice file if there is more than one device listed in the 
bootlist. 

A:

Yes, running EMC PowerPath 4.3 for AIX, with EMC Clariion CX600 Fibre 
disks attached to SAN. I always boot from, and mirror the OS on IBM 
internal disks. We order 4 internal IBM drives. Two for primary OS and 
mirror, the other two for alt_disk and mirrors. 

Thanks for the tip. I will investigate at EMC Powerlink site for fix. I 
know PowerPath 4.4 for AIX is out, but still pretty new.


A:

ipldevice is a link to the raw device (rhdisk0, not hdisk0) 


-----Original Message----- 
From: IBM AIX Discussion List [mailto:aix-l@Princeton.EDU] On Behalf Of 
Robert Miller 
Sent: Wednesday, April 07, 2004 6:13 PM 
To: aix-l@Princeton.EDU 
Subject: Re: 64 Bit Kernel 


It may be one of those odd IBMisms where they want to call something a 
certain name so they put it in as a link to the actual critter... 

Looking on my box, the /dev/ipldevice has the same device major and 
minor numbers as hdisk0 - tho it is interesting that ipldevice is a 
character device, where a drive is usually a block device: 


mybox:rmiller$ ls -l /dev/ipl* 
crw------- 2 root system 23, 0 Jan 15 2002 /dev/ipldevice 
mybox:rmiller$ ls -l /dev/hdisk0 
brw------- 1 root system 23, 0 Sep 13 2002 /dev/hdisk0 


A:

> Hi, 

> AIX 5.3 
> I have a machine where /dev/ipldevice doesn't exist 
> Can I reboot it safely? 
> How can I re-create it? 

> Thanks in advance 

I did this today, and there is probably a more accepted way. 
I made a hard link from my rhdiskX device to /dev/ipldevice. 

If your boot device is /dev/hdisk0, then the command line would be as 
follows: 

ln /dev/rhdisk0 /dev/ipldevice 

Again, there is probably a more acceptable way to achieve this, but it 
worked for me. 


== thread:

how to recover from an invalid or no boot device error in AIX 
Description

When running the command "bosboot -ad /dev/ipldevice" in IBM AIX, you get the following error:

0301-150 bosboot: Invalid or no boot device specified!

A device specified with the bosboot -d command is not valid. The bosboot command was unable to finish processing 
because it could not locate the required boot device. The installp command calls the bosboot command 
with /dev/ipldevice. If this error does occur, it is probably because /dev/ipldevice does not exist. 
/dev/ipldevice is a link to the boot disk. 

To determine if the link to the boot device is missing or incorrect :

1) Verify the link exists:

# ls -l /dev/ipldevice
ls: 0653-341 The file /dev/ipldevice does not exist.

2) In this case, it does not exist. To identify the boot disk, enter "lslv -m hd5". The boot disk name displays. 

# lslv -m hd5
hd5:N/A
LP PP1 PV1 PP2 PV2 PP3 PV3
0001 0001 hdisk4 0001 hdisk1 

In this example hd5 is mirrored, so the boot disks are hdisk4 and hdisk1.

3) Create a link between the boot device indicated and the /dev/ipldevice file. Enter: 

# ln /dev/boot_device_name /dev/ipldevice
(An example of boot_device_name is rhdisk0.)

In my case, I ran:

# ln /dev/rhdisk4 /dev/ipldevice

4) Now run the bosboot command again:

# bosboot -ad /dev/ipldevice 
Example

lslv -m hd5; ln /dev/rhdisk4 /dev/ipldevice; bosboot -ad /dev/ipldevice 
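
Step 2 can be automated by parsing the lslv output. The sketch below assumes
the column layout shown in the sample above (PV names in fields 3, 5 and 7);
boot_disks_from_lslv is a hypothetical helper. It only prints the candidate
ln commands, one per disk holding a copy of hd5, and you would run just one.

```shell
# boot_disks_from_lslv: hypothetical helper. Reads "lslv -m hd5" output on
# standard input and prints each physical volume holding a copy of hd5.
boot_disks_from_lslv() {
    # Skip the two header lines; PV names sit in fields 3, 5 and 7.
    awk 'NR > 2 { for (i = 3; i <= NF; i += 2) print $i }'
}

# Sample output copied from the example above.
sample='hd5:N/A
LP PP1 PV1 PP2 PV2 PP3 PV3
0001 0001 hdisk4 0001 hdisk1'

printf '%s\n' "$sample" | boot_disks_from_lslv | while read -r disk; do
    # Dry run: show the link command for the matching raw device.
    echo "ln /dev/r$disk /dev/ipldevice"
done
```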


Q:

p595 LPAR no longer sees SAN boot disk 

Hello, we have a weird and urgent problem with a few of our p595 LPARs running AIX 5.3. The LPARs ran AIX 5.3 TL 7 
and booted off EMC SAN disks, using EMC Powerpath. Every boot we run "pprootdev on" and 
"pprootdev fix". We can issue "bosboot -a" and we can reboot the machines.

Now, on two occasions, right after the update to AIX 5.3 Technology Level 9, Service Pack 3 
the system fails to reboot. When starting the partition to the SMS menu you can see the correct (rootvg) devices 
being scanned, but the devices are NOT listed as a possible boot device.

When booting off a NIM server and trying to restore an mksysb, the entire rootvg (all disks in it) 
is invisible and cannot be selected. Some user-defined volume groups are also "missing".

When giving the partition access to a 'new' EMC disk, this new disk shows up and can be used to restore 
the mksysb. When that is complete, the original disks show up properly (using lspv etc.) and seem perfectly alright.

Has anyone run into this same problem? Any ideas, suggestions, fixes?


A:

Being able to see a device but being unable to access the data area sounds like a SCSI disk reservation problem.



Note 9: Other mksysb errors on AIX 5.3:
---------------------------------------

It turns out that on AIX 5.3, at certain ML/TL levels (below TL 6), an mksysb error turns up
if you have volume groups defined other than rootvg, while NO filesystem has been created on
those volume groups.

Solution: create a filesystem, even only a "test" or "dummy" filesystem, on those VG's.


>> thread 1:

Q:

Hi 

can't find any information about "backup structure of volume group, vios". included service: 
"savevgstruct vgname" working with errors: 
# lsvg 
rootvg 
vg_dev 
datavg_dbs 
# /usr/ios/cli/ioscli savevgstruct vg_dev 

Creating information file for volume group vg_dev.. 

Some error messages may contain invalid information 
for the Virtual I/O Server environment. 

cat: 0652-050 Cannot open /tmp/vgdata/vg_dev/fs_data_tmp. 

# ls -al /tmp/vgdata/vg_dev/ 
total 16 
drwxr-xr-x 2 root staff 256 Apr 02 08:38 . 
drwxrwxr-x 5 root system 256 Apr 02 08:20 .. 
-rw-r--r-- 1 root staff 2002 Apr 02 08:35 filesystems 
-rw-r--r-- 1 root staff 1537 Apr 02 08:35 vg_dev.data 
# oslevel -r 
5300-05 
# df -k | grep tmp 
/dev/hd3 1310720 1309000 1% 42 1% /tmp 


A:

I had this issue as well with VIO 1.3. I called IBM support 
about it and it is a known issue. The APAR is IY87935. The fix 
will not be released until AIX 5.3 TL 6, which is due out in 
June. It occurs when you run savevgstruct on a user defined 
volume group that contains volumes where at least one does not 
have a filesystem defined on it. The workaround is to define a 
filesystem on every volume in the user defined volume group.


>> thread 2:

IBM APAR Note:

http://www-1.ibm.com/support/docview.wss?uid=isg1IY87935

IY87935: MKVGDATA/SAVEVG CAN FAIL


APAR status
Closed as program error.

Error description 
The mkvgdata command when executed on a volume group that does
not have any mounted filesystems:

  # savevg -f /home/vgbackup -i vg00

  Creating information file for volume group vg00..cat:
  0652-050 Cannot open /tmp/vgdata/vg00/fs_data_tmp.

  /usr/bin/savevg 33 :  BACKUPSHRINKSIZE = 16 + FSSHRINKSIZE :
  0403-009 The specified number is not valid for this command.

Local fix 

Problem summary 
The mkvgdata command when executed on a volume group that does
not have any mounted filesystems:

  # savevg -f /home/vgbackup -i vg00

  Creating information file for volume group vg00..cat:
  0652-050 Cannot open /tmp/vgdata/vg00/fs_data_tmp.

  /usr/bin/savevg 33 :  BACKUPSHRINKSIZE = 16 + FSSHRINKSIZE :
  0403-009 The specified number is not valid for this command.

Problem conclusion 
Check variable.

Temporary fix 

Comments 

APAR information 
APAR number IY87935 
Reported component name AIX 5.3 
Reported component ID 5765G0300 
Reported release 530 
Status CLOSED PER 
PE NoPE 
HIPER NoHIPER 
Submitted date 2006-08-09 
Closed date 2006-08-09 
Last modified date 2006-08-09 






9.11 AIX: the backup and restore commands:
------------------------------------------

The backup command creates copies of your files on a backup medium, such as a magnetic tape or diskette. 
You can also back up to a file on disk.
The copies are in one of two backup formats:

- Specific files and directories, backed up by name using the -i flag. 
- Entire file system backed up by i-node, not using the -i flag, 
  but instead using the Level and FileSystem parameters.

Unless you specify another backup medium with the -f flag, the backup command automatically
writes its output to /dev/rfd0, which is the diskette drive.

(1) Backing up the user directory "userdirectory":
# cd /userdirectory
# find . -depth | backup -i -f /dev/rmt0            # or use find . -print

(2) Incremental backups:
You can create full and incremental backups of filesystems as well, as shown in the following example.
The level number determines what is backed up: a level n backup copies only the files that have 
changed since the most recent backup at a lower level. For example, a level 5 backup only backs up 
the data that has changed since the level 4 backup was made. The -u flag records the backup so that 
later incremental backups can refer to it.
Levels can range from 0 to 9.

Example:

On Sunday:
# backup -0 -uf /dev/rmt0 /data
On Monday:
# backup -1 -uf /dev/rmt0 /data
..
..
On Saturday:
# backup -6 -uf /dev/rmt0 /data

Due to the -u parameter, information about the backups is written to the /etc/dumpdates file.
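
The weekly schedule above can be driven from cron with a small wrapper. The
following is a sketch only: backup_level_for_day is a hypothetical helper, and
the backup command is printed rather than executed.

```shell
# backup_level_for_day DAY: hypothetical helper mapping a weekday name to the
# backup level used in the Sunday-to-Saturday schedule above.
backup_level_for_day() {
    case "$1" in
        Sun*) echo 0 ;;
        Mon*) echo 1 ;;
        Tue*) echo 2 ;;
        Wed*) echo 3 ;;
        Thu*) echo 4 ;;
        Fri*) echo 5 ;;
        Sat*) echo 6 ;;
        *)    echo "unknown day: $1" >&2; return 1 ;;
    esac
}

# Dry run: print today's command rather than writing to the tape.
today=$(LC_ALL=C date +%a)
level=$(backup_level_for_day "$today")
echo "backup -$level -uf /dev/rmt0 /data"
```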

To backup the / (root) file system, enter: 
# backup  -0 -u -f /dev/rmt0 /

Note that we do not use the -i flag, but instead back up an entire fs "/". 

Other examples:
---------------

To backup all the files and subdirectories in current directory using relative pathnames, use
# find . -print | backup -if /dev/rmt0

To back up the files /bosinst.data and /signature to the diskette, use
# ls ./bosinst.data ./signature | backup -iqv

How to restore a file:
----------------------

Suppose we want to restore the /etc/hosts file, because it's missing.

# tctl -f /dev/rmt0 rewind                  # - rewind tape
# restore -x -d -v -q -s4 -f /dev/rmt0.1 ./etc/hosts

Another example:

# restore -qvxf /dev/rmt0.1 "./etc/passwd"     Restore /etc/passwd file 
# restore -s4 -qTvf /dev/rmt0.1                Lists contents of a mksysb tape 

More on checking and performing a restore:
------------------------------------------

Checking (or listing) the backup can be done with a command similar to the following example:

# restore -Tqf /save/backup_ddmmyy.backup 1>/dev/null 2>&1;echo $?

Because echo prints $?, the exit status of the restore command, a result of 0 means that listing the backup
was successful.
If you do not use the echo, the above command just shows the complete listing of the backup contents.
It does not do the actual restore, it just shows a listing.
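
The "discard all output, then print the exit status" idiom can be wrapped in a
function so that it works for any command. This is a minimal sketch;
check_quietly is a hypothetical helper name.

```shell
# check_quietly CMD [ARGS...]: hypothetical helper. Runs a command with all
# output discarded and prints only its exit status.
check_quietly() {
    status=0
    "$@" >/dev/null 2>&1 || status=$?
    echo "$status"
}

# On AIX the call would look like:
#   check_quietly restore -Tqf /save/backup_ddmmyy.backup
check_quietly true     # prints 0
check_quietly false    # prints a non-zero status
```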

If you actually want to restore the backup, use the following command:

restore -xdvqf /save/backup_ddmmyy.backup 1>/dev/null 2>&1;echo $?




9.12 AIX: savevg and restvg:
----------------------------

To backup, or clone, a VG, you can use the 

- mksysb command for the rootvg
- savevg command for other user VG's

To backup a user Volume Group (VG, see also sections 30 and 31) you can use savevg to backup a VG
and restvg to restore a VG.

# lsvg                                # - shows a list of online VG's
rootvg
uservg

# savevg -if /dev/rmt0 uservg         # - now backup the uservg


9.13 AIX: tctl:
---------------

Purpose
Gives subcommands to a streaming tape device.

Syntax
tctl [  -f Device ] [  eof | weof | fsf | bsf | fsr | bsr | rewind | offline |  rewoffl | erase | retension | reset | status ] [ Count ]

tctl [  -b BlockSize ] [  -f Device ] [  -p BufferSize ] [  -v ] [  -n ] [  -B ] {  read | write }

Description
The tctl command gives subcommands to a streaming tape device. If you do not specify the Device variable 
with the -f flag, the TAPE environment variable is used. If the environment variable does not exist, 
the tctl command uses the /dev/rmt0.1 device. (When the tctl command gives the status subcommand, 
the default device is /dev/rmt0.) The Device variable must specify a raw (not block) tape device. 
The Count parameter specifies the number of end-of-file markers, number of file marks, or number of records. 
If the Count parameter is not specified, the default count is 1.

Examples
To rewind the rmt1 tape device, enter: 
tctl  -f /dev/rmt1  rewind

To move forward two file marks on the default tape device, enter: 
tctl  fsf 2

To write two end-of-file markers on the tape in /dev/rmt0.6, enter: 
tctl  -f /dev/rmt0.6  weof 2

To read a tape device formatted in 80-byte blocks and put the result in a file, enter: 
tctl  -b 80  read > file

To read variable-length records from a tape device formatted in 80-byte blocks and put the result in a file, enter: 
tctl  -b 80  -n  read > file

To write variable-length records to a tape device using a buffer size of 1024 bytes, enter: 
cat file | tctl  -b 1024  -n  -f/dev/rmt1  write

To write to a tape device in 512-byte blocks and use a 5120-byte buffer for standard input, enter: 
cat file | tctl  -v  -f /dev/rmt1  -p 5120  -b 512  write


Note: The only valid block sizes for quarter-inch (QIC) tape drives are 0 and 512.
To write over one of several backups on an 8 mm tape, position the tape at the start of the backup file 
and issue these commands: 
tctl  bsf 1

tctl  eof 1


9.14 AIX mt command:
--------------------


Purpose
Gives subcommands to streaming tape device.

Syntax
mt [  -f TapeName ] Subcommand [ Count ]

Description
The mt command gives subcommands to a streaming tape device. If you do not specify the -f flag 
with the TapeName parameter, the TAPE environment variable is used. If the environment variable 
does not exist, the mt command uses the /dev/rmt0.1 device. The TapeName parameter must be a raw (not block) 
tape device. You can specify more than one operation with the Count parameter.


Subcommands

eof, weof Writes the number of end-of-file markers specified by the Count parameter at the 
          current position on the tape. 
fsf       Moves the tape forward the number of files specified by the Count parameter and positions 
          it to the beginning of the next file. 
bsf       Moves the tape backwards the number of files specified by the Count parameter and positions 
          it to the beginning of the last file skipped. If using the bsf subcommand would cause the tape head 
          to move back past the beginning of the tape, then the tape will be rewound, and the mt command will return EIO. 
fsr       Moves the tape forward the number of records specified by the Count parameter. 
bsr       Moves the tape backwards the number of records specified by the Count parameter. 
rewoffl, rewind Rewinds the tape. The Count parameter is ignored. 
status    Prints status information about the specified tape device. The output of the status command 
          may change in future implementations 

Examples
To rewind the rmt1 tape device, enter: 

mt -f /dev/rmt1 rewind
To move forward two files on the default tape device, enter: 

mt fsf 2
To write two end-of-file markers on the tape in the /dev/rmt0.6 file, enter: 

mt -f /dev/rmt0.6 weof 2


9.14 AIX tapeutil command:
--------------------------

tapeutil -f <devicename> <commands>
- A program that comes with the tape library to control its operation. Called without arguments, it gives a menu. 
It is useful for doing things like moving tapes from a slot to the drive, e.g.

$ tapeutil -f /dev/smc0 move -s 10 -d 23 

which moves the tape in slot 10 to the drive (obviously, this will depend on your own individual tape library, 
may I suggest the manual?). 

The fileset you need to install for 'tapeutil' command is:
Atape.driver 7.1.5.0.

Example:
--------

We are using a 3583 automated tape library for backups. For the tapeutil command you need to have the file
atape.sys on your system. To identify the positions of the tape drives and the source slot, just type tapeutil;
it will give you a number of options. Choose "element information" to identify the source and tape drive numbers.
In our case the tape drive numbers are 256 and 257, and the source number to insert the tape is 16.
We usually give the following commands to load and move the tape.

Loading the tape:
tapeutil -f /dev/smc0 move -s 16 -d 256
(inserts the tape in tape drive 1, where 16 is the source and 256 is the destination)

Taking the backup:

find filesystem1 filesystem2 | backup -iqvf /dev/rmt1

(file system names without the leading mount point slash)

Unloading the tape after the backup:

tapeutil -f /dev/rmt1 unload

tapeutil -f /dev/smc0 move -s 256 -d 16

(first unload the tape, then move it back to the source slot)

This might help you to use the tapeutil command when taking backups.

Example:
--------

In order to move tapes in and out of the Library here is what I do.

First  I unload the tape with the command  #tapeutil -f /dev/rmtx unload
Where x is 0,1,2,3...
Then I move the tape to the external slot (16) using the media changer, not the tape drive.

#tapeutil -f /dev/smcx move 256 16
The above command moves the tape in your first tape drive (256) to the external slot.
Note that you can also move from the internal slots to the external slot or the tape drive.
To move the tape back from the external slot, I just switch 256 and 16 parameters.


Example:
--------

The code I use to list the I/O station slots is:

/usr/bin/tapeutil -f /dev/smc0 inventory | grep -p Station | egrep 'Station|Volume' | awk '{
if($1 == "Import/Export") ioslot=$4;
if($1 == "Volume") {
      if(NF == 4) volser=$4;
      else volser="-open-";
      print ioslot, volser;
}}'

The tapeutil command to move a tape is:

/usr/bin/tapeutil -f /dev/smc0 move <fromslot> <toslot>

For example:  /usr/bin/tapeutil -f /dev/smc0 move 773 1037

You can get the slot numbers, and volsers in them, with the command:
/usr/bin/tapeutil -f /dev/smc0 inventory

To find an open slot just look for a slot with a blank "Volume Tag".

One little hitch, however.  If a tape is currently mounted, the "tapeutil inventory" command will show a
slot as open ("Volume Tag" is blank), but TSM will have it reserved for the mounted tape.  So what I did
in my script is to check the TSM device configuration file for each open slot that I find, and if that
slot number appears in it then I skip that slot and go on to the next one.


Example:
--------

#!/bin/ksh
DEVICE=$1
HOST=$2
TAPE=$3
case $TAPE in
2) tapeutil -f /dev/smc0 move 23 10
      tapeutil -f /dev/smc0 move 11 23
;;
3) tapeutil -f /dev/smc0 move 23 11
      tapeutil -f /dev/smc0 move 12 23
;;
4) tapeutil -f /dev/smc0 move 23 12
      tapeutil -f /dev/smc0 move 13 23
;;
5) tapeutil -f /dev/smc0 move 23 13
      tapeutil -f /dev/smc0 move 14 23
;;
esac

Example:
--------

tapeutil -f /dev/rmt1 unload 
tapeutil -f /dev/smc0 move 257 16 
tapeutil -f /dev/smc0 move -s 256 -d 16
tapeutil -f /dev/smc0 move 257 1025 
tapeutil -f /dev/smc0 move 16 257 

tapeutil -f /dev/smc0 exchange 34 16 40
tapeutil -f /dev/smc0 inventory | more
tctl -f/dev/rmt0 rewoffl
tapeutil -f/dev/smc0 elementinfo
tapeutil -f /dev/smc0 inventory



Example:
--------

tapeutil -f /dev/rmt1 unload 
sleep 20

DAYNO=`date +%d`;export DAYNO

case $DAYNO in
01) tapeutil -f /dev/smc0 move 23 10
      tapeutil -f /dev/smc0 move 11 23
;;
02) tapeutil -f /dev/smc0 move 23 10
      tapeutil -f /dev/smc0 move 11 23
;;
03) tapeutil -f /dev/smc0 move 23 10
      tapeutil -f /dev/smc0 move 11 23
;;
04) tapeutil -f /dev/smc0 move 23 10
      tapeutil -f /dev/smc0 move 11 23
;;
05) tapeutil -f /dev/smc0 move 23 10
      tapeutil -f /dev/smc0 move 11 23
;;
06) tapeutil -f /dev/smc0 move 23 10
      tapeutil -f /dev/smc0 move 11 23
;;
07) tapeutil -f /dev/smc0 move 23 10
      tapeutil -f /dev/smc0 move 11 23
;;
esac

Example:
--------

tapeutil -f /dev/rmt1 unload 
sleep 20

DAYNAME=`date +%a`;export DAYNAME

case $DAYNAME in
Sun) tapeutil -f /dev/smc0 move 256 4098
tapeutil -f /dev/smc0 move 4099 256      
;;
Mon) tapeutil -f /dev/smc0 move 256 4099
tapeutil -f /dev/smc0 move 4100 256    
;;
Tue) tapeutil -f /dev/smc0 move 256 4100
tapeutil -f /dev/smc0 move 4113 256      
;;
Wed) tapeutil -f /dev/smc0 move 256 4113
tapeutil -f /dev/smc0 move 4114 256     
;;
Thu) tapeutil -f /dev/smc0 move 256 4114
tapeutil -f /dev/smc0 move 4109 256    
;;
Fri) tapeutil -f /dev/smc0 move 256 4109
tapeutil -f /dev/smc0 move 4124 256      
;;
Sat) tapeutil -f /dev/smc0 move 256 4124
tapeutil -f /dev/smc0 move 4110 256      
;;
esac

tapeutil -f /dev/smc0 move 256 4098
tapeutil -f /dev/smc0 move 4099 256
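
The day-of-week case statement above can also be written as a table lookup, which keeps the slot numbers
in one place. This is a sketch only: the function names are made up, the slot numbers are copied from the
case statement, drive element 256 and /dev/smc0 are taken from the example, and `date +%a` is assumed to
print English day names (C locale).

```shell
# Sketch: table-driven version of the day-of-week rotation.
# SLOTS lists the storage slots in Sun..Sat order, followed by the slot
# that is loaded after Saturday's unload.
SLOTS="4098 4099 4100 4113 4114 4109 4124 4110"

# Print "<slot to unload into> <slot to load from>" for a day name.
slots_for_day() {
    case $1 in
        Sun) n=1 ;; Mon) n=2 ;; Tue) n=3 ;; Wed) n=4 ;;
        Thu) n=5 ;; Fri) n=6 ;; Sat) n=7 ;;
        *) return 1 ;;
    esac
    set -- $SLOTS
    shift $((n - 1))
    echo "$1 $2"
}

# Move the tape out of drive 256, then load the next one.
rotate_tape() {
    pair=$(slots_for_day "$1") || return 1
    set -- $pair
    tapeutil -f /dev/smc0 move 256 "$1" &&
    tapeutil -f /dev/smc0 move "$2" 256
}

# Usage:
# tapeutil -f /dev/rmt1 unload; sleep 20
# rotate_tape "$(date +%a)"
```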

Example:
--------



tapeutil -f /dev/smc0 move 16 4096
sleep 10
tapeutil -f /dev/smc0 move 17 4097
sleep 10
tapeutil -f /dev/smc0 move 18 4098
sleep 10
tapeutil -f /dev/smc0 move 19 4099
sleep 10
tapeutil -f /dev/smc0 move 20 4100
sleep 10
tapeutil -f /dev/smc0 move 21 4101
sleep 10
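
The same sequence can be written as a loop. A minimal sketch, assuming the slot numbering of the commands
above (I/O station slots 16-21, storage slots 4096-4101); load_from_io_station is a made-up name.

```shell
# Sketch: move the tapes in I/O station slots 16-21 into storage slots
# 4096-4101, pausing 10 seconds after each move.
load_from_io_station() {
    i=0
    while [ "$i" -le 5 ]; do
        tapeutil -f /dev/smc0 move $((16 + i)) $((4096 + i)) || return 1
        sleep 10
        i=$((i + 1))
    done
}

# Usage:
# load_from_io_station
```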

Example:
--------

mt -f /dev/rmt1  rewind
mt -f /dev/rmt1.1 fsf 6
tar -xvf /dev/rmt1.1 /data/download/expdemo.zip
SPL bld


About Ts3310:
-------------

Abstract 
Configuration Information for IBM TS3310 (IBM TotalStorage 3576)  
  
Content 

IBM TS3310 (IBM TotalStorage 3576)

Drive Addresses   Storage Slot Addresses   Changer Address   Entry/Exit Slot Address
256-261           4096-4223                1                 16-21

Notes:

1. Barcodes are required. Without a barcode label, a volume will show as unknown media.

2. ELEMent=AUTODetect in the DEFINE/UPDATE DRIVE command is supported.

3. Device identification and firmware used during validation 
Library ID: IBM 3576-MTL --- Firmware: 0.62

4. The IBM device driver is required. The IBM device drivers are available at ftp://ftp.software.ibm.com/storage/devdrvr.

5. The library is available with IBM LTO Generation 3 drives.

6. For more information on IBM TS3310, see TS3310 Tape Library.

 


Example:
--------

First, list the tape device names: 
lsdev -Cc tape
Assume it returns smc0 for the library, and rmt0 and rmt1 for the tape drives, and all devices are Available. 

Next, take an inventory of the library. 
tapeutil -f /dev/smc0 inventory | more
Assume the inventory returns two drives with element numbers 256 and 257 and shows a tape stored in slot 1025. 

Then, start moving the tape to each drive in turn, and verify which device name it is associated with 
by running tctl or mt rewoffl. If it returns without error, the device name matches the element number. 

Move the tape from the tape slot to the first drive: 
tapeutil -f /dev/smc0 move 1025 256
tctl -f/dev/rmt0 rewoffl
If the command returns with no errors, then element # 256 matches device name /dev/rmt0. 

Move the tape to the next drive 
tapeutil -f /dev/smc0 move 256 257
tctl -f/dev/rmt1 rewoffl
If the command returns with no errors, then element # 257 matches device name /dev/rmt1

Move the tape back to the storage slot it came from: 
tapeutil -f /dev/smc0 move 257 1025 

If at any point, the tctl command returns with errors, then try another device name until it returns without errors. 

NOTE: the 'rewoffl' flag on tctl simply rewinds and ejects the tape from the drive. 
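
The "try another device name until it returns without errors" step can be sketched as a small loop.
The function name is made up, and the slot and element numbers in the usage lines are the example values
from the procedure above, not universal ones.

```shell
# Sketch: with a tape already moved into one drive element, try each
# candidate device name until rewoffl succeeds and print the match.
# find_device_for_loaded_tape is a hypothetical helper name.
find_device_for_loaded_tape() {
    for dev in "$@"; do
        if tctl -f "/dev/$dev" rewoffl >/dev/null 2>&1; then
            echo "$dev"
            return 0
        fi
    done
    return 1
}

# Usage:
# tapeutil -f /dev/smc0 move 1025 256
# find_device_for_loaded_tape rmt0 rmt1    # prints the device behind element 256
# tapeutil -f /dev/smc0 move 256 1025
```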


9.15 Recover from AIX OS failure:
---------------------------------

Recover from OS failure.								
								
Contents:								
1. How to view the bootlist:								
2. How to change the bootlist:								
3. How to make a device bootable:								
4. How to make a backup of the OS:								
5. Shutdown a pSeries AIX system in the most secure way:								
6. How to restore specific files from a mksysb tape:								
7. Recovery of rootvg								
								

1. How to view the bootlist:								
								
At boottime, once the POST is completed, the system will search the boot list for a								
bootable image. The system will attempt to boot from the first entry in the bootlist.								
It's always a good idea to see which devices the OS considers bootable, and in which order it
will try them. Use the bootlist command to view the order:
								
# bootlist -m normal -o								
								
As the first item returned, you will see hdisk0, the bootable harddisk.								
								
If you need to check the bootlist in "service mode", for example if you want to boot from tape to restore the rootvg, use								
								
# bootlist -m service -o								
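
A check like this can be scripted, for example in a health-check job. This is a sketch with made-up
function names; it assumes that the device name is the first word on the first line of the
"bootlist -o" output (newer AIX levels may append extra fields such as blv=hd5 on the same line).

```shell
# Sketch: print the first device in the given bootlist (default: normal).
first_boot_device() {
    bootlist -m "${1:-normal}" -o | awk 'NR == 1 { print $1 }'
}

# Compare the first normal-mode boot device with the expected one.
check_first_boot_device() {
    expected=$1
    actual=$(first_boot_device normal)
    if [ "$actual" = "$expected" ]; then
        echo "OK: first boot device is $actual"
    else
        echo "WARNING: first boot device is $actual, expected $expected" >&2
        return 1
    fi
}

# Usage:
# check_first_boot_device hdisk0
```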
								
								
2. How to change the bootlist:								
								
The bootlist, in normal operations, can be changed using the same command as used in section 1, for example								
								
# bootlist -m normal hdisk0 cd0								
								
This command makes sure the hdisk0 is the first device used to boot the system.								
								
If you want to change the bootlist for the system in service mode, you can change the list in order to use rmt0								
if you need to restore the rootvg.								
								
# bootlist -m service rmt0								
								
								
3. How to make a device bootable:								
								
To make a device bootable, use the bosboot command:								
								
# bosboot -ad /dev/ipldevice								
								
So, if hdisk0 must be bootable, or you want to be sure it's bootable, use
								
# bosboot -ad /dev/hdisk0								
								
								
4. How to make a backup of the OS:								
								
The mksysb command creates an installable image of the rootvg. In other words, mksysb creates
a backup of the operating system (that is, the root volume group).
								
You can use this backup to reinstall a system to its original state after it has been corrupted. 								
If you create the backup on tape, the tape is bootable and includes the installation programs 								
needed to install from the backup.								
								
To generate a system backup and create an /image.data file (generated by the mkszfile command) to a tape device 								
named /dev/rmt0, type: 								
								
# mksysb -i /dev/rmt0								
								
If a backup tape was created with the -e switch, like in:								
								
# mksysb -i -e /dev/rmt0								
								
then a number of directories are NOT included in the backup. These exclusions are listed in the "/etc/exclude.rootvg" file.								
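
Each line of the exclude file is a grep-style pattern that is matched against the path names in the backup,
which are written relative to / (for example ./tmp/file). The helper below is a sketch with a made-up name
that appends one pattern per directory; /scratch in the usage line is a hypothetical directory.

```shell
# Sketch: append an exclude pattern for the contents of each directory given.
# add_excludes is a hypothetical helper name.
add_excludes() {
    file=$1
    shift
    for dir in "$@"; do
        # /scratch becomes ^./scratch/ : files below the directory are
        # skipped, the directory entry itself is still backed up.
        printf '%s\n' "^.${dir%/}/" >> "$file"
    done
}

# Usage (as root):
# add_excludes /etc/exclude.rootvg /scratch /tmp
# mksysb -i -e /dev/rmt0
```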
								
Run the mksysb command regularly, and certainly after installing applications or devices.
Even when the OS does not change, create a fresh bootable tape at a regular interval.
								
								
5. Shutdown a pSeries AIX system in the most secure way:								
								
1. Shut down all applications in a controlled way.								
2. Make sure no users are on the system.								
3. Use the shutdown command:								
								
shutdown -r		to reboot the system
shutdown -m		to bring the system down to maintenance (single-user) mode
								
								
6. How to restore specific files from a mksysb tape:								
								
$ tctl fsf 3								
$ restore -xvf /dev/rmt0.1 ./your/file/name								
								
For example, if you need to get the vi command back, put the mksysb tape in the tape drive (in this case, /dev/rmt0) 								
and do the following:								
								
cd /                         # get to the root directory								
tctl -f /dev/rmt0 rewind     # rewind the tape								
tctl -f /dev/rmt0.1 fsf 3    # skip forward 3 files to reach the rootvg backup, no rewind
restore -xqf /dev/rmt0.1 -s 1 ./usr/bin/vi    # extract the vi binary, no rewind								
								
Further explanation of why you must use fsf 3 (forward space file, count 3):
The format of the tape is as follows:
1. A BOS boot image
2. A BOS install image
3. A dummy Table Of Contents
4. The system backup of the rootvg

So if you just need to restore some files, first skip the three files that precede the backup; the tape is then
at file position 3, counting from 0, which is the system backup of the rootvg.
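
The steps for restoring single files can be wrapped in one function. This is a sketch with a made-up
function name; it uses exactly the commands shown above and assumes that the ".1" suffix selects the
no-rewind device of the given drive.

```shell
# Sketch: restore one or more paths from a mksysb tape.
# The body runs in a subshell so the caller's working directory is kept.
restore_from_mksysb() (
    drive=$1                                # for example /dev/rmt0
    shift
    cd / || exit 1                          # paths on the tape are relative to /
    tctl -f "$drive" rewind || exit 1
    tctl -f "$drive.1" fsf 3 || exit 1      # skip the 3 files before the backup
    restore -xqf "$drive.1" -s 1 "$@"
)

# Usage:
# restore_from_mksysb /dev/rmt0 ./usr/bin/vi
```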
								
								
7. Recovery of rootvg								
								
7.1 Check if the system can boot from tape:
# bootinfo -e

If a 1 is returned, the system can boot from tape, if a 0 is returned a boot from tape is not supported.
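
In a script the same check can be used as a condition. A minimal sketch; can_boot_from_tape is a
made-up name.

```shell
# Sketch: succeed only if bootinfo -e reports 1 (boot from tape supported).
can_boot_from_tape() {
    [ "$(bootinfo -e)" = "1" ]
}

# Usage:
# if can_boot_from_tape; then
#     echo "boot from tape is supported"
# else
#     echo "boot from tape is NOT supported" >&2
# fi
```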

7.2 Recover the rootvg:

One possible method is the following:
1. Check whether the tape is in front of the disk with the bootlist command:
   # bootlist -m normal -o
2. Insert the mksysb tape
3. Power on the machine. The system will boot from the tape.
4. The Installation and Maintenance Menu will be displayed.



                      Welcome to Base Operating System
                      Installation and Maintenance

Type the number of your choice and press Enter.  Choice is indicated by >>>.

>>> 1 Start Install Now with Default Settings

    2 Change/Show Installation Settings and Install

    3 Start Maintenance Mode for System Recovery


Type 3 and press enter to start maintenance mode.
   The next screen you should see is :-

                    Maintenance 

Type the number of your choice and press Enter.

>>> 1 Access a Root Volume Group 
    2 Copy a System Dump to Removable Media
    3 Access Advanced Maintenance Functions
    4 Install from a System Backup

>>> Choice [1]: 

Type 4 and press enter to install from a system backup.
   The next screen you should see is :-

                    Choose Tape Drive

Type the number of the tape drive containing the system backup to be
installed and press Enter.

      Tape Drive                     Path Name

>>> 1 tape/scsi/ost                  /dev/rmt0

>>> Choice [1]:  

Type the number that corresponds to the tape drive that the mksysb tape 
   is in and press enter.
   The next screen you should see is :-

                      Welcome to Base Operating System
                      Installation and Maintenance

Type the number of your choice and press Enter.  Choice is indicated by >>>.

>>> 1 Start Install Now with Default Settings

    2 Change/Show Installation Settings and Install

    3 Start Maintenance Mode for System Recovery


                       +-----------------------------------------------------
    88  Help ?         |Select 1 or 2 to install from tape device /dev/rmt0
    99  Previous Menu  |
                       | 
>>> Choice [1]: 

You can now follow your normal mksysb restore procedures.


9.16 HP-UX make_net_recovery:
----------------------------- 

Backup & Restore HPUX systems using Ignite

----------------------------------------------------------

To protect your system data, create an operating system recovery image to be used 
in the event of cold-install or update problems. The Ignite-UX server has two commands 
you can use to create an operating system recovery image:

-make_net_recovery

Use this command to create an operating system recovery image and store it on an 
Ignite-UX server on the network. This command works on any system that has Ignite-UX installed.

-make_tape_recovery

Use this command to create an operating system recovery image on a bootable recovery tape. 
This command works on any system that has a local tape drive and Ignite-UX installed. 
This command also works on any system without an Ignite-UX server.



===============
Note 1:
===============

This is a short howto for making an Ignite backup on HP-UX. 
I think Ignite backups are one of the best ways to back up HP-UX, 
because the tape is bootable, so you can do a full restore, and you can also 
restore a single file if you need to.

Example: make the backup:

# make_tape_recovery -x inc_entire=vg00 -I -v -a /dev/rmt/4mn 

# make_tape_recovery -x inc_entire=vg00 -x inc_entire=vg01

The second command makes a bootable Ignite backup of vg00 and vg01 on the default tape device (/dev/rmt/0mn). 
A few interesting options are: 
-a defines the tape device; -d defines a description; -t defines a tape label.

The 'n' at the end of a device name like 4mn means no-rewind. 

other examples:

# /opt/ignite/bin/make_tape_recovery -Av

where -A includes the entire root disk or volume group and -v selects verbose mode. 
You can also specify more than one volume group with the -x option.

If you intend to use a tape drive other than the default (/dev/rmt/0m), modify the command 
to point to the device you want to use, for example, a tape drive at /dev/rmt/3mn:

# /opt/ignite/bin/make_tape_recovery -Av -a /dev/rmt/3mn

Restore:

To recover a failed system disk or volume group after an operating system recovery tape 
has been made, simply load the recovery tape, boot the system and interrupt 
the boot sequence to redirect to the tape drive. Allow the install process to complete. 
Do not intervene. The system will reboot and, because map files for all associated 
volume groups have been saved on the tape, any other existing volume groups are 
imported and mounted automatically. Data that is not in the root volume group 
must be backed up and recovered using normal backup utilities.

Verify the backup:

Even more important than making the backup is checking that it is OK. 
In my experience this is the best way.

First rewind the tape with:

# mt rew

Then copy the boot part of the tape to a file and check it with lifls:

# dd if=/dev/rmt/0mn of=/tmp/bootimage bs=2k

# lifls -l /tmp/bootimage

Second, check the contents.

Rewind the tape:

# mt rew

Forward the tape to the second file:


# mt fsf 1

List all files on the tape and write the list to a file. Afterwards you can compare the file list with 
the reference of your choice.

# tar -tvf /dev/rmt/0mn > /tmp/files_on_tape.txt 

There are different opinions on how to verify that all the files you wanted to back up 
are on the tape. Many people compare the listing only with the Ignite 
flist (/var/opt/ignite/recovery/latest/flist), which contains a list of 
all files backed up by Ignite. I prefer to compare it with the real filesystem, 
with the real files which I want to back up. So I run find over the filesystem which 
I have backed up and compare the result with the tape. There are usually some discrepancies 
between the filesystem and the tape, mostly unimportant temporary files. 
Decide for yourself which way you prefer and which tools you use for it. 
I just use some standard Unix tools like find, sort, diff, sdiff.
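
One way to do this comparison with standard tools is sketched below. It is an illustration, not the
author's script: missing_on_tape is a made-up name, both inputs are assumed to be plain lists of path
names (one per line), and the names from find and from the tape listing are assumed to be written in the
same form (same leading "./" or "/").

```shell
# Sketch: print the names that are in the filesystem list but not in the
# tape list. comm needs sorted input, so both lists are sorted first.
missing_on_tape() {
    fs_list=$1
    tape_list=$2
    tmp=${TMPDIR:-/tmp}/cmp.$$
    sort "$fs_list"   > "$tmp.fs"
    sort "$tape_list" > "$tmp.tape"
    comm -23 "$tmp.fs" "$tmp.tape"
    rm -f "$tmp.fs" "$tmp.tape"
}

# Usage (tape positioned at the archive, as described above):
# tar -tf /dev/rmt/0mn > /tmp/names_on_tape.txt
# find /etc /stand -print > /tmp/names_on_disk.txt
# missing_on_tape /tmp/names_on_disk.txt /tmp/names_on_tape.txt
```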

Restore the backup:

To do a full restore, just put the tape in and boot the server from the tape. 
When it has finished, check that the number of mirror copies of the lvols is correct; 
this is one thing that is normally not recovered properly.

To restore a single file, put in the tape and run

# mt fsf 1

and then use the normal tar command to restore what you need.


===============
Note 2:
===============

How to make an Ignite backup in HP-UX?

First check that the tape drive is detected:
# ioscan -fnC tape
Check the tape status:
# mt -f /dev/rmt/0mn status
Make the recovery tape.
To copy ALL of vg00, use the following command: 

# make_tape_recovery -v -A -I -x inc_entire=vg00 -a /dev/rmt/0mn 
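
The status check and the backup can be chained so that the backup only starts when the drive answers.
This is a sketch; ignite_backup is a made-up name and the commands and options are the ones shown above.

```shell
# Sketch: run make_tape_recovery only if the tape drive reports a status.
ignite_backup() {
    tape=${1:-/dev/rmt/0mn}
    if ! mt -f "$tape" status >/dev/null 2>&1; then
        echo "tape drive $tape is not ready" >&2
        return 1
    fi
    make_tape_recovery -v -A -I -x inc_entire=vg00 -a "$tape"
}

# Usage:
# ignite_backup /dev/rmt/0mn
```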




===============
Note 3:
===============

Hi I have a spare disk for my server so was thinking of configuring for backup / restore usage. 
Can we actually use ignite backup/restore to/from disk? 

Use "make_net_recovery" ....
 
Ignite's primary usage is as a disaster recovery tool. It is designed to rebuild 
the root volume of a server so that you can then go on and use your 
regular backup/restore software to recover all your data.

Based on that assumption, that it is a recovery tool, it doesn't really make sense 
to use it to back a server up to itself. People tend to think of Ignite as a backup/restore tool 
when that is hardly the case, at least as originally (and best) intended


Select a server, install the Ignite package there.

You can use the GUI interface ( ignite ) to then add a system and kick off an image creation. 
Using the GUI first simplifies the config and installation of the ignite utility on the target.

Once done you can run "make_net_recovery" on the target as often as you like. 
From there it will use the configs you specified during the GUI setup.

For booting a new system from the ignite server it depends on the type of hardware; 
PA-RISC is different from IPF. Either way you boot from the LAN, specifying the ignite server, 
and from there it is a standard ignite installation.

For cloning, there are a number of ways, a golden image for one. 
I would read the Ignite chapter on golden images, as you need to copy 
your installation media to the Ignite server. 

A simple trick for cloning one system to another is as follows ( some details are in the Ignite doc).

Say ServerA is the source image and ServerB is the new system.

create a directory in /var/opt/ignite/clients/0xMACOFNEWSERVER
chown it to bin:bin
ln -s /var/opt/ignite/clients/0xMACOFNEWSERVER hostname

cd /var/opt/ignite/clients/ServerA
find CINDEX recovery |cpio -pmdv ../ServerB

Boot ServerB from LAN. Select Ignite server installation, 
fill in the ignite server information, hostname, IP etc...

Ignite will now recover serverB using the archive of ServerA. Change the hostname,IP, filesystems, 
etc, via the Ignite installer menus.

There are a number of options and alternate methods. Let us know if you have questions.

===============
Note 4:
===============

HP-UX has software to back up the OS called “Ignite”. It is bundled with the installed OS. 
The easy way to make an OS backup is to use a tape data cartridge. Just load the tape into the tape library 
and run the command “make_tape_recovery -AI”. 

To restore the OS from the recovery tape, go to the MP (management processor), boot from tape (Sequential media) 
and follow the instructions.

To simplify managing the archive of OS backups, we can use an “Ignite Server”. 
We can make the OS backup to disk (via the network) on another server (the Ignite Server). 
The command is 

# make_net_recovery -s ignite_server -x inc_entire=vg00

To recover the OS using the Ignite server (client and Ignite server in the same IP subnet):

1. Boot the failed system from the Ignite server (boot lan install or boot lan.ip_ignite_server install)

2. Do not interact with ISL

3. Select “Install HP-UX” and follow the instructions to fill in the server configuration (network, hostname, etc)

4. Select the recovery configuration to use

If the client is in a different subnet, we must use a boot helper server. 
The boot helper server is in the same subnet as the client. To set up the boot helper:

* for example: IP Ignite server is 10.2.2.1 & GW 10.2.2.254, 
OS to be restored is restore01 and target client is client01 (10.2.3.1 & GW 10.2.3.254)

1. Make sure the boot helper server has the same Ignite software version as the client & Ignite server

2. Point the installation to Ignite server

# instl_adm -t 10.2.2.1

3. To verify the correct configuration in boot helper, run command instl_adm -d

4. Specify the temporary IP address at the boot helper that the client can use to boot. Edit /etc/opt/ignite/instl_boottab

5. Copy the CINDEX file & recovery directory from the source (OS to be restored) to the target client (server used to restore).

at Ignite Server:

# cd /var/opt/ignite/clients/restore01

# find CINDEX recovery | cpio -pvdma ../client01

# edit CINDEX file at target client (/var/opt/ignite/clients/client01/CINDEX)

- make a full path for system_cfg, control_cfg & archive_cfg. (/var/opt/ignite/clients/client01/../…..)

6. Boot to boot helper or direct to ignite server

* refer to step mentioned above.

===============
Note 5:
===============

make_recovery -AC -d /dev/rmt/0m 

According to the HP-UX documentation: 

make_recovery(103): The make_recovery command will be replaced by make_tape_recovery. 
Both commands are supported in this release (Ignite-UX Revision 3.2). Please read 
the man page make_tape_recovery(1M) for usage of the new command. 
In a future release, make_recovery will be replaced by a script that calls make_tape_recovery


===============
Note 6:
===============

1. Boot the system
2. When it asks for a key press to interrupt the boot sequence, press any key.
3. This will give you the ISL prompt.
4. Do a search for the devices, i.e. the SEA command.
5. Check which is the tape device (it will be listed as a Sequential Device).
6. At the ISL prompt give the command: BO (tape device name, i.e. P1, P2 etc.).
7. Press N for IPL.
8. It will boot from the tape.
9. Here you can select interactive or non-interactive recovery 
(this depends on if you have given options while creating the backup).  
For a restore I used the non-interactive mode.

A menu appears. Choose "Install HP-UX". This will recover your system. 
If you are restoring to the same system you made the ignite tape from, you can use 
the non-interactive recovery. If you are restoring to a different system, it is best to use the advanced installation.



===============
Note 7:
===============

On PA RISC:


- To boot HP-UX in single-user mode:

ISL> hpux -is boot /stand/vmunix


- To boot HP-UX at the default run level:

ISL> hpux boot /stand/vmunix


Show boot device related information:

# lvlnboot -v `vgdisplay | grep "VG Name" | awk '{print $3}'`
# setboot
# kmpath
# cat /etc/fstab       (and notice the "/" and "/stand" partitions/disks)
# lifls -l device
# diskinfo device


===============
Note 8:
===============


There are two ways you can recover from a tape with make_net_recovery. The method you choose depends on your needs.

- Use make_medialif
This method is useful when you want to create a totally self-contained recovery tape. The tape will be bootable 
and will contain everything needed to recover your system, including the archive of your system. During recovery, 
no access to an Ignite-UX server is needed. Using make_medialif is described beginning on 
"Create a Bootable Archive Tape via the Network" and also on the Ignite-UX server in the file: 
/opt/ignite/share/doc/makenetrec.txt

- Use make_boot_tape
This method is useful when you do not have the ability to boot the target machine via the network, but are still 
able to access the Ignite-UX server via the network for your archive and configuration data. This could happen 
if your machine does not support network boot or if the target machine is not on the same subnet as the 
Ignite-UX server. In these cases, use make_boot_tape to create a bootable tape with just enough information 
to boot and connect with the Ignite-UX server. The configuration files and archive are then retrieved from the 
Ignite-UX server. See the make_boot_tape(1M) manpage for details. 


-- make_boot_tape:

make_boot_tape(1M)                                       make_boot_tape(1M)

 NAME
      make_boot_tape - make a bootable tape to connect to an Ignite-UX
      server

 SYNOPSIS
      /opt/ignite/bin/make_boot_tape [-d device-file-for-tape] [-f config-
           file] [-t tmpdir] [-v]

      /opt/ignite/bin/make_boot_tape [-d device-file-for-tape] [-g gateway]
           [-m netmask] [-t tmpdir] [-v]

 DESCRIPTION
      The tape created by make_boot_tape is a bootable tape that contains
      just enough information to boot the system and then connect to the
      Ignite-UX server where the tape was created.  Once the target system
      has connected with the Ignite-UX server, it can be installed or
      recovered using Ignite-UX.  The tape is not a fully self-contained
      install tape; an Ignite-UX server must also be present.  The
      configuration information and software to be installed on the target
      machine reside on the Ignite-UX server, not on the tape.  If you need
      to build a fully self-contained recovery tape, see make_recovery(1m)
      or make_media_lif(1m).

      make_boot_tape is used in situations when you have target machines
      that cannot boot via the network from the Ignite-UX server.  This
      happens either because the machine does not support booting from the
      network or because it is not on the same subnet as the Ignite-UX
      server.  In this case, booting from a tape generated by make_boot_tape
      means you do not need to set up a boot helper system.  A tape created
      by make_boot_tape can be used to kick off a normal Ignite-UX
      installation.  It can also be used to recover from recovery
      configurations saved on the Ignite-UX server.

      There is no "target-specific" information on the boot tape.  Only
      information about the Ignite-UX server is placed on the tape.  Thus,
      it is possible to initiate an installation of any target machine from
      the same boot tape provided that the same Ignite-UX server is used.
      Likewise, the target machine can be installed with any operating
      system configuration that is available on the Ignite-UX server.

      Typically, the make_boot_tape command is run from the Ignite-UX server
      that you wish to connect with when booting from the tape later on.

      A key file that contains configuration information is called
      INSTALLFS. This file exists on the Ignite-UX server at
      /opt/ignite/boot/INSTALLFS and is also present on the tape created by
      make_boot_tape. See instl_adm(4) for details on the configuration file
      syntax.  Unless the -f option is used, the configuration information
      already present in the INSTALLFS file is used on the tape as well.
      The make_boot_tape command will never alter the INSTALLFS file on the
      Ignite-UX server; it will only change the copy that is placed on the
      tape.

Examples:
---------

      Create a boot tape on the default tape drive (/dev/rmt/0m).

          # make_boot_tape

      Create a boot tape on a specified (non-default) tape drive. Create a
      DDS1 device file for the tape drive first.  Show as much information
      about the tape creation as is possible.

           ioscan -fC tape     # to get the hardware path
           mksf -v -H <hardware path> -b DDS1 -n -a
           make_boot_tape -d /dev/<devfile created by mksf> -v

      Create a boot tape and replace the configuration information contained
      in the INSTALLFS file.  Use the /tmp directory for all temporary files
      instead of the default /var/tmp.

          # instl_adm -d > tmp_config_file
           ## edit tmp_config_file as appropriate
          # make_boot_tape -f tmp_config_file -t /tmp

      Create a boot tape and specify a different gateway IP address.  Set
      the netmask value as well. All other configuration information is from
      what is already in /opt/ignite/boot/INSTALLFS.

          # make_boot_tape -g 15.23.34.123 -m 255.255.248.0




9.17 /etc/dumpdates
-------------------

Some Unix systems, Solaris for example, maintain an /etc/dumpdates file.

Purpose of the /etc/dumpdates File
The ufsdump command, when used with the -u option, maintains and updates the /etc/dumpdates file. 
Each line in the /etc/dumpdates file shows the following information:

The file system backed up
The dump level of the last backup
The day, date, and time of the backup

For example:

# cat /etc/dumpdates
/dev/rdsk/c0t0d0s0               0 Wed Jul 28 16:13:52 2004
/dev/rdsk/c0t0d0s7               0 Thu Jul 29 10:36:13 2004
/dev/rdsk/c0t0d0s7               9 Thu Jul 29 10:37:12 2004 


When you do an incremental backup, the ufsdump command checks the /etc/dumpdates file to find the date 
of the most recent backup of the next lower dump level. Then, this command copies to the media all files that were modified 
since the date of that lower-level backup. After the backup is complete, a new information line, which describes the backup 
you just completed, replaces the information line for the previous backup at that level. 

Use the /etc/dumpdates file to verify that backups are being done. This verification is particularly important 
if you are having equipment problems. If a backup cannot be completed because of equipment failure, the backup 
is not recorded in the /etc/dumpdates file.

If you need to restore an entire disk, check the /etc/dumpdates file for a list of the most recent dates and levels 
of backups so that you can determine which tapes you need to restore the entire file system.
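
As a minimal sketch of the check described above, the helper below lists the recorded dumps for one device, lowest dump level first. The function name dumps_for and the optional file argument are assumptions made for illustration; the field layout is the one shown in the sample /etc/dumpdates above.

```shell
# dumps_for DEVICE [FILE]
# Print the dumpdates entries for DEVICE, sorted numerically by dump level
# (field 2). FILE defaults to /etc/dumpdates. The name dumps_for is hypothetical.
dumps_for() {
    awk -v dev="$1" '$1 == dev' "${2:-/etc/dumpdates}" | sort -k2,2n
}

# Usage on a Solaris host:
#   dumps_for /dev/rdsk/c0t0d0s7
```

The first line printed is the level 0 entry, followed by the higher-level (incremental) entries recorded for that device.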


9.18 UFS snapshot on Solaris
----------------------------

UFS Snapshots Overview
The Solaris release includes the fssnap command for backing up file systems while the file system is mounted. You can use 
the fssnap command to create a read-only snapshot of a file system. A snapshot is a file system's temporary image that is 
intended for backup operations.

When the fssnap command is run, it creates a virtual device and a backing-store file. You can back up the virtual device, 
which looks and acts like a real device, with any of the existing Solaris backup commands. The backing-store file is a bitmap file 
that contains copies of presnapshot data that has been modified since the snapshot was taken.

Why Use UFS Snapshots?
The UFS snapshots feature enables you to keep the file system mounted and the system in multiuser mode during backups. 
Previously, you were advised to bring the system to single-user mode to keep the file system inactive when you used 
the ufsdump command to perform backups. You can also use additional Solaris backup commands, such as tar and cpio, 
to back up a UFS snapshot for more reliable backups.

The fssnap command gives administrators of nonenterprise-level systems the power of enterprise-level tools, 
such as Sun StorEdge Instant Image, without the large storage demands.

The UFS snapshots feature is similar to the Instant Image product. Although UFS snapshots can make copies of large file systems, 
Instant Image is better suited for enterprise-level systems. UFS snapshots is better suited for smaller systems. Instant Image allocates 
space equal to the size of the entire file system that is being captured. However, the backing-store file that is created by UFS snapshots 
occupies only as much disk space as needed.

Example of how to use it:

# fssnap -F ufs -o bs=/backing-store-file /file-system

Obviously, the backing-store file must reside on a different file system than the file system that is being captured 
using UFS snapshots.

The following example shows how to create a snapshot of the /usr file system. 
The backing-store file is /scratch/usr.back.file. The virtual device is /dev/fssnap/1.

# fssnap -F ufs -o bs=/scratch/usr.back.file /usr
/dev/fssnap/1
 
You can display the current snapshots on the system by using the fssnap -i option. If you specify a file system, 
you see detailed information about that snapshot. If you don't specify a file system, you see information about all 
of the current UFS snapshots and their corresponding virtual devices.

For example, to list all current snapshots:

# /usr/lib/fs/ufs/fssnap -i
Snapshot number               : 0
Block Device                  : /dev/fssnap/0
Raw Device                    : /dev/rfssnap/0
Mount point                   : /usr
Device state                  : idle
Backing store path            : /var/tmp/snapshot3
Backing store size            : 256 KB
Maximum backing store size    : Unlimited
Snapshot create time          : Wed Oct 08 10:38:25 2003
Copy-on-write granularity     : 32 KB
Snapshot number               : 1
Block Device                  : /dev/fssnap/1
Raw Device                    : /dev/rfssnap/1
Mount point                   : /
Device state                  : idle
Backing store path            : /tmp/bs.home
Backing store size            : 448 KB
Maximum backing store size    : Unlimited
Snapshot create time          : Wed Oct 08 10:39:29 2003
Copy-on-write granularity     : 32 KB

 

9.19 Recovery of the root filesystem on Solaris:
------------------------------------------------

Note 1:
------


Restoring the root (/) File System

-- To restore the / (root) file system, boot from the Solaris CD-ROM and then run ufsrestore.

If the / (root), /usr, or /var file system is unusable because of corruption, the system will not boot.

The following procedure demonstrates how to restore the / (root) file system which is assumed to be on boot disk c0t0d0s0.

1. Insert the Solaris 8 Software CD 1, and boot the CD-ROM with the single-user mode option. 

ok boot cdrom -s

2. Create the new file system structure.

# newfs /dev/rdsk/c0t0d0s0

3. Mount the file system on an empty mount point directory, /a, and change to that directory.

# mount /dev/dsk/c0t0d0s0 /a
# cd /a

4. Restore the / (root) file system from its backup tape.

# ufsrestore rf /dev/rmt/0

Note - Remember to always restore a file system starting with the level 0 backup tape and continuing with the next lowest level 
tape up through the highest level tape.

5. Remove the restoresymtable file.

# rm restoresymtable

6. Install the bootblk in sectors 1-15 of the boot disk. Change to the directory containing the bootblk, and run the installboot command.

# cd /usr/platform/`uname -m`/lib/fs/ufs
# installboot bootblk /dev/rdsk/c0t0d0s0 


7. Unmount the new file system.

# cd /
# umount /a

8. Use the fsck command to check the restored file system.

# fsck /dev/rdsk/c0t0d0s0

9. Reboot the system.

# init 6

10. Perform a full backup of the file system. For example:

# ufsdump 0uf /dev/rmt/0 /dev/rdsk/c0t0d0s0

Note - Always back up the newly created file system, as ufsrestore repositions the files and changes the inode allocation. 

Restoring the /usr and /var File Systems 


-- To restore the /usr and /var file systems repeat the steps described above, except step 6. 
This step is required only when restoring the (/) root file system.

To restore a regular file system, (for example, /export/home, or /opt) back to disk, repeat the steps described above, except steps 1, 6, and 9.

Example

# newfs /dev/rdsk/c#t#d#s#
# mount /dev/dsk/c#t#d#s# /mnt
# cd /mnt
# ufsrestore rf /dev/rmt/#
# rm restoresymtable
# cd /
# umount /mnt
# fsck /dev/rdsk/c#t#d#s#
# ufsdump 0uf /dev/rmt/# /dev/rdsk/c#t#d#s#

 
Note 2:
-------



=============
10. uuencode:
=============

Unix to Unix Encoding. A method for converting files from Binary to ASCII so that they can be sent across 
the Internet via e-mail. 

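Encode a binary file (to a uuencoded ASCII file) and decode it again.
uuencode writes to standard output, so redirect it to a file:

uuencode file remotefile > encodedfile
uudecode encodedfile

Example: 

Encode binary file
uuencode example example > example.en 

Decode encoded file (recreates the file "example")
uudecode example.en 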
 


uuencode converts a binary file into an encoded representation that can be sent using mail(1) . 
It encodes the contents of source-file, or the standard input if no source-file argument is given. 
The decode_pathname argument is required. The decode_pathname is included in the encoded file's header 
as the name of the file into which uudecode is to place the binary (decoded) data. 
uuencode also includes the permission modes of source-file, (except setuid , setgid, and sticky-bits), 
so that decode_pathname is recreated with those same permission modes. 

example:
The following example packages up a source tree, compresses it, uuencodes it and mails it to 
a user on another system. When uudecode is run on the target system, the file ``src_tree.tar.Z'' 
will be created which may then be uncompressed and extracted into the original tree. 

# tar cf - src_tree | compress | uuencode src_tree.tar.Z | mail sys1!sys2!user 

example:
uuencode <file_a> <file_b> > <uufile>

note: here, file_a is encoded and a new file named uufile is produced;
      when you decode uufile, a file named file_b is produced.

# uuencode dipl.doc dipl.doc >dipl.uu
Here the file dipl.doc (e.g. a WinWord document) is converted into the file dipl.uu. We also specify 
that the file should be named dipl.doc again after decoding. 

example:
uuencode long_name.tar.Z arc.trz > arc.uue


11. grep command:
=================

# grep Sally people
# grep "Sally Smith" people
# grep -v "^$" people.old > people
# grep -v "^ *$" people.old > people    # deletes all blank lines
# grep "S.* D.*" people.old > people


12. sort command:
=================

sort files by size, largest first...
# ls -al | sort -nr +4 | more        # -n: compare the size field numerically

# sort +1 -2 people
# sort +2b people
# sort +2n +1 people
# sort +1 -2 *people > everybody
# sort -u +1 hardpeople softpeople > everybody  # -u=unique
# sort -t: +5 /etc/passwd                       # -t field sep.
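
The +POS notation used above is the historical syntax. POSIX sort uses -k with fields numbered from 1, and some current implementations no longer accept +POS. The following is a small sketch of the equivalent forms; the helper name by_field is hypothetical and only serves the illustration.

```shell
# by_field N FILE: sort FILE on whitespace-separated field N only (1-based).
# Equivalent to the historical "sort +(N-1) -N FILE". The name is hypothetical.
by_field() {
    sort -k "$1,$1" "$2"
}

# Historical form            POSIX -k form
# sort +1 -2 people          sort -k2,2 people
# sort +2b people            sort -k3b people
# sort +2n +1 people         sort -k3n -k2 people
# sort -t: +5 /etc/passwd    sort -t: -k6 /etc/passwd
# ls -al | sort -nr +4       ls -al | sort -k5,5nr
```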

cp /etc/hosts /etc/hosts.`date +%o%b%d`


13. SED:
========

Can be used to replace a character string with a different string.

# sed s/string/newstring/ file

# sed s/Smith/White/ people.old > people
# sed "s/Sally Smith/Sally White/" people.old > people

Note: depending on your shell and system, you will in most cases need to enclose s/string/newstring/ in " or ' quotes.


you can also use a regular expression, for instance we can put a left margin of 5
spaces on the people file

# sed "s/^/     /" people.old > people
# sed "s/[0-9]*$//" people.old > people        (remove trailing digits)
# sed -e "s/^V^M//" filename > outputfilename  (remove carriage returns; ^V^M means: type Ctrl-V, then Ctrl-M)

The character after the s is the delimiter. It is conventionally a slash, because this is what ed, more, and vi use. 
It can be anything you want, however. If you want to change a pathname that contains a slash - say /usr/local/bin to /common/bin - 
you could use the backslash to quote the slash: 

sed 's/\/usr\/local\/bin/\/common\/bin/' <old >new

or use _ as a delimiter

sed 's_/usr/local/bin_/common/bin_' <old >new
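
A small sketch of the same idea wrapped in a reusable function. The name repath is hypothetical, and the function assumes that neither argument contains "|" or regular-expression metacharacters.

```shell
# repath OLD NEW: rewrite a leading directory prefix on each input line.
# "|" is used as the s-command delimiter, so the slashes in the paths
# need no backslashes. Assumes OLD and NEW contain no "|" and no regex
# metacharacters (illustration only).
repath() {
    sed "s|^$1|$2|"
}

printf '%s\n' /usr/local/bin/perl /usr/bin/awk | repath /usr/local/bin /common/bin
```

Only the first line matches the prefix and is rewritten; the second line passes through unchanged.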


Example:
--------

Suppose the file cdc_LEG.sql contains the following:

    spool Publisher.06.PublisherDefineChangeTable.tdba_cdc.cdc_LEG.log ;

    connect / as sysdba ;

    grant all on rm_live.LEG to tdba_cdc ;

    prompt   User: tdba_cdc ;
    connect tdba_cdc ;

    begin
      dbms_cdc_publish.create_change_table
      ( owner             => 'tdba_cdc'
      , change_table_name => 'cdc_LEG'
      , change_set_name   => 'BODI_CDC_SET'
      , source_schema     => 'rm_live'
      , source_table      => 'LEG'
      , column_type_list  => '  IDFLT NUMBER(9)   ,  IDLEG NUMBER(9)   ,  LEGDATE DATE   ,  IDLEGDATA NUMBER(9)   ,  CANCELLED CHAR(1)   ,  IDWORKSET NUMBER(9)   ,  IDTEXTTTS NUMBER(9)   ,  IDSEGMENTDATACOMBINE NUMBER(9) '
      , capture_values    => 'both'
      , source_colmap     => 'y'
      , target_colmap     => 'y'
      , options_string    => 'tablespace  tdba_cdc'
      ) ;
    end ;
    /

    grant select on tdba_cdc.cdc_LEG to bodi_cdc ;


Now we want to replace the "connect tdba_cdc" by "connect tdba_cdc/tdba_cdc"

Try:

#sed 's!connect tdba_cdc!connect tdba_cdc/tdba_cdc!' cdc_LEG.sql > cdc_LEG.txt
#sed 's!playroca!accproca!' cdc_LEG.sql > cdc_LEG.txt

The first command gives:

    spool Publisher.06.PublisherDefineChangeTable.tdba_cdc.cdc_LEG.log ;

    connect / as sysdba ;

    grant all on rm_live.LEG to tdba_cdc ;

    prompt   User: tdba_cdc ;
    connect tdba_cdc/tdba_cdc ;

    begin
      dbms_cdc_publish.create_change_table
      ( owner             => 'tdba_cdc'
      , change_table_name => 'cdc_LEG'
      , change_set_name   => 'BODI_CDC_SET'
      , source_schema     => 'rm_live'
      , source_table      => 'LEG'
      , column_type_list  => '  IDFLT NUMBER(9)   ,  IDLEG NUMBER(9)   ,  LEGDATE DATE   ,  IDLEGDATA NUMBER(9)   ,  CANCELLED CHAR(1)   ,  IDWORKSET NUMBER(9)   ,  IDTEXTTTS NUMBER(9)   ,  IDSEGMENTDATACOMBINE NUMBER(9) '
      , capture_values    => 'both'
      , source_colmap     => 'y'
      , target_colmap     => 'y'
      , options_string    => 'tablespace  tdba_cdc'
      ) ;
    end ;
    /

    grant select on tdba_cdc.cdc_LEG to bodi_cdc ;


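If you have a lot of those files, use something like

for file in *
do
   sed 's!connect tdba_cdc!connect tdba_cdc/tdba_cdc!' "$file" > "$file.sql"
done

Similar loops: list every file, append a line to every file, and replace "quit" by ";".
Using * instead of `ls` and quoting "$file" keeps file names with spaces intact.

for file in *
do
   echo "$file"
done


for file in *
do
 echo "connect / as sysdba;" >> "$file"
done

for file in *
do
   sed 's!quit!;!' "$file" > "$file.sql"
done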

Other example:
--------------

If you want sed to remove a space at either side of a field, like  Albert van der Sel , Antapex.org, 5 , 20
you could use:

sed 's/[ ]*,[ ]*/,/g'
or
sed -e 's/[ ]*,[ ]*/,/g' -e 's/^[ ]*//' -e 's/[ ]*$//' file1 > file2


sed -e 's#\(00/00/0000\)[, ][, ]*$#\1,,,,,,,,,,,,,,,,,#g' file


Most common error:

Message sed: 0602-404 Function __ cannot be parsed. 
If you were trying to use the sed "substitute" command, e.g. s/a/b/, you may have forgotten the trailing delimiter. 



14. AWK:
========

When lines containing `foo' are found, they are printed, because `print $0' means print the current line:
  # awk '/foo/ { print $0 }' BBS-list

sums the sizes (in bytes) of all files in the ls listing that were last modified in November
(in the usual ls -l layout, field 5 is the size and field 6 the month):
  # ls -l | awk '$6 == "Nov" { sum += $5 }
               END { print sum }'

only print the lines containing Smith from file people:
  # awk /Smith/ people                                   

# awk '/gold/' coins.txt
# awk '/gold/ {print $0}' coins.txt
# awk '/gold/ {print $5,$6,$7,$8}' coins.txt
# awk '{if ($3 < 1980) print $3, "    ",$5,$6,$7,$8}' coins.txt


# awk '/Smith/ {print $1 "-" $3}' people
# ls -l /home | awk '{total += $5}; END {print total}'
# ls -lR /home | awk '{total += $5}; END {print total}'
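
The column-summing pattern from the last two commands can be sketched as a small function. The name sum_col is hypothetical.

```shell
# sum_col N: add up field N of standard input and print the total.
# "total + 0" makes awk print 0 instead of an empty line when there is no input.
sum_col() {
    awk -v n="$1" '{ total += $n } END { print total + 0 }'
}

# Total size in bytes of the regular files directly under /home
# (lines starting with "-" in ls -l output, size in field 5):
#   ls -l /home | awk '$1 ~ /^-/' | sum_col 5
```

Filtering on the first field before summing skips the "total" header line and the directory entries that ls -l also prints.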


Example:
--------

Suppose you have a text file with lines much longer than, for example, 72 characters,
and you want to have a file with lines with a maximum of 72 chars, then you might use awk
in the following way:

-- Shell file r13.sh:

#!/bin/bash

DIR=/cygdrive/c/exports
FILE=result24.txt

awk -f r13.awk ${DIR}/${FILE} > ${DIR}/${FILE}.new

-- r13.awk

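# Split every input line into chunks of at most maxlength characters.
# "\r\n" writes DOS-style line endings.
BEGIN { maxlength=72 }
{ 
  l=length();
  if (l > maxlength) { 
    # number of chunks; a fractional value still runs the loop for the last, partial chunk
    i=(l/maxlength)
    for (j=0; j<i; j++) {
      printf "%s\r\n",substr($0, (j*maxlength)+1, maxlength)
    }
  } else { 
    printf "%s\r\n",$0
  }
}  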



15. tr command:
===============

Used for translating characters in a file. tr works on standard input, so if you want
to take input from a file you have to redirect standard input so that it comes from that file.

Suppose we want to replace all characters in the 
range a-z by the characters A-Z

# tr "[a-z]" "[A-Z]" < people

squeeze multiple occurrences of a character (e.g. a space) into one:
# tr -s " " < people.old > people

remove blank lines:
# tr -s "\012" < people.old > people


to remove the evil Microsoft carriage return:
# tr -d '\015' < original.file > new.file

or, typing the carriage return literally as Ctrl-V followed by Ctrl-M:
# cat filename1 | tr -d "^V^M" > newfile 

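#! /bin/sh
#           
#  recursive dark side repair technique
#   eliminates spaces in file names from current directory down
#    useful for supporting systems where clueless vendors promote NT
#
# Reading the find output line by line keeps names with spaces in one piece
# (a "for name in `find ...`" loop would split them into separate words).
# -depth lists a directory's contents before the directory itself.
find . -depth -name '* *' -print | while IFS= read -r name
do
	dir=`dirname "$name"`
	base=`basename "$name"`
	na=`echo "$base" | tr ' ' '_'`
	echo "$name -> $dir/$na"
	# uncomment the next line to actually rename:
	# mv "$name" "$dir/$na"
done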

note:

> I have finally competed setting up the samba server and setup the share
> between NT and Samba server.
> 
> However, when I open a unix text file in Windows NT using notepad, i see
> many funny characters and the text file is not in order (Just like when I
> ftp the unix text file out into NT in binary format) ...I think this has to
> be something to do with whether the file transfer is in Binary format or
> ASCII ... Is there a parameter to set for this ? I have checked the
> documents ... but couldn't find anything on this ...
> 

This is a FAQ, but in brief, it's like this. Unix uses a single newline
character to end a line ("\n"), while DOS/Win/NT use a
carriage-return/newline pair ("\r\n"). FTP in ASCII mode translates
these for you. FTP in binary mode, or other forms of file transfer, such
as Samba, leave the file unaltered. Translating automatically would be
extremely dangerous, as there's no clear way to isolate which files
should be translated.

You can get Windows editors that understand Unix line-end conventions
(Ultra Edit is one), or you can use DOS line endings on the files, which
will then look odd from the Unix side. You can stop using notepad, and
use Wordpad instead, which will deal appropriately with Unix line
endings.

You can convert a DOS format text file to Unix with this:-

tr -d '\r' < dosfile.txt > unixfile.txt

The best solution to this seems to be using a Windows editor that can
handle working with Unix line endings.

HTH

Mike.

Note:

There are two ways of moving to a new line...carriage return, which is chr(13), 
and new line which is chr(10).  In windows you're supposed to use a sequence 
of a carriage return followed by a new line.  
For example, in VB you can use Wrap$=Chr$(13)+Chr$(10)  which creates a wrap character.
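
The conversion described in this note can be sketched as two small shell helpers: one strips carriage returns, the other reports whether a file still contains any. Both function names are hypothetical.

```shell
# dos2unix_tr: strip every carriage return (octal 015) from standard input.
dos2unix_tr() {
    tr -d '\015'
}

# has_cr FILE: succeed if FILE contains at least one carriage return.
# tr -cd deletes everything EXCEPT carriage returns; wc -c counts what is left.
has_cr() {
    [ $(tr -cd '\015' < "$1" | wc -c) -gt 0 ]
}

# Usage:
#   has_cr dosfile.txt && dos2unix_tr < dosfile.txt > unixfile.txt
```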


16. cut and paste:
==================

cutting columns:

# cut -c17,18,19 people                    # no spaces inside the list
# cut -c17- people > phones
# cut -c1-16 people > names

cutting fields:

# cut -d" " -f1,2 people > names           # -d field separator

paste:

# paste -d" " firstname lastname phones > people
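
A minimal round-trip sketch that ties the two commands together, assuming a fixed-width file whose first 16 columns hold the name and the rest the phone number, as in the examples above. The function name roundtrip is hypothetical.

```shell
# roundtrip FILE: split FILE into a name column (characters 1-16) and a
# phone column (characters 17 onward), then glue the two back together
# line by line. The delimiter '\0' tells paste to use an empty string,
# so the output matches the input.
roundtrip() {
    dir=$(mktemp -d)
    cut -c1-16 "$1" > "$dir/names"
    cut -c17-  "$1" > "$dir/phones"
    paste -d '\0' "$dir/names" "$dir/phones"
    rm -rf "$dir"
}
```

With -d" " as in the paste example above, an extra space would be inserted between the two columns instead.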



17. mknod:
==========

mknod creates a FIFO (named pipe), character special file, or block special file with the specified name. 
A special file is a triple (boolean, integer, integer) stored in the filesystem. 
The boolean chooses between character special file and block special file. 
The two integers are the major and minor device number.

Thus, a special file takes almost no space on disk, and is used only for communication 
with the operating system, not for data storage. Often special files refer to hardware devices 
(disk, tape, tty, printer) or to operating system services (/dev/null, /dev/random).

Block special files usually are disk-like devices 
(where data can be accessed given a block number, and e.g. it is meaningful to have a block cache). 
All other devices are character special files. 
(Long ago the distinction was a different one: 
I/O to a character special file would be unbuffered, to a block special file buffered.)

The mknod command is what creates files of this type.

The argument following name specifies the type of file to make:

p   for a FIFO 
b   for a block (buffered) special file 
c   for a character (unbuffered) special file 

When making a block or character special file, the major and minor device numbers must be given 
after the file type (in decimal, or in octal with leading 0; the GNU version also allows hexadecimal 
with leading 0x). By default, the mode of created files is 0666 (`a/rw') minus the bits set in the umask.  


In /dev we find logical devices, created by the mknod command.
# mknod /dev/kbd c 11 0
# mknod /dev/sunmouse c 10 6
# mknod /dev/fb0 c 29 0


create a pipe in /dev called 'rworldlp'

# mknod /dev/rworldlp p; chmod a+rw /dev/rworldlp


If you cannot afford extra disk space, you can run the export and compress 
utilities simultaneously. 
This avoids needing space for both the export file AND the 
compressed export file. Eg: 

	# Make a pipe
	mknod expdat.dmp p            # or: mkfifo expdat.dmp
	# Start compress reading from the pipe in the background
	compress < expdat.dmp > expdat.dmp.Z &
	# Wait a few seconds before kicking off the export
	sleep 5
	# Start the export
	exp scott/tiger file=expdat.dmp


Create a compressed export on the fly. 

        # create a named pipe
        mknod exp.pipe p
        # read the pipe - output to zip file in the background
        gzip < exp.pipe > scott.exp.gz &
        # feed the pipe
        exp userid=scott/tiger file=exp.pipe ...
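
The same pattern can be tried without Oracle. In the sketch below, standard input stands in for the exporting program; the function name compress_via_fifo and the temporary paths are assumptions made for illustration.

```shell
# compress_via_fifo OUTFILE: gzip whatever is written into a named pipe.
# Standard input stands in for the exporting program (e.g. exp).
compress_via_fifo() {
    dir=$(mktemp -d)
    pipe="$dir/data.pipe"
    mkfifo "$pipe"                 # same effect as: mknod "$pipe" p
    gzip < "$pipe" > "$1" &        # reader: blocks until a writer opens the pipe
    gz_pid=$!
    cat > "$pipe"                  # writer: stands in for "exp ... file=$pipe"
    wait "$gz_pid"                 # let gzip finish before cleaning up
    rm -rf "$dir"
}

# Usage:
#   some_command | compress_via_fifo /tmp/out.gz
```

No sleep is needed between starting the reader and the writer, because opening a FIFO blocks until the other end has been opened as well.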



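Extended Example:
-----------------

The following ksh script loads compressed Oracle export files through a named pipe.
It relies on ksh behaviour: a variable set by "read" at the end of a pipeline is still set afterwards.
Helper functions such as Comment, Error, Message, CmdCapture and finish, and the ${continue} flag,
are not defined here; presumably they come from the sourced profile.

# Load the cron environment
. ~/cronjobs/.profile.cron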
##################################################################
compareVersionDBMS 10.2.0.1.0 10.2.0.2.0 10.2.0.3.0
##################################################################
wantedSchemas=""
wantedDatabase=""
wantedInputDir=""
##################################################################
if [ $# -ne 0 ]
then
  function showSyntaxParam
  { ( Comment  "\t[-db=<Database>] [-dir=<InputDir>] [-schema=<Schema,schema,schema>]"
    )
  }
  for param in $*
  do
    echo ${param} \
      | awk 'BEGIN {FS="="}{print $1,$2}' \
      | read flag value
    if [ "${flag}" != "-h" ] && [ "${value}" = "" ]
    then
      Error "Empty value for ${flag}"
      showSyntaxSystemParam
      showSyntaxParam
    else
      case ${flag} in
        -db)         wantedDatabase=${value}                      ;;
        -schema)     wantedSchemas=${value}                       ;;
        -dir)        wantedInputDir=${value}                      ;;
        *)           checkSystemParam ${param} || showSyntaxParam ;;
      esac
    fi
  done
fi
##################################################################
BlankLine
Comment "Selected options are:"
Comment "  Database         :   -db=${wantedDatabase}"
Comment "  Schema's         :   -schema=${wantedSchemas}"
Comment "  Input directory  :   -dir=${wantedInputDir}"
Line
##################################################################
if [ ${continue} = true ]
then
  if [ "${wantedDatabase}" = "" ]
  then
    BlankLine
    Error "No database specified"
    BlankLine
  else
    echo ${wantedDatabase} \
    | grep -i prod \
    | wc -l \
    | read prod
    if [ ${prod} -ne 0 ]
    then
      BlankLine
      Error "Production environment not allowed!!"
      BlankLine
    else
      moveLogFile ${wantedDatabase}
    fi
  fi
  #
  if [ "${wantedInputDir}" = "" ]
  then
    BlankLine
    Error "No input directory specified"
    BlankLine
  else
    if [ ! -d ${wantedInputDir} ]
    then
      BlankLine
      Error "Input directory ${wantedInputDir} doesn't exist"
      BlankLine
    fi
  fi
  #
  if [ "${wantedSchemas}" = "" ]
  then
    BlankLine
    Error "No schema's to load"
    BlankLine
  fi
fi
##################################################################
wantedSchemas=`echo ${wantedSchemas} | sed 's/,/ /g'`
if [ ${continue} = true ]
then
  for schema in ${wantedSchemas}
  do
    impPipeFile=${wantedInputDir}/${schema}.${currentUser}.load.pipe
    impLogFile=${wantedInputDir}/${schema}.${currentUser}.load.log
    impCompressFile=${wantedInputDir}/${schema}.data.Z
    #
    if [ ${continue} = true ]
    then
      Message "Check file permissions"
      if [ ! -w ${wantedInputDir} ]
      then
        Error "Unable to write in ${wantedInputDir}"
      fi
    fi
    #
    if [ ${continue} = true ]
    then
      rm ${impPipeFile}     2> /dev/null
      rm ${impLogFile}      2> /dev/null
    fi
    #
    if [ ${continue} = true ]
    then
      Message "Load schema ${schema} into database ${wantedDatabase}"
    fi
    #
    if [ ${continue} = true ]
    then
      Message "Create pipe for load"
      CmdCapture "mknod ${impPipeFile} p"
    fi
    #
    if [ ${continue} = true ]
    then
      if [ ! -f ${impCompressFile} ]
      then
        BlankLine
        Error "File not found: ${impCompressFile}"
        BlankLine
      fi
    fi
    #
    if [ ${continue} = true ]
    then
      Message "Start uncompression into background"
      uncompress -c < ${impCompressFile} > ${impPipeFile} &
      #
      Message "Start import"
      imp \"sys/change_on_install as sysdba\" file=${impPipeFile} log=${impLogFile} full=y statistics=always >/dev/null 2>/dev/null
      #
      Message     "Output of import"
      CmdCapture  "cat ${impLogFile}"
      #
      Message "Allowed warnings are:"
      Comment "  IMP-00017 IMP-00041 IMP-00003 ORA-14063 ORA-14048 ORA-02270"
      cat ${impLogFile} \
      | egrep '^ORA-|^ERROR|^IMP-' \
      | egrep -v 'IMP-00017|IMP-00041|IMP-00003|ORA-14063|ORA-14048|ORA-02270' \
      | wc -l \
      | read count
      if [ ${count} -ne 0 ]
      then
        Error "Problem with import !!"
      else
        Message "Import successful"
      fi
    fi
    #
    rm ${impPipeFile} 2> /dev/null
    if [ ${continue} = true ]
    then
      Line
    fi
  done
fi

##################################################################
finish
##################################################################



18. Links:
==========

A symbolic link is a pointer or an alias to another file. The command 

# ln -s fromfile /other/directory/tolink


makes the file fromfile appear to exist at /other/directory/tolink simultaneously. 
The file is not copied, it merely appears to be a part of the file tree in two places. 
Symbolic links can be made to both files and directories. 

The usage of the ln command is:

%ln -s ActualFilename LinkFileName

Where -s indicates a symbolic link. ActualFilename is the name of the file which is to be linked to, 
and LinkFileName is the name by which the file should be known. 

You should use full paths in the command.

Example:
--------

Suppose we have the file "mvdat" in:

/opt/myprog

So if we take a look there

albert@starboss:/opt/myprog $ ls -al

total 8
drwxr-x---    2 root  system           256 Apr 21 10:12 .
drwxrwxrwx    3 root  system          4096 Apr 21 09:59 ..
-r-xr-xr-x    1 albert  beab_krn      9544 Apr 21 10:12 mvdat 

Now we want a symbolic link of that file in "/apps/myapps/bin",
as if the file also exists at that place. In fact we only make a link there. 
We can do that in the following way:

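albert@starboss:/opt/myprog $ ln -s /opt/myprog/mvdat /apps/myapps/bin/mvdat

(A relative target such as "mvdat" would be resolved relative to /apps/myapps/bin and leave a dangling link.)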


Other examples:
---------------

This example shows copying three files from a directory into the current working directory. 

    [2]%cp ~team/IntroProgs/MoreUltimateAnswer/more* .
    [3]%ls -l more*
    -rw-rw-r--   1 mrblobby  mrblobby    632 Sep 21 18:12 moreultimateanswer.adb
    -rw-rw-r--   1 mrblobby  mrblobby   1218 Sep 21 18:19 moreultimatepack.adb
    -rw-rw-r--   1 mrblobby  mrblobby    784 Sep 21 18:16 moreultimatepack.ads

The three files take a total of 2634 bytes. The equivalent ln commands would be: 


    [2]%ln -s ~team/IntroProgs/MoreUltimateAnswer/moreultimateanswer.adb .
    [3]%ln -s ~team/IntroProgs/MoreUltimateAnswer/moreultimatepack.adb .
    [4]%ln -s ~team/IntroProgs/MoreUltimateAnswer/moreultimatepack.ads .
    [5]%ls -l
    lrwxrwxrwx   1  mrblobby  mrblobby     35 Sep 22 08:50 moreultimateanswer.adb ->
                     /users/team/IntroProgs/MoreUltimateAnswer/moreultimateanswer.adb
    lrwxrwxrwx   1  mrblobby  mrblobby     37 Sep 22 08:49 moreultimatepack.adb ->
                     /users/team/IntroProgs/MoreUltimateAnswer/moreultimatepack.adb
    lrwxrwxrwx   1  mrblobby  mrblobby     37 Sep 22 08:50 moreultimatepack.ads ->
                     /users/team/IntroProgs/MoreUltimateAnswer/moreultimatepack.ads


     The ln utility creates a new directory entry (linked file) which has the
     same modes as the original file.  It is useful for maintaining multiple
     copies of a file in many places at once without using up storage for the
     copies; instead, a link ``points'' to the original copy.  There are two
     types of links; hard links and symbolic links.  How a link points to a
     file is one of the differences between a hard and symbolic link.

     By default, ln makes ``hard'' links.  A hard link to a file is
     indistinguishable from the original directory entry; any changes to a
     file are effectively independent of the name used to reference the file.
     Hard links may not normally refer to directories and may not span file
     systems.

     A symbolic link contains the name of the file to which it is linked.  The
     referenced file is used when an open(2) operation is performed on the
     link.  A stat(2) on a symbolic link will return the linked-to file; an
     lstat(2) must be done to obtain information about the link.  The
     readlink(2) call may be used to read the contents of a symbolic link.
     Symbolic links may span file systems, refer to directories, and refer to
     non-existent files.
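
The difference between the two link types can be made visible with a short sketch that runs in a scratch directory. The function name link_demo and all file names are hypothetical.

```shell
# link_demo: create one hard link and one symbolic link to the same file,
# remove the original name, and report which link still leads to the data.
link_demo() {
    dir=$(mktemp -d)
    echo "payload" > "$dir/original"
    ln    "$dir/original" "$dir/hard"    # second directory entry for the same inode
    ln -s "$dir/original" "$dir/soft"    # small file that stores the target path
    rm "$dir/original"
    [ -f "$dir/hard" ] && echo "hard link: data still reachable"
    [ -e "$dir/soft" ] || echo "symbolic link: dangling"
    rm -rf "$dir"
}
link_demo
```

The hard link keeps the data alive because the inode still has one name left; the symbolic link only stored the path of the removed name, so it now points at nothing.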




19. Relink of Oracle:
=====================

info:

  showrev -p
  pkginfo -i

relink:

  make -f $ORACLE_HOME/rdbms/lib/ins_rdbms.mk install
  make -f $ORACLE_HOME/svrmgr/lib/ins_svrmgr.mk install
  make -f $ORACLE_HOME/network/lib/ins_network.mk install



20. trace:
==========



20.1 truss:
-----------

Here is a quick way to trace a shell script or executable program, using "truss".

The "truss" tool is available on many unix platforms. It has many options, but a very useful command
to trace the system calls that a script or program makes is:


$ truss -o /tmp/myprg.log myprg


In this example, truss will log in the file "/tmp/myprg.log" while it traces the program "myprg".

Ofcourse, you can choose another path and logfile to trace to.

The command above works well for tracing a shell script or program that starts up, does some work,
and then terminates. If an error occurs at runtime, you will likely find some pointers
in the logfile that truss wrote.

truss has many more options; for example, you can focus the trace on a particular library.
Even the simple example above can already be very helpful.

For example, suppose the truss log shows the error "EACCES", which is
"errno 13 = Permission denied". That tells you the shell script or program tried to access
an object for which it has insufficient permissions, which may be why it fails.

Be warned, though, that some errnos may show up many times without being anything to worry about.
For example, "ENOENT = No such file or directory" is often found quite frequently: the script
or program was unable to find a file or directory. If this relates to the $PATH environment
variable, it is perfectly reasonable: the shell searches $PATH from beginning to end until
the object is found, so some ENOENT errors along the way are to be expected.
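
To see at a glance which errnos dominate a truss logfile, the error names can be counted.
The function below is only a sketch: the name summarize_truss_errors is made up here, and it
assumes that failed calls are logged in the form "Err#<number> <NAME>" (for example "Err#13 EACCES").

```shell
# Count how often each errno name appears in a truss logfile.
# Assumption: failed calls are logged as "Err#<number> <NAME>".
summarize_truss_errors() {
    logfile=$1
    # Extract the symbolic errno name of every failed call, then count them,
    # most frequent first. Output format: "<NAME> <count>".
    sed -n 's/.*Err#[0-9][0-9]* \([A-Z][A-Z0-9]*\).*/\1/p' "$logfile" |
        sort | uniq -c | sort -rn |
        awk '{ printf "%s %d\n", $2, $1 }'
}
```

Usage: summarize_truss_errors /tmp/myprg.log
A long run of ENOENT entries is usually just the $PATH search; a single EACCES is more likely
to be the real problem.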



20.2 truss on Solaris:
----------------------

  truss -aef -o /tmp/trace svrmgrl

To trace what a Unix process is doing enter: 

  truss -rall -wall -p <PID>
  truss -p $ lsnrctl dbsnmp_start

NOTE: The "truss" command works on SUN and Sequent. Use "tusc" on HP-UX, "strace" on Linux, 
"trace" on SCO Unix or call your system administrator to find the equivalent command on your system. 
Monitor your Unix system: 

Solaris:

Truss is used to trace the system/library calls (not user calls) and signals made/received 
by a new or existing process. It sends the output to stderr. 


NOTE: Trussing a process throttles that process to your display speed. Use -wall and -rall sparingly. 

Truss usage 

    truss  -a  -e  -f  -rall  -wall  -p  <PID>
    truss  -a  -e  -f  -rall  -wall  <program>

    -a        Show arguments passed to the exec system calls
    -e        Show environment variables passed to the exec system calls
    -f        Show forked processes 
                (they will have a different pid: in column 1)
    -rall     Show all read data (default is 32 bytes)
    -wall     Show all written data (default is 32 bytes)
    -p        Hook to an existing process (must be owner or root)
    <program> Specify a program to run
  
Truss examples 
  # truss -rall -wall -f -p <PID>
  # truss -rall -wall lsnrctl start
  # truss -aef lsnrctl dbsnmp_start
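
Since the tracer has a different name on each platform (see the NOTE earlier in this section),
a script that must run on several systems can pick the tool from the OS name. This is a sketch
only: the function name tracer_for_os is made up, and the OS names are the strings that
"uname -s" usually prints, which is an assumption and not part of the note.

```shell
# Print the name of the system call tracer for a given OS name.
# With no argument, the OS name is taken from `uname -s`.
tracer_for_os() {
    os=${1:-$(uname -s)}
    case "$os" in
        SunOS|AIX) echo truss ;;
        HP-UX)     echo tusc ;;
        Linux)     echo strace ;;
        *)         echo "unknown OS: $os" >&2; return 1 ;;
    esac
}
```

Usage: tracer=$(tracer_for_os) || exit 1
Note that the options differ per tool, so only the tool name is chosen here.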


20.2 syscalls command on AIX:
-----------------------------

1. syscalls Command 

Purpose 
Provides system call tracing and counting for specific processes and the system. 

Syntax 
To Create or Destroy Buffer: 
syscalls [ [ -enable  bytes ]| -disable  ] 

To Print System Call Counts: 
syscalls -c 

To Print System Call Events or Start Tracing: 
syscalls [ -o  filename ] [ -t  ] { [ [ -p pid ] -start | -stop  ] | -x  program } 

Description 
The syscalls (system call tracing) command, captures system call entry and exit events by individual processes 
or all processes on the system. The syscalls command can also maintain counts for all system calls 
made over long periods of time. 

Notes: 
System call events are logged in a shared-memory trace buffer. The same shared memory identifier may be used 
by other processes resulting in a collision. In such circumstances, the -enable flag needs to be issued. 
The syscalls command does not use the trace daemon. 
The system crashes if ipcrm -M sharedmemid is run after syscalls has been run. 
Run stem -shmkill instead of running ipcrm -M to remove the shared memory segment.

Flags 
-c  Prints a summary of system call counts for all processes. The counters are not reset.  

-disable  Destroys the system call buffer and disables system call tracing and counting.  

-enable bytes  Creates the system call trace buffer. If this flag is not used, the syscalls command 
 creates a buffer of the default size of 819,200 bytes. Use this flag if events are not being logged 
 in the buffer. This is the result of a collision with another process using the same shared memory buffer ID.  

-o filename  Prints output to filename rather than standard out.  

-p pid  When used with the -start flag, only events for processes with this pid will be logged 
   in the syscalls buffer. When used with the -stop option, syscalls filters the data in the buffer 
   and only prints output for this pid.  

-start  Resets the trace buffer pointer. This option enables the buffer if it does not exist and resets 
        the counters to zero.  

-stop  Stops the logging of system call events and prints the contents of the buffer.  

-t  Prints the time associated with each system call event alongside the event.  

-x program  Runs program while logging events for only that process. The buffer is enabled if needed.  


Security 
Access Control: You must be root or a member of the perf group to run this command. 

Examples 
To collect system calls for a particular program, enter: 
syscalls -x /bin/ps
Output similar to the following appears: 
   PID    TTY  TIME CMD
 19841  pts/4  0:01 /bin/ksh 
 23715  pts/4  0:00 syscalls -x /bin/ps 
 30720  pts/4  0:00 /bin/ps 
 34972  pts/4  0:01 ksh
   PID   System Call          
 30720           .kfork  Exit , return=0  Call preceded tracing.
 30720          .getpid  () = 30720
 30720       .sigaction  (2, 2ff7eba8, 2ff7ebbc) = 0
 30720       .sigaction  (3, 2ff7eba8, 2ff7ebcc) = 0
 30720     .sigprocmask  (0, 2ff7ebac, 2ff7ebdc) = 0
 30720       .sigaction  (20, 2ff7eba8, 2ff7ebe8) = 0
 30720           .kfork  () = 31233
 30720        .kwaitpid  (2ff7ebfc, 31233, 0, 0) = 31233
 30720       .sigaction  (2, 2ff7ebbc, 0) = 0
 30720       .sigaction  (3, 2ff7ebcc, 0) = 0
 30720       .sigaction  (20, 2ff7ebe8, 0) = 0
 30720     .sigprocmask  (2, 2ff7ebdc, 0) = 0
 30720         .getuidx  (4) = 0
 30720         .getuidx  (2) = 0
 30720         .getuidx  (1) = 0
 30720         .getgidx  (4) = 0
 30720         .getgidx  (2) = 0
 30720         .getgidx  (1) = 0
 30720           ._load  NoFormat, (0x2ff7ef54, 0x0, 0x0, 0x2ff7ff58) = 537227760
 30720            .sbrk  (65536) = 537235456
 30720          .getpid  () = 30720

To produce a count of system calls made by all processes, enter: 
syscalls -start
followed by entering: 
syscalls -c
Output similar to the following appears: 
 System Call Counts for all processes
       5041      .lseek
       4950      .kreadv
        744      .sigaction
        366      .close
        338      .sbrk
        190      .kioctl
        120      .getuidx
        116      .kwritev
        108      .kfcntl
        105      .getgidx
         95      .kwaitpid
         92      .gettimer
         92      .select
         70      .getpid
         70      .sigprocmask
         52      .execve
         51      ._exit
         51      .kfork
         35      .open
         35      ._load
         33      .pipe
         33      .incinterval
         28      .sigreturn
         27      .access
         16      .brk 
         15      .times
         15      .privcheck
         15      .gettimerid
         10      .statx
          9      .STEM_R10string
          4      .sysconfig
          3      .P2counters_accum
          3      .shmget
          3      .shmat
          2      .setpgid
          2      .shmctl
          2      .kioctl
          1      .Patch_Demux_Addr_2
          1      .Patch_Demux_Addr_High
          1      .STEM_R3R4string
          1      .shmdt
          1      .Stem_KEX_copy_demux_entry
          1      .STEM_R3R4string
          1      .Patch_Demux_Addr_1
          1      .pause
          1      .accessx
Files 
/usr/bin/syscalls  Contains the syscalls command.  
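
The count listing can get long. A small filter such as the one below prints only the busiest
system calls. It is a sketch: the name top_syscalls is made up, and it assumes the
"<count> .<name>" layout of the sample output above.

```shell
# Print the N busiest system calls from saved `syscalls -c` output on stdin.
# Lines that do not look like "<count> .<name>" (such as the header) are skipped.
top_syscalls() {
    n=${1:-5}
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^\./ { print $1, substr($2, 2) }' |
        sort -rn | head -n "$n"
}
```

Usage: syscalls -c | top_syscalls 3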


20.3 truss command on AIX:
--------------------------

AIX 5.1,5.2,5.3


The truss command is also available for SVR4 UNIX-based environments. This command is useful for tracing 
system calls in one or more processes. In AIX 5.2, all base system call parameter types are now recognized. 
In AIX 5.1, only about 40 system calls were recognized. 

Truss is a /proc based debugging tool that executes and traces a command, or traces an existing process. 
It prints names of all system calls made with their arguments and return code. System call parameters are 
displayed symbolically. It prints information about all signals received by a process. The AIX 5.2 version 
supports library calls tracing. For each call, it prints parameters and return codes. 
It can also trace a subset of libraries and a subset of routines in a given library. The timestamps on each line 
are also supported.

In AIX 5.2, truss is packaged with bos.sysmgt.serv_aid, which is installable from the AIX base installation media. 
See the command reference for details and examples, or use the information below. 


A good and simple way to use truss is using a command like shown in section 20.1:

# truss -o path_to_logfile executable_to_truss

For example:

# truss -o /tmp/myprg.log myprg

As noted in section 20.1, this form works well for tracing a shell script or program that starts up,
does some work, and then terminates. If an error occurs at runtime, you will likely find some pointers
in the logfile that truss wrote.


Further notes:

-a Displays the parameter strings that are passed in each executed system call. 

# truss -a  sleep

execve("/usr/bin/sleep", 0x2FF22980, 0x2FF22988)  argc: 1
argv: sleep
sbrk(0x00000000)                                = 0x200007A4
sbrk(0x00010010)                                = 0x200007B0
getuidx(4)                                             = 0
.
.
__loadx(0x01000080, 0x2FF1E790, 0x00003E80, 0x2FF22720, 0x00000000) = 0xD0077130
access("/usr/lib/nls/msg/en_US/sleep.cat", 0)   = 0
_getpid()                                       = 31196
open("/usr/lib/nls/msg/en_US/sleep.cat", O_RDONLY) = 3
kioctl(3, 22528, 0x00000000, 0x00000000)        Err#25 ENOTTY
kfcntl(3, F_SETFD, 0x00000001)                  = 0
kioctl(3, 22528, 0x00000000, 0x00000000)        Err#25 ENOTTY
kread(3, "\0\001 £\001\001 I S O 8".., 4096)    = 123
lseek(3, 0, 1)                                  = 123
lseek(3, 0, 1)                                  = 123
lseek(3, 0, 1)                                  = 123
_getpid()                                       = 31196
lseek(3, 0, 1)                                  = 123
Usage: sleep Seconds
kwrite(2, " U s a g e :   s l e e p".., 21)     = 21
kfcntl(1, F_GETFL, 0x00000000)                  = 2
kfcntl(2, F_GETFL, 0x00000000)                  = 2
_exit(2)


-c Counts traced system calls, faults, and signals rather than displaying trace results line by line. 
A summary report is produced after the traced command terminates or when truss is interrupted. 
If the -f flag is also used, the counts include all traced Syscalls, Faults, and Signals for child processes. 
 
# truss -c ls

syscall               seconds   calls  errors
execve                    .00       1
__loadx                   .00      17
_exit                     .00       1
close                     .00       2
kwrite                    .00       5
lseek                     .00       1
setpid                    .00       1
getuidx                   .00      19
getdirent                 .00       3
kioctl                    .00       3
open                      .00       1
statx                     .00       2
getgidx                   .00      18
sbrk                      .00       4
access                    .00       1
kfcntl                    .00       6
                         ----     ---    ---
sys totals:               .01      85      0
usr time:                 .00
elapsed:                  .01


More truss examples:
--------------------

truss -o /tmp/tst -p 307214

root@zd93l14:/tmp#cat tst
                                                = 0
_nsleep(0x4128B8E0, 0x4128B958)                 = 0
_nsleep(0x4128B8E0, 0x4128B958)                 = 0
_nsleep(0x4128B8E0, 0x4128B958)                 = 0
_nsleep(0x4128B8E0, 0x4128B958)                 = 0
thread_tsleep(0, 0xF033159C, 0x00000000, 0x43548E38) = 0
thread_tsleep(0, 0xF0331594, 0x00000000, 0x434C3E38) = 0
thread_tsleep(0, 0xF033158C, 0x00000000, 0x4343FE38) = 0
thread_tsleep(0, 0xF0331584, 0x00000000, 0x433BBE38) = 0
thread_tsleep(0, 0xF0331574, 0x00000000, 0x432B2E38) = 0
thread_tsleep(0, 0xF033156C, 0x00000000, 0x4322EE38) = 0
thread_tsleep(0, 0xF0331564, 0x00000000, 0x431AAE38) = 0
thread_tsleep(0, 0xF0331554, 0x00000000, 0x42F99E38) = 0
thread_tsleep(0, 0xF033154C, 0x00000000, 0x4301DE38) = 0
thread_tsleep(0, 0xF0331534, 0x00000000, 0x42E90E38) = 0
thread_tsleep(0, 0xF033152C, 0x00000000, 0x42E0CE38) = 0
thread_tsleep(0, 0xF033157C, 0x00000000, 0x43337E38) = 0
thread_tsleep(0, 0xF0331544, 0x00000000, 0x42F14E38) = 0
                                                = 0
thread_tsleep(0, 0xF033153C, 0x00000000, 0x42D03E38) = 0
_nsleep(0x4128B8E0, 0x4128B958)                 = 0
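
A log like this is easier to judge once the calls are counted per name. The function below is
a sketch (the name count_truss_calls is made up). It assumes the log was written without -f,
so that each traced call starts at the beginning of the line with "name(".

```shell
# Count calls per system call name in a truss logfile.
# Only lines that start with "name(" are counted; continuation lines
# such as a bare "= 0" are ignored. Output format: "<name> <count>".
count_truss_calls() {
    sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)(.*/\1/p' "$1" |
        sort | uniq -c | sort -rn |
        awk '{ printf "%s %d\n", $2, $1 }'
}
```

Usage: count_truss_calls /tmp/tst
For the excerpt shown above this would report thread_tsleep 14 and _nsleep 5, which suggests
a process that is mostly sleeping.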



20.4 man pages for truss AIX:
-----------------------------

Purpose

Traces a process's system calls, dynamically loaded user level function calls,
received signals, and incurred machine faults.

Syntax

truss [ -f] [ -c] [ -a] [ -l ] [ -d ] [ -D ] [ -e] [ -i] [ { -t | -x} [!]
Syscall [...] ] [ -s [!] Signal [...] ] [ { -m }[!] Fault [...]] [ { -r | -w}
[!] FileDescriptor [...] ] [ { -u } [!]LibraryName [...]:: [!]FunctionName [ ...
] ] [ -o Outfile] {Command| -p pid [. . .]}

Description

The truss command executes a specified command, or attaches to listed process
IDs, and produces a trace of the system calls, received signals, and machine
faults a process incurs. Each line of the trace output reports either the Fault
or Signal name, or the Syscall name with parameters and return values. The
subroutines defined in system libraries are not necessarily the exact system
calls made to the kernel. The truss command does not report these subroutines,
but rather, the underlying system calls they make. When possible, system call
parameters are displayed symbolically using definitions from relevant system
header files. For path name pointer parameters, truss displays the string being
pointed to. By default, undefined system calls are displayed with their name,
all eight possible arguments and the return value in hexadecimal format.

When the -o flag is used with truss, or if standard error is redirected to a
non-terminal file, truss ignores the hangup, interrupt, and quit signals.
This facilitates the tracing of interactive programs which catch interrupt and
quit signals from the terminal.

If the trace output remains directed to the terminal, or if existing processes
are traced (using the -p flag), then truss responds to hangup, interrupt, and
quit signals by releasing all traced processes and exiting. This enables the
user to terminate excessive trace output and to release previously existing
processes. Released processes continue to function normally.

Flags

-a Displays the parameter strings which are passed in each executed system call.

-c Counts traced system calls, faults, and signals rather than displaying trace
results line by line. A summary report is produced after the traced command
terminates or when truss is interrupted. If the -f flag is also used, the counts
include all traced Syscalls, Faults, and Signals for child processes.

-d A timestamp will be included with each line of output. Time displayed is in
seconds relative to the beginning of the trace. The first line of the trace
output will show the base time from which the individual time stamps are
measured. By default timestamps are not displayed.

-D Delta time is displayed on each line of output. The delta time represents the
elapsed time for the LWP that incurred the event since the last reported event
incurred by that thread. By default delta times are not displayed.

-e Displays the environment strings which are passed in each executed system
call.

-f Follows all children created by the fork system call and includes their
signals, faults, and system calls in the trace output. Normally, only the
first-level command or process is traced. When the -f flag is specified, the
process id is included with each line of trace output to show which process
executed the system call or received the signal.

-i Keeps interruptible sleeping system calls from being displayed. Certain
system calls on terminal devices or pipes, such as open and kread, can sleep for
indefinite periods and are interruptible. Normally, truss reports such sleeping
system calls if they remain asleep for more than one second. The system call is
then reported a second time when it completes. The -i flag causes such system
calls to be reported only once, upon completion.

-l Display the id (thread id) of the responsible LWP process along with truss
output. By default LWP id is not displayed in the output.

-m [!]Fault Traces the machine faults in the process. Machine faults to trace
must be separated from each other by a comma. Faults may be specified by name or
number (see the sys/procfs.h header file). If the list begins with the "!"
symbol, the specified faults are excluded from being traced and are not
displayed with the trace output. The default is -mall -m!fltpage.

-o Outfile Designates the file to be used for the trace output. By default, the
output goes to standard error.

-p Interprets the parameters to truss as a list of process ids for an existing
process rather than as a command to be executed. truss takes control of each
process and begins tracing it, provided that the user id and group id of the
process match those of the user or that the user is a privileged user.

-r [!] FileDescriptor Displays the full contents of the I/O buffer for each read
on any of the specified file descriptors. The output is formatted 32 bytes per
line and shows each byte either as an ASCII character (preceded by one blank) or
as a two-character C language escape sequence for control characters, such as
horizontal tab (\t) and newline (\n). If ASCII interpretation is not possible,
the byte is shown in two-character hexadecimal representation. The first 16
bytes of the I/O buffer for each traced read are shown, even in the absence of
the -r flag. The default is -r!all.

-s [!] Signal Permits listing Signals to trace or exclude. Those signals
specified in a list (separated by a comma) are traced. The trace output reports
the receipt of each specified signal even if the signal is being ignored, but
not blocked, by the process. Blocked signals are not received until the process
releases them. Signals may be specified by name or number (see sys/signal.h). If
the list begins with the "!" symbol, the listed signals are excluded from being
displayed with the trace output. The default is -s all.

-t [!] Syscall Includes or excludes system calls from the trace process. System
calls to be traced must be specified in a list and separated by commas. If the
list begins with an "!" symbol, the specified system calls are excluded from the
trace output. The default is -tall.

-u [!] [LibraryName [...]::[!]FunctionName [...] ]

Traces dynamically loaded user level function calls from user libraries. The
LibraryName is a comma-separated list of library names. The FunctionName is a
comma-separated list of function names. In both cases the names can include
name-matching metacharacters *, ?, [] with the same meanings as interpreted by
the shell but as applied to the library/function name spaces, and not to files.

A leading ! on either list specifies an exclusion list of names of libraries or
functions not to be traced. Excluding a library excludes all functions in that
library. Any function list following a library exclusion list is ignored.
Multiple -u options may be specified and they are honored left-to-right. By
default no library/function calls are traced.

-w [!] FileDescriptor Displays the contents of the I/O buffer for each write on
any of the listed file descriptors (see -r). The default is -w!all.

-x [!] Syscall Displays data from the specified parameters of traced system calls
in raw format, usually hexadecimal, rather than symbolically. The default is
-x!all.

Examples

  1. To produce a trace of the find command on the terminal, type:

     truss find . -print >find.out

  2. To trace the lseek, close, statx, and open system calls, type:

     truss -t lseek,close,statx,open find . -print > find.out

  3. To display thread id along with regular output for find command, enter:
     truss -l find . -print >find.out

  4. To display timestamps along with regular output for find command, enter:
     truss -d find . -print >find.out

  5. To display delta times along with regular output for find command, enter:
     truss -D find . -print >find.out

  6. To trace the malloc() function call and exclude the strlen() function call
     in the libc.a library while running the ls command, enter:
     truss -u libc.a::malloc,!strlen ls

  7. To trace all function calls in the libc.a library with names starting with
     "m" while running the ls command, enter:
     truss -u libc.a::m*,!strlen ls

  8. To trace all function calls from the library libcurses.a and exclude calls
     from libc.a while running executable foo, enter:
     truss -u libcurses.a,!libc.a::* foo

  9. To trace the refresh() function call from libcurses.a and the malloc()
     function call from libc.a while running the executable foo, enter:
      truss -u libc.a::malloc -u libcurses.a::refresh foo



20.5 Note: How to trace an AIX machine: 
---------------------------------------

The trace facility and commands are provided as part of the Software Trace Service Aids fileset
named bos.sysmgt.trace.

To see if this fileset is installed, use the following command:

# lslpp -l | grep bos.sysmgt.trace


Taking a trace:
---------------

The events traced are referenced by hook identifiers.
Each hook ID uniquely refers to a particular activity that can be traced.

When tracing, you can select the hook IDs of interest and exclude others that are
not relevant to your problem. A trace hook ID is a 3-digit hexadecimal number
that identifies an event being traced. 
Trace hook IDs are defined in the "/usr/include/sys/trchkid.h" file.

The currently defined trace hook IDs can be listed using the trcrpt command:

# trcrpt -j | sort | pg

001 TRACE ON
002 TRACE OFF
003 TRACE HEADER
004 TRACEID IS ZERO
005 LOGFILE WRAPAROUND
006 TRACEBUFFER WRAPAROUND
..
..

The trace daemon configures a trace session and starts the collection of system events. 
The data collected by the trace function is recorded in the trace log. A report from the trace log 
can be generated with the trcrpt command.

When invoked with the  -a, -x, or -X flags, the trace daemon is run asynchronously (i.e. as a background task).
Otherwise, it is run interactively and prompts you for subcommands.


Some trace examples:

# trace -adf -C all -r PURR -o trace.raw
# trace -Jfop fact proc procd filephys filepfsv filepvl filepvld locks -A786578 -Pp -a
# trace -Jfop fact proc procd filephys filepfsv filepvl filepvld locks -Pp -a


Some trcrpt examples:

Examples
       1    To format the trace log file and print the result, enter:

            trcrpt | qprt
       2    To send a trace report to the /tmp/newfile file, enter:

            trcrpt -o /tmp/newfile
       3    To display process IDs and exec path names in the trace report, enter:

            trcrpt -O pid=on,exec=on -o /tmp/newfile 
       4    To create trace ID histogram data, enter:

            trcrpt -O hist=on
       5    To produce a list of all event groups, enter:

            trcrpt -G
            The format of this report is shown under the trcevgrp command.
       6    To generate back-to-back LMT reports from the common and rare buffers, specify:

            trcrpt -M all
       7    If, in the above example, the LMT files reside at /tmp/mydir, and we want the LMT traces to be merged, 
            specify:

            trcrpt -m -M all:/tmp/mydir
       8    To merge the system trace with the scdisk.hdisk0 component trace, specify:

            trcrpt -m -l scdisk.hdisk0 /var/adm/ras/trcfile
       9    To merge LMT with the system trace while not eliminating duplicate events, specify:

            trcrpt -O removedups=off -m -M all /var/adm/ras/trcfile
       10   To merge all component traces in /tmp/mydir with the LMT traces in the default LMT directory 
            while showing the source file for each trace event, specify:

            trcrpt -O filename=on -m -M all /tmp/mydir
            Note: This is equivalent to:

            trcrpt -O filename=on -m -M all -l all:/tmp/mydir

            Note: If the traces are from a 64-bit kernel, duplicate entries will be removed. However, 
            on the 32-bit kernel,
            duplicate entries will not be removed since we do not know the CPU IDs of the entries in the 
            components traces.


Another example of the usage of trace:
--------------------------------------


>> Obtaining a Sample Trace File

Trace data accumulates rapidly. We want to bracket the data collection as closely around the area of interest 
as possible. One technique for doing this is to issue several commands on the same command line. For example:

$ trace -a -k "20e,20f" -o ./trc_raw ; cp ../bin/track /tmp/junk ; trcstop

captures the execution of the cp command. We have used two features of the trace command. The -k "20e,20f" option 
suppresses the collection of events from the lockl and unlockl functions. These calls are numerous and add volume 
to the report without adding understanding at the level we're interested in. The -o ./trc_raw option causes the 
raw trace output file to be written in our local directory.

Note: This example is more educational if the input file is not already cached in system memory. Choose as the source 
file any file that is about 50KB and has not been touched recently.


>> Formatting the Sample Trace

We use the following form of the trcrpt command for our report:

$ trcrpt -O "exec=on,pid=on" trc_raw > /tmp/cp.rpt

This reports both the fully qualified name of the file that is execed and the process ID that is assigned to it.

A quick look at the report file shows us that there are numerous VMM page assign and delete events in the trace, 
like the following sequence: 

1B1 ksh            8525          0.003109888       0.162816                   VMM page delete:      V.S=0000.150E ppage=1F7F
                                                                               delete_in_progress process_private working_storage

1B0 ksh            8525          0.003141376       0.031488                   VMM page assign:      V.S=0000.2F33 ppage=1F7F
                                                                               delete_in_progress process_private working_storage

We are not interested in this level of VMM activity detail at the moment, so we reformat the trace with:

$ trcrpt -k "1b0,1b1" -O "exec=on,pid=on" trc_raw > cp.rpt2

The -k "1b0,1b1" option suppresses the unwanted VMM events in the formatted output. It saves us from having 
to retrace the workload to suppress unwanted events. We could have used the -k function of trcrpt instead of 
that of the trace command to suppress the lockl and unlockl events, if we had believed that we might need 
to look at the lock activity at some point. If we had been interested in only a small set of events, 
we could have specified -d "hookid1,hookid2" to produce a report with only those events. Since the hook ID 
is the left-most column of the report, you can quickly compile a list of hooks to include or exclude.
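
Compiling such a hook list can be scripted. The sketch below (the function name
hook_ids_in_report is made up) collects the distinct hook IDs from a formatted report and joins
them with commas, ready to paste after -k or -d. It assumes that every event line starts with
a 3-digit hexadecimal hook ID and that no header line starts with such a token.

```shell
# Print the distinct hook IDs found in a formatted trcrpt report as a
# comma-separated list, suitable as an argument to -k or -d.
# Assumption: event lines start with a 3-digit hexadecimal hook ID.
hook_ids_in_report() {
    awk '$1 ~ /^[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]$/ { print tolower($1) }' "$1" |
        sort -u | paste -sd, -
}
```

Usage: hook_ids_in_report /tmp/cp.rpt
Review the list by hand and delete the IDs you want to keep before passing the rest to -k.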

A comprehensive list of Trace hook IDs is defined in /usr/include/sys/trchkid.h.

>> Reading a Trace Report

The header of the trace report tells you when and where the trace was taken, as well as the command that was 
used to produce it:

Fri Nov 19 12:12:49 1993
System: AIX ptool Node: 3
Machine: 000168281000
Internet Address: 00000000 0.0.0.0
trace -ak 20e 20f -o -o ./trc_raw

The body of the report, if displayed in a small enough font, looks as follows:

ID  PROCESS NAME   PID           ELAPSED_SEC     DELTA_MSEC   APPL    SYSCALL KERNEL  INTERRUPT
101 ksh            8525          0.005833472       0.107008           kfork
101 ksh            7214          0.012820224       0.031744           execve
134 cp             7214          0.014451456       0.030464           exec cp ../bin/track /tmp/junk

In cp.rpt you can see the following phenomena:

- The fork, exec, and page fault activities of the cp process 
- The opening of the input file for reading and the creation of the /tmp/junk file 
- The successive read/write system calls to accomplish the copy 
- The process cp becoming blocked while waiting for I/O completion, and the wait process being dispatched 
- How logical-volume requests are translated to physical-volume requests 
- The files are mapped rather than buffered in traditional kernel buffers, and the read accesses cause 
  page faults that must be resolved by the Virtual Memory Manager. 
- The Virtual Memory Manager senses sequential access and begins to prefetch the file pages. 
- The size of the prefetch becomes larger as sequential access continues. 
- When possible, the disk device driver coalesces multiple file requests into one I/O request to the drive.

The trace output looks a little overwhelming at first. This is a good example to use as a learning aid. 
If you can discern the activities described, you are well on your way to being able to use the trace facility 
to diagnose system-performance problems.

>> Filtering of the Trace Report

The full detail of the trace data may not be required. You can choose specific events of interest to be shown. 
For example, it is sometimes useful to find the number of times a certain event occurred. To answer the question 
"How many opens occurred in the copy example?" first find the event ID for the open system call. 
This can be done as follows:

$ trcrpt -j | grep -i open

You should be able to see that event ID 15b is the open event. Now, process the data from the copy example as follows:

$ trcrpt -d 15b -O "exec=on" trc_raw

The report is written to standard output, and you can determine the number of open subroutines that occurred. 
If you want to see only the open subroutines that were performed by the cp process, run the report command 
again using the following:

$ trcrpt -d 15b -p cp -O "exec=on" trc_raw

$ trcrpt -o /tmp/newfile




A Wrapper around trace:
-----------------------

Simple instructions for using the AIX trace facility

>> Five AIX commands are used: 

-trace 
-trcon 
-trcoff 
-trcstop 
-trcrpt 

These are described in AIX Commands Reference, Volume 5, but hopefully you won't have to dig into that. 

Scripts to download

I've provided wrappers for the trace and trcrpt commands since there are various command-line parameters to specify. 

-atrace 
-atrcrpt 

>> Contents atrace:

# To change from the default trace file, set TRCFILE to
# the name of the raw trace file name here; this should 
# match the name of the raw trace file in atrcrpt.
# Don't do this on AIX 4.3.3 ML 10, where you'll need
# to use the default trace file, /usr/adm/ras/trcfile
#TRCFILE="-o /tmp/raw"

# trace categories not to collect
IGNORE_VMM="1b0,1b1,1b2,1b3,1b5,1b7,1b8,1b9,1ba,1bb,1bc,1bd,1be"
IGNORE_LOCK=20e,20f
IGNORE_PCI=2e6,2e7,2e8
IGNORE_SCSI=221,223
IGNORE_OTHER=100,10b,116,119,11f,180,234,254,2dc,402,405,469,7ff

# IGNORE_LVM is not defined above, so it is left out to avoid an empty entry in the list
IGNORE="$IGNORE_VMM,$IGNORE_LOCK,$IGNORE_PCI,$IGNORE_SCSI,$IGNORE_OTHER"

trace -a -d -k $IGNORE $TRCFILE

>> Contents atrcrpt:

# To change from the default trace file, set TRCFILE to
# the name of the raw trace file name here; this should 
# match the name of the raw trace file in atrace.
# Don't do this on AIX 4.3.3 ML 10, where you'll need
# to use the default trace file, /usr/adm/ras/trcfile
# TRCFILE=/tmp/raw

# edit formatted trace file name here
FMTFILE=/tmp/fmt

trcrpt -O pid=on,tid=on,timestamp=1 $TRCFILE >$FMTFILE


Setup instructions

- Edit atrace and atrcrpt and ensure that the names of the files for the raw and formatted trace are appropriate. 
  Please see the comments in the scripts about 4.3.3 ML 10 being broken for trcrpt, such that the default file name 
  needs to be used. You may find that specifying non-default filenames does not have the desired effect. 
- Make atrace and atrcrpt executable via chmod. 

Data collection

./atrace                 (this is my wrapper for the trace command)
trcon
(at this point we're collecting the trace; wait for a bit of time to
trace whatever the failure is)
trcoff
trcstop
./atrcrpt                (this is my wrapper for formatting the report)

After running atrcrpt, the formatted report will be in file /tmp/fmt. 
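
The collection steps above can be wrapped in one small shell function. This is only a sketch: the names run and 
collect_trace and the DRY_RUN switch are my own additions (not part of AIX or of the wrappers), and it assumes 
atrace and atrcrpt are in the current directory.

```shell
# run CMD...: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# collect_trace SECONDS: trace the system for SECONDS, then format the report.
collect_trace() {
    secs=${1:-30}
    run ./atrace || return 1    # start trace; data collection is deferred (-d)
    run trcon    || return 1    # begin collecting
    run sleep "$secs"           # reproduce the failure during this window
    run trcoff                  # suspend collecting
    run trcstop                 # stop the trace daemon
    run ./atrcrpt               # format the raw trace
}
```

"collect_trace 60" traces for one minute; "DRY_RUN=1 collect_trace 60" only prints the commands it would run.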

Sample section of formatted trace
Note that failing system calls generally show "error Esomething" in the trace, as in the fourth record below (error ENOENT). 
The second column is the process id and the third column is the thread id. Once you see something of interest 
in the trace, you may want to use grep to pull out all records for that process id, since in general the trace 
is interleaved with the activity of all the processes in the system. 

101 14690    19239              statx LR = D0174110
107 14690    19239                      lookuppn: /usr/HTTPServer/htdocs/en_US/manual/ibm/index.htmlxxxxxxxxxxx
107 14690    19239                      lookuppn: file not found
104 14690    19239              return from statx. error ENOENT [79 usec]
101 14690    19239              statx LR = D0174110
107 14690    19239                      lookuppn: /usr/HTTPServer/htdocs/en_US/manual/ibm
104 14690    19239              return from statx [36 usec]
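
To pull out the records of one process, filter on the second column of the formatted report. A minimal sketch; 
records_for_pid is a made-up helper name, and /tmp/fmt is the report file written by atrcrpt:

```shell
# records_for_pid PID [FILE...]: print trace records whose second column is PID.
# With no FILE the report is read from standard input.
records_for_pid() {
    pid=$1
    shift
    awk -v pid="$pid" '$2 == pid' "$@"
}

# Example use on the report produced by atrcrpt:
#   records_for_pid 14690 /tmp/fmt | grep error
```

Matching on the column rather than using a plain grep for the number avoids hits on thread ids or addresses 
that happen to contain the same digits.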


Note about an AIX trace on Websphere:
-------------------------------------

In addition to the WebSphere MQ trace, WebSphere MQ for AIX users can use the standard AIX system trace. 
AIX system tracing is a two-step process: 

>> Gathering the data 
>> Formatting the results 

WebSphere MQ uses two trace hook identifiers: 

X'30D' 
This event is recorded by WebSphere MQ on entry to or exit from a subroutine. 
X'30E' 
This event is recorded by WebSphere MQ to trace data such as that being sent or received across a 
communications network.

Trace provides detailed execution tracing to help you to analyze problems. IBM service support personnel 
might ask for a problem to be re-created with trace enabled. The files produced by trace can be very large, 
so it is important to qualify a trace where possible. For example, you can optionally qualify a trace 
by time and by component.

There are two ways to run trace: 

>> Interactively. 

The following sequence of commands runs an interactive trace on the program myprog and ends the trace. 

trace -j30D,30E -o trace.file
->!myprog
->q

>> Asynchronously. 

The following sequence of commands runs an asynchronous trace on the program myprog and ends the trace. 
trace -a -j30D,30E -o trace.file
myprog
trcstop

You can format the trace file with the command: 
trcrpt -t /usr/mqm/lib/amqtrc.fmt trace.file > report.file
report.file is the name of the file where you want to put the formatted trace output.




20.6 Nice example: Tracing with truss on AIX:
---------------------------------------------

Application tracing displays the calls that an application makes to external libraries and the kernel. 
These calls give the application access to the network, the file system, and the display. By watching 
the calls and their results, you can get some idea of what the application "expects", 
which can lead to a solution.

Each UNIX system provides its own commands for tracing. This article introduces you to truss, which Solaris 
and AIX support. On Linux, you perform tracing with the strace command. On other UNIX flavors, application 
tracing might go by the names ptrace, ktrace, trace, or tusc, and the command-line parameters might be 
slightly different.

>> A classic file permissions problem

One class of problems that plagues systems administrators is file permissions. An application likely has to open 
certain files to do its work. If the open operation fails, the application should let the administrator know. 
However, developers often forget to check the result of functions or, to add to the confusion, perform the check, 
but don't adequately handle the error. For example, here's the output of an application that's failing to open:

$ ./openapp
This should never happen!


After running the fictitious openapp application, I received the unhelpful (and false) error message 
"This should never happen!". This is a perfect time to introduce truss. Listing 1 shows the same application 
run under the truss command, which shows all the function calls that this program made to outside libraries.


Listing 1. Openapp run under truss

$ truss ./openapp
execve("openapp", 0xFFBFFDEC, 0xFFBFFDF4)  argc = 1
getcwd("/export/home/sean", 1015)               = 0
stat("/export/home/sean/openapp", 0xFFBFFBC8)   = 0
open("/var/ld/ld.config", O_RDONLY)             Err#2 ENOENT
stat("/opt/csw/lib/libc.so.1", 0xFFBFF6F8)      Err#2 ENOENT
stat("/lib/libc.so.1", 0xFFBFF6F8)              = 0
resolvepath("/lib/libc.so.1", "/lib/libc.so.1", 1023) = 14
open("/lib/libc.so.1", O_RDONLY)                = 3
memcntl(0xFF280000, 139692, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(3)                                        = 0
getcontext(0xFFBFF8C0)
getrlimit(RLIMIT_STACK, 0xFFBFF8A0)             = 0
getpid()                                        = 7895 [7894]
setustack(0xFF3A2088)
open("/etc/configfile", O_RDONLY)               Err#13 EACCES [file_dac_read]
ioctl(1, TCGETA, 0xFFBFEF14)                    = 0



fstat64(1, 0xFFBFEE30)                          = 0
stat("/platform/SUNW,Sun-Blade-100/lib/libc_psr.so.1", 0xFFBFEAB0) = 0
open("/platform/SUNW,Sun-Blade-100/lib/libc_psr.so.1", O_RDONLY) = 3
close(3)                                        = 0
This should never happen!
write(1, " T h i s   s h o u l d  ".., 26)      = 26
_exit(3)
 


Each line of the output represents a function call that the application made along with the return value, 
if applicable. (You don't need to know each function call, but for more information, you can call up the 
man page for the function, such as with the command man open.) To find the call that is potentially 
causing the problem, it's often easiest to start at the end (or as close as possible to where 
the problems start). For example, you know that the application outputs This should never happen!, 
which appears near the end of the output. Chances are that if you find this message and work your way up 
through the truss command output, you'll come across the problem.

Scrolling up from the error message, notice the line beginning with open("/etc/configfile"..., 
which not only looks relevant but also seems to return an error of Err#13 EACCES. Looking at the man page 
for the open() function (with man open), it's evident that the purpose of the function is to open a file 
-- in this case, /etc/configfile -- and that a return value of EACCES means that the problem is related 
to permissions. Sure enough, a look at /etc/configfile shows that the user doesn't have permissions to read 
the file. A quick chmod later, and the application is running properly.

The output of Listing 1 shows two other calls, open() and stat(), that return an error. Many of the calls 
toward the beginning of the application, including the other two errors, are added by the operating system 
as it runs the application. Only experience will tell when the errors are benign and when they aren't. 
In this case, the two errors and the three lines that follow them are trying to find the location of libc.so.1, 
which they eventually do. You'll see more about shared library problems later.
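
Working backward from the end can be partly automated once the truss output is saved to a file (truss accepts 
-o outfile for that). The sketch below lists the failing calls, last one first, with their line numbers. 
The helper name failed_calls_reversed and the file name in the usage comment are hypothetical.

```shell
# failed_calls_reversed: read truss output on stdin and print every line
# that carries an Err#N result, starting with the one closest to the end.
failed_calls_reversed() {
    awk '
        /Err#[0-9]+/ { hit[++n] = NR ": " $0 }
        END          { for (i = n; i >= 1; i--) print hit[i] }
    '
}

# Example use:
#   truss -o /tmp/openapp.truss ./openapp
#   failed_calls_reversed < /tmp/openapp.truss
```

For Listing 1 the first line printed would be the open() of /etc/configfile with EACCES, followed by the two 
benign library lookups.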



>> The application doesn't start

Sometimes, an application fails to start properly; but rather than exiting, it just hangs. This behavior is often 
a symptom of contention for a resource (such as two processes competing for a file lock), or the application 
is looking for something that is not coming back. This latter class of problems could be almost anything, 
such as a name lookup that's taking a long time to resolve, or a file that should be found in a certain spot but 
isn't there. In any case, watching the application under truss should reveal the culprit.

While the first code example showed an obvious link between the system call causing the problem and the file, 
the example you're about to see requires a bit more sleuthing. Listing 2 shows a misbehaving application 
called Getlock run under truss.


Listing 2. Getlock run under truss

$ truss ./getlock
execve("getlock", 0xFFBFFDFC, 0xFFBFFE04)  argc = 1
getcwd("/export/home/sean", 1015)               = 0
resolvepath("/export/home/sean/getlock", "/export/home/sean/getlock", 1023) = 25
resolvepath("/usr/lib/ld.so.1", "/lib/ld.so.1", 1023) = 12
stat("/export/home/sean/getlock", 0xFFBFFBD8)   = 0
open("/var/ld/ld.config", O_RDONLY)             Err#2 ENOENT
stat("/opt/csw/lib/libc.so.1", 0xFFBFF708)      Err#2 ENOENT
stat("/lib/libc.so.1", 0xFFBFF708)              = 0
resolvepath("/lib/libc.so.1", "/lib/libc.so.1", 1023) = 14
open("/lib/libc.so.1", O_RDONLY)                = 3
close(3)                                        = 0
getcontext(0xFFBFF8D0)
getrlimit(RLIMIT_STACK, 0xFFBFF8B0)             = 0
getpid()                                        = 10715 [10714]
setustack(0xFF3A2088)
open("/tmp/lockfile", O_WRONLY|O_CREAT, 0755)   = 3
getpid()                                        = 10715 [10714]
fcntl(3, F_SETLKW, 0xFFBFFD60)  (sleeping...)
 


The final call, fcntl(), is marked as sleeping, because the function is blocking. This means that the function 
is waiting for something to happen, and the kernel has put the process to sleep until the event occurs. To determine 
what the event is, you must look at fcntl().

The man page for fcntl() (man fcntl) describes the function simply as "file control" on Solaris and 
"manipulate file descriptor" on Linux. In all cases, fcntl() requires a file descriptor, which is an integer 
describing a file the process has opened, a command that specifies the action to be taken on the file descriptor, 
and finally any arguments required for the specific function. In the example in Listing 2, the file descriptor is 3, 
and the command is F_SETLKW. (The 0xFFBFFD60 is a pointer to a data structure, which doesn't concern us now.) 
Digging further, the man page states that F_SETLKW sets a lock on the file and, if the lock is held by another process, waits until it can be obtained.

From the first example involving the open() system call, you saw that a successful call returns a file descriptor. 
In the truss output of Listing 2, there are two cases in which the result of open() returns 3. 
Because file descriptors are reused after they are closed, the relevant open() is the one just above fcntl(), 
which is for /tmp/lockfile. A utility like lsof lists any processes holding open a file. Failing that, 
you could trace through /proc to find the process with the open file. However, as is usually the case, 
a file is locked for a good reason, such as limiting the number of instances of the application or configuring 
the application to run in a user-specific directory.
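
The descriptor bookkeeping described above (remember each open(), forget it on close(), then look up the 
descriptor used by fcntl()) can be sketched in awk. The function name locked_path is made up, and the patterns 
assume truss output in the layout of Listing 2.

```shell
# locked_path: read truss output on stdin and print "FD PATH" for each
# fcntl(FD, F_SETLKW, ...) call, using the most recent open() of that FD.
locked_path() {
    awk '
        # open("PATH", ...) = FD : remember which path the descriptor refers to
        /^open\(/ && / = [0-9]+[ \t]*$/ {
            split($0, q, "\"")          # q[2] is the text inside the first quotes
            path[$NF] = q[2]
        }
        # close(FD) = 0 : the descriptor number is free for reuse
        /^close\(/ {
            fd = $1
            gsub(/[^0-9]/, "", fd)
            delete path[fd]
        }
        # fcntl(FD, F_SETLKW, ...) : report the file being locked
        /^fcntl\(/ && /F_SETLKW/ {
            fd = $1
            gsub(/[^0-9]/, "", fd)
            print fd, path[fd]
        }
    '
}
```

Fed with Listing 2 this prints "3 /tmp/lockfile". With the path known, "lsof /tmp/lockfile" or 
"fuser /tmp/lockfile" (where those tools are installed) should name the process holding the file.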


>> Attaching to a running process

Sometimes, an application is already running when a problem occurs. Being able to run an already-running process 
under truss would be helpful. For example, notice that in the output of the Top application, a certain process 
has been consuming 95 percent of the CPU for quite some time, as shown in Listing 3.


Listing 3. Top output showing a CPU-intensive process

   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 11063 sean       1   0    0 1872K  952K run     87.9H 94.68% udpsend
 


The -p option to truss allows the owner of the process, or root, to attach to a running process and view 
the system call activity. The process id (PID) is required. In the example shown in Listing 3, the PID is 11063. 
Listing 4 shows the system call activity of the application in question.


Listing 4. truss output after attaching to a running process

$ truss -p 11063

sendto(3, " a b c", 3, 0, 0xFFBFFD58, 16)       = 3
sendto(3, " a b c", 3, 0, 0xFFBFFD58, 16)       = 3
sendto(3, " a b c", 3, 0, 0xFFBFFD58, 16)       = 3
sendto(3, " a b c", 3, 0, 0xFFBFFD58, 16)       = 3
sendto(3, " a b c", 3, 0, 0xFFBFFD58, 16)       = 3
sendto(3, " a b c", 3, 0, 0xFFBFFD58, 16)       = 3
sendto(3, " a b c", 3, 0, 0xFFBFFD58, 16)       = 3
sendto(3, " a b c", 3, 0, 0xFFBFFD58, 16)       = 3
... repeats ...
 

The sendto() function's man page (man sendto) shows that this function is used to send a message from a socket 
-- typically, a network connection. The output of truss shows the file descriptor (the first 3) and the data 
being sent (abc). Indeed, capturing a sample of network traffic with the snoop or tcpdump tool shows a large amount 
of traffic being directed to a particular host, which is likely not the result of a properly behaving application.

Note that truss was not able to show the creation of file descriptor 3, because you attached after the descriptor 
was created. This is one limitation of attaching to a running process, and the reason why you should gather 
other information using a tool such as a packet analyzer before jumping to conclusions.
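
When a process repeats the same call thousands of times, a per-call count makes the pattern obvious. Both truss 
and strace have a -c option that prints such a summary; the sketch below does the same for output that was 
already saved to a file. The name call_counts is an illustration only.

```shell
# call_counts: read truss or strace output on stdin and print
# "COUNT NAME" for each call name, most frequent first.
call_counts() {
    awk -F'[(]' '
        /^[A-Za-z_][A-Za-z_0-9]*\(/ { n[$1]++ }
        END { for (c in n) print n[c], c }
    ' | sort -rn
}
```

For the udpsend trace in Listing 4, the top line would be the sendto count, far ahead of anything else.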

This example might seem somewhat contrived (and technically it was, because I wrote the udpsend application 
to demonstrate how to use truss), but it is based on a real situation. I was investigating a process running 
on a UNIX-based appliance that had a CPU-bound process. Tracing the application showed the same packet activity. 
Tracing with a network analyzer showed the packets were being directed to a host on the Internet. After escalating 
with the vendor, I determined that the problem was their application failing to perform proper error checking 
on a binary configuration file. The file had somehow become corrupted. As a result, the application interpreted 
the file incorrectly and repeatedly hammered a random IP address with User Datagram Protocol (UDP) datagrams. 
After I replaced the file, the process behaved as expected.



>> Filtering output


After a while, you'll get the knack of what to look for. While it's possible to use the grep command to go through 
the output, it's easier to configure truss to focus only on certain calls. This practice is common if you're trying 
to determine how an application works, such as which configuration files the application is using. In this case, 
the open() and stat() system calls point to any files the application is trying to open.

You use open() to open a file, but you use stat() to find information about a file. Often, an application looks for 
a file with a series of stat() calls, and then opens the file it wants.

For truss, you add filtering system calls with the -t option. For strace under Linux, you use -e. In either case, 
you pass a comma-separated list of system calls to be shown on the command line. By prefixing the list with the 
exclamation mark (!), the given calls are filtered out of the output. Listing 5 shows a fictitious application 
looking for a configuration file.


Listing 5. truss output filtered to show only stat() and open() functions

$ truss -tstat,open ./app
stat("/export/home/sean/app", 0xFFBFFBD0)   = 0
open("/var/ld/ld.config", O_RDONLY)             Err#2 ENOENT
stat("/opt/csw/lib/libc.so.1", 0xFFBFF700)      Err#2 ENOENT
stat("/lib/libc.so.1", 0xFFBFF700)              = 0
open("/lib/libc.so.1", O_RDONLY)                = 3
stat("/export/home/sean/.config", 0xFFBFFCF0)   Err#2 ENOENT
stat("/etc/app/configfile", 0xFFBFFCF0)         Err#2 ENOENT
stat("/etc/configfile", 0xFFBFFCF0)             = 0
open("/etc/configfile", O_RDONLY)               = 3
 

The final four lines are the key here. The stat() function for /export/home/sean/.config results in ENOENT, 
which means that the file wasn't found. The code then tries /etc/app/configfile before it finds the correct 
information in /etc/configfile. The significance of first checking in the user's home directory is that you 
can override the configuration by user.
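
The search order is easier to read when each stat() or open() line is reduced to a status and a path. A small 
sketch, assuming output in the layout of Listing 5; probe_summary is a hypothetical helper name.

```shell
# probe_summary: read truss output on stdin and print "STATUS PATH" for each
# stat() or open() call. STATUS is the errno name (for example ENOENT) when
# the call failed and "ok" otherwise.
probe_summary() {
    awk -F'"' '
        /^(stat|open)\(/ {
            status = "ok"
            if (match($0, /Err#[0-9]+ [A-Z]+/)) {
                split(substr($0, RSTART, RLENGTH), e, " ")
                status = e[2]
            }
            print status, $2
        }
    '
}

# Example use:
#   truss -tstat,open -o /tmp/app.truss ./app
#   probe_summary < /tmp/app.truss
```

The last "ok" line for a configuration path is the file the application settled on.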



>> Final thoughts

Whether your operating system uses truss, strace, trace, or something else, the ability to peer into an application's 
behavior is a powerful tool for problem solving. The methodology can be summed up as follows:

1. Describe the problem. 
2. Trace the application. 
3. Start at the spot at which the problem occurs and work backward through the system calls to identify the problem. 
4. Use the man pages for help on interpreting the system calls. 
5. Correct the behavior and test. 

Tracing application behavior is a powerful troubleshooting tool, because you're observing the system calls 
that the application makes to the operating system. When the usual problem-solving methods fail, turn to 
application tracing.




20.7. snap command on AIX:
--------------------------

The snap command gathers system configuration information and compresses the information into a pax file. 
The information gathered with the snap command may be required to identify and resolve system problems.

In normal conditions, the command "snap -gc" should be sufficient. The pax file will be stored in /tmp/ibmsupt

# snap -gc 

This creates the following file:

/tmp/ibmsupt/snap.pax.Z


Further info:

snap Command

Purpose

       Gathers system configuration information.

Syntax

       snap [ -a ] [ -A ] [ -b ] [ -B ] [ -c ] [ -C ] [ -D ] [ -f ] [ -g ] [ -G ] [ -i ] [ -k ] [ -l ] [ -L ][ -n ] [ -N ] 
       [ -p ] [ -r ] [ -R  ] [ -s ] [ -S ] [ -t ] [ -T  Filename ] [ -w  ] [ -o OutputDevice ] [ -d Dir ] [ -v Component ] 
       [ -O FileSplitSize ] [ -P Files ]
       [ script1 script2 ... | All | file:filepath ]

       snap [ -a ] [ -A ] [ -b ] [ -B ] [ -c ] [ -C ] [ -D ] [ -f ] [ -g ] [ -G ] [ -i ] [ -k ] [ -l ] [ -L ][ -n ] [ -N ] 
       [ -p ] [ -r ] [ -R  ] [ -s ] [ -S ] [ -t ] [ -T  Filename ] [ -o OutputDevice ] [ -d Dir ] [ -v Component ] 
        [ -O FileSplitSize ] [ -P Files ] [
       script1 script2 ... | All | file:filepath ]

       snap -e [ -m Nodelist ] [ -d Dir ]

Description

       The snap command gathers system configuration information and compresses the information into a pax file. The file may then be
       written to a device such as tape or DVD, or transmitted to a remote system. The information gathered with the snap command might be
       required to identify and resolve system problems. Note: Root user authority is required to execute the snap command. Use the snap -o
       /dev/cd0 command to copy the compressed image to DVD. Use the snap -o /dev/rmt0 command to copy the image to tape.

       Use the snap -o /dev/rfd0 command to copy the compressed image to diskette.

       Approximately 8MB of temporary disk space is required to collect all system information, including contents of the error log. If you
       do not gather all system information with the snap -a command, less disk space may be required (depending on the options selected).
       Note: If you intend to use a tape to send a snap image to IBM(R) for software support, the tape must be one of the following formats:
       *    8mm, 2.3 Gb capacity
       *    8mm, 5.0 Gb capacity
       *    4mm, 4.0 Gb capacity

       Using other formats prevents or delays IBM software support from being able to examine the contents.

       The snap -g command gathers general system information, including the following:
       *    Error report
       *    Copy of the customized Object Data Manager (ODM) database
       *    Trace file
       *    User environment
       *    Amount of physical memory and paging space
       *    Device and attribute information
       *    Security user information

       The output of the snap -g command is written to the /tmp/ibmsupt/general/general.snap file.

       The snap command checks for available space in the /tmp/ibmsupt directory, the default directory for snap command output. You can
       write the output to another directory by using the -d flag. If there is not enough space to hold the snap command output, you must
       expand the file system.
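
       A quick way to check the space yourself before running snap is sketched below. The function names and the
       8192 KB figure (the "approximately 8MB" mentioned above) are illustrative assumptions, not part of snap.

```shell
# free_kb DIR: free space, in KB, on the file system that holds DIR.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# enough_space DIR NEED_KB: succeed when DIR has at least NEED_KB KB free.
enough_space() {
    avail=$(free_kb "$1")
    [ -n "$avail" ] || return 2
    [ "$avail" -ge "$2" ]
}

# Example use:
#   enough_space /tmp 8192 && snap -gc
```

       The -P flag asks df for the POSIX single-line format, so the available space is always the fourth column.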

       Each execution of the snap command appends information to previously created files. Use the -r flag to remove previously gathered and
       saved information.

       Flags:

       -a
            Gathers all system configuration information. This option requires approximately 8MB of temporary disk space.
       -A
            Gathers asynchronous (TTY) information.
       -b
            Gathers SSA information.
       -B
            Bypasses collection of SSA adapter dumps. The -B flag only works when the -b flag is also specified; otherwise, the -B flag is
            ignored.
       -c
            Creates a compressed pax image (snap.pax.Z file) of all files in the /tmp/ibmsupt directory tree or other named output
            directory. Note: Information not gathered with this option should be copied to the snap directory tree before using the -c flag.
            If a test case is needed to demonstrate the system problem, copy the test case to the /tmp/ibmsupt/testcase directory before
            compressing the pax file.
       -C
            Retrieves all the files in the fwdump_dir directory. The files are placed in the "general" subdirectory. The -C snap option
            behaves the same as -P*.
       -D
            Gathers dump and /unix information. The primary dump device is used. Notes:
              1    If bosboot -k was used to specify the running kernel to be other than /unix, the incorrect kernel is gathered. Make sure
                   that /unix is, or is linked to, the kernel in use when the dump was taken.
              2    If the dump file is copied to the host machine, the snap command does not collect the dump image in the /tmp/ibmsupt/dump
                   directory. Instead, it creates a link in the dump directory to the actual dump image.
       -d AbsolutePath
            Identifies the optional snap command output directory (/tmp/ibmsupt is the default). You must specify the absolute path.
       -e
            Gathers HACMP(TM) specific information. Note: HACMP specific data is collected from all nodes belonging to the cluster. This
            flag cannot be used with any other flags except -m and -d.
       -f
            Gathers file system information.
       -g
            Gathers the output of the lslpp -hac command, which is required to recreate exact operating system environments. Writes output
            to the /tmp/ibmsupt/general/lslpp.hBc file. Also collects general system information and writes the output to the
            /tmp/ibmsupt/general/general.snap file.
       -G
            Includes predefined Object Data Manager (ODM) files in general information collected with the -g flag.
       -i
            Gathers installation debug vital product data (VPD) information.




strace example on Linux:
------------------------

The main trace utility on most Linux distributions is the "strace" command.
It takes many options, but "-o outputfile" is one of the most useful, because it saves the trace to a file.

Use it like:

# strace -o logfile <command_or_program_you_want_to_trace> 

Because strace shows the system calls and signals, you can use it to reveal whether a program cannot
find a file, or does not have permission to read (or write to) a file. Either condition can make a program fail.

Example:

Suppose we have a file called "/etc/security.conf". Now we run a utility to read the file (cat, pg, more, less, and so on)
as a normal user who does not have permission to read the file. Let's trace that event to a logfile, and see
what we can discover.

$ strace -o strace_example.log less /etc/security.conf

A trace file can get pretty long, but you can simply browse it and watch for anything that looks like a reported error.
If we take a look in the logfile "strace_example.log", we find:

..
..
open("/etc/security.conf", O_RDONLY|O_LARGEFILE) = -1 EACCES (Permission denied)
write(2, "/etc/security.conf: Permission denied\n", 32) = 32
..
..

We can clearly see that our program failed due to lack of read permission.
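
In a long logfile you do not have to browse by eye: failing system calls are logged by strace with a return 
value of -1 followed by the errno name, so they can be filtered out. A minimal sketch; strace_failures is 
a made-up helper name.

```shell
# strace_failures [FILE...]: print, with line numbers, every call in an
# strace log that returned -1 and an errno name. Reads stdin without FILE.
strace_failures() {
    grep -n ' = -1 E' "$@"
}

# Example use:
#   strace_failures strace_example.log
```

Many of the hits near the top of a log are harmless library lookups (ENOENT); the interesting ones are usually 
close to the point where the program prints its error.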





=============
21. Logfiles:
=============


21.1 Solaris:
=============

Unix message files record all system problems like disk errors, swap errors, NFS problems, etc. 
Monitor the following files on your system to detect system problems: 

  tail -f /var/adm/syslog
  tail -f /var/adm/messages
  tail -f /var/log/syslog

You can also use the dmesg command.
Messages are recorded by the syslogd daemon.

Diagnostics can be done from the OK prompt after a reboot, like probe-scsi, show-devs, show-disks, test memory etc..
You can also use the SunVTS tool to run diagnostics. SunVTS is Sun's Validation Test package.

System dumps:
You can manage system dumps by using the dumpadm command.


User logins are recorded in /var/adm/utmpx.
Solaris 8 and 9 do not use wtmp or utmp.


Logfiles:
---------

/var/adm/messages
The syslogd daemon logs its findings into this file

/var/adm/lastlog
This file holds the most recent login time for each user of the system

/var/adm/utmpx
This database file contains user access and accounting information for commands such as
who, write, login. The utmpx file is where information such as the terminal and login time
are stored, and if you use the who command, it will retrieve that information.

/var/adm/wtmpx
This file contains the history of user access and accounting information, for the utmpx database.
The "last" command uses this file to show the historical login and logout info, going back to when the wtmpx file was created or last cleared.

/var/adm/sulog
This file shows you which users have used the su command to switch to another user.

/var/adm/acct
If accounting is enabled, accounting information is recorded under this directory.

/var/adm/loginlog
If it is important for you to track whether users are trying to log in to your user accounts, 
you can create a /var/adm/loginlog file with read and write permissions for root only. After you create the loginlog file, 
all failed login activity is written to this file automatically after five failed attempts. The five-try limit avoids recording 
failed attempts that are the result of typographical errors.

The loginlog file contains one entry for each failed attempt. Each entry contains the user's login name, 
tty device, and time of the attempt.
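
Creating the file comes down to three commands. The sketch below wraps them in a function so the path can be 
overridden; the function name make_loginlog is my own, and setting the group to sys follows the usual Solaris 
procedure but is treated as optional here.

```shell
# make_loginlog [PATH]: create the failed-login log, readable and writable
# by its owner only. PATH defaults to /var/adm/loginlog. Run as root.
make_loginlog() {
    log=${1:-/var/adm/loginlog}
    touch "$log" && chmod 600 "$log" || return 1
    # Group sys is customary on Solaris; a failure here is only reported.
    chgrp sys "$log" 2>/dev/null || echo "note: could not set group sys on $log" >&2
}
```

Check the result with "ls -l /var/adm/loginlog"; the mode should read -rw-------.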



AIX:
----

The following files have to be reduced in size periodically. You can use the cat /dev/null command for that.

Example: cat /dev/null >/var/adm/sulog

/var/adm/sulog 
/var/adm/cron/log 
/var/adm/wtmp 
/etc/security/failedlogin 

Notes about the error log, which is the file /var/adm/ras/errlog:

Do NOT use cat /dev/null to clear the errorlog. 
Use instead the following procedure:

# /usr/lib/errstop   (stop the error daemon)
move the errlog file
# /usr/lib/errdemon  (start the error daemon)



errdemon:
---------

On most UNIX systems, information and errors from system events and processes are managed by the 
syslog daemon (syslogd); depending on settings in the configuration file /etc/syslog.conf, messages are passed 
from the operating system, daemons, and applications to the console, to log files, or to nowhere at all. 
AIX includes the syslog daemon, and it is used in the same way that other UNIX-based operating systems use it. 
In addition to syslog, though, AIX also contains another facility for the management of hardware, operating system, 
and application messages and errors. This facility, while simple in its operation, provides unique and valuable 
insight into the health and happiness of an AIX system.

The AIX error logging facility components are part of the bos.rte and the bos.sysmgt.serv_aid packages, 
both of which are automatically placed on the system as part of the base operating system installation. 

Unlike the syslog daemon, which performs no logging at all in its default configuration as shipped, 
the error logging facility requires no configuration before it can provide useful information about the system. 
The errdemon is started during system initialization and continuously monitors the special file /dev/error 
for new entries sent by either the kernel or by applications. The label of each new entry is checked 
against the contents of the Error Record Template Repository, and if a match is found, additional information 
about the system environment or hardware status is added, before the entry is posted to the error log.

The actual file in which error entries are stored is configurable; the default is /var/adm/ras/errlog. 
That file is in a binary format and so should never be truncated or zeroed out manually. The errlog file 
is a circular log, storing as many entries as can fit within its defined size. A memory buffer is set 
by the errdemon process, and newly arrived entries are put into the buffer before they are written to the log 
to minimize the possibility of a lost entry. The name and size of the error log file and the size of the memory buffer 
may be viewed with the errdemon command:


[aixhost:root:/] # /usr/lib/errdemon -l
Error Log Attributes
--------------------------------------------
Log File                /var/adm/ras/errlog
Log Size                1048576 bytes
Memory Buffer Size      8192 bytes

The parameters displayed may be changed by running the errdemon command with other flags, documented 
in the errdemon man page. The default sizes and values have always been sufficient on our systems, 
so I've never had reason to change them.

Due to use of a circular log file, it is not necessary (or even possible) to rotate the error log. 
Without intervention, errors will remain in the log indefinitely, or until the log fills up with new entries. 
As shipped, however, the crontab for the root user contains two entries that are executed daily, 
removing hardware errors that are older than 90 days, and all other errors that are older than 30 days.


0 11  *  *  * /usr/bin/errclear -d S,O 30
0 12  *  *  * /usr/bin/errclear -d H 90


The errdemon daemon constantly checks the /dev/error special file, and when new data
is written, the daemon conducts a series of operations.

- To determine the path to your system's error logfile, run the command:
# /usr/lib/errdemon -l
Error Log Attributes
Log File          /var/adm/ras/errlog
Log Size          1048576 bytes
Memory            8192 bytes

- To change the maximum size of the error log file, enter:
# /usr/lib/errdemon -s 200000


You can generate the error reports using smitty or through the errpt command.

# smitty errpt       gives you a dialog screen where you can select types of information.

# errpt -a
# errpt -d H

# errpt -a|pg      Produces a detailed report for each entry in the error log 
# errpt -aN hdisk1 Displays a detailed report of all errors that occurred on this drive. If more than a few errors 
                   occur within a 24 hour period, execute the CERTIFY process under DIAGNOSTICS to determine 
                   if a PV is becoming marginal. 
 

If you use the errpt without any options, it generates a summary report. 
If used with the -a option, a detailed report is created.
You can also display errors of a particular class, for example for the Hardware class.
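
To get a quick feel for which class dominates, the summary report can be counted per class (the C column: 
H = hardware, S = software, O = operator, U = undetermined). A small sketch; errpt_by_class is 
a hypothetical helper name.

```shell
# errpt_by_class: read an errpt summary report on stdin and print
# "CLASS COUNT" for each error class found in the fourth column.
errpt_by_class() {
    awk '
        $1 != "IDENTIFIER" && NF >= 5 { n[$4]++ }
        END { for (c in n) print c, n[c] }
    ' | sort
}

# Example use:
#   errpt | errpt_by_class
```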

Examples using errpt:
---------------------

To display a complete summary report, enter: 

errpt
To display a complete detailed report, enter: 
errpt  -a

To display a detailed report of all errors logged for the error identifier E19E094F, enter: 
errpt  -a  -j E19E094F

To display a detailed report of all errors logged in the past 24 hours, enter: 
errpt  -a  -s mmddhhmmyy

where the mmddhhmmyy string equals the current month, day, hour, minute, and year, minus 24 hours. 
To list error-record templates for which logging is turned off for any error-log entries, enter: 
errpt  -t  -F log=0

To view all entries from the alternate error-log file /var/adm/ras/errlog.alternate, enter: 
errpt  -i /var/adm/ras/errlog.alternate

To view all hardware entries from the alternate error-log file /var/adm/ras/errlog.alternate, enter: 
errpt  -i /var/adm/ras/errlog.alternate -d H

To display a detailed report of all errors logged for the error label ERRLOG_ON, enter: 
errpt  -a  -J ERRLOG_ON

To display a detailed report of all errors and group duplicate errors, enter: 
errpt -aD

To display a detailed report of all errors logged for the error labels DISK_ERR1 and DISK_ERR2 during 
the month of August, enter: 
errpt -a -J DISK_ERR1,DISK_ERR2 -s 0801000004 -e 0831235904

errclear:

Deletes entries from the error log that are older than the given number of days.

Example: errclear 0 (Deletes all entries from the error log)


Example errorreport:
--------------------

Example 1:
----------

P550:/home/reserve $ errpt

IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
0EC00096   0130224507 P U SYSPFS         STORAGE SUBSYSTEM FAILURE
0EC00096   0130224007 P U SYSPFS         STORAGE SUBSYSTEM FAILURE
0EC00096   0130224007 P U SYSPFS         STORAGE SUBSYSTEM FAILURE
0EC00096   0130223507 P U SYSPFS         STORAGE SUBSYSTEM FAILURE
F7DDA124   0130223507 U H LVDD           PHYSICAL VOLUME DECLARED MISSING
52715FA5   0130223507 U H LVDD           FAILED TO WRITE VOLUME GROUP STATUS AREA
CAD234BE   0130223507 U H LVDD           QUORUM LOST, VOLUME GROUP CLOSING
613E5F38   0130223507 P H LVDD           I/O ERROR DETECTED BY LVM
613E5F38   0130223507 P H LVDD           I/O ERROR DETECTED BY LVM
613E5F38   0130223507 P H LVDD           I/O ERROR DETECTED BY LVM
0873CF9F   0130191907 T S pts/4          TTYHOG OVER-RUN
0EC00096   0130162407 P U SYSPFS         STORAGE SUBSYSTEM FAILURE
51E537B5   0130161807 P H sysplanar0     platform_dump saved to file
291D64C3   0130161807 I H sysplanar0     platform_dump indicator event
291D64C3   0130161807 I H sysplanar0     platform_dump indicator event
BFE4C025   0130161807 P H sysplanar0     UNDETERMINED ERROR
51E537B5   0130161707 P H sysplanar0     platform_dump saved to file
291D64C3   0130161707 I H sysplanar0     platform_dump indicator event
291D64C3   0130161707 I H sysplanar0     platform_dump indicator event
51E537B5   0130161707 P H sysplanar0     platform_dump saved to file
291D64C3   0130161707 I H sysplanar0     platform_dump indicator event
291D64C3   0130161707 I H sysplanar0     platform_dump indicator event
BFE4C025   0130161607 P H sysplanar0     UNDETERMINED ERROR
BFE4C025   0130161407 P H sysplanar0     UNDETERMINED ERROR
BFE4C025   0130161307 P H sysplanar0     UNDETERMINED ERROR
BFE4C025   0130161307 P H sysplanar0     UNDETERMINED ERROR
BFE4C025   0130161207 P H sysplanar0     UNDETERMINED ERROR
BFE4C025   0130161207 P H sysplanar0     UNDETERMINED ERROR
0EC00096   0130161207 P U SYSPFS         STORAGE SUBSYSTEM FAILURE
BFE4C025   0130161107 P H sysplanar0     UNDETERMINED ERROR
D2A1B43E   0130161107 P U SYSPFS         FILE SYSTEM CORRUPTION
D2A1B43E   0130161107 P U SYSPFS         FILE SYSTEM CORRUPTION
CD546B25   0130161107 I O SYSPFS         FILE SYSTEM RECOVERY REQUIRED
CD546B25   0130161107 I O SYSPFS         FILE SYSTEM RECOVERY REQUIRED
1ED0A744   0130161107 P U SYSPFS         FILE SYSTEM LOGGING SUSPENDED
CD546B25   0130161107 I O SYSPFS         FILE SYSTEM RECOVERY REQUIRED
D2A1B43E   0130161107 P U SYSPFS         FILE SYSTEM CORRUPTION
1ED0A744   0130161107 P U SYSPFS         FILE SYSTEM LOGGING SUSPENDED
F7DDA124   0130161107 U H LVDD           PHYSICAL VOLUME DECLARED MISSING
52715FA5   0130161107 U H LVDD           FAILED TO WRITE VOLUME GROUP STATUS AREA
CAD234BE   0130161107 U H LVDD           QUORUM LOST, VOLUME GROUP CLOSING
613E5F38   0130161107 P H LVDD           I/O ERROR DETECTED BY LVM
EAA3D429   0130161107 U S LVDD           PHYSICAL PARTITION MARKED STALE
613E5F38   0130161107 P H LVDD           I/O ERROR DETECTED BY LVM
613E5F38   0130161107 P H LVDD           I/O ERROR DETECTED BY LVM
41BF2110   0130161107 U H LVDD           MIRROR WRITE CACHE WRITE FAILED
613E5F38   0130161107 P H LVDD           I/O ERROR DETECTED BY LVM
CAD234BE   0130161107 U H LVDD           QUORUM LOST, VOLUME GROUP CLOSING
F7DDA124   0130161107 U H LVDD           PHYSICAL VOLUME DECLARED MISSING
41BF2110   0130161107 U H LVDD           MIRROR WRITE CACHE WRITE FAILED
613E5F38   0130161107 P H LVDD           I/O ERROR DETECTED BY LVM
6472E03B   0130161107 P H sysplanar0     EEH permanent error for adapter
FEC31570   0130161107 P H sisscsia2      UNDETERMINED ERROR
C14C511C   0130161107 T H scsi5          ADAPTER ERROR
BFE4C025   0130161107 P H sysplanar0     UNDETERMINED ERROR
FE2DEE00   0130144307 P S SYSXAIXIF      DUPLICATE IP ADDRESS DETECTED IN THE NET
FE2DEE00   0130143207 P S SYSXAIXIF      DUPLICATE IP ADDRESS DETECTED IN THE NET
B6048838   0129100507 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6048838   0129100307 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
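
With a long report like the one above, it helps to see which identifiers repeat most often. The following is a small sketch that reads errpt summary output on standard input and prints a count per identifier together with its description. The function name errpt_count_by_id is an assumption for this example, and the code relies only on the column layout shown above.

```shell
# Count errpt summary lines per IDENTIFIER, most frequent first.
# Input: "errpt" summary output on stdin (header line is skipped).
errpt_count_by_id() {
    awk '
        $1 == "IDENTIFIER" || NF < 6 { next }      # skip header and blank lines
        {
            desc = $6                               # description starts at field 6
            for (i = 7; i <= NF; i++) desc = desc " " $i
            count[$1]++
            text[$1] = desc
        }
        END { for (id in count) print count[id], id, text[id] }
    ' | sort -k1,1nr -k2,2
}

# Usage on AIX (not run here):
#   errpt | errpt_count_by_id | head
```
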


You might create a script called alert.sh and call it from your .profile:

#!/usr/bin/ksh
# Collect errpt summary lines that match important keywords into one log.
# One egrep replaces the separate greps, so a line matching several
# keywords is listed only once.
ALERT_LOG=/root/alert.log

echo "Important alerts in errorlog: " > $ALERT_LOG
errpt | egrep -i "STORAGE|QUORUM|ADAPTER|VOLUME|PHYSICAL|STALE|DISK|LVM|LVD|UNABLE|USER|CORRUPT" >> $ALERT_LOG
cat $ALERT_LOG

# Only the header line is present: nothing matched.
if [ `cat $ALERT_LOG | wc -l` -eq 1 ]
then
   echo "No critical errors found."
fi

echo " "
echo "Filesystems that might need attention, e.g. %used:"
# AIX "df -k" columns: $4 is %Used, $7 is the mount point; NR > 1 skips the header.
df -k | awk 'NR > 1 {print $4,$7}' | grep -v tmp > /tmp/tmp.txt
cat /tmp/tmp.txt | sort -n | tail -3





Example 2:
----------

IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
173C787F   0710072007 I S topsvcs        Possible malfunction on local adapter
90D3329C   0710072007 P S topsvcs        NIM read/write error
AE3E3FAD   0710064907 I O SYSJ2          FSCK FOUND ERRORS
AE3E3FAD   0710064907 I O SYSJ2          FSCK FOUND ERRORS
AE3E3FAD   0710064907 I O SYSJ2          FSCK FOUND ERRORS
AE3E3FAD   0710064907 I O SYSJ2          FSCK FOUND ERRORS
AE3E3FAD   0710064907 I O SYSJ2          FSCK FOUND ERRORS
AE3E3FAD   0710064907 I O SYSJ2          FSCK FOUND ERRORS
AE3E3FAD   0710064907 I O SYSJ2          FSCK FOUND ERRORS
C1348779   0710061107 I O SYSJ2          LOG I/O ERROR
C1348779   0710061107 I O SYSJ2          LOG I/O ERROR
C1348779   0710061107 I O SYSJ2          LOG I/O ERROR
EAA3D429   0710061007 U S LVDD           PHYSICAL PARTITION MARKED STALE


IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
12337A8D   0723152107 T S DR_KER_MEM     Affected memory not available for DR rem



Some notes on disk related errors:
----------------------------------

DISK_ERR4 is bad block relocation. Not a serious error. 
DISK_ERR2 is a hardware error as opposed to a media or corrected read error on disk. This could be serious.


EAA3D429   0121151108 U S LVDD           PHYSICAL PARTITION MARKED STALE



Note 1:
-------

thread 1:

Q:

Has anyone seen these errors before? We're running 6239 fc cards on a 
CX600. AIX level is 52-03 with the latest patches for devices.pci.df1000f7 
as well. 


I didn't know that these adapters still used devices.pci.df1000f7 as part 
of their device driver set, but apparently they do. We're mostly seeing 
ERR4s on bootup and occasionally throughout the day. They're TEMP but 
should I be concerned about this? Any help would be greatly appreciated! 

LABEL: SC_DISK_ERR4 
IDENTIFIER: DCB47997 

A:

DISK_ERR4 entries are simply bad-block relocation errors. They are quite normal. 
However, I heard that if you get more than 8 in an 8-hour period, you 
should get the disk replaced as it is showing signs of impending failure. 


thread 2:

Q:

> Has anyone corrected this issue? SC_DISK_ERR2 with EMC Powerpath 
> filesets listed below? I am using a CX-500. 
> 


A:

I got those errors before using a CX700 and it turned out to be a 
firmware problem on the fibre adapter, model 6259. EMC recommended the 
92X1 firmware, only for us to find out that IBM had found problems with 
timeouts to the drives and recommended going back a level to 81X1. 

A:

We have the same problem as well. EMC says it's a firmware error on the 
FC adapters.

A:

This is how to fix these errors; downgrading firmware is not recommended. 

Correcting SCSI_DISK_ERR2's in the AIX Errpt Log - Navisphere Failover 
Wizard 

1. In the Navisphere main screen, select tools and then click the 
Failover Setup Wizard. Click next to continue. 

2. From the drop-down list select the host server you wish to 
modify and click next 

3. Highlight the CX-500 and click next 

4. Under the specify settings box be sure to select 1 for the 
failover setting and disable for array commpath. Click next to process. 
5. The next screen is the opportunity to review your selections 
(host, failover mode and array commpath); click next to commit 
6. The following screen displays a warning message to alert you are 
committing these changes. Click yes to process. 

7. Next, log in to the AIX command prompt as root and run the 
following commands to stop the SCSI_DISK_ERR2 entries. 

a. lsdev -Cc disk | grep LUNZ 
(Filter for disks with LUNZ in the description) 

b. rmdev -dl hdisk# 
(Note the disks and remove each of them from the ODM) 

c. errclear 0 
(Clear the AIX system error log) 

d. cfgmgr -v 
(Attempt to re-add the LUNZ disks) 

e. lsdev -Cc disk | grep LUNZ 
(Double check to make sure the LUNZ disk does not add itself back to the 
system after the cfgmgr command) 

f. errpt -a 
(Monitor the AIX error log to ensure the SCSI_DISK_ERR2's are gone) 

Task Complete... 
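
Steps a and b of the procedure above can be sketched as a dry run. The helper below is hypothetical (the name lunz_rmdev_commands is made up here); it assumes the disk name is the first column of the lsdev output and only prints the rmdev commands, so nothing is removed until you decide to run them.

```shell
# Read "lsdev -Cc disk" output on stdin and print one rmdev command for
# every disk whose description contains LUNZ. Nothing is removed here.
lunz_rmdev_commands() {
    awk '/LUNZ/ { print "rmdev -dl " $1 }'
}

# Usage on AIX (not run here): review the list first, then execute it.
#   lsdev -Cc disk | lunz_rmdev_commands
#   lsdev -Cc disk | lunz_rmdev_commands | sh
```
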


E87EF1BE   0512150008 P O dumpcheck      The largest dump device is too small.
------------------------------------------------------------------------------


HACMP error:
------------


LABEL:          LVM_GS_RLEAVE
IDENTIFIER:     AB59ABFF

Date/Time:       Tue May 26 09:05:36 ZOM 2009
Sequence Number: 1149
Machine Id:      00CC696E4C00
Node Id:         wijting
Class:           U
Type:            UNKN
Resource Name:   LIBLVM
Resource Class:  NONE
Resource Type:   NONE
Location:

Description
Remote node Concurrent Volume Group failure detected

Probable Causes
Remote node Concurrent Volume Group forced offline


Failure Causes
Remote node left VGSA/VGDA groups due to failure

        Recommended Actions
        Examine error log on identified remote node

Detail Data
Remote Node Name
vleet
Volume Group ID
00CC 94EE 0000 4C00 0000 0111 4FE8 8651
MAJOR/MINOR DEVICE NUMBER
0045 0000
SENSE DATA
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000




Problems with errpt:
--------------------

Invalid log, or other problems

thread 1:

Q:

Hello ...

the 'errpt' Command tells me:

0315-180 logread: UNEXPECTED EOF 0315-171 Unable to process the error log file
/var/adm/ras/errlog. 0315-132 The supplied error log is not valid:
/var/adm/ras/errlog.

# ls -l /var/adm/ras/errlog
-rw-r--r-- 1 root system 0 Jun 14 17:31 /var/adm/ras/errlog

How can I fix this problem?

A:

/usr/lib/errstop           # stop logging

rm /var/adm/ras/errlog     # get rid of that log.

/usr/lib/errdemon          # restart the daemon, creating a new error log.


Some err identifiers that can sometimes be hard to trace to their true sources:
===============================================================================

Take a look at those errpt entries:


--------------------------------------------------------------------------


ERRPT ENTRY 1:
--------------

LABEL:          CORE_DUMP 
IDENTIFIER:     C69F5C9B 

Date/Time:       Thu Jan 15 02:00:45 MET 2009 
Sequence Number: 999 
Machine Id:      00CC94EE4C00 
Node Id:         srv1 
Class:           S 
Type:            PERM 
Resource Name:   SYSPROC 

Description 
SOFTWARE PROGRAM ABNORMALLY TERMINATED 

Probable Causes 
SOFTWARE PROGRAM 

User Causes 
USER GENERATED SIGNAL 

        Recommended Actions 
        CORRECT THEN RETRY 

Failure Causes 
SOFTWARE PROGRAM 

        Recommended Actions 
        RERUN THE APPLICATION PROGRAM 
        IF PROBLEM PERSISTS THEN DO THE FOLLOWING 
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE 

Detail Data 
SIGNAL NUMBER 
          11 
USER'S PROCESS ID: 
               1298680 
FILE SYSTEM SERIAL NUMBER 
          57 
INODE NUMBER 
       37134 
CORE FILE NAME 
/var/core/core.1298680.15010044 
PROGRAM NAME 
BS_sear 
STACK EXECUTION DISABLED 
           0 
COME FROM ADDRESS REGISTER 

PROCESSOR ID 
  hw_fru_id: 1 
  hw_cpu_id: 9 

ADDITIONAL INFORMATION 
?? 
?? 
Unable to generate symptom string. 


  (or as another example of the last lines, where you can see the "program name")

  PROGRAM NAME 
  opmn 
  STACK EXECUTION DISABLED 
           0 
  COME FROM ADDRESS REGISTER 

  PROCESSOR ID 
    hw_fru_id: 0 
    hw_cpu_id: 2 

  ADDITIONAL INFORMATION 
  strlen 0 
  pmStrdup 14 

  Symptom Data 
  REPORTABLE 
  1 
  INTERNAL ERROR   
  0 
  SYMPTOM CODE 
  PCSS/SPI2 FLDS/opmn SIG/11 FLDS/strlen VALU/0 FLDS/pmStrdup 
  



--------------------------------------------------------------------------

POSSIBLE EXPLANATION:
=====================

http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/com.ibm.aix.security/doc/security/stack_exec_disable.htm

AIX has enabled the stack execution disable (SED) mechanism to disable the execution of code on a stack 
and select data areas of a process.

By disabling the execution and then terminating an infringing program, the attacker is prevented 
from gaining root user privileges through a buffer overflow attack. While this feature does not stop 
buffer overflows, it provides protection by disabling the execution of attacks on buffers that have been overflowed.

Beginning with the POWER4 family of processors, you can use a page-level execution enable and/or disable feature 
for the memory. The AIX SED mechanism uses this underlying hardware support for implementing a 
no-execution feature on select memory areas. Once this feature is enabled, the operating system checks 
and flags various files during the executable programs. It then alerts the operating system memory manager 
and the process managers that the SED is enabled for the process being created. The select memory areas 
are marked for no-execution. If any execution occurs on these marked areas, the hardware raises 
an exception flag and the operating system stops the corresponding process. The exception and application 
termination details are captured through the AIX error log events.

SED is implemented mainly through the sedmgr command. The sedmgr command permits control 
of the systemwide SED mode of operation as well as setting the executable file based SED flags.

SED modes and monitoring
The stack execution disable (SED) mechanism in AIX is implemented through systemwide mode flags, 
as well as individual executable file-based header flags.

While systemwide flags control the systemwide operation of the SED, file level flags indicate 
how files should be treated in SED. The buffer overflow protection (BOP) mechanism provides 
for four systemwide modes of operation:

-- off 
The SED mechanism is turned off and no process is marked for SED protection. 
-- select 
Only a select set of files are enabled and monitored for SED protection. The select set of files 
are chosen by reviewing the SED related flags in the executable program binary headers. 
The executable program header enables SED related flags to request to be included in the select mode. 
-- setidfiles 
Permits you to enable SED, not only for the files requesting such a mechanism, but all the important 
setuid and setgid system files. In this mode, the operating system not only provides SED for the files 
with the request SED flag set, but also enables SED for the executable files with the following 
characteristics (except the files marked for exempt in their file headers):
 - SETUID files owned by root 
 - SETGID files with primary group as system or security 
-- all 
All executable programs loaded on the system are SED protected except for the files requesting 
an exemption from SED mode. Exemption related flags are part of the executable program headers. 

The SED feature on AIX also provides the ability to monitor instead of stopping the process when 
an exception happens. This systemwide control permits a system administrator to check for breakdowns 
and issues in the system environment by monitoring it before the SED is deployed in the production systems. 

The sedmgr command provides an option that permits you to enable SED to monitor files instead 
of stopping the processes when exceptions occur. The system administrator can evaluate whether 
an executable program is doing any legitimate stack execution. This setting works in conjunction 
with the systemwide mode set using the -m option. When the monitor mode is turned on, the system permits 
the process to continue operating even if an SED-related exception occurs. Instead of stopping the process, 
the operating system logs the exception in the AIX error log. If SED monitoring is off, 
the operating system stops any process that violates and raises an exception per SED facility.

Any change to the SED mode systemwide flags requires that you restart the system for the change 
to take effect. All of these types of events are audited.


note:
=====

Today, I found out why my LDAP installation is giving me core dump on a certain LPAR but not on the other LPAR, although both LPARs are "identically" built.

If we look closely in the error log...


....
CORE FILE NAME
/ldap/filesets/core
PROGRAM NAME
ldapcfg
STACK EXECUTION DISABLED
1
COME FROM ADDRESS REGISTER
.....

we can see "STACK EXECUTION DISABLED" under the program name. What this means is that the AIX system had the Stack Execution Disable protection turned ON.

To confirm if it's really turned ON,

# sedmgr
Stack Execution Disable (SED) mode: all
SED configured in kernel: all

To change it,

# sedmgr -m select
System wide SED has been set successfully. It is effective at 64 bit kernel boot time.

Run sedmgr again (this step is not necessary), 

# sedmgr
Stack Execution Disable (SED) mode: select
SED configured in kernel: all

We need to reboot the server for the change to take effect.

# sedmgr
Stack Execution Disable (SED) mode: select
SED configured in kernel: select

Now I can configure my LDAP with the mksecldap command flawlessly.
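
The difference between the configured mode and the mode active in the kernel, as seen in the sedmgr output in this note, can be checked in a script. This is only a sketch: the function name sed_reboot_pending is made up, and it assumes the two output lines have exactly the wording quoted in this note.

```shell
# Read sedmgr output on stdin.
# Exit status: 0 = configured mode differs from kernel mode (reboot pending),
#              1 = both agree, 2 = output not recognised.
sed_reboot_pending() {
    awk -F': *' '
        /^Stack Execution Disable \(SED\) mode:/ { mode = $2;   sub(/[ \t\r]+$/, "", mode) }
        /^SED configured in kernel:/              { kernel = $2; sub(/[ \t\r]+$/, "", kernel) }
        END {
            if (mode == "" || kernel == "") exit 2
            exit (mode == kernel) ? 1 : 0
        }
    '
}

# Usage on AIX (not run here):
#   sedmgr | sed_reboot_pending && echo "SED mode change needs a reboot"
```
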


--------------------------------------------------------------------------

ERRPT ENTRY 2:
--------------

LABEL:          SRC 
IDENTIFIER:     E18E984F 

Date/Time:       Fri Jan 16 09:31:33 MET 2009 
Sequence Number: 1513 
Machine Id:      00C503AC4C00 
Node Id:         heilbot 
Class:           S 
Type:            PERM 
Resource Name:   SRC 

Description 
SOFTWARE PROGRAM ERROR 

Probable Causes 
APPLICATION PROGRAM 

Failure Causes 
SOFTWARE PROGRAM 

        Recommended Actions 
        PERFORM PROBLEM RECOVERY PROCEDURES 

Detail Data 
SYMPTOM CODE 
           0 
SOFTWARE ERROR CODE 
       -9053 
ERROR CODE 
           2 
DETECTING MODULE 
'tellsrc.c'@line:'87' 
FAILING MODULE 

Duplicates 
Number of duplicates 
           3 
Time of first duplicate 
Fri Jan 16 09:31:18 MET 2009 
Time of last duplicate 
Fri Jan 16 09:31:33 MET 2009 


POSSIBLE EXPLANATIONS:
======================

In entry 2, we see the identifier E18E984F, "SOFTWARE ERROR CODE -9053", and the detecting module 'tellsrc.c'@line:'87'.


http://www-01.ibm.com/support/docview.wss?uid=isg1IZ03064

IZ03064: VARYONVG -C FAILS WITH "GSCHILD:CANNOT REGISTER WITH DRIVER" APPLIES TO AIX 5300-07


APAR status
Closed as program error.

Error description 
"varyonvg -c" fails to varyon concurrent volume group and
reports the following error message:

tellclvmd: request failed rc = -9014 [UNKNOWN rc]
0516-1334 varyonvg: The command /usr/sbin/tellclvmd
   returned an error.


errpt logs following entry:

LABEL:          SRC
IDENTIFIER:     E18E984F
Class:           S
Type:            PERM
Resource Name:   SRC

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        PERFORM PROBLEM RECOVERY PROCEDURES

Detail Data
SYMPTOM CODE
           0
SOFTWARE ERROR CODE
       -9053
ERROR CODE
          74
DETECTING MODULE
'srcmstr.c'@line:'529'
FAILING MODULE

Local fix 
This problem occurs when multiple "varyonvg -nc"
commands are performed together. By serializing
these commands, this can be avoided.

Problem summary 
Multiple varyonvg -c processes will all create threads in
the gsclvmd daemon.  With certain timing, these threads can
interfere with each other's global variables and possibly cause
varyonvg to fail.

Problem conclusion 
Privatize variables so multiple vgs coming online can't
interfere with each other.

Temporary fix 

Comments 
5200-10 - use AIX APAR IZ05735
5300-06 - use AIX APAR IZ02334
5300-07 - use AIX APAR IZ03064
APAR information 
APAR number IZ03064 
Reported component name AIX 5.3 
Reported component ID 5765G0300 
Reported release 530 
Status CLOSED PER 
PE NoPE 
HIPER NoHIPER 
Submitted date 2007-08-14 
Closed date 2007-09-04 
Last modified date 2007-12-06 


Fix information 
Fixed component name AIX 5.3 
Fixed component ID 5765G0300 


error INTRPPC_ERR:
------------------

LABEL:          INTRPPC_ERR
IDENTIFIER:     853015D6

Date/Time:       Sun Mar 22 00:27:49 MET 2009
Sequence Number: 1515
Machine Id:      00C503AC4C00
Node Id:         starboss
Class:           H
Type:            UNKN
Resource Name:   sysplanar0
Resource Class:  planar
Resource Type:   sysplanar_rspc
Location:

Description
UNDETERMINED ERROR

Probable Causes
SYSTEM I/O BUS
SOFTWARE PROGRAM
ADAPTER
DEVICE

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
BUS NUMBER
9001 00C0
INTERRUPT LEVEL
0009 0001
Number of Occurrences
0000 0001


Possible explanations:
----------------------

thread 1:


IY58847: INTRPPC_ERR ERRORS IN ERROR LOGS
  

A fix is available.


APAR status
Closed as program error.

Error description 
INTRPPC_ERR errors were observed in the error log while
customer ran a testcase as mentioned in the defect.

Local fix 

Problem summary 
INTRPPC_ERR errors were observed in the error log while
customer ran a testcase, which brings up and down the phxentdd
interface in an infinite loop. A ping is executed using the ip
address associated with this interface.

Problem conclusion 
A simple code change to ignore the interrupts
while driver is in closing state.

Temporary fix 

Comments 

APAR information 
APAR number IY58847 
Reported component name AIX 5L FOR POWE 
Reported component ID 5765E6100 
Reported release 510 
Status CLOSED PER 
PE NoPE 
HIPER NoHIPER 
Submitted date 2004-07-13 
Closed date 2004-07-13 
Last modified date 2004-10-29 


Fix information 
Fixed component name AIX 5L FOR POWE 
Fixed component ID 5765E6100 

Applicable component levels 
R510 PSY U477721    UP04/10/29 I 1000 
 

thread 2:


> > I've recently started getting INTRPPC_ERR's on an old (but important!)
> > aix 4.3 box. They don't seem to correspond to anything else and the
> > box seems to be working normally. I found a way to lookup the BUS
> > NUMBER via odmget -q value= CuAt, but that didn't return anything
> > for me. Also looking for the interrupt number via lsresource didn't
> > give any matches either. And diag/Advanced Diagnostics/Problem
> > Determination didn't find any trouble.
>
> > Any other suggestions on how to track this down?
>
> > Thanks,
>
> > LABEL: INTRPPC_ERR
> > IDENTIFIER: DADF69E4
>
> > Date/Time: Wed Jul 11 08:51:41
> > Sequence Number: 735309
> > Machine Id: 000247824C00
> > Node Id: scully
> > Class: H
> > Type: UNKN
> > Resource Name: SYSINTR
> > Resource Class: NONE
> > Resource Type: NONE
> > Location: NONE
>
> > Description
> > UNDETERMINED ERROR
>
> > Probable Causes
> > SYSTEM I/O BUS
> > SOFTWARE PROGRAM
> > ADAPTER
> > DEVICE
>
> > Recommended Actions
> > PERFORM PROBLEM DETERMINATION PROCEDURES
>
> > Detail Data
> > BUS NUMBER
> > 0000 00C0
> > INTERRUPT LEVEL
> > 0000 0005
>
> convert the bus number from hex, then look for that value in `ls -l /dev`


.... but it is more than likely a device driver issue rather than the
device itself.


thread 3:


Here's how to map the error information to a specific adapter. Let's
do that first.


>Detail Data
>BUS NUMBER
>0000 00C0
>INTERRUPT LEVEL
>0000 0005

Example:

Detail Data
BUS NUMBER
0000 00C0
INTERRUPT LEVEL
0000 0003

lsresource -al pci0 | grep -i 0x000000C0
--> O pci0 0x8d5c_5 0x000000c0 - 0x000000df

lsresource -al pci0 | grep 3 | grep bus_intr_lvl
--> N sa1 bus_intr_lvl 3
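
The two conversions used in this example (detail data to a grep pattern, and hex to a decimal interrupt level) can be sketched as small helpers. Both function names are made up for illustration. The 8-digit address form matches the lsresource line quoted in this thread; on systems where lsresource prints longer addresses the pattern would need padding.

```shell
# "0000 00C0" -> "0x000000c0" (the form seen in the lsresource output above)
errpt_hex_to_lsresource() {
    echo "$1" | tr -d ' ' | tr 'A-F' 'a-f' | sed 's/^/0x/'
}

# "0000 0003" -> 3 (decimal interrupt level)
errpt_hex_to_dec() {
    printf '%d\n' "0x$(echo "$1" | tr -d ' ')"
}

# Usage on AIX (not run here):
#   lsresource -al pci0 | grep "$(errpt_hex_to_lsresource '0000 00C0')"
#   lsresource -al pci0 | grep bus_intr_lvl | grep -w "$(errpt_hex_to_dec '0000 0003')"
```
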


Note: lsresource command example:
---------------------------------


selalbe@starboss:/home/beab_krn/selalbe $ lsresource -al pci0
TYPE DEVICE ATTRIBUTE S G CURRENT
B    pci0   0xda40_1      0x0000000080080000 - 0x00000000800bffff
B    pci0   0xdfa8_1      0x0000000080000000 - 0x000000008003ffff
B    ent0   busmem        0x0000000080120000 - 0x000000008013ffff
B    ent0   rom_mem       0x00000000800c0000 - 0x00000000800fffff
B    ent1   busmem        0x0000000080100000 - 0x000000008011ffff
B    ent1   rom_mem       0x0000000080040000 - 0x000000008007ffff
O    pci0   0xda40_0      0x00000000000df800 - 0x00000000000df83f
O    pci0   0xdfa8_0      0x00000000000dfc00 - 0x00000000000dfc3f
I    ent0   busintr                249    (A1)
I    ent1   busintr                250    (A1)





--------------------------------------------------------------------------

5561971C ECH_CANNOT_SET_CLBK in errpt.

Most of the time, this error can be ignored


errpt command shows: ECH_CANNOT_SET_CLBK for etherchannel virtual i/o network adapters.

Technote (FAQ) 

Question 
Why is errpt showing the ECH_CANNOT_SET_CLBK error for an etherchannel that is configured in Network Interface Backup mode 
using virtual i/o network adapters?

Answer 
The ECH_CANNOT_SET_CLBK error entry is expected and can be ignored. This error is logged because 
the virtual ethernet does not currently support an ioctl. This was the comment by the developers on this error: 
"This just means that currently virtual Ethernets do not support the code to detect when their link status is down. 
Since virtual Ethernets are not physical and hardly if ever experience a "link down" event, this is just an 
informative message and can be safely ignored."

Most important is to verify that the virtual i/o adapter etherchannel has an "internet address to ping".

To verify this:
# lsattr -El entX (where entX is the etherchannel)

If it is not set, it can be set using smitty:
# smitty etherchannel
<change / show characteristics of an etherchannel>
select entX
select option: internet address to ping.
  
 
--------------------------------------------------------------------------


ERRPT entry:
============


LABEL:          TS_LATEHB_PE
IDENTIFIER:     3C81E43F

Date/Time:       Sat Mar 28 16:30:18 MET 2009
Sequence Number: 1149
Machine Id:      00CC94EE4C00
Node Id:         vleet
Class:           U
Type:            PERF
Resource Name:   topsvcs
Resource Class:  NONE
Resource Type:   NONE
Location:

Description
Late in sending heartbeat

Probable Causes
Heavy CPU load
Severe physical memory shortage
Heavy I/O activities

Failure Causes
Daemon can not get required system resource

        Recommended Actions
        Reduce the system load

Detail Data
DETECTING MODULE
rsct,bootstrp.C,1.213,4835
ERROR ID
6zESUw.88Yn7/UKN0Nr9GF0...................
REFERENCE CODE

A heartbeat is late by the following number of seconds
          86




diag command:
-------------

Whenever a hardware problem occurs in AIX, use the diag command to diagnose the problem.

The diag command is the starting point to run a wide choice of tasks and service aids. 
Most of the tasks/service aids are platform specific. 

To run diagnostics on the scdisk0 device, without questions, enter:

# diag -d scdisk0 -c


System dumps:
-------------

A system dump is created when the system has an unexpected system halt or system failure.
In AIX 5L the default dump device is /dev/hd6, which is also the default paging device.
You can use the sysdumpdev command to manage system crash dumps.

The sysdumpdev command changes the primary or secondary dump device designation in a system that is running. 
The primary and secondary dump devices are designated in a system configuration object. 
The new device designations are in effect until the sysdumpdev command is run again, or the system is restarted.

If no flags are used with the sysdumpdev command, the dump devices defined in the SWservAt 
ODM object class are used. The default primary dump device is /dev/hd6. The default secondary dump device is 
/dev/sysdumpnull.


Examples
To display current dump device settings, enter: 
sysdumpdev  -l

To designate logical volume hd7 as the primary dump device, enter: 
sysdumpdev  -p /dev/hd7

To designate tape device rmt0 as the secondary dump device, enter: 
sysdumpdev  -s /dev/rmt0

To display information from the previous dump invocation, enter: 
sysdumpdev  -L

To permanently change the database object for the primary dump device to /dev/newdisk1, enter: 
sysdumpdev  -P  -p /dev/newdisk1

To determine if a new system dump exists, enter: 
sysdumpdev  -z

If a system dump has occurred recently, output similar to the following will appear: 
4537344 /dev/hd7

To designate remote dump file /var/adm/ras/systemdump on host mercury for a primary dump device, enter: 
sysdumpdev  -p mercury:/var/adm/ras/systemdump

A : (colon) must be inserted between the host name and the file name. 

To specify the directory that a dump is copied to after a system crash, if the dump device is /dev/hd6, enter: 
sysdumpdev  -d /tmp/dump

This attempts to copy the dump from /dev/hd6 to /tmp/dump after a system crash. If there is an error during the copy, 
the system continues to boot and the dump is lost. 

To specify the copy directory as above, but be prompted instead of losing the dump when the copy fails, enter: 
sysdumpdev  -D /tmp/dump

This attempts to copy the dump from /dev/hd6 to the /tmp/dump directory after a crash. If the copy fails, 
you are prompted with a menu that allows you to copy the dump manually to some external media.


Starting a system dump:
-----------------------

If you have the Software Service Aids Package installed, you have access to the sysdumpstart command.
You can start the system dump by entering:
# sysdumpstart -p

You can also use:
# smit dump

Notes regarding system dumps:
-----------------------------

note 1:
-------

The_Nail <tomapam@gmail.com> wrote: 
> I handle several AIX 5.1 servers and some of them warn me (via errpt) 
> about a lack of disk space for the dumpcheck resource. 
> Here is a copy of the message: 

> 
> Description 
> The copy directory is too small. 
> 
> Recommended Actions 
> Increase the size of that file system. 
> 
> Detail Data 
> File system name 
> /var/adm/ras 
> 
> Current free space in kb 
> 7636 
> Current estimated dump size in kb 
> 207872 


> I guess /dev/hd6 is not big enough to contain a system dump. So how 
> can i change that? 


The error message tells you something else. 
Read it, and you will understand! 


> How can i configure a secondary sysdump space in case the primary 
> would be unavailable? 


sysdumpdev -s /dev/whatever 


> What does "copy directory /var/adm/ras" mean? 


That's where the crash dump will be put when you reboot after the crash. 
/dev/hd6 will be needed for other purposes (paging space), so you cannot 
keep your system dump there. 


And that file system is too small to contain the dump, that's the meaning 
of the error message. 


You have two options: 


- increase the /var file system (it should have ample free space anyway). 
- change the dump directory to something where you have more space: 
  sysdumpdev -D /something/in/rootvg/with/free/space 


Yours, 
Laurenz Albe 


Note 2:
-------

Suppose you find the following error:

$ errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
F89FB899   0822150005 P O dumpcheck      The copy directory is too small

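This message is the result of a dump device check. It means that the 
copy directory does not have enough free space to hold the estimated 
dump, so you fix it by enlarging the file system that holds the copy 
directory, or by pointing the copy directory somewhere else with 
"sysdumpdev -D". You can also reduce the space needed: go to smit dump 
and select "System Dump Compression". If it is the dump device itself 
that is too small (see the note below on "The largest dump device is too 
small") and you are using the default dump device (/dev/hd6), then 
increase your paging space. Myself, I don't like to use the default dump 
device, so I create a sysdumplv and make sure I have enough space. To 
check the space needed, go to smit dump and select "Show Estimated Dump 
Size"; this will give you an idea about the size needed.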

The copy directory is whatever sysdumpdev says it is.
Run sysdumpdev and you will get something like
#sysdumpdev
primary              /dev/hd6
secondary            /dev/sysdumpnull
copy directory       /var/adm/ras
forced copy flag     TRUE
always allow dump    FALSE
dump compression     ON

# sysdumpdev -e
0453-041 Estimated dump size in bytes: 57881395
Divide this number by 1024 to get the free space, in KB, that is needed 
in your copy directory, and compare it to the output of df -k. 
Alternatively, divide it by 512 and compare it to the output of a plain 
df, which reports 512-byte blocks on AIX.
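
The comparison above can be scripted. What follows is only a minimal sketch: it assumes you have 
already taken the byte count from "sysdumpdev -e" and the free KB from "df -k" on the copy directory. 
The function name dump_fits is made up for this example, and the numbers are borrowed from the 
examples in these notes.

```shell
#!/bin/sh
# dump_fits ESTIMATED_BYTES FREE_KB
# Hypothetical helper: converts the estimated dump size from bytes to KB
# (rounding up) and compares it with the free KB in the copy directory.
dump_fits() {
    est_bytes=$1
    free_kb=$2
    need_kb=$(( (est_bytes + 1023) / 1024 ))
    if [ "$free_kb" -ge "$need_kb" ]; then
        echo "OK: need ${need_kb} KB, have ${free_kb} KB"
    else
        echo "TOO SMALL: need ${need_kb} KB, have ${free_kb} KB"
        return 1
    fi
}

# 57881395 bytes estimated, 7636 KB free in the copy directory:
dump_fits 57881395 7636 || echo "enlarge the file system or change the copy directory"
```

The function returns a non-zero status when the space is insufficient, so it can be used in an "if" 
or in a cron job that mails a warning.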



Note 3:
-------

Suppose you find the following error:

selalbe@wijting:/home/beab_krn/selalbe $ errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
E87EF1BE   0309150009 P O dumpcheck      The largest dump device is too small.


thread:

Run sysdumpdev -l; you should see both the primary and the secondary dump device.
You need to ensure that these are big enough to hold a system dump, so
run sysdumpdev -e to get an estimate of the dump size and resize your dump
devices accordingly. 

Make them larger than the estimate. If it is a new system, allow
for growth of the system and give it plenty of space if possible.

thread:








IZ05158: POSSIBLE SYSTEM CRASH AFTER PCI BUS ERROR AFFECTING FC ADAPTER APPLIES TO AIX 5300-06
  

 


APAR status
Closed as program error.

Error description 
A system crash can occur if a Fibre Channel adapter
suffers PCI bus errors around the time of an adapter
reset.

Adapter resets are performed at adapter configuration
time or as recovery from certain severe errors.  Adapter
resets are not performed as a result of normal SAN or
storage communications failures.  End device or SAN
problems would not expose this problem.

PCI bus errors are usually accompanied by errors with
label "PCI_RECOVERABLE_ERR" in the AIX error log.
Systems logging those errors may be susceptible to this
problem.

The stack traceback of the crash as seen in kdb will be
similar to the following, and will include
efc_finish_read_rev_mb() calling efc_read_reg():

(2)> f
pvthread+800000 STACK:
[03CE5678]efc_read_reg+000048 ()
[03CFF738]efc_finish_read_rev_mb+000290 ()
[03CF1154]efc_adap_post_trb+00049C ()
[00031F3C]clock+00017C ()
[000DF250]i_softmod+00027C ()
[000DE928].finish_interrupt+000024 ()
Local fix 
Respond promptly to any PCI_RECOVERABLE_ERR errors that
appear in the AIX error log.  It will not be possible to
respond quickly enough to prevent a crash associated with
a particular error, but addressing the underlying problem
causing repeated PCI errors will lessen the risk to the
environment.
Problem summary 
A system crash can occur if a Fibre Channel adapter
suffers PCI bus errors around the time of an adapter
reset.
Problem conclusion 
Handling of a possibly erroneous value read around an EEH
event was introduced in the relevant code path.
Temporary fix 
Comments 
5300-06 - use AIX APAR IZ05158
5300-07 - use AIX APAR IZ25395
5300-08 - use AIX APAR IZ24657
5300-09 - use AIX APAR IZ19062
6100-00 - use AIX APAR IZ23541
6100-01 - use AIX APAR IZ19769
6100-02 - use AIX APAR IZ19384
APAR information 
APAR number IZ05158 
Reported component name AIX 5.3 
Reported component ID 5765G0300 
Reported release 530 
Status CLOSED PER 
PE NoPE 
HIPER NoHIPER 
Submitted date 2007-09-21 
Closed date 2008-03-18 
Last modified date 2008-11-17 

APAR is sysrouted FROM one or more of the following:

APAR is sysrouted TO one or more of the following:
IZ19062 IZ19384 IZ19769 IZ23541 IZ24657 IZ25395

Publications Referenced


Fix information 
Fixed component name AIX 5.3 
Fixed component ID 5765G0300 

Applicable component levels 
R530 PSY U816768    UP08/07/24 I 1000 

PTF to Fileset Mapping 
U816768 devices.pci.df1000f7.com 5.3.0.68 
 






22. Diagnostic output:
======================

0:Standard input  1: Standard output  2: Diagnostic output

redirect diag. outp. to file
# cat somefile nofile 2>errfile              
# cat somefile nofile > outfile 2>errfile

redirect diag. outp. to same place as standard outp.
# cat firsthalf secondhalf > composite 2>&1
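
The order of the redirections matters when diagnostic output is sent to the same place as standard 
output. A small sketch that shows the difference; the function "both" and the temporary file names 
are made up for this example:

```shell
#!/bin/sh
# both: hypothetical helper that writes one line to standard output
# and one line to diagnostic output (standard error).
both() {
    echo "to stdout"
    echo "to stderr" >&2
}

tmp=${TMPDIR:-/tmp}/redir_demo.$$

# Correct order: first point fd 1 at the file, then make fd 2 a copy of fd 1.
both > "$tmp.all" 2>&1

# Reversed order: fd 2 becomes a copy of the *old* fd 1 (the terminal),
# and only afterwards is fd 1 pointed at the file.
both 2>&1 > "$tmp.out_only"

wc -l < "$tmp.all"        # both lines ended up in the file
wc -l < "$tmp.out_only"   # only the stdout line is in the file
```

The shell processes redirections from left to right, which is why "> file 2>&1" captures both 
streams and "2>&1 > file" does not.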


23. DOS2UNIX:
=============

If you want to convert an ASCII PC (DOS) file to unix format, you can use many tools, such as tr:

# tr -d '\r' < original.file > new.file

# tr -d '\015' < original.file > new.file

Or scripts like:

#!/bin/sh
perl -p -i -e 'BEGIN { print "Converting DOS to UNIX.\n" ; } END { print "Done.\n" ; } s/\r\n$/\n/' "$@"

perl -p -i.bak -e 's/^\r+//;s/\r+$//;s/\r/\n/gs' file

Or, on many unixes, you can use the utility "dos2unix" to remove the ^M characters.
Just type:  dos2unix <filename1> <filename2>  [RETURN]


dos2unix [ -ascii ] [ -iso ] [ -7 ] originalfile convertedfile 

-ascii 
Removes extra carriage returns and converts end of file characters in DOS format text files to conform to SunOS requirements. 
-iso 
This is the default. It converts characters in the DOS extended character set to the corresponding ISO standard characters. 
-7 
Convert 8 bit DOS graphics characters to 7 bit space characters so that SunOS can read the file. 


#!/bin/sh
# a script to strip carriage returns from a DOS text file, in place
if test -f "$1"
then
	# write to a temporary file first; replace the original only on success
	tr -d '\r' <"$1" >"$1.tmp" &&
	mv "$1.tmp" "$1"
fi
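
To check that a conversion really removed the carriage returns, you can count them before and 
after. A minimal sketch; the file names dos_sample.txt and unix_sample.txt are made up for this 
example:

```shell
#!/bin/sh
# Build a small DOS-style sample file (lines end in CR LF).
printf 'line one\r\nline two\r\n' > dos_sample.txt

# Strip the carriage returns with tr.
tr -d '\r' < dos_sample.txt > unix_sample.txt

# "tr -cd '\r'" deletes everything except carriage returns,
# so wc -c then counts how many are left in each file.
tr -cd '\r' < dos_sample.txt  | wc -c    # 2 in the DOS file
tr -cd '\r' < unix_sample.txt | wc -c    # 0 after conversion
```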


Note: Other formats on AIX:
---------------------------

1. nvdmetoa command:

How to convert EBCDIC files to ASCII:

On your AIX system, the tool nvdmetoa might be present.

Examples:
 
nvdmetoa <AS400.dat  >AIXver3.dat 

Converts an EBCDIC file taken off an AS400 to an ASCII file for the pSeries or RS/6000. 

nvdmetoa 132 <AS400.txt  >AIXver3.txt 

Converts an EBCDIC file with a record length of 132 characters to an ASCII file with 132 bytes per line 
PLUS 1 byte for the linefeed character. 


2. od command:

The od command translates a file into other formats, for example hexadecimal format.
To translate a file into several formats at once, enter: 

# od -t cx a.out > a.xcd

This command writes the contents of the a.out file, in hexadecimal format (x) and character format (c), 
into the a.xcd file. 



24. Secure shell connections:
=============================

ssh:
====


What is Open Secure Shell?

Open Secure Shell (OpenSSH) is an open source version of the SSH protocol suite of network connectivity tools. 
The tools provide shell functions that are authenticated and encrypted. A shell is a command language interpreter 
that reads input from a command line string, stdin or a file. Why use OpenSSH? When you're running over 
insecure public networks like the Internet, you can use the SSH command suite instead of the insecure commands telnet, 
ftp, and the r-commands.

OpenSSH delivers code that communicates using SSH1 and SSH2 protocols. What's the difference? The SSH2 protocol 
is a re-write of SSH1. SSH2 contains separate, layered protocols, but SSH1 is one large set of code. SSH2 supports 
both RSA & DSA keys, but SSH1 supports only RSA, and SSH2 uses a strong crypto integrity check, where SSH1 uses 
a CRC-32 check. The Internet Engineering Task Force (IETF) maintains the secure shell standards.



Example 1:
----------

Go to a terminal on your local Unix system (Solaris, Linux, Mac OS X, etc.) and type the following command:

ssh -l username acme.gatech.edu

Replace "username" with your Prism ID. If this is your first time connecting to acme, you will see 
a warning similar to this:

  The authenticity of host 'acme.gatech.edu (130.207.165.23)' can't be established.
  DSA key fingerprint is 72:ce:63:c5:86:3a:cb:8c:cb:43:6c:da:00:0d:4c:1f.
  Are you sure you want to continue connecting (yes/no)?

Type the word "yes" and hit <ENTER>. You should see the following warning:

  Warning: Permanently added 'acme.gatech.edu,130.207.165.23' (DSA) to the list of
  known hosts.
  
Next, you will be prompted for your password. Type your password and hit <ENTER>. 


Example 2:
----------

A secure shell 'terminal':

# ssh -l oracle 193.172.126.193
# ssh oracle@193.172.126.193


pscp:
=====

pscp is the PuTTY secure copy client; it is run from a Windows command prompt.

Example to copy a file to a remote unix server:

C:\> pscp c:\documents\foo.txt fred@example.com:/tmp/foo

To receive (a) file(s) from a remote server: 

pscp [options] [user@]host:source target
So to copy the file /etc/hosts from the server example.com as user fred to the file c:\temp\example-hosts.txt, 
you would type: 

pscp fred@example.com:/etc/hosts c:\temp\example-hosts.txt

To send (a) file(s) to a remote server: 

pscp [options] source [source...] [user@]host:target
So to copy the local file c:\documents\foo.txt to the server example.com as user 
fred to the file /tmp/foo you would type: 

pscp c:\documents\foo.txt fred@example.com:/tmp/foo

You can use wildcards to transfer multiple files in either direction, like this: 

pscp c:\documents\*.doc fred@example.com:docfiles
pscp fred@example.com:source/*.c c:\source


  Example of scripts using pscp with parameters:

  ------------------------------------
  @echo off

  REM Script to copy a file from a UNIX system to the workstation via pscp.exe.
  
  Echo Copy file from unix to workstation 

  SET /P systemname=Enter full system name:
  SET /P remotefile=Enter UNIX path+filename:
  SET /P localfile=Enter local filename:
  SET /P username=Enter username:

  echo pscp.exe %username%@%systemname%:%remotefile% %localfile%

  pscp.exe %username%@%systemname%:%remotefile% %localfile%

  echo file %remotefile% copied to %localfile%
  pause

  ------------------------------------

  @echo off

  REM Script to copy a file from the workstation to a UNIX system via pscp.exe.
  
  Echo Copy file from workstation to unix

  SET /P systemname=Enter full system name:
  SET /P localfile=Enter local filename:
  SET /P remotefile=Enter UNIX path+filename:
  SET /P username=Enter username:

  echo pscp.exe %localfile% %username%@%systemname%:%remotefile% 
  pscp.exe %localfile% %username%@%systemname%:%remotefile% 
  echo file %localfile% copied to %remotefile%
  pause
  ------------------------------------

scp:
====

scp is a utility that copies files between machines. It is a secure replacement for the 
older rcp utility: it works the same way, except that all traffic (including the password used to log in) 
is encrypted. Also, if you have set up passwordless ssh between machines (for example with a .shosts file, 
or with the key-based procedure described below), you will be able to scp 
files between machines without entering your password. 

Either the source or the destination may be on the remote machine; i.e., you may copy files or directories 
into the account on the remote system OR copy them from the account on the remote system into the account 
you are logged into. 

Example:
# scp conv1.tar.gz bu520@192.168.2.2:/backups/520backups/splenvs
# scp conv2.tar.gz bu520@192.168.2.2:/backups/520backups/splenvs


Example: 
# scp myfile xyz@sdcc7:myfile

Example: 	
To copy a directory, use the -r (recursive) option. 
# scp -r mydir xyz@sdcc7:mydir

Example: 
cd /oradata/arc
/usr/local/bin/scp *.arc  SPRAT:/oradata/arc

Example:
While logged into xyz on sdcc7, copy file "letter" into file "application" in remote account abc on sdcc3: 
% scp letter abc@sdcc3:application

While logged into abc on sdcc3, copy file "foo" from remote account xyz on sdcc7 into filename "bar" in abc: 
% scp xyz@sdcc7:foo bar



passwordless using ssh/scp between two (or more) hosts:
=======================================================

1. decide which useraccount to use (on all hosts), and logon to the local host with that account

2. Generate a public/private key pair on the local machine
   
ssh-keygen -t dsa    (or 'ssh-keygen -t rsa'; both create protocol 2 keys, 'rsa1' is the protocol 1 key type)

In response, you should see:
     
     Generating public/private dsa key pair
     Enter file in which to save the key ... 

Press Enter to accept this.

In response, you should see:

     Enter passphrase (empty for no passphrase):

You don't need a passphrase, so press Enter twice.

In response, you should see:

     Your identification has been saved in ... 
     Your public key has been saved in ... 

3. Note the name and location of the public key just generated. It always ends in .pub.

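4. Check the permissions. The private key file (for example id_dsa) must be accessible by you only, 
   for example chmod 600 id_dsa, and the .ssh directory should be chmod 700.
   In effect, make sure that no group, or everyone (world), has any access to the private key. 
   The .pub file itself is not secret.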

5. Copy the public key just generated to all of your remote boxes. You can use scp or FTP 
   or whatever to make the copy. 
   If you are logging in as a user, for example albert, you should copy it to
   "/home/albert/.ssh/authorized_keys". But (!) first check whether that file already exists.
   If the file already exists and contains text, 
   you need to append the contents of your public key file to what already is there.

That should do the job.
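
The "append, don't overwrite" part of step 5 can be sketched as a small shell function to run on the 
remote host after the .pub file has been copied there. This is only an illustration: the function 
name add_pubkey is made up, and the file locations in the usage comment are assumptions based on the 
example user albert.

```shell
#!/bin/sh
# add_pubkey PUBKEY_FILE AUTH_KEYS_FILE
# Hypothetical helper: appends the public key to the authorized_keys file
# unless exactly the same line is already present, and tightens permissions.
add_pubkey() {
    pub=$1
    auth=$2
    mkdir -p "$(dirname "$auth")"
    chmod 700 "$(dirname "$auth")"
    touch "$auth"
    # -q: quiet, -x: whole-line match, -F: fixed strings, -f: patterns from file
    if ! grep -qxF -f "$pub" "$auth"; then
        cat "$pub" >> "$auth"
    fi
    chmod 600 "$auth"
}

# Typical use on the remote host, after copying id_dsa.pub to /tmp:
#   add_pubkey /tmp/id_dsa.pub /home/albert/.ssh/authorized_keys
```

Running it twice with the same key leaves only one copy of the key in the file, and keys that were 
already there are kept.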

Now you can use statements like, for example

albert@hosta:/tmp$> scp testfile albert@hostb:/tmp

without being prompted for a password.

If you want to do the same for scp from hostb to hosta, perform the same steps again, but now
of course with the server roles reversed.


Notes:

1. If it doesn't work, try changing the authorized_keys file name to authorized_keys2, 
or ask your system administrator what file name is ssh actually using. 

2. The name of the target server must have been registered in the "known_hosts" file in the .ssh directory. 
This can be done with a regular (with password) ssh connection, and accepting the host "as known". 

3. SSH protocol 2 is assumed in this procedure (it uses dsa keys). If your ssh configuration does not use protocol 2 by default, 
you may have to force it with the -2 option of ssh and scp.

 


ssh on AIX:
===========

After you download the OpenSSL package, you can install OpenSSL and OpenSSH.

Install the OpenSSL RPM package using the geninstall command: 

# geninstall -d/dev/cd0 R:openssl-0.9.6m

Output similar to the following displays: 
SUCCESSES
---------
openssl-0.9.6m-3

Install the OpenSSH installp packages using the geninstall command: 
# geninstall -I"Y" -d/dev/cd0 I:openssh.base

Use the Y flag to accept the OpenSSH license agreement after you have reviewed the license agreement. 
(Note: we have seen this line as well: 
# geninstall -Y -d/dev/cd0 I:openssh.base)

Output similar to the following displays: 

Installation Summary                                                           
--------------------                                                           
Name                        Level           Part        Event       Result     
-------------------------------------------------------------------------------
openssh.base.client         3.8.0.5200      USR         APPLY       SUCCESS    
openssh.base.server         3.8.0.5200      USR         APPLY       SUCCESS    
openssh.base.client         3.8.0.5200      ROOT        APPLY       SUCCESS    
openssh.base.server         3.8.0.5200      ROOT        APPLY       SUCCESS     

You can also use the SMIT install_software fast path to install OpenSSL and OpenSSH.

The following OpenSSH binary files are installed as a result of the preceding procedure:

scp File copy program similar to rcp 
sftp Program similar to FTP that works over SSH1 and SSH2 protocol 
sftp-server SFTP server subsystem (started automatically by sshd daemon) 
ssh Similar to the rlogin and rsh client programs 
ssh-add Tool that adds keys to ssh-agent 
ssh-agent An agent that can store private keys 
ssh-keygen Key generation tool 
ssh-keyscan Utility for gathering public host keys from a number of hosts 
ssh-keysign Utility for host-based authentication 
ssh-rand-helper A program used by OpenSSH to gather random numbers. It is used only on AIX 5.1 installations. 
sshd Daemon that permits you to log in 

The following general information covers OpenSSH: 
The /etc/ssh directory contains the configuration files for the sshd daemon and for the ssh client command. 
The /usr/openssh directory contains the readme file and the original OpenSSH open-source license text file. 
This directory also contains the ssh protocol and Kerberos license text. 

The sshd daemon is under AIX SRC control. You can start, stop, and view the status of the daemon 
by issuing the following commands: 

startsrc -s sshd   OR startsrc -g ssh  (group)
stopsrc -s sshd    OR stopsrc -g ssh
lssrc -s sshd      OR lssrc -g ssh




Automatic startup of sshd on boot:
----------------------------------

For example, on AIX create the following script "Ssshd" in /etc/rc.d/rc2.d

root@zd110l14:/etc/rc.d/rc2.d#cat Ssshd
#!/bin/ksh

##################################################
# name: Ssshd
# purpose: script that will start or stop the sshd daemon.
##################################################

case "$1" in
start )
        startsrc -g ssh
        ;;
stop )
        stopsrc -g ssh
        ;;
* )
        echo "Usage: $0 (start | stop)"
        exit 1
esac





25. Pipelining and Redirecting:
===============================

CONCEPT: UNIX allows you to connect processes, by letting the standard output of one process feed into the 
standard input of another process. That mechanism is called a pipe. 
Connecting simple processes in a pipeline allows you to perform complex tasks without writing complex programs. 

EXAMPLE: Using the more command, and a pipe, send the contents of your .profile and .shrc files to the 
screen by typing 

cat .profile .shrc | more
to the shell. 

EXERCISE: How could you use head and tail in a pipeline to display lines 25 through 75 of a file? 

ANSWER: The command 

cat file | head -75 | tail -51

would work. The cat command feeds the file into the pipeline. The head command gets the first 75 lines 
of the file, and passes them down the pipeline to tail. The tail command then filters out all but the last 
51 lines of the input it received from head (lines 25 through 75 inclusive are 75 - 25 + 1 = 51 lines). 
It is important to note that in the above example, tail never 
sees the original file, but only sees the part of the file that was passed to it by the head command. 
It is easy for beginners to confuse the usage of the input/output redirection symbols < and >, with the 
usage of the pipe. Remember that input/output redirection connects processes with files, while the pipe connects 
processes with other processes. 
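
A quick way to check a head/tail pipeline like this one is to feed it a file in which every line 
contains its own line number. A minimal sketch; the file names numbered.txt and slice.txt are made 
up for this example:

```shell
#!/bin/sh
# Generate a 100-line test file in which line N contains the number N.
awk 'BEGIN { for (i = 1; i <= 100; i++) print i }' > numbered.txt

# Lines 25 through 75 inclusive are 75 - 25 + 1 = 51 lines.
head -n 75 numbered.txt | tail -n 51 > slice.txt

head -n 1 slice.txt     # first line of the slice: 25
tail -n 1 slice.txt     # last line of the slice: 75
```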

Grep
The grep utility is one of the most useful filters in UNIX. Grep searches line-by-line for a specified pattern, 
and outputs any line that matches the pattern. The basic syntax for the grep command is 
grep [-options] pattern [file]. If the file argument is omitted, grep will read from standard input.
 It is always best to enclose the pattern within single quotes, to prevent the shell 
from misinterpreting the command. 

The grep utility recognizes a variety of patterns, and the pattern specification syntax was taken from the 
vi editor. Here are some of the characters you can use to build grep expressions: 

The caret (^) matches the beginning of a line. 
The dollar sign ($) matches the end of a line. 
The period (.) matches any single character. 
The asterisk (*) matches zero or more occurrences of the previous character. 
The expression [a-b] matches any single character that is lexically between a and b (inclusive). 

EXAMPLE: Type the command 

grep 'jon' /etc/passwd

to search the /etc/passwd file for any lines containing the string "jon". 

EXAMPLE: Type the command 

grep '^jon' /etc/passwd
to see the lines in /etc/passwd that begin with the character string "jon". 

EXERCISE:List all the files in the /tmp directory owned by the user root. 

EXPLANATION: The command 

ls -l /tmp | grep 'root'
would show all files with the word "root" somewhere in the line. That doesn't necessarily mean that 
all the files would be owned by root (the group or the file name may also contain "root"), but using 
the grep filter can cut down the number of files you will have to look at. 
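
If you want an exact answer instead of an approximation, you can test the owner column itself. 
A minimal sketch, assuming the usual "ls -l" layout in which the owner is the third field; the 
function name owned_by is made up for this example:

```shell
#!/bin/sh
# owned_by USER: reads "ls -l" style lines on standard input and prints
# only those whose owner column (field 3) is exactly USER.
owned_by() {
    awk -v user="$1" '$3 == user'
}

# Usage:
#   ls -l /tmp | owned_by root
```

The command "find /tmp -user root" is another way to get the same kind of list without parsing 
ls output.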


Redirecting:
------------

CONCEPT: Every program you run from the shell opens three files: Standard input, standard output, 
and standard error. The files provide the primary means of communications between the programs, 
and exist for as long as the process runs. 

The standard input file provides a way to send data to a process. As a default, the standard input is read 
from the terminal keyboard. 

The standard output provides a means for the program to output data. As a default, the standard output 
goes to the terminal display screen. 

The standard error is where the program reports any errors encountered during execution. 
By default, the standard error goes to the terminal display. 

CONCEPT: A program can be told where to look for input and where to send output, using input/output 
redirection. UNIX uses the "less than" and "greater than" special characters (< and >) to signify input 
and output redirection, respectively. 


Redirecting input
Using the "less-than" sign with a file name like this: 
< file1 

in a shell command instructs the shell to read input from a file called "file1" instead of from the keyboard. 

EXAMPLE:Use standard input redirection to send the contents of the file /etc/passwd to the more command: 

more < /etc/passwd 

Many UNIX commands that will accept a file name as a command line argument, will also accept input from 
standard input if no file is given on the command line. 

EXAMPLE: To see the first ten lines of the /etc/passwd file, the command: 

head /etc/passwd 
will work just the same as the command: 
head < /etc/passwd 

Redirecting output
Using the "greater-than" sign with a file name like this: 
> file2 
causes the shell to place the output from the command in a file called "file2" instead of on the screen. 
If the file "file2" already exists, the old version will be overwritten. 

EXAMPLE: Type the command 

ls /tmp > ~/ls.out

to redirect the output of the ls command into a file called "ls.out" in your home directory. 
Remember that the tilde (~) is UNIX shorthand for your home directory. In this command, the ls command 
will list the contents of the /tmp directory. 
Use two "greater-than" signs to append to an existing file. For example: 

>> file2 

causes the shell to append the output from a command to the end of a file called "file2". If the file 
"file2" does not already exist, it will be created. 

EXAMPLE: In this example, I list the contents of the /tmp directory, and put it in a file called myls. 
Then, I list the contents of the /etc directory, and append it to the file myls: 

ls /tmp > myls 
ls /etc >> myls 

Redirecting error
Redirecting standard error is a bit trickier, depending on the kind of shell you're using 
(there's more than one flavor of shell program!). In the POSIX shell and ksh, redirect the standard error 
with the symbol "2>". 

EXAMPLE: Sort the /etc/passwd file, place the results in a file called foo, and trap any errors in a file 
called err with the command: 

sort < /etc/passwd > foo 2> err 
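
To see the error file actually being filled, give sort a file that does not exist. A small sketch; 
the file names input.txt and no_such_file are made up for this example, and the exact wording of 
the error message differs between systems:

```shell
#!/bin/sh
# Sort one readable file and one file that does not exist.
# sort writes its complaint about the missing file to standard error,
# so the message lands in "err" and not on the screen.
printf 'b\na\n' > input.txt
sort input.txt ./no_such_file > foo 2> err
echo "exit status of sort: $?"

cat err      # the error message from sort
```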




===========================
27. UNIX DEVICES and mknod:
===========================


27.1 Note 1:
============

The files in the /dev directory are a little different from anything you may be used to in 
other operating systems. 
The very first thing to understand is that these files are NOT the drivers for the devices. Drivers are in 
the kernel itself (/unix etc..), and the files in /dev do not actually contain anything at all: 
they are just pointers to where the driver code can be found in the kernel. There is nothing more to it 
than that. These aren't programs, they aren't drivers, they are just pointers. 

That also means that if the device file points at code that isn't in the kernel, it obviously is not 
going to work. Existence of a device file does not necessarily mean that the device code is in the kernel, 
and creating a device file (with mknod) does NOT create kernel code. 

Unix actually even shows you what the pointer is. When you do a long listing of a file in /dev, 
you may have noticed that there are two numbers where the file size should be: 


brw-rw-rw-   2 bin      bin        2, 64 Dec  8 20:41 fd0

That "2,64" is a pointer into the kernel. I'll explain more about this in a minute, 
but first look at some more files: 

brw-rw-rw-   2 bin      bin        2, 64 Dec  8 20:41 fd0
brw-rw-rw-   2 bin      bin        2, 48 Sep 15 16:13 fd0135ds15
brw-rw-rw-   2 bin      bin        2, 60 Feb 12 10:45 fd0135ds18
brw-rw-rw-   1 bin      bin        2, 16 Sep 15 16:13 fd0135ds21
brw-rw-rw-   2 bin      bin        2, 44 Sep 15 16:13 fd0135ds36
brw-rw-rw-   3 bin      bin        2, 36 Sep 15 16:13 fd0135ds9

A different kind of device would have a different major number. For example, here are the serial com ports: 

crw-rw-rw-   1 bin      bin        5,128 Feb 14 05:35 tty1A
crw-rw-rw-   1 root     root       5,  0 Dec  9 13:13 tty1a
crw-rw-rw-   1 root     sys        5,136 Nov 25 07:28 tty2A
crw-r--r--   1 uucp     sys        5,  8 Nov 25 07:16 tty2a

Notice the "b" and the "c" as the first characters in the mode of the file. It designates whether
we have a block "b", or a character "c" device.

Notice that each of these files shares the "5" part of the pointer, but that the other number is different. 
The "5" means that the device is a serial port, and the other number tells exactly which com port you are 
referring to. In Unix parlance, the 5 is the "major number" and the other is the "minor number". 

These numbers get created with a "mknod" command. For example, you could type "mknod /dev/myfloppy b 2 60" and 
then "/dev/myfloppy" would point to the same driver code that /dev/fd0135ds18 points to, and it would 
work exactly the same. 

This also means that if you accidentally removed /dev/fd0135ds18, you could instantly recreate it with "mknod". 

But if you didn't know that the magic numbers were "2,60", how could you find out? 

It turns out that it's not hard. 

First, have a look at "man idmknod". The idmknod command wipes out all non-required devices, and then recreates them. 
Sounds scary, but this gets called every time you answer "Y" to that "Rebuild Kernel environment?" question that 
follows relinking. Actually, on 5.0.4 and on, the existing /dev files don't get wiped out; the command simply 
recreates whatever it has to. 

idmknod requires several arguments, and you'd need to get them right to have success. You could make it easier 
by simply relinking a new kernel and answering "Y" to the "Rebuild" question, but that's using a fire hose to 
put out a candle. 

A less dramatic method would be to look at the files that idmknod uses to recreate the device nodes. These are found 
in /etc/conf/node.d 

In this case, the file you want would be "fd". A quick look at part of that shows: 

fd	fd0		b	64	bin	bin	666
fd	fd0135ds36	b	44	bin	bin	666
fd	fd0135ds21	b	16	bin	bin	666
fd	fd0135ds18	b	60	bin	bin	666
fd	fd0135ds15	b	48	bin	bin	666
fd	fd0135ds9	b	36	bin	bin	666
fd	fd048		b	4	bin	bin	666

This gives you *almost* everything you need to know about the device nodes in the "fd" class. The only thing it 
doesn't tell you is the major number, but you can get that just by doing an "l" (a long listing, ls -l) of any other fd entry: 

brw-rw-rw-   1 bin      bin        2, 60 Feb  5 09:45 fd096ds18

this shows you that the major number is "2". 

Armed with these two pieces of information, you can now do 

mknod /dev/fd0135ds18 b 2 60
chown bin /dev/fd0135ds18
chgrp bin /dev/fd0135ds18
chmod 666 /dev/fd0135ds18

If you examined the node file closely, you would also notice that /dev/rfd0135ds18 and /dev/fd0135ds18 differ only 
in that the "r" version is a "c" or character device and the other is "b" or block. If you had already known that, 
you wouldn't have even had to look at the node file; you'd simply have looked at an "l" of the /dev/rfd0135ds18 and 
recreated the block version appropriately. 
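
Since a long listing already shows the type, major and minor number, the mknod command can be 
derived from it mechanically. A minimal sketch: the function name mknod_from_ls is made up for this 
example, it assumes the listing layout shown above and that the nodes live in /dev, and it only 
prints the commands instead of running them.

```shell
#!/bin/sh
# mknod_from_ls: hypothetical helper. Reads "ls -l" lines for device files
# on standard input and prints (does not run) the matching mknod commands.
mknod_from_ls() {
    # Turn the comma between major and minor into a space so that
    # "2, 60" and "5,128" both split into two separate fields.
    sed 's/,/ /' | awk '
        /^[bc]/ {
            type = substr($1, 1, 1)      # b = block, c = character
            printf "mknod /dev/%s %s %s %s\n", $NF, type, $5, $6
        }'
}

echo 'brw-rw-rw-   2 bin      bin        2, 60 Feb 12 10:45 fd0135ds18' | mknod_from_ls
# prints: mknod /dev/fd0135ds18 b 2 60
```

Saving such a listing of /dev while the system is healthy gives you something to recreate a 
removed node from; owner, group and mode still have to be restored with chown, chgrp and chmod.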

There are other fascinating things that can be learned from the node files. For example, fd096ds18 is also minor number 60, 
and can be used in the same way with identical results. In other words, if you z'd out (were momentarily inattentive, 
not CTRL-Z in a job control shell) and dd'd an image to /dev/fd096ds18, it would write to your hd floppy without incident. 

If you have a SCSI tape drive, notice what happens when you set it to be the "default" tape drive. 
It creates device files that have different names (rct0, etc.) but that have the same major and minor numbers. 

Knowing that it's easy to recreate missing device files also means that you can sometimes capture the output 
of programs that write directly to a device. For example, suppose some application prints directly to /dev/lp 
but you need to capture this to a file. In most situations, you can simply "rm /dev/lp" (after carefully noting 
its current ownership, permissions and, of course, major/minor numbers), and then "touch /dev/lp" to create an 
ordinary file. You'll need to chmod it for appropriate permissions, and then run your app. Unless the app has 
tried to do ioctl calls on the device, the output will be there for your use. This can be particularly useful 
for examining control characters that the app is sending. 

What's the Difference?
One question that comes up fairly often is "what's the difference between a block and a character device and when 
should I use one rather than the other?". To answer that question fully is hard, but I'm going to try to at least 
get you started here. 

The real difference lies in what the kernel does when a device file is accessed for reading or writing. If the device 
is a block device, the kernel gives the driver the address of a kernel buffer that the driver will use as the source 
or destination for data. Note that the address is a "kernel" address; that's important because that buffer will be 
cached by the kernel. If the device is raw, then the address it will use is in the user space of the process that is 
using the device. A block device is something you could make a filesystem on (a disk). You can move forward and backward, 
from the beginning of a block device to its end, and then back to the beginning again. If you ask to read a block that 
the kernel has buffered, then you get data from the buffer. If you ask for a block that has not yet been buffered, 
the kernel reads that block (and probably a few more following it) into the buffer cache. If you write to a block device, 
it goes to the buffer cache (eventually to the device, of course). A raw (or character) device is often something that 
doesn't have a beginning or end; it just gives a stream of characters that you read. A serial port is an excellent 
example- however, it is not at all unusual to have character (raw) drivers for things that do have a beginning 
and an end- a tape drive, for example. And many times there are BOTH character and block devices for the same 
physical device- disks, for example. Nor does using a raw device absolutely mean that you can't move forward and back, 
from beginning to end- you can move wherever you want with a tape or /dev/rfd0. 

And that's where the differences get confusing. It seems pretty reasonable that you'd use the block device to mount 
a disk. But which do you use for format? For fsck? For mkfs? 

Well, if you try to format /dev/fd0135ds18, you'll be told that it is not a formattable device. 
Does that make any sense? Well, the format process involves sequential access- it starts at the beginning and just 
keeps on going, so it seems to make sense that it wouldn't use the block device. But you can run "mkfs" on either 
the block or character device; it doesn't seem to care. The same is true for fsck. But although that's true for those 
programs on SCO OSR5, it isn't necessarily going to be true on some other UNIX, and the "required" device may make sense 
to whoever wrote the program, but it may not make sense to you. 

You'd use a block device when you want to take advantage of the caching provided by the kernel. You'd use the raw device 
when you don't, or for ioctl operations like "tape status" or "stty -a". 
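
If you are not sure which kind of device file you are looking at, the shell can tell you. The following is only 
a minimal sketch: "devtype" is a made-up helper name, and it relies on nothing more than the standard "test -b" 
and "test -c" operators.

```shell
#!/bin/sh
# devtype PATH: report whether PATH is a block device, a character (raw)
# device, or neither. "devtype" is a hypothetical name used for this sketch.
devtype() {
    if [ -b "$1" ]; then
        echo "block"
    elif [ -c "$1" ]; then
        echo "character"
    else
        echo "other"
    fi
}

devtype /dev/null    # /dev/null is a character device
devtype /tmp         # a directory is neither block nor character
```

On a system with disks you could run it against both device files of the same disk (the block one and the raw one) 
and you should see "block" for the first and "character" for the second.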


27.2 Note 2:
============


One of the unique things about Unix as an operating system is that it regards everything as a file. Files can be divided 
into three categories: ordinary or plain files, directories, and special or device files.

Directories in Unix are properly known as directory files. They are a special type of file that holds a list of the 
other files they contain. 

Ordinary or plain files in Unix are not all text files. They may contain ASCII text, binary data, or program input 
or output. Executable binaries (programs) are also files, as are commands. When a user enters a command, the associated 
file is retrieved and executed. This is an important feature and contributes to the flexibility of Unix.

Special files are also known as device files. In Unix all physical devices are accessed via device files; they are 
what programs use to communicate with hardware. Device files hold information on location, type, and access mode for a 
specific device. There are two types of device files, character and block, corresponding to two modes of access.

- Block device files are used to access block device I/O. Block devices do buffered I/O, meaning that the data is 
  collected in a buffer until a full block can be transferred.

- Character device files are associated with character or raw device access. They are used for unbuffered data transfers 
  to and from a device. Rather than transferring data in blocks, the data is transferred character by character. 
  One transfer can consist of multiple characters.

So what about a device that can be accessed in either character or block mode? How many device files would it have: 
one, two, or are there no such devices?

The answer is two. Some devices, such as disk partitions, may be accessed in block or character mode. Because each 
device file corresponds to a single access mode, physical devices that have more than one access mode will have 
more than one device file.

Device files are found in the /dev directory. Each device is assigned a major and minor device number. The major 
device number identifies the type of device (the driver); for example, all SCSI devices would share one major number 
and all keyboards another. The minor device number identifies a specific device, e.g. the keyboard attached to 
this workstation.

Device files are created using the mknod command. The form for this command is:

mknod device-name type major minor 

device-name is the name of the device file 
type is either "c" for character or "b" for block 
major is the major device number 
minor is the minor device number 

The major and minor device numbers are indexed to device switches. There are two types of device switches: cdevsw 
for character devices and bdevsw for block devices. These switches are kernel structures that hold the names 
of all the control routines for a device and tell the kernel which driver module to execute. Device switches are 
actually tables that look something like this:

0 keyboard 
1 SCSIbus 
2 tty 
3 disk 

Using the ls command in the /dev directory will show entries that look like:

brw-r----- 1 root sys 1, 0 Aug 31 16:01 /dev/sd1a 

The "b" before the permissions indicates that this is a block device file, and "1, 0" are the major and minor device 
numbers. When a program opens /dev/sd1a, the kernel sees the open, recognizes major device number 1, and calls the 
SCSIbus driver routines to handle it.
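
To make the "ls" output a bit more concrete, here is a small sketch that picks the device type and the major and 
minor numbers out of such a line. The function name "parse_dev_line" is made up for this note, and the sketch assumes 
the field layout of the sample line above (major number followed by a comma in field 5, minor number in field 6).

```shell
#!/bin/sh
# parse_dev_line LINE: decode one "ls -l" line of a device file.
# Hypothetical helper; assumes the layout "perms links owner group major, minor ...".
parse_dev_line() {
    # Split the line on whitespace into $1..$N (left unquoted on purpose).
    set -- $1
    case "$1" in
        b*) kind="block" ;;
        c*) kind="character" ;;
        *)  kind="not-a-device" ;;
    esac
    # Field 5 is "major," (strip the trailing comma), field 6 is the minor number.
    echo "$kind major=${5%,} minor=$6"
}

parse_dev_line "brw-r----- 1 root sys 1, 0 Aug 31 16:01 /dev/sd1a"
# prints: block major=1 minor=0
```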



====================
28. Solaris devices:
====================

Devices are described in three ways in the Solaris environment, using three distinct naming
conventions: the physical device name, the instance name, and the logical device name.

Solaris stores the entries for physical devices under the /devices directory, 
and the logical device entries under the /dev directory.


- A "physical device name" represents the full pathname of the device. 
  Physical device files are found in the /devices directory and have a
  naming convention like the following example:

  /devices/sbus@1,f8000000/esp@0,40000/sd@3,0:a

  Each device has a unique name representing both the type of device and the location of that device
  in the system-addressing structure called the "device tree". The OpenBoot firmware builds the 
  device tree for all devices from information gathered at POST. The device tree is loaded in memory
  and is used by the kernel during boot to identify all configured devices.
  A device pathname is a series of node names separated by slashes. 
  Each device has the following form: 
  
  driver-name@unit-address:device-arguments


- The "instance name" represents the kernel's abbreviated name for every possible device
  on the system. For example, sd0 and sd1 represent the instance names of two SCSI disk devices.
  Instance names are mapped in the /etc/path_to_inst file, and are displayed by using the
  commands dmesg, sysdef, and prtconf.

- The "Logical device names" are used with most Solaris file system commands to refer to devices.
  Logical device files in the /dev directory are symbolically linked to physical device files
  in the /devices directory. Logical device names are used to access disk devices in the
  following circumstances:
  - adding a new disk to the system and partitioning the disk
  - moving a disk from one system to another
  - accessing or mounting a file system residing on a local disk
  - backing up a local file system
  - repairing a file system

  Logical devices are organized in subdirectories under the /dev directory by device type:
  /dev/dsk    block interface to disk devices
  /dev/rdsk   raw or character interface to disk devices. 
              In commands you mostly use the raw logical device, for example: # newfs /dev/rdsk/c0t3d0s7
  /dev/rmt    tape devices
  /dev/term   serial line devices 
  etc.

  Logical device files have a major and minor number that indicate device drivers, 
  hardware addresses, and other characteristics.
  Furthermore, a device filename must follow a specific naming convention.
  A logical device name for a disk drive has the following format:

  /dev/[r]dsk/cxtxdxsx

  where cx refers to the SCSI controller number, tx to the SCSI bus target number,
  dx to the disk number (always 0 except on storage arrays)
  and sx to the slice or partition number.
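
  As an illustration of this naming convention, the following sketch takes a logical disk name apart with sed. 
  The function name "decode_ctds" is hypothetical, and the sketch only handles the plain cXtXdXsX form described 
  above (not, for example, names without a target field).

```shell
#!/bin/sh
# decode_ctds NAME: split a /dev/[r]dsk/cWtXdYsZ name into its parts.
# Hypothetical helper written for this note.
decode_ctds() {
    case "$1" in
        /dev/rdsk/*) iface="raw" ;;
        /dev/dsk/*)  iface="block" ;;
        *)           iface="unknown" ;;
    esac
    # Capture the four numbers that follow the letters c, t, d and s.
    parts=$(echo "$1" | sed -n 's|^.*/c\([0-9][0-9]*\)t\([0-9][0-9]*\)d\([0-9][0-9]*\)s\([0-9][0-9]*\)$|controller=\1 target=\2 disk=\3 slice=\4|p')
    if [ -z "$parts" ]; then
        echo "not a cXtXdXsX name: $1" >&2
        return 1
    fi
    echo "interface=$iface $parts"
}

decode_ctds /dev/rdsk/c0t3d0s7
# prints: interface=raw controller=0 target=3 disk=0 slice=7
```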

  

===========================
29. filesystems in Solaris:
===========================


29.1 A few traditional filesystem commands:
===========================================

The UFS filesystem has always been the most popular fs on Solaris.
Of course, when the newer ZFS filesystem became available, it was rapidly adopted.

We will first take a look at a few classical commands that you would typically use on a UFS filesystem.
Of course, many "listing commands", for example df (to show used and free space), 
can be used on ZFS as well. But creating an fs on ZFS works completely differently from what is shown here in section 29.1.


Checks on the filesystems in Solaris:
-------------------------------------

1. used space etc.. 
#  df -k, df -h etc..

# du -ks /home/fred 

Shows only a summary of the disk usage of the /home/fred subdirectory (measured in kilobytes).

# du -ks /home/fred/* 

Shows a summary of the disk usage of each subdirectory of /home/fred (measured in kilobytes).

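# du -s /home/fred

Shows a total summary of /home/fred (in 512-byte blocks, the default unit).

# du -sg /data

Shows a total summary of /data in GB.
Note: -g is an AIX du flag; Solaris du normally offers -k and -h instead (for example "du -sh /data").

# du -g /dirname

Shows the disk usage of /dirname and its subdirectories in GB (the same remark about -g applies).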

2. examining the disklabel
#  prtvtoc /dev/rdsk/c0t3d0s2

3. format just by itself shows the disks
#  format

#  format -> specify disk -> choose partition -> choose print to get the partition table

4. Display information about SCSI devices

# cfgadm -al

or, from the PROM, commands like probe-scsi


What is the CDROM device in Solaris:
------------------------------------

-- pointer 1.

If you have a CD put in the drive, and it was automounted, simply use the "df" command to view your filesystems:

# df -k    or df -h

-- pointer 2.

From the output of the command

# iostat -En

you could figure out what logical device name your CDROM has.

-- pointer 3.

Solaris uses the same naming conventions as used with hard disks, for example the CDROM in the following command

# mount -r -F hsfs /dev/dsk/c0t6d0s2 /cdrom

means that in this case, the CDROM device is "/dev/dsk/c0t6d0s2"
Normally, a CD is automounted on "/cdrom" or "/cdrom/cdrom0"

The simplest way to mount a CDROM on Solaris is to use the vold daemon. The vold daemon in Solaris manages the CD-ROM 
device and automatically performs the mounting, similar to how Windows manages CDROMs (but not as transparent or reliable). 
If a CD is detected in the drive, it should be automatically mounted on the /cdrom/cdrom0 directory. 


Recovering disk partition information in Solaris:
-------------------------------------------------

Use the fmthard command to write the backup VTOC information back to the disk.
The following example uses the fmthard command to recover a corrupt label on a disk
named /dev/rdsk/c0t3d0s2. The backup VTOC information is in a file named c0t3d0
in the /vtoc directory (such a backup is typically made beforehand with "prtvtoc /dev/rdsk/c0t3d0s2 > /vtoc/c0t3d0").

# fmthard -s /vtoc/c0t3d0 /dev/rdsk/c0t3d0s2

Remember that the format of /dev/(r)dsk/cWtXdYsZ means:

W is the controller number,
X is the SCSI target number,
Y is the logical unit number (LUN, almost always 0),
Z is the slice or partition number

Make a new filesystem in Solaris:
---------------------------------

To create a UFS filesystem on a formatted disk that already has been divided into slices
you need to know the raw device filename of the slice that will contain the filesystem.
Example:

# newfs /dev/rdsk/c0t3d0s7

defaults on UFS on Solaris: 
blocksize 8192
fragmentsize 1024
one inode for each 2K of diskspace
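
To get a feeling for what these defaults mean, you can do the arithmetic in the shell. This is only a rough sketch: 
"est_inodes" is a made-up helper, it assumes the "one inode per 2K" default listed above, and the number that newfs 
really reports will differ somewhat because of rounding per cylinder group (and some releases pick a larger 
bytes-per-inode value for large filesystems).

```shell
#!/bin/sh
# est_inodes SIZE_MB [BYTES_PER_INODE]
# Rough estimate of the number of inodes newfs would create.
# Hypothetical helper; defaults to one inode per 2048 bytes as listed above.
est_inodes() {
    size_mb=$1
    nbpi=${2:-2048}
    echo $(( size_mb * 1024 * 1024 / nbpi ))
}

# Fragments per block with the default sizes: 8192 / 1024 = 8
echo $(( 8192 / 1024 ))

# A 1 GB (1024 MB) slice with one inode per 2K:
est_inodes 1024
# prints: 524288
```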

FSCK in Solaris:
----------------

If you just want to determine the state of a filesystem, whether it needs checking, 
you can use the fsck command while the fs is mounted.
Example:

# fsck -m /dev/rdsk/c0t0d0s6

The state flag in the superblock of the filesystem you specify is checked to see
whether the filesystem is clean or requires checking.

If you omit the device argument, all the filesystems listed in /etc/vfstab with a fsck 
pass value greater than 0 are checked.
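
If you want to see in advance which entries that would be, you can filter the vfstab on its fifth field (the fsck pass). 
The following is just a sketch: "fsck_candidates" is a made-up name, and the sample vfstab lines are invented for 
illustration. On a real system you would feed it /etc/vfstab instead of the here-document.

```shell
#!/bin/sh
# fsck_candidates: read vfstab-format lines on stdin and print
# "device-to-fsck mount-point" for entries with a numeric fsck pass > 0.
# Hypothetical helper written for this note.
fsck_candidates() {
    awk '
        $1 ~ /^#/ || NF < 7 { next }               # skip comments and short lines
        $5 ~ /^[0-9]+$/ && $5 > 0 { print $2, $3 } # field 5 = fsck pass
    '
}

# Invented sample data in vfstab layout:
# device to mount, device to fsck, mount point, FS type, fsck pass, mount at boot, options
fsck_candidates <<'EOF'
/dev/dsk/c0t0d0s0  /dev/rdsk/c0t0d0s0  /        ufs    1  no   -
/dev/dsk/c0t0d0s6  /dev/rdsk/c0t0d0s6  /usr     ufs    1  no   -
/dev/dsk/c0t3d0s7  /dev/rdsk/c0t3d0s7  /export  ufs    2  yes  -
/proc              -                   /proc    proc   -  no   -
swap               -                   /tmp     tmpfs  -  yes  -
EOF
```

With the sample data, only the three ufs entries are listed; /proc and /tmp have "-" as fsck pass and are skipped.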


Adding a disk in Solaris 2.6, 2.7, 8, 9:
----------------------------------------

In case you have just built in a new disk,
it's probably best to first use the probe-scsi command from the OK prompt:

ok probe-scsi
..
Target 3
 Unit 0  Disk   Seagate ST446452W   0001
..

Next, do a reconfiguration reboot, with the "boot -r" command:

ok boot -r

Specifying the -r flag when booting, tells Solaris to reconfigure itself by scanning
for new hardware.
Once the system is up, check the output of "dmesg" to find kernel messages relating
to the new disk.
You will probably find complaints such as "corrupt label - wrong magic number".
That's good, because we now know that the kernel is aware of this new disk.

In this example, our disk is SCSI target 3, so we can refer to the whole disk as
/dev/rdsk/c0t3d0s2           # slice 2, or partition 2, s2 refers to the whole disk


Remember that the format of /dev/(r)dsk/cWtXdYsZ means:

W is the controller number,
X is the SCSI target number,
Y is the logical unit number (LUN, almost always 0),
Z is the slice or partition number


We now use the format program to partition the disk, and afterwards create filesystems.

# format /dev/rdsk/c0t3d0s2
(.. output..)
FORMAT MENU:

format>label
Ready to label disk, continue? y

format>partition 
PARTITION MENU:

partition>

Once you have created and sized the partitions, you can get a list with the "partition>print" command.

Now, for example, you can create a filesystem like in the following command:

# newfs /dev/rdsk/c0t3d0s0


devfsadm:
---------

As from Solaris 8:

devfsadm(1M) maintains the /dev and /devices namespaces. It replaces the previous suite of devfs administration tools, 
including drvconfig(1M), disks(1M), tapes(1M), ports(1M), audlinks(1M), and devlinks(1M).

The default operation is to attempt to load every driver in the system and attach to all possible device instances. 
devfsadm then creates device special files in /devices and logical links in /dev.

In other words, the devfsadm command is used to dynamically reconfigure system device tables
without having to reboot the system.

Examples:

# devfsadm -i sd
# devfsadm -c tape

In the first example, devfsadm configures only those devices supported by the
sd driver. In the second example, devfsadm configures only tape devices.



29.2 Notes on filesystems on Solaris:
=====================================

There are at least 4 different types of filesystems you can use with Solaris 10. The same holds for the older 
Solaris 8 and 9 versions, except for ZFS.
These are:

-- UFS
The traditional filesystem for Solaris systems. UFS is old technology but it is a stable and fast filesystem. 
Sun has continuously tuned and improved the code over the years.
Early releases of Solaris 10 (and older versions, of course) can only boot from a UFS root filesystem, so every such 
system must have at least one UFS filesystem. ZFS boot first became available in OpenSolaris.
Note: This "boot statement" was true at the time of writing. Later Solaris 10 updates (10/08 onwards) and Solaris 11 
can boot from a ZFS root filesystem.

-- ZFS
We will talk a bit on ZFS in section 29.3

-- VxFS
The Veritas filesystem and volume manager have their roots in a fault-tolerant proprietary minicomputer 
built by Veritas in the 1980s. They have been available for Solaris since at least 1993 and have been 
ported to AIX and Linux. They are integrated into HP-UX and SCO UNIX, and Veritas Volume Manager code 
has been used (and extensively modified) in Tru64 UNIX and even in Windows. 
VxFS has never been part of Solaris but, when UFS was the only option, it was a popular addition. 
VxVM and VxFS are tightly integrated. Through vxassist, one may shrink and grow filesystems and their 
underlying volumes with minimal trouble. 

VxFS can run in single instance mode or in a parallel access/cluster file system mode. 
This latter mode allows for multiple servers (also known as cluster nodes) to simultaneously access 
the same file system. When run in this mode, VxFS is referred to as VERITAS Cluster File System. 
Cluster File System provides cache coherency and POSIX compliance across nodes, so that data changes 
are atomically seen by all cluster nodes simultaneously. Because Cluster File System shares the same 
binaries and same on-disk layout as single instance VxFS, moving between cluster and single instance mode 
is straightforward.


-- SAM and QFS
QFS is Sun's cluster filesystem, meaning that the same filesystem may be simultaneously mounted 
by multiple systems. SAM is a hierarchical storage manager; it allows a set of disks to be used 
as a cache for a tape library. SAM and QFS are designed to work together, but each may be used separately. 

-- PCFS
It's even possible to use the DOS FAT filesystem.

-- HSFS
Of course, the CDROM filesystem HSFS can be used.

Maybe the following list will show you what can be used in Solaris:

Filesystem  Type     Device          Description
UFS         Regular  Disk            Unix Fast filesystem; default in Solaris
ZFS         Regular  Disk            The new regular FS in Solaris 10
VxFS        Regular  Disk            Veritas filesystem
QFS         Regular  Disk            QFS filesystem from LSC Inc.
pcfs        Regular  Disk            MSDOS FAT and FAT32 filesystem
hsfs        Regular  Disk            High Sierra filesystem (CDROM)
tmpfs       Regular  Memory          Uses memory and swap
nfs         Pseudo   Network         Network filesystem
cachefs     Pseudo   Filesystem      Uses a local disk as cache for another NFS filesystem
autofs      Pseudo   Filesystem      Uses a dynamic layout to mount other filesystems
specfs      Pseudo   Device drivers  Filesystem for the /dev devices
procfs      Pseudo   Kernel          /proc filesystem representing processes
sockfs      Pseudo   Network         Filesystem of socket connections
fifofs      Pseudo   Files           FIFO filesystem

If we look at the regular disk based filesystems, the following can be said on the "allocation format":

Filesystem 	Allocation format 
UFS 		Block, allocator tries to allocate sequential blocks 
VxFS 		Extent based 
QFS 		Extent based 
ZFS		Extent based


29.3 Some notes on the ZFS filesystem. Solaris 10 
=================================================


>>> ZFS Pooled Storage:
-----------------------

ZFS uses the concept of storage pools to manage physical storage. Historically, file systems were constructed on top of a single physical device. 
To address multiple devices and provide for data redundancy, the concept of a "logical volume manager", LVM, was introduced to provide for Volume Groups,
and Logical Volumes (which could span multiple disks), and then add a filesystem on such a Logical Volume. This design added another layer 
of complexity and ultimately prevented certain file system advances, because the file system had no control over the physical placement 
of data on the virtualized volumes. 

ZFS eliminates the volume management altogether. Instead of forcing you to create virtualized volumes, ZFS aggregates devices into a storage pool. 
The storage pool describes the physical characteristics of the storage (device layout, data redundancy, and so on,) and acts as an arbitrary data store 
from which file systems can be created. File systems are no longer constrained to individual devices, allowing them to share space with all file systems 
in the pool. You no longer need to predetermine the size of a file system, as file systems grow automatically within the space allocated to the storage pool. 
When new storage is added, all file systems within the pool can immediately use the additional space without additional work. In many ways, 
the storage pool acts as a virtual memory system. When a memory DIMM is added to a system, the operating system doesn't force you to invoke some commands 
to configure the memory and assign it to individual processes. All processes on the system automatically use the additional memory.

Everything you hate about managing file systems and volumes is gone: you don't have to use format, and create slices/partitions, use newfs, mount, edit /etc/vfstab, 
fsck, growfs, metadb, metainit, etc.

Meet your new best friends: zpool and zfs.

ZFS is easy, so let's get on with it! It's time to create your first pool: 

# zpool create tank c1t2d0

You now have a single-disk storage pool named tank, with a single file system mounted at /tank. There is nothing else to do.
Yes, it's really true: 
The new ZFS file system, tank, can use as much of the disk space as needed, and is automatically mounted at /tank.

You can determine if your pool was successfully created by using the zpool list command. 

# zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
tank                     80G    137K     80G     0%  ONLINE     - 


Suppose we create a file in /tank and want to see what things look like:
# mkfile 100m /tank/foo
# df -h /tank
Filesystem             size   used  avail capacity  Mounted on
tank                   80G   100M    80G     1%    /tank


If you want mirrored storage for mail and home directories, that's easy too.
(This is a separate example: it assumes that the single-disk pool "tank" created above does not exist.)

Create the pool:

# zpool create tank mirror c1t2d0 c2t2d0

Now let's create the "/var/mail" file system:

# zfs create tank/mail
# zfs set mountpoint=/var/mail tank/mail

Create home directories, and mount them all in /export/home/<username>:

# zfs create tank/home
# zfs set mountpoint=/export/home tank/home


At this point, we have "/export/home" present.
Now you could even do this:

# zfs create tank/home/ahrens

ZFS file systems are hierarchical: each one inherits properties from above. In this example, the mountpoint property is inherited 
as a pathname prefix. That is, tank/home/ahrens is automatically mounted at /export/home/ahrens because tank/home is mounted at /export/home. 
You don't have to specify the mountpoint for each individual user - you just tell ZFS the pattern.
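
The "pattern" is nothing more than a prefix substitution, which you can mimic in plain shell. The sketch below does 
not call zfs at all; "inherited_mountpoint" is a made-up helper that just shows how the child mountpoint follows 
from the parent dataset name, the parent mountpoint, and the child dataset name.

```shell
#!/bin/sh
# inherited_mountpoint PARENT_DATASET PARENT_MOUNTPOINT CHILD_DATASET
# Mimics how a child dataset inherits its mountpoint as a pathname prefix.
# Hypothetical helper; it only manipulates strings and never runs zfs.
inherited_mountpoint() {
    parent_ds=$1
    parent_mp=$2
    child_ds=$3
    case "$child_ds" in
        "$parent_ds"/*)
            # Strip the parent dataset name, keep the rest (e.g. "/ahrens").
            suffix=${child_ds#"$parent_ds"}
            echo "$parent_mp$suffix"
            ;;
        *)
            echo "$child_ds is not below $parent_ds" >&2
            return 1
            ;;
    esac
}

inherited_mountpoint tank/home /export/home tank/home/ahrens
# prints: /export/home/ahrens
```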


>>> Commit and Rollback semantics:
----------------------------------

ZFS uses a commit and rollback mechanism to ensure that all data is written completely; if not, everything is rolled back.
You probably know that with earlier filesystems you could choose between
- a filesystem without journaling (logging)
- or a filesystem with journaling (logging).

Now you have a third option: using a transactional filesystem, like zfs.

ZFS is a transactional file system, which means that the file system state is always consistent on disk. Traditional file systems (with no logging) 
overwrite data in place, which means that if the machine loses power, for example, between the time a data block is allocated and 
when it is linked into a directory, the file system will be left in an inconsistent state. Historically, this problem was solved through the use 
of the fsck command. This command was responsible for going through and verifying file system state, making an attempt to repair any inconsistencies 
in the process. This problem sometimes caused great pain to administrators and was never guaranteed to fix all possible problems. 

More recently, file systems have introduced the concept of journaling. The journaling process records action in a separate journal, 
which can then be replayed safely if a system crash occurs. This process introduces unnecessary overhead, because the data needs 
to be written twice, and often results in a new set of problems, such as when the journal can't be replayed properly. 

With a transactional file system, data is managed using copy on write semantics. Data is never overwritten, and any sequence of operations 
is either entirely committed or entirely ignored. This mechanism means that the file system can never be corrupted through accidental 
loss of power or a system crash. So, no need for a fsck equivalent exists. While the most recently written pieces of data might be lost, 
the file system itself will always be consistent. In addition, synchronous data (written using the O_DSYNC flag) is always guaranteed 
to be written before returning, so it is never lost.


>>> Unparalleled Scalability:
-----------------------------

ZFS has been designed from the ground up to be a very scalable file system. The file system itself is 128-bit, allowing for 256 quadrillion zettabytes 
of storage. All metadata is allocated dynamically, so no need exists to pre-allocate inodes or otherwise limit the scalability 
of the file system when it is first created. All the algorithms have been written with scalability in mind. 
Directories can have up to 2^48 (256 trillion) entries, and no limit exists on the number of file systems or number of files 
that can be contained within a file system.
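
These numbers are easier to believe if you check them. The sketch below is just shell arithmetic, with "pow2" as 
a made-up helper. It assumes that "trillion", "quadrillion" and "zetta" are meant as binary units (2^40, 2^50 and 2^70). 
Note that 2^128 itself does not fit in the shell's 64-bit integers, so for that one only the exponents are added up.

```shell
#!/bin/sh
# pow2 N: print 2 to the power N (only valid for N below 63 in 64-bit shell arithmetic).
# Hypothetical helper written for this note.
pow2() {
    echo $(( 1 << $1 ))
}

# Directory entries: 2^48
pow2 48
# prints: 281474976710656

# "256 trillion" with a binary trillion (2^40) is the same number:
echo $(( 256 * $(pow2 40) ))
# prints: 281474976710656

# "256 quadrillion zettabytes" = 2^8 * 2^50 * 2^70 bytes; the exponents add up to 128:
echo $(( 8 + 50 + 70 ))
# prints: 128
```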


>>> Some more examples:
-----------------------

-- To give user ahrens a 10G quota:

# zfs set quota=10g tank/home/ahrens

-- To give user bonwick a 100G reservation (membership has its privileges):

# zfs set reservation=100g tank/home/bonwick

-- To automatically NFS-export all home directories read/write:

# zfs set sharenfs=rw tank/home

-- To scrub all disks and verify the integrity of all data in the pool:

# zpool scrub tank

-- To replace a flaky disk:

# zpool replace tank c2t2d0 c4t1d0

-- To add more space:

# zpool add tank mirror c5t1d0 c6t1d0

-- To move your pool from SPARC machine 'sparky' to AMD machine 'amdy':

[on sparky]
    # zpool export tank

Physically move your disks from sparky to amdy.

[on amdy]
    # zpool import tank


-- Determining if Problems Exist in a ZFS Storage Pool

The easiest way to determine if any known problems exist on the system is to use the "zpool status -x" command. 
This command describes only pools exhibiting problems. If no bad pools exist on the system, 
then the command displays a simple message, as follows:

# zpool status -x

all pools are healthy

Without the -x flag, the command displays the complete status for all pools (or the requested pool, if specified on the command line), 
even if the pools are otherwise healthy. 


-- Understanding zpool status Output
The complete zpool status output looks similar to the following:

# zpool status tank
  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: none requested
 config:

        NAME         STATE     READ WRITE CKSUM
        tank         DEGRADED     0     0     0
          mirror     DEGRADED     0     0     0
            c1t0d0   ONLINE       0     0     0
            c1t1d0   OFFLINE      0     0     0

errors: No known data errors
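
If you want to pick the problem devices out of such output in a script, a small awk filter is enough. This is a sketch 
under assumptions: "unhealthy_devices" is a made-up name, and it assumes the five-column config table shown above 
(NAME STATE READ WRITE CKSUM) with plain numeric error counters. On a real system you would pipe "zpool status" into it.

```shell
#!/bin/sh
# unhealthy_devices: read "zpool status" text on stdin and print every row
# of the config table whose STATE is not ONLINE.
# Hypothetical helper; assumes the 5-column layout NAME STATE READ WRITE CKSUM.
unhealthy_devices() {
    awk 'NF == 5 && $1 != "NAME" && $3 ~ /^[0-9]+$/ && $2 != "ONLINE" { print $1, $2 }'
}

# The config table from the example above:
unhealthy_devices <<'EOF'
        NAME         STATE     READ WRITE CKSUM
        tank         DEGRADED     0     0     0
          mirror     DEGRADED     0     0     0
            c1t0d0   ONLINE       0     0     0
            c1t1d0   OFFLINE      0     0     0

errors: No known data errors
EOF
```

For the sample this lists tank and mirror as DEGRADED and c1t1d0 as OFFLINE; the healthy disk c1t0d0 is not shown.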


29.4 Some examples on VxFS:
===========================

See section 29.2 for a general description about the filesystems you can use on Solaris.

Example 1:
----------

# mkfs -F vxfs /dev/vx/rdsk/testdg/msvol1 200m
version 4 layout
409600 sectors, 204800 blocks of size 1024, log size 1024 blocks
unlimited inodes, largefiles not supported
204800 data blocks, 203656 free data blocks
7 allocation units of 32768 blocks, 32768 data blocks
last allocation unit has 8192 data blocks

Example 2:
----------

We are going to show how to create a mirrored volume and a striped volume with Veritas Storage Foundation
on Solaris 10.

The first step is to check how many disks you have available on the server. 
A simple way to check this on Solaris is the format utility:

bash-3.00# format

Searching for disks.done

AVAILABLE DISK SELECTIONS:

0. c1t0d0 <DEFAULT cyl 4092 alt 2 hd 128 sec 32>
/pci@0,0/pci15ad,1976@10/sd@0,0

1. c1t1d0 <DEFAULT cyl 7 alt 2 hd 64 sec 32>
/pci@0,0/pci15ad,1976@10/sd@1,0

2. c1t2d0 <DEFAULT cyl 7 alt 2 hd 64 sec 32>
/pci@0,0/pci15ad,1976@10/sd@2,0

3. c1t3d0 <DEFAULT cyl 2 alt 2 hd 64 sec 32>
/pci@0,0/pci15ad,1976@10/sd@3,0

Also, you can check disks available to Veritas Storage Foundation using vxdisk command:

bash-3.00# vxdisk -o alldgs list

DEVICE TYPE DISK GROUP STATUS

c1t0d0s2 auto:none - - online invalid
c1t1d0s2 auto:none - - online invalid
c1t2d0s2 auto:none - - online invalid
c1t3d0s2 auto:none - - online invalid

You can see above that there are 4 disks on the server that are available to Veritas but they have not yet 
been initialized by Veritas (invalid status). To use a disk on Veritas SF you need to initialize this 
using Veritas utilities.

NOTE: If you are going to use a disk with Veritas, be aware that you should give the whole disk to Veritas. 
The disk will be formatted and you will lose all data on it when you allocate it to Veritas Storage.

In this example the only disk that is in use by the Solaris OS is the first one (c1t0d0s2).

We can add the 3 other disks to Veritas Storage.

Caution: If by mistake we add the first disk (c1t0d0s2) to Veritas Storage, it will format 
the disk and erase the Solaris installation. We need to pay attention to pick the right disks.
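
To list the disks that Veritas has not initialized yet, you can filter the "vxdisk list" output on the "invalid" status. 
This is only a sketch: "uninitialized_disks" is a made-up name, and the filter assumes the column layout of the sample 
output above. Keep the caution in mind: the status does not tell you whether a disk is in use by Solaris, so the 
OS disk (c1t0d0s2 here) shows up in the list too and must be excluded by hand.

```shell
#!/bin/sh
# uninitialized_disks: read "vxdisk -o alldgs list" output on stdin and print
# the devices whose status ends in "invalid" (not yet initialized by Veritas).
# Hypothetical helper; it does NOT know which disk holds the operating system.
uninitialized_disks() {
    awk '$1 != "DEVICE" && $NF == "invalid" { print $1 }'
}

# Sample lines in the layout shown above:
uninitialized_disks <<'EOF'
DEVICE TYPE DISK GROUP STATUS
c1t0d0s2 auto:none - - online invalid
c1t1d0s2 auto:cdsdisk c1t1d0 DG1 online
c1t2d0s2 auto:none - - online invalid
EOF
```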

Let's start allocating (initializing) those 3 disks to Veritas:

# vxdisksetup -i c1t1d0
# vxdisksetup -i c1t2d0
# vxdisksetup -i c1t3d0

Now that those 3 disks are initialized for Veritas, the next step is to create a Disk Group.

>>> Disk Group

A Disk Group is a collection of disks. Disk Groups are very useful for management and isolation purposes.
Let's create a DG using only the first disk initialized for Veritas (c1t1d0). 
We are using DG1 as the name of the Disk Group.

# vxdg init DG1 c1t1d0

Check if  DG1 was created successfully:

# vxdg list

NAME STATE ID

DG1 enabled,cds 1218633322.13.vrt2

Also, check if the disk is properly assigned to DG1:

# vxdisk -o alldgs list

DEVICE TYPE DISK GROUP STATUS

c1t0d0s2 auto:none - - online invalid
c1t1d0s2 auto:cdsdisk c1t1d0 DG1 online
c1t2d0s2 auto:cdsdisk - - online
c1t3d0s2 auto:cdsdisk - - online

Let's add the 2 other disks to DG1:

# vxdg -g DG1 adddisk c1t2d0s2 c1t3d0s2

Check if the disks are properly assigned to DG1:

# vxdisk -o alldgs list

DEVICE TYPE DISK GROUP STATUS

c1t0d0s2 auto:none - - online invalid
c1t1d0s2 auto:cdsdisk c1t1d0 DG1 online
c1t2d0s2 auto:cdsdisk c1t2d0 DG1 online
c1t3d0s2 auto:cdsdisk c1t3d0 DG1 online

At this point we have added 3 disks into Disk Group DG1. 

In the next step we will create 2 different volumes in DG1.

>>> Volumes

A volume is a virtual storage device that is used like a physical disk. A volume can be composed of many disks 
and can have different layouts.

In this example, we are going to create two Volumes:

Volume VolS - Striping layout using the c1t1d0 and c1t2d0 disks (RAID 0).
Volume VolM - Mirroring layout using the c1t2d0 and c1t3d0 disks (RAID 1).

-- To create a Striping Volume VolS (Size=10m):

# vxassist -g DG1 make VolS 10m layout=stripe c1t1d0s2 c1t2d0s2

To check if volume VolS was created successfully:

# vxprint -g DG1

TY NAME ASSOC KSTATE LENGTH PLOFFS STATE TUTIL0 PUTIL0

dg DG1 DG1 - - - - - -

dm c1t1d0 c1t1d0s2 - 159488 - - - -
dm c1t2d0s2 c1t2d0s2 - 159488 - - - -
dm c1t3d0s2 c1t3d0s2 - 159488 - - - -


v VolS fsgen ENABLED 20480 - ACTIVE - -
pl VolS-01 VolS ENABLED 20480 - ACTIVE - -
sd c1t1d0-01 VolS-01 ENABLED 10240 0 - - -
sd c1t2d0s2-01 VolS-01 ENABLED 10240 0 - - -


-- To create a Mirroring Volume VolM (Size=10m):

# vxassist -g DG1 make VolM 10m layout=mirror c1t2d0s2 c1t3d0s2

To check if Volume VolM was created successfully:

# vxprint -g DG1

TY NAME ASSOC KSTATE LENGTH PLOFFS STATE TUTIL0 PUTIL0

dg DG1 DG1 - - - - - -

dm c1t1d0 c1t1d0s2 - 159488 - - - -
dm c1t2d0s2 c1t2d0s2 - 159488 - - - -
dm c1t3d0s2 c1t3d0s2 - 159488 - - - -

v VolM fsgen ENABLED 20480 - ACTIVE - -
pl VolM-01 VolM ENABLED 20480 - ACTIVE - -
sd c1t3d0s2-01 VolM-01 ENABLED 20480 0 - - -

pl VolM-02 VolM ENABLED 20480 - ACTIVE - -
sd c1t2d0s2-02 VolM-02 ENABLED 20480 0 - - -

v VolS fsgen ENABLED 20480 - ACTIVE - -
pl VolS-01 VolS ENABLED 20480 - ACTIVE - -
sd c1t1d0-01 VolS-01 ENABLED 10240 0 - - -
sd c1t2d0s2-01 VolS-01 ENABLED 10240 0 - - -

Note: You can see above that both Volumes were created successfully. Also, you can note the difference 
between the striping and mirroring volume layouts. 

VolM uses two different Plexes on different disks. This means that if you lose one disk (Plex) 
you still have the data on the other disk (the other Plex). This is the essence of a Mirroring Volume.

VolS uses only one Plex divided over 2 disks. This means that the data is split across those 2 disks. 
If you lose one disk you lose the whole Plex, and therefore the data. 
This is the essence of a Striping Volume. It does not provide data protection, but it is very useful 
for performance purposes.
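
The LENGTH column in the vxprint output makes the difference visible in numbers. The sketch below converts those 
lengths to megabytes; "sectors_to_mb" is a made-up helper and it assumes that LENGTH is expressed in 512-byte sectors 
(which matches the 10m volumes showing up as 20480).

```shell
#!/bin/sh
# sectors_to_mb SECTORS: convert a vxprint LENGTH value to megabytes.
# Hypothetical helper; assumes 512-byte sectors.
sectors_to_mb() {
    echo $(( $1 * 512 / 1024 / 1024 ))
}

# VolS and VolM: 20480 sectors each
sectors_to_mb 20480
# prints: 10

# Striped VolS: each of the two subdisks holds only half of the volume
sectors_to_mb 10240
# prints: 5

# Mirrored VolM: each plex (and its subdisk) holds the full 20480 sectors,
# so the mirror consumes 2 x 10 MB of disk space for 10 MB of data.
echo $(( 2 * $(sectors_to_mb 20480) ))
# prints: 20
```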

Also, you can combine those 2 layouts into one layout that provides both data protection and better performance. 
This is the case with RAID 0 + 1 or RAID 1 + 0.

In the next step we will create 2 different filesystems using those 2 Volumes.

>>> Filesystem

In this example we will create two filesystems:

- Filesystem fsS will use VolS. It will be mounted at /stripe mount point.
- Filesystem fsM will use VolM. It will be mounted at /mirror mount point.

To create a VxFS filesystem:

# mkfs -F vxfs /dev/vx/rdsk/DG1/VolS

version 7 layout

20480 sectors, 10240 blocks of size 1024, log size 1024 blocks
largefiles supported

# mkfs -F vxfs /dev/vx/rdsk/DG1/VolM

version 7 layout

20480 sectors, 10240 blocks of size 1024, log size 1024 blocks

largefiles supported

To mount a VxFS filesystem:

# mount -F vxfs /dev/vx/dsk/DG1/VolS /stripe/
# mount -F vxfs /dev/vx/dsk/DG1/VolM /mirror/

Now there are 2 filesystems configured, and you can use them through their Solaris mount points.

Any data written to the /stripe directory will be written to the striped volume VolS.
Any data written to the /mirror directory will be written to the mirrored volume VolM.


Example 3:
----------

Rather than mess with vxmake  you can employ vxassist to do all the dirty work. If you have any amount of experience with vxassist 
you'll know that the more information you can supply to vxassist the better the end product will be. 

I'm going to use vxassist to build a stripe-pro volume from four disks and I want the volume to be 1G in size:

# vxassist -g testdg make stripeprovol 1g  layout=stripe-mirror \
			testdg01 testdg02 testdg03 testdg04


Pretty kool, huh? Quick, efficient, and poorly named; everything you love about vxassist. I can then go a bit further 
and explore my sizing options to see how much I can grow my new volume if I need to:

# vxassist -g testdg maxgrow stripeprovol

Volume stripeprovol can be extended by 282050560 to 284147712 (138744Mb)

See? Just like a normal volume. Now comes the beauty part. When you look at that seemingly unmanageable mess of objects above 
does it really make you want to tear it apart and work on it like you might other "normal" volumes? Probably not. And you'd be wise 
to feel that way, there are just too many places to get confused or make a mistake when real data is involved. What if you could get back 
to a more normal point of view? Luckily you can, check this out:

# vxassist -g testdg convert stripeprovol layout=mirror-stripe


Veritas terminology:

In a "typical" RAID0+1 volume configuration, we take several disks and then create a stripe across those disks (the RAID0 part). 
Then once complete we do this again on a separate set of disks, and then attach that new stripe to the first, creating a mirror (the +1 part). 
We then have a RAID0+1 volume that's ready to have a filesystem put on it. The point of interest with this setup is that we're actually 
mirroring a complete stripe (and therefore ALL the disks in that stripe) to another stripe (and therefore ALL of its disks). 
The problem here is that if for some reason we need to re-sync the volume we'd need to re-sync a full stripe to a full stripe (very time-consuming), 
which is a nearly tragic proposition if you're talking about 50G+. A far more efficient setup would be to mirror each disk to each disk... 
in other words, to mirror a bunch of disks on a one-to-one basis, and then build a stripe on top of these mirrors. In this case if we need 
to re-sync due to a disk failure we can simply sync the failed disk to its mirror, instead of the full stripe. This is the power of RAID1+0; 
the difference between mirroring the stripes (0+1) and striping the mirrors (1+0).
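
To put a rough number on the re-sync difference, here is a simplified model (not a Veritas tool). It assumes every disk has the same size and that a single disk fails. The function name resync_mb and the 4 x 50G figures are invented for the illustration.

```shell
#!/bin/sh
# resync_mb LAYOUT COLUMNS DISK_MB
# Rough amount of data (in MB) to copy after ONE disk fails.
# Simplified model: equal-sized disks, whole-disk resync.
resync_mb() {
    case "$1" in
        mirror-stripe) echo $(( $2 * $3 )) ;;  # RAID0+1: the whole stripe is re-synced
        stripe-mirror) echo "$3" ;;            # RAID1+0: only the failed disk's mirror
        *) echo "unknown layout: $1" >&2; return 1 ;;
    esac
}

# Hypothetical example: stripes of 4 columns on 50G (51200 MB) disks.
resync_mb mirror-stripe 4 51200
resync_mb stripe-mirror 4 51200
```

With these made-up numbers, the mirror-stripe layout copies four times as much data as stripe-mirror, and the gap grows with the number of columns.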

If the terms seem to confuse you, try this for size:

RAID0	Striping (VxVM says: stripe)
RAID1	Mirroring (VxVM says: mirror)
RAID0+1 Striping plus Mirroring (VxVM says: mirror-stripe)
	Think this: Striped disks, then mirror the stripes
RAID1+0 Mirroring plus Striping (VxVM says: stripe-mirror) 
	(Veritas Marketing Dept says: StripePro)
	Think this: Mirrored disks, then stripe on top of the mirrors
Concat+Mirror	Concatenation plus Mirroring (VxVM says: mirror)
		Same as RAID1
Mirror+Concat	Mirroring plus Concatenation (VxVM says: concat-mirror)
		(Veritas Marketing Dept says: ConcatPro)
		Think this: Concatenation on top of mirrored disks.


Veritas Default diskgroup: rootdg

Default rootdg disk group. 
 Block Device Node /dev/vx/dsk/volume_name 
 Raw Device Node /dev/vx/rdsk/volume_name 
Other DiskGroups 
 Block Device Node /dev/vx/dsk/diskgroup_name/volume_name 
 Raw Device Node /dev/vx/rdsk/diskgroup_name/volume_name 
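
The naming rule above can be written down as a small helper. This is only a sketch: the function name vx_dev_path is made up, and it simply follows the rule as stated here (no disk group component for rootdg).

```shell
#!/bin/sh
# vx_dev_path TYPE VOLUME [DISKGROUP]
# TYPE is "dsk" (block device) or "rdsk" (raw device).
# Follows the rule above: rootdg volumes have no disk group in the path.
vx_dev_path() {
    dg=${3:-rootdg}
    if [ "$dg" = "rootdg" ]; then
        echo "/dev/vx/$1/$2"
    else
        echo "/dev/vx/$1/$dg/$2"
    fi
}

vx_dev_path rdsk VolS DG1     # raw node, e.g. for mkfs
vx_dev_path dsk  VolS DG1     # block node, e.g. for mount
vx_dev_path dsk  vol01        # a volume in rootdg
```

The raw node is the one passed to mkfs and the block node is the one passed to mount, as in the filesystem example earlier.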
 

Example 4:
----------

Some more examples:

Create Veritas layout on a disk:
	vxdisksetup -i c1t10d0

Create a disk group on a new disk:
	vxdg init <dg name> <media name>=c1t10d0

Add disk to an existing disk group:
	vxdg -g <dg name> adddisk <media name>=c2t0d0
	replace adddisk with rmdisk to remove a disk

Set up a preferred reading plex, this can be useful if we have a sparse plex (plex in RAM):
	vxvol -g <group> rdpol prefer <volname> <plexname>
	instead of prefer we can have round or select

View configuration:
	vxprint -th
List disks:
	vxdisk list
	vxdisk -o alldgs list (shows deported disks)

Adding disks while Solaris is running:
	drvconfig	(This probes scsi - Solaris)
	disks		(Creates links in /dev - Solaris)
	prtvtoc		(View the vtoc - Solaris)
	vxdctl enable	(Rescan for disks - Veritas)
	vxdisk list	(Shows the disks in error as they are not initialized yet)
	vxdisksetup  	(init the disks)

To encapsulate use:
 	vxencap -g <diskgroup> <devicename>

Export a disk group:
	vxdg deport <dg name>
	vxdg -h <hostname> deport <dg name> to export to another host

Import a disk group:
	vxdg import <dg name>
	vxdg -C to clear hostid of old host (When failing over in DR situation)
	vxdg -fC to clear hostid of old host and forcing diskgroup online

Destroy a disk group:
	vxdg destroy <disk group>

Evacuate data from a disk:
	vxevac -g <dg name> <from disk> <to disks>

Create a volume on a diskgroup:
	vxassist -g <dg name> make <volname> <size> layout=stripe
	ncols=<number of columns> stripeunit=<size>

Create a veritas filesystem on this volume:
        mkfs -F vxfs /dev/vx/rdsk/<disk group>/<volume> <size>

Delete a volume:
	vxassist -g <dg name> remove volume <volname>

Resize a filesystem:
        vxresize -g <disk group> -F <fstype> <volume> <size>

If Veritas is ever causing you problems, do the following:
	touch /etc/vx/reconfig.d/state.d/install-db
	edit /etc/system and /etc/vfstab
	to stop VRTS from starting up and to access the old root
	partitions


vxassist make martin 100m
makes a volume called martin using any disk

vxassist make martin 100m disk10
makes a volume called martin using disk10

vxassist make martin 100m layout=stripe disk07 disk08
creates a 100mb striped volume called martin using disks7 and 8

vxassist mirror martin disk05 disk06
uses disks5 and 6 to make a mirror on the volume called martin

vxassist make martin 50m layout=mirror
makes a 50Mb mirror using any 2 disks

vxassist make martin 50m layout=mirror disk05 disk06
makes a 50mb mirror using disks 5 and 6

vxassist make martin 50m layout=mirror,stripe disk05 disk06 disk07 disk08
makes a 50Mb stripe using disks5 and 6 mirrored across 7 and 8

vxassist make martin 50m layout=mirror,stripe,log disk05 disk06 disk07 disk08
makes a 50Mb stripe using disks5 and 6 mirrored across 7 and 8 and uses a log subdisk

vxassist make martin 100m layout=raid5
makes a 100m raid5 volume

/usr/sbin/vxedit -g rootdg rename disk12 disk09 
to rename disk12 to disk09 in the rootdg

vxedit rm disk10 
to remove a greyed out or obsolete disk in this case disk10
or to remove a disk from a diskgroup

vxdisk list - to list all disks under VM control 

vxdisk clearimport c#t#d#s#
to allow a disk to be imported after a server crash

vxdg -g razadg rmdisk test
to remove a disk called test from a dg called razadg

vxdg -g razadg adddisk test=c1t3d3
to add disk c1t3d3 to a dg called razadg calling the disk test; use vxdisk list
to determine what disks are free :)

vxedit -g rootdg set spare=on disk09
sets disk09 in the rootdg as a hotspare.


vxmirror rootdisk disk01
mirrors all the volumes on the root disk to disk01

vxassist -g rootdg mirror vol01 disk03
mirrors vol01 (in rootdg) to disk03


vxassist mirror martin

will mirror the volume martin


to make a mirror manually try

 /usr/sbin/vxmake -g rootdg sd disk03-01 dm_name=disk03 dm_offset=0 len=81920
 creates a subdisk on disk03 called disk03-01; the len of 81920 is
 81920 sectors x 512 bytes = 40M

 vxmake plex martin-02 sd=disk03-01
 creates a plex called martin-02 using subdisk disk03-01

 vxplex att martin martin-02
 attaches the plex martin-02 to volume martin
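
 The len= value in the vxmake sd step above is given in 512-byte sectors. A minimal sketch
 of the conversion in both directions follows; the function names are made up for this note.

```shell
#!/bin/sh
# Convert between MB and 512-byte sectors (2048 sectors per MB).
mb_to_sectors() { echo $(( $1 * 2048 )); }
sectors_to_mb() { echo $(( $1 / 2048 )); }

mb_to_sectors 40          # len= value for a 40M subdisk
sectors_to_mb 81920       # and back again
sectors_to_mb 284147712   # the maxgrow figure quoted in Example 3
```

 The last call reproduces the 138744Mb that vxassist maxgrow printed next to 284147712 in Example 3.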

 to list all volumes on your primary boot disk enter
 vxprint -t -v -e 'aslist.aslist.sd_disk="boot_disk_name"'


 vxsd mv disk03-01 disk05-01
 moves the contents of subdisk disk03-01 to disk05-01
 then moves  subdisk disk05-01 into the plex where subdisk disk03-01
 once lived, leaving disk03-01 to your mercy :)


 to make a subdisk

 vxmake sd disk02-02 disk02,0,8000
 this would create a subdisk called disk02-02 at the start of disk02,
 8000 blocks (4000k) long.
 If you wanted to create another subdisk on this disk, the offset would be
 8000, as this is where the next free space on the disk starts, so...
 vxmake sd disk02-03 disk02,8000,8000 would create another 8000-block
 subdisk.
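
 The offset bookkeeping can be sketched as follows. The script only prints vxmake commands,
 it does not run them. The function name plan_subdisks and the NN numbering of the subdisk
 names are assumptions made for the illustration.

```shell
#!/bin/sh
# plan_subdisks DISK LENGTH COUNT [FIRST_INDEX]
# Prints (does not execute) vxmake commands for COUNT back-to-back subdisks.
# Each new offset is the previous offset plus the previous length.
plan_subdisks() {
    disk=$1; len=$2; count=$3; i=${4:-1}
    offset=0; n=0
    while [ "$n" -lt "$count" ]; do
        printf 'vxmake sd %s-%02d %s,%d,%d\n' "$disk" "$i" "$disk" "$offset" "$len"
        offset=$(( offset + len ))
        i=$(( i + 1 ))
        n=$(( n + 1 ))
    done
}

# Two 8000-block subdisks on disk02, numbered from 02.
plan_subdisks disk02 8000 2 2
```

 Each subdisk must carry its own name, so the second one is numbered 03 here.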


 vxdisk rm c#t#d#s2
 to remove a disk so it's out of vm control

 vxdiskadd c#t#d#
 to bring a new disk under vm control

 or you can try...
 vxdisksetup -i c#t#d#  

 vxvol -g dg volname stop
 this stops a volume

 vxedit -rf rm martin
 removes a volume called martin together with its plex(es) and subdisks

 vxprint -ht volume




================
30. AIX devices:
================

In AIX 5.x, the device configuration information is stored in the ODM repository. The corresponding files
are in 

/etc/objrepos
/usr/lib/objrepos
/usr/share/lib/objrepos


There are 2 sections in ODM:
- predefined: all of the devices in principle supported by the OS
- customized: all devices already configured in the system

Every device in ODM has a unique definition that is provided by 3 attributes:

1. Type
2. Class
3. Subclass


Information that is stored in the ODM:

- PdDv,PdAt, PdCn   :  Predefined device information
- CuDv, CuAt, CuDep :  Customized device information
- lpp, inventory    :  Software vital product data
- SMIT menus
- Error log, alog, and dump information
- System Resource Controller: SRCsubsys, SRCsubsrv
- NIM: nim_attr, nim_object, nim_pdattr


There are commands that form an interface to the ODM, so you can add, retrieve, drop and change objects.
The following commands can be used with ODM:

odmadd, 
odmget, 
odmdrop, 
odmshow, 
odmdelete, 
odmcreate, 
odmchange

Examples:

# odmget -q "type LIKE lv*" PdDv
# odmget -q name=hdisk0 CuAt


Logical devices and physical devices:
-------------------------------------

AIX includes both logical devices and physical devices in the ODM device configuration database.
Logical devices include Volume Groups, Logical Volumes, network interfaces and so on.
Physical devices are adapters, modems etc.


Most devices are self-configuring; only serial devices (modems, printers) are not self-configurable.

The command that configures devices is "cfgmgr", the "configuration manager".
When run, it compares the information from the device with the predefined section in ODM.
If it finds a match, then it creates the entries in the customized section in ODM.

The configuration manager runs every time the system is restarted.

If you have installed an adapter for example, and you have put the software in a directory
like /usr/sys/inst.images, you can call cfgmgr to install device drivers as well with

# cfgmgr -i /usr/sys/inst.images

$$
09-08-00-1,0
u5971-t1-l1-l0


Device information:
-------------------

The most important AIX command to show device info is "lsdev". This command queries the ODM, so we can use
it to locate the customized or the predefined devices.

The main commands in AIX to get device information are:
- lsdev  : queries ODM
- lsattr : gets specific configuration attributes of a device
- lscfg  : gets vendor name, serial number, type, model etc.. of the device

lsdev also shows the status of a device as Available (that is, configured and ready for use) or as Defined
(that is, known in the customized database but not currently configured).


lsdev examples:
---------------

If you need to see disk or other devices, defined or available, you can use the lsdev command
as in the following examples:

# lsdev -Cc tape
rmt0  Available  10-60-00-5,0  SCSI 8mm Tape Drive

# lsdev -Cc disk
hdisk0 Available 20-60-00-8,0    16 Bit LVD SCSI Disk Drive
hdisk1 Available 20-60-00-9,0    16 Bit LVD SCSI Disk Drive
hdisk2 Available 20-60-00-10,0   16 Bit LVD SCSI Disk Drive
hdisk3 Available 20-60-00-11,0   16 Bit LVD SCSI Disk Drive
hdisk4 Available 20-60-00-13,0   16 Bit LVD SCSI Disk Drive

Note: -C queries the Customized section of ODM, -P queries the Predefined section of ODM.

Example if some of the disks are on a SAN (through FC adapters):

# lsdev -Cc disk
hdisk0 Available          Virtual SCSI Disk Drive
hdisk1 Available          Virtual SCSI Disk Drive
hdisk2 Available 02-08-02 SAN Volume Controller MPIO Device  (through FC adapter)
hdisk3 Available 02-08-02 SAN Volume Controller MPIO Device  (through FC adapter)

# lsattr -El hdisk2
PCM             PCM/friend/sddpcm                                   PCM                                     True
PR_key_value    none                                                Reserve Key                             True
algorithm       load_balance                                        Algorithm                               True
dist_err_pcnt   0                                                   Distributed Error Percentage            True
dist_tw_width   50                                                  Distributed Error Sample Time           True
hcheck_interval 20                                                  Health Check Interval                   True
hcheck_mode     nonactive                                           Health Check Mode                       True
location                                                            Location Label                          True
lun_id          0x0                                                 Logical Unit Number ID                  False
lun_reset_spt   yes                                                 Support SCSI LUN reset                  True
max_transfer    0x40000                                             Maximum TRANSFER Size                   True
node_name       0x50050768010029c8                                  FC Node Name                            False
pvid            00cb5b9e66cc16470000000000000000                    Physical volume identifier              False
q_type          simple                                              Queuing TYPE                            True
qfull_dly       20                                                  delay in seconds for SCSI TASK SET FULL True
queue_depth     20                                                  Queue DEPTH                             True
reserve_policy  no_reserve                                          Reserve Policy                          True
rw_timeout      60                                                  READ/WRITE time out value               True
scbsy_dly       20                                                  delay in seconds for SCSI BUSY          True
scsi_id         0x611013                                            SCSI ID                                 False
start_timeout   180                                                 START unit time out value               True
unique_id       33213600507680190014E30000000000001E204214503IBMfcp Device Unique Identification            False
ww_name         0x50050768014029c8                                  FC World Wide Name                      False



lsdev [ -C ][ -c Class ] [ -s Subclass ] [ -t Type ] [ -f File ] [ -F Format |
-r ColumnName ] [ -h ] [ -H ] [ -l { Name | - } ] [ -p Parent ] [ -S State ]

lsdev -P [ -c Class ] [ -s Subclass ] [ -t Type ] [ -f File ] [ -F Format | -r
ColumnName ] [ -h ] [ -H ]

Remark:

For locally attached SCSI devices, the general format of the LOCATION code "AB-CD-EF-GH" is actually "AB-CD-EF-G,H": 
the first three sections are the same, and in the last section G is the SCSI ID and H is the LUN. 
For adapters, only the AB-CD part appears in the location code.

A location code is a representation of the path to the device, from drawer, slot, connector and port.

- For an adapter it is sufficient to have the codes of the drawer and slot to identify
  the adapter. The location code of an adapter takes the form of AB-CD.

- Other devices need more detail, like a specific disk on a specific SCSI bus.
  For other devices the format is AB-CD-EF-GH. 
  The AB-CD part then indicates the adapter the device is connected to.

- For SCSI devices we have a location code like AB-CD-EF-S,L where the S,L fields identify
  the SCSI ID and LUN of the device.
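
As an illustration of the format described above, a location code can be taken apart with plain shell string handling. This is only a sketch; the function name parse_location is made up, and the sample codes come from the lsdev listings in this section.

```shell
#!/bin/sh
# parse_location CODE
# Splits an AIX location code into the adapter part (AB-CD) and, when the
# last section has the form S,L, the SCSI ID and LUN.
parse_location() {
    loc=$1
    adapter=$(echo "$loc" | cut -d- -f1-2)
    last=${loc##*-}
    case "$last" in
        *,*) echo "adapter=$adapter scsi_id=${last%%,*} lun=${last##*,}" ;;
        *)   echo "adapter=$adapter" ;;
    esac
}

parse_location 10-60-00-5,0     # rmt0 from the lsdev output above
parse_location 20-60-00-13,0    # hdisk4
parse_location 02-08-02         # SAN disk: no SCSI ID,LUN section
```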


To list all devices in the Predefined object class with column headers, use
# lsdev -P -H

To list the adapters that are in the Available state in the Customized Devices object class, use
# lsdev -C -c adapter -S a


lsattr examples:
----------------

This command gets the current attributes (-E flag) for a tape drive: 

# lsattr -El rmt0
mode           yes     Use DEVICE BUFFERS during writes    True
block_size     1024    Block size (0=variable length)      True
extfm          no      Use EXTENDED file marks             True
ret            no      RETENSION on tape change or reset   True
..
..

(Of course, an equivalent form of the above command is, for example, # lsattr -l rmt0 -E )

To list the default values for that tape device (-D flag), use
# lsattr -D -l rmt0


This command gets the attributes for a network adapter:

# lsattr -E -l ent1
busmem     0x3cfec00     Bus memory address     False
busintr    7             Bus interrupt level    False
..
..

To list only a certain attribute (-a flag), use the command as in the following example:

# lsattr -E -l scsi0 -a bus_intr_lvl 
bus_intr_lvl 14 Bus interrupt level False

# lsattr -El tty0 -a speed
speed 9600 BAUD rate true


You must specify one of the following flags with the lsattr command: 
-D  Displays default values.  
-E  Displays effective values (valid only for customized devices specified with the -l flag).  
-F  Format  Specifies the user-defined format.  
-R  Displays the range of legal values.  

In addition, -a Attribute limits the output to the named attribute.
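
When a script needs just one value out of saved lsattr -El output, a small awk filter is enough. The sketch below works on sample rows copied from the hdisk2 listing earlier in this section; the function name attr_value is made up. It assumes the attribute has a non-empty value (a row such as "location" with an empty value column would return part of the description instead).

```shell
#!/bin/sh
# attr_value NAME
# Reads "lsattr -El" style rows on stdin and prints the value column
# (second field) of the attribute NAME.
attr_value() {
    awk -v a="$1" '$1 == a { print $2 }'
}

# Sample rows copied from the "lsattr -El hdisk2" output above.
sample='queue_depth     20                                                  Queue DEPTH                             True
reserve_policy  no_reserve                                          Reserve Policy                          True
rw_timeout      60                                                  READ/WRITE time out value               True'

printf '%s\n' "$sample" | attr_value queue_depth
printf '%s\n' "$sample" | attr_value reserve_policy
```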


lscfg examples:
---------------

Example 1:

This command gets the Vital Product Data for the tape drive rmt0:

# lscfg -vl rmt0
Manufacturer...............EXABYTE
Machine Type and Model.....IBM-20GB
Device Specific(Z1)........38zA
Serial Number..............60089837
..
..

-l Name Displays device information for the named device.

-p Displays the platform-specific device information. This flag only applies to
   AIX 4.2.1 or later.

-v Displays the VPD found in the Customized VPD object class. Also, on AIX 4.2.1
   or later, displays platform specific VPD when used with the -p flag.

-s Displays the device description on a separate line from the name and
   location.


# lscfg -vp | grep -p 'Platform Firmware:'

# lscfg -vp | grep -p Platform

sample output:

Platform Firmware:
ROM Level.(alterable).......3R040602
Version.....................RS6K
System Info Specific.(YL)...U1.18-P1-H2/Y2
Physical Location: U1.18-P1-H2/Y2
The ROM Level denotes the firmware/microcode level
Platform Firmware:
ROM Level ............. RH020930
Version ................RS6K
.. 


Example 2:

The following command shows details about the Fiber Channel cards:

# lscfg -vl fcs*          (fcs0 for example, is the parent of fscsi0)



Adding a device:
----------------

Adding a device with cfgmgr:
----------------------------

To add a device you can run cfgmgr, or shut down the system, attach the new device and boot the system.
There are also many smitty screens to accomplish the task of adding a new device.


Adding a device with mkdev:
---------------------------

Also the mkdev command can be used as in the following example:

# mkdev -c tape -s scsi -t scsd -p scsi0 -w 5,0

where

-c    Class of the device
-s    Subclass of the device
-t    Type of the device. This is a specific attribute for the device 
-p    The parent adapter of the device. You have to specify the logical name.
-w    You have to know the SCSI ID that you are going to assign to the new device.
      If it's non SCSI, you have to know the port number on the adapter.
-a    Specifies the device attribute-value pair


The mkdev command also creates the ODM entries for the device and loads the device driver.

The following command configures a new disk and ensures that it is available as a physical volume.
This example adds a 2.2GB disk with a scsi ID of 6 and a LUN of 0 to the scsi3 SCSI bus.

# mkdev -c disk -s scsi -t 2200mb -p scsi3 -w 6,0 -a pv=yes

This example adds a terminal:

# mkdev -c tty -t tty -s rs232 -p sa1 -w 0 -a login=enable -a term=ibm3151
tty0 Available


Changing a device with chdev:
-----------------------------

Suppose you have just added a new disk. Suppose the cfgmgr has run and detected the disk.

Now you run
# lspv
hdisk1    none                 none
OR
hdisk1    0005264d2            none

The first field is the system-assigned name of the disk. The second field is the
"physical volume id" (PVID). If it shows "none", as in the first output, you can assign one with chdev:

# chdev -l hdisk1 -a pv=yes
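
With many new disks, the same check can be scripted. The sketch below reads lspv-style rows and prints, without running anything, a chdev command for every disk whose PVID column says "none". The function name disks_without_pvid is made up, and the sample rows are taken from the lspv outputs in this document.

```shell
#!/bin/sh
# disks_without_pvid
# Reads lspv-style rows on stdin and prints the names of disks whose
# second column (the PVID) is "none".
disks_without_pvid() {
    awk '$2 == "none" { print $1 }'
}

# Sample rows in lspv format.
sample='hdisk0    00453267554          rootvg
hdisk1    none                 none'

# Only print the commands; review them before running anything.
for d in $(printf '%s\n' "$sample" | disks_without_pvid); do
    echo "chdev -l $d -a pv=yes"
done
```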


Removing a device with rmdev:
-----------------------------

Examples:

# lsdev -Cc tape
rmt0  Available  10-60-00-5,0  SCSI 8mm Tape Drive

# rmdev -l rmt0               # -l indicates using the logical device name
rmt0 Defined

The status has shifted from Available to Defined.

# lsdev -Cc tape
rmt0  Defined  10-60-00-5,0  SCSI 8mm Tape Drive

If you really want to remove it from the system, use the -d flag as well

# rmdev -l rmt0 -d

To unconfigure the children of PCI bus pci1 and all devices under them, while retaining their
device definition in the Customized Devices Object Class, use:

# rmdev -p pci1
rmt0 Defined
hdisk1 Defined
scsi1 Defined
ent0 Defined



The special device sys0:
------------------------

In AIX 5.x we have a special device named sys0 that is used to manage some kernel parameters.
The way to change these values is by using smitty, the chdev command or WSM.

Example.

To change the maximum number of processes per user (the maxuproc attribute), you can for example use the
Web-based System Manager.
You can also use the chdev command:

# chdev -l sys0 -a maxuproc=50
sys0 changed

Note: In Solaris, to change kernel parameters, you have to edit /etc/system.

Device drivers:
---------------

Device drivers are located in /usr/lib/drivers directory.



============================
31. filesystem commands AIX:
============================


31.1 The Logical Volume Manager LVM:
====================================

In AIX, it's common to use a Logical Volume Manager (LVM) to cross the boundaries posed by
traditional disk management.
Traditionally, a filesystem was on a single disk or on a single partition.
Changing a partition size was a difficult task. With an LVM, we can create logical volumes
which can span several disks.

The LVM has been a feature of the AIX operating system since version 3, and it is installed 
automatically with the Operating System.

LVM commands in AIX:
--------------------

mkvg  (or the mkvg4vp command in case of SAN vpath disks. See section 31.3)
cplv
rmlv
mklvcopy
extendvg
reducevg
getlvcb
lspv
lslv
lsvg
mirrorvg
chpv
migratepv
exportvg, importvg
varyonvg, varyoffvg

And related commands:
mkdev
chdev
rmdev
lsdev

Volume group:
-------------

A physical volume (PV) is simply a physical disk as seen by the LVM. When you add a physical volume to a volume group,
the physical volume is partitioned into contiguous equal-sized units of space called "physical partitions".
A physical partition is the smallest unit of storage space allocation and is a contiguous space
on a physical volume.
Before it can be used, a physical volume must become part of a volume group. The disk must be in an Available state
and must have a "physical volume id" assigned to it.

A volume group (VG) is an entity consisting of 1 to 32 physical volumes (of varying sizes and types). 
A "Big volume group" can scale up to 128 devices.

You create a volume group with the "mkvg" command. You add a physical volume to an existing volume group with
the "extendvg" command, you make use of the changed size of a physical volume with the "chvg" command,
and remove a physical volume from a volume group with the "reducevg" command.
Some of the other commands that you use on volume groups include:
list (lsvg), remove (exportvg), install (importvg), reorganize (reorgvg), synchronize (syncvg),
make available for use (varyonvg), and make unavailable for use (varyoffvg).

To create a VG, using local disks, use the "mkvg" command:

mkvg -y <name_of_volume_group> -s <partition_size> <list_of_hard_disks>

Typical example:

mkvg -y oravg -s 64 hdisk3 hdisk4

mkvg -y appsvg -s 32 hdisk2
mkvg -y datavg -s 64 hdisk3

mkvg -y appsvg -s 32 hdisk3
mkvg -y datavg -s 32 hdisk2
mkvg -y vge1corrap01 -s 64 hdisk2


In case you use the so-called SDD subsystem with vpath SAN storage, you should use the "mkvg4vp" command,
which works similarly to the mkvg command (same flags).



Types of VG's:
==============

There are 3 kinds of VG's:

- Normal VG (AIX 5L)
- Big VG (AIX 5L)
- Scalable VG (as from AIX 5.3)

Normal VG:
----------

Number of disks		Max number of partitions/disk
1			32512
2			16256
4			8128
8			4064
16			2032
32			1016

Big VG:
-------
Number of disks		Max number of partitions/disk
1			130048
2			65024
4			32512
8			16256
16			8128
32			4064
64			2032
128			1016


VG Type		Max PV's	Max LV's	Max PP's per VG
---------------------------------------------------------------
Normal		32		256		32512
Big		128		512		130048
Scalable	1024		4096		2097152
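
The Normal and Big tables above both follow one rule: the maximum number of PPs per VG divided by the number of disks. A minimal sketch, with a made-up function name and with Scalable VGs deliberately left out because this section gives no per-disk table for them:

```shell
#!/bin/sh
# max_pps_per_disk VGTYPE DISKS
# Reproduces the Normal and Big VG tables above: max PPs per VG / disks.
max_pps_per_disk() {
    case "$1" in
        normal) total=32512 ;;
        big)    total=130048 ;;
        *) echo "unsupported VG type: $1" >&2; return 1 ;;
    esac
    echo $(( total / $2 ))
}

max_pps_per_disk normal 32    # default layout of a Normal VG
max_pps_per_disk normal 8
max_pps_per_disk big 64
```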


Physical Partition:
===================

You can change the NUMBER of PPs in a VG, but you cannot change the SIZE of PPs afterwards.
Defaults:
- 4 MB partition size. It can be a multiple of that amount. The Max size is 1024 MB
- The default is 1016 PPs per disk. You can increase the number of PPs per PV in powers of 2, but the maximum
  number of disks per VG is then decreased accordingly. 

#disks   max # of PPs / disk
32       1016
16       2032
8        4064
4        8128
2       16256
1       32512


In the case of a set of "normal" internal disks of, for example, 30G or 70G or so,
common partition sizes are 64M or 128M.
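
A rough way to pick a PP size is to find the smallest size that still fits the whole disk within the PP limit. The sketch below assumes PP sizes grow in powers of 2 starting at 4 MB, stops at the 1024 MB maximum mentioned above, and uses the default limit of 1016 PPs per disk. The function name min_pp_size_mb is made up.

```shell
#!/bin/sh
# min_pp_size_mb DISK_MB [MAX_PPS]
# Smallest PP size (MB) so that DISK_MB fits in MAX_PPS partitions.
# Assumptions: sizes double from 4 MB and are capped at 1024 MB.
min_pp_size_mb() {
    disk_mb=$1; limit=${2:-1016}; size=4
    while [ $(( size * limit )) -lt "$disk_mb" ] && [ "$size" -lt 1024 ]; do
        size=$(( size * 2 ))
    done
    echo "$size"
}

min_pp_size_mb 30720    # 30G disk
min_pp_size_mb 71680    # 70G disk
```

For a 70G disk this gives 128M, in line with the sizes mentioned above. For a 30G disk 32M would already fit, so 64M leaves room to spare.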


Logical Partition:
------------------

A LP maps to (at least) one PP, and is actually the smallest unit of allocatable space.


Logical Volume:
---------------

Consists of LPs in a VG. A LV consists of LPs from actual PPs from one or more disks.


   |-----|               | ----|
   |LP1  |      --->     | PP1 | 
   |-----|               | ----|
   |LP2  |      --->     | PP2 |
   |-----|               | ----|
   |..   |                hdisk 1 (Physical Volume 1)
   |..   |
   |..   |
   |-----|               |---- |
   |LPn  |      --->     |PPn  |
   |-----|               |---- |
   |LPn+1|      --->     |PPn+1|
   |-----|               |---- |
   Logical Volume      hdisk2 (Physical Volume 2)


So, a VG is a collection of related PVs, but you know that actually LVs are created in the VG.
For the applications, the LVs are the entities they work with.
In AIX, a filesystem like "/data", corresponds to a LV.


lspv Command
------------

Purpose: Displays information about a physical volume within a volume group.

lspv [ -L ] [ -l | -p | -M ] [ -n DescriptorPhysicalVolume] [ -v VolumeGroupID] PhysicalVolume

-p: lists range, state, region, LV names, type and mount points


# lspv
# lspv hdisk3
# lspv -p hdisk3


# lspv
hdisk0   00453267554   rootvg
hdisk1   00465249766   rootvg

# lspv hdisk23
PHYSICAL VOLUME:    hdisk23                  VOLUME GROUP:     oravg
PV IDENTIFIER:      00ccf45d564cfec0 VG IDENTIFIER     00ccf45d00004c0000000104564d2386
PV STATE:           active
STALE PARTITIONS:   0                        ALLOCATABLE:      yes
PP SIZE:            256 megabyte(s)          LOGICAL VOLUMES:  3
TOTAL PPs:          947 (242432 megabytes)   VG DESCRIPTORS:   1
FREE PPs:           247 (63232 megabytes)    HOT SPARE:        no
USED PPs:           700 (179200 megabytes)
FREE DISTRIBUTION:  00..00..00..57..190
USED DISTRIBUTION:  190..189..189..132..00
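
The megabyte figures in the lspv output are simply the number of PPs times the PP size. A quick check of the hdisk23 numbers (function names made up for this note):

```shell
#!/bin/sh
# pp_to_mb PPS PP_SIZE_MB : size in MB of a number of physical partitions
pp_to_mb() { echo $(( $1 * $2 )); }
# pct_used USED TOTAL    : whole-number percentage of PPs in use
pct_used() { echo $(( $1 * 100 / $2 )); }

pp_to_mb 947 256     # TOTAL PPs
pp_to_mb 247 256     # FREE PPs
pp_to_mb 700 256     # USED PPs
pct_used 700 947
```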


# lspv -p hdisk23
hdisk23:
PP RANGE  STATE   REGION        LV NAME             TYPE       MOUNT POINT
  1-22    used    outer edge    u01                 jfs2       /u01
 23-190   used    outer edge    u02                 jfs2       /u02
191-379   used    outer middle  u01                 jfs2       /u01
380-568   used    center        u01                 jfs2       /u01
569-600   used    inner middle  u02                 jfs2       /u02
601-700   used    inner middle  u03                 jfs2       /u03
701-757   free    inner middle
758-947   free    inner edge

# lspv -p hdisk0
hdisk0:
PP RANGE  STATE   REGION        LV NAME             TYPE       MOUNT POINT
1-1       used    outer edge    hd5                 boot       N/A
2-48      free    outer edge       
49-51     used    outer edge    hd9var              jfs        /var
52-52     used    outer edge    hd2                 jfs        /usr
53-108    used    outer edge    hd6                 paging     N/A
109-116   used    outer middle  hd6                 paging     N/A
117-215   used    outer middle  hd2                 jfs        /usr
216-216   used    center        hd8                 jfslog     N/A
217-217   used    center        hd4                 jfs        /
218-222   used    center        hd2                 jfs        /usr
223-320   used    center        hd4                 jfs        /
..
..

Note that in this example the Logical Volumes correspond to the filesystems in the
following way: 
hd4= /, hd5=boot, hd6=paging, hd2=/usr, hd3=/tmp, hd9var=/var


lslv Command
------------
Purpose: Displays information about a logical volume.


To Display Logical Volume Information
lslv [ -L ] [ -l| -m ] [ -nPhysicalVolume ] LogicalVolume

To Display Logical Volume Allocation Map
lslv [ -L ] [ -nPhysicalVolume ] -pPhysicalVolume [ LogicalVolume ]


# lslv -l lv06
lv06:/backups
PV                COPIES        IN BAND       DISTRIBUTION
hdisk3            512:000:000   100%          000:218:218:076:000


# lslv lv06
LOGICAL VOLUME:     lv06                   VOLUME GROUP:   backupvg
LV IDENTIFIER:      00c8132e00004c0000000106ef70cec2.2 PERMISSION:     read/write
VG STATE:           active/complete        LV STATE:       opened/syncd
TYPE:               jfs                    WRITE VERIFY:   off
MAX LPs:            512                    PP SIZE:        64 megabyte(s)
COPIES:             1                      SCHED POLICY:   parallel
LPs:                512                    PPs:            512
STALE PPs:          0                      BB POLICY:      relocatable
INTER-POLICY:       minimum                RELOCATABLE:    yes
INTRA-POLICY:       middle                 UPPER BOUND:    32
MOUNT POINT:        /backups               LABEL:          /backups
MIRROR WRITE CONSISTENCY: on/ACTIVE
EACH LP COPY ON A SEPARATE PV ?: yes
Serialize IO ?:     NO

# lslv -p hdisk3
FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE       1-10
FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE      11-20
FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE      21-30
FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE      31-40
FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE      41-50
FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE      51-60
FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE      61-70
FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE      71-80
FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE   FREE      81-90
..
..



Also, you can list LVs per VG by running, for example:

# lsvg -l backupvg
backupvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
loglv02             jfslog     1     1     1    open/syncd    N/A
lv06                jfs        512   512   1    open/syncd    /backups

# lsvg -l splvg
splvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
loglv01             jfslog     1     1     1    open/syncd    N/A
lv04                jfs        240   240   1    open/syncd    /data
lv00                jfs        384   384   1    open/syncd    /spl
lv07                jfs        256   256   1    open/syncd    /apps

For a complete storage system, this could yield, for example:

-redovg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
redo1lv             jfs2       42    42    3    open/syncd    /u05
redo2lv             jfs2       1401  1401  3    open/syncd    /u04
loglv03             jfs2log    1     1     1    open/syncd    N/A
-db2vg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
db2lv               jfs2       600   600   2    open/syncd    /db2_database
loglv00             jfs2log    1     1     1    open/syncd    N/A
-oravg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
u01                 jfs2       800   800   2    open/syncd    /u01
u02                 jfs2       400   400   2    open/syncd    /u02
u03                 jfs2       200   200   2    open/syncd    /u03
logfs               jfs2log    2     2     1    open/syncd    N/A
-rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5                 boot       1     2     2    closed/syncd  N/A
hd6                 paging     36    72    2    open/syncd    N/A
hd8                 jfs2log    1     2     2    open/syncd    N/A
hd4                 jfs2       8     16    3    open/syncd    /
hd2                 jfs2       24    48    2    open/syncd    /usr
hd9var              jfs2       9     18    3    open/syncd    /var
hd3                 jfs2       11    22    3    open/syncd    /tmp
hd1                 jfs2       10    20    2    open/syncd    /home
hd10opt             jfs2       2     4     2    open/syncd    /opt
fslv00              jfs2       1     2     2    open/syncd    /XmRec
fslv01              jfs2       2     4     3    open/syncd    /tmp/m2
paging00            paging     32    32    1    open/syncd    N/A
sysdump1            sysdump    80    80    1    open/syncd    N/A
oralv               jfs2       100   100   1    open/syncd    /opt/app/oracle
fslv03              jfs2       63    63    2    open/syncd    /bmc_home
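
In these listings, the ratio of PPs to LPs tells you how many copies (mirrors) an LV has: in the rootvg above, hd5 has 1 LP and 2 PPs, so it is mirrored, while paging00 has 32 LPs and 32 PPs, so it is not. A minimal sketch, assuming the "lsvg -l" column layout shown above (lv_copies is a hypothetical helper name):

```sh
# lv_copies is a hypothetical helper: it reads "lsvg -l" output on stdin
# and prints each LV with its number of copies (PPs divided by LPs).
lv_copies() {
    awk '$3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $3 > 0 {
        printf "%s %d\n", $1, $4 / $3
    }'
}

# Sample rows copied from the rootvg listing above. On a live system
# one would run:  lsvg -l rootvg | lv_copies
lv_copies <<'EOF'
rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5                 boot       1     2     2    closed/syncd  N/A
hd6                 paging     36    72    2    open/syncd    N/A
paging00            paging     32    32    1    open/syncd    N/A
EOF
```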


And you can list the LVs by PV by running
# lspv -l hdiskn


lsvg Command:
-------------

-o          Shows only the active volume groups.
-p VG_name  Shows all the PVs that belong to the vg_name
-l VG_name  Shows all the LVs that belong to the vg_name


Examples:

# lsvg
rootvg
informixvg
oravg

# lsvg -o
rootvg
oravg

# lsvg oravg
VOLUME GROUP:   oravg                    VG IDENTIFIER:  00ccf45d00004c0000000104564d2386
VG STATE:       active                   PP SIZE:        256 megabyte(s)
VG PERMISSION:  read/write               TOTAL PPs:      1894 (484864 megabytes)
MAX LVs:        256                      FREE PPs:       492 (125952 megabytes)
LVs:            4                        USED PPs:       1402 (358912 megabytes)
OPEN LVs:       4                        QUORUM:         2
TOTAL PVs:      2                        VG DESCRIPTORS: 3
STALE PVs:      0                        STALE PPs:      0
ACTIVE PVs:     2                        AUTO ON:        yes
MAX PPs per PV: 1016                     MAX PVs:        32
LTG size:       128 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:      no                       BB POLICY:      relocatable
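
The TOTAL PPs and FREE PPs fields are enough to see how full a VG is. The following is only a sketch, assuming the "lsvg VGname" layout shown above (vg_free_pct is a hypothetical helper name):

```sh
# vg_free_pct is a hypothetical helper: it reads "lsvg <vg>" output on
# stdin and prints the percentage of free PPs (integer, rounded down).
vg_free_pct() {
    awk '
        { for (i = 1; i < NF; i++) {
              if ($i == "TOTAL" && $(i+1) == "PPs:") total = $(i+2)
              if ($i == "FREE"  && $(i+1) == "PPs:") free  = $(i+2)
          } }
        END { if (total + 0 > 0) printf "%d\n", free * 100 / total }
    '
}

# Two lines copied from the oravg example above. On a live system
# one would run:  lsvg oravg | vg_free_pct
vg_free_pct <<'EOF'
VG PERMISSION:  read/write               TOTAL PPs:      1894 (484864 megabytes)
MAX LVs:        256                      FREE PPs:       492 (125952 megabytes)
EOF
```

For oravg this prints 25, that is, about a quarter of the VG is still unallocated.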

# lsvg -p informixvg
informixvg
PV_NAME       PV STATE     TOTAL PPs     FREE PPs     FREE DISTRIBUTION
hdisk3        active       542           462          109..28..108..108..109
hdisk4        active       542           447          109..13..108..108..109

# lsvg -l rootvg
LV NAME       TYPE         LPs    PPs    PVs     LV STATE      MOUNT POINT
hd5           boot         1      1      1       closed/syncd  N/A
hd6           paging       24     24     1       open/syncd    N/A
hd8           jfslog       1      1      1       open/syncd    N/A
hd4           jfs          4      4      1       open/syncd    /
hd2           jfs          76     76     1       open/syncd    /usr
hd9var        jfs          4      4      1       open/syncd    /var
hd3           jfs          6      6      1       open/syncd    /tmp
paging00      paging       20     20     1       open/syncd    N/A
..
..

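Suppose we have a 70 GB disk, roughly 70000 MB.
With the default limit of 1016 PPs per PV, each PP must hold at least 70000/1016 = about 69 MB.
Because the PP size must be a power of 2, the smallest usable PP size is 128 MB.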
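
The arithmetic can be checked with a few lines of shell: 70000 / 1016 is about 69 MB, and since the PP size has to be a power of 2, the smallest value that keeps the disk within 1016 PPs is 128 MB. A minimal sketch (min_pp_size is a hypothetical helper name; the limit of 1016 is the "MAX PPs per PV" value shown by lsvg above):

```sh
# min_pp_size is a hypothetical helper, not an AIX command.
# Usage: min_pp_size DISK_MB [MAX_PPS]   (MAX_PPS defaults to 1016)
# Prints the smallest power-of-2 PP size (in MB) such that the disk
# needs no more than MAX_PPS physical partitions.
min_pp_size() {
    disk_mb=$1
    max_pps=${2:-1016}
    pp=1
    while [ $((pp * max_pps)) -lt "$disk_mb" ]; do
        pp=$((pp * 2))
    done
    echo "$pp"
}

min_pp_size 70000
```

The result is the value one would pass to "mkvg -s".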


extendvg command:
-----------------

extendvg VGName hdiskNumber

# extendvg newvg hdisk23

How to Add a Disk to a Volume Group? 

extendvg   VolumeGroupName   hdisk0 hdisk1 ... hdiskn 


reducevg command:
-----------------

To remove a PV from a VG:

# reducevg myvg hdisk23

To remove a VG:

Suppose we have a VG informixvg with 2 PV, hdisk3 and hdisk4:

# reducevg -d informixvg hdisk4

When you delete the last disk from the VG, the VG is also removed.

# reducevg -d informixvg hdisk3


varyonvg and varyoffvg commands:
--------------------------------

When a VG is activated at system startup, its resident filesystems are mounted automatically if they have
the flag mount=true in the /etc/filesystems file. After a manual varyonvg, mount the filesystems yourself.

# varyonvg apachevg

# varyoffvg apachevg

To use the varyoffvg command, you must be sure that none of the logical volumes are opened, that is, in use.


mkvg command:
-------------

You can create a new VG by using "smitty mkvg" or by using the mkvg command.

Use the following command, where -s <partition_size> sets the number of megabytes in each physical partition.
The partition_size is expressed in megabytes, from 1 through 1024, and must be
a power of 2 (for example 1, 2, 4, 8). The default value is 4.

mkvg -y <name_of_volume_group> -s <partition_size> <list_of_hard_disks>

As with physical volumes, volume groups can be created and removed and their characteristics
can be modified.

Before a new volume group can be added to the system, one or more physical volumes not used
in other volume groups, and in an available state, must exist on the system.

The following example shows the use of the mkvg command to create a volume group myvg
using the physical volumes hdisk1 and hdisk5.

# mkvg -y myvg -d 10 -s 8 hdisk1 hdisk5

# mkvg -y oravg -d 10 -s 64 hdisk1



mklv command:
-------------

To create a LV, you can use the smitty command "smitty mklv" or just use the mklv command
by itself.

The mklv command creates a new logical volume within the VolumeGroup. For example, all file systems 
must be on separate logical volumes. The mklv command allocates the number of logical partitions 
to the new logical volume. If you specify one or more physical volumes with the PhysicalVolume parameter, 
only those physical volumes are available for allocating physical partitions; otherwise, all the 
physical volumes within the volume group are available. 

The default settings provide the most commonly used characteristics, but use flags to tailor the logical volume 
to the requirements of your system. Once a logical volume is created, its characteristics can be changed 
with the chlv command. 

When you create a LV, you also specify the number of LP's, and how a LP maps to PP's. 
Later, you can create one filesystem per LV.

Examples

The following example creates a LV "lv05" on the VG "splvg", with two copies (2 PPs) of each LP.
In this case, we are mirroring each LP to two PPs.
Also, 200 LPs are specified. If a PP is 128 MB in size, the total amount of space of one "mirror" is 25600 MB.

# mklv -y lv05 -c 2 splvg 200
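
The space taken by such an LV follows directly from the mklv arguments: usable size is LPs times PP size, and the raw disk space consumed is that figure times the number of copies. A small sketch of the calculation (lv_space_mb is a hypothetical helper name; the PP size of 128 MB is the assumption made above):

```sh
# lv_space_mb is a hypothetical helper, not an AIX command.
# Usage: lv_space_mb LPS PP_SIZE_MB COPIES
# Prints the usable size (one mirror copy) and the raw disk space used.
lv_space_mb() {
    lps=$1; pp_mb=$2; copies=$3
    echo "usable=$((lps * pp_mb)) raw=$((lps * pp_mb * copies))"
}

# lv05: 200 LPs, 128 MB per PP, 2 copies
lv_space_mb 200 128 2
```

For lv05 this gives 25600 MB usable and 51200 MB of raw space in the VG.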

The following example uses the mklv command to create a new LV newlv in the rootvg.
It will have 10 LPs, and each LP consists of 2 physical partitions.

# mklv -y newlv -c 2 rootvg 10

To make a logical volume in volume group vg02 with one logical partition and a total of two copies of the data, enter: 

# mklv -c 2 vg02 1

To make a logical volume in volume group vg03 with nine logical partitions and a total of three copies 
spread across a maximum of two physical volumes, and whose allocation policy is not strict, enter: 

# mklv -c 3 -u 2 -s n vg03 9

To make a logical volume in vg04 with five logical partitions allocated across the center sections of the 
physical volumes when possible, with no bad-block relocation, and whose type is paging, enter: 

# mklv -a c -t paging -b n vg04 5

To make a logical volume in vg03 with 15 logical partitions chosen from physical volumes hdisk5, hdisk6, and hdisk9, 
enter: 

# mklv vg03 15 hdisk5 hdisk6 hdisk9

To make a striped logical volume in vg05 with a stripe size of 64K across 3 physical volumes and 12 
logical partitions, enter: 

# mklv -u 3 -S 64K vg05 12

To make a striped logical volume in vg05 with a stripe size of 8K across hdisk1, hdisk2, and hdisk3 and 
12 logical partitions, enter: 

# mklv -S 8K vg05 12 hdisk1 hdisk2 hdisk3

The following example uses a map file "/tmp/mymap1", which lists the PPs to be used in creating a LV:

# mklv -t jfs -y lv06 -m /tmp/mymap1 rootvg 10


The setting Strict=y means that each copy of the LP is placed on a different PV. The setting Strict=n means
that copies are not restricted to different PVs. 
The default is strict.


# mklv -y lv13 -c 2 failovervg 150
# crfs -v jfs -d lv13 -m /backups2 -a bf=true

Another simple example using local disks:

# mkvg -y appsvg -s 32 hdisk2
# mkvg -y datavg -s 32 hdisk3

# mklv -y testlv -c 1 appsvg 10
# mklv -y backuplv -c 1 datavg 10

# crfs -v jfs -d testlv -m /test -a bf=true
# crfs -v jfs -d backuplv -m /backup -a bf=true

mklv -y testlv1 -c 1 appsvg 10
mklv -y testlv2 -c 1 datavg 10
crfs -v jfs -d testlv1 -m /test1 -a bf=true
crfs -v jfs -d testlv2 -m /test2 -a bf=true


mklv -y testlv1 -c 1 vgp0corddap01 10
mklv -y testlv2 -c 1 vgp0corddad01 10
crfs -v jfs -d testlv1 -m /test1 -a bf=true
crfs -v jfs -d testlv2 -m /test2 -a bf=true

rmlv command:
-------------

# rmlv newlv
Warning, all data on logical volume newlv will be destroyed.
rmlv: Do you wish to continue? y(es) n(o) y
#

extendlv command:
-----------------

The following example shows the use of the extendlv command to add 3 more LPs to the LV newlv:

# extendlv newlv 3

cplv command:
-------------

The following command copies the contents of LV oldlv to a new LV called newlv:
# cplv -v myvg -y newlv oldlv

To copy to an existing LV:
# cplv -e existinglv oldlv

Purpose
Copies the contents of a logical volume to a new logical volume.

Syntax (long flag names as used on the Virtual I/O Server; on AIX itself the flags are -v, -y and -Y)
To Copy to a New Logical Volume

cplv [ -vg VolumeGroup ] [ -lv NewLogicalVolume | -prefix Prefix ] SourceLogicalVolume

To Copy to an Existing Logical Volume

cplv [ -f ] SourceLogicalVolume DestinationLogicalVolume

cplv -e DestinationLogicalVolume [-f] SourceLogicalVolume

-e: specifies that the DestinationLogicalVolume already exists.
-f: no user confirmation
-y: specifies the name to use for the NewLogicalVolume, instead of a system generated name.

Description
Attention: Do not copy from a larger logical volume containing data to a smaller one. Doing so results 
in a corrupted file system because some data is not copied.
The cplv command copies the contents of SourceLogicalVolume to a new or existing logical volume. 
The SourceLogicalVolume parameter can be a logical volume name or a logical volume ID. 
The cplv command creates a new logical volume with a system-generated name by using the default syntax. 
The system-generated name is displayed. 

Note:
The cplv command cannot copy logical volumes which are in the open state, including logical volumes 
that are being used as backing devices for virtual storage.

Flags
-f Copies to an existing logical volume without requesting user confirmation. 
-lv NewLogicalVolume Specifies the name to use, in place of a system-generated name, 
 for the new logical volume. Logical volume names must be unique systemwide names, and can range 
 from 1 to 15 characters. 
-prefix Prefix Specifies a prefix to use in building a system-generated name for the new logical volume. 
 The prefix must be less than or equal to 13 characters. A name cannot be a name already used by another device. 
-vg VolumeGroup Specifies the volume group where the new logical volume resides. If this is not specified, 
 the new logical volume resides in the same volume group as the SourceLogicalVolume. 

Examples
To copy the contents of logical volume fslv03 to a new logical volume, type: 

# cplv fslv03
The new logical volume is created, placed in the same volume group as fslv03, 
and named by the system. 

To copy the contents of logical volume fslv03 to a new logical volume in volume group vg02, 
type: 
# cplv -vg vg02 fslv03
The new logical volume is created, named, and added to volume group vg02. 

To copy the contents of logical volume lv02 to a smaller, existing logical volume, 
lvtest, without requiring user confirmation, type: 
# cplv -f lv02 lvtest


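Errors:
-------

0516-746 cplv: Destination logical volume must have 
         type set to copy 

This occurs with "cplv -e" when the existing destination LV does not have the type "copy".
Change the type of the destination LV (here lvprj) first, then run cplv again:

# chlv -t copy lvprj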


==========================================================================
CASES of usage of cplv command:

CASE 1:
-------

TITLE    : Procedure for moving a filesystem between disks that are in
           different volume groups using the cplv command.
OS LEVEL : AIX 4.x
DATE     : 25/11/99
VERSION  : 1.0

----------------------------------------------------------------------------

In the following example, an RS6000 has one disk holding rootvg, and has
just had a second disk installed. A volume group needs to be created on the
second disk, and a data filesystem transferred to it. Ensure
that you have a full system backup before you start.


lspv

hdisk0         00009922faf79f0d    rootvg         
hdisk1         None                None           

df -k

Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4             8192      1228   86%     1647    41% /
/dev/hd2           380928     40984   90%    11014    12% /usr
/dev/hd9var         32768     20952   37%      236     3% /var
/dev/hd3            28672      1644   95%      166     3% /tmp
/dev/hd1            53248     51284    4%       95     1% /home
/dev/lv00          200704    110324   46%     1869     4% /home/john
/dev/ftplv         102400     94528    8%       32     1% /home/ftp
/dev/lv01          114688     58240   50%       59     1% /usr2

In this example the /usr2 filesystem needs to be moved to the new disk 
drive, freeing up space in the root volume group. 


1, Create a data volume group on the new disk (hdisk1), the command below
   will create a volume group called datavg on hdisk1 with a PP size of 
   32 Meg:-

   mkvg -s 32 -y datavg hdisk1

2, Create a jfslog logical volume on the new volume group :-

   mklv -y datalog -t jfslog datavg 1

3, Initialise the jfslog :-

   logform /dev/datalog

   logform: destroy /dev/datalog (y)?y

4, Unmount the filesystem that is being copied :-

   umount /usr2

5, Copy the /usr2 logical volume (lv01) to a new logical volume (lv11) on 
   the new volume group :-

   cplv -y lv11 -v datavg lv01

   cplv: Logical volume lv01 successfully copied to lv11 .

6, Change the /usr2 filesystem to use the new (/dev/lv11) logical volume 
   and not the old (/dev/lv01) logical volume :-

   chfs -a dev=/dev/lv11 /usr2

7, Change the /usr2 filesystem to use the jfslog on the new volume group 
   (/dev/datalog) :- 

   chfs -a log=/dev/datalog /usr2

8, Mount the filesystem :-

   mount /usr2

   df -k

   Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
   /dev/hd4             8192      1220   86%     1649    41% /
   /dev/hd2           380928     40984   90%    11014    12% /usr
   /dev/hd9var         32768     20952   37%      236     3% /var
   /dev/hd3            28672      1644   95%      166     3% /tmp
   /dev/hd1            53248     51284    4%       95     1% /home
   /dev/lv00          200704    110324   46%     1869     4% /home/john
   /dev/ftplv         102400     94528    8%       32     1% /home/ftp
   /dev/lv11          114688     58240   50%       59     1% /usr2

9, Once the filesystem has been checked out, the old logical volume can
   be removed :-

   rmlv lv01

   Warning, all data contained on logical volume lv01 will be destroyed.
   rmlv: Do you wish to continue? y(es) n(o)? y
   rmlv: Logical volume lv01 is removed. 


If you wish to copy further filesystems repeat parts 4 to 9.
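
When several filesystems have to be moved, it can help to print the command sequence for each one first and review it before running anything. The following is only a sketch: print_move_plan is a hypothetical helper that prints the commands of steps 4 to 8 and executes nothing. Step 9 (removing the old LV) is left out on purpose, because it should only happen after the new filesystem has been checked.

```sh
# print_move_plan is a hypothetical helper: it only PRINTS the commands
# of steps 4 to 8 for one filesystem, so they can be reviewed first.
# Usage: print_move_plan MOUNTPOINT OLD_LV NEW_LV TARGET_VG JFSLOG_LV
print_move_plan() {
    mnt=$1; oldlv=$2; newlv=$3; vg=$4; log=$5
    cat <<EOF
umount $mnt
cplv -y $newlv -v $vg $oldlv
chfs -a dev=/dev/$newlv $mnt
chfs -a log=/dev/$log $mnt
mount $mnt
EOF
}

# The /usr2 example from above:
print_move_plan /usr2 lv01 lv11 datavg datalog
```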

==========================================================================

CASE 2:
-------

Goal:
-----

Move the /prj filesystem (with WebSphere in /prj/was) from rootvg
to a new (larger and better) volume group "wasvg".
The current /prj on rootvg corresponds to the LV "prjlv".
The new /prj to be created on wasvg corresponds to the LV "lvprj".

  ROOTVG                     WASVG
  --------------            --------------
  |/usr  (hd2) |            |             |
  |..          |            |             |
  |/prj (prjlv)|----------->|/prj (lvprj) | 
  |..          |            |             |
  --------------             -------------
  hdisk0,hdisk1              hdisk12,hdisk13

Note: /prj contains "/prj/was", which is WebSphere.

No backup tape is used here.

Use the cplv command:

  umount /prj
  chfs -m /prj_old /prj

 + mkvg -y wasvg -d 10 -s 128 hdisk12 hdisk13   -- create the VG

 + mklv -y lvprj -c 2 wasvg 400                 -- create the LV

 + mklv -y waslog -t jfslog wasvg 1             -- create a jfslog

 + logform /dev/waslog                          -- initialise the log


  cplv -e lvprj prjlv

  chfs -a dev=/dev/lvprj /prj_old                   -- point the filesystem at the new LV
 
  chfs -a log=/dev/waslog /prj_old

  chfs -m /prj /prj_old
 
  mount /prj

==========================================================================


migratepv command:
------------------

Use the following command to move PPs from hdisk1 to hdisk6 and hdisk7 (all PVs must be in the same VG)
# migratepv hdisk1 hdisk6 hdisk7

Use the following command to move PPs in LV lv02 from hdisk1 to hdisk6 
# migratepv -l lv02 hdisk1 hdisk6


chvg command:
-------------

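This example doubles the maximum number of PPs per PV (from 1016 to 2032); in return, the maximum
number of PVs in the VG is halved (from 32 to 16):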
# chvg -t2 datavg
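
The factor is needed when a disk is too large for 1016 PPs of the VG's PP size. As a rough sketch, assuming a standard VG with the limits of 1016 PPs per PV and 32 PVs shown in the lsvg output earlier, the smallest factor can be worked out like this (chvg_factor is a hypothetical helper name):

```sh
# chvg_factor is a hypothetical helper, not an AIX command.
# Usage: chvg_factor DISK_MB PP_SIZE_MB
# Prints the smallest factor so that 1016 * factor PPs of the given
# size cover the disk, plus the resulting PV limit (32 / factor).
# Assumes a standard VG (1016 PPs per PV, 32 PVs).
chvg_factor() {
    disk_mb=$1; pp_mb=$2
    factor=1
    while [ $((1016 * factor * pp_mb)) -lt "$disk_mb" ]; do
        factor=$((factor + 1))
    done
    echo "factor=$factor max_pvs=$((32 / factor))"
}

# A 70000 MB disk in a VG with 64 MB PPs
chvg_factor 70000 64
```

Here 1016 PPs of 64 MB cover only 65024 MB, so a factor of 2 is required, which is what "chvg -t2" sets.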
 

chpv command:
-------------

The chpv command changes the state of the physical volume in a volume group by setting allocation 
permission to either allow or not allow allocation and by setting the availability to either 
available or removed. This command can also be used to clear the boot record for the given physical volume. 
Characteristics for a physical volume remain in effect unless explicitly changed with the corresponding flag.

Examples

To close physical volume hdisk03, enter: 
# chpv -v r hdisk03

The physical volume is closed to logical input and output until the -v a flag is used. 

To open physical volume hdisk03, enter: 
# chpv -v a hdisk03

The physical volume is now open for logical input and output. 

To stop the allocation of physical partitions to physical volume hdisk03, enter: 
# chpv -a n hdisk03

No physical partitions can be allocated until the -a y flag is used. 

To clear the boot record of a physical volume hdisk3, enter: 
# chpv -c hdisk3




How to synchronize stale partitions in a VG?:
---------------------------------------------

the syncvg command:

syncvg Command

Purpose
Synchronizes logical volume copies that are not current.

Syntax
syncvg [ -f ] [ -i ] [ -H ] [ -P NumParallelLps ] { -l | -p | -v } Name ...

Description
The syncvg command synchronizes the physical partitions, which are copies of the original physical partition, 
that are not current. The syncvg command can be used with logical volumes, physical volumes, 
or volume groups, with the Name parameter representing the logical volume name, physical volume name, 
or volume group name. The synchronization process can be time consuming, depending on the 
hardware characteristics and the amount of data.

When the -f flag is used, a good physical copy is chosen and propagated to all other copies 
of the logical partition, whether or not they are stale. Using this flag is necessary 
in cases where the logical volume does not have the mirror write consistency recovery.

Unless disabled, the copies within a volume group are synchronized automatically when the volume group is 
activated by the varyonvg command. 

Note:
For the syncvg command to be successful, at least one good copy of the logical volume should 
be accessible, and the physical volumes that contain this copy should be in ACTIVE state. 
If the -f option is used, the above condition applies to all mirror copies.
If the -P option is not specified, syncvg will check for the NUM_PARALLEL_LPS environment variable. 
The value of NUM_PARALLEL_LPS will be used to set the number of logical partitions to be synchronized in parallel.

Examples
To synchronize the copies on physical volumes hdisk04 and hdisk05, enter: 
# syncvg  -p hdisk04 hdisk05

To synchronize the copies on volume groups vg04 and vg05, enter: 
# syncvg  -v vg04 vg05




How to Mirror a Logical Volume? :
--------------------------------

mklvcopy LogicalVolumeName Numberofcopies 
syncvg -v VolumeGroupName 

To add a copy for LV lv01 on disk hdisk7:

# mklvcopy lv01 2 hdisk7


Identifying hotspots: lvmstat command:
--------------------------------------

The lvmstat command displays statistics values since the previous lvmstat command.
Statistics gathering must first be enabled with -e; the -C flag clears the counters.
# lvmstat -v rootvg -e
# lvmstat -v rootvg -C
# lvmstat -v rootvg

Logical Volume       iocnt    KB_read   KB_wrtn   Kbps
hd8                   4        0        0         0.00
paging01              0        0        0         0.00
..
..
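
To spot the hotspot quickly, the LV with the highest iocnt can be picked out of the lvmstat output. A minimal sketch, assuming the column layout shown above (busiest_lv is a hypothetical helper name):

```sh
# busiest_lv is a hypothetical helper: it reads "lvmstat -v <vg>" output
# on stdin and prints the LV with the highest iocnt value.
busiest_lv() {
    awk '$2 ~ /^[0-9]+$/ && (best == "" || $2 + 0 > max) { max = $2 + 0; best = $1 }
         END { if (best != "") print best, max }'
}

# Rows copied from the example above. On a live system one would run:
#   lvmstat -v rootvg | busiest_lv
busiest_lv <<'EOF'
Logical Volume       iocnt    KB_read   KB_wrtn   Kbps
hd8                   4        0        0         0.00
paging01              0        0        0         0.00
EOF
```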


31.2 Mirroring a VG:
====================

LVM provides a disk mirroring facility at the LV level. 
Mirroring is the association of 2 or 3 PP's with each LP in a LV.

Use the "mklv", or the "mklvcopy", or the "mirrorvg" command.

The mklv command allows you to select one or two additional copies for each logical volume.

example:

To make a logical volume in volume group vg03 with nine logical partitions and a total of three copies 
spread across a maximum of two physical volumes, and whose allocation policy is not strict, enter: 

mklv -c 3 -u 2 -s n vg03 9

Mirroring can also be added to an existing LV using the mklvcopy command.

The mirrorvg command mirrors all the LV's on a given VG.
Examples:

- To triply mirror a VG, run
# mirrorvg -c 3 myvg

- To get default mirroring of the rootvg, run
# mirrorvg rootvg

- To replace a failed disk in a mirrored VG, run
# unmirrorvg workvg hdisk7
# reducevg workvg hdisk7
# rmdev -l hdisk7 -d

Now replace the failed disk with a new one, configure it (for example with cfgmgr) and make sure it is named hdisk7
# extendvg workvg hdisk7
# mirrorvg workvg


mirrorvg command:
-----------------

mirrorvg Command


Purpose
Mirrors all the logical volumes that exist on a given volume group. 
This command only applies to AIX 4.2.1 or later. 


Syntax
mirrorvg [ -S | -s ] [ -Q ] [ -c Copies] [ -m ] VolumeGroup [ PhysicalVolume ... ] 


Description
The mirrorvg command takes all the logical volumes on a given volume group and mirrors 
those logical volumes. This same functionality may also be accomplished manually if you execute 
the mklvcopy command for each individual logical volume in a volume group. As with mklvcopy, 
the target physical drives to be mirrored with data must already be members of the volume group. 
To add disks to a volume group, run the extendvg command. 

By default, mirrorvg attempts to mirror the logical volumes onto any of the disks in a volume group. 
If you wish to control which drives are used for mirroring, you must include the list of disks in the 
input parameters, PhysicalVolume. Mirror strictness is enforced. Additionally, mirrorvg mirrors 
the logical volumes, using the default settings of the logical volume being mirrored. 
If you wish to violate mirror strictness or affect the policy by which the mirror is created, 
you must execute the mirroring of all logical volumes manually with the mklvcopy command. 

When mirrorvg is executed, the default behavior of the command requires that the synchronization 
of the mirrors must complete before the command returns to the user. If you wish to avoid the delay, 
use the -S or -s option. Additionally, the default value of 2 copies is always used. To specify a value 
other than 2, use the -c option. 


Note: To use this command, you must either have root user authority or be a member of the system group. 

Attention: The mirrorvg command may take a significant amount of time before completing because 
of complex error checking, the number of logical volumes to mirror in a volume group, and the time 
it takes to synchronize the new mirrored logical volumes. 
You can use the Volumes application in Web-based System Manager (wsm) to change volume characteristics. 
You could also use the System Management Interface Tool (SMIT) smit mirrorvg fast path to run this command. 


Flags

-c Copies  Specifies the minimum number of copies that each logical volume must have after 
   the mirrorvg command has finished executing. It may be possible, through the independent use 
   of mklvcopy, that some logical volumes may have more than the minimum number specified after 
   the mirrorvg command has executed. Minimum value is 2 and 3 is the maximum value. 
   A value of 1 is ignored.  
-m exact map  Allows mirroring of logical volumes in the exact physical partition order that 
   the original copy is ordered. This option requires you to specify a PhysicalVolume(s) where the exact map 
   copy should be placed. If the space is insufficient for an exact mapping, then the command will fail. 
   You should add new drives or pick a different set of drives that will satisfy an exact 
   logical volume mapping of the entire volume group. The designated disks must be equal to or exceed 
   the size of the drives which are to be exactly mirrored, regardless of if the entire disk is used. 
   Also, if any logical volume to be mirrored is already mirrored, this command will fail.  
-Q Quorum Keep  By default in mirrorvg, when a volume group's contents become mirrored, volume group 
   quorum is disabled. If the user wishes to keep the volume group quorum requirement after mirroring 
   is complete, this option should be used in the command. For later quorum changes, refer to the chvg command.  
-S Background Sync  Returns the mirrorvg command immediately and starts a background syncvg of the volume group. 
   With this option, it is not obvious when the mirrors have completely finished their synchronization. 
   However, as portions of the mirrors become synchronized, they are immediately used by the operating system 
   in mirror usage.  
-s Disable Sync  Returns the mirrorvg command immediately without performing any type of 
   mirror synchronization. If this option is used, the mirror may exist for a logical volume but 
   is not used by the operating system until it has been synchronized with the syncvg command.  


The following is a description of rootvg: 

- rootvg mirroring  When the rootvg mirroring has completed, you must perform three additional tasks: 
bosboot, bootlist, and reboot. 
The bosboot command is required to customize the bootrec of the newly mirrored drive. 
The bootlist command needs to be performed to instruct the system which disk and order you prefer 
the mirrored boot process to start. 

Finally, the default of this command is for Quorum to be turned off. For this to take effect 
on a rootvg volume group, the system must be rebooted. 
 
- non-rootvg mirroring  When this volume group has been mirrored, the default command causes Quorum 
to be deactivated. The user must close all open logical volumes, execute varyoffvg and then varyonvg on 
the volume group for the system to understand that quorum is or is not needed for the volume group. 
If you do not revaryon the volume group, mirror will still work correctly. However, any quorum changes 
will not have taken effect.  

- rootvg and non-rootvg mirroring  The system dump devices, primary and secondary, should not be mirrored. 
In some systems, the paging device and the dump device are the same device. However, most users want 
the paging device mirrored. When mirrorvg detects that a dump device and the paging device are the same, 
the logical volume will be mirrored automatically. 
If mirrorvg detects that the dump and paging device are different logical volumes, the paging device 
is automatically mirrored, but the dump logical volume is not. The dump device can be queried and modified 
with the sysdumpdev command. 

 
Remark:
-------
Run bosboot to initialize all boot records and devices by executing the 
following command:
bosboot -a -d /dev/hdisk?
hdisk? is the first hdisk listed under the PV heading after the command 
lslv -l hd5 has executed.

Secondly, you need to understand that mirroring under AIX is done at 
the logical volume level. The mirrorvg command is a high-level command 
that uses the "mklvcopy" command.
So, all LVs created before running the mirrorvg command are kept 
synchronised, but if you add a new LV after running mirrorvg, you need to 
mirror it manually using "mklvcopy".

Remark:
-------

lresynclv



Mirroring the rootvg:
---------------------

Method 1:
---------

How to mirror an AIX rootvg
The following steps will guide you through the mirroring of an AIX rootvg.
This info is valid for AIX 4.3.3, AIX 5.1, AIX 5.2 and AIX 5.3.

Make sure you have an empty disk; in this example it is hdisk1. 
Add the disk to the vg via 

# extendvg rootvg hdisk1 

Mirror the vg via: 

# mirrorvg -s rootvg

Now synchronize the new copies you created:

# syncvg -v rootvg

As we want to be able to boot from different disks, we need to use bosboot:

# bosboot -a

As hd5 is mirrored there is no need to do it for each disk.

Now, update the bootlist:

# bootlist -m normal hdisk1 hdisk0
# bootlist -m service hdisk1 hdisk0



Method 2:
---------

-------------------------------------------------------------------------------
# Add the new disk, say it's hdisk5, to rootvg

extendvg rootvg hdisk5

# If you use one mirror disk, be sure that a quorum is not required for varyon:

chvg -Qn rootvg

# Add the mirrors for all rootvg LV's:

mklvcopy hd1 2 hdisk5
mklvcopy hd2 2 hdisk5
mklvcopy hd3 2 hdisk5
mklvcopy hd4 2 hdisk5
mklvcopy hd5 2 hdisk5
mklvcopy hd6 2 hdisk5
mklvcopy hd8 2 hdisk5
mklvcopy hd9var 2 hdisk5
mklvcopy hd10opt 2 hdisk5
mklvcopy prjlv 2 hdisk5

#If you have other LV's in your rootvg, be sure to create copies for them as well !!
------------------------------------------------------------------------------
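
Instead of typing the list by hand, the mklvcopy commands can be derived from the "lsvg -l rootvg" output, which also catches any extra LVs in rootvg. The following is only a sketch: gen_mklvcopy is a hypothetical helper that prints the commands and does not execute them. It selects LVs that still have a single copy (PPs equal to LPs) and skips LVs of type sysdump, since the dump devices should not be mirrored.

```sh
# gen_mklvcopy is a hypothetical helper: it reads "lsvg -l rootvg" output
# on stdin and PRINTS one mklvcopy command for every LV that still has a
# single copy (PPs equal to LPs), skipping LVs of type sysdump.
# Usage on a live system:  lsvg -l rootvg | gen_mklvcopy hdisk5
gen_mklvcopy() {
    awk -v disk="$1" '
        $3 ~ /^[0-9]+$/ && $3 == $4 && $2 != "sysdump" {
            printf "mklvcopy %s 2 %s\n", $1, disk
        }'
}

# Sample rows in the "lsvg -l" layout shown earlier in this document.
gen_mklvcopy hdisk5 <<'EOF'
rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5                 boot       1     2     2    closed/syncd  N/A
paging00            paging     32    32    1    open/syncd    N/A
sysdump1            sysdump    80    80    1    open/syncd    N/A
oralv               jfs2       100   100   1    open/syncd    /opt/app/oracle
EOF
```

Review the printed commands before running them, and synchronize afterwards with "syncvg -v rootvg".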

# lspv -l hdisk0
hdisk0:
LV NAME               LPs   PPs   DISTRIBUTION          MOUNT POINT
hd5                   1     1     01..00..00..00..00    N/A
prjlv                 256   256   108..44..38..50..16   /prj
hd6                   59    59    00..59..00..00..00    N/A
fwdump                5     5     00..05..00..00..00    /var/adm/ras/platform
hd8                   1     1     00..00..01..00..00    N/A
hd4                   26    26    00..00..02..24..00    /
hd2                   45    45    00..00..37..08..00    /usr
hd9var                10    10    00..00..02..08..00    /var
hd3                   22    22    00..00..04..10..08    /tmp
hd1                   8     8     00..00..08..00..00    /home
hd10opt               24    24    00..00..16..08..00    /opt



Method 3:
---------

In the following example, an RS6000 has 3 disks, 2 of which hold the mirrored AIX
filesystems. The bootlist contains both hdisk0 and hdisk1. 
There are no other logical volumes in rootvg other than the AIX system 
logical volumes. hdisk0 has failed and needs replacing. Both hdisk0 and hdisk1
are in "Hot Swap" carriers, so the machine does not need to be shut 
down. 

lspv

hdisk0         00522d5f22e3b29d    rootvg
hdisk1         00522d5f90e66fd2    rootvg 
hdisk2         00522df586d454c3    datavg                                     

lsvg -l rootvg

rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd6                 paging     4     8     2    open/syncd    N/A
hd5                 boot       1     2     2    closed/syncd  N/A
hd8                 jfslog     1     2     2    open/syncd    N/A
hd4                 jfs        1     2     2    open/syncd    /
hd2                 jfs        12    24    2    open/syncd    /usr
hd9var              jfs        1     2     2    open/syncd    /var
hd3                 jfs        2     4     2    open/syncd    /tmp
hd1                 jfs        1     2     2    open/syncd    /home



1, Reduce the logical volume copies from both disks to hdisk1 only :-

   rmlvcopy hd6 1 hdisk0
   rmlvcopy hd5 1 hdisk0
   rmlvcopy hd8 1 hdisk0
   rmlvcopy hd4 1 hdisk0
   rmlvcopy hd2 1 hdisk0
   rmlvcopy hd9var 1 hdisk0
   rmlvcopy hd3 1 hdisk0
   rmlvcopy hd1 1 hdisk0
   
2, Check that no logical volumes are left on hdisk0 :-

   lspv -p hdisk0

   hdisk0:
   PP RANGE  STATE   REGION        LV ID          TYPE       MOUNT POINT
     1-101   free    outer edge
   102-201   free    outer middle
   202-301   free    center
   302-401   free    inner middle
   402-501   free    inner edge     

3, Remove hdisk0 from the volume group :-

   reducevg -df rootvg hdisk0

4, Recreate the boot image on hdisk1, and reset bootlist:-

   bosboot -a -d /dev/hdisk1
   bootlist -m normal rmt0 cd0 hdisk1

5, Check that everything has been removed from hdisk0 :-

   lspv

   hdisk0         00522d5f22e3b29d    None
   hdisk1         00522d5f90e66fd2    rootvg
   hdisk2         00522df586d454c3    datavg          

6, Delete hdisk0 :-

   rmdev -l hdisk0 -d

7, Remove the failed hard drive and replace with a new hard drive.

8, Configure the new disk drive :-

   cfgmgr

9, Check new hard drive is present :-

   lspv

10, Include the new hdisk in root volume group :-

    extendvg rootvg hdisk?  (where hdisk? is the new hard disk)

11, Re-create the mirror :-

    mirrorvg rootvg hdisk?  (where hdisk? is the new hard disk)

12, Synchronise the mirror :-

    syncvg -v rootvg

13, Reset the bootlist :-

    bootlist -m normal rmt0 cd0 hdisk0 hdisk1

14, Turn off Quorum checking on rootvg :-

    chvg -Q n rootvg
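
Before removing a copy in step 1, and again after the syncvg in step 12, it is worth confirming that no logical volume is stale. The sketch below parses "lsvg -l rootvg" output for that purpose. The function name stale_lvs is invented for this example, and it assumes the LV STATE value (such as open/syncd) is the sixth column, as in the listing shown in this method.

```shell
# stale_lvs   (hypothetical helper name)
# Reads "lsvg -l <vg>" output on stdin, prints the name of every LV whose
# state does not end in /syncd, and returns non-zero if any were found.
stale_lvs() {
    awk '
        $6 ~ /^(open|closed)\// && $6 !~ /\/syncd$/ { print $1; bad = 1 }
        END { exit (bad ? 1 : 0) }
    '
}

# On a live system:
#   lsvg -l rootvg | stale_lvs && echo "all copies are in sync"
```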


Method 4:
---------

How to mirror an AIX rootvg
The following steps will guide you through the mirroring of an AIX rootvg.
This info is valid for AIX 4.3.3, AIX 5.1, AIX 5.2 and AIX 5.3.

Make sure you have an empty disk, in this example it's hdisk1
Add the disk to the vg via "extendvg rootvg hdisk1"
Mirror the vg via: "mirrorvg rootvg"
Adapt the bootlist to add the new disk, the system will then fail over to hdisk1 if hdisk0 fails during startup
do bootlist -o -m normal
this will list currently 1 disk, in this example hdisk0
do bootlist -m normal hdisk0 hdisk1
Run a bosboot on both disks, this will install all software needed for boot on the disk
bosboot -ad /dev/hdisk0
bosboot -ad /dev/hdisk1
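
The sequence above can be written down as a small "plan printer" that only echoes the commands for a given pair of disks. This is a sketch under assumptions: the function name mirror_rootvg_plan is made up here, the current boot disk defaults to hdisk0 as in the example, and nothing is executed.

```shell
# mirror_rootvg_plan NEWDISK [CURRENTDISK]   (hypothetical helper name)
# Prints the commands for mirroring rootvg onto NEWDISK; nothing is run.
mirror_rootvg_plan() {
    new=$1
    cur=${2:-hdisk0}          # current boot disk, hdisk0 unless given
    printf '%s\n' \
        "extendvg rootvg $new" \
        "mirrorvg rootvg $new" \
        "bosboot -ad /dev/$cur" \
        "bosboot -ad /dev/$new" \
        "bootlist -m normal $cur $new" \
        "bootlist -o -m normal"
}

mirror_rootvg_plan hdisk1
```

The last printed command, "bootlist -o -m normal", is there only to display the resulting boot list for verification.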


Method 5:
---------

Although the steps to mirror volume groups between HP and AIX are incredibly similar, 
there are enough differences to send me through hoops if/when I ever have to do that. 
Therefore, the following checklist: 

1. Mirror the logical volumes: 
If you don't care what disks the lvs get mirrored to, execute

mirrorvg rootvg


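Otherwise, set ${disk} to the target disk (for example disk=hdisk1) and run:

for lv in $(lsvg -l rootvg | grep -i syncd | \
	grep -v dumplv | awk '{print $1}')
do
	# 2 is the total number of copies each LV should have afterwards.
	# Matching on "syncd" also picks up hd5, which is normally closed/syncd.
	mklvcopy ${lv} 2 ${disk}
done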

2. Change the quorum checking if you did not use mirrorvg:

chvg -Q n rootvg


3. Run bosboot on the new drive to copy boot files to it:

bosboot -ad /dev/${disk}


4. Update the bootlist with the new drive:

bootlist -m normal hdisk0 hdisk1


5. Reboot the system to enable the new quorum checking parameter 


Method 6:
---------

Audience: System Administrators 
Date: September 25, 2002 


Mirroring "rootvg" protects the operating system from a disk failure. Mirroring "rootvg" 
requires a couple extra steps compared to other volume groups. The mirrored rootvg disk must be bootable 
*and* in the bootlist. Otherwise, if the primary disk fails, you'll continue to run, 
but you won't be able to reboot. 

In brief, the procedure to mirror rootvg on hdisk0 to hdisk1 is 

1. Add hdisk1 to rootvg:
extendvg rootvg hdisk1 

2. Mirror rootvg to hdisk1:
mirrorvg rootvg hdisk1 (or smitty mirrorvg) 

3. Create boot images on hdisk1:
bosboot -ad /dev/hdisk1 

4. Add hdisk1 to the bootlist:
bootlist -m normal hdisk0 hdisk1 

5. Reboot to disable quorum checking on rootvg. The mirrorvg turns off quorum by default, 
but the system needs to be rebooted for it to take effect. 

For more information, and a comprehensive procedure see the man page for mirrorvg and 





Example using mklvcopy:
-----------------------

mklvcopy [ -a Position ] [ -e Range ] [ -k ] [ -m MapFile ] [ -s Strict ] [ -u UpperBound ] LogicalVolume 
         Copies [ PhysicalVolume... ] 


Add a copy of LV "lv01" on disk hdisk7:

# mklvcopy lv01 2 hdisk7

The mklvcopy command increases the number of copies in each logical partition in LogicalVolume. 
This is accomplished by increasing the total number of physical partitions for each logical partition 
to the number represented by Copies. The LogicalVolume parameter can be a logical volume name or 
logical volume ID. You can request that the physical partitions for the new copies be allocated 
on specific physical volumes (within the volume group) with the PhysicalVolume parameter; 
otherwise, all the physical volumes within the volume group are available for allocation.

The logical volume modified with this command uses the Copies parameter as its new copy characteristic. 
The data in the new copies are not synchronized until one of the following occurs: 
the -k option is used, the volume group is activated by the varyonvg command, or the volume group 
or logical volume is synchronized explicitly by the syncvg command. Individual logical partitions 
are always updated as they are written to.

The default allocation policy is to use a minimum number of physical volumes per logical volume copy, 
to place the physical partitions belonging to a copy as contiguously as possible, and then to place 
the physical partitions in the desired region specified by the -a flag. Also, by default, each copy 
of a logical partition is placed on a separate physical volume.




Using smitty:
-------------

# smit mklv 

or 

# smit mklvcopy

Using "smit mklv" you can create a new LV and at the same time tell the system to create a mirror
(2 or 3 copies) of each LP and which PV's are involved.

Using "smit mklvcopy" you can add mirrors to an existing LV.



31.3 Filesystems in AIX:
========================

After a VG is created, you can create filesystems. You can use smitty or the crfs and mkfs commands.
File systems are confined to a single logical volume.

The journaled file system (JFS) and the enhanced journaled file system (JFS2) are built into the 
base operating system. Both file system types link their file and directory data to the structure 
used by the AIX Logical Volume Manager for storage and retrieval. A difference is that JFS2 is designed to accommodate 
a 64-bit kernel and larger files.

Run lsfs -v jfs2 to determine if your system uses JFS2 file systems. 
This command returns no output if it finds only standard file systems. 


crfs:
-----

crfs -v VfsType { -g VolumeGroup | -d Device } [ -l LogPartitions ]
     -m MountPoint [ -n NodeName ] [ -u MountGroup ] [ -A { yes | no } ] [ -p {ro | rw } ] 
     [ -a Attribute= Value ... ] [ -t { yes | no } ]



The crfs command creates a file system on a logical volume within a previously created volume group. 
A new logical volume is created for the file system unless the name of an existing logical volume is 
specified using the -d. An entry for the file system is put into the /etc/filesystems file.

crfs -v jfs -g(vg) -m(mount point) -a size=(size of fs) -A yes 
This creates a logical volume of the stated size on the volume group and creates the file system on 
that logical volume. It also adds an entry to /etc/filesystems (mounted automatically at system 
restart because of -A yes) and creates the mount point directory if it does not exist. 

- To make a JFS on the rootvg volume group with nondefault fragment size and nondefault nbpi, enter:
# crfs  -v jfs  -g  rootvg  -m /test -a size=32768 -a frag=512 -a nbpi=1024

This command creates the /test file system on the rootvg volume group with a fragment size of 512 bytes, 
a number of bytes per i-node (nbpi) ratio of 1024, and an initial size of 16MB (512 * 32768).

- To make a JFS on the rootvg volume group with nondefault fragment size and nondefault nbpi, enter: 
# crfs -v jfs -g rootvg -m /test -a size=16M -a frag=512 -a nbpi=1024

This command creates the /test file system on the rootvg volume group with a fragment size of 512 bytes, 
a number of bytes per i-node (nbpi) ratio of 1024, and an initial size of 16MB. 

- To create a JFS2 file system which can support NFS4 ACLs, type: 
# crfs -v jfs2 -g rootvg -m /test -a size=1G -a ea=v2

This command creates the /test JFS2 file system on the rootvg volume group with an initial size of 1 gigabyte. 
The file system will store extended attributes using the v2 format.

- To create large-file-enabled JFS file systems on other volume groups, type: 
# crfs -v jfs -g backupvg -m /backups -a size=32G -a bf=true

# crfs -v jfs -g oravg -m /filetransfer -a size=4G -a bf=true
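
A size= value without a unit suffix is counted in 512-byte blocks, which is why size=32768 and size=16M in the examples above describe the same 16 MB file system. The arithmetic can be checked with a small sketch; the function names are made up for this example and integer division truncates any remainder.

```shell
# Convert between 512-byte blocks and megabytes (1 MB = 1024 * 1024 bytes).
# Function names are illustrative only.
blocks_to_mb() { echo $(( $1 * 512 / 1024 / 1024 )); }
mb_to_blocks() { echo $(( $1 * 1024 * 1024 / 512 )); }

blocks_to_mb 32768    # prints 16     (size=32768 in the first example)
mb_to_blocks 16       # prints 32768  (size=16M in the second example)
```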


Extended example:
-----------------

The following command creates a JFS filesystem on a previously created LV "lv05".
In this example, suppose the LV was created in the following way:

# mklv -y lv05 -c 2 splvg 200

In this case, it is clear that we mirror each LP to 2 PP's (because of the -c 2).

Now to create a filesystem on lv05, we can use the command
# crfs -v jfs -d lv05 -m /spl -a bf=true

Note that we did not mention the size of the filesystem. This is because we use a previously defined LV
with a known size. 
 

Notes:

1. The option -a bf=true allows large files (> 2 GB).

2. Specifying -m /<name> (like for example "/data") will create the entry in /etc/filesystems for you


Some more examples:
-------------------

Commands to create VG's:
mkvg -y oravg -d 10 -s 128 hdisk2 hdisk4
mkvg -y splvg -d 10 -s 128 hdisk3 hdisk5
mkvg -y softwvg -d 10 -s 128 hdisk6
mkvg -y backupvg -d 10 -s 128 hdisk7

Set of Create Logical Volume and Filesystem commands:	

# crfs -v jfs -g <Vgname> -m <Mountpoint> -a size=xG -a bf=true
or
# mklv -y <LV_name> -c 2 <VG_name> No_Of_PPs
# crfs -v jfs -d <LV_name> -m <MountPoint> -a bf=true

		
# mklv -y lv05 -c 2 splvg 300			
# crfs -v jfs -d lv05 -m /spl -a bf=true			
# mklv -y lv06 -c 2 splvg 100			
# crfs -v jfs -d lv06 -m /u04 -a bf=true			
			
# mklv -y lv02 -c 2 oravg 200			
# mklv -y lv03 -c 2 oravg 200			
# mklv -y lv04 -c 2 oravg 200			
# crfs -v jfs -d lv02 -m /u01 -a bf=true			
# crfs -v jfs -d lv03 -m /u02 -a bf=true			
# crfs -v jfs -d lv04 -m /u03 -a bf=true			
			
# crfs -v jfs -g backupvg -m /backups -a size=33G -a bf=true			
# crfs -v jfs -g backupvg -m /data -a size=33G -a bf=true			
# crfs -v jfs -g softwvg -m /apps -a size=16G -a bf=true			
# crfs -v jfs -g softwvg -m /software -a size=33G -a bf=true			
# crfs -v jfs -g softwvg -m /u05 -a size=12G -a bf=true			



mkfs:
-----

The mkfs command makes a new file system on a specified device. The mkfs command initializes the volume label, 
file system label, and startup block.

The Device parameter specifies a block device name, raw device name, or file system name. If the parameter 
specifies a file system name, the mkfs command uses this name to obtain the following parameters from the 
applicable stanza in the /etc/filesystems file, unless these parameters are entered with the mkfs command.

- To specify the volume and file system name for a new file system, type: 
# mkfs  -lworks  -vvol001 /dev/hd3

This command creates an empty file system on the /dev/hd3 device, giving it the volume serial number vol001 
and file system name works. The new file system occupies the entire device. 
The file system has a default fragment size (4096 bytes) and a default nbpi ratio (4096). 

- To create a file system with nondefault attributes, type: 
# mkfs  -s 8192  -o nbpi=2048,frag=512 /dev/lv01

This command creates an empty 4 MB file system on the /dev/lv01 device with 512-byte fragments and 
1 i-node for each 2048 bytes. 

- To create a large file enabled file system, type: 
# mkfs -V jfs -o nbpi=131072,bf=true,ag=64 /dev/lv01

This creates a large file enabled JFS file system with an allocation group size of 64 megabytes and 1 inode 
for every 131072 bytes of disk. The size of the file system will be the size of the logical volume lv01.

- To create a file system with nondefault attributes, type: 
# mkfs -s 4M -o nbpi=2048,frag=512 /dev/lv01

This command creates an empty 4 MB file system on the /dev/lv01 device with 512-byte fragments and one i-node 
for each 2048 bytes. 

- To create a JFS2 file system which can support NFS4 ACLs, type: 
# mkfs -V jfs2 -o ea=v2 /dev/lv01

This command creates an empty file system on the /dev/lv01 device with v2 format for extended attributes.
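
The nbpi value in the JFS examples above determines roughly how many i-nodes the file system gets: one i-node per nbpi bytes. The sketch below only does that division. Treat the result as an estimate, since the real count depends on how JFS rounds to allocation groups; the function name is made up for this example.

```shell
# inode_estimate SIZE_MB NBPI   (hypothetical helper name)
# Rough number of i-nodes: file system size in bytes divided by nbpi.
inode_estimate() { echo $(( $1 * 1024 * 1024 / $2 )); }

inode_estimate 4 2048    # prints 2048  (the 4 MB, nbpi=2048 example)
inode_estimate 4 4096    # prints 1024  (same size with the default nbpi)
```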


chfs command:
-------------

- Example 1:

How do I change the size of a filesystem? 

To increase /usr filesystem size by 1000000 512-byte blocks, type:
# chfs -a size=+1000000 /usr

- Example 2:

To split off a copy of a mirrored file system and mount it read-only for use as an online backup, enter: 
# chfs -a splitcopy=/backup -a copy=2 /testfs
This mounts a read-only copy of /testfs at /backup.

- Example 3:

To change the mount point of a file system, enter: 
# chfs  -m /test2 /test
This command changes the mount point of a file system from /test to /test2. 

- Example 4:

# chfs -a size=+20G /data/udb/eidwha2/eddwha2/DATA03

- Example 5:

chfs -a size=+5M /opt



To rename the mount point of a file system that is currently mounted, you could do it this way:

1) chfs -m new_mountpoint old_mountpoint

2) umount old_mountpoint

3) mount new_mountpoint

To list the processes that are using a fs, use the command below; add the -k flag to kill them:
fuser -xuc /scratch



lsfs command:
-------------

Displays the characteristics of file systems.

Syntax
lsfs [ -q ] [ -c | -l ] [ -a | -v VfsType | -u MountGroup| [FileSystem...] ]

Description
The lsfs command displays characteristics of file systems, such as mount points, automatic mounts, permissions, 
and file system size. The FileSystem parameter reports on a specific file system. 
The following subsets can be queried for a listing of characteristics:

All file systems 
All file systems of a certain mount group 
All file systems of a certain virtual file system type 
One or more individual file systems

The lsfs command displays additional Journaled File System (JFS) or Enhanced Journaled File System (JFS2) 
characteristics if the -q flag is specified.

To show all file systems in the /etc/filesystems file, enter: 
# lsfs

To show all file systems of vfs type jfs, enter: 
# lsfs -v jfs

To show the file system size, the fragment size, the compression algorithm (if any), and the 
number of bytes per i-node as recorded in the superblock of the root file system, enter: 
# lsfs -q /




31.4 SAN connection via SDD, and related commands:
==================================================

If you use advanced storage on AIX, the workings on disks and volume groups are a bit different
from the traditional ways, using local disks, as described above. 

You can use SDD or SDDPCM Multipath IO. This section describes SDD. See section 31.5 for SDDPCM.


Overview of the Subsystem device driver:
----------------------------------------

The IBM System Storage Multipath Device Driver SDD provides multipath configuration environment support
for a host system that is attached to storage devices. It provides:

-Enhanced data availability 
-Automatic path failover and recovery to an alternate path 
-Dynamic load balancing of multiple paths 
-Concurrent microcode upgrade.

The IBM System Storage Multipath Subsystem Device Driver Path Control Module SDDPCM provides
AIX MPIO support. It is a loadable module. During the configuration of supported devices, SDDPCM is loaded
and becomes part of the AIX MPIO Fibre Channel protocol device driver. The AIX MPIO-capable device driver
with the SDDPCM module provides the same functions that SDD provides.

Note that before attempting to exploit the Virtual shared disk support for the Subsystem device driver, 
you must read IBM Subsystem Device Driver Installation and User's Guide.

An SDD implementation is available for AIX, Solaris, HP-UX, some Linux distributions, and Windows 200x.

An impression about the architecture on AIX can be seen in the following figure:


               -------------------------------
               | Host System                 |
               | -------             ------- |
               | |FC 0 |             | FC 1| |
               | -------             ------- |
               -------------------------------
                    |                   |
                    |                   |
              ----------------------------------
          ESS |  --------         --------    |
              |  |port 0|         |port 1|    |
              |  -------- \      /--------    |
              |      |      \   /      |      | 
              |      |        \/       |      |
              |      |        / \      |      |
              |   -----------/    \---------- |
              |   |Cluster 1|      |Cluster 2||
              |   -----------      -----------|
              |    |  |  |  |       | | |  |  |
              |    |  |  |  |       | | |  |  |
              |    O--|--|--|-------| | |  |  |           
              |   lun0|  |  |         | |  |  |
              |       O--|--|---------| |  |  |
              |      lun1|  |           |  |  |
              |          O--|-----------|  |  |
              |         lun2|              |  |
              |             O--------------|  |
              |            lun3               |
              ---------------------------------


DPO (Data Path Optimizer) was renamed by IBM a couple of years ago and became SDD (Subsystem Device Driver). 
When redundant paths are configured to ESS logical units, and the SDD is installed and configured, 
the AIX(R) lspv command shows multiple hdisks as well as a new construct called a vpath. The hdisks and vpaths 
represent the same logical unit. You will need to use the lsvpcfg command to get more information. 

Each SDD vpath device represents a unique physical device on the storage server.
Each physical device is presented to the operating system as an operating system disk device.
So, essentially, a vpath device acts like a disk.

You will see later on that a hdisk is actually a "path" to a LUN, that can be reached either by fscsi0 or fscsi1.
Also you will see that a vpath represents the LUN.

SDD does not support multipathing to a boot device.

Support for VIO:
----------------

Starting from SDD version 1.6.2.0, a unique ID attribute is added to SDD vpath devices, in order to 
support AIX5.3 VIO future features. AIX device configure methods have been changed in both AIX52 TL8 and 
AIX53 TL4 for this support.


Examples:
---------

For example, after issuing lspv, you see output similar to this:

# lspv
hdisk0          000047690001d59d      rootvg
hdisk1          000047694d8ce8b6      None
hdisk18         000047694caaba22      None
hdisk19         000047694caadf9a      None
hdisk20         none                  None
hdisk21         none                  None
hdisk22         000047694cab2963      None
hdisk23         none                  None
hdisk24         none                  None
vpath0          none                  None
vpath1          none                  None
vpath2          000047694cab0b35      gpfs1scsivg
vpath3          000047694cab1d27      gpfs1scsivg


After issuing lsvpcfg, you see output similar to this:

# lsvpcfg
vpath0 (Avail ) 502FCA01 = hdisk18 (Avail pv )
vpath1 (Avail ) 503FCA01 = hdisk19 (Avail pv )
vpath2 (Avail pv gpfs1scsivg) 407FCA01 = hdisk20 (Avail ) hdisk24 (Avail )


The examples above illustrate some important points:

- vpath0 consists of a single path (hdisk18) and therefore will not provide failover protection. 
Also, hdisk18 is defined to AIX as a physical volume (pv flag) and has a PVID, as you can see from the output 
of the lspv command. Likewise for vpath1.

- vpath2 has two paths (hdisk20 and hdisk24) and has a volume group defined on it. Notice that with the 
lspv command, hdisk20 and hdisk24 look like newly installed disks with no PVIDs. The lsvpcfg command had 
to be used to determine that hdisk20 and hdisk24 make up vpath2, which has a PVID.

Warning: be very careful not to use a hdisk for a "local" VG if it is already used by a vpath.
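
The single-path situation described for vpath0 and vpath1 can be spotted automatically by counting the hdisk entries on each line of lsvpcfg output. This is a minimal sketch that assumes the line format shown in the example above; the function name single_path_vpaths is made up.

```shell
# single_path_vpaths   (hypothetical helper name)
# Reads lsvpcfg output on stdin and prints every vpath that is backed by
# fewer than two hdisks, i.e. has no failover protection.
single_path_vpaths() {
    awk '
        /^vpath/ {
            n = 0
            for (i = 1; i <= NF; i++)
                if ($i ~ /^hdisk[0-9]+$/) n++
            if (n < 2) print $1
        }
    '
}

# On a live system:
#   lsvpcfg | single_path_vpaths
```

Run against the three example lines above, it would list vpath0 and vpath1 but not vpath2.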


Other Example:
--------------

# lspv
 hdisk0          00c49e8c8053fe86                    rootvg          active
 hdisk1          00c49e8c841a74d5                    rootvg          active
-hdisk2          none                                None
-hdisk3          none                                None
 vpath0          00c49e8c94c02c15                    datavg          active
 vpath1          00c49e8c94c050d4                    appsvg          active
-hdisk4          none                                None
 vpath2          00c49e8c2806dc22                    appsvg          active
-hdisk5          none                                None
-hdisk6          none                                None
-hdisk7          none                                None


# lsvpcfg

vpath0 (Avail pv datavg) 75BAFX1006C = hdisk2 (Avail ) hdisk5 (Avail )
vpath1 (Avail pv appsvg) 75BAFX1017B = hdisk3 (Avail ) hdisk6 (Avail )
vpath2 (Avail pv appsvg) 75BAFX10329 = hdisk4 (Avail ) hdisk7 (Avail )


# datapath query adapter

Active Adapters :2

Adpt#     Name   State     Mode             Select     Errors  Paths  Active
    0   fscsi0  NORMAL   ACTIVE           12611291          0      3       3
    1   fscsi1  NORMAL   ACTIVE           13375287          0      3       3


# datapath query device

Total Devices : 3


DEV#:   0  DEVICE NAME: vpath0  TYPE: 2107900         POLICY:    Optimized  # this is vpath0
SERIAL: 75BAFX1006C
==========================================================================
Path#      Adapter/Hard Disk          State     Mode     Select     Errors
    0          fscsi0/hdisk2           OPEN   NORMAL   12561763          0
    1          fscsi1/hdisk5           OPEN   NORMAL   13324883          0

DEV#:   1  DEVICE NAME: vpath1  TYPE: 2107900         POLICY:    Optimized
SERIAL: 75BAFX1017B
==========================================================================
Path#      Adapter/Hard Disk          State     Mode     Select     Errors
    0          fscsi0/hdisk3           OPEN   NORMAL      28024          0
    1          fscsi1/hdisk6           OPEN   NORMAL      28847          0

DEV#:   2  DEVICE NAME: vpath2  TYPE: 2107900         POLICY:    Optimized
SERIAL: 75BAFX10329
==========================================================================
Path#      Adapter/Hard Disk          State     Mode     Select     Errors
    0          fscsi0/hdisk4           OPEN   NORMAL      21672          0
    1          fscsi1/hdisk7           OPEN   NORMAL      21712          0


# lsattr -El vpath0
active_hdisk  hdisk2/75BAFX1006C/fscsi0        Active hdisk               False
active_hdisk  hdisk5/75BAFX1006C/fscsi1        Active hdisk               False
policy        df                               Scheduling Policy          True
pvid          00c49e8c94c02c150000000000000000 Physical volume identifier False
serial_number 75BAFX1006C                      LUN serial number          False


# lsdev -Cc adapter
ent0      Available 04-08 10/100/1000 Base-TX PCI-X Adapter (14106902)
ent1      Available 06-08 10/100/1000 Base-TX PCI-X Adapter (14106902)
fcs0      Available 05-08 FC Adapter
fcs1      Available 07-08 FC Adapter
sa0       Available       LPAR Virtual Serial Adapter
sisscsia0 Available 03-08 PCI-X Ultra320 SCSI Adapter


# lsattr -El fcs0
bus_intr_lvl  131193     Bus interrupt level                                False
bus_io_addr   0xcfc00    Bus I/O address                                    False
bus_mem_addr  0xc0040000 Bus memory address                                 False
init_link     al         INIT Link flags                                    True
intr_priority 3          Interrupt priority                                 False
lg_term_dma   0x800000   Long term DMA                                      True
max_xfer_size 0x100000   Maximum Transfer Size                              True
num_cmd_elems 200        Maximum number of COMMANDS to queue to the adapter True
pref_alpa     0x1        Preferred AL_PA                                    True
sw_fc_class   2          FC Class for Fabric                                True


# lscfg -lv fcs0
  fcs0             U7879.001.DQDKCPR-P1-C2-T1  FC Adapter

        Part Number.................03N6441
        EC Level....................A
        Serial Number...............1D54508045
        Manufacturer................001D
        Feature Code................280B
        FRU Number.................. 03N6441
        Device Specific.(ZM)........3
        Network Address.............10000000C94F91CD
        ROS Level and ID............0288193D
        Device Specific.(Z0)........1001206D
        Device Specific.(Z1)........00000000
        Device Specific.(Z2)........00000000
        Device Specific.(Z3)........03000909
        Device Specific.(Z4)........FF801412
        Device Specific.(Z5)........0288193D
        Device Specific.(Z6)........0683193D
        Device Specific.(Z7)........0783193D
        Device Specific.(Z8)........20000000C94F91CD
        Device Specific.(Z9)........TS1.90X13
        Device Specific.(ZA)........T1D1.90X13
        Device Specific.(ZB)........T2D1.90X13
        Device Specific.(YL)........U7879.001.DQDKCPR-P1-C2-T1


# lsdev -Cc adapter -F 'name parent'
ent0      pci4
ent1      pci6
fcs0      pci5
fcs1      pci7
sa0
sisscsia0 pci3


# lsdev -Cc disk -F 'name location'
hdisk0 03-08-00-3,0
hdisk1 03-08-00-5,0
hdisk2 05-08-01 ------------------------>|
hdisk3 05-08-01 ------------------------>|
hdisk4 05-08-01 ------------------------>|
hdisk5 07-08-01                          |
hdisk6 07-08-01                          |
hdisk7 07-08-01                          |
vpath0                                   |
vpath1                                   |
vpath2                                   |
                                         |
                                         |
# lsdev -Cc driver -F 'name location'    |
dpo                                      |
fcnet0 05-08-02                          |
fcnet1 07-08-02                          |
fscsi0 05-08-01 <-------------------------
fscsi1 07-08-01
iscsi0
scsi0  03-08-00

Note from the above output that, for example, fscsi0 can be "linked" to hdisk2, hdisk3 and hdisk4,
due to the matching location code.
You can compare that to the output of "datapath query device".
Also interesting can be the following:

# lsdev -C | grep fc
fcnet0      Defined   05-08-02      Fibre Channel Network Protocol Device
fcnet1      Defined   07-08-02      Fibre Channel Network Protocol Device
fcs0        Available 05-08         FC Adapter
fcs1        Available 07-08         FC Adapter

# lsdev -C | grep fsc
fscsi0      Available 05-08-01      FC SCSI I/O Controller Protocol Device
fscsi1      Available 07-08-01      FC SCSI I/O Controller Protocol Device

From this, you can see that fcs0 is the "parent" of the child "fscsi0".
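
The matching of disks to fscsi devices by location code can be done mechanically by joining the two lsdev listings. The sketch below assumes both listings were saved to files in the "name location" format shown above; the function name disks_per_adapter and the file names in the comments are invented for this example.

```shell
# disks_per_adapter DRIVER_FILE DISK_FILE   (hypothetical helper name)
# DRIVER_FILE: saved output of  lsdev -Cc driver -F 'name location'
# DISK_FILE:   saved output of  lsdev -Cc disk   -F 'name location'
# Prints "fscsiN hdiskM" for every disk whose location code equals that
# of an fscsi protocol device.
disks_per_adapter() {
    awk '
        NR == FNR { if ($1 ~ /^fscsi/) adapter[$2] = $1; next }
        ($2 in adapter) { print adapter[$2], $1 }
    ' "$1" "$2"
}

# On a live system (file names are only an example):
#   lsdev -Cc driver -F 'name location' > /tmp/drivers.out
#   lsdev -Cc disk   -F 'name location' > /tmp/disks.out
#   disks_per_adapter /tmp/drivers.out /tmp/disks.out
```

Local SCSI disks and the vpath devices drop out of the result because their location field is different or empty.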


# lsattr -D -l fscsi0
attach       none         How this adapter is CONNECTED         False
dyntrk       no           Dynamic Tracking of FC Devices        True
fc_err_recov delayed_fail FC Fabric Event Error RECOVERY Policy True
scsi_id                   Adapter SCSI ID                       False
sw_fc_class  3            FC Class for Fabric                   True

# lsattr -D -l fcs0
bus_intr_lvl             Bus interrupt level                                False
bus_io_addr   0x00010000 Bus I/O address                                    False
bus_mem_addr  0x01000000 Bus memory address                                 False
init_link     al         INIT Link flags                                    True
intr_priority 3          Interrupt priority                                 False
lg_term_dma   0x800000   Long term DMA                                      True
max_xfer_size 0x100000   Maximum Transfer Size                              True
num_cmd_elems 200        Maximum number of COMMANDS to queue to the adapter True
pref_alpa     0x1        Preferred AL_PA                                    True
sw_fc_class   2          FC Class for Fabric                                True


# datapath query essmap
 Disk          Path  P     Location   adapter    LUN SN       Type           Size   LSS     Vol  Rank  C/A   S   Connection  port RaidMode
-------       -----  -   -----------  ------   -----------  ------------     ----   ----    ---  ----- ----  -   ----------- ---- --------
vpath0        hdisk2     05-08-01[FC] fscsi0   75BAFX1006C  IBM 2107-900  107.5GB     0    108   fff2   02   Y   R1-B3-H3-ZC  232 RAID5
vpath0        hdisk5     07-08-01[FC] fscsi1   75BAFX1006C  IBM 2107-900  107.5GB     0    108   fff2   02   Y   R1-B3-H3-ZA  230 RAID5
vpath1        hdisk3     05-08-01[FC] fscsi0   75BAFX1017B  IBM 2107-900   14.3GB     1    123   fff1   0b   Y   R1-B3-H3-ZC  232 RAID5
vpath1        hdisk6     07-08-01[FC] fscsi1   75BAFX1017B  IBM 2107-900   14.3GB     1    123   fff1   0b   Y   R1-B3-H3-ZA  230 RAID5
vpath2        hdisk4     05-08-01[FC] fscsi0   75BAFX10329  IBM 2107-900   14.3GB     3     41   ffe1   08   Y   R1-B3-H3-ZC  232 RAID5
vpath2        hdisk7     07-08-01[FC] fscsi1   75BAFX10329  IBM 2107-900   14.3GB     3     41   ffe1   08   Y   R1-B3-H3-ZA  230 RAID5

From this you can see that a hdisk is actually a "path" to a LUN, that can be reached either by fscsi0 or fscsi1.
Also you can see that a vpath represents the LUN.
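
To make the "hdisk = path, vpath = LUN" relation explicit, the essmap listing can be condensed to one line per LUN. This is a sketch only: the function name paths_per_lun is made up, and it assumes the "P" column is empty as in the output above, so that the adapter is the fourth and the LUN serial number the fifth whitespace-separated field.

```shell
# paths_per_lun   (hypothetical helper name)
# Reads "datapath query essmap" output on stdin and prints, per vpath and
# LUN serial number, the adapter/hdisk pairs that lead to it.
paths_per_lun() {
    awk '
        $1 ~ /^vpath/ && $2 ~ /^hdisk/ {
            key = $1 " " $5
            if (!(key in paths)) order[++n] = key   # keep first-seen order
            paths[key] = paths[key] " " $4 "/" $2
        }
        END {
            for (i = 1; i <= n; i++)
                print order[i] ":" paths[order[i]]
        }
    '
}

# On a live system:
#   datapath query essmap | paths_per_lun
```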
 

# datapath query adaptstats

Adapter #:  0
=============
                Total Read  Total Write  Active Read  Active Write   Maximum
I/O:               9595892      4371836            0             0        23
SECTOR:          176489389    138699019            0             0      5128

Adapter #:  1
=============
                Total Read  Total Write  Active Read  Active Write   Maximum
I/O:              10238891      4523508            0             0        24
SECTOR:          188677891    143739157            0             0      5128



# datapath query portmap
                          BAY-1(B1)                BAY-2(B2)                BAY-3(B3)                BAY-4(B4)
   ESSID    DISK      H1   H2   H3   H4        H1   H2   H3   H4        H1   H2   H3   H4        H1   H2   H3   H4
                     ABCD ABCD ABCD ABCD      ABCD ABCD ABCD ABCD      ABCD ABCD ABCD ABCD      ABCD ABCD ABCD ABCD
                          BAY-5(B5)                BAY-6(B6)                BAY-7(B7)                BAY-8(B8)
                      H1   H2   H3   H4        H1   H2   H3   H4        H1   H2   H3   H4        H1   H2   H3   H4
                     ABCD ABCD ABCD ABCD      ABCD ABCD ABCD ABCD      ABCD ABCD ABCD ABCD      ABCD ABCD ABCD ABCD
 75BAFX1    vpath0   ---- ---- ---- ----      ---- ---- ---- ----      ---- ---- Y-Y- ----      ---- ---- ---- ----
 75BAFX1    vpath1   ---- ---- ---- ----      ---- ---- ---- ----      ---- ---- Y-Y- ----      ---- ---- ---- ----
 75BAFX1    vpath2   ---- ---- ---- ----      ---- ---- ---- ----      ---- ---- Y-Y- ----      ---- ---- ---- ----

Y  =  online/open               y = (alternate path) online/open
O  =  online/closed             o = (alternate path) online/closed
N  =  offline                   n = (alternate path) offline
-  =  path not configured
PD =  path down

Note: 2105 devices' essid has 5 digits, while 1750/2107 device's essid has 7 digits.


# datapath query wwpn
Adapter Name    PortWWN
fscsi0          10000000C94F91CD
fscsi1          10000000C94F9923



If you need to force the Subsystem Device Driver (SDD), or equivalent driver, to rescan and map the new devices,
use the following command at the system prompt: 

# /usr/sbin/cfgvpath

Procedure to make a new lun available to AIX:
---------------------------------------------

-Allocate the new lun on the SAN 
-Run "cfgmgr" 
-Verify the new vpath/hdisk by running "lsvpcfg" 

There should be a new vpath and it should be available with no volume group - if not, rerun cfgmgr


Create Volume groups with vpaths:
---------------------------------

You should use the mkvg4vp command to create Volume Groups.

Example:

# mkvg4vp -B -t 32 -s 4 -y DB01_RECOV_VG1 vpath4 vpath10

By default, VG's can accommodate up to 255 LV's and 32 PV's. If the -B flag is used on the mkvg or mkvg4vp
command, the resulting VG will support up to 512 LV's and 128 PV's.
The -s flag, as usual, designates the Partition size.
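
The effect of the -t and -s flags can be worked out with a little arithmetic. This sketch assumes 
the usual LVM rule that a PV may hold 1016 x factor partitions and that the PV limit of the VG is divided 
by the same factor; check the mkvg documentation for your AIX level. The function name "vg_limits" 
is made up for this example.

```shell
#!/bin/sh
# vg_limits FACTOR PP_SIZE_MB MAX_PVS
# Assumed rule: PPs per PV = 1016 * factor; PVs allowed in the VG = MAX_PVS / factor.
vg_limits() {
  factor=$1; pp_mb=$2; max_pvs=$3
  pps_per_pv=$((1016 * factor))
  echo "max PPs per PV : $pps_per_pv"
  echo "max PV size MB : $((pps_per_pv * pp_mb))"
  echo "max PVs in VG  : $((max_pvs / factor))"
}

# The example above: -B (128 PVs), -t 32, -s 4
vg_limits 32 4 128
```

Under these assumptions the example VG would take PVs of up to 130048 MB each, but only 4 of them.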



SDD software on AIX:
--------------------

Starting with SDD 1.6.1.0, the SDD package for AIX53 is devices.sdd.53.rte and requires AIX53E 
with APAR IY76997.

Starting with SDD 1.6.2.0, the SDD package for AIX52 is devices.sdd.52.rte and requires AIX52M
with APAR IY76997.

See also in this document:
IBM Flash Alert: SDD 1.6.2.0 requires minimum AIX code levels; possible 0514-035 error

The SDD installation package installs a number of new commands, such as datapath, chgvpath, and lsvpcfg.

Before installing SDD, you should check firmware levels, and AIX APAR requirements. See the following sites: 

-- scsi and ESS, and Fiber:
www-1.ibm.com/servers/storage/support/
www-1.ibm.com/servers/eserver/support/unixservers/index.html 

-- AIX APAR:
www-03.ibm.com/servers/eserver/support/unixservers/aixfixes.html            or,
www.ibm.com/servers/eserver/support/pseries/aixfixes.html                   or,
www14.software.ibm.com/webapp/set2/sas/f/genunix3/aixfixes.html







31.5 SAN connections with SDDPCM MPIO:
======================================

Section 31.4 covered SAN connections that use SDD.

This section covers SAN connections that use SDDPCM (MPIO). 
Some of the commands differ from those used with SDD.

The use of SDD or SDDPCM gives the AIX host the ability to access multiple paths to a single LUN 
within an ESS or SAN. This ability to access a single LUN on multiple paths allows for a higher degree of 
data availability in the event of a path failure. Data can continue to be accessed within the ESS 
as long as there is at least one available path. Without one of these installed, you will lose access 
to the LUN in the event of a path failure. 

If SDD is installed, use the datapath command; with SDDPCM, use the pcmpath command.

The commands work as shown in section 31.4, with "datapath" replaced by "pcmpath". For example:



# pcmpath query device

DEV#:   2  DEVICE NAME: hdisk2  TYPE: 2107900  ALGORITHM:  Load Balance
SERIAL: 75065711100
==========================================================================
Path#      Adapter/Path Name          State     Mode     Select     Errors
    0           fscsi0/path0           OPEN   NORMAL       1240          0
    1           fscsi0/path1           OPEN   NORMAL       1313          0
    2           fscsi0/path2           OPEN   NORMAL       1297          0
    3           fscsi0/path3           OPEN   NORMAL       1294          0

DEV#:   3  DEVICE NAME: hdisk3  TYPE: 2107900  ALGORITHM:  Load Balance
SERIAL: 75065711101
==========================================================================
Path#      Adapter/Path Name          State     Mode     Select     Errors
    0           fscsi0/path0          CLOSE   NORMAL          0          0
    1           fscsi0/path1          CLOSE   NORMAL          0          0
    2           fscsi0/path2          CLOSE   NORMAL          0          0
    3           fscsi0/path3          CLOSE   NORMAL          0          0

DEV#:   4  DEVICE NAME: hdisk4  TYPE: 1750500  ALGORITHM:  Load Balance
SERIAL: 13AAGXA1101
==========================================================================
Path#      Adapter/Path Name          State     Mode     Select     Errors
    0*          fscsi0/path0           OPEN   NORMAL         12          0
    1           fscsi0/path1           OPEN   NORMAL       3787          0
    2*          fscsi1/path2           OPEN   NORMAL         17          0
    3           fscsi1/path3           OPEN   NORMAL       3822          0
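
Output like the above can be condensed to one line per device. The following is a sketch only: it assumes 
the "query device" layout shown here (a "DEV#:" line, then one row per path with the state in column 3 
and the error count in the last column). The function name "path_summary" is made up.

```shell
#!/bin/sh
# Summarise "pcmpath query device" (or "datapath query device") output read from stdin:
# one line per device with the number of OPEN paths, total paths and summed error count.
path_summary() {
  awk '
    function report() {
      if (dev != "") print dev, "open=" open, "total=" total, "errors=" errs
    }
    $1 == "DEV#:" { report(); dev = $5; open = 0; total = 0; errs = 0; next }
    # path rows start with the path number, optionally followed by "*"
    $1 ~ /^[0-9]+\*?$/ && NF >= 6 {
      total++
      if ($3 == "OPEN") open++
      errs += $NF
    }
    END { report() }
  '
}

# Two of the devices from the listing above; normally: pcmpath query device | path_summary
path_summary <<'EOF'
DEV#:   3  DEVICE NAME: hdisk3  TYPE: 2107900  ALGORITHM:  Load Balance
SERIAL: 75065711101
==========================================================================
Path#      Adapter/Path Name          State     Mode     Select     Errors
    0           fscsi0/path0          CLOSE   NORMAL          0          0
    1           fscsi0/path1          CLOSE   NORMAL          0          0
    2           fscsi0/path2          CLOSE   NORMAL          0          0
    3           fscsi0/path3          CLOSE   NORMAL          0          0

DEV#:   4  DEVICE NAME: hdisk4  TYPE: 1750500  ALGORITHM:  Load Balance
SERIAL: 13AAGXA1101
==========================================================================
Path#      Adapter/Path Name          State     Mode     Select     Errors
    0*          fscsi0/path0           OPEN   NORMAL         12          0
    1           fscsi0/path1           OPEN   NORMAL       3787          0
    2*          fscsi1/path2           OPEN   NORMAL         17          0
    3           fscsi1/path3           OPEN   NORMAL       3822          0
EOF
```

A device with open=0 is simply not in use (its VG is not varied on); a device with open lower than total 
while it is in use deserves a closer look.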


# pcmpath query essmap


Some possible errors with pcmpath:

root@zd110l04:/root#pcmpath query device

Kernel extension sdduserke was not loaded. Errno=8.
Please verify SDDPCM device configuration.


On a system with SDDPCM, you will see the SDDPCM server daemon, "pcmsrv", running. 
This process checks available paths and does other checks and monitoring.

The process is under the control of the System Resource Controller (SRC). For example, stop and start it with:

# stopsrc -s pcmsrv
# startsrc -s pcmsrv

The process is started on boot from inittab:

# cat /etc/inittab | grep pcmsrv
srv:2:wait:/usr/bin/startsrc -s pcmsrv > /dev/null 2>&1


 
 
Notes on SDD and SDDPCM:
========================

Note 1:
-------

thread

Q +A:

> I've been reading IBM web sites and PDF manuals and still can't decide
> on exactly how to upgrade my AIX 4.3.3 machine to AIX 5.2 and have my
> ESS SDD vpath disks visible and working when I'm done.
>
> Has someone done this? Can you comment on my proposed method here?

Yes, I've done this.


> What I think I need to do is this:
>
> 1. Do the migration installation from 4.3.3 to 5. Question: Do I need to
> do anything to my ESS disks BEFORE migrating? Unmount? Vary off volume
> groups? Export volume groups?

Yes to all of the above, prior to upgrade. Uninstall SDD software.


> 2. After the migration, and reboot, I understand that the ESS disks will
> not "be there", since the migration does NOT upgrade the SDD (subsystem
> device driver). Question: Is this true?

Yes, the datapath devices will be gone because you deleted the SDD
software; IIRC, that is part of the un-install process. After your
upgrade, install SDD just like the first time. This will get you your
hdisks and vpaths back, though not necessarily with the same numbers; have
a 'lsvpcfg' from before your upgrade to cross-reference your new setup to.
'importvg' the VG(s) one at a time, using one of the hdisk's which
constitute the vpath, then run 'hd2vp' on the VG. That will convert the
VG back to using the vpath's.

Note: IIRC, If I Recall/Remember Correctly

>
> 3. Vary off all ESS volume groups, if I shouldn't have done this back in
> step 1.
>
> 4. Remove all the "datapath devices", via: rmdev -dl dpo -R
>
> 5. Uninstall the 4.3 version of the SDD.
>
> 6. Install the 5.2 version of the SDD.
>
> 7. Install the latest PTF of the 5.2 SDD, that they call version
> 1.5.1.3.
>
> 8. Reboot.
>
>
> If you can tell me how to make this procedure more nearly correct, I'd
> greatly appreciate it.


Note 2:
-------

thread

Q + A:

>
> I need a quick refresher here. I've got a HACMP (4.4) cluster with SAN-attached
> ESS storage. SDD is installed. Can I add volumes to one of these volume groups on
> the fly, or does HA need to be down? It's been awhile since I have done this and I
> can't quite remember if I have to jump through any hoops. Thanks for the help.

Should be relatively easy with no downtime required.

1) Acquire the new disks on the primary node (where the VG is in service) with:
   cfgmgr -Svl fcs0
   Repeat this for all fcs adapters in the system.
2) Convert the hdisks to vpaths. Use the smit screens for this, because the commands
   have changed from version to version.
3) Add the vpaths to the VG with: extendvg4vp vgname vpath#
4) Create LVs/filesystems on the vpaths.
5) Break the VG/scsi locks so that other systems can see the disks with: varyonvg -b -u vgname
6) Perform steps 1 & 2 on all failover nodes in the cluster.
7) Refresh the VG definitions on all the failover nodes with: importvg -L vgname vpath#
8) Re-establish the disk locks on the service node with: varyonvg vgname
9) Add the new filesystems to the HA configuration.
10) Synchronise the HA resources to the cluster.
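
The order of the commands in the steps above can be written down as a small dry-run helper. This is only 
a sketch: it prints the commands and runs nothing, the function name "print_add_vpath_plan" is made up, 
and the VG, vpath and adapter names in the call are placeholders.

```shell
#!/bin/sh
# print_add_vpath_plan VG VPATH ADAPTER...
# Prints (does not run) the commands from the steps above, in order.
print_add_vpath_plan() {
  vg=$1; vpath=$2; shift 2
  echo "# --- service node (VG in service) ---"
  for fcs in "$@"; do echo "cfgmgr -Svl $fcs"; done
  echo "# convert new hdisks to vpaths via smit"
  echo "extendvg4vp $vg $vpath"
  echo "# create LVs/filesystems on $vpath"
  echo "varyonvg -b -u $vg"
  echo "# --- each failover node ---"
  for fcs in "$@"; do echo "cfgmgr -Svl $fcs"; done
  echo "# convert new hdisks to vpaths via smit"
  echo "importvg -L $vg $vpath"
  echo "# --- service node again ---"
  echo "varyonvg $vg"
  echo "# add new filesystems to the HA configuration, then synchronise"
}

# Placeholder names, for illustration only
print_add_vpath_plan datavg vpath7 fcs0 fcs1
```

Printing the plan first makes it easy to review the sequence before typing anything on a live cluster.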


Note 3:
-------

From IBM Doc SC30-4131-00:


hd2vp and vp2hd 

SDD provides two conversion scripts, hd2vp and vp2hd. 

The hd2vp script converts a volume group from supported storage device
hdisks to SDD vpath devices, and the vp2hd script converts a volume
group from SDD vpath devices to supported storage device hdisks. 

Use the vp2hd program when you want to configure your applications back
to original supported storage device hdisks, or when you want to remove
SDD from your AIX host system. 

The syntax for these conversion scripts is as follows:
hd2vp vgname 
vp2hd vgname 

vgname Specifies the volume group name to be converted.


Note 4:
-------

thread

Q:

Hi There, 
I want to add a vpath to running hacmp cluster with HACMP 5.1 on AIX 5.2 with Rotating Resource Group. 
If anyone has done it before then can provide a step by step procedure for this. Do i need to stop and start 
HACMP for this? 


A:

On Vg active node : 
#extendvg4vp vg00 vpath10 vpath11 
#smitty chfs ( Increase the f/s as required ) 
#varyonvg -bu vg00 ( this is to un-lock the vg) 

On Secondary node where vg is not active : 
# cfgmgr -vl fscsi0 ( fscsi1 and fcs0 and fcs1 ) 
Found new vpaths 
# chdev -l vpath10 -a pv=yes ( for vpath11 also ) 
# lsvg vg00|grep path ( just note down any one vpath which is from this o/p-for e.g vpath0 ) 
# importvg vg00 vpath0 

Once its fine...go to Primary Node 

# varyonvg vg00 ( Locking the VG ) 

Regards

Note 5:
-------

> HI,

> Is there a way to know dependencies between devices.
> For example,
> hdisk2 is attached to fscsi0 which in turn is attached to fcs0

> I have found nothing in lsdev's man
> Do I have to look in the odm directly

> I need this in order to improve a script

This is a good question and the lsdev man
page should be burned in front of the building
where they develop and document AIX in
Austin, TX, for not answering it for you.
After all, you bothered to read the damn
thing; why didn't it tell you?

$ /usr/sbin/lsdev -Cc adapter -F 'name parent'
ppa0 isa0
sa0 isa0
sa1 isa0
sa2 isa0
siokma0 isa0
fda0 isa0
scsi0 pci0
ent0 pci0
cxpa0 pci0
ent1 pci0
mga0 pci1
ent2 pci1
scsi1 pci2
sioka0 siokma0
sioma0 siokma0
ent3 pci0

There's also the lsparent command.

Regards,

Actually, I have the same question as Frederic and you have not
quite answered it. Sure, lsdev can tell you that "hdisk5" is
matched to "fcs0" . . . but what tells you that "fcs0" in turn
matches to "fscsi0"? And if "hdisk126" matches to adapter "fchan1",
how do I determine what that matches to? I've checked all of the
various lsxxxx commands but can't find this bit of info.

ONCE AGAIN the answer pops up just moments after announcing
to the world that "there's no way to do that" and "I've looked
everywhere and tried everything". Herewith the output from the
necessary commands, with extraneous lines removed:

# lsdev -C -c disk -F 'name location'
hdisk0 11-08-00-2,0
hdisk1 11-08-00-4,0
hdisk2 3A-08-01
hdisk3 3A-08-01
hdisk4 27-08-01
hdisk5 27-08-01


# lsdev -C -c driver -F 'name location'
fscsi0 27-08-01
fscsi1 3A-08-01

# lsdev -C -c adapter -F 'name location'
scsi0 11-08
scsi1 11-09
fcs0 27-08
mg20 2D-08
fcs1 3A-08
#

Obviously it is a simple matter to match disk to adapter to driver
by the location of each object. After that I can easily

sprintf(pathname, "/dev/%s", driver);
fp = open(pathname, O_RDONLY | O_NDELAY);
ioctl(fp, SCIOINQU, &info);

to get the scsi inquiry buffer.
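
The matching by location code described above can be scripted. The following is a sketch under the 
assumption that the three lists were saved as "name location" files (as printed by the lsdev commands 
above), and that the adapter location is the first two fields of the disk location. The function name 
"map_disks" and the temporary file names are made up.

```shell
#!/bin/sh
# map_disks DISK_FILE DRIVER_FILE ADAPTER_FILE
# Each file holds "name location" lines, e.g. from: lsdev -C -c disk -F 'name location'
# Prints: disk driver adapter  ("-" when nothing matches)
map_disks() {
  awk '
    NF < 2 { next }
    kind == "driver"  { driver[$2] = $1; next }
    kind == "adapter" { adapter[$2] = $1; next }
    kind == "disk" {
      loc = $2
      drv = (loc in driver) ? driver[loc] : "-"
      # adapter location = first two fields of the disk location (e.g. 3A-08)
      split(loc, part, "-")
      aloc = part[1] "-" part[2]
      adp = (aloc in adapter) ? adapter[aloc] : "-"
      print $1, drv, adp
    }
  ' kind=driver "$2" kind=adapter "$3" kind=disk "$1"
}

# Demo with a few of the lines shown above
tmp=${TMPDIR:-/tmp}/mapdisks.$$
mkdir "$tmp" || exit 1
printf 'hdisk0 11-08-00-2,0\nhdisk2 3A-08-01\nhdisk4 27-08-01\n' > "$tmp/disk"
printf 'fscsi0 27-08-01\nfscsi1 3A-08-01\n' > "$tmp/driver"
printf 'scsi0 11-08\nfcs0 27-08\nfcs1 3A-08\n' > "$tmp/adapter"
map_disks "$tmp/disk" "$tmp/driver" "$tmp/adapter"
rm -rf "$tmp"
```

The internal SCSI disk gets "-" in the driver column, because only the FC disks have an fscsi protocol 
driver between the disk and the adapter.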


Note 6:
-------

thread

Q:

Where can I find a guide for the adapter (describing all its states and the LED blinking/lighting)?

The adapter is cabled by the SAN guys, they double checked it, and when I run:

rmdev -Rl fcs0
cfgmgr -l fcs0
lsattr -El fscsi0 -a attach

I don't see "switch" but "none".


thx in advance.

A:

Did you check SAN Switch Zoning?

Regards,

Do something like:

rmdev -Rdl fscsi0
rmdev -dl fcnet0
rmdev -l fcs0
cfgmgr -l fcs0

rmdev -Rdl fscsi0

rmdev -Rdl fscsi1
rmdev -l fcs1

This way, the FC adapter renegotiates an FC fabric login.

HTH,

I had already done something similar, but it didn't help:

# lsslot -c slot|grep fcs0
U787B.001.DNWFFM5-P1-C4   Logical I/O Slot  pci4 fcs0
# rmdev -dl pci4 -R
fcnet0 deleted
fscsi0 deleted
fcs0 deleted
pci4 deleted
# cfgmgr
Method error (/usr/lib/methods/cfgefscsi -l fscsi0 ):
        0514-061 Cannot find a child device.
# lsattr -El fscsi0 -a attach
attach none How this adapter is CONNECTED False

the second FC is connected ok:
# lsattr -El fscsi1 -a attach
attach switch How this adapter is CONNECTED False
#

thx anyway,
I will ask my SAN team to check cables once more.
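
The "attach" check from this thread is easy to script for all fscsi devices. A sketch, assuming the 
lsattr output format shown above (attribute name in column 1, value in column 2) and assuming that 
"switch" is the value expected in a fabric-attached setup. The function names are made up.

```shell
#!/bin/sh
# attach_state: read "lsattr -El fscsiN -a attach" output on stdin, print the value column.
attach_state() {
  awk '$1 == "attach" { print $2; exit }'
}

# check_attach NAME: same input on stdin, prints a one-line verdict for device NAME.
check_attach() {
  state=$(attach_state)
  if [ "$state" = "switch" ]; then
    echo "$1: attached to a switch"
  else
    echo "$1: attach=${state:-unknown}, no switch seen; check cabling and zoning"
  fi
}

# The two lines from the thread; normally: lsattr -El fscsi0 -a attach | check_attach fscsi0
echo "attach none How this adapter is CONNECTED False"   | check_attach fscsi0
echo "attach switch How this adapter is CONNECTED False" | check_attach fscsi1
```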
 

Note 7:
-------

thread

hdisk and vpath correspondence for IBM SAN (Shark) 
Description

Correspondence between devices and physical disks:

4 hdisk = 1 vpath = 1 physical disk

To remove all vpaths run the command:

# rmdev -dl dpo -R

To remove all fibre channel disks (2 cards in this example):

# rmdev -dl fscsi0 -R
# rmdev -dl fscsi1 -R

To recreate the hdisks run the command:
# cfgmgr -vl fcs0
# cfgmgr -vl fcs1

To recreate the vpaths run the command:

# cfallvpath

To delete a device run this command:

# rmdev -l fcs1 -d 

Example:

# rmdev -dl dpo -R ; rmdev -dl fscsi0 -R ; cfgmgr -vl fcs0 ; cfallvpath 


Note 8:
-------

Technote (FAQ) 
  
Problem 
When non-root AIX users issue SDD datapath commands, the "No device file found" message results.  
  
Cause 
AIX SDD does not distinguish between file not found and invalid permissions.  
  
Solution 
Login as the root user or "su" to root user and re-execute command in order to obtain the desired SDD datapath 
command output.  


Note 9: 
-------

(thread ibm site)

Question:

Hi,

I have an AIX 5.3 server running with 2 FCs. One on a DS8300 and one on a DS4300.
On the server, i have a filesystems that is mounted and active (hdisks are from the DS8300). 
I can access it fine, write, delete etc...

Yet, when i do a "datapath query adapter" i get the following :

# datapath query adapter
Active Adapters :1
Adpt#     Name    State     Mode     Select   Errors  Paths  Active
    0   fscsi0   NORMAL   ACTIVE    4111177        0     32       0

I would expect to see my 32 paths Active. I checked another server that has a similar configuration 
(though it only has 1 FC) and i can see 32 Paths, 32 Active...

Is it because of the other FC being connected to a DS4300?

Answer:

Hi.

The reason is that the vpaths are not part of a varied on volume group.
If you do a 'datapath query device' you should find all the paths will be 
state=closed.
If the vpaths are being used by a volume group, do a varyonvg xxxx.
Then display the datapath and the paths should be active.

Question:

Hi.

Thanks, but as I mentioned in my original post, the VG is varied on and the FS is mounted. I ran the 
datapath command after I ran varyonvg bkpvg and mount /backup. 
I then dumped a DB within the FS, deleted it, and everything else works... yet datapath query adapter shows 
no Active paths... weird...

Question:

Hi.

What version of SDD?
What does 'datapath query device' say?

Answer:

Version of SDD is 1.6.0.5
And a datapath query device shows :

...

DEV#:  14  DEVICE NAME: vpath14  TYPE: 2107900  POLICY: Optimized
SERIAL: 75AYYV111B7
===========================================================================
Path#      Adapter/Hard Disk          State     Mode     Select     Errors
    0         fscsi0/hdisk40          CLOSE   NORMAL     147989          0
    1         fscsi0/hdisk23          CLOSE   NORMAL          0          0

DEV#:  15  DEVICE NAME: vpath15  TYPE: 2107900  POLICY: Optimized
SERIAL: 75AYYV111B8
===========================================================================
Path#      Adapter/Hard Disk          State     Mode     Select     Errors
    0         fscsi0/hdisk41          CLOSE   NORMAL     155256          0
    1         fscsi0/hdisk24          CLOSE   NORMAL          0          0


yet, as I mentioned, my FS /backup is mounted and accessible... 
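
The symptom in this thread (Paths 32, Active 0) can be detected automatically. A sketch, assuming the 
eight-column "datapath query adapter" layout quoted in the thread; the function name "inactive_adapters" 
is made up.

```shell
#!/bin/sh
# inactive_adapters: read "datapath query adapter" output on stdin and report
# adapters whose Active count is lower than their Paths count.
inactive_adapters() {
  awk '
    # data rows: Adpt# Name State Mode Select Errors Paths Active
    $1 ~ /^[0-9]+$/ && NF == 8 && ($8 + 0) < ($7 + 0) {
      print $2 ": " $8 " of " $7 " paths active"
    }
  '
}

# The output quoted in the thread; normally: datapath query adapter | inactive_adapters
inactive_adapters <<'EOF'
Active Adapters :1
Adpt# Name State Mode Select Errors Paths Active
0 fscsi0 NORMAL ACTIVE 4111177 0 32 0
EOF
```

As the thread shows, a low Active count is not an error by itself: paths of vpaths that are not opened 
through SDD are reported as closed.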


Note 10:
--------

thread

Q:

Hi All, 

I am having problems on a p570 in which there are 3 HBA cards. 
2 of the HBAs are connected via a SAN switch to an ESS 800. 
It appears that only one of the "paths" to the ESS 800 is working, 
as I only see one set of the disks on the ESS. 

Running cfgmgr on the adapter gives the following error. 

I have tried removing fscsi0, then unconfiguring fcs0, 
then reconfiguring fcs0, but I still get the same error. 
Any ideas? Is there some command/utility I can run to verify 
the state of this HBA? Thank you. 

bash-3.00# cfgmgr -l fcs0 
Method error (/usr/lib/methods/cfgefscsi -l fscsi0 ): 
0514-061 Cannot find a child device. 
bash-3.00# 


A:

HI 

I have had the same problem using HDS SAN devices. 

At that time I did not have the correct version of the device driver for the fibre cards in the p570. 

For aix 5.2 
devices.pci.df1000fa >= 5.2.0.40 
For aix 5.3 
devices.pci.df1000f7 >= 5.3.0.10 

/HGA
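
Minimum fileset levels like these can be compared field by field. A sketch only: the function name 
"version_ge" is made up and the "installed" level below is a hypothetical value; on a real system you 
would take it from the lslpp output for the fileset.

```shell
#!/bin/sh
# version_ge A B: succeeds (exit 0) when dotted level A >= B, comparing numerically per field.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, "."); m = split(b, y, ".")
    max = (n > m) ? n : m
    for (i = 1; i <= max; i++) {
      xi = (i <= n) ? x[i] + 0 : 0
      yi = (i <= m) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

installed=5.2.0.30          # hypothetical installed level of devices.pci.df1000fa
required=5.2.0.40           # minimum level quoted in the answer above
if version_ge "$installed" "$required"; then
  echo "devices.pci.df1000fa $installed: OK"
else
  echo "devices.pci.df1000fa $installed: below $required"
fi
```

A plain string comparison would get levels such as 5.2.0.9 and 5.2.0.40 wrong, which is why each field 
is compared as a number.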


Note 11:
--------

Greetings: 

The "0514-061 Cannot find a child device" is common when the FC card is either 
not attached to a FC device, or if it is attached, then I would look at the 
polarity of the cable 
ie. (tx -> rx and rx -> tx) NOT (tx -> tx and rx -> rx) 

cfgmgr is attempting to configure the FC device it is connected to (child 
device) but is unable to see it. 

In this context, device would be some sort of FC endpoint, not just a switch or 
director. 

I would make sure the FC card has connectivity to a FC device, not just the 
fabric and re-run cfgmgr. 


-=Patrick=- 


"Vincent D'Antonio, III" <dantoniov@COMCAST.NET> on 02/19/2003 01:51:24 PM 

Please respond to IBM AIX Discussion List <aix-l@Princeton.EDU> 

  To: aix-l@Princeton.EDU 
  cc: (bcc: Patrick Bigelbach/DSS) 
  Subject Re: Cannot cfgmgr on a new FC 

Put in your OS cd in the cdrom drive and run: 

cfgmgr -vi /dev/cd0 

this should load any filesets you need for the adapter if they are not 
already there. You should then see the adapter in lsdev -Cc adapter | grep fcs. 

HTH 
Vince 

-----Original Message----- 
From: IBM AIX Discussion List [mailto:aix-l@Princeton.EDU] On Behalf Of 
Calderon, Linda 
Sent: Wednesday, February 19, 2003 10:12 AM 
To: aix-l@Princeton.EDU 
Subject: Cannot cfgmgr on a new FC 

I am trying to connect a new HBA on a P660 to a switch for a SAN. This HBA 
has not been used previously, newly cabled etc. I issued the following 
commands and receive the following errors: 

* rmdev -Rdl fcs1 

0514-519 The following device was not found in the customized device 
configuration database: name 'fcs1' 

* cfgmgr 

0514-061 Cannot find a child device 

Looking for ideas as to root cause. 


Note 12:
--------

thread

Q:

Hi All AIXers,
I am trying to add some vpaths to a current Volume Group (which is on vpaths) and I
am getting this error


Method Error (/usr/lib/methods/chgvpath):
0514-047 Cannot access a device

0516-1182 extendvg open failure on vpath3

0516-792 extendvg: Unable to extend a Volume Group

Does anybody have any idea about this error? I have never seen it before.
Thanks


A:

James,

If you're adding a vpath to a volume group that has other vpaths, you
will need to use extendvg4vp instead of extendvg.

Hope this helps!
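
In scripts, the choice between the two commands can be made from the device name. A small sketch; 
the function name "extend_cmd" is made up, it only prints the command, and it assumes all devices in 
one call are of the same kind (it looks at the first one only).

```shell
#!/bin/sh
# extend_cmd VG DEVICE...
# Prints the command to use: extendvg4vp for vpath devices, extendvg otherwise.
extend_cmd() {
  vg=$1; shift
  case $1 in
    vpath*) echo "extendvg4vp $vg $*" ;;
    *)      echo "extendvg $vg $*" ;;
  esac
}

extend_cmd datavg vpath3
extend_cmd rootvg hdisk1
```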


Note 13:
--------

This is the same procedure as the answer in Note 4.


Note 14:
--------

thread

How to add a new PV to an existing concurrently mounted VG.

The PMR action plan suggests:

- stop of the resource group
- varyoffvg dummyvg
- varyonvg -nc dummyvg
- extendvg4vp dummyvg vpath0
- start of the resource group

as a backup action

- restart of the cluster
- extendvg4vp dummyvg vpath0
- start of the resource group

After a talk with our IBM country representative, we modified the action plan
to:

- stop of the cluster
- varyoffvg dummyvg
- varyonvg dummyvg
  (dummyvg should remain Enhanced Concurrent Capable, but I vary it on
  in normal mode to do the extension)
- extendvg4vp dummyvg vpath0
- importvg -L dummyvg disk on the other node of the cluster
- varyoffvg dummyvg
- cluster verification & synchronization
- start of the cluster

Before applying the modified action plan I tried to follow the
original one anyway, but with unpredictable return codes. With some vpaths
it works, with some others it half works (it updates the VGDA, but not the ODM),
and with others it returns the original error.

In my opinion there is a high probability that the cause is in
gsclvmd...

So, a bit disappointed, I applied the modified plan.
Everything worked and extendvg4vp enlarged dummyvg...
My machines are too downlevel and have many shortcomings :-(

After that, my curiosity pushed me to try the next step:

mirrorvg -s -c 2 dummyvg vpath0 vpath1
0516-1509 : VGDA corruption: physical partition info for this LV
is invalid.
0516-842 : Unable to make logical partition copies for logical
volume.
0516-1199 mirrorvg: Failed to create logical partition copies for
logical volume dummylv.
0516-1200 mirrorvg: Failed to mirror the volume group

Now IBM support is working on analyzing this new issue...

Regards.


Note 15: cfgmgr method errors:
------------------------------

1:
==

APAR status
Closed as program error.

Error description 
Users of the 64bit kernel may observe an error when cfgmgr is
invoked at runtime in the cfgsisscsi or cfgsisioa config
methods. Following is an example:
# cfgmgr
Method error (/usr/lib/methods/cfgsisscsi -l sisscsia0 ):
        0514-061 Cannot find a child device.

The error occurs in the cfgsisscsi or cfgsisioa routines
which automatically update the microcode on the adapter if
it is found to be at a level lower than the minimum supported
microcode level.

If the adapter was previously unconfigured, the adapter will
remain in the Defined state. A system reboot should make it
Available.

APAR information 
APAR number IY48873 
Reported component name AIX 5L POWER V5 
Reported component ID 5765E6200 
Reported release 520 
Status CLOSED PER 
PE NoPE 
HIPER NoHIPER 
Submitted date 2003-09-19 
Closed date 2003-09-19 
Last modified date 2003-10-24 


Note 16: cfgmgr method errors:
------------------------------

Q:

cfgmgr error-- devices are reported twice
Asked by kuntal_acharyy... on 11/28/2005 6:15:00 AM  

I have an IBM DS4400 with two EXP 700 expansion units connected to a pSeries 650 with AIX 5.1. I have 
created two logical drives in the storage unit. When I run "cfgmgr" to recognise the new raw physical volumes, 
each disk is reported twice. 

hdisk4 Available 1n-08-01 1742 (700) Disk Array Device 
hdisk5 Available 1n-08-01 1742 (700) Disk Array Device 
hdisk6 Available 11-08-01 1742 (700) Disk Array Device 
hdisk7 Available 11-08-01 1742 (700) Disk Array Device 

There is an error message while running cfgmgr: 

Method error (/etc/methods/cfgfdar -l dar0 ): 
0514-002 Cannot initialize the ODM. 
cfgmgr: 0514-621 WARNING: The following device packages are required for 
device support but are not currently installed. 
devices.scsi 

What may have caused the problem? 
How can I solve this problem? 
Any advice is truly welcome. 

A:

Hi, I ran into the same problem as you. 3 LPARs (AIX 5300-02) on a p570 are connected to a FAStT600 (DS4300) 
with 2 HBA cards each, through a SAN fibre switch. 2 of the LPARs reported each hdisk twice, and 1 of them 
reported them normally. I found that the HBA cards on the normal one are in PCI slots that belong to 
different buses, while the HBA cards on the abnormal ones are on the same bus. I then moved the HBA cards 
to slots on different buses, deleted all the dar, dac and HBA devices in the system, and finally ran cfgmgr. 
That solved the problem. I guess there must be something wrong with the bus design. Someone told me that he 
solved the problem by installing the latest patch (AIX 5300-03). So my advice is to move the HBA cards to 
different slots, clear the devices and run cfgmgr, or maybe update your AIX with the latest patch. 
Just try it and tell me the result. Good luck!


Note 17: cfgmgr method errors:
------------------------------

ed.malina@uvm.edu (Ed) wrote in message news:<bb30127.0311120759.171bdc46@posting.google.com>... 
> I deleted a scsi device from my 4.3.3 configuration with the following 
> command: 
> rmdev -l scsi2 -dR 
> 
> The device is a dual channel ultra scsi 3 card. I deleted it to try 
> to resolve some performance problems with a drawer connected to the 
> device. Incidentally, scsi3 which is the other side of the dual 
> channel card, is working fine. 
> 
> When I try to reconfigure the device with: 
> cfgmgr -v -lscsi2 
> 
> I get the following error: 
> 
> Method error (/usr/lib/methods/cfgncr_scsi -l scsi2 ): 
> 0514-034 The following attributes do not have valid values: 
> 
> Any thoughts on how to fix it? For the time being I can't reboot the 
> machine. Would a reboot be able to resolve the problem if there is no 
> other solution? 
> 
> Thanks! 
> -- Ed 

#>> Ed, 


What you probably should do is run the cfgmgr command without the 
device name behind it. Because you deleted the scsi device with the 
options -dR you also removed any child devices. 


try this: cfgmgr -v 


Note 18: cfgmgr method errors:
------------------------------

Q:

Hi... 

Does someone know what to do with an SDD driver which can't detect vpaths 
from an ESS F20 but hdisks are already available on AIX? 

showvpath, cfgvpath, datapath query commands don't display or find anything 

By the way, rebooting the system didn't help 

I accept any suggestions. 

Regards 

Luis A. Rojas

A:

Thank you all for your suggestions 

I solved the problem using the hd2vp command, which converts the logical hdisk 
to its related vpath. And voila! The vpaths were suddenly recognized by the 
cfgvpath command. 

I don't know why this happened, but, everything is OK now. 

To those people with similar problems, please check these following 
commands: dpovgfix, hd2vp, vp2hd 

Best Regards 



Note 19: fget_config:
---------------------

How to show the current state and volume (hdisk) ownership in an IBM DS4000 
Description

The fget_config command shows the current state and volume (hdisk) ownership.

To display controllers and hdisks that are associated with a specified DS4000 (dar):

# fget_config

To display the state of each controller in a DS4000 array, and the current path that is being used 
for I/O for each hdisk:

# fget_config -A 


Note 20:
--------

Q:

dpovgfix, hd2vp, vp2hd
Asked by RandallGoff on 1/23/2007 9:38:00 AM  

What filesets do dpovgfix, hd2vp and vp2hd belong to. I installed my sdd 
driver and can see everything but can't find these commands. 

A:

They are part of your SDD drivers. You probably installed the devices.xxx filesets. Did you also 
install the host attachment script... the ibm2105 filesets?


Note 21:
--------

thread

Q:

Hi 

I have several AIX LPARS running on SVC controlled disks. Right now i have SDD SW 1.6.1.2. After configuration 
i have some vpath devices that can be managed using the datapath command. 
Now in a recent training of SVC i was asked to install the new SDDPCM driver in order to get some of the benefits 
of this SW driver. 

SDDPCM does not use the concept of vpath anymore, instead a hdisk device object is created. 
This object has definitions and attributes in ODM files. 

Recently i had to change a faulty HBA under SDD drivers. I was able to: 

1- datapath query device: in order to check the hdisk devices belonging to the faulty adapter. 
2- datapath query adapter: in order to check the faulty adapter. 
3- datapath set adapter XX offline: in order to put the faulty HBA offline. 
4- datapath remove adapter XX 
5- Used the diag Hot Plug option to remove the PCI-X HBA and install a new one. 
   Configured the system and modified the corresponding zone. 

How do I do the same with SDDPCM, where there is no concept of vpath anymore? 

Thanks in advance

A:

Hello, 
You can do the same with SDDPCM, either using the MPIO commands or the smitty screens (smitty devices ---> MPIO devices); 
there you can list paths and remove paths and adapters. 
The SDD user guide has a complete section describing what you can do; the same functions you use 
for vpaths can be used with SDDPCM. 
Here is the link to the latest user guide: 
http://www-1.ibm.com/support/docview.wss?rs=503&context=ST52G7&dc=DA490&dc=DA4A30&dc=DA480&dc=D700&dc=DA410&dc=DA4A20&dc=DA460&dc=DA470&dc=DA400&uid=ssg1S7000303&loc=en_US&cs=utf-8&lang=en


Note 22:
--------

thread

Q:

Greetings: 

Has anyone encountered the 0516-1182 ( mkvg: Open Failure on vpath ) or 
0516-826 ( mkvg: Unable to create volume group ) 
errors while trying to create a new volume group ? 

I attempted to create a new volume group using a couple of newly added 
vpath devices and received 
those errors. 

Any help will be greatly appreciated. 

Thanks in advance. 

Jay. 

A:

Hi 

If using vpath devices then you can confirm that you can open any given device by running: 

datapath query device 

and confirm there's no error in the HBA communications. 

Also you can review the errpt reports in order to look for VPATH OPEN messages. You can also use 
the lquerypr command in order to check for SCSI reservations in the SAN box previously set 
by another host (in case of a cluster). 

Hope this helps


Example lquerypr output

# lquerypr -Vh /dev/hdisk12
connection type: fscsi1
open dev: /dev/hdisk12

Attempt to read reservation key...

Attempt to read registration keys...
Read Keys parameter
        Generation :  52
        Additional Length:  32
        Key0 :  c8ca9d09
        Key1 :  c8ca9d09
        Key2 :  c8cabd09
        Key3 :  c8cabd09
Reserve Key provided by current host = c8cabd09
Not reserved.
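
To see quickly whether keys other than the host's own are registered on a LUN, the lquerypr output can 
be filtered. A sketch, assuming the output layout shown above ("KeyN :  value" lines and a 
"Reserve Key provided by current host = value" line); the function name "pr_keys" is made up.

```shell
#!/bin/sh
# pr_keys: read "lquerypr -Vh /dev/hdiskN" output on stdin, print the key of the
# current host and every distinct registered key that differs from it.
pr_keys() {
  awk '
    $1 ~ /^Key[0-9]+$/ && $2 == ":" {
      if (!($3 in seen)) { seen[$3] = 1; keys[++n] = $3 }
    }
    /^Reserve Key provided by current host/ { host = $NF }
    END {
      print "host key: " host
      for (i = 1; i <= n; i++)
        if (keys[i] != host) print "other registered key: " keys[i]
    }
  '
}

# The sample output from above; normally: lquerypr -Vh /dev/hdisk12 | pr_keys
pr_keys <<'EOF'
Read Keys parameter
        Generation :  52
        Additional Length:  32
        Key0 :  c8ca9d09
        Key1 :  c8ca9d09
        Key2 :  c8cabd09
        Key3 :  c8cabd09
Reserve Key provided by current host = c8cabd09
Not reserved.
EOF
```

In a cluster, an "other registered key" typically belongs to the other node that also sees the LUN.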




Note 23:
--------

thread

Q:

All, 

I'm in the process of preparing for our upcoming disaster recovery exercise 
which is happening in a few weeks. Our plan is to create one big volume 
group, instead of a bunch of little ones like we have in our production 
environment, to try and save some time. 

My question is, is there a way to script using a for/next loop to assign 
each hdisk/vpath when creating a new volume group instead of going into smit 
and assigning them one by one by hand? The hdisks will be sequential and 
will probably be over a hundred in number so you can imagine how tedious 
this will be. Also, this will need to be bigvg enabled. 

Any of you scripters out there have any suggestions? Thanks for your help in 
advance!


A:

Create the VG 
>mkvg -B -y datavg vpathN 

Extend it 
for i in `lspv | grep vpath | grep None | awk '{print $1}'` 
do 
extendvg datavg $i 
done 

That would assign all unused vpaths to the VG. BTW Use the vpath and 
not the hdisk. You could add a count into it to limit the number of 
disks you assign.
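
The "count" mentioned above could look like this. A sketch, assuming the usual three-column lspv output 
(name, PVID, VG) where an unassigned disk shows "None" in the third column; the function name 
"unassigned_vpaths" and the sample PVIDs are made up.

```shell
#!/bin/sh
# unassigned_vpaths MAX
# Read "lspv" output on stdin and print at most MAX vpaths that belong to no VG.
unassigned_vpaths() {
  awk -v max="$1" '$1 ~ /^vpath/ && $3 == "None" && n < (max + 0) { print $1; n++ }'
}

# Hypothetical lspv output; normally: lspv | unassigned_vpaths 100
unassigned_vpaths 2 <<'EOF'
hdisk0          0001234a5b6c7d8e    rootvg
vpath0          0001234a11111111    datavg
vpath1          none                None
vpath2          none                None
vpath3          none                None
EOF
```

The printed names can then be fed to the loop, for example: 
for i in `lspv | unassigned_vpaths 100`; do extendvg4vp datavg $i; done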


Note 24:
--------

thread

Q:

Is anyone aware of a problem if i do a

cfgmgr -vl dpo
and once the vpaths are made
it shows as
vpathxx none None

and then i add the vpath to VG

#extendvg VGname vpathxx

Does this create a problem ?

A:

It sounds like the vpath is showing correctly after cfgmgr, so that's OK.
But you need to use extendvg4vp and not just extendvg.
Do a 'smitty vg' and choose
'Add a Data Path Volume to a Volume Group'

Once it's added to a VG, it will show more info in lspv.



Note 25: cfgmgr Method error (/usr/sbin/fcppcmmap > /etc/essmap.out):
---------------------------------------------------------------------

Method error (/usr/sbin/fcppcmmap > /etc/essmap.out):
        0514-001 System error:







Note 26: mkpath, lspath commands:
---------------------------------

Examples mkpath:

--To define and configure an already defined path between scsi0 and the hdisk1 device at SCSI ID 5 
and LUN 0 (i.e., connection 5,0), enter: 
# mkpath -l hdisk1 -p scsi0 -w 5,0

The system displays a message similar to the following: 
path available

--To configure an already defined path from 'fscsi0' to fiber channel disk 'hdisk1', the command would be: 
# mkpath -l hdisk1 -p fscsi0

The message would look similar to: 
path available

--To only add to the Customized Paths object class a path definition between scsi0 and the hdisk1 disk device 
at SCSI ID 5 and LUN 0, enter: 
# mkpath -d -l hdisk1 -p scsi0 -w 5,0

The system displays a message similar to the following: 
path defined


Examples lspath:

lspath displays information about paths to a MultiPath I/O (MPIO) capable device.

Examples of displaying path status:

-- To display the status of all paths to hdisk1 with column headers, enter: 
# lspath -H -l hdisk1

The system will display a message similar to the following: 
status    device   parent
enabled   hdisk1   scsi0
disabled  hdisk1   scsi1
missing   hdisk1   scsi2

-- To display, without column headers, the set of paths whose operational status is disabled, enter: 
# lspath -s disabled

The system will display a message similar to the following: 
disabled  hdisk1   scsi1
disabled  hdisk2   scsi1
disabled  hdisk23  scsi8
disabled  hdisk25  scsi8

--To display the set of paths whose operational status is failed, enter: 
# lspath -s failed

The system will display a message similar to the following: 
failed  hdisk1   scsi1
failed  hdisk2   scsi1
failed  hdisk23  scsi8
failed  hdisk25  scsi8

-- To display in a user-specified format, without column headers, the set of paths to hdisk1 whose path status 
is available enter: 
# lspath -l hdisk1 -s available -F"connection:parent:path_status:status"

The system will display a message similar to the following: 
5,0:scsi0:available:enabled
6,0:scsi1:available:disabled

Note that this output shows both the path status and the operational status of the device. 
The path status simply indicates whether the path is configured or not. The operational status indicates 
how the path is being used with respect to path selection processing in the device driver. 
Only paths with a path status of available also have an operational status. If a path is not currently configured 
into the device driver, it does not have an operational status.
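
A small filter over the status listing can make non-enabled paths stand out. The sketch below is an assumption-laden example, not part of lspath itself: the function name paths_not_enabled is made up, and it expects the three-column "status device parent" layout shown in the examples above.

```shell
#!/bin/sh
# Read "status device parent" lines and print every path whose operational
# status is not "enabled", as "device parent status".
# The comparison is case-insensitive and a header line is skipped.
paths_not_enabled() {
    awk 'tolower($1) == "status" { next }
         NF >= 3 && tolower($1) != "enabled" { print $2, $3, $1 }'
}

# Intended use on the AIX host (commented out here):
#   lspath | paths_not_enabled
```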

Examples of displaying path attributes:

--If the target device is a SCSI disk, to display all attributes for the path to parent scsi0 at connection 5,0, 
use the command: 
# lspath -AHE -l hdisk10 -p scsi0 -w "5,0"
The system will display a message similar to the following: 
attribute  value  description                       user_settable
weight     1      Order of path failover selection  true


Note 26: About FastT and DS Storage:
------------------------------------

IBM TotalStorage FAStT has been renamed IBM TotalStorage DS4000 series

DS4100 formerly FAStT100

DS4300 formerly FAStT600

DS4300 Turbo formerly FAStT600 Turbo

DS4400 formerly FAStT700

DS4500 formerly FAStT900


Note 27: from GPFS FAQ: 
-----------------------

Q20:

What's the difference between using an ESS with or without SDD or SDDPCM installed on the host? 

A20: 
The use of SDD or SDDPCM gives the AIX host the ability to access multiple paths to a single LUN 
within an ESS. This ability to access a single LUN on multiple paths allows for a higher degree of 
data availability in the event of a path failure. Data can continue to be accessed within the ESS 
as long as there is at least one available path. Without one of these installed, you will lose access 
to the LUN in the event of a path failure. 
However, your choice of whether to use SDD or SDDPCM impacts your ability to use single-node quorum:

Single-node quorum is not supported if SDD is installed. 
Single-node quorum is supported if SDDPCM is installed.
To determine the GPFS disk support guidelines for SDD and SDDPCM for your cluster type, see:

Q3: What disk support guidelines must be followed when running GPFS in an sp cluster type? 
Q6: What disk support guidelines must be followed when running GPFS in an rpd cluster type? 
Q9: What are the disk support guidelines that must be followed when running GPFS in an hacmp cluster type?


Note 28: changing attributes of an fscsi0 device:
-------------------------------------------------

Examples:

# chdev -l fscsi0 -a fc_err_recov=fast_fail
# chdev -l fscsi0 -a dyntrk=yes

Display attributes:

# lsattr -El fscsi0

attach       switch       How this adapter is CONNECTED         False
dyntrk       no           Dynamic Tracking of FC Devices        True
fc_err_recov fast_fail    FC Fabric Event Error RECOVERY Policy True
scsi_id      0x741113     Adapter SCSI ID                       False
sw_fc_class  3            FC Class for Fabric                   True
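
To check a single attribute from a script, the lsattr listing can be filtered. The helper below is a minimal sketch: the name attr_value is made up, and it assumes the four-column layout shown above. An attribute with an empty value column would be misread, so treat it as illustrative only.

```shell
#!/bin/sh
# Print the value column for attribute $1 from "lsattr -El <device>" output
# read on standard input (columns: attribute value description user_settable).
attr_value() {
    awk -v want="$1" '$1 == want { print $2 }'
}

# Intended use on the AIX host (commented out here):
#   [ "$(lsattr -El fscsi0 | attr_value dyntrk)" = "yes" ] || echo "dyntrk is not enabled on fscsi0"
```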




Note 29: Flash alerts:
----------------------


IBM Flash Alert on AIX migration with vpaths:
---------------------------------------------

http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&uid=ssg1S1002295&loc=en_US&cs=utf-8&lang=en

All hdisks and vpath devices must be removed from host system before upgrading to SDD host attachment script 
32.6.100.21 and above. All MPIO hdisks must be removed from host system before upgrading to SDDPCM host attachment 
script 33.6.100.9. 
 Flash (Alert) 
  
Abstract 
When upgrading from SDDPCM host attachment script devices.fcp.disk.ibm2105.mpio.rte version 33.6.100.8 or below 
to 33.6.100.9, all SDDPCM MPIO hdisks must be removed from the AIX host system before the upgrade. 
When upgrading from SDD host attachment script ibm2105.rte version 32.6.100.18 or below to 32.6.100.21 or later, 
all AIX hdisks and SDD vpath devices must be removed from the AIX host system before the upgrade.  
  
Content 
Please note that this document contains the following sections:


Problem description, symptoms, and information 
SDD/host attachment upgrade procedures 
Recovery procedures should the ODM become corrupted 
Recovery procedures should the associations become corrupted 
Procedures for upgrading if rootvg is on an ESS disk

- Problem description, symptoms, and information:

Starting with SDDPCM host attachment script devices.fcp.disk.ibm2105.mpio.rte version 33.6.100.9 and 
SDD host attachment script ibm2105.rte version 32.6.100.21, ESS FCP devices are configured as "IBM MPIO FC 2105" 
for MPIO devices, and "IBM FC 2105" for ESS devices. This information can be seen in the "lsdev -Cc disk" output. 
Prior to these host attachment script versions, ESS FCP devices were configured as "IBM MPIO FC 2105XXX" for 
MPIO devices and "IBM FC 2105XXX" for ESS devices, where 'XXX' is the ESS device module, such as F20 or 800. 

If a host system is upgraded without removing all of the hdisks first, then the AIX host system ODM will 
be corrupted. Additionally, if all the hdisks are removed without removing all SDD vpath devices, 
then the associations between an SDD vpath device and its hdisks may be corrupted because the hdisk's device 
minor number may change after reconfiguration. The ODM corruption may look something like the following in the 
"lsdev -Cc disk" output:

# lsdev -Cc disk
lsdev: 0514-521 Cannot find information in the predefined device
configuration database for the customized device hdisk1.
lsdev: 0514-521 Cannot find information in the predefined device
configuration database for the customized device hdisk2.
lsdev: 0514-521 Cannot find information in the predefined device
configuration database for the customized device hdisk3.
lsdev: 0514-521 Cannot find information in the predefined device
configuration database for the customized device hdisk4.
lsdev: 0514-521 Cannot find information in the predefined device
configuration database for the customized device hdisk5.
lsdev: 0514-521 Cannot find information in the predefined device
configuration database for the customized device hdisk6.
lsdev: 0514-521 Cannot find information in the predefined device
configuration database for the customized device hdisk7.
lsdev: 0514-521 Cannot find information in the predefined device
configuration database for the customized device hdisk8.
hdisk0 Available 10-60-00-8,0 16 Bit SCSI Disk Drive
hdisk1 Available 20-60-01 N/A
hdisk2 Available 20-60-01 N/A
hdisk3 Available 20-60-01 N/A
hdisk4 Available 20-60-01 N/A
hdisk5 Available 20-60-01 N/A
hdisk6 Available 20-60-01 N/A
hdisk7 Available 20-60-01 N/A
hdisk8 Available 20-60-01 N/A
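
To get a quick list of the affected devices from output like the above, the 0514-521 messages can be filtered. This is a sketch only: the function name devices_missing_predefined is made up, and it assumes the message wording shown in the listing.

```shell
#!/bin/sh
# Read the combined stdout/stderr of "lsdev -Cc disk" on standard input and
# print the device names reported in 0514-521 messages. The name is taken
# from the "... for the customized device hdiskN." continuation line.
devices_missing_predefined() {
    awk '/customized device/ { name = $NF; sub(/\.$/, "", name); print name }'
}

# Intended use on the AIX host (commented out here):
#   lsdev -Cc disk 2>&1 | devices_missing_predefined
```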

- SDD/host attachment upgrade procedures:

In order to prevent ODM corruption and vpath/hdisk association corruption, all hdisks and SDD vpath devices 
must be removed prior to the upgrade. The following procedure should be used when you want to upgrade:

- AIX OS only*
- Host attachment + AIX OS*
- SDD + AIX OS*
- Host attachment + SDD
- Host attachment only
- SDD + Host attachment + AIX OS*

* Upgrading the AIX OS will always require you to install the SDD which corresponds to the new AIX OS level.

To upgrade SDD only, follow the procedure in the SDD User's Guide.

1. Ensure rootvg is on local scsi disks. If this is not possible, see "Procedures for upgrading if rootvg is on 
   an ESS disk" below.
2. Stop all applications running on SDD Volume Groups/File Systems.
3. Unmount all File Systems of SDD volume group.
4. Varyoff all SDD volume groups.
5. If upgrading OS, save output of lspv command to remember pvids of VGs.
6. If upgrading OS, export volume groups with exportvg.
7. Remove SDD vpath devices with rmdev command.
8. Remove 2105 hdisk devices with rmdev command.
9. If upgrading OS, run 'stopsrc -s sddsrv' to stop sddsrv daemon.
10. If upgrading OS, uninstall SDD.
11. If required, upgrade ibm2105.rte. The recommended version is 32.6.100.18 if support for ESS model 750 is 
    not needed. Version 32.6.100.21 is required to support ESS model 750.
12. If upgrading OS, migrate AIX OS level.
13. If OS upgraded, boot to new AIX level with no disk groups online except rootvg, which is on local scsi disks. 
    /* reboot will automatically start at the end of migration */
14. If OS upgraded, install SDD for the new OS level. Otherwise, if required, upgrade SDD.
15. If OS not upgraded, configure hdisks with the 'cfgmgr -vl fcsX' command.
16. Configure SDD vpath devices by running 'cfallvpath'.
17. If OS upgraded, use lspv command to find out one physical volume which has a pvid matching the previous 
    SDD VG's pv.

Example:
===================================================
Previous lspv output (from step 5):
hdisk0 000bc67da3945d3c None 
hdisk1 000bc67d531c699f rootvg active
hdisk2 none None 
hdisk3 none None 
hdisk4 none None 
hdisk5 none None 
hdisk6 none None 
hdisk7 none None 
hdisk8 none None 
hdisk9 none None 
hdisk10 none None 
hdisk11 none None 
hdisk12 none None 
hdisk13 none None 
hdisk14 none None 
hdisk15 none None 
hdisk16 none None 
hdisk17 none None 
hdisk18 none None 
hdisk19 none None 
hdisk20 none None 
hdisk21 none None 
vpath0 000bc67d318fb8ea SDDVG0 
vpath1 000bc67d318fde50 SDDVG1 
vpath2 000bc67d318ffbb0 SDDVG2 
vpath3 000bc67d319018f3 SDDVG3 
vpath4 000bc67d319035b2 SDDVG4
Current lspv output (from this step):
hdisk0 000bc67da3945d3c None 
hdisk1 000bc67d531c699f rootvg active
hdisk2 000bc67d318fb8ea None 
hdisk3 000bc67d318fde50 None 
hdisk4 000bc67d318ffbb0 None 
hdisk5 000bc67d319018f3 None 
hdisk6 000bc67d319035b2 None 
hdisk7 000bc67d318fb8ea None 
hdisk8 000bc67d318fde50 None 
hdisk9 000bc67d318ffbb0 None 
hdisk10 000bc67d319018f3 None 
hdisk11 000bc67d319035b2 None 
hdisk12 000bc67d318fb8ea None 
hdisk13 000bc67d318fde50 None 
hdisk14 000bc67d318ffbb0 None 
hdisk15 000bc67d319018f3 None 
hdisk16 000bc67d319035b2 None 
hdisk17 000bc67d318fb8ea None 
hdisk18 000bc67d318fde50 None 
hdisk19 000bc67d318ffbb0 None 
hdisk20 000bc67d319018f3 None 
hdisk21 000bc67d319035b2 None 
vpath0 none None 
vpath1 none None 
vpath2 none None 
vpath3 none None 
vpath4 none None 

In this case, hdisk2, hdisk7, hdisk12, and hdisk17 from the current lspv output
have the pvid which matches the pvid of SDDVG0 from the previous lspv output. 
So, use either hdisk2, hdisk7, hdisk12, or hdisk17 to import the volume group 
with the name SDDVG0.

18. Run hd2vp on all SDD volume groups.
19. Vary on all SDD volume groups.
20. Mount all file systems again.
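
The pvid matching in step 17 can be scripted when there are many volume groups. The following is a sketch under assumptions: the function name match_vg_disks and the file names lspv.before and lspv.after are hypothetical, and both files are assumed to hold plain lspv output saved before and after the upgrade. The usage line only echoes the importvg commands so they can be reviewed first.

```shell
#!/bin/sh
# match_vg_disks PREV CUR
# PREV: lspv output saved before the upgrade (vpathN pvid VGNAME ...).
# CUR:  lspv output after the upgrade (hdiskN pvid None ...).
# Prints "VGNAME hdiskN" for the first hdisk in CUR carrying each old pvid.
match_vg_disks() {
    awk '
        NR == FNR { if ($1 ~ /^vpath/ && $2 != "none") vg[$2] = $3; next }
        $1 ~ /^hdisk/ && ($2 in vg) && !seen[$2]++ { print vg[$2], $1 }
    ' "$1" "$2"
}

# Intended use on the AIX host (dry run, commented out here):
#   match_vg_disks lspv.before lspv.after |
#       while read -r vg disk; do echo importvg -y "$vg" "$disk"; done
```

Only one hdisk per pvid is printed because any one of the paths to the LUN is enough to import the volume group.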

- Recovery procedures should the ODM become corrupted:

If the host system's ODM is already corrupted as a result of upgrading without removing the hdisks, 
please contact IBM Customer Support at 1-800-IBM-SERV to request a script to fix the corrupted ODM. 

- Recovery procedures should the associations become corrupted:

If vpath/hdisk association corruption has occurred because hdisks were removed without removing SDD vpath devices, 
all SDD vpath devices must be removed and reconfigured in order to correct this corrupted association.

- Procedures for upgrading if rootvg is on an ESS disk:

If rootvg is on an ESS device and cannot be moved to local scsi disks, all hdisks cannot be removed prior 
to the upgrade. In this case, the following procedure should be used to upgrade the SDD host attachment script 
to version 32.6.100.21 or later:

. Contact IBM Customer Support at 1-800-IBM-SERV to request a script to fix the corrupted ODM referenced above. 
. Without removing ESS hdisks, use smitty to upgrade the SDD host attachment script on the host system. 
. Immediately run the script to fix the corrupted ODM on the host system. 
. Run bosboot on the host system. 
. Reboot the host system so that the hdisks can be configured with the new ODM attributes. 
. Return to the "SDD/host attachment upgrade procedures" above and follow the appropriate upgrade steps now that 
  the SDD host attachment script upgrade is complete. 

This issue only occurs when upgrading to devices.fcp.disk.ibm2105.mpio.rte version 33.6.100.9 and SDD host 
attachment script ibm2105.rte version 32.6.100.21 and above.  
  
 


IBM Flash Alert: SDD 1.6.2.0 requires minimum AIX code levels; possible 0514-035 error:
---------------------------------------------------------------------------------------
 Flash (Alert) 
  
Abstract 
SDD 1.6.2.0 requires minimum AIX code levels. Not upgrading to correct AIX version and level can result in 
0514-035 error when attempting removal of dpo or vpath device  
  
Content 
Starting from SDD version 1.6.2.0, a unique ID attribute is added to SDD vpath devices, in order to 
support AIX5.3 VIO future features. AIX device configure methods have been changed in both AIX52 TL8 and 
AIX53 TL4 for this support.

Following are the requirements for this version of SDD with:

AIX5.2 and AIX5.3:  
AIX52 TL8 & above with PTF U804193 (IY76991)
AIX53 TL4 & above with PTF U804397 (IY76997)

Please view 1.6.2.0 readme for further details

If SDD is upgraded to 1.6.2.0 or above without first upgrading AIX to the levels listed above, the following 
error occurs when attempting to remove any vpath devices with either of these commands:

# rmdev -dl dpo -R

or

# rmdev -dl vpathX
                                                   
Method error (/usr/lib/methods/ucfgdevice):                           
0514-035 Cannot perform the requested function because of missing predefined information in the device 
configuration database. 

Solution:
1) Upgrade AIX to correct level and ptf, or
2) Contact SDD support at 1-800-IBM-SERV for steps to clean up ODM to allow for downgrading the SDD level 
   from 1.6.2.0, if unable to upgrade AIX to a newer technology level.  
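
Before upgrading SDD, the technology level can be checked against the minimums listed above. The function below is a rough sketch: the name sdd1620_level_ok is made up, it expects a level string of the form 5300-04 (as printed by oslevel -r), and it only covers the AIX 5.2 and 5.3 levels named in this alert. The PTF/APAR check is left as a commented-out hint.

```shell
#!/bin/sh
# sdd1620_level_ok LEVEL
# LEVEL looks like 5200-08 or 5300-04 (a trailing -SP part is ignored).
# Returns 0 if the level meets the minimum, 1 if it is too low,
# and 2 if the release is not one of those covered by the alert.
sdd1620_level_ok() {
    case "$1" in
        *-*) ;;
        *)   return 2 ;;
    esac
    rel=${1%%-*}      # e.g. 5300
    tl=${1#*-}        # e.g. 04 or 04-02
    tl=${tl%%-*}      # drop any service pack suffix
    tl=${tl#0}        # strip one leading zero (08 -> 8)
    case "$rel" in
        5200) [ "$tl" -ge 8 ] ;;
        5300) [ "$tl" -ge 4 ] ;;
        *)    return 2 ;;
    esac
}

# Intended use on the AIX host (commented out here):
#   sdd1620_level_ok "$(oslevel -r)" || echo "AIX level too low for SDD 1.6.2.0"
#   instfix -ik IY76991     # AIX 5.2: check that the required APAR is installed
#   instfix -ik IY76997     # AIX 5.3: check that the required APAR is installed
```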
 


Note 30:
--------

Suppose the following happens:

# rmdev -dRl fcs0

fcnet0 deleted
fscsi0 deleted
fcs0 deleted

# cfgmgr

Method error (/usr/lib/methods/cfgefscsi -l fscsi0 ):
        0514-061 Cannot find a child device.

root@n5114l02:/root#
adapter checked with several commands
connection with san seems impossible.
root@n5114l02:/root#lsattr -El fscsi0
attach       none         How this adapter is CONNECTED         False
dyntrk       no           Dynamic Tracking of FC Devices        True
fc_err_recov delayed_fail FC Fabric Event Error RECOVERY Policy True
scsi_id                   Adapter SCSI ID                       False
sw_fc_class  3            FC Class for Fabric                   True


Note 31:
--------

IY83872: AFTER CHVG -T, VG IS IN INCONSISTENT STATE 

 A fix is available 
 


APAR status
Closed as program error.

Error description 
#---------------------------------------------------
chvg -t renumbers pvs that have pv numbers greater than
maxpvs with the new factor. chvg -t only updates the
new pv_num in lvmrec and does not update the VGDA.
chvg -t leaves the vg in an inconsistent state, and any changes to
the vg may give unpredictable results, such as a system crash.
Local fix 
Problem summary 
#---------------------------------------------------
chvg -t renumbers pvs that have pv numbers greater than
maxpvs with the new factor. chvg -t only updates the
new pv_num in lvmrec and does not update the VGDA.
chvg -t leaves the vg in an inconsistent state, and any changes to
the vg may give unpredictable results, such as a system crash.
Problem conclusion 
Fix chvg -t to update the VGDA with the new pv number.
Add a check in hd_kextendlv to make sure that the pvol we
are trying to access is not null.
Temporary fix 
Comments 
APAR information 
APAR number IY83872 
Reported component name AIX 5.3 
Reported component ID 5765G0300 
Reported release 530 
Status CLOSED PER 
PE NoPE 
HIPER NoHIPER 
Submitted date 2006-04-11 
Closed date 2006-04-11 
Last modified date 2006-05-03 

APAR is sysrouted FROM one or more of the following:

APAR is sysrouted TO one or more of the following:

Publications Referenced


Fix information 
Fixed component name AIX 5.3 
Fixed component ID 5765G0300 

Applicable component levels 
R530 PSY U805071    UP06/05/03 I 1000 
 

 
Note 32:
========


ESB-2008.0267 -- [AIX] -- AIX Logical Volume Manager buffer overflow 

--------------------------------------------------------------------------------
 
Date: 14 March 2008 
AusCERT Reference #: ESB-2008.0267


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

===========================================================================
             AUSCERT External Security Bulletin Redistribution

                          ESB-2008.0267 -- [AIX]
                AIX Logical Volume Manager buffer overflow
                               14 March 2008

===========================================================================

        AusCERT Security Bulletin Summary
        ---------------------------------

Product:              AIX 5.2
                      AIX 5.3
Publisher:            IBM
Operating System:     AIX
Impact:               Root Compromise
Access:               Existing Account

Original Bulletin:    
http://www14.software.ibm.com/webapp/set2/subscriptions/pqvcmjd?mode=18&ID=4169

- --------------------------BEGIN INCLUDED TEXT--------------------

- -----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

IBM SECURITY ADVISORY

First Issued: Tue Jan 22 14:02:18 CST 2008
| Updated: Tue Mar 11 12:55:14 CDT 2008
| IZ10828 availability updated
===============================================================================
                           VULNERABILITY SUMMARY

VULNERABILITY:   AIX Logical Volume Manager buffer overflow

PLATFORMS:       AIX 5.2, 5.3

SOLUTION:        Apply the fix or workaround as described below.

THREAT:          A local attacker may execute arbitrary code with root
                 privileges.

CERT VU Number:  n/a
CVE Number:      n/a
===============================================================================
                           DETAILED INFORMATION

I. OVERVIEW

    The AIX Logical Volume Manager provides a suite of utilities for
    AIX logical volume management features and functions. The primary
    fileset for the AIX Logical Volume Manager is 'bos.rte.lvm'. In
    addition, AIX provides another suite of utilities for concurrent
    logical volume management across multiple hosts.  The primary
    fileset for the AIX Concurrent Logical Volume Manager is
    'bos.clvm.enh'. Several important commands provided by these
    filesets for performing various logical volume management tasks
    have been identified as containing buffer overflow
    vulnerabilities.

II. DESCRIPTION

    Buffer overflow vulnerabilities exist in the 'bos.rte.lvm' and
    'bos.clvm.enh' fileset commands listed below.  A local attacker
    may execute arbitrary code with root privileges because the
    commands are setuid root.  The local attacker must be a member of
    the 'system' group to execute these commands.

    The following 'bos.rte.lvm' commands are vulnerable:

        /usr/sbin/lchangevg
        /usr/sbin/ldeletepv
        /usr/sbin/putlvodm
        /usr/sbin/lvaryoffvg
        /usr/sbin/lvgenminor

    The following 'bos.clvm.enh' command is vulnerable:

        /usr/sbin/tellclvmd

III. IMPACT

    The successful exploitation of this vulnerability allows a
    non-privileged user to execute code with root privileges.

IV. PLATFORM VULNERABILITY ASSESSMENT

    To determine if your system is vulnerable, execute the following
    command:

    lslpp -L bos.rte.lvm bos.clvm.enh

    The following fileset levels are vulnerable:

    AIX Fileset        Lower Level       Upper Level
    ------------------------------------------------
    bos.rte.lvm        5.2.0.0           5.2.0.107
    bos.rte.lvm        5.3.0.0           5.3.0.61
    bos.clvm.enh       5.2.0.0           5.2.0.105
    bos.clvm.enh       5.3.0.0           5.3.0.60
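
    The table above can be turned into a small check. This is a sketch
    only, not part of the advisory: the function names ver_le and
    lvm_level_vulnerable are made up, and the level is expected to be
    passed in by hand (for example as read from the lslpp -L output).

```shell
#!/bin/sh
# ver_le A B: succeed if dotted version A <= B, compared field by field.
ver_le() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        for (i = 1; i <= (n > m ? n : m); i++) {
            if (x[i] + 0 < y[i] + 0) exit 0
            if (x[i] + 0 > y[i] + 0) exit 1
        }
        exit 0
    }'
}

# lvm_level_vulnerable FILESET LEVEL
# Succeeds if LEVEL is at or below the upper vulnerable level in the table.
lvm_level_vulnerable() {
    case "$1:$2" in
        bos.rte.lvm:5.2.*)  upper=5.2.0.107 ;;
        bos.rte.lvm:5.3.*)  upper=5.3.0.61 ;;
        bos.clvm.enh:5.2.*) upper=5.2.0.105 ;;
        bos.clvm.enh:5.3.*) upper=5.3.0.60 ;;
        *) return 1 ;;
    esac
    ver_le "$2" "$upper"
}

# Example (commented out here):
#   lvm_level_vulnerable bos.rte.lvm 5.3.0.50 && echo "bos.rte.lvm needs the fix"
```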

V. SOLUTIONS

    A. APARS

        IBM provides the following fixes:

        AIX Level           APAR number        Availability
        -----------------------------------------------------
        5.2.0               IZ00559            (available now)
|       5.2.0               IZ10828            05/07/2008
        5.3.0               IY98331            (available now)
        5.3.0               IY98340            (available now)
        5.3.0               IY99537            (available now)

        Subscribe to the APARs here:

        http://www.ibm.com/support/docview.wss?uid=isg1IZ00559
        http://www.ibm.com/support/docview.wss?uid=isg1IZ10828
        http://www.ibm.com/support/docview.wss?uid=isg1IY98331
        http://www.ibm.com/support/docview.wss?uid=isg1IY98340
        http://www.ibm.com/support/docview.wss?uid=isg1IY99537

        By subscribing, you will receive periodic email alerting you
        to the status of the APAR, and a link to download the fix once
        it becomes available.

    B. FIXES

        Fixes are available.  The fixes can be downloaded via ftp
        from:

        ftp://aix.software.ibm.com/aix/efixes/security/lvm_ifix.tar

        The link above is to a tar file containing this signed
        advisory, fix packages, and PGP signatures for each package.
        The fixes below include prerequisite checking. This will
        enforce the correct mapping between the fixes and AIX
        Technology Levels.

        AIX Fileset         AIX Level            Fix and Interim Fix
        -----------------------------------------------------------------
        bos.rte.lvm         5200-08              IZ10828_08.071212.epkg.Z
        bos.rte.lvm         5200-08              IZ00559_8a.071212.epkg.Z
        bos.clvm.enh        5200-08              IZ00559_8b.071212.epkg.Z

        bos.rte.lvm         5200-09              IZ10828_09.071212.epkg.Z
        bos.rte.lvm         5200-09              IZ00559_9a.071211.epkg.Z
        bos.clvm.enh        5200-09              IZ00559_9b.071211.epkg.Z

        bos.rte.lvm         5200-10              IZ10828_10.071212.epkg.Z
        bos.rte.lvm         5200-10              bos.rte.lvm.5.2.0.107.U
        bos.clvm.enh        5200-10              bos.clvm.enh.5.2.0.107.U

        bos.rte.lvm         5300-05              IY98331_05.071212.epkg.Z
        bos.rte.lvm         5300-05              IY99537_05.071212.epkg.Z
        bos.rte.lvm         5300-05              IY98340_5a.071211.epkg.Z
        bos.clvm.enh        5300-05              IY98340_5b.071211.epkg.Z

        bos.rte.lvm         5300-06              bos.rte.lvm.5.3.0.63.U
        bos.clvm.enh        5300-06              bos.clvm.enh.5.3.0.61.U

        To extract the fixes from the tar file:

        tar xvf lvm_ifix.tar
        cd lvm_ifix

        Verify you have retrieved the fixes intact:

        The checksums below were generated using the "sum", "cksum",
        "csum -h MD5" (md5sum), and "csum -h SHA1" (sha1sum) commands
        and are as follows:

        sum         filename
        ------------------------------------
        14660    17 IY98331_05.071212.epkg.Z
        26095     9 IY98340_5a.071211.epkg.Z
        40761     8 IY98340_5b.071211.epkg.Z
        10885    16 IY99537_05.071212.epkg.Z
        24909    10 IZ00559_8a.071212.epkg.Z
        64769     9 IZ00559_8b.071212.epkg.Z
        65110    10 IZ00559_9a.071211.epkg.Z
        25389     9 IZ00559_9b.071211.epkg.Z
        26812    26 IZ10828_08.071212.epkg.Z
        55064    26 IZ10828_09.071212.epkg.Z
        55484    26 IZ10828_10.071212.epkg.Z
        03885   157 bos.clvm.enh.5.2.0.107.U
        30581   128 bos.clvm.enh.5.3.0.61.U
        48971  1989 bos.rte.lvm.5.2.0.107.U
        64179  2603 bos.rte.lvm.5.3.0.63.U

        cksum              filename
        -------------------------------------------
        3121912357 16875   IY98331_05.071212.epkg.Z
        107751313  9190    IY98340_5a.071211.epkg.Z
        1129637178 7735    IY98340_5b.071211.epkg.Z
        4019303479 16201   IY99537_05.071212.epkg.Z
        1791374386 9289    IZ00559_8a.071212.epkg.Z
        3287090389 8299    IZ00559_8b.071212.epkg.Z
        565672617  9294    IZ00559_9a.071211.epkg.Z
        257555679  8302    IZ00559_9b.071211.epkg.Z
        3930477686 26525   IZ10828_08.071212.epkg.Z
        1199269029 26533   IZ10828_09.071212.epkg.Z
        358657844  26480   IZ10828_10.071212.epkg.Z
        3753492719 160768  bos.clvm.enh.5.2.0.107.U
        4180839749 131072  bos.clvm.enh.5.3.0.61.U
        3765659627 2036736 bos.rte.lvm.5.2.0.107.U
        3338925192 2665472 bos.rte.lvm.5.3.0.63.U

        csum -h MD5 (md5sum)              filename
        ----------------------------------------------------------
        73bcf7604dd13f26a7500e45468ff5f7  IY98331_05.071212.epkg.Z
        5f32179fc2156bb6e29e775aa7bff623  IY98340_5a.071211.epkg.Z
        7c47e56cadabcba0a105ffa7fc1d40fc  IY98340_5b.071211.epkg.Z
        ef3e4512c3b55091893ce733c707e1a2  IY99537_05.071212.epkg.Z
        db04be33e56169b6a8e8fd747e6948da  IZ00559_8a.071212.epkg.Z
        553f31ccf6a265333938d81eeae6dabc  IZ00559_8b.071212.epkg.Z
        2921b9d2a3dbd84591d60fddf0663798  IZ00559_9a.071211.epkg.Z
        93ce34dec8f4fa9681a2c7c86be065fc  IZ00559_9b.071211.epkg.Z
        e6b0a4a91ba197de0005bd800f06ba4e  IZ10828_08.071212.epkg.Z
        602a8c777cc27e51c3d3dbfa8ebd69be  IZ10828_09.071212.epkg.Z
        b84a5cae03921d30675e522da29da1aa  IZ10828_10.071212.epkg.Z
        2aa4b9b43ca55f74b0fac6be7bc48b66  bos.clvm.enh.5.2.0.107.U
        844e1f2ef9d388d2ddd8cf3ef6251f06  bos.clvm.enh.5.3.0.61.U
        0c73aa8f0211c400455feaa6fb8a95c4  bos.rte.lvm.5.2.0.107.U
        1b5a08eabe984d957db9a145e2a4fd06  bos.rte.lvm.5.3.0.63.U

        csum -h SHA1 (sha1sum)                    filename
        ------------------------------------------------------------------
        d9929214a4d85b986fb2e06c9b265c768c7178a9  IY98331_05.071212.epkg.Z
        0f5fbcdfbbbf505366dad160c8dec1c1ce75285e  IY98340_5a.071211.epkg.Z
        cf2cda3b8d19b73d06b69eeec7e4bae192bec689  IY98340_5b.071211.epkg.Z
        9d8727b5733bc34b8daba267b82864ef17b7156f  IY99537_05.071212.epkg.Z
        e7a366956ae7a08deb93cbd52bbbbf451d0f5565  IZ00559_8a.071212.epkg.Z
        1898733cdf6098e4f54ec36132a03ebbe0682a7e  IZ00559_8b.071212.epkg.Z
        f68c458c817f99730b193ecbd02ae24b9e51cc67  IZ00559_9a.071211.epkg.Z
        185954838c439a3c7f8e5b769aa6cc7d31123b59  IZ00559_9b.071211.epkg.Z
        6244138dc98f3fd16928b2bbcba3c5b4734e9942  IZ10828_08.071212.epkg.Z
        98bfaf44ba4bc6eba452ea074e276b8e87b41c9d  IZ10828_09.071212.epkg.Z
        2a9c0dd75bc79eba153d0a4e966d930151121d45  IZ10828_10.071212.epkg.Z
        96706ec5afd792852350d433d1bf8d8981b67336  bos.clvm.enh.5.2.0.107.U
        91f6d3a4d9ffd15d258f4bda51594dbce7011d8a  bos.clvm.enh.5.3.0.61.U
        4589a5bca998f437aac5c3bc2c222eaa51490dab  bos.rte.lvm.5.2.0.107.U
        3449afd795c24594c7a0c496f225c7148b4071ab  bos.rte.lvm.5.3.0.63.U

        To verify the sums, use the text of this advisory as input to
        csum, md5sum, or sha1sum. For example:

        csum -h SHA1 -i Advisory.asc
        md5sum -c Advisory.asc
        sha1sum -c Advisory.asc

        These sums should match exactly. The PGP signatures in the tar
        file and on this advisory can also be used to verify the
        integrity of the fixes.  If the sums or signatures cannot be
        confirmed, contact IBM AIX Security at
        security-alert@austin.ibm.com and describe the discrepancy.
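
        A single package can also be checked against its listed SHA-1
        digest by hand. The helper below is a sketch, not part of the
        advisory: the name verify_sha1 is made up, and it assumes that
        both sha1sum and "csum -h SHA1" print the digest as the first
        field of their output.

```shell
#!/bin/sh
# verify_sha1 FILE EXPECTED
# Succeeds if FILE has the SHA-1 digest EXPECTED.
# Uses sha1sum where available, otherwise falls back to "csum -h SHA1".
verify_sha1() {
    if command -v sha1sum >/dev/null 2>&1; then
        actual=$(sha1sum "$1" | awk '{ print $1 }')
    else
        actual=$(csum -h SHA1 "$1" | awk '{ print $1 }')
    fi
    [ "$actual" = "$2" ]
}

# Example with one digest from the table above (commented out here):
#   verify_sha1 IY98331_05.071212.epkg.Z d9929214a4d85b986fb2e06c9b265c768c7178a9 \
#       || echo "checksum mismatch"
```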

     C. FIX AND INTERIM FIX INSTALLATION

        IMPORTANT: If possible, it is recommended that a mksysb backup
        of the system be created.  Verify it is both bootable and
        readable before proceeding.

        To preview a fix installation:

        installp -a -d . -p all

        To install a fix package:

        installp -a -d . -X all

        Interim fixes have had limited functional and regression
        testing but not the full regression testing that takes place
        for Service Packs; thus, IBM does not warrant the fully
        correct functionality of an interim fix.

        Interim fix management documentation can be found at:

        http://www14.software.ibm.com/webapp/set2/sas/f/aix.efixmgmt/home.html

        To preview an interim fix installation:

        emgr -e ipkg_name -p         # where ipkg_name is the name of the
                                     # interim fix package being previewed.

        To install an interim fix package:

        emgr -e ipkg_name -X         # where ipkg_name is the name of the
                                     # interim fix package being installed.

VI. WORKAROUNDS

    There are two workarounds available.

    A. OPTION 1

        Change the permissions of these commands to remove the setuid
        bit using the following commands:

        chmod 500 /usr/sbin/lchangevg
        chmod 500 /usr/sbin/ldeletepv
        chmod 500 /usr/sbin/putlvodm
        chmod 500 /usr/sbin/lvaryoffvg
        chmod 500 /usr/sbin/lvgenminor
        chmod 500 /usr/sbin/tellclvmd

        NOTE: chmod will disable functionality of these commands for
        all users except root.
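
        After applying the workaround, it may be worth confirming that
        none of the commands still has the setuid bit. The check below
        is a sketch, not part of the advisory; the function name
        still_setuid is made up.

```shell
#!/bin/sh
# Print each of the given paths that exists and still has the setuid bit set.
still_setuid() {
    for f in "$@"; do
        [ -u "$f" ] && echo "$f"
    done
    return 0
}

# Intended use on the AIX host after the chmod commands (commented out here):
#   still_setuid /usr/sbin/lchangevg /usr/sbin/ldeletepv /usr/sbin/putlvodm \
#                /usr/sbin/lvaryoffvg /usr/sbin/lvgenminor /usr/sbin/tellclvmd
```

        No output means the setuid bit is gone from every file listed.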

    B. OPTION 2 (AIX 6.1, AIX 5.3 TL6 and TL7)

        Use the File Permissions Manager (fpm) command to manage
        setuid and setgid programs.

        fpm documentation can be found in the AIX 6 Security Redbook
        at:

        http://www.redbooks.ibm.com/abstracts/sg247430.html

        An fpm level of high will remove the setuid bit from the
        affected commands.  For example:

        fpm -l high -p    # to preview changes
        fpm -l high       # to execute changes

        NOTE: Please review the documentation before execution.  fpm
        will disable functionality of multiple commands for all users
        except root.

VII. OBTAINING FIXES

    AIX security related fixes can be downloaded from:

        ftp://aix.software.ibm.com/aix/efixes/security

    AIX fixes can be downloaded from:

        http://www.ibm.com/eserver/support/fixes/fixcentral/main/pseries/aix

    NOTE: Affected customers are urged to upgrade to the latest
    applicable Technology Level and Service Pack.

VIII. CONTACT INFORMATION

    If you would like to receive AIX Security Advisories via email,
    please visit:

        http://www14.software.ibm.com/webapp/set2/subscriptions/pqvcmjd
 
    Comments regarding the content of this announcement can be
    directed to:

        security-alert@austin.ibm.com

    To request the PGP public key that can be used to communicate
    securely with the AIX Security Team you can either:

        A. Send an email with "get key" in the subject line to:

            security-alert@austin.ibm.com

        B. Download the key from a PGP Public Key Server. The key ID is:

            0xA6A36CCC

    Please contact your local IBM AIX support center for any
    assistance.

    eServer is a trademark of International Business Machines
    Corporation.  IBM, AIX and pSeries are registered trademarks of
    International Business Machines Corporation.  All other trademarks
    are property of their respective holders.

IX. ACKNOWLEDGMENTS

    IBM discovered and fixed this vulnerability as part of its
    commitment to secure the AIX operating system.






31.6 Other filesystem commands:
===============================


df command:
-----------

df Command

Purpose
Reports information about space on file systems. This document describes the AIX df command as well as 
the System V version of df.

Syntax
df [ [ -P ] | [  -I | -M | -i | -t | -v ] ] [ -k ] [ -m ] [ -g ] [ -s ] [FileSystem ... | File... ]

Description
The df command displays information about total space and available space on a file system. 
The FileSystem parameter specifies the name of the device on which the file system resides, the directory 
on which the file system is mounted, or the relative path name of a file system. The File parameter specifies 
a file or a directory that is not a mount point. 
If the File parameter is specified, the df command displays information for the file system on which the file 
or directory resides. 
If you do not specify the FileSystem or File parameter, the df command displays information for all 
currently mounted file systems. 
File system statistics are displayed in units of 512-byte blocks by default.
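
Since the default unit is the 512-byte block, a quick conversion to MB is often handy. The following is a minimal sketch; the function name blocks512_to_mb is made up for this note and is not an AIX command.

```shell
#!/bin/sh
# Convert a count of 512-byte blocks (the df default unit) to megabytes.
# blocks512_to_mb is a hypothetical helper name, not part of AIX.
blocks512_to_mb() {
    # 1 MB = 1048576 bytes = 2048 blocks of 512 bytes
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 2048 }'
}

# Example: a file system reported as 32768 blocks of 512 bytes
blocks512_to_mb 32768
```

On a live system the same figure is available directly from df -m, so the helper is mainly useful when reading old df listings.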

The df command gets file system space statistics from the statfs system call. However, specifying the -s flag 
gets the statistics from the virtual file system (VFS) specific file system helper. If you do not specify 
arguments with the -s flag and the helper fails to get the statistics, the statfs system call statistics 
are used. Under certain exceptional conditions, such as when a file system is being modified while 
the df command is running, the statistics displayed by the df command might not be accurate.

Note:
Some remote file systems, such as the Network File System (NFS), do not provide all the information 
that the df command needs. The df command prints blanks for statistics that the server does not provide.

flags:

-g Displays statistics in units of GB blocks. The output values for the file system statistics would be in floating point numbers 
  as value of each unit in bytes is significantly high. 
-i Displays the number of free and used i-nodes for the file system; this output is the default when the specified file system is mounted. 
-I Displays information on the total number of blocks, the used space, the free space, the percentage of used space, and the mount point for the file system. 
-k Displays statistics in units of 1024-byte blocks. 
-m Displays statistics in units of MB blocks. The output values for the file system statistics would be in floating point numbers 
   as value of each unit in bytes is significantly high. 
-M Displays the mount point information for the file system in the second column. 
-P Displays information on the file system in POSIX portable format.  
-s Gets file system statistics from the VFS specific file system helper instead of the statfs system call.
   Any arguments given when using the -s flag must be a JFS or Enhanced JFS filesystem mount point or device. 
   The filesystem must also be listed in /etc/filesystems. 
-t Includes figures for total allocated space in the output. 
-v Displays all information for the specified file system. 

examples:

To display information about all mounted file systems, enter: 

df
If your system has the /, /usr, /site, and /usr/venus file systems mounted, the output from the df command 
resembles the following: 

Filesystem 512-blocks Free   %Used   Iused  %Iused  Mounted on
/dev/hd0    19368     9976    48%     4714    5%     /
/dev/hd1    24212     4808    80%     5031   19%     /usr
/dev/hd2     9744     9352     4%     1900    4%     /site
/dev/hd3     3868     3856     0%      986    0%     /usr/venus 


To display information about /test file system in 1024-byte blocks, enter: 
df -k /test
Filesystem    1024 blocks    Free    %Used   Iused  %Iused  Mounted on 
/dev/lv11         16384     15824       4%      18      1%  /tmp/ravi1
This displays the file system statistics in 1024-byte disk blocks. 


To display information about /test file system in MB blocks, enter: 
df -m /test
Filesystem    MB blocks    Free    %Used    Iused  %Iused  Mounted on 
/dev/lv11       16.00     15.46       4%       18      1%  /tmp/ravi1
This displays file system statistics in MB disk blocks rounded off to nearest 2nd decimal digit. 


To display information about the /test file system in GB blocks, enter: 
df -g /test
Filesystem    GB blocks   Free     %Used    Iused  %Iused  Mounted on 
/dev/lv11          0.02   0.02        0%       18      1%  /tmp/ravi1
This displays file system statistics in GB disk blocks rounded off to nearest 2nd decimal digit. 


To display available space on the file system in which your current directory resides, enter: 

cd /
df .
The output from this command resembles the following: 

Device   512-blocks  free   %used   iused   %iused  Mounted on
/dev/hd4    19368    9976    48%     4714    5%     / 
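
The listings above can be filtered for file systems that are running full. This is only a sketch: the function name full_filesystems is made up here, and it assumes the seven-column AIX layout shown above (with %Used in column 4) and mount points without spaces. Lines with blank statistics, as df prints for some NFS mounts, would not be parsed correctly.

```shell
#!/bin/sh
# Print mount points whose %Used is at or above a limit (default 90).
# Reads AIX-style df output on stdin; full_filesystems is a hypothetical name.
full_filesystems() {
    limit=${1:-90}
    awk -v limit="$limit" '
        NR == 1 { next }                 # skip the header line
        {
            used = $4                    # 4th column is %Used, e.g. 48%
            sub(/%/, "", used)           # strip the percent sign
            if (used + 0 >= limit) print $NF, used "%"
        }'
}

# Demo with two lines from the sample listing above
printf '%s\n' \
    'Filesystem 512-blocks Free   %Used   Iused  %Iused  Mounted on' \
    '/dev/hd0    19368     9976    48%     4714    5%     /' \
    '/dev/hd1    24212     4808    80%     5031   19%     /usr' |
    full_filesystems 80
```

On an AIX box the intended use would be: df | full_filesystems 90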


The defragfs command:
---------------------

defragfs Command

Purpose
Increases a file system's contiguous free space.

Syntax
defragfs [ -q | -r | -s] { Device | FileSystem }

Description
The defragfs command increases a file system's contiguous free space by reorganizing allocations to be 
contiguous rather than scattered across the disk. The file system to be defragmented can be specified 
with the Device variable, which is the path name of the logical volume (for example, /dev/hd4). 
It can also be specified with the FileSystem variable, which is the mount point in the /etc/filesystems file.

The defragfs command is intended for fragmented and compressed file systems. However, you can use 
the defragfs command to increase contiguous free space in nonfragmented file systems.

You must mount the file system read-write for this command to run successfully. Using the -q flag, 
the -r flag or the -s flag generates a fragmentation report. These flags do not alter the file system.

The defragfs command is slow against a JFS2 file system with a snapshot due to the amount of data 
that must be copied into snapshot storage object. The defragfs command issues a warning message 
if there are snapshots. The snapshot command can be used to delete the snapshots and then used again 
to create a new snapshot after the defragfs command completes.

Flags

-q Reports the current state of the file system. 
-r Reports the current state of the file system and the state that would result if 
   the defragfs command is run without either the -q, -r or -s flag. 
-s Reports the fragmentation in the file system. This option causes defragfs to pass through 
   meta data in the file system which may result in degraded performance. 

Output
On a JFS filesystem, the definitions for the messages reported by the defragfs command are as follows:

Number of free fragments 
The number of free fragments in the file system. 
Number of allocated fragments 
The number of allocated fragments in the file system. 
Number of free spaces shorter than a block 
The number of free spaces within the file system that are shorter than a block. A free space is a set of contiguous fragments that are not allocated. 
Number of free fragments in short free spaces 
The total number of fragments in all the short free spaces. A short free space is one that is shorter than a block. 
Number of fragments moved 
The total number of fragments moved. 
Number of logical blocks moved 
The total number of logical blocks moved. 
Number of allocation attempts 
The number of times free fragments were reallocated. 
Number of exact matches 
The number of times the fragments that are moved would fit exactly in some free space. 
Total number of fragments 
The total number of fragments in the file system. 
Number of fragments that may be migrated 
The number of fragments that may be moved during defragmentation. 
FileSystem filesystem is n percent fragmented 
Shows to what extent the file system is fragmented in percentage. 
On a JFS2 filesystem the definitions for the messages reported by the defragfs command are as follows:

Total allocation groups 
The number of allocation groups in the file system. Allocation groups divide the space on a file system into chunks. Allocation groups allow JFS2 resource allocation policies to use well known methods for achieving good I/O performance. 
Allocation groups defragmented 
The number of allocation groups that were defragmented. 
Allocation groups skipped - entirely free 
The number of allocation groups that were skipped because they were entirely free. 
Allocation groups skipped - too few free blocks 
The number of allocation groups that were skipped because there were too few free blocks in them for reallocation. 
Allocation groups skipped - contains a large contiguous free space 
The number of allocation groups that were skipped because they contained a large contiguous free space which is not worth defragmenting. 
Allocation groups are candidates for defragmenting 
The number of allocation groups that are fit for defragmenting. 
Average number of free runs in candidate allocation groups 
The average number of free runs per allocation group, for allocation groups that are found fit for defragmentation. A free run is a contiguous set of blocks which are not allocated. 
Total number of blocks 
The total number of blocks in the file system. 
Number of blocks that may be migrated 
The number of blocks that may be moved during defragmentation. 
FileSystem filesystem is n percent fragmented 
Shows to what extent the file system is fragmented in percentage. 


Examples:
To defragment the /data1 file system located on the /dev/lv00 logical volume, enter: 
defragfs /dev/lv00

To defragment the /data1 file system by specifying its mount point, enter: 
defragfs /data1

To generate a report on the /data1 file system that indicates its current status as well as its status 
after being defragmented, enter: 
defragfs  -r /data1

To generate a report on the fragmentation in the /data1 file system, enter: 
defragfs -s /data1
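
To use the report in a script, the percentage can be pulled out of the "is n percent fragmented" message. This is a sketch only: the function name frag_percent is made up, and the exact wording of the message is assumed from the message list above, not checked against a live system.

```shell
#!/bin/sh
# Extract n from a line such as "/data1 filesystem is n percent fragmented".
# frag_percent is a hypothetical helper; the message wording is an assumption.
frag_percent() {
    sed -n 's/.* is \([0-9][0-9]*\) percent fragmented.*/\1/p'
}

# Demo on a made-up report line
echo '/data1 filesystem is 12 percent fragmented' | frag_percent
```

Intended use on AIX would be: defragfs -r /data1 | frag_percent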


The fsck command:
-----------------

Purpose
Checks file system consistency and interactively repairs the file system.

Syntax
fsck [ -n ] [ -p ] [ -y ] [ -dBlockNumber ] [ -f ] [ -ii-NodeNumber ] [ -o Options ] [ -tFile ] 
     [ -V VfsName ] [ FileSystem1 - FileSystem2 ... ]

Description
Attention: Always run the fsck command on file systems after a system malfunction. Corrective actions 
may result in some loss of data. The default action for each consistency correction is to wait for the operator 
to enter yes or no. If you do not have write permission for an affected file system, the fsck command defaults 
to a no response in spite of your actual response.

Notes:
The fsck command does not make corrections to a mounted file system. 
The fsck command can be run on a mounted file system for reasons other than repairs. 
However, inaccurate error messages may be returned when the file system is mounted. 
The fsck command checks and interactively repairs inconsistent file systems. You should run this command 
before mounting any file system. You must be able to read the device file on which the file system resides 
(for example, the /dev/hd0 device). Normally, the file system is consistent, and the fsck command merely reports 
on the number of files, used blocks, and free blocks in the file system. If the file system is inconsistent, 
the fsck command displays information about the inconsistencies found and prompts you for permission to repair them.

The fsck command is conservative in its repair efforts and tries to avoid actions that might result in the 
loss of valid data. In certain cases, however, the fsck command recommends the destruction of a damaged file. 
If you do not allow the fsck command to perform the necessary repairs, an inconsistent file system may result. 
Mounting an inconsistent file system may result in a system crash.

If a JFS2 file system has snapshots, the fsck command will attempt to preserve them. If this action fails, 
the snapshots cannot be guaranteed to contain all of the before-images from the snapped file system. 
The fsck command will delete the snapshots and the snapshot logical volumes.

If you do not specify a file system with the FileSystem parameter, the fsck command checks all file systems 
listed in the /etc/filesystems file for which the check attribute is set to True. You can enable this type of 
checking by adding a line in the stanza, as follows:

check=true
You can also perform checks on multiple file systems by grouping the file systems in the /etc/filesystems file. 
To do so, change the check attribute in the /etc/filesystems file as follows:

check=Number
The Number parameter tells the fsck command which group contains a particular file system. 
File systems that use a common log device should be placed in the same group. File systems are checked, 
one at a time, in group order, and then in the order that they are listed in the /etc/filesystems file. 
All check=true file systems are in group 1. The fsck command attempts to check the root file system before 
any other file system regardless of the order specified on the command line or in the /etc/filesystems file.
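
To see which file systems a plain fsck would pick up, the check attribute can be listed per stanza. The sketch below reads stanza text in the /etc/filesystems layout on stdin. The function name checked_filesystems is made up for this note, and the parsing assumes one attribute per line and comment lines starting with an asterisk.

```shell
#!/bin/sh
# List stanzas whose check attribute is anything other than false.
# Prints "mountpoint value"; checked_filesystems is a hypothetical name.
checked_filesystems() {
    awk '
        /^\*/ { next }                   # skip comment lines
        /^[^ \t*].*:[ \t]*$/ {           # stanza header such as /home:
            fs = $1
            sub(/:$/, "", fs)
            next
        }
        {
            line = $0
            gsub(/[ \t]/, "", line)      # "check = true" becomes "check=true"
            if (line ~ /^check=/) {
                val = substr(line, 7)    # text after "check="
                if (val != "false") print fs, val
            }
        }'
}

# Demo on a small made-up stanza file
printf '%s\n' \
    '/:' \
    '        dev             = /dev/hd4' \
    '        check           = false' \
    '/home:' \
    '        dev             = /dev/hd1' \
    '        check           = true' |
    checked_filesystems
```

Intended use: checked_filesystems < /etc/filesystems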

The fsck command checks for the following inconsistencies:

-Blocks or fragments allocated to multiple files. 
-i-nodes containing block or fragment numbers that overlap. 
-i-nodes containing block or fragment numbers out of range. 
-Discrepancies between the number of directory references to a file and the link count of the file. 
-Illegally allocated blocks or fragments. 
-i-nodes containing block or fragment numbers that are marked free in the disk map. 
-i-nodes containing corrupt block or fragment numbers. 
-A fragment that is not the last disk address in an i-node. This check does not apply to compressed file systems. 
-Files larger than 32KB containing a fragment. This check does not apply to compressed file systems. 
-Size checks: 
 Incorrect number of blocks. 
 Directory size not a multiple of 512 bytes.
 These checks do not apply to compressed file systems. 
-Directory checks: 
 Directory entry containing an i-node number marked free in the i-node map. 
 i-node number out of range. 
 Dot (.) link missing or not pointing to itself. 
 Dot dot (..) link missing or not pointing to the parent directory. 
 Files that are not referenced or directories that are not reachable.
-Inconsistent disk map. 
-Inconsistent i-node map.
-Orphaned files and directories (those that cannot be reached) are, if you allow it, reconnected by placing them 
 in the lost+found subdirectory in the root directory of the file system. The name assigned is the i-node number. 
 If you do not allow the fsck command to reattach an orphaned file, it requests permission to destroy the file.

In addition to its messages, the fsck command records the outcome of its checks and repairs through its exit value. 
This exit value can be any sum of the following conditions:

0 All checked file systems are now okay. 
2 The fsck command was interrupted before it could complete checks or repairs. 
4 The fsck command changed the file system; the user must restart the system immediately. 
8 The file system contains unrepaired damage. 
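
Because the exit value is a sum of the conditions above, it can be decoded bit by bit. A minimal sketch follows; the function name decode_fsck_status and the message texts are made up for this note.

```shell
#!/bin/sh
# Decode an fsck exit value (a sum of 2, 4 and 8) into readable lines.
# decode_fsck_status is a hypothetical helper name.
decode_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "all checked file systems are okay"
        return 0
    fi
    if [ $((status & 2)) -ne 0 ]; then
        echo "interrupted before checks or repairs completed"
    fi
    if [ $((status & 4)) -ne 0 ]; then
        echo "file system changed; restart the system"
    fi
    if [ $((status & 8)) -ne 0 ]; then
        echo "unrepaired damage remains"
    fi
    return 0
}

# Example: 12 = 4 + 8
decode_fsck_status 12
```

Typical use would be to run fsck and then pass $? to the function, for example: fsck -p /dev/hd1; decode_fsck_status $?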

When the system is booted from a disk, the boot process explicitly runs the fsck command, 
specified with the -f and -p flags on the /, /usr, /var, and /tmp file systems. If the fsck command 
is unsuccessful on any of these file systems, the system does not boot. Booting from removable media and 
performing maintenance work will then be required before such a system will boot.

If the fsck command successfully runs on /, /usr, /var, and /tmp, normal system initialization continues. 
During normal system initialization, the fsck command specified with the -f and -p flags runs from the 
/etc/rc file. This command sequence checks all file systems in which the check attribute is set to True (check=true). 
If the fsck command executed from the /etc/rc file is unable to guarantee the consistency of any file system, 
system initialization continues. However, the mount of any inconsistent file systems may fail. 
A mount failure may cause incomplete system initialization.

Note:
By default, the /, /usr, /var, and /tmp file systems have the check attribute set to False (check=false) 
in their /etc/filesystems stanzas. The attribute is set to False for the following reasons: 
The boot process explicitly runs the fsck command on the /, /usr, /var, and /tmp file systems. 
The /, /usr, /var, and /tmp file systems are mounted when the /etc/rc file is executed. The fsck command 
will not modify a mounted file system. Furthermore, the fsck command run on a mounted file system produces 
unreliable results.
You can use the File Systems application in Web-based System Manager (wsm) to change file system characteristics. 
You could also use the System Management Interface Tool (SMIT) smit fsck fast path to run this command.

Flags

-dBlockNumber Searches for references to a specified disk block. Whenever the fsck command encounters a file that 
contains a specified block, it displays the i-node number and all path names that refer to it. 
For JFS2 filesystems, the i-node numbers referencing the specified block will be displayed but not 
their path names. 
-f Performs a fast check. Under normal circumstances, the only file systems likely to be affected by halting 
the system without shutting down properly are those that are mounted when the system stops. The -f flag prompts 
the fsck command not to check file systems that were unmounted successfully. The fsck command determines this 
by inspecting the s_fmod flag in the file system superblock. 
This flag is set whenever a file system is mounted and cleared when it is unmounted successfully. 
If a file system is unmounted successfully, it is unlikely to have any problems. Because most file systems 
are unmounted successfully, not checking those file systems can reduce the checking time.
 
-ii-NodeNumber Searches for references to a specified i-node. Whenever the fsck command encounters a directory 
 reference to a specified i-node, it displays the full path name of the reference. 
-n Assumes a no response to all questions asked by the fsck command; does not open the specified file system 
 for writing. 
-o Options Passes comma-separated options to the fsck command. The following options are currently supported 
 for JFS (these options are obsolete for newer file systems and can be ignored): 
mountable 
Causes the fsck command to exit with success, returning a value of 0, if the file system in question is mountable (clean). 
If the file system is not mountable, the fsck command exits returning with a value of 8. 
mytype 
Causes the fsck command to exit with success (0) if the file system in question is of the same type as either specified in the 
/etc/filesystems file or by the -V flag on the command line. Otherwise, 8 is returned. For example, 
fsck -o mytype -V jfs / exits with a value of 0 if / (the root file system) is a journaled file system.  
-p Does not display messages about minor problems but fixes them automatically. This flag does not grant the wholesale license that the -y flag does and is useful for performing automatic checks when the system is started normally. You should use this flag as part of the system startup procedures, whenever the system is being run automatically. 
If the primary superblock is corrupt, the secondary superblock is verified and copied to the primary superblock. 
-tFile Specifies a File parameter as a scratch file on a file system other than the one being checked, if the fsck command cannot obtain enough memory to keep its tables. If you do not specify the -t flag and the fsck command needs a scratch file, it prompts you for the name of the scratch file. However, if you have specified the -p flag, the fsck command is unsuccessful. If the scratch file is not a special file, it is removed when the fsck command ends. 
-V VfsName Uses the description of the virtual file system specified by the VFSName variable for the file system instead of using the /etc/filesystems file to determine the description. If the -V VfsName flag is not specified on the command line, the /etc/filesystems file is checked and the vfs=Attribute of the matching stanza is assumed to be the correct file system type. 
-y Assumes a yes response to all questions asked by the fsck command. This flag lets the fsck command take any action it considers necessary. Use this flag only on severely damaged file systems. 

Examples
To check all the default file systems, enter: 

fsck
This command checks all the file systems marked check=true in the /etc/filesystems file. 
This form of the fsck command asks you for permission before making any changes to a file system.

To fix minor problems with the default file systems automatically, enter: 

fsck -p
To check a specific file system, enter: 

fsck /dev/hd1
This command checks the unmounted file system located on the /dev/hd1 device.



31.6 DESCRIPTOR AREA'S:
-----------------------

- 1. VOLUME GROUP DESCRIPTOR AREA, VGDA 

Global to the VG:
The VGDA, located at the beginning of each physical volume, contains information that describes all
the LV's and all the PV's that belong to the VG of which that PV is a member.
The VGDA makes a VG self-describing. An AIX System can read the VGDA on a disk, and from that, can
determine what PV's and LV's are part of this VG.
There are one or two copies per disk.

- 2. VOLUME GROUP STATUS AREA, VGSA

Tracks the state of mirrored copies.
The VGSA contains state information about physical partitions and physical volumes.
For example, the VGSA knows if one PV in a VG is unavailable.

Each PV has at least one VGDA/VGSA. The number of VGDA's contained on a single disk
varies according to the number of disks in the VG.
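
As an illustration of how the VGDA count depends on the number of disks, the sketch below encodes the usual layout for a standard (non-scalable) VG: a single disk carries two VGDAs, in a two-disk VG the first disk carries two and the second one, and with three or more disks each disk carries one. The function name total_vgdas is made up for this note, and the rule is stated here as an assumption about standard VGs only.

```shell
#!/bin/sh
# Total number of VGDAs in a standard VG with the given number of PVs.
# total_vgdas is a hypothetical helper; the rule is an assumption for
# standard (non-scalable) volume groups.
total_vgdas() {
    pvs=$1
    case "$pvs" in
        1) echo 2 ;;          # one disk: two VGDAs on it
        2) echo 3 ;;          # two disks: two on the first, one on the second
        *) echo "$pvs" ;;     # three or more: one VGDA per disk
    esac
}

total_vgdas 2
```

This agrees with the lqueryvg listing in section 31.7, which shows "PV count: 2" together with "Total VGDAs: 3".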

- 3. LOGICAL VOLUME CONTROL BLOCK, LVCB

Contains LV attributes (policies, number of copies).
The LVCB is located at the start of every LV. It contains information about the logical volume. 
You can, however, use the mklv command with the -T option to request that the LVCB not
be stored at the beginning of the LV. 

With Scalable VG's, LVCB info is no longer stored in the first user block of any LV.
All relevant LVCB info is kept in the VGDA.



31.7 The lqueryvg command:
--------------------------

The lqueryvg command reads the VGDA from a specified disk in a VG.

Example:

# lqueryvg -p hdisk1 -At
# lqueryvg -Atp hdisk0

-p: which PV
-A: show all available information
-t: show descriptive tags

Example:

#lqueryvg -Atp hdisk0
Max LVs:        256
PP Size:        25
Free PPs:       468
LV count:       20
PV count:       2
Total VGDAs:    3
Conc Allowed:   0
MAX PPs per PV  1016
MAX PVs:        32
Conc Autovaryo  0
Varied on Conc  0
Logical:        00c665ed00004c0000000112b7408848.1   hd5 1
                00c665ed00004c0000000112b7408848.2   hd6 1
                00c665ed00004c0000000112b7408848.3   hd8 1
                00c665ed00004c0000000112b7408848.4   hd4 1
                00c665ed00004c0000000112b7408848.5   hd2 1
                00c665ed00004c0000000112b7408848.6   hd9var 1
                00c665ed00004c0000000112b7408848.7   hd3 1
                00c665ed00004c0000000112b7408848.8   hd1 1
                00c665ed00004c0000000112b7408848.9   hd10opt 1
                00c665ed00004c0000000112b7408848.10  hd7 1
                00c665ed00004c0000000112b7408848.11  hd7x 1
                00c665ed00004c0000000112b7408848.12  beheerlv 1
                00c665ed00004c0000000112b7408848.13  varperflv 1
                00c665ed00004c0000000112b7408848.14  loglv00 1
                00c665ed00004c0000000112b7408848.15  db2_server_v8 1
                00c665ed00004c0000000112b7408848.16  db2_var_v8 1
                00c665ed00004c0000000112b7408848.17  db2_admin_v8 1
                00c665ed00004c0000000112b7408848.18  db2_adminlog_v8 1
                00c665ed00004c0000000112b7408848.19  db2_dasscr_v8 1
                00c665ed00004c0000000112b7408848.20  db2_Fixpak10 1
Physical:       00c665edb74079bc                2   0
                00c665edb7f2987a                1   0
Total PPs:      1022
LTG size:       128
HOT SPARE:      0
AUTO SYNC:      0
VG PERMISSION:  0
SNAPSHOT VG:    0
IS_PRIMARY VG:  0
PSNFSTPP:       4352
VARYON MODE:    0
VG Type:        0
Max PPs:        32512
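
The LV names can be picked out of such a listing with a small filter. This is a sketch that assumes the tagged layout shown above ("Logical:" on the first LV line, indented continuation lines, then the next tag). The function name lv_names is made up for this note.

```shell
#!/bin/sh
# Print the LV names from "lqueryvg -At" style output read on stdin.
# lv_names is a hypothetical helper name.
lv_names() {
    awk '
        /^Logical:/ { in_lv = 1; print $3; next }   # first LV shares the tag line
        /^[^ \t]/   { in_lv = 0 }                   # any other tag ends the section
        in_lv && NF >= 2 { print $2 }               # continuation: lvid name state
    '
}

# Demo on a shortened copy of the listing above
printf '%s\n' \
    'LV count:       2' \
    'Logical:        00c665ed00004c0000000112b7408848.1   hd5 1' \
    '                00c665ed00004c0000000112b7408848.2   hd6 1' \
    'Physical:       00c665edb74079bc                2   0' |
    lv_names
```

Intended use on AIX: lqueryvg -Atp hdisk0 | lv_names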





31.8 The lquerypv command:
--------------------------

-------
How do I find the maximum supported logical track group (LTG) size of my hard disk? 

You can use the lquerypv command with the -M flag. The output gives the LTG size in KB. For instance, 
the LTG size for hdisk0 in the following example is 256 KB.

/usr/sbin/lquerypv -M hdisk0
256
------ 

To find out which executable produced a core file, run 

lquerypv -h core 6b0 

The dump shows the name of the executable (probably man in this case, but man may have 
called something else in the background). 

Then run 

dbx /path/to/executable core 

and run the subcommand 

dbx> where 

The stack output should show where the program failed. When reporting the problem, paste 
the stack output together with the fileset level of the executable: 

lslpp -w /path/to/executable -> this will give fileset_name 
lslpp -l fileset_name 

-------

How can a storage lock on a SAN disk be broken?
You finally get the long-awaited SAN disk, and then no volume group can be created on it. 

# mkvg -f vpath100 

returns an I/O error. What can you do? 
Most likely there is still a lock on the SAN disk. It can be broken with the command 

# lquerypv -ch /dev/vpath100

after which the volume group can be created. 


-------

# lquerypv -h /dev/hdisk9 80 10
  00000080   00001155 583CD4B0 00000000 00000000  |...UX<..........|


# lquerypv -h /dev/hdisk1
00000000   C9C2D4C1 00000000 00000000 00000000  |................|
00000010   00000000 00000000 00000000 00000000  |................|
00000020   00000000 00000000 00000000 00000000  |................|
00000030   00000000 00000000 00000000 00000000  |................|
00000040   00000000 00000000 00000000 00000000  |................|
00000050   00000000 00000000 00000000 00000000  |................|
00000060   00000000 00000000 00000000 00000000  |................|
00000070   00000000 00000000 00000000 00000000  |................|
00000080   00C665ED B7F2987A 00000000 00000000  |..e....z........|
00000090   00000000 00000000 00000000 00000000  |................|
000000A0   00000000 00000000 00000000 00000000  |................|
000000B0   00000000 00000000 00000000 00000000  |................|
000000C0   00000000 00000000 00000000 00000000  |................|
000000D0   00000000 00000000 00000000 00000000  |................|
000000E0   00000000 00000000 00000000 00000000  |................|
000000F0   00000000 00000000 00000000 00000000  |................|

# lquerypv -h /dev/hdisk0 80 10
00000080   00C665ED B74079BC 00000000 00000000  |..e..@y.........|
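
The first two words at offset 80 in these dumps match the PVIDs in the "Physical:" lines of the lqueryvg listing in section 31.7 (00c665edb74079bc and 00c665edb7f2987a), so the dump line can be turned into a PVID string. This is a sketch; the function name pvid_from_dump is made up for this note.

```shell
#!/bin/sh
# Build a PVID string from the offset-80 line of "lquerypv -h" output.
# pvid_from_dump is a hypothetical helper name.
pvid_from_dump() {
    # Join the first two 32-bit words and lowercase them.
    awk '$1 == "00000080" { print tolower($2 $3); exit }'
}

# Demo on the dump line shown above
echo '00000080   00C665ED B74079BC 00000000 00000000  |..e..@y.........|' |
    pvid_from_dump
```

Intended use on AIX: lquerypv -h /dev/hdisk0 80 10 | pvid_from_dump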



31.9 The getlvcb command:
-------------------------

The LVCB stores the attributes of an LV. The getlvcb command reads the LVCB of a specified LV
and displays its contents as formatted output.

Example:

# getlvcb -At hd2

# getlvcb -TA hd3 
Displays the information held in the LVCB of LV hd3. 


31.10 The putlvcb command:
--------------------------

The putlvcb command writes control block information (only the specified fields) into block 0
of a logical volume (the LVCB).


# putlvcb -t jfs lvdata
Writes the LV type jfs to the LVCB of LV lvdata. 




32. Some Filesystem related errors in AIX:
==========================================



32.1 The root / Filesystem is full:
===================================


Dealing with a 100% full root (/) filesystem in AIX

First rule: do NOT reboot.
Run chfs -a size=+1 / and press Enter. The root filesystem will be increased by one
physical partition.

If the box has been rebooted, has been shut down, or has crashed, do the following:

Load the AIX Installation CD #1 and type shutdown -Fr.
Upon re-boot press F1 to enter the Systems Management Services (SMS) Menu.
Click on the Multi-Boot icon.


The bootlist needs to be changed so that CD0 is the first boot device.
Shutdown and re-boot.

Press F1 and enter.
Press 1 and enter.
Select Maintenance Mode option (3?).
Select Access a Root Volume Group.
Select  the option that does NOT mount the filesystems.
At the prompt, type mount /dev/hd4 /mnt (/dev/hd4 is where the root filesystem lives).
At the prompt, type mount /dev/hd2 /usr

Type df and enter.  Note filesystem sizes.

Now, chfs -a size=+1 /

Type:  df and enter.  Note that the filesystem / is larger.
Type:  sync

You need to change your bootlist to boot off of hdisk0:
Type:  bootlist -m normal hdisk0 hdisk1 rmt0 cd0 and press Enter.
Type:  shutdown -Fr.

The system will reboot and should come back online in its proper state.



32.2 Fixing ODM problems on a VG which is not the rootvg:
=========================================================

In the following example, the VG is called "myvg" and consists of the Physical Volume hdisk3.

1. Unmount all filesystems in that VG first, otherwise you cannot vary off the VG.
Then vary off the VG.

# varyoffvg myvg

2. Now remove the complete information of that VG from ODM. The VGDA and LVCB
on the actual disks are NOT touched by the exportvg command.

# exportvg myvg

3. Now import the VG and create new ODM objects associated with that VG:

# importvg -y myvg hdisk3

You only need to specify one intact PV of the VG in the above command. Any disk in the VG
will have a VGDA which contains all necessary information.
The importvg command reads the VGDA and LVCB on that disk and creates completely new ODM entries.
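
The three steps can be wrapped in a small script. The sketch below only prints the commands by default and runs them when RUN=1 is set. The function name rebuild_vg_odm and the RUN switch are made up for this note. It assumes all filesystems of the VG are already unmounted.

```shell
#!/bin/sh
# rebuild_vg_odm VG PV: print (default) or run the varyoff/export/import steps.
# Hypothetical helper; set RUN=1 in the environment to execute for real.
rebuild_vg_odm() {
    vg=$1
    pv=$2
    if [ -z "$vg" ] || [ -z "$pv" ]; then
        echo "usage: rebuild_vg_odm VG PV" >&2
        return 1
    fi
    for cmd in "varyoffvg $vg" "exportvg $vg" "importvg -y $vg $pv"; do
        if [ "${RUN:-0}" = "1" ]; then
            $cmd || return 1      # stop at the first failing step
        else
            echo "$cmd"           # dry run: show what would be executed
        fi
    done
}

# Dry run for the example VG
rebuild_vg_odm myvg hdisk3
```

Printing first makes it easy to check the VG and PV names before anything is removed from the ODM.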



32.3 Fixing ODM problems on the rootvg:
=======================================

rvgrecover:
-----------

You can try to use the "rvgrecover" shell script.
Unlike an ordinary VG, the rootvg cannot be varied off, so the solution from the
previous section cannot be used.
Instead, the "rvgrecover" script issues a series of odmdelete statements, just as exportvg does,
and runs importvg at the end.
The importvg command reads the VGDA and LVCB from the boot disk, resulting in new ODM entries.

Reinitializing the rootvg Volume Group 
To reinitialize the rootvg volume group, copy the shell script shown below to /bin/rvgrecover and run 
the following to make that file executable: 

chmod +x /bin/rvgrecover 
Then run: 

/bin/rvgrecover
The rvgrecover script, which reinitializes the ODM entries for the rootvg volume group, has the following contents: 

PV=/dev/ipldevice  # boot disk; alternatively PV=hdisk0
VG=rootvg

# Back up the ODM classes that are about to be modified ($$ is the shell's PID).
cp /etc/objrepos/CuAt /etc/objrepos/CuAt.$$
cp /etc/objrepos/CuDep /etc/objrepos/CuDep.$$
cp /etc/objrepos/CuDv /etc/objrepos/CuDv.$$
cp /etc/objrepos/CuDvDr /etc/objrepos/CuDvDr.$$

# Delete the ODM entries of every LV listed in the VGDA of the boot disk.
lqueryvg -Lp $PV | awk '{ print $2 }' | while read LVname; do
    odmdelete -q "name = $LVname" -o CuAt
    odmdelete -q "name = $LVname" -o CuDv
    odmdelete -q "value3 = $LVname" -o CuDvDr
done

# Delete the ODM entries of the volume group itself.
odmdelete -q "name = $VG" -o CuAt
odmdelete -q "parent = $VG" -o CuDv
odmdelete -q "name = $VG" -o CuDv
odmdelete -q "name = $VG" -o CuDep
odmdelete -q "dependency = $VG" -o CuDep
odmdelete -q "value1 = 10" -o CuDvDr
odmdelete -q "value3 = $VG" -o CuDvDr

# Rebuild the ODM entries from the VGDA and LVCB on disk.
importvg -y $VG $PV      # ignore lvaryoffvg errors
varyonvg $VG



redefinevg:
-----------

redefinevg Command

Purpose
Redefines the set of physical volumes of the given volume group in the device configuration database. 

Syntax
redefinevg { -d Device | -i Vgid } VolumeGroup

Description
During normal operations the device configuration database remains consistent with the 
Logical Volume Manager (LVM) information in the reserved area on the physical volumes. 
If inconsistencies occur between the device configuration database and the LVM, the redefinevg command 
determines which physical volumes belong to the specified volume group and re-enters this information 
in the device configuration database. The redefinevg command checks for inconsistencies by reading 
the reserved areas of all the configured physical volumes attached to the system.


Note: To use this command, you must either have root user authority or be a member of the system group.

Flags

-d Device The volume group ID, Vgid, is read from the specified physical volume device. 
   You can specify the Vgid of any physical volume belonging to the volume group that you are redefining. 
-i Vgid The volume group identification number of the volume group to be redefined. 

Example

To redefine rootvg physical volumes in the Device Configuration Database, enter a command similar to the following:

# redefinevg -d hdisk0 rootvg


synclvodm:
----------

synclvodm Command 
Purpose
Synchronizes or rebuilds the logical volume control block, the device configuration database, 
and the volume group descriptor areas on the physical volumes. 

Syntax
synclvodm [ -v ] VolumeGroup [ LogicalVolume ... ] 


Description
During normal operations, the device configuration database remains consistent with the 
logical volume manager information in the logical volume control blocks and the volume group descriptor 
areas on the physical volumes. If for some reason the device configuration database is not consistent 
with Logical Volume Manager information, the synclvodm command can be used to resynchronize the database. 
The volume group must be active for the resynchronization to occur (see varyonvg). 
If logical volume names are specified, only the information related to those logical volumes is updated. 

Attention: Do not remove the /dev entries for volume groups or logical volumes. Do not change the 
device configuration database entries for volume groups or logical volumes using the object data manager. 

Note: To use this command, you must either have root user authority or be a member of the system group.

Flags
-v Verbose output. 

Example

To synchronize the device configuration database with the logical volume manager information for rootvg, 
enter the following: 

synclvodm rootvg



32.4 How to Replace a Disk?: 
============================

1. Short version for normal VG (not rootvg) and the disk is working:
--------------------------------------------------------------------

extendvg VolumeGroupName hdiskY
migratepv hdiskX hdiskY
reducevg -d VolumeGroupName hdiskX
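
The three commands above can be wrapped in a small script. This is only a sketch: the names replace_disk, run and DRY_RUN, as well as datavg, hdisk3 and hdisk4, are made up for illustration. With DRY_RUN=1 (the default here) the commands are only printed. The sketch calls reducevg without -d, on the assumption that reducevg then refuses to remove a disk that still holds logical volume data.

```shell
#!/bin/sh
# Sketch only: replace_disk, run and DRY_RUN are illustrative names, not AIX tools.
# With DRY_RUN=1 the LVM commands are printed instead of executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

replace_disk() {
    vg=$1; old=$2; new=$3
    # Each step runs only if the previous one succeeded.
    run extendvg "$vg" "$new" &&      # add the new disk to the VG
    run migratepv "$old" "$new" &&    # move all data off the old disk
    run reducevg "$vg" "$old"         # take the old disk out of the VG
}

# Hypothetical VG and disk names.
replace_disk datavg hdisk3 hdisk4
```

Set DRY_RUN=0 only after checking the printed commands against the real VG and disk names.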


2. More Detail:
---------------

2.1 The disk is mirrored:
-------------------------

1. Remove all copies from the disk:
   # unmirrorvg vg_name hdiskX

2. Remove disk from VG:
   # reducevg vg_name hdiskX

3. Remove disk from ODM:
   # rmdev -l hdiskX -d

4. Add new disk to the system.

5. Add the new disk to the VG:
   # extendvg vg_name hdiskY

6. Create new copies:
   # mirrorvg vg_name
   # syncvg -v vg_name


2.2 The disk was not mirrored, or you want to replace a working disk:
---------------------------------------------------------------------

1. Add the new disk to the system.

2. Add the disk to the VG:
   # extendvg vg_name hdiskY

3. Migrate old disk to new disk:
   # migratepv hdiskX hdiskY

4. Remove old disk from VG:
   # reducevg vg_name hdiskX

5. Remove old disk from ODM:
   # rmdev -l hdiskX -d


2.3 Replace the disk in the rootvg:
-----------------------------------

1. Add the new disk to the system.

2. Add the disk to the VG:
   # extendvg rootvg hdiskY

3. Does hdiskX contain hd5 (the boot logical volume)? If so:

   # migratepv -l hd5 hdiskX hdiskY
   # bosboot -ad /dev/hdiskY
   # chpv -c hdiskX
   # bootlist -m normal hdiskY

   If hdiskX contains the primary dump device, you must deactivate it:
   # sysdumpdev -p /dev/sysdumpnull

4. Migrate old disk to new disk:
   # migratepv hdiskX hdiskY

   If the primary dump device has been deactivated, activate it again
   (hdX stands for the dump logical volume):
   # sysdumpdev -p /dev/hdX

5. Remove old disk from VG:
   # reducevg rootvg hdiskX

6. Remove old disk from ODM:
   # rmdev -l hdiskX -d






32.5 Filesystem errors:
=======================



32.5.1 ksh: Invalid file system control data detected:
======================================================

Note 1:
-------

Q:

Anybody recognize this? This directory seems to be missing the ".", I can't 
umount, can't remove the directory, can't copy a good directory over it, 
etc. 

spiderman# cd probes 
spiderman# pwd 
/opt/diagnostics/probes 
spiderman# ls -la 
ls: 0653-341 The file . does not exist. 
spiderman# cd .. 
spiderman# ls -la probes 
ls: probes: Invalid file system control data detected. 
total 0 
spiderman# 

spiderman# fuser /opt 
/opt: 
spiderman# umount /opt 
umount: 0506-349 Cannot unmount /dev/hd10opt: The requested resource is 
busy. 
spiderman# umount /dev/hd10opt 
umount: 0506-349 Cannot unmount /dev/hd10opt: The requested resource is 
busy. 

spiderman# fsck /opt 

** Checking /dev/hd10opt (/opt) MOUNTED FILE SYSTEM; WRITING SUPPRESSED; 
Checking a mounted filesystem does not produce dependable results. 
** Phase 1 - Check Blocks and Sizes 
** Phase 2 - Check Pathnames 
DIRECTORY CORRUPTED (NOT FIXED) 
DIRECTORY CORRUPTED (NOT FIXED) 
Directory /diagnostics/probes, '.' entry is missing. (NOT FIXED) 
Directory /diagnostics/probes, '..' entry is missing. (NOT FIXED) 
** Phase 3 - Check Connectivity 
** Phase 4 - Check Reference Counts 
link count directory I=4098 owner=bin mode=240755 
size=512 mtime=May 13 14:54 2005 
count 3 should be 2 (NOT ADJUSTED) 
link count directory I=4099 owner=bin mode=240755 
size=1024 mtime=Jan 10 13:45 2005 
count 2 should be 1 (NOT ADJUSTED) 
Unreferenced file I=4106 owner=bin mode=100555 
size=6556 mtime=Jul 07 14:25 2004 (NOT RECONNECTED) 
Unreferenced file I=4106 (NOT CLEARED) 
Unreferenced file I=4107 owner=bin mode=100555 
size=2912 mtime=Jul 07 14:25 2004 (NOT RECONNECTED) 
etc....


A:

Some good news here. Yes, your directory is hosed, but the important 
thing is that a directory is just a repository for storing inode numbers 
and their associated (human-readable) file names. Since fsck is so nicely 
listing all of those currently inaccessible inode numbers, a find 
command can be used to move them into a new directory. Once the old 
directory is empty, you can (hopefully) rm -r it. 

Here's what you need to do. 

a) Get all the inode numbers generated from your fsck 
b) Put them into a variable (e.g. lost_inodes="4099 4106 ...etc.") 
c) Make a target directory for the lost inodes to be moved into: 
mkdir /tmp/recovery 
d) cd into your problem file system: 
cd /opt 
e) Run a loop using find: 
for i in ${lost_inodes} 
do 
find . -xdev -inum ${i} -exec mv {} /tmp/recovery/${i} \; 
echo "Moved and recovered inode # ${i}" 
done 

That should do it. Let me know if it works ok! BTW, the new "file 
name" should be the inode number of the file. You will have to rename 
the files as needed. 
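
Steps a) and b) can be scripted. The following is a minimal sketch, assuming that the fsck output was saved as text and that fsck reports each lost inode as "I=<number>" on a line starting with "Unreferenced file". The sample lines and the variable name fsck_output are illustrative only.

```shell
#!/bin/sh
# Sample fsck lines in the assumed format; in practice use the saved fsck output.
fsck_output='Unreferenced file I=4106 owner=bin mode=100555
size=6556 mtime=Jul 07 14:25 2004 (NOT RECONNECTED)
Unreferenced file I=4106 (NOT CLEARED)
Unreferenced file I=4107 owner=bin mode=100555'

# Keep only the number after "I=", drop duplicates, join everything into one line.
lost_inodes=$(printf '%s\n' "$fsck_output" |
    sed -n 's/^Unreferenced file I=\([0-9][0-9]*\).*/\1/p' |
    sort -un | xargs)

echo "lost_inodes=\"$lost_inodes\""
```

The resulting variable can be used directly in the for loop of the answer above.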


Note 2: IY94101: J2_DMAP_CORRUPT ERROR REPORT AFTER SHRINKING JFS2 FILESYSTEM
-----------------------------------------------------------------------------

http://www-1.ibm.com/support/docview.wss?uid=isg1IY94101

APAR status
Closed as program error.

Error description
After shrinking a filesystem, J2_DMAP_CORRUPT reports
appear in the error report and some file creates/writes
fail with "Invalid file system control data detected".

APAR information
APAR number              IY94101
Reported component name  AIX 5.3
Reported component ID    5765G0300
Reported release         530
Status                   CLOSED PER
PE                       NoPE
HIPER                    NoHIPER
Submitted date           2007-01-26
Closed date              2007-01-29
Last modified date       2007-05-25

Fix information
Fixed component name     AIX 5.3
Fixed component ID       5765G0300



Note 3:
-------

Q:

Since applying ML7 for AIX 5.1 I have been getting file corruption error 
messages on a particular filesystem and the only way to fix it is to umount 
the filesystem and fsck it. I thought it might be a hardware problem but 
now it is also happening on another machine I put the ML7 on and it is 
happening to the same filesystem (one machine is a test server of the 
other). The only unique thing about the filesystem is that it is not in 
rootvg and it is large -1281228 1024-blocks. Has anyone heard of this? 
Below is the error I am getting: 
LABEL: JFS_META_CORRUPTION 
IDENTIFIER: 684A365B 


Date/Time: Tue Apr 26 13:45:26 EDT 
Sequence Number: 2023 
Machine Id: 0000F11F4C00 
Node Id: XX00 
Class: U 
Type: UNKN 
Resource Name: SYSPFS 
Resource Class: NONE 
Resource Type: NONE 
Location: NONE 
VPD: 


Description 
FILE SYSTEM CORRUPTION 


Probable Causes 
INVALID FILE SYSTEM CONTROL DATA 


        Recommended Actions 
        PERFORM FULL FILE SYSTEM RECOVERY USING FSCK UTILITY 
        OBTAIN DUMP 
        CHECK ERROR LOG FOR ADDITIONAL RELATED ENTRIES 


Failure Causes 
ADAPTER HARDWARE OR MICROCODE 
DISK DRIVE HARDWARE OR MICROCODE 
SOFTWARE PROGRAM 
STORAGE CABLE LOOSE, DEFECTIVE, OR UNTERMINATED 


        Recommended Actions 
        CHECK CABLES AND THEIR CONNECTIONS 
        INSTALL LATEST ADAPTER AND DRIVE MICROCODE 
        INSTALL LATEST STORAGE DEVICE DRIVERS 
        IF PROBLEM PERSISTS, CONTACT APPROPRIATE SERVICE REPRESENTATIVE 


Detail Data 
FILE NAME 
xix_lookup.c 
LINE NO. 
         300 
MAJOR/MINOR DEVICE NUMBER 
0026 0006 
ADDITIONAL INFORMATION 
4A46 5345 426E 8C46 0000 000E 0000 001D 0003 0610 0000 0000 0000 0000 0000 0002 
164D A330 0001 86D3 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
--------------------------------------------------------------------------- 
LABEL: JFS_FSCK_REQUIRED 
IDENTIFIER: CD546B25 


Date/Time: Tue Apr 26 13:45:26 EDT 
Sequence Number: 2022 
Machine Id: 0000F11F4C00 
Node Id: XX00 
Class: O 
Type: INFO 
Resource Name: SYSPFS 


Description 
FILE SYSTEM RECOVERY REQUIRED 


        Recommended Actions 
        PERFORM FULL FILE SYSTEM RECOVERY USING FSCK UTILITY 


Detail Data 
MAJOR/MINOR DEVICE NUMBER 
0026 0006 
FILE SYSTEM DEVICE AND MOUNT POINT 
/dev/lv04, /opt/egate 


Note 4:
-------

Q: 

How can I remove a bizarre, irremovable file from a directory? I've tried every way of using 
/bin/rm and nothing works. 

A: 

In some rare cases a strangely-named file will show up in your directory and appear to be 
un-removable with the rm command. In that case, ls -li together with find and its -inum [inode] 
primary does the job. 
Let's say that ls -l shows your irremovable file as 

-rw-------  1 smith  smith  0 Feb  1 09:22 ?*?*P

Type: 

ls -li

to get the index node, or inode. 

153805 -rw-------  1 smith  smith  0 Feb  1 09:22 ?*?*P

The inode for this file is 153805. Use find -inum [inode] to make sure that the file is correctly identified. 


% find . -inum 153805 -print
./?*?*P

Here, we see that it is. Then use the -exec primary to do the remove: 
  
% find . -inum 153805 -print -exec /bin/rm {} \;

Note that if this strangely named file were not of zero-length, it might contain accidentally misplaced 
and wanted data. Then you might want to determine what kind of data the file contains and move the file 
to some temporary directory for further investigation, for example: 

% find . -inum 153805 -print -exec /bin/mv {} unknown.file \;

This renames the file to unknown.file, so you can easily inspect it. 
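
The inode-based removal can be tried out safely in a scratch directory. The following sketch creates a file with a control character in its name and removes it by inode number; the directory and the file name are made up for the demonstration.

```shell
#!/bin/sh
# Demo in a scratch directory; the directory and the file name are made up.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Create an empty file whose name contains a control character (Ctrl-P, octal 020).
odd_name=$(printf 'bad\020name')
: > "$odd_name"

# ls -i prints the inode number in the first column.
inode=$(ls -i | awk '{ print $1; exit }')

# Select the file by inode number instead of by name, then remove it.
find . -inum "$inode" -exec rm {} \;

remaining=$(ls -A | wc -l | tr -d ' ')
echo "files left in $workdir: $remaining"

# Clean up the scratch directory.
cd / && rmdir "$workdir"
```

Because find matches on the inode number, the shell never has to interpret the odd characters in the name.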

Another way to remove strangely-named files is to use "ls -q" or "cat -v" to show the special characters, 
and then use shell's globbing mechanism to delete the file. 

$ ls
-????*'?
$ ls | cat -v
-^B^C?^?*'

$ rm ./-'^B'*           -- achieved by typing control-V control-B
$ ls


The argument given to rm is a judicious selection of glob wildcards (*'s) and sufficient control characters 
to uniquely identify the file. The leading "./" is useful when the file name begins with a hyphen. 
These binary file names are caused by: 

* accidental cut-and-pastes to shell prompts - especially when you paste something of the form: "junk > garbage" 
because the shell creates the file "garbage" before trying to execute the command "junk" 

* filesystem corruption (in which case touching the filesystem any more can really stuff things up) 

If you discover that you have two files of the same name, one of the files probably has a bizarre 
(and unprintable) character in its name. Most probably, this unprintable character is a backspace. 

For example: 


    $ ls
    filename filename
    $ ls -q
    filename fl?ilename
    $ ls | cat -v
    filename
    fl^Hilename




32.5.2 More on Filesystem errors (1):
=====================================

Note 1:
-------

Q:

Hi all, 

I have an error message complaining about a filesystem being full, 
but df does not show any filesystem being full. 
The error report gives me the major/minor number: 0027/0004 
I went to the /dev dir and searched for the numbers, but it turns out to be ptyp4. 
Why is that? What does this mean? 

Any suggestion? 

A:

Those numbers are reported in hex; the actual (decimal) major/minor #'s 
are 39 and 4.

A:

Convert the errpt #'s from hex to decimal. Then use ls -l to find them. 


Note 2:
-------

Q:

Hi, 
I get an error concerning a filesystem. 
Now I have 2 questions: 


- How can I find out which filesystem is concerned? 
- What can I do? All filesystems have unused space; I cannot find any fs 
with 100% in use. 

LABEL:            J2_FS_FULL
IDENTIFIER: CED6B4B5
Date/Time:       Mon Dec 27 12:49:35 NFT
Sequence Number: 3420
Machine Id:      00599DDD4C00
Node Id:         srvdms0
Class:           O
Type:            INFO
Resource Name:   SYSJ2
Description
UNABLE TO ALLOCATE SPACE IN FILE SYSTEM
Probable Causes
FILE SYSTEM FULL
 Recommended Actions
 INCREASE THE SIZE OF THE ASSOCIATED FILE SYSTEM
 REMOVE UNNECESSARY DATA FROM FILE SYSTEM
 USE FUSER UTILITY TO LOCATE UNLINKED FILES STILL REFERENCED
Detail Data
JFS2 MAJOR/MINOR DEVICE NUMBER
 002B 000B
 

A:

002B (hex) is 2*16+11 --> 43 (major) 
000B (hex) is 11 (minor) 
So look for "43, 11" in the output of: ls -l /dev | grep "43," 

Q:

Hi Holger, 

A small query... how did you arrive at this figure of 43 from the error 
code? 
The decimal value of B is 11, but I could not understand the 2*16. 

Can you please explain this? 

A:

The major/minor numbers (002B 000B) are in hex: hex abcd = 
a*16^3+b*16^2+c*16^1+d therefore hex 002B=0*16^3+0*16^2+2*16^1+11=2*16+11 
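
The same conversion can be left to the shell. A minimal sketch, using the values from the error report above; hex2dec is an illustrative helper name, not a standard command.

```shell
#!/bin/sh
# hex2dec is an illustrative helper: printf accepts a 0x prefix for hex input.
hex2dec() {
    printf '%d\n' "0x$1"
}

major=$(hex2dec 002B)
minor=$(hex2dec 000B)
echo "major=$major minor=$minor"
```

With the decimal values at hand, the device node can be looked up with something like: ls -l /dev | grep "$major, *$minor"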


Note 3: AIX superblock issues:
------------------------------

-- Hint 1 for AIX:
-- ---------------

Use this command on a JFS file system if the superblock is corrupted. It copies the BACKUP COPY of the 
superblock (4 KB block 31) over the CURRENT copy (4 KB block 1).

# dd count=1 bs=4k skip=31 seek=1 if=/dev/hd4 of=/dev/hd4

# fsck /dev/hd4 2>&1 | tee /tmp/fsck.errors


Note:

fuser
Identifies processes using a file or file system

# fuser -u /dev/hd3
Sample output: /dev/hd3: 2964(root) 6615c(root) 8465(casado) 11290(bonner)


-- Hint 2 for AIX:
-- ---------------

http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.howtos/doc/howto/HT_baseadmn_badmagnumber.htm


Fixing a corrupted magic number in the file system superblock
If the superblock of a file system is damaged, the file system cannot be accessed. You can fix a 
corrupted magic number in the file system superblock.

Most damage to the superblock cannot be repaired. The following procedure describes how to repair a superblock 
in a JFS file system when the problem is caused by a corrupted magic number. If the primary superblock is corrupted 
in a JFS2 file system, use the fsck command to automatically copy the secondary superblock and repair the primary 
superblock.

In the following scenario, assume /home/myfs is a JFS file system on the logical volume /dev/lv02.

The information in this how-to was tested using AIX 5.2. If you are using a different version or level of AIX, 
the results you obtain might vary significantly. 

1. Unmount the /home/myfs file system, which you suspect might be damaged, using the following command: 

# umount /home/myfs

2. To confirm damage to the file system, run the fsck command against the file system. For example: 

# fsck -p /dev/lv02

If the problem is damage to the superblock, the fsck command returns one of the following messages: 

fsck: Not an AIXV5 file system
OR 
Not a recognized filesystem type

3. With root authority, use the od command to display the superblock for the file system, 
as shown in the following example: 

# od -x -N 64 /dev/lv02 +0x1000

Where the -x flag displays output in hexadecimal format and the -N flag instructs the system to format 
no more than 64 input bytes from the offset parameter (+), which specifies the point in the file where 
the file output begins. The following is an example output: 

0001000  1234 0234 0000 0000 0000 4000 0000 000a
0001010  0001 8000 1000 0000 2f6c 7633 0000 6c76
0001020  3300 0000 000a 0003 0100 0000 2f28 0383
0001030  0000 0001 0000 0200 0000 2000 0000 0000
0001040

In the preceding output, note the corrupted magic value at 0x1000 (1234 0234). If all defaults were taken 
when the file system was created, the magic number should be 0x43218765. If any defaults were overridden, 
the magic number should be 0x65872143. 

4. Use the od command to check the secondary superblock for a correct magic number. An example command 
and its output follows: 

# od -x -N 64 /dev/lv02 +0x1f000

001f000  6587 2143 0000 0000 0000 4000 0000 000a
001f010  0001 8000 1000 0000 2f6c 7633 0000 6c76
001f020  3300 0000 000a 0003 0100 0000 2f28 0383
001f030  0000 0001 0000 0200 0000 2000 0000 0000
001f040

Note the correct magic value at 0x1f000. 

5. Copy the secondary superblock to the primary superblock. An example command and output follows: 

# dd count=1 bs=4k skip=31 seek=1 if=/dev/lv02 of=/dev/lv02

dd: 1+0 records in.
dd: 1+0 records out.

6. Use the fsck command to clean up inconsistent files caused by using the secondary superblock. For example: 

# fsck /dev/lv02 2>&1 | tee /tmp/fsck.errs
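
The od offsets and the dd block numbers used in this procedure describe the same two locations. The short sketch below only does the arithmetic; the variable names are illustrative.

```shell
#!/bin/sh
# Relate the dd parameters (bs=4k, seek=1, skip=31) to the od offsets.
block_size=4096        # bs=4k
primary_block=1        # seek=1: where the primary superblock is written
secondary_block=31     # skip=31: where the secondary superblock is read

primary_offset=$(printf '0x%x' $((primary_block * block_size)))
secondary_offset=$(printf '0x%x' $((secondary_block * block_size)))

echo "primary superblock offset:   $primary_offset"
echo "secondary superblock offset: $secondary_offset"
```

The two results are the offsets passed to od in steps 3 and 4.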

For more information

The fsck and od command descriptions in AIX 5L Version 5.3 Commands Reference, Volume 4 
AIX Logical Volume Manager from A to Z: Introduction and Concepts, an IBM Redbook 
AIX Logical Volume Manager from A to Z: Troubleshooting and Commands, an IBM Redbook 
"Boot Problems" in Problem Solving and Troubleshooting in AIX 5L, an IBM Redbook 




Note 4: Linux superblock issues:
--------------------------------

1.

DAMAGED SUPERBLOCK


If a filesystem check fails and returns the error message "Damaged Superblock" you're lost . . . 
or not?
Well, not really: the damaged superblock can be restored from a backup. There are several backups stored 
on the hard disk. But let me first have a go at explaining what a superblock is.

A superblock is located at position 0 of every partition, contains vital information about the filesystem, 
and is needed for a filesystem check.

The information stored in the superblock covers the type of filesystem used, the inode counts, 
block counts, free blocks and inodes, the number of times the filesystem was mounted, the date of the 
last filesystem check, and the first inode where / is located.

Thus, a damaged superblock means that the filesystem check will fail. 

Our luck is that there are backups of the superblock located on several positions and we can restore 
them with a simple command.

The usual positions (given as block numbers) are: 8193, 32768, 98304, 163840, 229376 and 294912. (8193 applies 
to filesystems with 1 KB blocks, which in many cases means older systems; 32768 is the first backup on filesystems with 4 KB blocks.)
You can check this, and get a lot more information about a particular partition on your hard disk, with:


# dumpe2fs /dev/hda5 

You will see that the primary superblock is located at position 0, and the first backup on position 32768.
O.K. let's get serious now, suppose you get a "Damaged Superblock" error message at filesystem check 
( after a power failure ) and you get a root-prompt in a recovery console, then you give the command:

# e2fsck -b 32768 /dev/hda5 


Don't try this on a mounted filesystem.

It will then check the filesystem with the information stored in that backup superblock and if the check 
was successful it will restore the backup to position 0.
Now imagine the backup at position 32768 was damaged too . . . then you just try again with the backup 
stored at position 98304, and 163840, and 229376 etc. etc. until you find an undamaged backup  
( there are five backups so if at least one of those five is okay it's bingo ! )

So next time don't panic . . just get the paper where you printed out this Tip and give the magic command
 
# e2fsck -b 32768 /dev/hda5  
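
Trying one backup after another can be put into a loop. This is a sketch under assumptions: try_backup_superblocks and the E2FSCK variable are made-up names, the block numbers are the ones listed above (4 KB block size), and the device must not be mounted. It relies on the documented e2fsck exit codes, where values below 4 mean that the filesystem is clean or that errors were corrected.

```shell
#!/bin/sh
# Sketch: try each backup superblock in turn until e2fsck succeeds.
# try_backup_superblocks and E2FSCK are illustrative names.
E2FSCK=${E2FSCK:-e2fsck}

try_backup_superblocks() {
    device=$1
    for sb in 32768 98304 163840 229376 294912; do
        echo "trying backup superblock $sb on $device"
        "$E2FSCK" -b "$sb" "$device"
        rc=$?
        # Exit codes below 4: no errors, or errors were corrected.
        if [ "$rc" -lt 4 ]; then
            echo "superblock $sb worked (exit code $rc)"
            return 0
        fi
    done
    echo "no usable backup superblock found" >&2
    return 1
}

# Only run when a device is given on the command line, e.g. /dev/hda5.
if [ $# -gt 0 ]; then
    try_backup_superblocks "$1"
fi
```

Run it from a recovery console with the unmounted partition as the only argument.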




32.6 Undelete programs:
=======================

Note 1: AIX and JFS
-------------------

/*****************************************************************************
 * rsb.c - Read Super Block. Allows a jfs superblock to be dumped, inode
 * table to be listed or specific inodes data pointers to be chased and
 * dumped to standard out (undelete).
 *
 * Phil Gibbs - Trinem Consulting (pgibbs@trinem.co.uk)
 ****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <jfs/filsys.h>
#include <jfs/ino.h>
#include <sys/types.h>
#include <pwd.h>
#include <grp.h>
#include <unistd.h>
#include <time.h>

#define FOUR_MB		(1024*1024*4)
#define THIRTY_TWO_KB	(1024*32)

extern int optind;
extern int optopt;
extern int opterr;
extern char *optarg;

void PrintSep()
{
	int k=80;

	while (k)
	{
		putchar('-');
		k--;
	}
	putchar('\n');
}

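char *UserName(uid_t uid)
{
/* static: a pointer to this buffer is returned, so it must outlive the call */
static char replystr[24];
struct passwd *res;

res=getpwuid(uid);
if (res && res->pw_name[0])
{
	return res->pw_name;
}
else
{
	/* No passwd entry: fall back to the numeric uid */
	sprintf(replystr,"%ld",(long)uid);
	return replystr;
}
}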

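char *GroupName(gid_t gid)
{
static char replystr[24];
struct group *res;
res=getgrgid(gid);
if (res) return res->gr_name;
/* No group entry: fall back to the numeric gid */
sprintf(replystr,"%ld",(long)gid);
return replystr;
}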


ulong NumberOfInodes(struct superblock *sb)
{
	ulong MaxInodes;
	ulong TotalFrags;

	if (sb->s_version==fsv3pvers)
	{
		TotalFrags=(sb->s_fsize*512)/sb->s_fragsize;
		MaxInodes=(TotalFrags/sb->s_agsize)*sb->s_iagsize;
	}
	else
	{
		MaxInodes=(sb->s_fsize*512)/sb->s_bsize;
	}
	return MaxInodes;
}


void AnalyseSuperBlock(struct superblock *sb)
{
	ulong TotalFrags;

	PrintSep();
	printf("SuperBlock Details:\n-------------------\n");
	printf("File system size:  %ld x 512 bytes (%ld Mb)\n",
				sb->s_fsize,
				(sb->s_fsize*512)/(1024*1024));
	printf("Block size:        %d bytes\n",sb->s_bsize);
	printf("Flags:             ");
	switch (sb->s_fmod)
	{
		case (char)FM_CLEAN:
			break;
		case (char)FM_MOUNT:
			printf("mounted ");
			break;
		case (char)FM_MDIRTY:
			printf("mounted dirty ");
			break;
		case (char)FM_LOGREDO:
			printf("log redo failed ");
			break;
		default:
			printf("Unknown flag ");
			break;
	}
	if (sb->s_ronly) printf("(read-only)");
	printf("\n");
	printf("Last SB update at: %s",ctime(&(sb->s_time)));
	printf("Version:           %s\n",
	sb->s_version?"1 - fsv3pvers":"0 - fsv3vers");
	printf("\n");
	if (sb->s_version==fsv3pvers)
	{
		TotalFrags=(sb->s_fsize*512)/sb->s_fragsize;
		printf("Fragment size:     %5d         ",sb->s_fragsize);
		printf("inodes per alloc:  %8d\n",sb->s_iagsize);
		printf("Frags per alloc:   %5d         ",sb->s_agsize);
		printf("Total Fragments:   %8d\n",TotalFrags);
		printf("Total Alloc Grps:  %5d         ",
						TotalFrags/sb->s_agsize);
		printf("Max inodes:        %8ld\n",NumberOfInodes(sb));
	}
	else
	{
		printf("Total Alloc Grps:  %5d         ",
				(sb->s_fsize*512)/sb->s_agsize);
		printf("inodes per alloc:  %8d\n",sb->s_agsize);
		printf("Max inodes:      %8ld\n",NumberOfInodes(sb));
	}
	PrintSep();
}

void ReadInode(	FILE *in,
		ulong StartInum,
		struct dinode *inode,
		ulong InodesPerAllocBlock,
		ulong AllocBlockSize)
{
	off_t			SeekPoint;
	long			BlockNumber;
	int			OffsetInBlock;
	static struct dinode	I_NODES[PAGESIZE/DILENGTH];
	ulong			AllocBlock;
	ulong			inum;
	static off_t		LastSeekPoint=-1;

	AllocBlock=(StartInum/InodesPerAllocBlock);
	BlockNumber=(StartInum-(AllocBlock*InodesPerAllocBlock))/
			(PAGESIZE/DILENGTH);
	OffsetInBlock=(StartInum-(AllocBlock*InodesPerAllocBlock))-
			(BlockNumber*(PAGESIZE/DILENGTH));
	SeekPoint=(AllocBlock)?
		(BlockNumber*PAGESIZE)+(AllocBlock*AllocBlockSize):
		(BlockNumber*PAGESIZE)+(INODES_B*PAGESIZE);
	if (SeekPoint!=LastSeekPoint)
	{
		sync();
		fseek(in,SeekPoint,SEEK_SET);
		fread(I_NODES,PAGESIZE,1,in);
		LastSeekPoint=SeekPoint;
	}
	*inode=I_NODES[OffsetInBlock];
}

void DumpInodeContents(	long	inode,
			FILE	*in,
			ulong	InodesPerAllocBlock,
			ulong	AllocBlockSize,
			ulong	Mask,
			ulong	Multiplier)
{
	struct dinode		DiskInode;
	ulong			SeekPoint;
	char			Buffer[4096];
	ulong			FileSize;
	int			k;
	int			BytesToRead;
	ulong			*DiskPointers;
	int			NumPtrs;

	ReadInode(	in,
			inode,
			&DiskInode,
			InodesPerAllocBlock,
			AllocBlockSize);
	FileSize=DiskInode.di_size;

	if (FileSize>FOUR_MB)
	{
		/* Double indirect mapping is not implemented */
		fprintf(stderr,"rsb: files larger than 4Mb are not supported\n");
		return;
	}
	else
	if (FileSize>THIRTY_TWO_KB)
	{
		/* Indirect mapping */
		SeekPoint=DiskInode.di_rindirect & Mask;
		SeekPoint=SeekPoint*Multiplier;
		DiskPointers=(ulong *)malloc(1024*sizeof(ulong));
		fseek(in,SeekPoint,SEEK_SET);
		fread(DiskPointers,1024*sizeof(ulong),1,in);
		NumPtrs=1024;
	}
	else
	{
		/* Direct Mapping */
		DiskPointers=&(DiskInode.di_rdaddr[0]);
		NumPtrs=8;
	}

	for (k=0;k<NumPtrs && FileSize;k++)
	{
		SeekPoint=(DiskPointers[k] & Mask);
		SeekPoint=SeekPoint*Multiplier;

		BytesToRead=(FileSize>sizeof(Buffer))?sizeof(Buffer):FileSize;
		fseek(in,SeekPoint,SEEK_SET);
		fread(Buffer,BytesToRead,1,in);
		FileSize=FileSize-BytesToRead;
		write(1,Buffer,BytesToRead);
	}
}

void DumpInodeList(	FILE	*in,
			ulong	MaxInodes,
			ulong	InodesPerAllocBlock,
			ulong	AllocBlockSize)
{
	long			inode;
	struct dinode		DiskInode;
	struct tm		*TimeStruct;

	printf("   Inode Links     User    Group     Size    ModDate\n");
	printf("-------- ----- -------- -------- --------    -------\n");
	for (inode=0;inode<=MaxInodes;inode++)
	{
		ReadInode(	in,
				inode,
				&DiskInode,
				InodesPerAllocBlock,
				AllocBlockSize);
		if (DiskInode.di_mtime)
		{
			TimeStruct=localtime((long *)&DiskInode.di_mtime);
			printf("%8d %5d %8s %8s %8d %02d/%02d/%4d\n",
				inode,
				DiskInode.di_nlink,
				UserName(DiskInode.di_uid),
				GroupName(DiskInode.di_gid),
				DiskInode.di_size,
				TimeStruct->tm_mday,
				TimeStruct->tm_mon+1,
				TimeStruct->tm_year+1900);
		}
	}
}

void ExitWithUsageMessage()
{
	fprintf(stderr,"USAGE: rsb [-i inode] [-d] [-s] <block_device>\n");
	exit(1);
}

int main(int argc,char **argv)
{
	FILE			*in;
	struct superblock	SuperBlock;
	short			Valid;
	long			inode=0;
	struct dinode		DiskInode;
	ulong			AllocBlockSize;
	ulong			InodesPerAllocBlock;
	ulong			MaxInodes;
	ulong			Mask;
	ulong			Multiplier;
	int			option;
	int			DumpSuperBlockFlag=0;
	int			DumpFlag=0;

	while ((option=getopt(argc,argv,"i:ds")) != EOF)
	{
		switch(option)
		{
			case 'i':
				/* Inode specified */
				inode=atol(optarg);
				break;
			case 'd':
				/* Dump flag */
				DumpFlag=1;
				break;
			case 's':
				/* List Superblock flag */
				DumpSuperBlockFlag=1;
				break;
			default:
				break;
		}
	}

	if (optind<argc && strlen(argv[optind])) in=fopen(argv[optind],"r");
	else ExitWithUsageMessage();

	if (in)
	{
		fseek(in,SUPER_B*PAGESIZE,SEEK_SET);
		fread(&SuperBlock,sizeof(SuperBlock),1,in);
		switch (SuperBlock.s_version)
		{
			case fsv3pvers:
				Valid=!strncmp(SuperBlock.s_magic,fsv3pmagic,4);
				InodesPerAllocBlock=SuperBlock.s_iagsize;
				AllocBlockSize=
				SuperBlock.s_fragsize*SuperBlock.s_agsize;
				Multiplier=SuperBlock.s_fragsize;
				Mask=0x3ffffff;
				break;
			case fsv3vers:
				Valid=!strncmp(SuperBlock.s_magic,fsv3magic,4);
				InodesPerAllocBlock=SuperBlock.s_agsize;
				AllocBlockSize=SuperBlock.s_agsize*PAGESIZE;
				Multiplier=SuperBlock.s_bsize;
				Mask=0xfffffff;
				break;
			default:
				Valid=0;
				break;
		}
		if (Valid)
		{
			if (DumpSuperBlockFlag==1)
			{
				AnalyseSuperBlock(&SuperBlock);
			}
			MaxInodes=NumberOfInodes(&SuperBlock);
			if (DumpFlag==1)
			{
				if (inode)
				DumpInodeContents(inode,in,InodesPerAllocBlock,AllocBlockSize,Mask,Multiplier);
				else
				DumpInodeList(in,MaxInodes,InodesPerAllocBlock,AllocBlockSize);
			}
		}
		else
		{
			fprintf(stderr,"Superblock - bad magic number\n");
			exit(1);
		}
	}
	else
	{
		fprintf(stderr,"couldn't open ");
		perror(argv[optind]);
		exit(1);
	}
}



Note 2: Undelete a text file on most unixes (no guarantee):
-----------------------------------------------------------

Works mainly on Linux distros.

Using grep (the traditional UNIX way) to recover files.
Use the following grep syntax:

# grep -b 'search-text' /dev/partition > file.txt
OR
# grep -a -B[lines before] -A[lines after] 'text' /dev/[your_partition] > file.txt

Where,

-b : Print the byte offset of each matching line within the input. 
-i : Ignore case distinctions in both the PATTERN and the input files, i.e. match both uppercase and lowercase characters. 
-a : Process a binary file as if it were text. 
-B : Print the given number of lines of leading context before matching lines. 
-A : Print the given number of lines of trailing context after matching lines. 

To recover a text file starting with the word "nixCraft" on /dev/sda1 you can try the following command:

# grep -i -a -B10 -A100 'nixCraft' /dev/sda1 > file.txt

Next use vi to see file.txt. This method is ONLY useful if the deleted file is a text file. 
If you are using the ext2 file system, try out the recover command.
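
The grep technique can be rehearsed on an ordinary file before pointing it at a real partition. The sketch below builds a small binary "image" with some text in the middle and pulls the text back out; it assumes a grep that supports -a, -A and -B (such as GNU grep), and all file names are made up.

```shell
#!/bin/sh
# Demo on a scratch file standing in for a raw partition; all names are made up.
img=$(mktemp) || exit 1
out=$(mktemp) || exit 1
{
    dd if=/dev/zero bs=1024 count=2 2>/dev/null    # binary filler
    printf 'line before\nnixCraft notes\nline after\n'
    dd if=/dev/zero bs=1024 count=2 2>/dev/null    # binary filler
} > "$img"

# -a treats the binary image as text, -i ignores case,
# -B1/-A1 add one line of context on each side of the match.
grep -a -i -B1 -A1 'nixcraft' "$img" | tr -d '\000' > "$out"

cat "$out"
rm -f "$img"
```

The tr step strips the NUL bytes that surround the text in the image; on a real partition the recovered context usually needs more manual cleanup.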


Note 3:
-------

For AIX there are undelete tools: http://www.compunix.com/


Note 4: lsof and Linux:
-----------------------

Bring back deleted files with lsof
By Michael Stutz on November 16, 2006 (8:00:00 AM) 

Briefly, a file as it appears somewhere on a Linux filesystem is actually just a link to an inode, 
which contains all of the file's properties, such as permissions and ownership, as well as the addresses 
of the data blocks where the file's content is stored on disk. When you rm a file, you're removing the link 
that points to its inode, but not the inode itself; other processes (such as your audio player) might still 
have it open. It's only after they're through and all links are removed that an inode and the data blocks 
it pointed to are made available for writing.

This delay is your key to a quick and happy recovery: if a process still has the file open, the data's there 
somewhere, even though according to the directory listing the file already appears to be gone.

This is where the Linux process pseudo-filesystem, the /proc directory, comes into play. Every process on 
the system has a directory here with its name on it, inside of which lie many things -- 
including an fd ("file descriptor") subdirectory containing links to all files that the process has open. 
Even if a file has been removed from the filesystem, a copy of the data will be right here:

/proc/<process id>/fd/<file descriptor> 

To know where to go, you need to get the id of the process that has the file open, and the file descriptor. 
These you get with lsof, whose name means "list open files." (It actually does a whole lot more than this 
and is so useful that almost every system has it installed. If yours isn't one of them, you can grab the latest 
version straight from its author.)

Once you get that information from lsof, you can just copy the data out of /proc and call it a day.

This whole thing is best demonstrated with a live example. First, create a text file that you can delete 
and then bring back:

$ man lsof | col -b > myfile 

Then have a look at the contents of the file that you just created:

$ less myfile 

You should see a plaintext version of lsof's huge man page looking out at you, courtesy of less.

Now press Ctrl-Z to suspend less. Back at a shell prompt make sure your file is still there:

$ ls -l myfile
-rw-r--r--  1 jimbo jimbo 114383 Oct 31 16:14 myfile
$ stat myfile
  File: `myfile'
  Size: 114383          Blocks: 232        IO Block: 4096   regular file
Device: 341h/833d       Inode: 1276722     Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1010/    jimbo)   Gid: ( 1010/    jimbo)
Access: 2006-10-31 16:15:08.423715488 -0400
Modify: 2006-10-31 16:14:52.684417746 -0400
Change: 2006-10-31 16:14:52.684417746 -0400
Yup, it's there all right. OK, go ahead and oops it:

$ rm myfile
$ ls -l myfile
ls: myfile: No such file or directory
$ stat myfile
stat: cannot stat `myfile': No such file or directory
$
It's gone.

At this point, you must not allow the process still using the file to exit, because once that happens, 
the file will really be gone and your troubles will intensify. Your background less process in this walkthrough 
isn't going anywhere (unless you kill the process or exit the shell), but if this were a video or sound file that 
you were playing, the first thing to do at the point where you realize you deleted the file would be to 
immediately pause the application playback, or otherwise freeze the process, so that it doesn't eventually 
stop playing the file and exit. 

Now to bring the file back. First see what lsof has to say about it:

$ lsof | grep myfile
less      4158    jimbo    4r      REG       3,65   114383   1276722 /home/jimbo/myfile (deleted)
The first column gives you the name of the command associated with the process, the second column is the 
process id, and the number in the fourth column is the file descriptor (the "r" means that the file is open 
for reading; the REG in the fifth column is what marks it as a regular file). 
Now you know that process 4158 still has the file open, and you know the file descriptor, 4. That's everything 
you have to know to copy it out of /proc.

You might think that using the -a flag with cp is the right thing to do here, since you're restoring the file -- 
but it's actually important that you don't do that. Otherwise, instead of copying the literal data contained 
in the file, you'll be copying a now-broken symbolic link to the file as it once was listed in its original directory:

$ ls -l /proc/4158/fd/4
lr-x------  1 jimbo jimbo 64 Oct 31 16:18 /proc/4158/fd/4 -> /home/jimbo/myfile (deleted)
$ cp -a /proc/4158/fd/4 myfile.wrong
$ ls -l myfile.wrong
lrwxr-xr-x  1 jimbo jimbo 24 Oct 31 16:22 myfile.wrong -> /home/jimbo/myfile (deleted)
$ file myfile.wrong
myfile.wrong: broken symbolic link to `/home/jimbo/myfile (deleted)'
$ file /proc/4158/fd/4
/proc/4158/fd/4: broken symbolic link to `/home/jimbo/myfile (deleted)'
So instead of all that, just a plain old cp will do the trick:

$ cp /proc/4158/fd/4 myfile.saved 

And finally, verify that you've done good:

$ ls -l myfile.saved
-rw-r--r--  1 jimbo jimbo 114383 Oct 31 16:25 myfile.saved
$ man lsof | col -b > myfile.new
$ cmp myfile.saved myfile.new
No complaints from cmp -- your restoration is the real deal.
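
The whole walkthrough can be condensed into a small self-contained sketch. It is only an illustration under
some assumptions: it needs a Linux /proc, a background "sleep" stands in for the less process, and the
function name and the file contents are invented here.

```shell
# Delete a file that another process still holds open, then copy it back out of /proc.
recover_demo() {
    [ -d /proc/self/fd ] || { echo "no /proc/<pid>/fd on this system" >&2; return 1; }
    victim=$(mktemp) || return 1
    printf 'important data\n' > "$victim"

    exec 3< "$victim"      # open the file on descriptor 3
    sleep 30 &             # the child inherits descriptor 3 and keeps the file open
    holder=$!
    exec 3<&-              # the shell closes its own copy
    rm "$victim"           # the name is gone, but the inode is still held by sleep

    cp "/proc/$holder/fd/3" "$victim.saved"   # plain cp, not cp -a
    kill "$holder"
    wait "$holder" 2>/dev/null || true
    cat "$victim.saved"
    rm -f "$victim.saved"
}

recover_demo
```

In real life you would take the process id and the descriptor number from the lsof output instead of $! and 3.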

Incidentally, there are a lot of useful things you can do with lsof in addition to rescuing lost files.




32.7 Some notes about disks on x86 systems: MBR and Partition Bootsector:
=========================================================================

The following applies to PC's and x86 based Servers.

There are two sectors on the disk that are critical to starting the computer:

- Master Boot Record
- Partition Boot Sector

The MBR is created when you create the first partition on the harddisk.
The location is always cylinder 0, head 0 and sector 1.

The MBR contains the Partition Table for the disk and a small amount of executable code.
On x86 machines, this executable code examines the Partition Table and identifies
the system partition. The code then finds the system partition's starting location on the disk,
and loads a copy of its Partition Boot Sector into memory.

If you would take a look at the MBR, you would find:

The first 446 bytes of the sector hold the executable boot code.
After that comes the Partition Table, a 64-byte structure. Each table entry is 16 bytes long,
the first byte being the Boot Indicator field. This tells the code which partition is bootable.
The last 2 bytes of the sector hold the boot signature (0x55 0xAA).

The Partition Boot Sector has its own "layout" depending on the type of system.
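
As an illustration of the MBR layout, the following sketch reads the Boot Indicator byte of each of the four
table entries from a 512-byte image file. The function name is hypothetical, and the image is a fake one
built on the spot (all zeros, first entry flagged with 0x80, the value that marks an active partition).
On a real system you would first copy the sector with dd, as root, rather than build one.

```shell
# Print the Boot Indicator byte of each partition-table entry in an MBR image.
# Layout assumed: 446 bytes of code, then four 16-byte entries.
mbr_boot_flags() {
    image=$1
    entry=0
    while [ "$entry" -lt 4 ]; do
        offset=$((446 + entry * 16))
        flag=$(dd if="$image" bs=1 skip="$offset" count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
        echo "entry $((entry + 1)): 0x$flag"
        entry=$((entry + 1))
    done
}

# Build a fake MBR and inspect it.
mbr_image=$(mktemp)
dd if=/dev/zero of="$mbr_image" bs=512 count=1 2>/dev/null
printf '\200' | dd of="$mbr_image" bs=1 seek=446 conv=notrunc 2>/dev/null
mbr_boot_flags "$mbr_image"
rm -f "$mbr_image"
```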


32.8 How to get LUN ID's:
=========================

# lscfg -vl hdiskx
# lsattr -El hdiskx

ZD110L05
600507680190014DC000000000000304

ZD110L08
600507680190014DC000000000000305

ZD111L05
600507680190014DC000000000000306

ZD111L08
600507680190014DC000000000000307





#############################
33. Filesystems in Linux:
#############################


33.1 Disks:
===========

Linux on x86 systems has the following (storage) devices:

-- Entire harddisks are listed as devices without numbers, such as "/dev/hda" or "/dev/sda".

- IDE:

/dev/hda    is the primary IDE master drive,
/dev/hdb    is the primary IDE slave drive,
/dev/hdc    is the secondary IDE master,
/dev/hdd    is the secondary IDE slave,

- SCSI:
/dev/sda   is the first SCSI disk (the first device id on the first SCSI interface)
etc..

-- Partitions on a disk are referred to with a number such as

/dev/hda1
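
The naming rule (letters for the disk, trailing number for the partition) can be sketched as a small shell
function. The function name is made up for this note, and the sketch only covers the hdX/sdX style names
listed above.

```shell
# Split a device name such as /dev/hda1 into the whole-disk device and the partition number.
describe_dev() {
    name=${1#/dev/}
    part=$(printf '%s' "$name" | sed 's/^[a-z]*//')   # trailing digits, if any
    disk=${name%"$part"}
    echo "disk=/dev/$disk partition=${part:-none}"
}

describe_dev /dev/hda1
describe_dev /dev/sda
```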


Floppydrive:

/dev/fd0
# mount -t auto /dev/fd0 /mnt/floppy
# mount -t vfat /dev/fd0 /mnt/floppy
# mount /dev/fd0 /mnt/floppy

Zipdrive:

# insmod ppa       # load the module
# mount -t vfat /dev/sda /mnt/zip


33.2 Filesystems:
=================

Linux supports a huge number of filesystems, including FAT, JFS, NTFS etc.. But the most common are ext2 and ext3.
For the "native" filesystems, we take a look at the following FS's:

- ReiserFS   
A journaled filesystem

- Ext2
The most popular filesystem for years. But it does not use a log/journal,
so it is gradually becoming less important.

- Ext3
Closely related to Ext2, but this one supports journaling.
An Ext2 filesystem can easily be upgraded to Ext3.


33.3 Adding a disk in Linux:
============================

Suppose you have a SCSI card to which a disk is attached.
The disk as a whole would be referred to as "/dev/sda" and the
first partition would be referred to as "/dev/sda1".

But we have a new disk here.
If you cannot find the device file /dev/sda in /dev, you might
create it with the /dev/MAKEDEV script:

# cd /dev
# ./MAKEDEV sda

The disk is now ready to be partitioned. In this example, we plan
to create 3 partitions, including a swap partition.

# fdisk /dev/sda
The number of cylinders for this disk is set to ..
(.. more output..)
Command:

The fdisk program is interactive; pressing m displays a list of all its commands.

Command: new
Command action
  e extended
  p primary partition (1-4): 1
(.. more output..)

Command: print

Device           Boot    Start   End   Blocks   Id   System
/dev/sda1                1       255   2048256  83   Linux

So we have created our first partition.
We now create the swap partition:

Command: new
Command action
  e extended
  p primary partition (1-4): 2
(.. more output..)

Command: type
Partition number (1-4): 2
Hex code: 82              # which is a Linux swap partition
Changed system type of partition 2 to 82 (Linux swap)

The third partition can be created in a similar way.
We now would like to see a listing of our partitions

Command: print

Device           Boot    Start   End   Blocks   Id   System
/dev/sda1                1       255   2048256  83   Linux
/dev/sda2                256     511   2056320  82   Swap
/dev/sda3                512    5721  41849325  83   Linux
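
The Blocks column can be checked by hand from the Start and End cylinders. The sketch below does so under
an assumed geometry of 255 heads and 63 sectors per track with 512-byte sectors; the listing does not show
the geometry, so that part is a guess that happens to reproduce the numbers. The function name is invented.

```shell
# Size in 1 KB blocks of a partition spanning cylinders START..END.
# SKIP is the number of sectors left unused at the start, e.g. 63 for a
# first partition that begins after the track holding the MBR.
part_blocks() {
    start=$1; end=$2; skip=${3:-0}
    sectors=$(( (end - start + 1) * 255 * 63 - skip ))
    echo $(( sectors / 2 ))
}

part_blocks 1 255 63      # /dev/sda1
part_blocks 256 511       # /dev/sda2
part_blocks 512 5721      # /dev/sda3
```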


Now, write the partition table to the disk:

Command: write
(.. more output..)

Of course, we now would like to create the filesystems and the swap.

If you want to use the Ext2 filesystem on partition one, use the following command:

# mke2fs /dev/sda1 2048256       ( or # mkfs -t ext2 -b 4096 /dev/sda1 )

Let's check the filesystem with fsck:
# fsck -f /dev/sda1

A new filesystem can be mounted as soon as the mount point is created.

# mkdir /bkroot
# mount /dev/sda1 /bkroot

Let's now create the swap space:
# mkswap -c /dev/sda2 2056320

and activate it using the command:

# swapon /dev/sda2

See also section 34.3 for administering swap space on Linux.



33.4 Notes about Linux and LVM:
==============================


Note 1:
=======


-What is RAID and LVM 
-Initial setup of a RAID-5 array 
-Initial setup of LVM on top of RAID 
-Handling a Drive Failure 
-Common Glitches 
-Other Useful Resources 
-Expanding an Array/Filesystem 

--------------------------------------------------------------------------------

-What is RAID and LVM
RAID is usually defined as Redundant Array of Inexpensive Disks. It is normally used to spread data among several 
physical hard drives with enough redundancy that should any drive fail the data will still be intact. 
Once created, a RAID array appears to be one device which can be used pretty much like a regular partition. 
There are several kinds of RAID but I will only refer to the two most common here. 
The first is RAID-1, which is also known as mirroring. RAID-1 uses two essentially 
identical drives, each with a complete set of data. The second, the one I will mostly refer to in this guide, 
is RAID-5, which is set up using three or more drives with the data spread in a way that any one drive failing 
will not result in data loss. The Red Hat website has a great overview of the RAID Levels. 

There is one limitation with Linux software RAID: a /boot partition can only reside on a RAID-1 array. 

Linux supports several hardware RAID devices as well as software RAID, which allows you to use any IDE or 
SCSI drives as the physical devices. In all cases I'll refer to software RAID. 

LVM stands for Logical Volume Manager and is a way of grouping drives and/or partitions so that, instead 
of dealing with hard and fast physical partitions, the data is managed on a virtual basis where the virtual 
partitions can be resized. The Red Hat website has a great overview of the Logical Volume Manager. 

There is one limitation: LVM cannot be used for /boot. 


--------------------------------------------------------------------------------

- Initial setup of a RAID-5 array
I recommend you experiment with setting up and managing RAID and LVM systems before using them on an 
important filesystem. One way I was able to do it was to take an old hard drive and create a bunch of 
partitions on it (8 or so should be enough) and try combining them into RAID arrays. 
In my testing I created two RAID-5 arrays, each with 3 partitions. You can then manually fail and hot remove 
the partitions from the array and then add them back to see how the recovery process works. You'll get a warning 
about the partitions sharing a physical disk but you can ignore that since it's only for experimentation. 
In my case I have two systems with RAID arrays, one with two 73G SCSI drives running RAID-1 (mirroring) and my other 
test system is configured with three 120G IDE drives running RAID-5. In most cases I will refer to my RAID-5 
configuration as that will be more typical. 

I have an extra IDE controller in my system to allow me to support the use of more than 4 IDE devices, which caused 
a very odd drive assignment. 
The order doesn't seem to bother the Linux kernel so it doesn't bother me. My basic configuration is as follows: 

hda 120G drive
hdb 120G drive
hde 60G boot drive not on RAID array
hdf 120G drive
hdg CD-ROM drive

The first step is to create the physical partitions on each drive that will be part of the RAID array. 
In my case I want to use each 120G drive in the array in its entirety. All the drives are partitioned identically 
so for example, this is how hda is partitioned: 

Disk /dev/hda: 120.0 GB, 120034123776 bytes
16 heads, 63 sectors/track, 232581 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1      232581   117220792+  fd  Linux raid autodetect

So now, with all three drives partitioned with id fd (Linux raid autodetect), you can go ahead and combine 
the partitions into a RAID array: 

# /sbin/mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 \
	/dev/hdb1 /dev/hda1 /dev/hdf1

Wow, that was easy. That created a special device /dev/md0 which can be used instead of a physical partition. 
You can check on the status of that RAID array with the mdadm command: 

# /sbin/mdadm --detail /dev/md0
        Version : 00.90.01
  Creation Time : Wed May 11 20:00:18 2005
     Raid Level : raid5
     Array Size : 234436352 (223.58 GiB 240.06 GB)
    Device Size : 117218176 (111.79 GiB 120.03 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Jun 10 04:13:11 2005
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 36161bdd:a9018a79:60e0757a:e27bb7ca
         Events : 0.10670

    Number   Major   Minor   RaidDevice State
       0       3        1        0      active sync   /dev/hda1
       1       3       65        1      active sync   /dev/hdb1
       2      33       65        2      active sync   /dev/hdf1

The important lines to see are the State line, which should say clean; otherwise there might be a problem. 
At the bottom you should make sure that the State column always says active sync, which means each device 
is actively in the array. You could potentially have a spare device that's on hand should any drive fail. 
If you have a spare you'll see it listed as such here. 
One thing you'll see above if you're paying attention is the fact that the size of the array is 240G but I 
have three 120G drives as part of the array. That's because one drive's worth of space is used for the parity data 
that is needed to survive the failure of one of the drives. 
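
The capacity arithmetic can be written down as a short sketch: RAID-5 keeps the data of (n - 1) drives and
RAID-1 keeps the data of one drive. The function name is made up; the figures fed to it are the Device Size
and drive count from the mdadm output above.

```shell
# Usable capacity of an array, in the same unit as the per-drive size passed in.
raid_usable() {
    level=$1; drives=$2; size=$3
    case $level in
        1) echo "$size" ;;                          # mirror: one drive's worth
        5) echo $(( (drives - 1) * size )) ;;       # one drive's worth goes to parity
        *) echo "unsupported level: $level" >&2; return 1 ;;
    esac
}

raid_usable 5 3 117218176   # per-device size in KB; compare with the Array Size line
```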


--------------------------------------------------------------------------------

- Initial setup of LVM on top of RAID
Now that we have the /dev/md0 device you can create a Logical Volume on top of it. Why would you want to do that? 
If I were to build an ext3 filesystem on top of the RAID device and someday wanted to increase its capacity 
I wouldn't be able to do that without backing up the data, building a new RAID array and restoring my data. 
Using LVM allows me to expand (or contract) the size of the filesystem without disturbing the existing data. 
Anyway, here are the steps to then add this RAID array to the LVM system. The first command, pvcreate, will 
"initialize a disk or partition for use by LVM". The second command, vgcreate, will then create the Volume Group; 
in my case I called it lvm-raid: 

# pvcreate /dev/md0
# vgcreate lvm-raid /dev/md0

The default value for the physical extent size can be too low for a large RAID array. In those cases you'll need 
to specify the -s option with a larger than default physical extent size. The default is only 4MB as of the 
version in Fedora Core 5. For example, for a 550G RAID array a physical extent size of 2G works well: 

# vgcreate -s 2G <volume group name> <physical volume>

Ok, you've created a blank receptacle, but now you have to decide how many Physical Extents of the 
Volume Group (all of which come from /dev/md0 in this case) will be allocated to a Logical Volume. In my case 
I wanted all the space from /dev/md0 to be allocated to one Logical Volume. If later I wanted to add additional 
space I would create a new RAID array and add that physical device to this Volume Group. 
Use the vgdisplay command to find out how many PEs are available; you can then create a Logical Volume 
using all (or some) of the space in the Volume Group. 
In my case I call the Logical Volume lvm0. 

# vgdisplay lvm-raid
	.
	.
   Free  PE / Size       57235 / 223.57 GB

# lvcreate -l 57235 lvm-raid -n lvm0
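
The relation between the PE count and the size reported by vgdisplay is simply extents times extent size.
A minimal sketch, assuming the 4 MB default extent size mentioned above (the function name is invented):

```shell
# Convert a number of physical extents into a size in GB.
pe_to_gb() {
    extents=$1; pe_mb=${2:-4}
    awk -v n="$extents" -v pe="$pe_mb" 'BEGIN { printf "%.2f\n", n * pe / 1024 }'
}

pe_to_gb 57235    # compare with the "Free PE / Size" line above
```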

In the end you will have a device called /dev/lvm-raid/lvm0 that you can use very much like a plain ol' partition. 
You can now check on the status of the Logical Volume with the lvdisplay command. The device can then be used 
to create a filesystem on. 

# lvdisplay /dev/lvm-raid/lvm0 
  --- Logical volume ---
  LV Name                /dev/lvm-raid/lvm0
  VG Name                lvm-raid
  LV UUID                FFX673-dGlX-tsEL-6UXl-1hLs-6b3Y-rkO9O2
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                223.57 GB
  Current LE             57235
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:2

# mkfs.ext3 /dev/lvm-raid/lvm0
	.
	.
# mount /dev/lvm-raid/lvm0 /mnt

# df -h /mnt
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/lvm--raid-lvm0
                       224G   93M  224G   1% /mnt


--------------------------------------------------------------------------------

- Handling a Drive Failure
As everything eventually does break (some sooner than others) a drive in the array will fail. It is a very good idea 
to run smartd on all drives in your array (and probably ALL drives period) to be notified of a failure 
or a pending failure as soon as possible. You can also manually fail a partition, meaning to take it out 
of the RAID array, with the following command: 

# /sbin/mdadm /dev/md0 -f /dev/hdb1
mdadm: set /dev/hdb1 faulty in /dev/md0

Once the system has determined a drive has failed or is otherwise missing (you can shut down and pull out a drive 
and reboot to simulate a drive failure, or use the command above to manually fail a drive), it will show something 
like this in mdadm: 

# /sbin/mdadm --detail /dev/md0
     Update Time : Wed Jun 15 11:30:59 2005
           State : clean, degraded
  Active Devices : 2
 Working Devices : 2
  Failed Devices : 1
   Spare Devices : 0
	.
	.
     Number   Major   Minor   RaidDevice State
        0       3        1        0      active sync   /dev/hda1
        1       0        0        -      removed
        2      33       65        2      active sync   /dev/hdf1

You'll notice in this case I had /dev/hdb fail. I replaced it with a new drive with the same capacity and was able 
to add it back to the array. The first step is to partition the new drive just like when first creating the array. 
Then you can simply add the partition back to the array and watch the status as the data is rebuilt onto the 
newly replaced drive. 

# /sbin/mdadm /dev/md0 -a /dev/hdb1
# /sbin/mdadm --detail /dev/md0
     Update Time : Wed Jun 15 12:11:23 2005
           State : clean, degraded, recovering
  Active Devices : 2
 Working Devices : 3
  Failed Devices : 0
   Spare Devices : 1

          Layout : left-symmetric
      Chunk Size : 64K

  Rebuild Status : 2% complete
	.
	.

During the rebuild process the system performance may be somewhat impacted but the data should remain intact. 
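
If you want to check the array state from a script, one possible approach is to look at the "State :" line
of the mdadm --detail output. The sketch below is only an illustration: the function name is hypothetical,
it treats "clean" as the only healthy value (other mdadm versions may report other healthy states), and it is
fed sample lines copied from the outputs above instead of a live array.

```shell
# Read "mdadm --detail" output on stdin and report whether the array is healthy.
# Only the "State :" summary line is inspected.
md_is_clean() {
    state=$(sed -n -e 's/ *$//' -e 's/^ *State : *//p' | head -n 1)
    echo "array state: $state"
    [ "$state" = "clean" ]
}

printf '          State : clean\n Active Devices : 3\n' | md_is_clean || echo "needs attention"
printf '           State : clean, degraded\n  Active Devices : 2\n' | md_is_clean || echo "needs attention"
```

On a real system the input would come from "/sbin/mdadm --detail /dev/md0 | md_is_clean".
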
--------------------------------------------------------------------------------

- Expanding an Array/Filesystem
The answer to how to expand a RAID-5 array is very simple: You can't. 
I'm used to working with a NetApp Filer where you plug in a drive, type a simple command and that drive is added 
to the existing RAID array, no muss, no fuss. While you can't add space to a RAID-5 array directly in Linux you CAN 
add space to an existing Logical Volume and then expand the ext3 filesystem on top of it. That's the main reason you 
want to run LVM on top of RAID. 

Before you start it's probably a good idea to back up your data just in case something goes wrong. 

Assuming you want your data to be protected from a drive failing you'll need to create another RAID array 
per the instructions above. In my case I called it /dev/md1  so after partitioning I can create the array: 

# /sbin/mdadm --create --verbose /dev/md1 --level=5 --raid-devices=3 \
	/dev/hde1 /dev/hdg1 /dev/hdh1
# /sbin/mdadm --detail /dev/md1

The next couple steps will add the space from the new RAID array to the space available to be used by Logical Volumes. 
You then check to see how many Physical Extents you have and add them to the Logical Volume you're using. 
Remember that since you can have multiple Logical Volumes on top of a physical RAID array you need to do this extra step. 

# vgextend lvm-raid /dev/md1
# vgdisplay lvm-raid
	.
	.
	.
  Alloc PE / Size       57235 / 223.57 GB
  Free  PE / Size       57235 / 223.57 GB
# lvextend -l +57235 /dev/lvm-raid/lvm0
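
A word of caution on the -l argument of lvextend: a bare number is taken as the new total number of extents,
while a number with a leading "+" is added to the current size. To grow the volume by the 57235 free extents
the "+" form is the one that has an effect. The difference is sketched below; the function name is made up
and nothing here touches a real volume.

```shell
# Resulting extent count of a logical volume after "lvextend -l SPEC".
lv_extents_after() {
    current=$1; spec=$2
    case $spec in
        +*) echo $(( current + ${spec#+} )) ;;   # relative: add to the current size
        *)  echo "$spec" ;;                      # absolute: SPEC is the new total
    esac
}

lv_extents_after 57235 57235     # absolute: the size stays at 57235 extents
lv_extents_after 57235 +57235    # relative: the volume grows to 114470 extents
```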

There, you now have a much larger Logical Volume which is using space on two separate RAID arrays. 
You're not done yet: you now have to extend your filesystem to make use of all that new space. Fortunately this 
is easy on FC4 and RHEL4 since there is a command to expand an ext3 filesystem without even unmounting it! 
Be patient, expanding the file system takes a while. 

# lvdisplay /dev/lvm-raid/lvm0
	.
	.
  LV Size                447.14 GB
	.
# df /raid-array
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/lvm--raid-lvm0
                     230755476  40901348 178132400  19% /raid-array
# ext2online /dev/lvm-raid/lvm0 447g
Get yourself a sandwich
# df /raid-array
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/lvm--raid-lvm0
                     461510952  40901348 40887876   9% /raid-array

Congrats, you now have more space. Now go fill it with something. 



Note 2:
=======

Creating a LVM in Linux
 

I am sure anybody who has used Windows (2000 and above) has come across the term dynamic disks. 
Linux/Unix also has its own dynamic disk management called LVM.

What is an LVM ?

LVM stands for Logical Volume Manager, which is the fundamental way to manage UNIX/Linux storage systems 
in a scalable manner. An LVM abstracts disk devices into pools of storage space called Volume Groups. 
These volume groups are in turn subdivided into virtual disks called Logical Volumes. The logical volumes 
may be used just like regular disks, with filesystems created on them and mounted in the Unix/Linux 
filesystem tree. The logical volumes can span multiple disks. Even though a lot of companies have implemented 
their own LVMs for *nixes, the one created by the Open Software Foundation (OSF) was integrated into many 
Unix systems and serves as a base for the Linux implementation of LVM.

Note: Sun Solaris ships with LVM from Veritas which is substantially different from the OSF implementation.

Benefits of Logical Volume Management

LVM created in conjunction with RAID can provide fault tolerance coupled with scalability and easy disk management. 
Create a logical volume and filesystem which spans multiple disks.

By creating virtual pools of space, an administrator can create dozens of small filesystems for different projects 
and add space to them as needed without (much) disruption. When a project ends, he can remove the space 
and put it back into the pool of free space.

Note : Before you move to implement LVM in Linux, make sure your kernel is 2.4 or above. Otherwise you will have 
to recompile your kernel from source to include support for LVM.

LVM Creation
To create a LVM, we follow a three step process.

Step One : We need to select the physical storage resources that are going to be used for LVM. Typically, these 
are standard partitions but can also be Linux software RAID volumes that we've created. In LVM terminology, 
these storage resources are called "physical volumes" (eg: /dev/hda1, /dev/hda2 ... etc).

Our first step in setting up LVM involves properly initializing these partitions so that they can be recognized 
by the LVM system. This involves setting the correct partition type (usually using the fdisk command, and entering 
the type of partition as 'Linux LVM' - 0x8e ) if we're adding a physical partition; and then running 
the pvcreate command.

# pvcreate /dev/hda1 /dev/hda2 /dev/hda3
# pvscan

The above step initializes three partitions as physical volumes so that they can be included 
in a volume group.

Step Two : Creating a volume group. You can think of a volume group as a pool of storage that consists of one 
or more physical volumes. While LVM is running, we can add physical volumes to the volume group or even remove them.

First initialize the /etc/lvmtab and /etc/lvmtab.d files by running the following command:

# vgscan

Now you can create a volume group and assign one or more physical volumes to the volume group.

# vgcreate my_vol_grp /dev/hda1 /dev/hda2

Behind the scenes, the LVM system allocates storage in equal-sized "chunks", called extents. 
We can specify the particular extent size to use at volume group creation time. The size of an extent 
defaults to 4 MB, which is perfect for most uses. You can use the -s flag to change the size of the extent. 
The extent size affects the minimum size of changes which can be made to a logical volume in the volume group, 
and the maximum size of logical and physical volumes in the volume group. A logical volume can contain at most 
65534 extents, so the default extent size (4 MB) limits the volume to about 256 GB; a size of 1 TB would require 
extents of at least 16 MB. So to accommodate a 1 TB size, the above command can be rewritten as:

# vgcreate -s 16M my_vol_grp /dev/hda1 /dev/hda2
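
The limit quoted above is easy to verify: the largest logical volume is 65534 extents times the extent size.
A minimal sketch (the function name is invented for this note):

```shell
# Largest logical volume, in MB, for a given extent size in MB,
# using the 65534-extents-per-volume limit quoted above.
max_lv_mb() {
    echo $(( 65534 * $1 ))
}

max_lv_mb 4     # default extent size: just under 256 GB
max_lv_mb 16    # just under 1 TB
```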

You can check the result of your work at this stage by entering the command:

# vgdisplay

This command displays the total physical extents in a volume group, size of each extent, 
the allocated size and so on.

Step Three : This step involves the creation of one or more "logical volumes" using our volume group storage pool. 
The logical volumes are created from volume groups, and may have arbitrary names. The size of the new volume 
may be requested in either extents (-l switch) or in KB, MB, GB or TB (-L switch), rounding up to whole extents.

# lvcreate -l 50 -n my_logical_vol my_vol_grp

The above command allocates 50 extents of space in my_vol_grp to the newly created my_logical_vol. 
The -n switch specifies the name of the logical volume we are creating.
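
The "rounding up to whole extents" of the -L switch amounts to a ceiling division. A small sketch, assuming
the default extent size of 4 MB (the function name is made up):

```shell
# Number of extents needed for a size given in MB, rounded up to whole extents.
extents_for_mb() {
    size_mb=$1; pe_mb=${2:-4}
    echo $(( (size_mb + pe_mb - 1) / pe_mb ))
}

extents_for_mb 200    # exactly 50 extents, the same as "-l 50"
extents_for_mb 201    # rounds up to 51 extents
```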

Now you can check if you got the desired results by using the command :

# lvdisplay

which shows the information of your newly created logical volume.

Once a logical volume is created, we can go ahead and put a filesystem on it, mount it, and start using 
the volume to store our files. For creating a filesystem, we do the following:

# mke2fs -j /dev/my_vol_grp/my_logical_vol

The -j signifies journaling support for the ext3 filesystem we are creating.
Mount the newly created file system :

# mount /dev/my_vol_grp/my_logical_vol /data
Also do not forget to append the corresponding line in the /etc/fstab file:

#File: /etc/fstab
/dev/my_vol_grp/my_logical_vol /data ext3 defaults 0 0
Now you can start using the newly created logical volume, accessible at the /data mount point.


Some more on Linux LVM commands:


Linux vgcreate command:
=======================


NAME
vgcreate - create a volume group   
SYNOPSIS
vgcreate [-A|--autobackup {y|n}] [-d|--debug] [-h|--help] [-l|--maxlogicalvolumes MaxLogicalVolumes] 
[-p|--maxphysicalvolumes MaxPhysicalVolumes] [-s|--physicalextentsize PhysicalExtentSize[kKmMgGtT]] 
[-v|--verbose] [--version] VolumeGroupName PhysicalVolumePath [PhysicalVolumePath...]   

DESCRIPTION
vgcreate creates a new volume group called VolumeGroupName using the block special device 
PhysicalVolumePath previously configured for LVM with pvcreate(8).   

OPTIONS
-A, --autobackup {y|n} 
      Controls automatic backup of VG metadata after the change (see vgcfgbackup(8)). Default is yes. 
-d, --debug 
      Enables additional debugging output (if compiled with DEBUG). 
-h, --help 
      Print a usage message on standard output and exit successfully. 
-l, --maxlogicalvolumes MaxLogicalVolumes 
      Sets the maximum possible logical volume count. More logical volumes can't be created in this volume group. 
      Absolute maximum is 256. 
-p, --maxphysicalvolumes MaxPhysicalVolumes 
      Sets the maximum possible physical volume count. More physical volumes can't be included in this volume group. Absolute maximum is 256. 
-s, --physicalextentsize PhysicalExtentSize[kKmMgGtT] 
      Sets the physical extent size on physical volumes of this volume group. A size suffix 
      (k for kilobytes up to t for terabytes) is optional, megabytes is the default if no suffix is present. 
      Values can be from 8 KB to 16 GB in powers of 2. The default of 4 MB causes maximum LV sizes of ~256GB 
      because as many as ~64k extents are supported per LV. In case larger maximum LV sizes are needed (later), 
      you need to set the PE size to a larger value as well. Later changes of the PE size in an existing VG are 
      not supported. 
-v, --verbose 
      Display verbose runtime information about vgcreate's activities. 
--version 
      Display tool and IOP version and exit successfully. 
  
EXAMPLES
To create a volume group named test_vg using physical volumes /dev/sdk1, /dev/sdl1, and /dev/sdm1 
with default physical extent size of 4MB: 

# vgcreate test_vg /dev/sd[k-m]1

To create a volume group named test_vg using physical volumes /dev/sdk1 and /dev/sdl1 with default 
physical extent size of 4MB:

# vgcreate test_vg /dev/sdk1 /dev/sdl1

NOTE: If you are using devfs it is essential to use the full devfs name of the device rather than the 
symlinked name in /dev. So the above could be: 

# vgcreate test_vg /dev/scsi/host1/bus0/target[1-3]/lun0/part1



Linux vgextend command:
=======================



NAME
vgextend - add physical volumes to a volume group   

SYNOPSIS
vgextend [-A|--autobackup{y|n}] [-d|--debug] [-h|--help] [-v|--verbose] VolumeGroupName 
         PhysicalVolumePath [PhysicalVolumePath...]   

DESCRIPTION
vgextend allows you to add one or more initialized physical volumes ( see pvcreate(8) ) to an existing 
volume group to extend it in size.   

OPTIONS
-A, --autobackup y/n 
Controls automatic backup of VG metadata after the change ( see vgcfgbackup(8) ). Default is yes. 
-d, --debug 
Enables additional debugging output (if compiled with DEBUG). 
-h, --help 
Print a usage message on standard output and exit successfully. 
-v, --verbose 
Gives verbose runtime information about vgextend's activities. 
  
Examples

# vgextend vg00 /dev/sda4 /dev/sdn1

tries to extend the existing volume group "vg00" by the new physical volumes (see pvcreate(8) ) 
"/dev/sdn1" and "/dev/sda4".   


Linux pvcreate command:
=======================


NAME
pvcreate - initialize a disk or partition for use by LVM   

SYNOPSIS
pvcreate [-d|--debug] [-f[f]|--force [--force]] [-y|--yes] [-h|--help] [-v|--verbose] [-V|--version] 
         PhysicalVolume [PhysicalVolume...]   

DESCRIPTION
pvcreate initializes PhysicalVolume for later use by the Logical Volume Manager (LVM). Each PhysicalVolume 
can be a disk partition, whole disk, meta device, or loopback file. For DOS disk partitions, 
the partition id must be set to 0x8e using fdisk(8), cfdisk(8), or a equivalent. For whole disk devices 
only the partition table must be erased, which will effectively destroy all data on that disk. This can be done 
by zeroing the first sector with: 

# dd if=/dev/zero of=PhysicalVolume bs=512 count=1 

Continue with vgcreate(8) to create a new volume group on PhysicalVolume, or vgextend(8) to add PhysicalVolume 
to an existing volume group.   

OPTIONS
-d, --debug 
      Enables additional debugging output (if compiled with DEBUG). 
-f, --force 
      Force the creation without any confirmation. You can not recreate (reinitialize) a physical volume belonging 
      to an existing volume group. In an emergency you can override this behaviour with -ff. In no case can you 
      initialize an active physical volume with this command. 
-s, --size 
      Overrides the size of the physical volume which is normally retrieved. Useful in the rare case where this value 
      is wrong. More useful to fake large physical volumes of up to 2 Terabytes - 1 Kilobyte on smaller devices 
      for testing purposes only where no real access to data in created logical volumes is needed. If you wish 
      to create the supported maximum, use "pvcreate -s 2147483647k PhysicalVolume [PhysicalVolume ...]". 
      All other LVM tools will use this size with the exception of lvmdiskscan(8). 
-y, --yes 
      Answer yes to all questions. 
-h, --help 
      Print a usage message on standard output and exit successfully. 
-v, --verbose 
      Gives verbose runtime information about pvcreate's activities. 
-V, --version 
      Print the version number on standard output and exit successfully. 
  
Example

Initialize partition #4 on the third SCSI disk and the entire fifth SCSI disk for later use by LVM: 

# pvcreate /dev/sdc4 /dev/sde 
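
The pvcreate and vgextend steps described above are normally run one after the other. The following is a minimal sketch
of that sequence. The helper name plan_vg_extend is hypothetical (it is not part of LVM), and it only prints
the commands so that they can be reviewed before being run as root:

```shell
# plan_vg_extend VG PV...   (hypothetical helper, not an LVM command)
# Prints the commands that would initialize each physical volume
# and then add all of them to the volume group VG.
plan_vg_extend() {
    vg=$1
    shift
    for pv in "$@"; do
        echo "pvcreate $pv"
    done
    echo "vgextend $vg $*"
}

plan_vg_extend vg00 /dev/sda4 /dev/sdn1
```

Printing instead of executing is a deliberate choice here: pvcreate destroys whatever is on the target, so a dry run
is the safer default.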



33.5 Installing a Cluster filesystem on Linux:
==============================================

Suppose, in this example, we have 2 Linux nodes, and we want to create a scsi attached shared disksystem.
We plan to use OCFS2 as the Clustered FileSystem.

First, we partition the disks to raw volumes.

This example uses /dev/sdb (an empty SCSI disk with no existing partitions) to create a single partition for the entire disk (36 GB). 
We will do this for all disks.


Ex:
# fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.


The number of cylinders for this disk is set to 4427.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
 (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sdb: 255 heads, 63 sectors, 4427 cylinders
Units = cylinders of 16065 * 512 bytes

 Device Boot Start End Blocks Id System

Command (m for help): n
Command action
 e extended
 p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-4427, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-4427, default 4427):
Using default value 4427

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: If you have created or modified any DOS 6.x
partitions, please see the fdisk manual page for additional
information.
Syncing disks.


Now verify the new partition: 
Ex:
# fdisk -l /dev/sdb

Disk /dev/sdb: 36.4 GB, 36420075008 bytes
255 heads, 63 sectors/track, 4427 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1        4427    35559846   83  Linux
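
The "Blocks" value in the listing above can be cross-checked from the disk geometry. The sketch below assumes
(this is not stated in the listing) that the first partition starts one track into the disk, which was the usual
layout for DOS-style partition tables of that era. The function name part_blocks is made up for this example:

```shell
# part_blocks CYLINDERS HEADS SECTORS_PER_TRACK
# Size in 1K blocks of a single partition spanning the whole disk,
# assuming the partition starts one track (SECTORS_PER_TRACK sectors) in.
part_blocks() {
    cyl=$1; heads=$2; spt=$3
    sectors=$(( cyl * heads * spt - spt ))   # 512-byte sectors in the partition
    echo $(( sectors / 2 ))                  # two 512-byte sectors per 1K block
}

part_blocks 4427 255 63    # prints 35559846, matching the Blocks column
```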

Repeat the above steps for each disk to be partitioned.   Disk partitioning should be done from one node only.  
When finished partitioning, run the 'partprobe' command as root on each of the remaining cluster nodes in order to assure 
that the new partitions are configured.

Ex:
# partprobe

  
Oracle Cluster File System (OCFS) Release 2
-------------------------------------------

OCFS2 is a general-purpose cluster file system that can be used to store Oracle Clusterware files, Oracle RAC database files,
 Oracle software, or any other types of files normally stored on a standard filesystem such as ext3.  
This is a significant change from OCFS Release 1, which only supported Oracle Clusterware files and Oracle RAC database files.   

Obtain OCFS2

OCFS2 is available free of charge from Oracle as a set of three RPMs:  a kernel module, support tools, and a console.  
There are different kernel module RPMs for each supported Linux kernel so be sure to get the OCFS2 kernel module for your Linux kernel.  
OCFS2 kernel modules may be downloaded from http://oss.oracle.com/projects/ocfs2/files/ and the tools and console may be downloaded from 
http://oss.oracle.com/projects/ocfs2-tools/files/.  

To determine the kernel-specific module that you need, use uname -r. 

# uname -r
2.6.9-22.ELsmp

For this example I downloaded:
ocfs2console-1.0.3-1.i386.rpm
ocfs2-tools-1.0.3-1.i386.rpm
ocfs2-2.6.9-22.ELsmp-1.0.7-1.i686.rpm 
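
The kernel module package name follows the pattern visible in the list above: ocfs2-<kernel release>-<module version>.<arch>.rpm.
A small sketch that builds the name from uname -r; the function name ocfs2_module_rpm is hypothetical, and the module
version and architecture must still be chosen by hand from what is available on the download site:

```shell
# ocfs2_module_rpm KERNEL_RELEASE MODULE_VERSION ARCH
# Builds the package file name using the naming pattern shown above.
ocfs2_module_rpm() {
    echo "ocfs2-$1-$2.$3.rpm"
}

# For the running kernel (version and arch taken from this example):
ocfs2_module_rpm "$(uname -r)" 1.0.7-1 i686
```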

>>> Install OCFS2 as root on each cluster node 

# rpm -ivh ocfs2console-1.0.3-1.i386.rpm \
ocfs2-tools-1.0.3-1.i386.rpm \
ocfs2-2.6.9-22.ELsmp-1.0.7-1.i686.rpm

Preparing...                ########################################### [100%]
   1:ocfs2-tools            ########################################### [ 33%]
   2:ocfs2console           ########################################### [ 67%]
   3:ocfs2-2.6.9-22.ELsmp   ########################################### [100%]

Configure OCFS2 

Run ocfs2console as root: 
# ocfs2console


Now a Graphical interface will appear:

Select Cluster --> Configure Nodes.
Click on Add and enter the Name and IP Address of each node in the cluster.

Once all of the nodes have been added, click on Cluster --> Propagate Configuration.  This will copy the OCFS2 configuration file 
to each node in the cluster.  You may be prompted for root passwords as ocfs2console uses ssh to propagate the configuration file.  
Leave the OCFS2 console by clicking on File --> Quit.  It is possible to format and mount the OCFS2 partitions using the ocfs2console GUI; however, 
this guide will use the command line utilities. 


>>> Enable OCFS2 to start at system boot: 

As root, execute the following command on each cluster node to allow the OCFS2 cluster stack to load at boot time:
/etc/init.d/o2cb enable
Ex:
# /etc/init.d/o2cb enable


Writing O2CB configuration: OK
Loading module "configfs": OK
Mounting configfs filesystem at /config: OK
Loading module "ocfs2_nodemanager": OK
Loading module "ocfs2_dlm": OK
Loading module "ocfs2_dlmfs": OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK


 Starting cluster ocfs2: OK


>>> Create a mount point for the OCFS2 filesystem 

As root on each of the cluster nodes, create the mount point directory for the OCFS2 filesystem
Ex:
# mkdir /u03



>>> Create the OCFS2 filesystem on the unused disk partition:

The example below creates an OCFS2 filesystem on the unused /dev/sdc1 partition with a volume label of "/u03" (-L /u03), a block size of 4K (-b 4K) 
and a cluster size of 32K (-C 32K) with 4 node slots (-N 4).  See the OCFS2 Users Guide for more information on mkfs.ocfs2 command line options.

Ex:
# mkfs.ocfs2 -b 4K -C 32K -N 4 -L /u03 /dev/sdc1

mkfs.ocfs2 1.0.3
Filesystem label=/u03
Block size=4096 (bits=12)
Cluster size=32768 (bits=15)
Volume size=36413280256 (1111245 clusters) (8889960 blocks)
35 cluster groups (tail covers 14541 clusters, rest cover 32256 clusters)
Journal size=33554432
Initial number of node slots: 4
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing lost+found: done
mkfs.ocfs2 successful


>>> Mount the OCFS2 filesystem:

Since this filesystem will contain the Oracle Clusterware files and Oracle RAC database files, we must ensure that all I/O 
to these files uses direct I/O (O_DIRECT).  Use the "datavolume" option whenever mounting the OCFS2 filesystem to enable direct I/O.  
Failure to do this can lead to data loss in the event of system failure.

Ex:
# mount -t ocfs2 -L /u03 -o datavolume /u03

Notice that the mount command uses the filesystem label (-L /u03) used during the creation of the filesystem. This is a handy way to refer 
to the filesystem without having to remember the device name. 

To verify that the OCFS2 filesystem is mounted, issue the mount command or run df: 

# mount -t ocfs2
/dev/sdc1 on /u03 type ocfs2 (rw,_netdev,datavolume)

# df /u03
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdc1             35559840    138432  35421408   1% /u03

The OCFS2 filesystem can now be mounted on the other cluster nodes. 

To automatically mount the OCFS2 filesystem at system boot, add a line similar to the one below to /etc/fstab on each cluster node: 
LABEL=/u03   /u03    ocfs2   _netdev,datavolume,nointr 0 0
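
Because the same /etc/fstab line has to be added on every cluster node, it helps to add it in a way that can be
repeated safely. A minimal sketch, assuming a hypothetical helper named add_fstab_line; it is demonstrated on a
scratch file and should only be pointed at /etc/fstab after review:

```shell
# add_fstab_line FILE LINE   (hypothetical helper)
# Appends LINE to FILE unless an identical line is already present.
add_fstab_line() {
    grep -qxF -e "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

scratch=$(mktemp)
entry="LABEL=/u03   /u03    ocfs2   _netdev,datavolume,nointr 0 0"
add_fstab_line "$scratch" "$entry"
add_fstab_line "$scratch" "$entry"    # second call is a no-op
cat "$scratch"                        # the entry appears once
rm -f "$scratch"
```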


>>> Create the directories for shared files 

CRS files:
# mkdir /u03/oracrs
# chown oracle:oinstall /u03/oracrs
# chmod 775 /u03/oracrs

Database files:
# mkdir /u03/oradata
# chown oracle:oinstall /u03/oradata
# chmod 775 /u03/oradata




34. SWAP space:
===============


34.1 Solaris:
-------------

-- View swap space:
-- ----------------

The /usr/sbin/swap utility provides a method of adding, deleting, and monitoring the system swap areas
used by the memory manager.

# swap -l

The -l option can be used to list swap space. The system displays information like:
swapfile           dev      swaplo    blocks    free
/dev/dsk/c0t0d0s3  136,3        16    302384    302384

path  : the pathname of the swap area, shown under the "swapfile" column. In this example it is /dev/dsk/c0t0d0s3.
dev   : the major/minor device number in decimal if it is a block special device; zeroes otherwise
swaplo: the offset in 512 byte blocks where usable swap space begins
blocks: size in 512 byte blocks. The swaplen value can be adjusted as a kernel parameter.
free  : free 512 byte blocks.
The swap -l command does not include physical memory in its calculation of swap space.
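
Since swap -l reports sizes in 512 byte blocks, a small filter makes the listing easier to read. This is only a sketch:
the function name swap_l_mb is made up, it assumes the five-column layout shown above, and it truncates to whole megabytes
(2048 blocks per MB):

```shell
# swap_l_mb: reads `swap -l` style output on stdin and prints, per swap area,
# the path, the size in MB and the free space in MB (truncated).
swap_l_mb() {
    awk 'NR > 1 { printf "%s %d %d\n", $1, $4 / 2048, $5 / 2048 }'
}

# Sample taken from the listing above; on a live system: swap -l | swap_l_mb
swap_l_mb <<'EOF'
swapfile           dev      swaplo    blocks    free
/dev/dsk/c0t0d0s3  136,3        16    302384    302384
EOF
```

For the sample line this prints "/dev/dsk/c0t0d0s3 147 147".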

# swap -s

The -s option can be used to list a summary of the system's virtual swap space.
total: 31760k bytes allocated + 5952k reserved = 37712k used, 202928k available

These numbers are in 1024 byte blocks.
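
The summary line can be turned into a single "percent used" figure as used / (used + available). A minimal sketch,
assuming the exact word order of the sample line above; the function name swap_s_pct is hypothetical:

```shell
# swap_s_pct: reads one `swap -s` summary line on stdin and prints the
# percentage of virtual swap in use, truncated to a whole number.
swap_s_pct() {
    awk '{ used = $9 + 0; avail = $11 + 0    # "37712k" + 0 gives 37712
           printf "%d\n", used * 100 / (used + avail) }'
}

echo "total: 31760k bytes allocated + 5952k reserved = 37712k used, 202928k available" | swap_s_pct
```

For the sample line, 37712k of 240640k is in use, so the function prints 15.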

-- Add swap areas:
-- ---------------

There are 2 methods available for adding more swap to your system.

(1) create a secondary swap partition:
(2) create a swapfile in an existing UFS file system

(1) Creating a secondary swap partition requires additional unused disk space. You must use the format command
to create a new partition and filesystem on a disk.
Suppose the /data directory currently resides on slice 5, which is 200MB in size.
- free up the /data directory (save the contents to another location )
- unmount /dev/dsk/c0t0d0s5
- use format:
  Enter partition id tag (unassigned): swap
  Enter partition permission flags (wm): wu
  Enter new starting cyl(3400): return
  Enter partition size: return
  Then label the disk as follows
  Partition> la
  Ready to label disk? y

- Run the newfs command on that partition to create a fresh filesystem on slice 5
  newfs /dev/rdsk/c0t0d0s5
- Make an entry to the /etc/vfstab file
- Run the swapadd script to add the swap to your system as follows:
  /sbin/swapadd
- verify that the swap has been added with swap -l


(2) The other method to add more swap space is to use the mkfile and swap commands
to designate a part of an existing UFS filesystem as a supplementary swap area.
You can use it as a temporary solution, or as a solution for longer duration as well,
but a swap file is just another file in the filesystem, so you cannot unmount that
filesystem while the swapfile is in use.
The following steps enable you to add more swap space without repartitioning a disk.
- As root, use df -k to locate a suitable filesystem. Suppose /data looks all right
  for this purpose.
- Use the mkfile command to create a 50MB swapfile named swapfile in the /data filesystem.

  mkfile 50m /data/swapfile

- use ls -l /data to verify that the file has been created.
  Notice that the sticky bit has automatically been set.
- Activate the swaparea with the swap command as follows:

  /usr/sbin/swap -a /data/swapfile

- verify that the swap has been added with swap -l
  The system responds something like this:
  
swapfile           dev      swaplo    blocks    free
/dev/dsk/c0t0d0s3  136,3        16    302384    302384
/data/swapfile       -          16    102384    102384

If this will be a permanent swaparea, add an entry for the swapfile in the vfstab file.
/data/swapfile - - swap - no -
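
The "blocks" value that swap -l reports for the new swapfile can be predicted from the file size. The sketch below
assumes, consistent with the sample listing above, that the usable area is the file size in 512 byte blocks minus
the swaplo offset; the function name swapfile_blocks is made up:

```shell
# swapfile_blocks SIZE_MB SWAPLO
# Expected "blocks" value for a swapfile of SIZE_MB megabytes:
# 2048 blocks of 512 bytes per MB, minus the swaplo offset.
swapfile_blocks() {
    echo $(( $1 * 2048 - $2 ))
}

swapfile_blocks 50 16    # prints 102384, as in the listing above
```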

-- Removing a swapfile:
-- --------------------

As root, use the swap -d command to remove a swap area as follows:

swap -d /dev/dsk/c0t0d0s5  for a swap partition
swap -d /data/swapfile     for a swapfile

Use the swap -l command to verify that the swaparea is gone.
Edit the /etc/vfstab file and delete the entry for the swap area if necessary.

In case of a swapfile, just remove the file with rm /data/swapfile

-- Creating a Temporary File System:
-- ---------------------------------

Create a directory which will serve as the mount point for the TMPFS file system.
There is no command such as newfs to create a TMPFS file system before mounting it.
The TMPFS file system actually gets created in RAM when you execute the mount command
and specify a filesystem type of tmpfs. The following example mounts a TMPFS filesystem
on the existing directory /export/data, limiting it to 25MB.

mount -F tmpfs -o size=25m swap /export/data 



34.2 AIX:
---------

The installation creates a default paging logical volume, hd6, on drive hdisk0,
also referred to as the primary paging space.

The reports from the "vmstat" and "topas" commands indicate the amount of paging space I/O that is
taking place. 

Showing paging space:
---------------------

The lsps -a command provides a snapshot of the current utilization of each of the paging spaces
on the system, while the lsps -s command provides a summary of the total active paging space
and its current utilization.

# lsps -a
Page Space    Physical Volume    Volume Group     Size    %Used    Active    Auto  Type
paging00      hdisk1             rootvg           80MB    1        yes       yes   lv
hd6           hdisk1             rootvg          256MB    1        yes       yes   lv
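
To get the total configured paging space from a listing like the one above, the Size column can be summed. A minimal sketch,
assuming every size is reported in MB as in this sample (it does not handle other units); the function name
lsps_total_mb is hypothetical:

```shell
# lsps_total_mb: reads `lsps -a` style output on stdin and prints the
# sum of the Size column in MB. Assumes all sizes carry an "MB" suffix.
lsps_total_mb() {
    awk 'NR > 1 { total += $4 + 0 }    # "80MB" + 0 gives 80
         END    { print total + 0 }'
}

# Sample taken from the listing above; on a live system: lsps -a | lsps_total_mb
lsps_total_mb <<'EOF'
Page Space    Physical Volume    Volume Group     Size    %Used    Active    Auto  Type
paging00      hdisk1             rootvg           80MB    1        yes       yes   lv
hd6           hdisk1             rootvg          256MB    1        yes       yes   lv
EOF
```

For the sample this prints 336.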

The /etc/swapspaces file specifies the paging-space devices that are activated by the swapon -a command.
A paging space is added to this file when it is created by the mkps -a command, and removed from
the file when rmps is used. 

You can also try:

# pstat -s

Managing Paging space:
----------------------

The following commands are used to manage paging space:

chps      : changes the attributes of a paging space
lsps      : displays the characteristics of a paging space
pstat -s  : displays the characteristics of a paging space
mkps      : creates an additional paging space
rmps      : removes an inactive paging space
swapon    : activates a paging space
swapoff   : deactivates one or more paging spaces


Managing Paging behaviour:
--------------------------

Note 1:
-------


There are three page space allocation policies available in AIX:

- Deferred Page Space Allocation (DPSA) 
- Late Page Space Allocation (LPSA) 
- Early Page Space Allocation (EPSA) 

Deferred page space allocation DPSA
The deferred page space allocation policy is the default policy in AIX. 

Late page space allocation LPSA
The AIX operating system provides a way to enable the late page space allocation policy, which means that the disk block 
for a paging space page is only allocated when the corresponding in-memory page is touched. 

Early page space allocation EPSA
If you want to ensure that a process will not be killed due to low paging conditions, this process can 
preallocate paging space by using the early page space allocation policy. 

Choosing between LPSA and DPSA with the vmo command:

Using the "vmo -o defps" command enables turning the deferred page space allocation, or DPSA, 
on or off in order to preserve the late page space allocation policy, or LPSA. 

Paging space and virtual memory
The vmstat command (avm column), ps command (SIZE, SZ), and other utilities report the amount 
of virtual memory actually accessed because with DPSA, the paging space might not get touched. 


Note 2:
-------

High paging space during online backup on AIX
  
 Technote (FAQ) 
  
Question 
During an online backup, you might see a high paging space usage, which will not be released 
even after online backup completion in DB2® Universal Database™ (DB2 UDB) Version 8. 
This problem does not occur during an offline backup.  
  
Cause 
Paging space usage increases during online database backups on AIX® 5.2 and 5.3. 
This is an expected behavior from ML4 of AIX 5L™ 5.2 and ML1 of AIX 5L 5.3 onwards.
During an online database backup operation, file pages are loaded into memory by AIX 
in order for the backup processes to read them. If DB2 UDB runs out of memory, 
AIX has to free memory to fit additional file pages into RAM. It does this by writing 
DB2 UDB shared memory segments out to paging space. When the backup completes, 
these pages in paging space are not released because they are still in use by the other 
DB2 UDB processes. They will only be freed when the database is deactivated.  
  
 
Answer 
To free up paging space without stopping the database, use the AIX tuning parameter lru_file_repage. 
It affects Virtual Memory Manager (VMM) page replacement. By setting this parameter to 0, 
you force the system to only free file pages when you run out of memory and to not write 
working pages out to paging space. This will stop paging use from increasing. 
To set this parameter to zero, use vmo command. For example:

vmo -o lru_file_repage=0

This parameter was introduced in ML4 of AIX5L 5.2 and ML1 of AIX5L 5.3. The default value is 1.  
 

Note 3:
-------

Warning: this is a trick. 

A trick I have found to "reset" paging space is to
increase the paging space by 1 PP and then decrease it by 1 PP. 
Decreasing the size of a paging space causes the OS to create a
new paging space, copy everything to the new space, delete the old one and
recreate it at the new size. You'll need enough free disk space
to create the new paging space.
 
Note 4:
-------


The VM kernel parameters minperm% and maxperm% affect the amount of physical memory that can be used 
for file system caching and govern when computational pages of memory get paged (swapped) to paging space. 
If these values have been changed recently, that could explain the results that you describe. 

When the (dynamic) value of numperm% drops below minperm%, it will cause the paging of computational pages to page space.

It would be interesting to know if the minperm% and maxperm% values were changed and, if so, what the former and current values are.


Note 5:
-------



Show paging space usage:

# lsps -a
# lsps -s

Increase paging space:

# chps -s 32 hd6

where we increased the size of hd6 by 32 LPs (32 x 32MB, assuming a PP size of 32MB).

Reducing paging space:

# chps -d 1 hd6

where we decreased the size of hd6 by 1 LP.
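
Because chps -s and chps -d take a number of logical partitions rather than a size, the LP count has to be derived
from the desired size and the PP size of the volume group (shown as "PP SIZE" by lsvg). A small sketch that rounds up;
the function name lps_needed is made up for this example:

```shell
# lps_needed SIZE_MB PP_SIZE_MB
# Number of logical partitions needed to cover SIZE_MB, rounded up.
lps_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

lps_needed 1024 32    # prints 32  (1GB with 32MB partitions)
lps_needed 100 32     # prints 4   (rounded up from 3.125)
```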


mkps:
-----

To Add a Logical Volume for Additional Paging Space
mkps [ -a ] [ -n ] [ -t lv ] -s LogicalPartitions VolumeGroup [ PhysicalVolume ]

To create a paging space in volume group myvg that has four logical partitions and is activated immediately 
and at all subsequent system restarts, enter: 

# mkps  -a  -n  -s 4 myvg

To create a paging space in rootvg on hdisk0

# mkps -a -n -s 30 rootvg hdisk0

rmps:
-----

Before AIX 5L:
An active paging space cannot be removed; it must first be made inactive.
Use the chps command so the paging space is not used on the next restart.
After the reboot, the paging space is inactive and can be removed with the rmps command.

AIX 5.1 or later:
Use the swapoff command to dynamically deactivate the paging space, then use the rmps command.
# swapoff /dev/paging03
# rmps paging03

chps:
-----

As of AIX 5L you can use the chps -d command to decrease the size of a paging space 
without having to deactivate it, reboot, remove it, and then recreate it with a smaller size.
Decrease it by a number of LPs, for example:
# chps -d 2 paging03

chps -a {y|n} paging00 : specifies that the paging space paging00 is active (y) or inactive (n) at subsequent system restarts.
chps -s 10 paging02 : adds ten LPs to paging02 without rebooting.
chps -d 5 paging01 : removes five LPs from paging01 without rebooting.
chps -d 50 hd6 : removes fifty LPs from hd6 without rebooting.


List the active paging spaces:
------------------------------

# lsps -a     or lsps -s

# pg /etc/swapspaces
hd6:
         dev=/dev/hd6

paging00:
         dev=/dev/paging00
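
The stanza format of /etc/swapspaces is easy to scan from a script. A minimal sketch that lists the devices named
in the dev= lines; the function name swapspaces_devs is hypothetical and the input is fed from a sample rather than
from the real file:

```shell
# swapspaces_devs: reads /etc/swapspaces style stanzas on stdin and
# prints the device path from each "dev=" line.
swapspaces_devs() {
    awk -F= '/^[ \t]*dev[ \t]*=/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# On a live system: swapspaces_devs < /etc/swapspaces
swapspaces_devs <<'EOF'
hd6:
         dev=/dev/hd6

paging00:
         dev=/dev/paging00
EOF
```

For the sample this prints /dev/hd6 and /dev/paging00, one per line.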



Note on paging on AIX:
----------------------

If the amount of paging space is less than the amount of real memory in the system, it's possible the system 
will run out of paging space before real memory. This is because AIX performs early allocation of page space. 
When a page is referenced, real memory and paging space blocks are allocated. If there are fewer paging space blocks 
than real memory pages, paging space will be exhausted before all of real memory is consumed.

Early allocation algorithm
The second operating system's paging-space-slot-allocation method is intended for use in installations 
where this situation is likely, or where the cost of failure to complete is intolerably high. Aptly called early allocation, 
this algorithm causes the appropriate number of paging-space slots to be allocated at the time the 
virtual-memory address range is allocated, for example, with the malloc() subroutine. If there are not 
enough paging-space slots to support the malloc() subroutine, an error code is set. 
The early-allocation algorithm is invoked as follows:

# export PSALLOC=early
This example causes all future programs to be executed in the environment to use early allocation. 
The currently executing shell is not affected.
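
If early allocation is wanted for one program only, the variable can be set for a single command instead of being
exported. The sketch below shows only the shell mechanics (PSALLOC itself has an effect on AIX only), and the child
command is a stand-in for a real program:

```shell
# Set PSALLOC for a single command; the calling shell's environment is untouched.
unset PSALLOC
PSALLOC=early sh -c 'echo "child sees: ${PSALLOC:-unset}"'
echo "shell sees: ${PSALLOC:-unset}"
```

The first line of output reads "child sees: early" and the second "shell sees: unset".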

Early allocation is of interest to the performance analyst mainly because of its paging-space size implications. 
If early allocation is turned on for those programs, paging-space requirements can increase many times. 
Whereas the normal recommendation for paging-space size is at least twice the size of the system's real memory, 
the recommendation for systems that use PSALLOC=early is at least four times the real memory size. 
Actually, this is just a starting point. Analyze the virtual storage requirements of your workload and 
allocate paging spaces to accommodate them. As an example, at one time, the AIXwindows server required 250 MB of paging space 
when run with early allocation.
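
The two rules of thumb quoted above (twice real memory normally, four times with PSALLOC=early) can be written down
as a tiny calculator. This is only a sketch of those starting points, not a sizing tool; the function name paging_mb
is made up:

```shell
# paging_mb RAM_MB [early]
# Starting-point paging space size in MB: 2 x RAM normally,
# 4 x RAM when programs run with PSALLOC=early.
paging_mb() {
    case ${2:-default} in
        early) echo $(( $1 * 4 )) ;;
        *)     echo $(( $1 * 2 )) ;;
    esac
}

paging_mb 4096          # prints 8192
paging_mb 4096 early    # prints 16384
```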

When using PSALLOC=early, the user should set a handler for the following SIGSEGV signal by pre-allocating and setting 
the memory as a stack using the sigaltstack function. Even though PSALLOC=early is specified, when there 
is not enough paging space and a program attempts to expand the stack, the program may receive the SIGSEGV signal.

Deferred allocation algorithm
The third operating system's paging-space-slot-allocation method is the default beginning with AIX 4.3.2. 
The Deferred Page Space Allocation (DPSA) policy delays allocation of paging space until it is necessary to page out the page, 
which results in no wasted paging space allocation. This method can save huge amounts of paging space, which means disk space.
It is best to use deferred allocation.

On some systems, paging space might not ever be needed even if all the pages accessed have been touched. 
This situation is most common on systems with very large amount of RAM. However, this may result in overcommitment 
of paging space in cases where more virtual memory than available RAM is accessed.

To disable DPSA and preserve the Late Page Space Allocation policy, run the following command:

# vmo -o defps=0

To activate DPSA, run the following command:

# vmo -o defps=1

In general, system performance can be improved by DPSA, because the overhead of allocating page space after 
page faults is avoided. Paging space devices also need less disk space if DPSA is used.




34.3 Linux:
-----------


-- Check the swapspace:

# cat /proc/meminfo 
# cat /proc/swaps
# /sbin/swapon -s
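
The /proc/swaps listing reports sizes in kilobytes, one line per swap area. A minimal sketch that totals the Size and
Used columns; the function name swaps_summary is hypothetical, the sample data is illustrative rather than taken from a
real system, and paths containing spaces are not handled:

```shell
# swaps_summary: reads /proc/swaps style output on stdin and prints
# "total_kb used_kb" summed over all swap areas.
swaps_summary() {
    awk 'NR > 1 { size += $3; used += $4 }
         END    { printf "%d %d\n", size, used }'
}

# On a live system: swaps_summary < /proc/swaps
swaps_summary <<'EOF'
Filename                                Type            Size    Used    Priority
/dev/hda4                               partition       8192    0       -1
/swapfile                               file            8192    100     -2
EOF
```

For the sample this prints "16384 100".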

-- Creating swap space using a partition

Create a partition of the proper size using fdisk.
Format the partition, for example

# mkswap -c /dev/hda4

Enable the swap, for example

# swapon /dev/hda4

If you want the swap space enabled after boot, include the appropriate entry into /etc/fstab, for example
/dev/hda4  swap swap defaults 0 0

If you need to disable the swap, you can do it with
# swapoff /dev/hda4


-- Creating swap space using a swapfile

Create a file with the size of your swapfile
# dd if=/dev/zero of=/swapfile bs=1024 count=8192

Setup the file with the command
# mkswap /swapfile 8192

Enable the swap with the command
# swapon /swapfile

When you are done using the swapfile, you can turn it off and remove with
# swapoff /swapfile
# rm /swapfile
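
The swapfile steps above can be collected into a reviewable plan for any size. The sketch below only prints the
commands. The function name plan_swapfile is made up, and the chmod 600 line is an addition that is not part of the
procedure above: it is included because a swapfile readable by other users can expose memory contents.

```shell
# plan_swapfile PATH SIZE_MB   (hypothetical helper)
# Prints the commands that would create and enable a swapfile of SIZE_MB megabytes.
plan_swapfile() {
    kb=$(( $2 * 1024 ))
    echo "dd if=/dev/zero of=$1 bs=1024 count=$kb"
    echo "chmod 600 $1"
    echo "mkswap $1"
    echo "swapon $1"
}

plan_swapfile /swapfile 8
```

With a size of 8 the first printed line is the same dd command as in the example above (count=8192).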


34.4: Note about swap:
----------------------

Page replacement in Linux 2.4 memory management
Rik van Riel 
Conectiva Inc. 
riel@conectiva.com.br, http://www.surriel.com/ 


Abstract 
While the virtual memory management in Linux 2.2 has decent performance for many workloads, it suffers from 
a number of problems. 
The first part of this paper contains a description of how the Linux 2.2 VMM works and an analysis of why 
it has bad behaviour in some situations. 
The way in which a lot of this behaviour has been fixed in the Linux 2.4 kernel is described in the second part of the paper. 
Due to Linux 2.4 being in a code freeze period while these improvements were implemented, only known-good solutions 
have been integrated. A lot of the ideas used are derived from principles used in other operating systems, 
mostly because we have certainty that they work and a good understanding of why, making them suitable for integration 
into the Linux codebase during a code freeze. 


--Linux 2.2 memory management 
The memory management in the Linux 2.2 kernel seems to be focussed on simplicity and low overhead. 
While this works pretty well in practice for most systems, it has some weak points left and simply falls 
apart under some scenarios. 

Memory in Linux is unified, that is all the physical memory is on the same free list and can be allocated 
to any of the following memory pools on demand. Most of these pools can grow and shrink on demand. 
Typically most of a system's memory will be allocated to the data pages of processes and the page and buffer caches. 

The slab cache: this is the kernel's dynamically allocated heap storage. This memory is unswappable, 
but once all objects within one (usually page-sized) area are unused, that area can be reclaimed. 

The page cache: this cache is used to cache file data for both mmap() and read() and is indexed by (inode, index) pairs. 
No dirty data exists in this cache; whenever a program writes to a page, the dirty data is copied to the buffer cache, 
from where the data is written back to disk. 

The buffer cache: this cache is indexed by (block device, block number) tuples and is used to cache raw disk devices, 
inodes, directories and other filesystem metadata. It is also used to perform disk IO on behalf of the page cache 
and the other caches. For disk reads the pagecache bypasses this cache and for network filesystems it isn't used at all. 

The inode cache: this cache resides in the slab cache and contains information about cached files in the system. 
Linux 2.2 cannot shrink this cache, but because of its limited size it does need to reclaim individual entries. 

The dentry cache: this cache contains directory and name information in a filesystem-independent way and is used 
to lookup files and directories. This cache is dynamically grown and shrunk on demand. 

SYSV shared memory: the memory pool containing the SYSV shared memory segments is managed pretty much like the page cache, 
but has its own infrastructure for doing things. 

Process mapped virtual memory: this memory is administrated in the process page tables. Processes can have page cache 
or SYSV shared memory segments mapped, in which case those pages are managed in both the page tables 
and the data structures used for respectively the page cache or the shared memory code. 


--Linux 2.2 page replacement 
The page replacement of Linux 2.2 works as follows. When free memory drops below a certain threshold, the pageout daemon (kswapd) 
is woken up. The pageout daemon should usually be able to keep enough free memory, but if it isn't, user programs will end up calling the pageout code itself. 

The main pageout loop is in the function try_to_free_pages, which starts by freeing unused slabs from the kernel memory pool. 
After that, it calls the following functions in a loop, asking each of them to scan a small part of their part of memory until enough memory has been freed. 


shrink_mmap is a classical clock algorithm, which loops over all physical pages, clearing referenced bits, 
queueing old dirty pages for IO and freeing old clean pages. The main disadvantage it has compared to a 
clock algorithm, however, is that it isn't able to free pages which are in use by a program or a shared memory segment. 
Those pages need to be unmapped by swap_out first. 

shm_swap scans the SYSV shared memory segments, swapping out those pages that haven't been referenced recently 
and which aren't mapped into any process. 

swap_out scans the virtual memory of all processes in the system, unmapping pages which haven't been referenced recently, 
starting swapout IO and placing those pages in the page cache. 

shrink_dcache_memory reclaims entries from the VFS name cache. This is not directly reusable memory, but as soon as a whole page 
of these entries gets unused we can reclaim that page. 

Some balancing between these memory freeing functions is achieved by calling them in a loop, starting off by asking 
each of these functions to scan a little bit of their memory, as each of these functions accepts a priority argument 
which tells them how big a percentage of their memory to scan. If not enough memory is freed in the first loop, 
the priority is increased and the functions are called again. The idea behind this scheme is that when one memory pool 
is heavily used, it will not give up its resources lightly and we'll automatically fall through to one of the other 
memory pools. However, this scheme relies on each of the memory pools to react in a similar way to the priority argument 
under different load conditions. This doesn't work out in practice because the memory pools just have fundamentally 
different properties to begin with. 


--Problems with the Linux 2.2 page replacement 

Balancing between evicting pages from the file cache, evicting unused process pages and evicting pages from shm segments. If memory pressure is "just right" shrink_mmap is always successful in freeing cache pages and a process which has been idle for a day is still in memory. This can even happen on a system with a fairly busy filesystem cache, but only with the right phase of moon. 

Simple NRU (Not Recently Used) replacement cannot accurately identify the working set versus incidentally accessed pages and can lead to extra page faults. This doesn't hurt noticeably for most workloads, but it makes a big difference in some workloads and can be fixed easily, mostly since the LFU replacement used in older Linux kernels is known to work. 

Due to the simple clock algorithm in shrink_mmap, sometimes clean, accessed pages can get evicted before dirty, old pages. With a relatively small file cache that mostly consists of dirty data, eg unpacking a tarball, it is possible for the dirty pages to evict the (clean) metadata buffers that are needed to write the dirty data to disk. A few other corner cases with amusing variations on this theme are bound to exist. 

The system reacts badly to variable VM load or to load spikes after a period of no VM activity. Since kswapd, the pageout daemon, only scans when the system is low on memory, the system can end up in a state where some pages have referenced bits from the last 5 seconds, while other pages have referenced bits from 20 minutes ago. This means that on a load spike the system has no clue which are the right pages to evict from memory, this can lead to a swapping storm, where the wrong pages are evicted and almost immediately afterwards faulted back in, leading to the pageout of another random page, etc... 

Under very heavy loads, NRU replacement of pages simply doesn't cut it. More careful and better balanced pageout eviction and flushing is called for. With the fragility of the Linux 2.2 pageout framework this goal doesn't really seem achievable. 
The fact that shrink_mmap is a simple clock algorithm and relies on other functions to make process-mapped pages freeable makes it fairly unpredictable. Add to that the balancing loop in try_to_free_pages and you get a VM subsystem which is extremely sensitive to minute changes in the code and a fragile beast at its best when it comes to maintenance or (shudder) tweaking.


--Changes in Linux 2.4 
For Linux 2.4 a substantial development effort has gone into things like making the VM subsystem fully fine-grained for SMP systems and supporting machines with more than 1GB of RAM. Changes to the pageout code were done only in the last phase of development and are, because of that, somewhat conservative in nature and only employ known-good methods to deal with the problems that happened in the page replacement of the Linux 2.2 kernel. Before we get to the page replacement changes, however, first a short overview of the other changes in the 2.4 VM: 


More fine-grained SMP locking. The scalability of the VM subsystem has improved a lot for workloads where multiple CPUs are reading or writing the same file simultaneously; for example web or ftp server workloads. This has no real influence on the page replacement code. 

Unification of the buffer cache and the page cache. While in Linux 2.2 the page cache used the buffer cache to write back its data, needing an extra copy of the data and doubling memory requirements for some write loads, in Linux 2.4 dirty page cache pages are simply added in both the buffer and the page cache. The system does disk IO directly to and from the page cache page. The buffer cache is still maintained separately for filesystem metadata and the caching of raw block devices. Note that the cache was already unified for reads in Linux 2.2; Linux 2.4 just completes the unification.

Support for systems with up to 64GB of RAM (on x86). The Linux kernel previously had all physical memory directly mapped in the kernel's virtual address space, which limited the amount of supported memory to slightly under 1GB. For Linux 2.4 the kernel also supports additional memory (so-called "high memory" or highmem), which cannot be used for kernel data structures but only for page cache and user process memory. To do IO on these pages they are temporarily mapped into kernel virtual memory and the data is copied to or from a bounce buffer in "low memory".
At the same time the memory zone for ISA DMA (0 - 16 MB physical address range) has also been split out into a separate page zone. This means larger x86 systems end up with 3 memory zones, which all need their free memory balanced so we can continue allocating kernel data structures and ISA DMA buffers. The memory zones logic is generalised enough to also work for NUMA systems. 


The SYSV shared memory code has been removed and replaced with a simple memory filesystem which uses the page cache for all its functions. It supports both POSIX SHM and SYSV SHM semantics and can also be used as a swappable memory filesystem (tmpfs). 
Since the changes to the page replacement code took place after all these changes and in the (one and a half year long) code freeze period of the Linux 2.4 kernel, the changes have been kept fairly conservative. On the other hand, we have tried to fix as many of the Linux 2.2 page replacement problems as possible. Here is a short overview of the page replacement changes: they'll be described in more detail below. 


Page aging, which was present in the Linux 1.2 and 2.0 kernels and in FreeBSD has been reintroduced into the VM. However, a few small changes have been made to avoid some artifacts of virtual page based aging. 

To avoid the eviction of "wrong" pages due to interactions from page aging and page flushing, the page aging and flushing has been separated. There are active and inactive page lists. 

Page flushing has been optimised to avoid too much interference by writeout IO on the more time-critical disk read IO. 

Controlled background page aging during periods of little or no VM activity in order to keep the system in a state where it can easily deal with load spikes. 

Streaming IO is detected; we do early eviction on the pages that have already been used and reward the IO stream with more aggressive readahead.

--Linux 2.4 page replacement changes in detail 
The development of the page replacement changes in Linux 2.4 has been influenced by two main factors. Firstly, the bad behaviours of Linux 2.2 page replacement had to be fixed, using only known-good strategies because the development of Linux 2.4 had already entered the "code freeze" state. Secondly, the page replacement had to be more predictable and easier to understand than Linux 2.2 because tuning the page replacement in Linux 2.2 was deserving of the proverbial label "subtle and quick to upset". This means that only VM ideas that are well understood and have few interactions with the rest of the system were integrated. Lots of ideas were taken from other freely available operating systems and literature.


--Page aging 
Page aging was the first easy step in making the bad border-case behaviour from Linux 2.2 go away; it works reasonably well in Linux 1.2, Linux 2.0 and FreeBSD. Page aging allows us to make a much finer distinction between pages we want to keep in memory and pages we want to swap out than the NRU aging in Linux 2.2.
Page aging in these OSes works as follows: for each physical page we keep a counter (called age in Linux, or act_count in FreeBSD) that indicates how desirable it is to keep this page in memory. When scanning through memory for pages to evict, we increase the page age (adding a constant) whenever we find that the page was accessed and we decrease the page age (subtracting a constant) whenever we find that the page wasn't accessed. When the page age (or act_count) reaches zero, the page is a candidate for eviction.

However, in some situations the LFU[Note] page aging of Linux 2.0 is known to have too much CPU overhead and to adjust to changes in system load too slowly. Furthermore, research[Smaragdis, Kaplan, Wilson] has shown that recency of access is a more important criterion for page replacement than frequency.

These two problems are solved by doing exponential decline of the page age (divide by two instead of subtracting a constant) whenever we find a page that wasn't accessed, resulting in page replacement which is closer to LRU[Note] than LFU. This reduces the CPU overhead of page aging drastically in some cases; however, no noticeable change in swap behaviour has been observed.
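
The aging rule above can be sketched in a few lines of shell arithmetic. This is an illustration only, not kernel code: the function name age_page, the increment of 3 and the cap of 64 are assumptions chosen for the example.

```shell
# Illustrative sketch only; the constants and the function name are assumptions.
AGE_ADVANCE=3   # assumed constant added when the page was referenced
AGE_MAX=64      # assumed upper bound on the age counter

# age_page CURRENT_AGE REFERENCED(0|1) -> prints the new age
age_page() {
    age=$1
    referenced=$2
    if [ "$referenced" -eq 1 ]; then
        # Referenced since the last scan: add a constant, up to the cap.
        age=$((age + AGE_ADVANCE))
        if [ "$age" -gt "$AGE_MAX" ]; then
            age=$AGE_MAX
        fi
    else
        # Not referenced: exponential decline (halve instead of subtracting).
        age=$((age / 2))
    fi
    echo "$age"
}
```

With these assumed values, a page at age 8 that is never referenced goes through 4, 2, 1 and reaches 0 after four scans, while a single reference (age_page 8 1) lifts it to 11.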

Another artifact comes from the virtual address scanning. In Linux 1.2 and 2.0 the system reduces the page age of a page whenever it sees that the page hasn't been accessed from the page table which it is currently scanning, completely ignoring the fact that the page could have been accessed from other page tables. This can put a severe penalty on heavily shared pages, for example the C library. 

This problem is fixed by simply not doing "downwards" aging from the virtual page scans, but only from the physical-page based scanning of the active list. If we encounter pages which are not referenced, present in the page tables but not on the active list, we simply follow the swapout path to add this page to the swap cache and the active list so we'll be able to lower the page age of this page and swap it out as soon as the page age reaches zero. 


--Multiple page lists 
The bad interactions between page aging and page flushing, where referenced clean pages were freed before old dirty pages, is fixed by keeping the pages which are candidates for eviction separated from the pages we want to keep in memory (page age zero vs. nonzero). We separate the pages out by putting them on various page lists and having separate algorithms deal with each list. 
Pages which are not (yet) candidates for eviction are in process page tables, on the active list or both. Page aging as described above happens on these pages, with the function refill_inactive() balancing between scanning the page tables and scanning the active list.

When the page age on a page reaches zero, due to a combination of pageout scanning and the page not being actively used, the page is moved to the inactive_dirty list. Pages on this list are not mapped in the page tables of any process and are, or can become, reclaimable. Pages on this list are handled by the function page_launder(), which flushes the dirty pages to disk and moves the clean pages to the inactive_clean list. 

Unlike the active and inactive_dirty lists, the inactive_clean list isn't global but per memory zone. The pages on these lists can be immediately reused by the page allocation code and count as free pages. These pages can also still be faulted back into where they came from, since the data is still there. In BSD this would be called the "cache" queue.
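
The movement of a page between the lists described above can be summarised as a small decision function. This is a simplified sketch, not the kernel logic: next_list and its arguments are made-up names, and the flushing done by page_launder() is reduced to "a dirty page stays on inactive_dirty until it has been written out".

```shell
# Illustrative sketch only; function name and arguments are assumptions.
# next_list LIST AGE DIRTY(0|1) -> prints the list the page would be on next
next_list() {
    list=$1
    age=$2
    dirty=$3
    case "$list" in
        active)
            # Pages leave the active list only when their age reaches zero.
            if [ "$age" -eq 0 ]; then echo inactive_dirty; else echo active; fi
            ;;
        inactive_dirty)
            # Dirty pages wait here for writeout; clean ones become reclaimable.
            if [ "$dirty" -eq 1 ]; then echo inactive_dirty; else echo inactive_clean; fi
            ;;
        inactive_clean)
            # Counts as free; stays here until reused or faulted back in.
            echo inactive_clean
            ;;
        *)
            echo "unknown list: $list" >&2
            return 1
            ;;
    esac
}
```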


--Dynamically sized inactive list 
Since we do page aging to select which pages to evict, having a very large statically sized inactive list (like FreeBSD has) doesn't seem to make much sense. In fact, it would cancel out some of the effects of doing the page aging in the first place: why spend much effort selecting which pages to evict[Dillon] when you keep as much as 33% of your swappable pages on the inactive list? Why do careful page aging when 33% of your pages end up as candidates for eviction at the same priority and you've effectively undone the aging for those 33% of pages which are candidates for eviction? 
On the other hand, having lots of inactive pages to choose from when doing page eviction means you have more chances of avoiding writeout IO or doing better IO clustering. It also gives you more of a "buffer" to deal with allocations due to page faults, etc. 

Both a large and a small target size for the inactive page list have their benefits. In Linux 2.4 we have chosen a middle ground by letting the system dynamically vary the size of the inactive list depending on VM activity, with an artificial upper limit to make sure the system always preserves some aging information.

Linux 2.4 keeps a floating average of the number of pages evicted per second and sets the target for the inactive list and the free list combined to the free target plus this average number of page steals per second. Not only does this second give us enough time to do all kinds of page flushing optimisations, it also is small enough to keep page age distribution within the system intact, allowing us to make good choices on which pages to evict and which pages to keep.
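
The target calculation above amounts to simple arithmetic, sketched below. The function names, the equal weighting used for the floating average and the way the upper limit is applied to the combined target are all assumptions made for illustration; the text does not specify them.

```shell
# Illustrative sketch only; names and the averaging weight are assumptions.

# update_avg OLD_AVG STEALS_LAST_SECOND -> new floating average
# (assumed here: equal weight for the history and the latest sample)
update_avg() {
    echo $(( ($1 + $2) / 2 ))
}

# combined_target FREE_TARGET AVG_STEALS UPPER_LIMIT
# -> target for inactive + free pages, capped at UPPER_LIMIT
combined_target() {
    target=$(( $1 + $2 ))
    if [ "$target" -gt "$3" ]; then
        target=$3
    fi
    echo "$target"
}
```

For instance, with a free target of 256 pages and an average of 200 page steals per second, combined_target 256 200 1000 gives 456 pages.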


--Optimised page flushing 
Writing out pages from the inactive_dirty list as we encounter them can cause a system to totally destroy read performance because of the extra disk seeks done. A better solution is to delay writeout of dirty pages and let these dirty pages accumulate until we can do better IO clustering so that these pages can be written out to disk with fewer disk seeks and less interference with read performance.
Due to the development of the page replacement changes happening in the code freeze, the system currently has a rather simple implementation of what's present in FreeBSD 4.2. As long as there are enough clean inactive pages around, we keep moving those to the inactive_clean list and never bother with syncing out the dirty pages. Note that this catches both clean pages and pages which have been written to disk by the update daemon (which commits filesystem data to disk periodically). 

This means that under loads where data is seldom written we can avoid writing out dirty inactive pages most of the time, giving us much better latencies in freeing pages and letting streaming reads continue without the disk head moving away to write out data all the time. Only under loads where lots of pages are being dirtied quickly does the system suffer a bit from syncing out dirty data irregularly. 

Another alternative would have been the strategy used in FreeBSD 4.3, where dirty pages get to stay in the inactive list longer than clean pages but are synced out before the clean pages are exhausted. This strategy gives more consistent pageout IO in FreeBSD during heavy write loads. However, a big factor causing the irregularities in pageout writes using the simpler strategy above may well be the huge inactive list target in FreeBSD (33%). It is not at all clear what this more complicated strategy would do when used on the dynamically sized inactive list on Linux 2.4. Because of this, Linux 2.4 uses the better understood strategy of evicting clean inactive pages first and only after those are gone starting to sync the dirty ones.


--Background page aging 
On many systems the normal operating mode is that after a period of relative inactivity a sudden load spike comes in and the system has to deal with that as gracefully as possible. Linux 2.2 has the problem that, with the lack of an inactive page list, it is not clear at all which pages should be evicted when a sudden demand for memory kicks in.
Linux 2.4 is better in this respect, with the reclaim candidates neatly separated out on the inactive list. However, the inactive list could have any random size the moment VM pressure drops off. We'd like to get the system in a more predictable state while the VM pressure is low. In order to achieve this, Linux 2.4 does background scanning of the pages, trying to get a sane number of pages on the inactive list, but without scanning aggressively, so only truly idle pages will end up on the inactive list and the scanning overhead stays small.


--Drop behind 
Streaming IO doesn't just have readahead, but also its natural complement: drop behind. After the program doing the streaming IO is done with a page, we depress its priority heavily so it will be a prime candidate for eviction. Not only does this protect the working set of running processes from being quickly evicted by streaming IO, but it also prevents the streaming IO from competing with the pageouts and pageins of the other running processes, which reduces the number of disk seeks and allows the streaming IO to proceed at a faster speed. Currently readahead and drop-behind only work for read() and write(); mmap()ed files and swap-backed anonymous memory aren't supported yet. 

--Conclusions 
Since the Linux 2.4 kernel's VM subsystem is still being tuned heavily, it is too early to come up with conclusive figures on performance. However, initial results seem to indicate that Linux 2.4 generally has better performance than Linux 2.2 on the same hardware.

Reports from users indicate that performance on typical desktop machines has improved a lot, even though the tuning of the new VM has only just begun. Throughput figures for server machines seem to be better too, but that could also be attributed to the fact that the unification of the page cache and the buffer cache is complete. 

One big difference between the VM in Linux 2.4 and the VM in Linux 2.2 is that the new VM is far less sensitive to subtle changes. While in Linux 2.2 a subtle change in the page flushing logic could upset page replacement, in Linux 2.4 it is possible to tweak the various aspects of the VM with predictable results and little to no side-effects in the rest of the VM. 

The solid performance and relative insensitivity to subtle changes in the environment can be taken as a sign that the Linux 2.4 VM is not just a set of simple fixes for the problems experienced in Linux 2.2, but also a good base for future development. 


--Remaining issues
The Linux 2.4 VM mainly contains easy to implement and obvious to verify solutions for some of the known problems Linux 2.2 suffers from. A number of issues are either too subtle to implement during the code freeze or will have too much impact on the code. The complete list of TODO items can be found on the Linux-MM page[Linux-MM]; here are the most important ones: 


Low memory deadlock prevention: with the arrival of journaling and delayed-allocation filesystems it is possible that the system will need to allocate memory in order to free memory; more precisely, to write out data so memory can become freeable. To remove the possibility for deadlock, we need to limit the number of outstanding transactions to a safe number, possibly letting each of the page flushing functions indicate how much memory it may need and doing bookkeeping of these values. Note that the same problem occurs with swap over network. 

Load control: no matter how good we can get the page replacement code, there will always be a point where the system ends up thrashing to death. Implementing a simple load control system, where processes get suspended in round-robin fashion when the paging load gets too high, can keep the system alive under heavy overload and allow the system to get enough work done to bring itself back to a sane state. 

RSS limits and guarantees: in some situations it is desirable to control the amount of physical memory a process can consume 
(the resident set size, or RSS). With the virtual address based page scanning of Linux' VM subsystem it is trivial to implement 
RSS ulimits and minimal RSS guarantees. Both help to protect processes under heavy load and allow the system administrator 
to better control the use of memory resources. 

VM balancing: in Linux 2.4, the balancing between the eviction of cache pages, swap-backed anonymous memory and the inode and dentry caches is essentially the same as in Linux 2.2. While this seems to work well for most cases there are some possible scenarios where a few of the caches push the other users out of memory, leading to suboptimal system performance. It may be worthwhile to look into improving the balancing algorithm to achieve better performance in "non-standard" situations. 

Unified readahead: currently readahead and drop-behind only work for read() and write(). Ideally they should work for mmap()ed files and anonymous memory too. Having the same set of algorithms for read()/write(), mmap() and swap-backed anonymous memory will simplify the code and make performance improvements in the readahead and drop-behind code immediately available to all of the system.


AIX swap notes:
---------------

Note 1:
-------

Q:

Hi All,

I'm seeing an interesting paging behavior (paging out to paging space when I don't think it should) on our AIX 5.3 TL3CSP system. 
First the system particulars:

AIX 5.3 TL3 with CSP
HACMP v5.2
Oracle 10g
28GB memory
8GB paging space
EMC LUNs for Oracle data.
CIO used for Oracle data.

Virtual memory tuned as such
vmo -p -o maxclient%=50 
vmo -p -o maxperm%=50 
vmo -p -o 'lru_file_repage=0' 
vmo -p -o 'minperm%=3' 

So, given that configuration, it is my understanding that AIX, when under memory pressure, will steal memory from the file cache 
instead of paging process memory out to the paging space (lru_file_repage = 0).

Now, this system works for the most part like I understand it should. Via nmon, I can watch it stealing memory from the FileSystemCache 
(numclient values decrease) when the box gets under memory pressure. However, every once in a while when under memory pressure, 
I can see that the system starts writing to the paging space when there is plenty of FileSystemCache available to steal from.

Below is a snapshot from the nmon 'm'emory switch:
[screenshot nmon.jpg not included]
You can see here that I've got 1.7GB paged out, while numclient is at 21%.

So, my question is, why does AIX page out when under memory pressure instead of stealing from the FileSystemCache memory like I want it to?


A:

Look at the Paging to/from the Paging Space - it's zero. Once info is in the paging space it's left there until the space is needed 
for something else. So at this point the server isn't actually paging. 

It has paged in the past, however.


Note 2:
------

AIX will always try to use 100% of real memory--> AIX will use the amount of 
memory solicited by your processes. The remaining capacity will be used as 
filesystem cache. 


You can change the minimum and maximum amounts of memory used to cache files 
with vmtune (vmo for 5.2+), and it is advised to do so if you're running 
databases with data on raw devices (since the db engine usually has its own 
cache algorithm, and AIX can't cache data on raw devices). The values to 
modify are minperm, maxperm, maxclient and maxpin (use at your own risk!!!). 


Paging space use will be very low: 5% is about right--> A paging space so 
little used seems to be oversized. In general, the paging space should be 
under 40%, and the size must be determined according to the application 
running (i.e. 4X the physical memory size for oracle). In AIX 5L a paging 
space can be reduced without rebooting. Anyway, AIX always uses some paging 
space, even keeping copies of the data on memory and on disk, as a 
"predictive" paging. 
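
A quick way to compare paging space usage with the 40% guideline above is to parse the summary printed by "lsps -s". The sketch below assumes the usual two-line output (a header line, then total size and percent used); the function name and the optional threshold argument are made up for this example.

```shell
# Illustrative sketch only; assumes the two-line "lsps -s" summary format.
# paging_space_status [LIMIT_PERCENT] -> prints "ok N" or "high N"
paging_space_status() {
    awk -v limit="${1:-40}" '
        NR == 2 {
            used = $2
            sub(/%/, "", used)            # strip the percent sign
            if (used + 0 > limit + 0)
                print "high " used
            else
                print "ok " used
        }
    '
}
```

Typical use would be: lsps -s | paging_space_status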

Look in topas for the values "comp mem" (processes) and "non comp mem" 
(filesystem cache) to see the distribution of the memory usage. Nmon can 
show you the top processes by memory usage, along with many other statistics. 


There are several tools which can give you a more detailed picture of how 
memory is being used. "svmon" is very comprehensive. Tools such as topas 
and nmon will also give you a bit more information. 

Note 3:
-------

Memory utilization on AIX systems typically runs around 100%. This is often a source of concern. However, high memory utilization 
in AIX does not imply the system is out of memory. By design, AIX leaves files it has accessed in memory. 
This significantly improves performance when AIX reaccesses these files because they can be reread directly from memory, not disk. 
When AIX needs memory, it discards files using a "least used" algorithm. This generates no I/O and has almost no performance impact 
under normal circumstances. 

Sustained paging activity is the best indication of low memory. Paging activity can be monitored using the "vmstat" command. 
If the "page-in" (PI) and "page-out" (PO) columns show non-zero values over "long" periods of time, then the system is short on memory. 
(All systems will show occasional paging, which is not a concern.) 
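
To judge whether paging is sustained rather than occasional, the vmstat samples can be counted. The sketch below reads vmstat output on standard input, finds the "pi" and "po" columns from the header line, and prints how many samples showed paging out of the total. The function name is hypothetical, and what counts as a "long" period is left to the reader.

```shell
# Illustrative sketch only; the function name is an assumption.
# count_paging_samples -> prints "<samples with pi or po > 0> <total samples>"
count_paging_samples() {
    awk '
        # Column header line: remember where pi and po are.
        / pi / && / po / {
            for (i = 1; i <= NF; i++) {
                if ($i == "pi") pi_col = i
                if ($i == "po") po_col = i
            }
            next
        }
        # Data lines start with a number (the r column).
        pi_col && $1 ~ /^[0-9]+$/ {
            total++
            if ($pi_col > 0 || $po_col > 0) paging++
        }
        END { printf "%d %d\n", paging, total }
    '
}
```

Typical use would be: vmstat 5 12 | count_paging_samples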

Memory requirements for applications can be empirically determined using the AIX "rmss" command. The "rmss" command is a test tool 
that dynamically reduces usable memory. The onset of paging indicates an application's minimum memory requirement. 

Finally, the "svmon" command can be used to list how much memory is used by each process. The interpretation of the svmon output 
requires some expertise. See the AIX documentation for details. 




==================================================================
35 Volume group, logical volumes, and filesystem commands in HPUX:
==================================================================


35.1 Filesystems in HPUX:
-------------------------

HFS : used at HP-UX < v. 10
VxFS: used at HP-UX >= v. 10

Of course, CDFS (cdroms) and other filesystem types are supported as well.

HP-UX's implementation of a journaled file system, also known as JFS, is based on the version from 
VERITAS Software Inc. called VxFS.

Up through the 10.0 release of HP-UX, HFS has been the only available locally mounted read/write file system. 
Beginning at 10.01, you also have the option of using VxFS. (Note, however, that VxFS cannot be used 
as the root file system.)

As compared to HFS, VxFS allows much shorter recovery times in the event of system failure. 
It is also particularly useful in environments that require high performance or deal with large 
volumes of data. This is because the unit of file storage, called an extent, can be multiple blocks, 
allowing considerably faster I/O than with HFS. It also provides for minimal downtime by allowing 
online backup and administration - that is, unmounting the file system will not be necessary for 
certain tasks. You may not want to configure VxFS, though, on a system with limited memory 
because VxFS memory requirements are considerably larger than that for HFS.

Basic VxFS functionality is included with the HP-UX operating system software. Additional enhancements 
to VxFS are available as a separately orderable product called HP "OnlineJFS", product number B5117AA (Series 700) 
and B3928AA (Series 800). 


35.2 How to create a filesystem in HP-UX: an outline.
-----------------------------------------------------


-- Task 1. Estimate the Size Required for the Logical Volume  
 
-- Task 2. Determine If Sufficient Disk Space Is Available for the Logical Volume within Its Volume Group  
 
Use the vgdisplay command to calculate this information. vgdisplay will output data on one or more volume groups, 
including the physical extent size (under PE Size (Mbytes)) and the number of available physical extents 
(under Free PE). By multiplying these two figures together, you will get the number of megabytes available 
within the volume group. See vgdisplay(1M) for more information.
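
The multiplication described in Task 2 can be scripted. The sketch below reads vgdisplay output (without -v, so that only the volume group summary is parsed) on standard input and prints the free space in megabytes; the function name vg_free_mb is made up for this example.

```shell
# Illustrative sketch only; the function name is an assumption.
# vg_free_mb -> prints "PE Size (Mbytes)" multiplied by "Free PE"
vg_free_mb() {
    awk '
        /^PE Size \(Mbytes\)/ { pe_size = $NF }
        /^Free PE/            { free_pe = $NF }
        END { print pe_size * free_pe }
    '
}
```

Typical use would be: vgdisplay /dev/vg03 | vg_free_mb
With a PE size of 4 MB and 102 free extents this prints 408.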

-- Task 3. Add a Disk to a Volume Group If Necessary 
 
If there is not enough space within a volume group, you will need to add a disk to a volume group.
To add a disk to an existing volume group, use pvcreate(1M) and vgextend(1M). You can also add a disk 
by creating a new volume group with pvcreate(1M) and vgcreate(1M).

-- Task 4. Create the Logical Volume  
 
Use lvcreate to create a logical volume of a certain size in the above volume group. See lvcreate(1M) for details.
Use lvcreate as in the following example:

Create a logical volume of size 100 MB in volume group /dev/vg03:
# lvcreate -L 100 /dev/vg03

-- Task 5. Create the New File System  
 
Create a file system using the newfs command. Note the use of the character device file. For example:
 
# newfs -F hfs /dev/vg02/rlvol1 
 
If you do not use the -F FStype option, by default, newfs creates a file system based on the content 
of your /etc/fstab file. If there is no entry for the file system in /etc/fstab, then the file system type 
is determined from the file /etc/default/fs. For information on additional options, see newfs(1M).

$ cat /etc/default/fs
LOCAL=vxfs


For HFS, you can explicitly specify that newfs create a file system that allows short file names or long file names 
by using either the -S or -L option. By default, these names will be as short or long as those allowed 
by the root file system. Short file names are 14 characters maximum. Long file names allow up to 255 characters. 
Generally, you use long file names to gain flexibility in naming files. Also, files created on other systems 
that use long file names can be moved to your system without being renamed.

When creating a VxFS file system, file names will automatically be long.

After creating a filesystem, you need to mount it to make it accessible.


-- Task 6. Mount the New Local File System

Choose an empty directory to serve as the mount point for the file system. Use the mkdir command to 
create the directory if it does not currently exist. For example, enter:
 
# mkdir /test 
 
Mount the file system using the mount command. Use the block device file name that contains the file system. 
You will need to enter this name as an argument to the mount command.

For example, enter
 
# mount /dev/vg01/lvol1 /test 


Note: 
The newfs command is a "friendly" front-end to the mkfs command (see mkfs(1M)). The newfs command 
calculates the appropriate parameters and then builds the file system by invoking the mkfs command.
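
The tasks above can be strung together as follows. This sketch only prints the commands instead of running them, so the plan can be reviewed first. The function name plan_fs and the example names (vg03, lvdata, /test) are hypothetical. Note that newfs is given the character device (rlvdata) while mount is given the block device (lvdata).

```shell
# Illustrative sketch only; prints the commands, does not execute them.
# plan_fs VG LV_NAME SIZE_MB MOUNT_POINT [FSTYPE]
plan_fs() {
    vg=$1
    lv=$2
    size_mb=$3
    mnt=$4
    fstype=${5:-vxfs}
    echo "lvcreate -L $size_mb -n $lv /dev/$vg"    # Task 4: create the LV
    echo "newfs -F $fstype /dev/$vg/r$lv"          # Task 5: raw device file
    echo "mkdir -p $mnt"                           # Task 6: mount point
    echo "mount /dev/$vg/$lv $mnt"                 # Task 6: block device file
}
```

Typical use would be: plan_fs vg03 lvdata 100 /test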



35.3 HP-UX LVM commands:
------------------------

-- vgdisplay:
-- ----------

Displays information about volume groups.

Examples:

# vgdisplay
# vgdisplay -v vgdatadir


-- pvdisplay:
-- ----------

Displays information about physical volumes within an LVM volume group.

EXAMPLES

Display the status and characteristics of a physical volume: 
# pvdisplay /dev/dsk/c1t0d0 

Display the status, characteristics, and allocation map of a physical volume: 
# pvdisplay -v /dev/dsk/c2t0d0 

# pvdisplay /dev/dsk/c102t9d3

--- Physical volumes ---
PV Name                     /dev/dsk/c43t9d3
PV Name                     /dev/dsk/c102t9d3   Alternate Link
VG Name                     /dev/vgora_e1atlas_data
PV Status                   available
Allocatable                 yes
VGDA                        2
Cur LV                      2
PE Size (Mbytes)            4
Total PE                    1668
Free PE                     102
Allocated PE                1566
Stale PE                    0
IO Timeout (Seconds)        default
Autoswitch                  On


-- lvdisplay:
-- ----------

Displays information about logical volumes.

Examples:

# lvdisplay lvora_p0gencfg_apps
# lvdisplay -v lvora_p0gencfg_apps
# lvdisplay -v /dev/vg00/lvol2

# lvdisplay /dev/vgora_e0etea_data/lvora_e0etea_data
--- Logical volumes ---
LV Name                     /dev/vgora_e0etea_data/lvora_e0etea_data
VG Name                     /dev/vgora_e0etea_data
LV Permission               read/write
LV Status                   available/syncd
Mirror copies               1
Consistency Recovery        MWC
Schedule                    parallel
LV Size (Mbytes)            17020
Current LE                  4255
Allocated PE                8510
Stripes                     0
Stripe Size (Kbytes)        0
Bad block                   on
Allocation                  strict
IO Timeout (Seconds)        default
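
The figures in the lvdisplay output above are related: with one mirror copy, each logical extent is backed by two physical extents (4255 x 2 = 8510), and the LV size divided by the number of logical extents gives the extent size (17020 / 4255 = 4 MB). The sketch below performs this cross-check on lvdisplay output read from standard input; the function name lv_check is made up for this example.

```shell
# Illustrative sketch only; the function name is an assumption.
# lv_check -> prints expected vs. reported allocated PEs and the extent size
lv_check() {
    awk '
        /^Mirror copies/      { mirrors = $NF }
        /^LV Size \(Mbytes\)/ { size_mb = $NF }
        /^Current LE/         { le = $NF }
        /^Allocated PE/       { pe = $NF }
        END {
            if (le == 0) { print "no Current LE line found"; exit 1 }
            printf "expected_pe=%d allocated_pe=%d extent_mb=%d\n",
                   le * (mirrors + 1), pe, size_mb / le
        }
    '
}
```

Typical use would be: lvdisplay /dev/vg00/lvol2 | lv_check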



-- vgchange:
-- ---------

Set volume group availability. This command activates or deactivates one or more volume groups as specified
by the -a option, namely y or n.

Activate a volume group:
# vgchange -a y /dev/vg03

Deactivate a volume group:
# vgchange -a n /dev/vg03



-- vgcreate:
-- ---------


/usr/sbin/vgcreate [-f] [-A autobackup] [-x extensibility] [-e max_pe] [-l max_lv] [-p max_pv] 
                   [-s pe_size] [-g pvg_name] vg_name pv_path ...

The vgcreate command creates a new volume group. vg_name is a symbolic name for the volume group and must be used 
in all references to it. vg_name is the path to a directory entry under /dev that must contain a character 
special file named group. Except for the group entry, the vg_name directory should be empty. 
The vg_name directory and the group file have to be created by the user (see lvm(7)).

vgcreate leaves the volume group in an active state.


EXAMPLES

1. Create a volume group named /dev/vg00 containing two physical volumes
with extent size set to 2 Mbytes.  If directory /dev/vg00 exists with
the character special file group, the volume group is created:

# vgcreate -s 2 /dev/vg00 /dev/dsk/c1d0s2 /dev/dsk/c2d0s2

2. Create a volume group named /dev/vg01 that can contain a maximum of
three logical volumes, with extent size set to 8 Mbytes:

# vgcreate -l 3 -s 8 /dev/vg01 /dev/dsk/c4d0s2

3. Create a volume group named /dev/vg00 and a physical volume group
named PVG0 with two physical volumes:

# vgcreate -g PVG0 /dev/vg00 /dev/dsk/c1d0s2 /dev/dsk/c2d0s2

4. Create a volume group named /dev/vg00 containing two physical volumes with extent size 
set to 2 MB, from scratch. 

First, create the directory /dev/vg00 with the character special file called group. 

mkdir /dev/vg00 
mknod /dev/vg00/group c 64 0x030000 

The minor number for the group file should be unique among all the volume groups on the system. 
It has the format 0xNN0000, where NN runs from 00 to ff. The maximum value of NN is controlled by the kernel 
tunable parameter maxvgs.

Initialize the disks using pvcreate(1M). 

pvcreate /dev/rdsk/c1t0d0 
pvcreate /dev/rdsk/c1t2d0 

Create the volume group. 

vgcreate -s 2 /dev/vg00 /dev/dsk/c1t0d0 /dev/dsk/c1t2d0 
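
The 0xNN0000 minor number used for the group file above can be generated from a decimal volume group index, which avoids hand-formatting mistakes. The function name group_minor is made up for this example; which index is still unused on a given system has to be checked separately (for example with "ls -l /dev/*/group").

```shell
# Illustrative sketch only; the function name is an assumption.
# group_minor INDEX -> prints the minor number in 0xNN0000 form
group_minor() {
    printf '0x%02x0000\n' "$1"
}
```

For example, group_minor 3 prints 0x030000, the value used in the mknod command above.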


Note About the "dsk" and "rdsk" notation:
-----------------------------------------

Physical volumes are identified by their device file names, for example

/dev/dsk/cntndn

/dev/rdsk/cntndn

Note that each disk has a block device file and a character or raw device file, the latter identified by the r. 
Which name you use depends on what task you are doing with the disk. In the notation above, the first name 
represents the block device file while the second is the raw device file.

-- Use a physical volume's raw device file for these two tasks only:

-> When creating a physical volume. Here, you use the device file for the disk. For example, 
this might be /dev/rdsk/c3t2d0 if the disk were at card instance 3, target address 2, and device number 0. 
(The absence of a section number beginning with s indicates you are referring to the entire disk.)

-> When restoring your volume group configuration.

For all other tasks, use the block device file. For example, when you add a physical volume to a volume group, 
you use the disk's block device file for the disk, such as /dev/dsk/c5t3d0.
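
As an illustration of the naming rule, the sketch below derives the raw (character) device name from
a block device name. The function name to_raw is hypothetical, and it only handles the two LVM-style
forms used in this note (/dev/dsk/... and /dev/vgNN/lvname); other layouts are not covered.

```shell
#!/bin/sh
# to_raw PATH: print the character (raw) device name for a block device name.
#   /dev/dsk/cXtYdZ  -> /dev/rdsk/cXtYdZ
#   /dev/vgNN/lvolN  -> /dev/vgNN/rlvolN
to_raw() {
    case $1 in
        /dev/dsk/*)  printf '/dev/rdsk/%s\n' "${1#/dev/dsk/}" ;;
        /dev/rdsk/*) printf '%s\n' "$1" ;;                      # already a raw disk name
        /dev/*/*)    printf '%s/r%s\n' "${1%/*}" "${1##*/}" ;;  # logical volume
        *)           echo "to_raw: unrecognized device name: $1" >&2; return 1 ;;
    esac
}

to_raw /dev/dsk/c5t3d0
to_raw /dev/vg01/lvol1
```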



-- vgextend:
-- ---------

Extends a volume group by adding physical volumes to it.

Examples:

Add physical volumes /dev/dsk/c1d0s2 and /dev/dsk/c2d0s2 to volume group /dev/vg03:
# vgextend /dev/vg03 /dev/dsk/c1d0s2 /dev/dsk/c2d0s2

# vgextend vg01 /dev/dsk/c0t4d0


-- pvcreate:
-- ---------

Creates a physical volume for use in a volume group.

Examples:

# pvcreate -f /dev/rdsk/c1d0s2

# ioscan -fnC disk
# pvcreate -f /dev/rdsk/c0t1d0


-- lvcreate:
-- ---------

Creates a logical volume in an LVM volume group.

The lvcreate command creates a new logical volume within the volume group specified by vg_name. 
Up to 255 logical volumes can be created in one volume group.

SYNOPSIS
      /etc/lvcreate [-d schedule] {-l logical_extents_number | -L
      logical_volume_size} [-m mirror_copies] [-n lv_path] [-p permission]
      [-r relocate] [-s strict] [-C contiguous] [-M mirror_write_cache]
      [-c mirror_consistency] vol_group_name


Examples:

Create a logical volume in volume group /dev/vg02: 

# lvcreate /dev/vg02 

Create a logical volume in volume group /dev/vg03 with nonstrict allocation policy: 

# lvcreate -s n /dev/vg03 

Create a logical volume of size 100 MB in volume group /dev/vg03: 

# lvcreate -L 100 /dev/vg03 

Create a logical volume of size 90 MB striped across 3 disks with a stripe size of 64 KB: 

# lvcreate -L 90 -i 3 -I 64 /dev/vg03 


-- fstyp:
-- ------

Determines file system type.

SYNOPSIS
/usr/sbin/fstyp [-v] special

The fstyp command allows the user to determine the file system type of a mounted or unmounted file system. 
special represents a device special file (for example: /dev/dsk/c1t6d0).

The file system type is determined by reading the superblock of the supplied special file. If the superblock 
is read successfully, the command prints the file system type identifier on the standard output and exits 
with an exit status of 0. If the type of the file system cannot be identified, the error message 
unknown_fstyp (no matches) is printed and the exit status is 1. Exit status 2 is not currently returned, 
but is reserved for the situation where the file system matches more than one file system type. 
Any other error will cause exit status 3 to be returned.
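
The exit statuses above can be turned into readable messages in a script. This is only a sketch based
on the description above; the function name fstyp_status_text is hypothetical.

```shell
#!/bin/sh
# fstyp_status_text CODE: describe an fstyp exit status as documented above.
fstyp_status_text() {
    case $1 in
        0) echo "file system type identified" ;;
        1) echo "file system type could not be identified" ;;
        2) echo "reserved: more than one file system type matched" ;;
        3) echo "other error" ;;
        *) echo "unexpected exit status: $1" ;;
    esac
}

fstyp_status_text 1
```

Typical use would be: fstyp /dev/vg00/lvol6; fstyp_status_text $?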

Examples:

Find the type of the file system on a disk, /dev/dsk/c1t6d0: 

# fstyp /dev/dsk/c1t6d0 

Find the type of the file system on a logical volume, /dev/vg00/lvol6: 

# fstyp /dev/vg00/lvol6 

Find the file system type for a particular device file and also information about its super block: 

# fstyp -v /dev/dsk/c1t6d0 



-- mkboot:
-- -------

mkboot is used to install or update boot programs on the specified device file.

The position on device at which boot programs are installed depends on the disk layout of the device. 
mkboot examines device to discover the current layout and uses this as the default. If the disk is uninitialized, 
the default is LVM layout on PA-RISC and Whole Disk on Itanium(R)-based systems. 
The default can be overridden by the -l, -H, or -W options.

Boot programs are stored in the boot area in Logical Interchange Format (LIF), which is similar to a file system. 
For a device to be bootable, the LIF volume on that device must contain at least the ISL 
(the initial system loader) and HPUX (the HP-UX bootstrap utility) LIF files. If, in addition, the device 
is an LVM physical volume, the LABEL file must be present (see lvlnboot(1M) ).

For the VERITAS Volume Manager (VxVM) layout on the Itanium-based system architecture, the only relevant 
LIF file is the LABEL file. All other LIF files are ignored. VxVM uses the LABEL file when the system boots 
to determine the location of the root, stand, swap, and dump volumes.

EXAMPLES

Install default boot programs on the specified disk, treating it as an LVM disk: 

# mkboot -l /dev/dsk/c0t5d0 

Use the existing layout, and install only SYSLIB and ODE files and preserve the EST file on the disk: 

# mkboot -i SYSLIB -i ODE -p EST /dev/rdsk/c0t5d0 

Install only the SYSLIB file and retain the ODE file on the disk. Use the Whole Disk layout. Use the file 
/tmp/bootlf to get the boot programs rather than the default. (The -i ODE option will be ignored): 

# mkboot -b /tmp/bootlf -i SYSLIB -i ODE -p ODE -W /dev/rdsk/c0t5d0 

Install EFI utilities to the EFI partition on an Itanium-based system, treating it as an LVM or VxVM disk: 

# mkboot -e -l /dev/dsk/c3t1d0 

Create AUTO file with the string autofile command on a device. If the device is on an Itanium-based system, 
the file is created as /EFI/HPUX/AUTO in the EFI partition. If the device is on a PA-RISC system, the file 
is created as a LIF file in the boot area. 

# mkboot -a "autofile command" /dev/dsk/c2t0d0 


-- bdf:
-- ----

Report number of free disk blocks.

bdf prints out the amount of free disk space available on the specified filesystem (/dev/dsk/c0d0s0, for example) 
or on the file system in which the specified file ($HOME, for example) is contained.
If no file system is specified, the free space on all of the normally mounted file systems is printed.  
The reported numbers are in kilobytes.
 
Examples:

# bdf

oranh300:/home/se1223>bdf | more
Filesystem          kbytes    used   avail %used Mounted on
/dev/vg00/lvol3     434176  165632  266504   38% /
/dev/vg00/lvol1     298928   52272  216760   19% /stand
/dev/vg00/lvol8    2097152 1584488  508928   76% /var
/dev/vg00/lvol11    524288    2440  490421    0% /var/tmp
/dev/vg00/lvucmd     81920    1208   75671    2% /var/opt/universal
/dev/vg00/lvol9    1048576  791925  240664   77% /var/adm
/dev/vg00/lvol10   2064384   47386 1890941    2% /var/adm/crash
/dev/vg00/lvol7    1548288 1262792  283320   82% /usr
/dev/vg00/vsaunixlv
                    311296  185096  118339   61% /usr/local/vsaunix
/dev/vg00/lvol4    1867776    5264 1849784    0% /tmp
/dev/vg00/lvol6    1187840  757456  427064   64% /opt
/dev/vg00/lvol5     262144   34784  225632   13% /home
/dev/vg00/lvbeheer  131072   79046   48833   62% /beheer
/dev/vg00/lvbeheertmp
                    655360   65296  553190   11% /beheer/tmp
/dev/vg00/lvbeheerlog
                    524288   99374  398407   20% /beheer/log
/dev/vg00/lvbeheerhistlog
..
..


# bdf /tmp
Filesystem          kbytes    used   avail %used Mounted on
/dev/vg00/lvol4    1867776    5264 1849784    0% /tmp
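
Note that bdf wraps the line when the device name is long (see /dev/vg00/vsaunixlv above), which trips
up simple scripts. The sketch below re-joins wrapped lines and lists file systems at or above a given
%used. The function name bdf_over is hypothetical, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# bdf_over PCT: read bdf output on stdin and print "mountpoint %used" for
# every file system whose %used is at or above PCT.
bdf_over() {
    awk -v limit="$1" '
        NR == 1 { next }               # skip the header line
        NF == 1 { dev = $1; next }     # long device name: data follows on the next line
        NF == 5 { $0 = dev " " $0 }    # continuation line: re-attach the device name
        NF == 6 {
            pct = $5
            sub(/%/, "", pct)
            if (pct + 0 >= limit + 0) print $6, $5
        }
    '
}

# Sample input taken from the listing above
bdf_over 75 <<'EOF'
Filesystem          kbytes    used   avail %used Mounted on
/dev/vg00/lvol3     434176  165632  266504   38% /
/dev/vg00/lvol8    2097152 1584488  508928   76% /var
/dev/vg00/vsaunixlv
                    311296  185096  118339   61% /usr/local/vsaunix
/dev/vg00/lvol7    1548288 1262792  283320   82% /usr
EOF
```

On a live system this would be used as: bdf | bdf_over 75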


-- lvextend:
-- ---------

Increase number of physical extents allocated to a logical volume.

/etc/lvextend {-l logical_extents_number | -L logical_volume_size | -m
              mirror_copies} lv_path [physical_volume_path ...  |
              physical_vol_group_name...]

lvextend increases the number of mirrored copies or the size of the lv_path parameter.  
The change is determined according to which command options are specified.

WARNINGS
      The -m option cannot be used on HP-IB devices.

EXAMPLES
- Increase the number of the logical extents of a logical volume to one hundred:

# lvextend -l 100 /dev/vg01/lvol3

- Increase the logical volume size to 400 Mbytes:

# lvextend -L 400 /dev/vg01/lvol4

- Allocate two mirrors (that is, three copies) for each logical extent of a logical volume:

# lvextend -m 2 /dev/vg01/lvol5



-- extendfs:
-- ---------

Extend file system size.

/etc/extendfs [-q] [-v] [-s size] special

If the original hfs filesystem image created on special does not make use of all of the available space, 
extendfs can be used to increase the capacity of an hfs filesystem by updating the filesystem structure
to include the extra space.
The command-line parameter special specifies the character device special file of either a logical volume 
or a disk partition. If special refers to a mounted filesystem, special must be un-mounted
before extendfs can be run (see mount(1M)).

The root filesystem cannot be extended using the extendfs command
because the root filesystem is always mounted, and extendfs only works
on unmounted filesystems.


EXAMPLES
To increase the capacity of a filesystem created on a logical volume, enter:

# umount /dev/vg00/lvol1

# lvextend -L larger_size /dev/vg00/lvol1

# extendfs /dev/vg00/rlvol1


-- fsadm:
-- ------

 
EXAMPLES
Convert a HFS file system from a nolargefiles file system to a largefiles file system: 

# fsadm -F hfs -o largefiles /dev/vg02/lvol1 

Display HFS relevant file system statistics: 

# fsadm -F hfs /dev/vg02/lvol1 


-- diskinfo:
-- ---------

diskinfo - describe characteristics of a disk device

SYNOPSIS
     /etc/diskinfo [-b|-v] character_devicefile

DESCRIPTION
      diskinfo determines whether the character special file named by
      character_devicefile is associated with a SCSI, CS/80, or Subset/80
      disk drive; if so, diskinfo summarizes the disk's characteristics.

Example:

# diskinfo /dev/rdsk/c31t1d3
SCSI describe of /dev/rdsk/c31t1d3:
             vendor: IBM
         product id: 2105800
               type: direct access
               size: 13671904 Kbytes
   bytes per sector: 512




35.4 Notes and further examples:
================================


Examples: More on how to create a filesystem on HP-UX:
------------------------------------------------------


Example 1: 
----------

Here we repeat the essentials of section 35.2:

Task 1. Estimate the Size Required for the Logical Volume  
Task 2. Determine If Sufficient Disk Space Is Available for the Logical Volume within Its Volume Group  
Task 3. Add a Disk to a Volume Group If Necessary 
 
Task 4. Create the Logical Volume  
 
Use lvcreate to create a logical volume of a certain size in the above volume group. See lvcreate(1M) for details.
Use lvcreate as in the following example:

Create a logical volume of size 100 MB in volume group /dev/vg03:

# lvcreate -L 100 /dev/vg03

-- Task 5. Create the New File System  
 
Create a file system using the newfs command. Note the use of the character device file of the logical volume
created in Task 4 (assumed here to be lvol1). For example:
 
# newfs -F hfs /dev/vg03/rlvol1 
 
-- Task 6. mount the new local file system:

Choose an empty directory to serve as the mount point for the file system. Use the mkdir command to 
create the directory if it does not currently exist. For example, enter:
 
# mkdir /test 
 
Mount the file system using the mount command. Use the block device file name that contains the file system. 
You will need to enter this name as an argument to the mount command.

For example, to mount the logical volume created in Task 4 (assumed here to be lvol1), enter
 
# mount /dev/vg03/lvol1 /test 



Example 2:
----------

This is an example of creating volume group vg01 and a logical 
volume (partition) named data. 

Prepare for logical volume creation: 

root:/> mkdir /dev/vg01 
root:/> mknod /dev/vg01/group c 64 0x010000 
root:/> pvcreate -f /dev/rdsk/c0t5d0 
Physical volume "/dev/rdsk/c0t5d0" has been successfully created. 

root:/> vgcreate vg01 /dev/dsk/c0t5d0 
Volume group "/dev/vg01" has been successfully created. 
Volume Group configuration for /dev/vg01 has been saved in 
/etc/lvmconf/vg01.conf 

root:/> vgdisplay -v vg01 
root:/> lvcreate -L 100 -n data vg01 
Logical volume "/dev/vg01/data" has been successfully created with 
character device "/dev/vg01/rdata". 

Create HFS file system 

root:/> newfs -F hfs /dev/vg01/rdata 

Or create a journaled (Veritas, VxFS) file system instead 

root:/> newfs -F vxfs /dev/vg01/rdata 



Example 3:
----------

To create a VxFS file system 12288 sectors in size on a VxVM volume, enter: 

# mkfs -F vxfs /dev/vx/rdsk/diskgroup/volume 12288

To use mkfs to create a VxFS file system on /dev/rdsk/c0t6d0: 

# mkfs -F vxfs /dev/rdsk/c0t6d0 1024 

To use mkfs to determine the command that was used to create the VxFS file system on /dev/rdsk/c0t6d0: 

# mkfs -F vxfs -m /dev/rdsk/c0t6d0 

To create a VxFS file system on /dev/vgqa/lvol1, with a Version 4 disk layout and largefiles capability: 

# mkfs -F vxfs -o version=4,largefiles /dev/vgqa/lvol1 


http://www.docs.hp.com/en/B2355-90672/index.html


Example 4:
----------

Example: Creating a Logical Volume Using HP-UX Commands

To create a logical volume:

Select one or more disks. ioscan(1M) shows the disks attached to the system and their device file names.
Initialize each disk as an LVM disk by using the pvcreate command. For example, enter
 
# pvcreate /dev/rdsk/c0t0d0 
 
Note that using pvcreate will result in the loss of any existing data currently on the physical volume.
You use the character device file for the disk.
Once a disk is initialized, it is called a physical volume.

- Pool the physical volumes into a volume group. To complete this step:

Create a directory for the volume group. For example:
 
# mkdir /dev/vgnn 
 
Create a device file named group in the above directory with the mknod command.
 
# mknod /dev/vgnn/group c 64 0xNN0000 
 
The c following the device file name specifies that group is a character device file.
The 64 is the major number for the group device file; it will always be 64.
The 0xNN0000 is the minor number for the group file in hexadecimal. Note that each particular NN must be a 
unique number across all volume groups.

For more information on mknod, see mknod(1M); for more information on major numbers and minor numbers, 
see Configuring HP-UX for Peripherals.

Create the volume group specifying each physical volume to be included using vgcreate. For example:
 
# vgcreate /dev/vgnn /dev/dsk/c0t0d0 
 
Use the block device file to include each disk in your volume group. You can assign all the physical volumes 
to the volume group with one command. No physical volume can already be part of an existing volume group.

Once you have created a volume group, you can now create a logical volume using lvcreate. For example:

# lvcreate /dev/vgnn 
 
Using the above command creates the logical volume /dev/vgnn/lvoln with LVM automatically assigning 
the n in lvoln.

When LVM creates the logical volume, it creates the block and character device files and places them in the directory 
/dev/vgnn.
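
The steps above can be sketched as a "dry run" that only prints the commands, so the sequence can be
reviewed before anything is executed. The function name plan_vg and its arguments are hypothetical;
the disk names are the short cXtYdZ names, without the /dev/dsk or /dev/rdsk prefix.

```shell
#!/bin/sh
# plan_vg VG_NAME MINOR_INDEX DISK...: print (do not run) the commands that
# would create volume group VG_NAME from the given disks, followed by one
# logical volume with default settings.
plan_vg() {
    vg=$1
    idx=$2
    shift 2
    for d in "$@"; do
        echo "pvcreate /dev/rdsk/$d"        # character device for pvcreate
    done
    echo "mkdir /dev/$vg"
    printf 'mknod /dev/%s/group c 64 0x%02x0000\n' "$vg" "$idx"
    line="vgcreate /dev/$vg"
    for d in "$@"; do
        line="$line /dev/dsk/$d"            # block device for vgcreate
    done
    echo "$line"
    echo "lvcreate /dev/$vg"
}

plan_vg vg01 1 c0t0d0 c0t1d0
```

Remember that pvcreate destroys any data on the disk, so the printed commands should be checked
against ioscan output before they are run.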



VxFS can, theoretically, support files up to two terabytes in size because file system structures 
are no longer in fixed locations (see Chapter 2 "Disk Layout"). The maximum size tested and supported 
on HP-UX 11.x systems is one terabyte. Large files are files larger than two gigabytes in size.

 NOTE: Be careful when enabling large file capability. Applications and utilities such as backup may experience 
 problems if they are not aware of large files. 
 
 
Creating a File System with Large Files 

You can create a file system with large file capability by entering the following command:

# mkfs -F vxfs -o largefiles special_device size 
 
Specifying largefiles sets the largefiles flag, which allows the file system to hold files 
up to one terabyte in size. Conversely, the default nolargefiles option clears the flag and limits 
files being created to a size of two gigabytes or less:

# mkfs -F vxfs -o nolargefiles special_device size 
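
As a small illustration of the choice between the two options, the sketch below picks the mkfs option
from the largest file size you expect, using the two gigabyte limit stated above. The function names
are hypothetical, and sizes are given in KB to stay within ordinary shell integer arithmetic.

```shell
#!/bin/sh
# needs_largefiles SIZE_KB: succeed if a file of SIZE_KB kilobytes is larger
# than two gigabytes (2 GB = 2097152 KB).
needs_largefiles() {
    [ "$1" -gt 2097152 ]
}

# mkfs_option SIZE_KB: print the -o option to pass to mkfs -F vxfs.
mkfs_option() {
    if needs_largefiles "$1"; then
        echo largefiles
    else
        echo nolargefiles
    fi
}

mkfs_option 1048576     # largest expected file: 1 GB
mkfs_option 4194304     # largest expected file: 4 GB
```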




Notes:
------

Note 1: Create a System Mirror Disk:
------------------------------------

This note describes how to configure LVM mirroring of a system disk. In this example the HP server is STSRV1,
the primary boot device is SCSI=6 (/dev/dsk/c2t6d0) and the alternative mirrored bootdevice is 
SCSI=5 (/dev/dsk/c2t5d0). The following commands will do the trick:

# ioscan -fnC disk
# pvcreate -Bf /dev/rdsk/c2t5d0
# mkboot -l /dev/rdsk/c2t5d0
# mkboot -a "hpux -lq (;0)/stand/vmunix" /dev/rdsk/c2t5d0
# vgextend /dev/vg00 /dev/dsk/c2t5d0

# for P in 1 2 3 4 5 6 7 8 9 10
> do
> lvextend -m 1 /dev/vg00/lvol$P /dev/dsk/c2t5d0
> sleep 1
> done
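
The loop above hard-codes lvol1 through lvol10. As an alternative sketch, the list of logical volumes
can be taken from "vgdisplay -v". This assumes that vgdisplay -v prints one "LV Name <path>" line per
logical volume; the function name mirror_plan is hypothetical, and it only prints the commands.

```shell
#!/bin/sh
# mirror_plan PV: read "vgdisplay -v" output on stdin and print (not run) one
# "lvextend -m 1" command per logical volume, placing the mirror copy on PV.
mirror_plan() {
    awk -v pv="$1" '$1 == "LV" && $2 == "Name" { print "lvextend -m 1", $3, pv }'
}

# Sample input in the assumed vgdisplay -v layout
mirror_plan /dev/dsk/c2t5d0 <<'EOF'
   LV Name                     /dev/vg00/lvol1
   LV Status                   available/syncd
   LV Name                     /dev/vg00/lvol2
   LV Status                   available/syncd
EOF
```

On a live system: vgdisplay -v /dev/vg00 | mirror_plan /dev/dsk/c2t5d0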


Note 2: Create a System Mirror Disk:
------------------------------------

# ioscan -fnC disk 
Class I H/W Path Driver S/W State H/W Type Description 
===================================================================== 
disk 0 0/0/1/1.2.0 sdisk CLAIMED DEVICE HP 73.4GMAN3735MC 
                         /dev/dsk/c1t2d0 /dev/rdsk/c1t2d0 
disk 1 0/0/2/0.2.0 sdisk CLAIMED DEVICE HP 73.4GATLAS10K3_73_SCA 
                         /dev/dsk/c2t2d0 /dev/rdsk/c2t2d0 
  
Note: c1t2d0 is the boot disk and c2t2d0 is the mirrored disk. 
       
1) Initialize the disk and make it bootable 
        pvcreate -B /dev/rdsk/c2t2d0 
            Note: the -B parameter tells pvcreate that this will be a bootable disk. 
       
2) Add the physical volume to the volume group 
            vgextend /dev/vg00 /dev/dsk/c2t2d0 
       
3) Use mkboot to place the boot utilities in the boot area and add the AUTO file. 
            mkboot /dev/dsk/c2t2d0 
            mkboot -a "hpux -lq" /dev/rdsk/c2t2d0 
       
4) Use mkboot to update the AUTO file on the primary boot disk. 
            mkboot -a "hpux -lq" /dev/rdsk/c1t2d0 
       
5) Mirror the stand, root and swap logical volumes 
            lvextend -m 1 /dev/vg00/lvol1 
            lvextend -m 1 /dev/vg00/lvol2 
            lvextend -m 1 /dev/vg00/lvol3 


Note: LVM will resynchronize the new mirror copies. 


Repeat the lvextend for all other logical volumes on the boot mirror. 
            lvextend -m 1 /dev/vg00/lvol4 
            lvextend -m 1 /dev/vg00/lvol5 
            lvextend -m 1 /dev/vg00/lvol6 
            lvextend -m 1 /dev/vg00/lvol7 
            lvextend -m 1 /dev/vg00/lvol8 


6) Modify your alternate boot path to point to the mirror copy of the boot disk. 
Note: Use the Hardware path for your new boot disk. 
            setboot -a 0/0/2/0.2.0 



Note 3: Increase a filesystem in HP-UX:
---------------------------------------

Example 1:
----------

In this example, you would need to increase the file system size of /var by 10 MB, which actually needs 
to be rounded up to 12 MB.
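
The rounding comes from the extent size: a logical volume grows in whole physical extents, 4 MB each in
this example. A minimal sketch of the arithmetic (the function name round_up_to_pe is hypothetical):

```shell
#!/bin/sh
# round_up_to_pe SIZE_MB PE_MB: round SIZE_MB up to a whole number of
# physical extents of PE_MB megabytes each.
round_up_to_pe() {
    echo $(( ( ($1 + $2 - 1) / $2 ) * $2 ))
}

round_up_to_pe 10 4     # 10 MB requested, 4 MB extents
round_up_to_pe 12 4     # already a multiple of the extent size
```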

Increase /var
Follow these steps to increase the size limit of /var.

- Determine if any space is available for the /dev/vg00:

# /sbin/vgdisplay /dev/vg00 

 
The Free PE indicates the number of 4 MB extents available, in this case 79 (equivalent to 316 MB).

- Change to single user state:

# /sbin/shutdown

This allows /var to be unmounted.

- View mounted volumes:

# /sbin/mount

You see a display similar to the following:

/ on /dev/vg00/lvol1 defaults on Sat Mar 8 23:19:19 1997
/var on /dev/vg00/lvol7 defaults on Sat Mar 8 23:19:28 1997 


Determine which logical volume maps to /var. In this example, it is /dev/vg00/lvol7.

- Unmount /var:

# /sbin/umount /var

This is required for the next step, because extendfs can only work on unmounted volumes. If you get a 
"device busy" error at this point, reboot the system and log on in single-user mode before continuing.

- Extend the size of the logical volume:

# /sbin/lvextend -L new_size_in_MB /dev/vg00/lvol7

For example, to make this volume 332 MB:

# /sbin/lvextend -L 332 /dev/vg00/lvol7

To extend the file system size to the logical volume size:

# /sbin/extendfs /dev/vg00/rlvol7

Mount /var:

# /sbin/mount /var

Go back to the regular init state: init 3 or init 4, or reboot.


Example 2:
----------

To increase the capacity of a file system created on a logical volume, enter:

# umount /dev/vg00/lvol1
# lvextend -L larger_size /dev/vg00/lvol1
# extendfs -F hfs /dev/vg00/rlvol1          -- For operations like mkfs or extendfs, use the raw device file. 
# mount /dev/vg00/lvol1 mount_directory


Example 3:
----------

> 
> Date: 12/14/99 
> Document description: Extending /var, /usr, /tmp without Online JFS 
> Document id: KBRC00000204 
> 
> 
> PROBLEM 
> Since /var, /usr, /tmp (and sometimes /opt) are always in use by the 
> operating system, they cannot be unmounted with the umount command. In order 
> to extend these filesystems, the system must be in single user mode. 
> 
> RESOLUTION 
> This example will show how to extend /usr to 400MB without Online JFS 
> 
> 
> 1.. Backup the filesystem before extending 
> 
> 
> 2.. Display disk information on the logical volume 
> 
> lvdisplay -v /dev/vg00/lvol4 | more 
> 
> 
> a.. Make sure there are enough free PEs to increase this filesystem. 
> b.. Make sure that allocation is NOT strict/contiguous. 
> 
> 
> 3.. Reboot the machine 
> 
> shutdown -r now 
> 
> 
> 4.. When prompted, press "ESC" to interrupt the boot. 
> 
> 
> 5.. Boot from the primary device and invoke ISL interaction. 
> 
> bo pri isl 
> 
> NOTE: If prompted to interact with ISL, respond "y" 
> 
> 
> 6.. Boot into single user mode 
> 
> hpux -is 
> 
> NOTE:Nothing will be mounted. 
> 
> 
> 7.. Extend the logical volume that holds the filesystem. 
> 
> /sbin/lvextend -L 400 /dev/vg00/lvol4 
> 
> 
> 8.. Extend the file system. 
> 
> /sbin/extendfs -F hfs /dev/vg00/rlvol4 
> 
> NOTE: Use the character device. 
> 
> 
> 9.. Ensure the filesystem now reports to be the new size 
> 
> bdf 
> 
> 
> 10.. Reboot the system to its normal running state. 
> 
> shutdown -r now 
> 
> 
> 
The only catch is that you have to have contiguous lvols to do that. The 
best way is to do an Ignite make_tape_recovery -i for vg00 and then 
resize it when you recreate it. If you have vg00 on a separate disk then 
it is really easy: the backup can run in the background, and the interactive 
restore will take about 2.5 hours for a 9GB root disk. You can make 
the lvols any size you want, and it also puts them back in place in order, 
so you save space. 


Example 4:
----------

The right way to extend a file system with "OnLine JFS" is to use the command "fsadm".
For example, if you want to extend the file system /mk2/toto on
/dev/vgmk2/lvtoto from 50 MB to 60 MB, you must first extend the logical volume:

# lvextend -L 60 /dev/vgmk2/lvtoto

Now use fsadm (I suppose you have vxfs; if you are using hfs it is not
possible to increase on-line, or at least I don't know how).

# fsadm -F vxfs -b 61440 /mk2/toto

Your file system is now increased on-line. Be careful: if your file system is 100% full, the fsadm command will fail; you
need some free space on the file system (how much depends on the fs type, size, etc.).

In general, a file system with Online JFS is extended in the following way:

lvextend -L ???? /dev/vg??/lvol??

fsadm -F vxfs -b ????? /<filesystem name>
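
In the example above the -b value is the new size in 1 KB units (60 MB = 61440). Assuming that unit
holds for your file system, the two commands can be generated from a size in MB as in the sketch below.
The function names are hypothetical, and the commands are only printed, not executed.

```shell
#!/bin/sh
# mb_to_fsadm_blocks SIZE_MB: value for "fsadm -b", assuming 1 KB units as in
# the example above.
mb_to_fsadm_blocks() {
    echo $(( $1 * 1024 ))
}

# grow_plan LV MOUNT_POINT SIZE_MB: print the lvextend and fsadm commands.
grow_plan() {
    echo "lvextend -L $3 $1"
    echo "fsadm -F vxfs -b $(mb_to_fsadm_blocks "$3") $2"
}

grow_plan /dev/vgmk2/lvtoto /mk2/toto 60
```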

oranh300:/home/se1223>cat /etc/inittab | grep enab
vxen::bootwait:/sbin/fs/vxfs/vxenablef -a


Note 4:
-------

Extend OnlineJFS licenses on the following D&ST servers:
aavnh400
oranh503
oranh603
orazh500
orazh601
orazh602

commands are:
swagentd -r
swinstall -x mount_all_filesystems=false -x enforce_dependencies=true -s hpdepot.ao.nl.abnamro.com:/beheer/depot/OnlineJFS_License OnlineJFS
swagentd -k



HP-UX errors: Error 23 filetable overflow:
------------------------------------------

Error 23 is an infamous error, as shown in this thread:

thread:

Doc ID: Note:1018306.102 
Problem Description:
====================
You are backing up your database and are getting the following errors:

HP-UX Error 23: file table overflow

RMAN-569 file not found
LEM-00031 file not found
LEM-00033 lempgfm couldn't open message file
RMAN indicates that Recovery Manager is complete, however the database
and the catalog are not resync'd.
Problem Explanation:
====================
Recovery Manager cannot find or open the message file.
Search Words:
=============
Recovery Manager, LEM-33, LEM-31, RMAN-00569, message file, lempgfm,
error 23, HPUX error 23, HP-UX error 23
Solution Description:
=====================
You may need to increase the value of the unix kernel parameter 'nfile'.
Solution Explanation:
=====================
'nfile' needs to have a value in the thousands for a database server. 
If this parameter is < 1000, increase it to something like 5000 or 
greater. If there is enough memory on your system, this parameter can
be set to values > 30000.



35.5 Some important filesystem related kernel params:
=====================================================


nfile:
------

nfile defines the maximum number of files that can be open simultaneously, system-wide, at any given time.

Acceptable Values:
Minimum 
14 
Maximum 
Memory limited 
Default 
(16*(Nproc+16+MaxUsers)/10)+32+2*(Npty+Nstrpty) 

Specify integer value or use integer formula expression. For more information, see Specifying Parameter Values.
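
To get a feel for the default, the formula can be evaluated in the shell. This is a sketch only: the
input values below are hypothetical sample settings, and it assumes integer division, as shell
arithmetic does.

```shell
#!/bin/sh
# nfile_default NPROC MAXUSERS NPTY NSTRPTY: evaluate the default formula
#   (16*(nproc+16+maxusers)/10) + 32 + 2*(npty+nstrpty)
# using integer arithmetic.
nfile_default() {
    echo $(( (16 * ($1 + 16 + $2)) / 10 + 32 + 2 * ($3 + $4) ))
}

# Hypothetical sample values: nproc=276, maxusers=32, npty=60, nstrpty=60
nfile_default 276 32 60 60
```

With these sample values the result is 790, which is below the 1000 mentioned in the error 23 note
above, so a database server would normally need a larger explicit setting.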

Description
nfile defines the maximum number of files that can be open at any one time, system-wide.
It is the number of slots in the file descriptor table. Be generous with this number because the required memory 
is minimal, and not having enough slots restricts system processing capacity.

Related Parameters and System Factors
The value used for nfile must be sufficient to service the number of users and processes allowed by the combination 
of nproc, maxusers, npty , and nstrpty.

Every process uses at least three file descriptors (standard input, standard output, 
and standard error).

Every process has two pipes (one per side), each of which requires a pty. Stream pipes also use 
streams ptys, which are limited by nstrpty.



35.6 HP-UX kernel parameters:
=============================

Take special notice of the parameters nfile, nflocks, ninode, and nproc.
They determine how many open files, file locks, open inodes, and simultaneous processes are possible *system-wide*.
Values that are too low may result in HP-UX errors when dealing with larger databases, huge App Servers
and the like.

Entering Values: 
 
Use the kcweb web interface or the kmtune command to view and change values. kcweb is described 
in the kcweb(1M) manpage and in the program's help topics. You can run kcweb from the command line 
or from the System Administration Manager (SAM); see sam(1M). You run kmtune from the command line; 
see kmtune(1M) for details.




Accounting
 acctresume
 Resume accounting when free space on the file system where accounting log files reside rises above acctresume plus minfree percent of total usable file system size. Manpage: acctsuspend(5).
 
Accounting
 acctsuspend
 Suspend accounting when free space on the file system where accounting log files reside drops below acctsuspend plus minfree percent of total usable file system size. Manpage: acctsuspend(5).
 
Asynchronous I/O
 aio_listio_max
 Maximum number of POSIX asynchronous I/O operations allowed in a single lio_listio() call. Manpage: aio_listio_max(5).
 
Asynchronous I/O
 aio_max_ops
 System-wide maximum number of POSIX asynchronous I/O operations allowed at one time. Manpage: aio_max_ops(5).
 
Asynchronous I/O
 aio_physmem_pct
 Maximum percentage of total system memory that can be locked for use in POSIX asynchronous I/O operations. Manpage: aio_physmem_pct(5).
 
Asynchronous I/O
 aio_prio_delta_max
 Maximum priority offset (slowdown factor) allowed in a POSIX asynchronous I/O control block (aiocb). Manpage: aio_prio_delta_max(5).
 
Memory Paging
 allocate_fs_swapmap
 Enable or disable preallocation of file system swap space when swapon() is called as opposed to allocating swap space when malloc() is called. Enabling allocation reduces risk of insufficient swap space and is used primarily where high availability is important. Manpage: allocate_fs_swapmap(5).
 
Kernel Crash Dump
 alwaysdump
 Select which classes of system memory pages are to be dumped if a kernel panic occurs. Manpage: alwaysdump(5).
 
Spinlock Pool
 bufcache_hash_locks
 Buffer-cache spinlock pool. NO MANPAGE. 
 
File System: Buffer
 bufpages
 Number of 4 KB pages in file system static buffer cache. Manpage: bufpages(5).
 
Spinlock Pool
 chanq_hash_locks
 Channel queue spinlock pool. Manpage: chanq_hash_locks(5).
 
IPC: Share
 core_addshmem_read
 Flag to include readable shared memory in a process core dump. Manpage: core_addshmem_read(5).
 
IPC: Share
 core_addshmem_write
 Flag to include read/write shared memory in a process core dump. Manpage: core_addshmem_write(5).
 
Miscellaneous: Links
 create_fastlinks
 Create fast symbolic links using a newer, more efficient format to improve access speed by reducing disk block accesses during path name look-up sequences. Manpage: create_fastlinks(5).
 
File System: Buffer
 dbc_max_pct
 Maximum percentage of memory for dynamic buffer cache. Manpage: dbc_max_pct(5).
 
File System: Buffer
 dbc_min_pct
 Minimum percentage of memory for dynamic buffer cache. Manpage: dbc_min_pct(5).
 
Miscellaneous: Disk I/O
 default_disk_ir
 Immediate reporting for disk writes; whether a write() returns immediately after the data is placed in the disk's write buffer or waits until the data is physically stored on the disk media. Manpage: default_disk_ir(5).
 
File System: Buffer
 disksort_seconds
 Maximum wait time for disk requests. NO MANPAGE.
 
Miscellaneous: Disk I/O
 dma32_pool_size
 Amount of memory to set aside for 32-bit DMA (bytes). Manpage: dma32_pool_size(5).
 
Spinlock Pool
 dnlc_hash_locks
 Number of locks for directory cache synchronization. NO MANPAGE.
 
Kernel Crash Dump
 dontdump
 Select which classes of system memory pages are not to be dumped if a kernel panic occurs. Manpage: dontdump(5).
 
Miscellaneous: Clock
 dst
 Enable/disable daylight savings time. Manpage: timezone(5).
 
Miscellaneous: IDS
 enable_idds
 Flag to enable the IDDS daemon, which gathers data for IDS/9000. Manpage: enable_idds(5).
 
Miscellaneous: Memory
 eqmemsize
 Number of pages of memory to be reserved for equivalently mapped memory, used mostly for DMA transfers. Manpage: eqmemsize(5).
 
ProcessMgmt: Process
 executable_stack
 Allows or denies program execution on the stack. Manpage: executable_stack(5).
 
File System: Write
 fs_async
 Enable/disable asynchronous writes of file system data structures to disk. Manpage: fs_async(5).
 
Spinlock Pool
 ftable_hash_locks
 File table spinlock pool. NO MANPAGE. 
 
Spinlock Pool
 hdlpreg_hash_locks
 Set the size of the pregion spinlock pool. Manpage: hdlpreg_hash_locks(5).
 
File System: Read
 hfs_max_ra_blocks
 The maximum number of read-ahead blocks that the kernel may have outstanding for a single HFS file system. Manpage: hfs_max_ra_blocks(5).
 
File System: Read
 hfs_max_revra_blocks
 The maximum number of reverse read-ahead blocks that the kernel may have outstanding for a single HFS file system. Manpage: hfs_max_revra_blocks(5).
 
File System: Read
 hfs_ra_per_disk
 The amount of HFS file system read-ahead per disk drive, in KB. Manpage: hfs_ra_per_disk(5).
 
File System: Read
 hfs_revra_per_disk
 The amount of memory (in KB) for HFS reverse read-ahead operations, per disk drive. Manpage: hfs_revra_per_disk(5).
 
File System: Read
 hp_hfs_mtra_enabled
 Enable or disable HFS multithreaded read-ahead. NO MANPAGE.
 
Kernel Crash Dump
 initmodmax
 Maximum size of the dump table of dynamically loaded kernel modules. Manpage: initmodmax(5).
 
Spinlock Pool
 io_ports_hash_locks
 I/O port spinlock pool. NO MANPAGE.
 
Miscellaneous: Queue
 ksi_alloc_max
 Maximum number of system-wide queued signals that can be allocated. Manpage: ksi_alloc_max(5).
 
Miscellaneous: Queue
 ksi_send_max
 Maximum number of queued signals that a process can send and have pending at one or more receivers. Manpage: ksi_send_max(5).
 
ProcessMgmt: Memory
 maxdsiz
 Maximum process data storage segment space that can be used for statics and strings, as well as dynamic data space allocated by sbrk() and malloc() (32-bit processes). Manpage: maxdsiz(5).
 
ProcessMgmt: Memory
 maxdsiz_64bit
 Maximum process data storage segment space that can be used for statics and strings, as well as dynamic data space allocated by sbrk() and malloc() (64-bit processes). Manpage: maxdsiz(5).
 
File System: Open/Lock
 maxfiles
 Soft limit on how many files a single process can have opened or locked at any given time. Manpage: maxfiles(5).
 
File System: Open/Lock
 maxfiles_lim
 Hard limit on how many files a single process can have opened or locked at any given time. Manpage: maxfiles_lim(5).
 
ProcessMgmt: Memory
 maxrsessiz
 Maximum size (in bytes) of the RSE stack for any user process on the IPF platform. Manpage: maxrsessiz(5).
 
ProcessMgmt: Memory
 maxrsessiz_64bit
 Maximum size (in bytes) of the RSE stack for any user process on the IPF platform. Manpage: maxrsessiz(5).
 
ProcessMgmt: Memory
 maxssiz
 Maximum dynamic storage segment (DSS) space used for stack space (32-bit processes). Manpage: maxssiz(5).
 
ProcessMgmt: Memory
 maxssiz_64bit
 Maximum dynamic storage segment (DSS) space used for stack space (64-bit processes). Manpage: maxssiz(5).
 
ProcessMgmt: Memory
 maxtsiz
 Maximum allowable process text segment size, used by unchanging executable-code (32-bit processes). Manpage: maxtsiz(5).
 
ProcessMgmt: Memory
 maxtsiz_64bit
 Maximum allowable process text segment size, used by unchanging executable-code (64-bit processes). Manpage: maxtsiz(5).
 
ProcessMgmt: Process
 maxuprc
 Maximum number of processes that any single user can have running at the same time, including login shells, user interface processes, running programs and child processes, I/O processes, etc. If a user is using multiple, simultaneous logins under the same login name (user ID) as is common in X Window, CDE, or Motif environments, all processes are combined, even though they may belong to separate process groups. Processes that detach from their parent process group, where that is possible, are not counted after they detach (line printer spooler jobs, certain specialized applications, etc.). Manpage: maxuprc(5).
 
Miscellaneous: Users
 maxusers
 Maximum number of users expected to be logged in on the system at one time; used by other system parameters to allocate system resources. Manpage: maxusers(5).
 
File System: LVM
 maxvgs
 Maximum number of volume groups configured by the Logical Volume Manager on the system. Manpage: maxvgs(5).
 
Accounting
 max_acct_file_size
 Maximum size of the accounting file. Manpage: max_acct_file_size(5).
 
Asynchronous I/O
 max_async_ports
 System-wide maximum number of ports to the asynchronous disk I/O driver that processes can have open at any given time. Manpage: max_async_ports(5).
 
Memory Paging
 max_mem_window
 Maximum number of group-private 32-bit shared memory windows. Manpage: max_mem_window(5).
 
ProcessMgmt: Threads
 max_thread_proc
 Maximum number of threads that any single process can create and have running at the same time. Manpage: max_thread_proc(5).
 
IPC: Message
 mesg
 Enable or disable IPC messages at system boot time. Manpage: mesg(5).
 
Kernel Crash Dump
 modstrmax
 Maximum size, in bytes, of the savecrash kernel module table that contains module names and their locations in the file system. Manpage: modstrmax(5).
 
IPC: Message
 msgmap
 Size of free-space resource map for allocating shared memory space for messages. Manpage: msgmap(5).
 
IPC: Message
 msgmax
 System-wide maximum size (in bytes) for individual messages. Manpage: msgmax(5).
 
IPC: Message
 msgmnb
 Maximum combined size (in bytes) of all messages that can be queued simultaneously in a message queue. Manpage: msgmnb(5).
 
IPC: Message
 msgmni
 Maximum number of message queues allowed on the system at any given time. Manpage: msgmni(5).
 
IPC: Message
 msgseg
 Maximum number of message segments that can exist on the system. Manpage: msgseg(5).
 
IPC: Message
 msgssz
 Message segment size in bytes. Manpage: msgssz(5).
 
IPC: Message
 msgtql
 Maximum number of messages that can exist on the system at any given time. Manpage: msgtql(5).
 
File System: Buffer
 nbuf
 System-wide number of static file system buffer and cache buffer headers. Manpage: nbuf(5).
 
Miscellaneous: CD
 ncdnode
 Maximum number of entries in the vnode table and therefore the maximum number of open CD-ROM file system nodes that can be in memory. Manpage: ncdnode(5).
 
Miscellaneous: Terminal
 nclist
 Maximum number of cblocks available for data transfers through tty and pty devices. Manpage: nclist(5).
 
File System: Open/Lock
 ncsize
 Inode space needed for directory name lookup cache (DNLC). NO MANPAGE.
 
File System: Open/Lock
 nfile
 Maximum number of files that can be open simultaneously on the system at any given time. Manpage: nfile(5).
 
File System: Open/Lock
 nflocks
 Maximum combined number of file locks that are available system-wide to all processes at one time. Manpage: nflocks(5).
 
File System: Open/Lock
 ninode
 Maximum number of open inodes that can be in memory. Manpage: ninode(5).
 
ProcessMgmt: Threads
 nkthread
 Maximum number of kernel threads allowed on the system at the same time. Manpage: nkthread(5).
 
ProcessMgmt: Process
 nproc
 Defines the maximum number of processes that can be running simultaneously on the entire system, including remote execution processes initiated by other systems via remsh or other networking commands. Manpage: nproc(5).
 
Miscellaneous: Terminal
 npty
 Maximum number of pseudo-tty entries allowed on the system at any one time. Manpage: npty(5).
 
Streams
 NSTREVENT
 Maximum number of outstanding streams bufcalls that are allowed to exist at any given time on the system. This number should be equal to or greater than the maximum bufcalls that can be generated by the combined total modules pushed onto any given stream, and serves to limit run-away bufcalls. Manpage: nstrevent(5).
 
Miscellaneous: Terminal
 nstrpty
 System-wide maximum number of streams-based pseudo-ttys that are allowed on the system. Manpage: nstrpty(5).
 
Streams
 nstrpty
 System-wide maximum number of streams-based pseudo-ttys that are allowed on the system. Manpage: nstrpty(5).
 
Streams
 NSTRPUSH
 Maximum number of streams modules that are allowed to exist in any single stream at any one time on the system. This provides a mechanism for preventing a software defect from attempting to push too many modules onto a stream, but it is not intended as adequate protection against malicious use of streams. Manpage: nstrpush(5).
 
Streams
 NSTRSCHED
 Maximum number of streams scheduler daemons that are allowed to run at any given time on the system. This value is related to the number of processors installed in the system. Manpage: nstrsched(5).
 
Miscellaneous: Terminal
 nstrtel
 Number of telnet session device files that are available on the system. Manpage: nstrtel(5).
 
Memory Paging
 nswapdev
 Maximum number of devices, system-wide, that can be used for device swap. Set to match actual system configuration. Manpage: nswapdev(5).
 
Memory Paging
 nswapfs
 Maximum number of mounted file systems, system-wide, that can be used for file system swap. Set to match actual system configuration. Manpage: nswapfs(5).
 
Miscellaneous: Memory
 nsysmap
 Number of entries in the kernel dynamic memory virtual address space resource map (32-bit processes). Manpage: nsysmap(5).
 
Miscellaneous: Memory
 nsysmap64
 Number of entries in the kernel dynamic memory virtual address space resource map (64-bit processes). Manpage: nsysmap(5).
 
Miscellaneous: Disk I/O
 o_sync_is_o_dsync
 Specifies whether an open() or fcntl() with the O_SYNC flag set can be converted to the same call with the O_DSYNC flag instead. This controls whether the function can return before updating the file access. NO MANPAGE.
 
ProcessMgmt: Memory
 pa_maxssiz_32bit
 Maximum size (in bytes) of the stack for a user process running under the PA-RISC emulator on IPF. Manpage: pa_maxssiz(5).
 
ProcessMgmt: Memory
 pa_maxssiz_64bit
 Maximum size (in bytes) of the stack for a user process running under the PA-RISC emulator on IPF. Manpage: pa_maxssiz(5).
 
Spinlock Pool
 pfdat_hash_locks
 Pfdat spinlock pool. Manpage: pfdat_hash_locks(5).
 
Miscellaneous: Disk I/O
 physical_io_buffers
 Total buffers for physical I/O operations. Manpage: physical_io_buffers(5).
 
Spinlock Pool
 region_hash_locks
 Process-region spinlock pool. Manpage: region_hash_locks(5).
 
Memory Paging
 remote_nfs_swap
 Enable or disable swap to mounted remote NFS file system. Used on cluster clients for swapping to NFS-mounted server file systems. Manpage: remote_nfs_swap(5).
 
Miscellaneous: Schedule
 rtsched_numpri
 Number of distinct real-time interrupt scheduling priority levels available on the system. Manpage: rtsched_numpri(5).
 
Miscellaneous: Terminal
 scroll_lines
 Defines the number of lines that can be scrolled on the internal terminal emulator (ITE) system console. Manpage: scroll_lines(5).
 
File System: SCSI
 scsi_maxphys
 Maximum record size for the SCSI I/O subsystem, in bytes. Manpage: scsi_maxphys(5).
 
File System: SCSI
 scsi_max_qdepth
 Maximum number of SCSI commands queued up for SCSI devices. Manpage: scsi_max_qdepth(5).
 
ProcessMgmt: Process
 secure_sid_scripts
 Controls whether setuid and setgid bits on scripts are honored. Manpage: secure_sid_scripts(5).
 
IPC: Semaphore
 sema
 Enable or disable IPC semaphores at system boot time. Manpage: sema(5).
 
IPC: Semaphore
 semaem
 Maximum value by which a semaphore can be changed in a semaphore "undo" operation. Manpage: semaem(5).
 
IPC: Semaphore
 semmni
 Maximum number of sets of IPC semaphores allowed on the system at any one time. Manpage: semmni(5).
 
IPC: Semaphore
 semmns
 Maximum number of individual IPC semaphores available to system users, system-wide. Manpage: semmns(5).
 
IPC: Semaphore
 semmnu
 Maximum number of processes that can have undo operations pending on any given IPC semaphore on the system. Manpage: semmnu(5).
 
IPC: Semaphore
 semmsl
 Maximum number of individual System V IPC semaphores per semaphore identifier. Manpage: semmsl(5).
 
IPC: Semaphore
 semume
 Maximum number of IPC semaphores that a given process can have undo operations pending on. Manpage: semume(5).
 
IPC: Semaphore
 semvmx
 Maximum value any given IPC semaphore is allowed to reach (prevents undetected overflow conditions). Manpage: semvmx(5).
 
Miscellaneous: Web
 sendfile_max
 The amount of buffer cache that can be used by the sendfile() system call on HP-UX web servers. Manpage: sendfile_max(5).
 
IPC: Share
 shmem
 Enable or disable shared memory at system boot time. Manpage: shmem(5).
 
IPC: Share
 shmmax
 Maximum allowable shared memory segment size (in bytes). Manpage: shmmax(5).
 
IPC: Share
 shmmni
 Maximum number of shared memory segments allowed on the system at any given time. Manpage: shmmni(5).
 
IPC: Share
 shmseg
 Maximum number of shared memory segments that can be attached simultaneously to any given process. Manpage: shmseg(5).
 
Streams
 STRCTLSZ
 Maximum number of control bytes allowed in the control portion of any streams message on the system. Manpage: strctlsz(5).
 
Streams
 streampipes
 Force all pipes to be streams-based. Manpage: streampipes(5).
 
Streams
 STRMSGSZ
 Maximum number of bytes that can be placed in the data portion of any streams message on the system. Manpage: strmsgsz(5).
 
File System: SCSI
 st_ats_enabled
 Flag whether to reserve a tape device on open. Manpage: st_ats_enabled(5).
 
File System: SCSI
 st_fail_overruns
 SCSI tape read resulting in data overrun causes failure. Manpage: st_fail_overruns(5).
 
File System: SCSI
 st_large_recs
 Enable large record support for SCSI tape. Manpage: st_large_recs(5).
 
Memory Paging
 swapmem_on
 Enable or disable pseudo-swap allocation. This allows systems with large installed memory to allocate memory space as well as disk swap space for virtual memory use instead of restricting availability to defined disk swap area. Manpage: swapmem_on(5).
 
Memory Paging
 swchunk
 Amount of space allocated for each chunk of swap area. Chunks are allocated from device to device by the kernel. Changing this parameter requires extensive knowledge of system internals. Without such knowledge, do not change this parameter from the normal default value. Manpage: swchunk(5).
 
Spinlock Pool
 sysv_hash_locks
 System V interprocess communication spinlock pool. Manpage: sysv_hash_locks(5).
 
Miscellaneous: Network
 tcphashsz
 TCP hash table size, in bytes. Manpage: tcphashsz(5).
 
ProcessMgmt: CPU
 timeslice
 Maximum time a process can use the CPU until it is made available to the next process having the same process execution priority. This feature also prevents runaway processes from causing system lock-up. Manpage: timeslice(5).
 
Miscellaneous: Clock
 timezone
 The offset between the local time zone and Coordinated Universal Time (UTC), often called Greenwich Mean Time or GMT. Manpage: timezone(5).
 
Miscellaneous: Memory
 unlockable_mem
 Amount of system memory to be reserved for system overhead and virtual memory management, that cannot be locked by user processes. Manpage: unlockable_mem(5).
 
Spinlock Pool
 vnode_cd_hash_locks
 Vnode clean/dirty spinlock pool. NO MANPAGE. 
 
Spinlock Pool
 vnode_hash_locks
 Vnode spinlock pool. NO MANPAGE. 
 
Memory Paging: Size
 vps_ceiling
 Maximum system-selected page size (in KB) if the user does not specify a page size. Manpage: vps_ceiling(5).
 
Memory Paging: Size
 vps_chatr_ceiling
 Maximum page size a user can specify with the chatr command in a program. Manpage: vps_chatr_ceiling(5).
 
Memory Paging: Size
 vps_pagesize
 Minimum user page size (in KB) if no page size is specified using chatr. Manpage: vps_pagesize(5).
 
File System: Journaled
 vxfs_max_ra_kbytes
 Maximum amount of read-ahead data, in KB, that the kernel may have outstanding for a single VxFS file system. Manpage: vxfs_max_ra_kbytes(5).
 
File System: Read
 vxfs_max_ra_kbytes
 Maximum amount of read-ahead data, in KB, that the kernel may have outstanding for a single VxFS file system. Manpage: vxfs_max_ra_kbytes(5).
 
File System: Journaled
 vxfs_ra_per_disk
 Maximum amount of VxFS file system read-ahead per disk, in KB. Manpage: vxfs_ra_per_disk(5).
 
File System: Read
 vxfs_ra_per_disk
 Maximum amount of VxFS file system read-ahead per disk, in KB. Manpage: vxfs_ra_per_disk(5).
 
File System: Journaled
 vx_fancyra_enable
 Enable or disable VxFS file system read-ahead. NO MANPAGE.
 
File System: Journaled
 vx_maxlink
 Number of subdirectories created within a directory. NO MANPAGE.
 
File System: Journaled
 vx_ncsize
 Memory space reserved for VxFS directory path name cache. Manpage: vx_ncsize(5).
 
File System: Journaled
 vx_ninode
 Number of entries in the VxFS inode table. NO MANPAGE.
 


Some HP-UX troubleshooting tips:
--------------------------------


Where to get information about problems:

dmesg  --> provides a finite list of diagnostic messages 
/var/adm/syslog/syslog.log  -->  system log 
/opt/resmon/log/error.log  --> 
/etc/shutdownlog  --> shutdown information 
/etc/rc.log  -->  system startup log 
/var/tombstones/ts99  --> crash analysis file 
cstm  -  command-line Support Tools Manager
mstm  -  menu-based Support Tools Manager
    <alt><underlined letter of command>
    <tab>  -->  to move to another portion of the screen, such as the drop-down menu area
Service Processor
    <ctrl> <b> from a serial console
    he  - help
    co  - return to console mode (exits the program)
    sl  - show log



Panic Reboots

Check these files for clues:

/var/tombstones/ts99 
/etc/shutdownlog 

Bad disk

1.  Check the syslog (/var/adm/syslog/syslog.log) looking for disk errors.
2.  Check the ioscan output (ioscan -fnC disk), looking for disks in state NO_HW rather than CLAIMED.
3.  If diaglogd is running, check the STM logs (/var/opt/resmon/log/event.log).
4.  Check the volume group to see if the disk is listed and whether there is any problem with its status (vgdisplay -v | more)
5.  Check lvmtab to see if the disk is supposed to be in a volume group (strings /etc/lvmtab | more)

Filesystems do not mount after a reboot

1.  Reactivate the Volume Group  -->  vgchange -a y /dev/<volume group>
2.  Remount the filesystems  -->  mount -a
3.  If still no success then perform a filesystem check  -->  fsck /dev/<volumegroup>/<logicalvolume>
4.  Remount the filesystems  -->  mount -a
5.  Check to see if all the filesystems are there:
          a)  bdf
          b)  compare with /etc/fstab
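
The comparison in step 5 can be partly scripted. A minimal sketch, assuming the mount point is the second field of each /etc/fstab entry; the function name missing_mounts and its two file arguments are made up for illustration:

```shell
# missing_mounts FSTAB MOUNTED
#   FSTAB:   a file in /etc/fstab format (mount point assumed in field 2)
#   MOUNTED: a file with one currently mounted mount point per line
# Prints every fstab mount point that does not appear in MOUNTED.
missing_mounts() {
    # Skip comment lines and entries whose second field is not a path
    # (for example swap entries).
    awk '!/^[ \t]*#/ && $2 ~ /^\// { print $2 }' "$1" |
    while read -r mp; do
        # -F fixed string, -x whole line, -q quiet
        grep -Fqx -- "$mp" "$2" || echo "$mp"
    done
}
```

Possible usage on HP-UX, taking the mount point from the last column of the bdf output:

          bdf | awk 'NR > 1 { print $NF }' > /tmp/mounted.txt
          missing_mounts /etc/fstab /tmp/mounted.txt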

Filesystem full

du -kx / | sort -rn | more 
du -akx | sort -nr | more 

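Both commands list directories on the local filesystem (-x does not cross mount points), largest first, with sizes in KB.
The second form (-a) starts in the current directory and also lists individual files.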
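
To keep the output short, the same pipeline can be wrapped in a small function. A sketch only; the name biggest_dirs and its arguments are made up for illustration:

```shell
# biggest_dirs [DIR] [N]
# Lists the N largest directories (default 10) at or below DIR
# (default: current directory), sizes in KB, largest first,
# without crossing into other mounted filesystems (-x).
biggest_dirs() {
    du -kx "${1:-.}" 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

For example, "biggest_dirs /var 20" shows the twenty largest directories under /var.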

NFS mount - Permission Denied

1.  Check that the format of the /etc/exports file is correct on the NFS server.

2.  exportfs -av to export the filesystem

3.  Check the /etc/fstab file on the client to make sure that it is correct

4.  /usr/sbin/showmount -e <server>  on the client to show what is being exported

5.  To bypass the /etc/exports file execute the following on the nfs server:     exportfs -i -o rw <filesystem>. 


NFS Server

/etc/rc.config.d/nfsconf  --> NFS_SERVER=1

Verify the proper processes are running:

/sbin/init.d/nfs.server stop

The processes should NOT be running:

# ps -ef|grep nfsd
# ps -ef|grep rpc.mountd
# ps -ef|grep rpc.lockd
# ps -ef|grep rpc.statd

/sbin/init.d/nfs.server start

These processes should be running:

# ps -ef|grep nfsd
    root  3444     1  0 10:39:12 ?         0:00 /usr/sbin/nfsd 4
    root  3451  3444  0 10:39:12 ?         0:00 /usr/sbin/nfsd 4
    root  3449  3444  0 10:39:12 ?         0:00 /usr/sbin/nfsd 4
    root  3445  3444  0 10:39:12 ?         0:00 /usr/sbin/nfsd 4
# ps -ef|grep rpc.mountd
    root  3485     1  0 10:42:09 ?         0:00 rpc.mountd
# ps -ef|grep rpc.lockd
    root  3459     1  0 10:39:12 ?         0:00 /usr/sbin/rpc.lockd
# ps -ef|grep rpc.statd
    root  3453     1  0 10:39:12 ?         0:00 /usr/sbin/rpc.statd
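
The four checks can be combined into one loop. A sketch, assuming a ps that accepts the POSIX options "-e -o comm="; on HP-UX these options are reportedly only accepted when the UNIX95 environment variable is set, so the ps call may need adjusting there. The function name is_running is made up for illustration:

```shell
# is_running NAME
# Succeeds if a process whose command name is exactly NAME exists.
# Assumption: ps accepts the POSIX options "-e -o comm=".
is_running() {
    ps -e -o comm= 2>/dev/null |
        awk '{ print $1 }' |      # first word of the command name
        sed 's|.*/||' |           # strip a leading path, if any
        grep -qx -- "$1"          # exact match, so no stray "grep" hits
}

for daemon in nfsd rpc.mountd rpc.lockd rpc.statd; do
    if is_running "$daemon"; then
        echo "$daemon: running"
    else
        echo "$daemon: NOT running"
    fi
done
```

Matching on the exact command name avoids the usual problem of "ps -ef | grep name" also listing the grep process itself.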


To start a process if it is not running:

# ps -ef|grep rpc.mountd
# rpc.mountd   or  /usr/sbin/rpc.mountd
# ps -ef|grep rpc.mountd
    root  3485     1  0 10:42:09 ?         0:00 rpc.mountd


/etc/inetd.conf needs to have the proper services active (not commented out)

##
# WARNING: The rpc.mountd should now be started from a startup script.
#          Please enable the mountd startup script to start rpc.mountd.
##
#rpc  stream tcp  nowait  root  /usr/sbin/rpc.rexd     100017  1    rpc.rexd
rpc  dgram  udp  wait    root  /usr/lib/netsvc/rstat/rpc.rstatd   100001  2-4  rpc.rstatd
rpc  dgram  udp  wait    root  /usr/lib/netsvc/rusers/rpc.rusersd  100002  1-2  rpc.rusersd
rpc  dgram  udp  wait    root  /usr/etc/rpc.mountd  100005  1  rpc.mountd -e
rpc  dgram  udp  wait    root  /usr/lib/netsvc/rwall/rpc.rwalld   100008  1    rpc.rwalld
#rpc  dgram  udp  wait    root  /usr/sbin/rpc.rquotad  100011  1    rpc.rquotad
rpc  dgram  udp  wait    root  /usr/lib/netsvc/spray/rpc.sprayd   100012  1    rpc.sprayd



NIC problems:

The lanadmin utility provides NIC statistics
The nettl utility (and its nettladm front end) provides packet trace information


Replacing a Mirrored Root Disk:

1.  Replace the disk
      - A hot-swappable disk can be replaced while the system is up
      - If the disk is not hot swappable, the system must be brought down first
2.  Reboot the system into single user mode
      shutdown -r 0   (if the system is already powered off, power it back on instead)
      interrupt the boot
      bo pri   (or bo alt if the disk that was replaced was the primary boot disk)
      IPL>hpux -is -lq (;0)/stand/vmunix
3.  vgcfgrestore -n /dev/vg00 /dev/rdsk/c?t?d?
4.  vgsync /dev/vg00
5.  mkboot /dev/rdsk/c?t?d?
6.  mkboot -a "hpux -lq (;0)/stand/vmunix" /dev/rdsk/c?t?d?
7.  shutdown -r 0
8.  lvlnboot -v /dev/vg00   (to verify that the disk is seen as bootable)



Software Installation (swinstall, sd, etc)
    ERROR:   "server::/tmp/omni_tmp/packet":  You do not have the
         required permissions to perform this operation.  Check
         permissions using the "swacl" command or see your system
         administrator for assistance.  Or, to manage applications
         designed and packaged for nonprivileged mode, see the
         "run_as_superuser" option in the "sd" man page.
WARNING: More information may be found in the daemon logfile on this
         target (default location is
         server:/var/adm/sw/swagentd.log).

Bounce swagentd daemon:  /usr/sbin/swagentd -r




36. Some remarks about VI:
==========================

Before you run vi:
------------------

If you have connected to a remote host to use vi, first tell that host which terminal type your communications 
software (e.g., NCSA Telnet) emulates. This is typically a VT-100 terminal. 
To find out what shell program you use, type:

echo $SHELL 

Then if you use ksh, bash, or sh, type:
TERM=vt100; export TERM 

If you use csh or tcsh, type:
set term = vt100 

You can automate this task by adding the appropriate command to your default command shell's configuration file. 
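
As an illustration of choosing the right syntax, the sketch below prints the command for a given shell path. The function name term_command is made up; the printed line could then be added to a startup file such as ~/.profile (sh, ksh) or ~/.login (csh, tcsh), assuming those are the startup files your shell reads:

```shell
# term_command SHELL_PATH
# Prints the command that sets the terminal type to vt100 for the
# given shell (Bourne-style versus C-shell-style syntax).
term_command() {
    case "${1##*/}" in            # look at the last path component only
        csh|tcsh)    echo 'set term = vt100' ;;
        sh|ksh|bash) echo 'TERM=vt100; export TERM' ;;
        *)           echo "unknown shell: $1" >&2; return 1 ;;
    esac
}

term_command /usr/bin/ksh    # prints: TERM=vt100; export TERM
```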

Using vi modes:
---------------

Vi has three "modes": edit, insert, and colon.

- Edit mode (press Esc)
Vi enters edit mode by default when it starts up. Edit mode allows you to move the cursor and 
edit the text buffer. 

- Insert mode (press i)
Insert mode "drops" the cursor at a specific point in the buffer, allowing you to insert text. 
To enter insert mode, position the cursor where you want to place text and press i. 

If you make a typing mistake, press ESC to return to edit mode and then reposition the cursor at the error, 
and press i to get back to insert mode.

- Colon mode (press : with a command)
You enter colon mode from edit mode by typing a colon followed by a command. Some useful commands are:

:w           Write buffer to the current filename.
:w newname   Write buffer to file newname.
:r           Read the current filename into the buffer.
:r oldname   Read the file oldname into the buffer.
:q!          Quit vi without saving buffer.
:wq          Write buffer to current filename and quit vi.
:e filename  Close current buffer and edit (open) filename.
:e #         Close current buffer and edit (open) previous file.

Search and Replace:
-------------------

Replace works the same way as in sed. To replace OLD with NEW, press ESC and then type one of:

 First occurrence on current line:      :s/OLD/NEW
 Globally (all) on current line:        :s/OLD/NEW/g 
 Between two lines #,#:                 :#,#s/OLD/NEW/g
 Every occurrence in file:              :%s/OLD/NEW/g 
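
Because the substitute syntax is shared with sed, the four forms can be tried outside vi. A small sketch on made-up three-line input; the line range 2,3 stands in for #,# and line 1 stands in for "the current line":

```shell
# Sample input: three identical lines, each containing OLD twice.
sample() { printf 'OLD OLD\nOLD OLD\nOLD OLD\n'; }

sample | sed '1s/OLD/NEW/'      # like :s/OLD/NEW      with the cursor on line 1
sample | sed '1s/OLD/NEW/g'     # like :s/OLD/NEW/g    with the cursor on line 1
sample | sed '2,3s/OLD/NEW/g'   # like :2,3s/OLD/NEW/g
sample | sed 's/OLD/NEW/g'      # like :%s/OLD/NEW/g   (sed works on every line by default)
```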


The VI editor has two kinds of searches: string and character. For a string search, the / and ? commands are used. 
When you type / or ?, it is echoed on the bottom line of the screen, where you then type the string to look for 
and press Enter. These two commands differ only in the direction in which the search takes place. 
The / command searches forwards (downwards) in the file, while the ? command searches backwards (upwards) in the file. 
The n and N commands repeat the previous search command in the same or opposite direction, respectively. 
Some characters have special meanings to VI, so they must be preceded by a backslash (\) to be included as part 
of the search expression. 


Searching for Text:
------------------- 

  /text   Search forward (down) for text (text can include spaces
                      and characters with special meanings.)
  
  ?text   Search backward (up) for text
  
  n       Repeat last search in the same direction
  
  N       Repeat last search in the opposite direction
  
  fchar   Search forward for a character on current line
  
  Fchar   Search backward for a character on current line
  
  ;       Repeat last character search in the same direction
  
  %       Find matching ( ), { }, or [ ] 


Some other commands:
--------------------

Moving half a screen up or down: Ctrl-U, Ctrl-D

Showing line numbers:

:set number

If you tire of the line numbers, enter the following command to turn them off:

:set nonumber





36. ulimit:
===========

limit, ulimit, unlimit - set or get limitations on the system resources available to the current shell and its 
descendants.

/usr/bin/ulimit
     Example 1:  Limiting the stack size

     To limit the stack size to 512 kilobytes:

     example% ulimit -s 512
     example% ulimit -a
     time(seconds)         unlimited
     file(blocks)            100
     data(kbytes)            523256
     stack(kbytes)           512
     coredump(blocks)        200
     nofiles(descriptors)    64
     memory(kbytes)          unlimited


ULIMIT - Sets the file size limit for the login. Units are disk blocks. Default is zero (no limit). 
Be sure to specify even numbers, as the ULIMIT variable accepts a number of 512-byte blocks.


$ ulimit -a    # Display limits for your session under sh or ksh
$ limit        # Display limits for your session under csh or tcsh
$ ulimit -c SIZE_IN_BLOCKS       # Limit core size under sh or ksh
$ limit coredumpsize SIZE_IN_KB  # Limit core size under csh or tcsh

If you see a core file lying around, just type "file core" to get some details about it. Example: 
$ file core
  core:ELF-64 core file - PA-RISC 2.0 from 'sqlplus' - received SIGABRT


Run the Unix process debugger to obtain more information about where and why the process abended. 
This information is normally requested by Oracle Support for in-depth analysis of the problem. Some examples: 

      Solaris:
          $ gdb $ORACLE_HOME/bin/sqlplus core
            bt                 # backtrace of all stack frames
            quit

      HP-UX, Solaris, etc:
          $ adb $ORACLE_HOME/bin/sqlplus core
            $c
            $q

      Sequent:
          $ debug -c core $ORACLE_HOME/bin/sqlplus
          debug> stack
          debug> quit

AIX:


Purpose
Sets or reports user resource limits.

Syntax
ulimit [ -H ] [ -S ] [ -a ] [ -c ] [ -d ] [  -f ] [ -m ] [ -n ] [ -s ] [ -t ] [ Limit ]

Description
The ulimit command sets or reports user process resource limits, as defined in the /etc/security/limits file. 
This file contains these default limits:


fsize = 2097151
core = 2097151
cpu = -1
data = 262144
rss = 65536
stack = 65536
nofiles = 2000

These values are used as default settings when a new user is added to the system. The values are set with the 
mkuser command when the user is added to the system, or changed with the chuser command.

Limits are categorized as either soft or hard. With the ulimit command, you can change your soft limits, 
up to the maximum set by the hard limits. You must have root user authority to change resource hard limits.

Many systems do not contain one or more of these limits. The limit for a specified resource is set when the 
Limit parameter is specified. The value of the Limit parameter can be a number in the unit specified with 
each resource, or the value unlimited. To set the specific ulimit to unlimited, use the word unlimited.


Note: Setting the default limits in the /etc/security/limits file sets system wide limits, not just limits 
taken on by a user when that user is created.

The current resource limit is printed when you omit the Limit parameter. The soft limit is printed unless 
you specify the -H flag. When you specify more than one resource, the limit name and unit is printed 
before the value. If no option is given, the -f flag is assumed.

Since the ulimit command affects the current shell environment, it is provided as a shell regular built-in command. 
If this command is called in a separate command execution environment, it does not affect the file size limit of 
the caller's environment. This would be the case in the following examples:


nohup ulimit -f 10000
env ulimit 10000

Once a hard limit has been decreased by a process, it cannot be increased without root privilege, even to revert 
to the original limit.
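
The effect of a separate execution environment can be seen with a command substitution, which runs in a subshell. A sketch, assuming a POSIX-style shell (sh, ksh, bash) in which ulimit is a built-in that accepts -c; the core-file limit is used here only because lowering it is harmless:

```shell
# Soft core-file limit of the current shell before the experiment.
before=$(ulimit -S -c)

# Lower the soft limit inside a subshell only. The $( ... ) creates a
# separate execution environment, much like "env ulimit" above.
inside=$(ulimit -S -c 0; ulimit -S -c)

# Back in the calling shell the limit is unchanged.
after=$(ulimit -S -c)

echo "before=$before inside=$inside after=$after"
```

Only the soft limit is touched, so the experiment does not run into the rule that a lowered hard limit cannot be raised again.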

For more information about user and system resource limits, refer to the getrlimit, setrlimit, or vlimit 
subroutine in AIX 5L Version 5.2 Technical Reference: Base Operating System and Extensions Volume 1.

Flags

-a Lists all of the current resource limits. 
-c Specifies the size of core dumps, in number of 512-byte blocks. 
-d Specifies the size of the data area, in number of K bytes. 
-f Sets the file size limit in blocks when the Limit parameter is used, or reports the file size limit if no parameter is specified. The -f flag is the default. 
-H Specifies that the hard limit for the given resource is set. If you have root user authority, you can increase the hard limit. Anyone can decrease it. 
-m Specifies the size of physical memory, in number of K bytes. 
-n Specifies the limit on the number of file descriptors a process may have. 
-s Specifies the stack size, in number of K bytes. 
-S Specifies that the soft limit for the given resource is set. A soft limit can be increased up to the value of the hard limit. If neither the -H nor -S flags are specified, the limit applies to both. 
-t Specifies the number of seconds to be used by each process. 


You can check the current ulimit settings using the ulimit -a command, and at least the following 
three commands should be run, as the user account that will launch Java: 

ulimit -m unlimited

ulimit -d unlimited

ulimit -f unlimited 




=====================================
37. RAM disks:
=====================================


37.1 AIX:
=========

Example:
--------

# mkramdisk SIZE
/dev/rramdiskxx
# mkfs -V jfs /dev/ramdiskxx
# mount -V jfs -o nointegrity /dev/ramdiskxx /whatever_mountpoint


mkramdisk Command:
------------------
Purpose
Creates a RAM disk using a portion of RAM that is accessed through normal reads and writes.

Syntax
mkramdisk [ -u ] size[ M | G ]

Description
The mkramdisk command is shipped as part of bos.rte.filesystems, which allows the user to create a RAM disk. 
Upon successful execution of the mkramdisk command, a new RAM disk is created, a new entry added to /dev, 
the name of the new RAM disk is written to standard output, and the command exits with a value of 0. 
If the creation of the RAM disk fails, the command prints an internationalized error message, and the command 
will exit with a nonzero value.

The size can be specified in terms of MB or GB. By default, it is in 512 byte blocks. A suffix of M will be used 
to specify size in megabytes and G to specify size in gigabytes.

The names of the RAM disks are in the form of /dev/rramdiskx where x is the logical RAM disk number (0 through 63).

The mkramdisk command also creates block special device entries (for example, /dev/ramdisk5) although use 
of the block device interface is discouraged because it adds overhead. The device special files in /dev are owned 
by root with a mode of 600. However, the mode, owner, and group ID can be changed using normal system commands.

Up to 64 RAM disks can be created. 

Note:
The size of a RAM disk cannot be changed after it is created.
The mkramdisk command is responsible for generating a major number, loading the ram disk kernel extension, 
configuring the kernel extension, creating a ram disk, and creating the device special files in /dev. 
Once the device special files are created, they can be used just like any other device special files through 
normal open, read, write, and close system calls.

RAM disks can be removed by using the rmramdisk command. RAM disks are also removed when the machine is rebooted.

By default, RAM disk pages are pinned. Use the -u flag to create RAM disk pages that are not pinned.

Flags
-u Specifies that the ram disk that is created will not be pinned. By default, the ram disk will be pinned. 

Parameters

size Indicates the amount of RAM (in 512 byte increments) to use for the new RAM disk. For example, typing: 

# mkramdisk 1

creates a RAM disk that uses 512 bytes of RAM. 

To create a RAM disk that uses approximately 20 MB of RAM, type: 

# mkramdisk 40000 
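
Because a plain size argument counts 512-byte blocks, it helps to convert in both directions. A sketch using shell arithmetic; the function names are made up for illustration, and 1 MB is taken as 1048576 bytes:

```shell
# blocks_to_bytes N: size in bytes of N 512-byte blocks
blocks_to_bytes() { echo $(( $1 * 512 )); }

# mb_to_blocks N: number of 512-byte blocks in N MB (1 MB = 1048576 bytes)
mb_to_blocks() { echo $(( $1 * 1048576 / 512 )); }

blocks_to_bytes 40000    # prints 20480000, roughly 20 MB
mb_to_blocks 500         # prints 1024000
```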

Exit Status
The following exit values are returned:

0 Successful completion. 
>0 An error occurred. 

Examples:

To create a new RAM disk of 1048576 blocks using the default 512-byte block size, which is 512 MB (1048576 * 512 bytes), enter: 

# mkramdisk 1048576 
/dev/rramdisk0

The /dev/rramdisk0 ramdisk is created.

To create a new ramdisk with a size of 500 Megabytes, enter: 

# mkramdisk 500M 
/dev/rramdisk0

The /dev/rramdisk0 ramdisk is created. Note that 500M is slightly less than the 1048576 blocks (512 MB) requested in the first example.

To create a new ram disk with a 2-Gigabyte size, enter: 

# mkramdisk 2G 
/dev/rramdisk0

To set up a RAM disk that is approximately 20 MB in size and create a JFS file system on that RAM disk, 
enter the following: 

# mkramdisk 40000
# ls -l /dev | grep ram
# mkfs -V jfs /dev/ramdiskx
# mkdir /ramdiskx
# mount -V jfs -o nointegrity /dev/ramdiskx /ramdiskx

where x is the logical RAM disk number. 

Note:
If you put a file system on a RAM disk, the RAM disk must be pinned.


37.2 Linux:
===========

Redhat:

Using a ramdisk is easy. The default installation of RedHat 6.0 and later comes with ramdisk support.
All you have to do is format a ramdisk and then mount it on a directory. To list the ramdisks you
have available, run "ls -al /dev/ram*". This shows the preset ramdisk devices.
These ramdisks don't actually grab memory until you use them somehow (for example, by formatting them).
Here is a very simple example of how to use a ramdisk. 

# create a mount point:
mkdir /tmp/ramdisk0
# create a filesystem:
mke2fs /dev/ram0
# mount the ramdisk:
mount /dev/ram0 /tmp/ramdisk0


Those three commands make a directory for the ramdisk, format the ramdisk (create a filesystem),
and mount the ramdisk on the directory "/tmp/ramdisk0". Now you can treat that directory as a pretend partition!
Go ahead and use it like any other directory or partition.
If formatting the ramdisk failed, your kernel may have been built without ramdisk support.
The kernel configuration option for ramdisk is CONFIG_BLK_DEV_RAM.
The default size of a ramdisk is 4 MB (4096 blocks of 1 KB). mke2fs reports the size it found;
"mke2fs /dev/ram0" should have produced a message like this: 

mke2fs 1.14, 9-Jan-1999 for EXT2 FS 0.5b, 95/08/09
Linux ext2 filesystem format
Filesystem label=
1024 inodes, 4096 blocks
204 blocks (4.98%) reserved for the super user
First data block=1
Block size=1024 (log=0)
Fragment size=1024 (log=0)
1 block group
8192 blocks per group, 8192 fragments per group
1024 inodes per group

Running df -k /dev/ram0 tells you how much of that you can really use (the filesystem itself also takes some space): 

>df -k /dev/ram0
Filesystem  1k-blocks  Used Available Use% Mounted on
/dev/ram0        3963    13      3746   0% /tmp/ramdisk0

What are some catches? Well, when the computer reboots, it gets wiped. Don't put any data there that isn't 
copied somewhere else. If you make changes to that directory, and you need to keep the changes, figure out 
some way to back them up.     

- Changing the size of the ramdisks
To use a ramdisk you need ramdisk support either compiled into the kernel or built as a loadable module.
The kernel configuration option is CONFIG_BLK_DEV_RAM. Building ramdisk support as a loadable module
has the advantage that you can decide at load time what the size of your ramdisks should be.

Okay, first the hard way. Add this line to your lilo.conf file: 

   ramdisk_size=10000 (or ramdisk=10000 for old kernels) 

and it will make the default ramdisks 10 megs after you type the "lilo" command and reboot the computer. 
Here is an example of my /etc/lilo.conf file. 

boot=/dev/hda
map=/boot/map
install=/boot/boot.b
prompt
timeout=50
image=/boot/vmlinuz
	label=linux
	root=/dev/hda2
	read-only
	ramdisk_size=10000

Actually, I got a little over 9 MB of usable space, because the filesystem itself also takes some space.
If you build ramdisk support as a loadable module, you can decide at load time what the size should be.
This is done either with an options line in the /etc/conf.modules file: 

options rd rd_size=10000

or as a command line parameter to insmod: 

insmod rd rd_size=10000

Here is an example which shows how to use the module: 
1. Unmount the ramdisk mounted in the previous example:  umount /tmp/ramdisk0
2. Unload the module (it was loaded automatically in the previous example):  rmmod rd
3. Load the ramdisk module and set the size to 20 MB:  insmod rd rd_size=20000
4. Create a file system:  mke2fs /dev/ram0
5. Mount the ramdisk:  mount /dev/ram0 /tmp/ramdisk0
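
The module reload sequence above could be wrapped in a small script. This is only a sketch: the variable
names (DRY_RUN, RD_SIZE_KB, MOUNT_POINT) and the run helper are invented here, and by default the script
only prints the commands instead of executing them, so it can be inspected safely first.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands by default.
# Set DRY_RUN=0 to really execute them (requires root and a modular "rd" driver).
DRY_RUN=${DRY_RUN:-1}
RD_SIZE_KB=${RD_SIZE_KB:-20000}           # ramdisk size in 1 KB blocks
MOUNT_POINT=${MOUNT_POINT:-/tmp/ramdisk0}

# Print the command in dry-run mode, otherwise run it and stop on failure.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@" || exit 1
    fi
}

resize_ramdisk() {
    run umount "$MOUNT_POINT"             # release the old ramdisk
    run rmmod rd                          # unload the ramdisk module
    run insmod rd "rd_size=$RD_SIZE_KB"   # reload it with the new size
    run mke2fs /dev/ram0                  # create a fresh filesystem
    run mount /dev/ram0 "$MOUNT_POINT"    # mount it again
}

resize_ramdisk
```

Remember that everything stored on the ramdisk is lost as soon as it is unmounted and the module is unloaded.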
  
- Example of how to use a RamDisk for a webserver.
Okay, here is an example of how to use 3 ramdisks for a webserver. Let us say you are 99% confident that your
default installation of Apache for RedHat 6.0 won't use more than 9 MB for its cgi-scripts, html, and icons.
Here is how to set it up.
First, move the real copy of the document root directory of your webserver to a different place,
and make the directories on which to mount the ramdisks:
mv /home/httpd/ /home/httpd_real
mkdir /home/httpd
mkdir /home/httpd/cgi-bin
mkdir /home/httpd/html
mkdir /home/httpd/icons

Then, add these commands to the start procedure in your /etc/rc.d/init.d/httpd.init 
(or wherever the httpd gets started on your system): 

### Make the ramdisk partitions
/sbin/mkfs -t ext2 /dev/ram0
/sbin/mkfs -t ext2 /dev/ram1
/sbin/mkfs -t ext2 /dev/ram2

### Mount the ramdisks to their appropriate places
mount /dev/ram0 /home/httpd/cgi-bin
mount /dev/ram1 /home/httpd/icons
mount /dev/ram2 /home/httpd/html

### Copy the real directory to the ramdisks (the
### data on the ramdisks is lost after a reboot)
tar -C /home/httpd_real -cf - . | tar -C /home/httpd -xf -

### After this you can start the web-server.
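
The tar-to-tar copy used to populate the ramdisks can be tried out safely on throwaway directories before
it goes into an init script. The sketch below uses temporary directories as hypothetical stand-ins for
/home/httpd_real and /home/httpd, and passes "-f -" so that tar uses stdout and stdin explicitly:

```shell
#!/bin/sh
# Demonstrates the tar pipe on temporary directories; nothing under /home is touched.
src=$(mktemp -d) || exit 1     # stand-in for /home/httpd_real
dst=$(mktemp -d) || exit 1     # stand-in for /home/httpd

mkdir -p "$src/html" "$src/cgi-bin" "$src/icons"
echo "<html>hello</html>" > "$src/html/index.html"

# Archive the source tree to stdout and unpack it from stdin in the destination.
tar -C "$src" -cf - . | tar -C "$dst" -xf -

# Check the copy: read one file back and compare both trees recursively.
copied=$(cat "$dst/html/index.html")
result=$(diff -r "$src" "$dst" && echo "copy verified")
echo "$copied"
echo "$result"

rm -rf "$src" "$dst"
```

Unlike "cp -r", the tar pipe also carries over ownership (when run as root), permissions and symbolic links.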

  

37.3 Solaris:
=============

Note 1:
-------

Solaris 9 and higher: use the ramdiskadm command:

Quick example:

Example: Creating a 2MB Ramdisk Named mydisk 

# ramdiskadm -a mydisk 2m
/dev/ramdisk/mydisk 

Example: Listing All Ramdisks

# ramdiskadm
Block Device                   Size  Removable
/dev/ramdisk/miniroot     134217728    No
/dev/ramdisk/certfs         1048576    No
/dev/ramdisk/mydisk         2097152    Yes 


-- The ramdiskadm command:


NAME
ramdiskadm - administer ramdisk pseudo device
SYNOPSIS
/usr/sbin/ramdiskadm -a name size [g | m | k | b]
/usr/sbin/ramdiskadm -d name 
/usr/sbin/ramdiskadm 

DESCRIPTION
The ramdiskadm command administers ramdisk(7D), the ramdisk driver. Use ramdiskadm to create a new named 
ramdisk device, delete an existing named ramdisk, or list information about existing ramdisks.

Ramdisks created using ramdiskadm are not persistent across reboots.

OPTIONS
The following options are supported:

-a name size 
Create a ramdisk named name of size size and its corresponding block and character device nodes.

name must be composed only of the characters a-z, A-Z, 0-9, _ (underbar), and - (hyphen), but it must not 
begin with a hyphen. It must be no more than 32 characters long. Ramdisk names must be unique.

The size can be a decimal number, or, when prefixed with 0x, a hexadecimal number, and can specify the size 
in bytes (no suffix), 512-byte blocks (suffix b), kilobytes (suffix k), megabytes (suffix m) 
or gigabytes (suffix g). The size of the ramdisk actually created might be larger than that specified, 
depending on the hardware implementation.

If the named ramdisk is successfully created, its block device path is printed on standard out.

-d name 
Delete an existing ramdisk of the name name. This command succeeds only when the named ramdisk is not open. 
The associated memory is freed and the device nodes are removed.

You can delete only ramdisks created using ramdiskadm. It is not possible to delete a ramdisk that was created 
during the boot process.

Without options, ramdiskadm lists any existing ramdisks, their sizes (in decimal), and whether they can be removed 
by ramdiskadm (see the description of the -d option, above).
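
The size suffixes described for the -a option can be checked with a little shell arithmetic. This is a
sketch only: size_to_bytes is a made-up helper, not part of ramdiskadm, and it handles decimal numbers
with the lowercase suffixes g, m, k and b but not the 0x hexadecimal form.

```shell
#!/bin/sh
# Convert a ramdiskadm-style size (decimal number plus optional g/m/k/b suffix) to bytes.
size_to_bytes() {
    spec=$1
    case $spec in
        *g) num=${spec%g}; mult=1073741824 ;;   # gigabytes
        *m) num=${spec%m}; mult=1048576 ;;      # megabytes
        *k) num=${spec%k}; mult=1024 ;;         # kilobytes
        *b) num=${spec%b}; mult=512 ;;          # 512-byte blocks
        *)  num=$spec;     mult=1 ;;            # plain bytes
    esac
    # Reject anything that is not a plain decimal number.
    case $num in
        ''|*[!0-9]*) echo "invalid size: $spec" >&2; return 1 ;;
    esac
    echo $(( num * mult ))
}

size_to_bytes 2m       # 2097152, the size listed for mydisk above
size_to_bytes 128m     # 134217728, the size listed for miniroot above
size_to_bytes 4b       # 2048
```

The results for 2m and 128m match the byte counts that ramdiskadm prints in its listing.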


Note 2:
-------

thread:

In Solaris 8 and earlier, it's a bit of a pain.

This is what I asked:

Is there anyone who could tell me how to make a ram disk in Solaris 8?

I have a Sun Sparc Box running Solaris 8, and I want to use some of
its memory to mount a new file system

Thanks in advance,

The solution:

As many mentioned, I could use tmpfs, like this:

mkdir /ramdisk
mount -F tmpfs -o size=500m swap /ramdisk

However, this is not a true ramdisk (it really uses VM, not RAM, and the size
is an upper limit, not a reservation). This is what Solaris provides.




======================
38. Software Packages:
======================


38.1 Software Packages on Solaris:
==================================

This section deals with software packages for Solaris. A software package is a collection of files
and directories in a defined format. It delivers a piece of software, such as the manual pages or
line printer support. Solaris 8 has about 80 packages that total about 900MB.

A Solaris software package is the standard way to deliver bundled and unbundled software.
Packages are administered by using the package administration commands, and are generally
identified by a SUNWxxx naming convention.

Software packages are grouped into software clusters, which are logical collections of
software packages. Some clusters contain just 1 or 2 packages, while others contain many more.

Installing Software Packages:
-----------------------------

Solaris provides the tools for adding and removing software from a system.
You can use pkgadd command to install packages, and the pkgrm command to remove packages.
There are also GUI tools to install and remove packages.

Package files are delivered in package format and are unusable as they are delivered. The pkgadd command interprets the software package's 
control files, and then uncompresses and installs the product files onto the system's local disk.

Although the pkgadd and pkgrm commands do not log their output to a standard location, they do keep track of the product 
that is installed or removed. The pkgadd and pkgrm commands store information about a package that has been installed 
or removed in a software product database.

By updating this database, the pkgadd and pkgrm commands keep a record of all software products installed on the system.


-- pkgadd:
-- -------

pkgadd [-nv] [-a admin] [-d device] [[-M]-R root_path] [-r response] [-V fs_file] [pkginst...]
pkgadd -s spool [-d device] [pkginst...]


     -a admin
           Define an installation administration file, admin,  to
           be  used  in place of the default administration file.
           The token none overrides the use of  any  admin  file,
           and  thus  forces interaction with the user.  Unless a
           full path name is given, pkgadd  first  looks  in  the
           current working directory for the administration file.
           If the specified administration file  is  not  in  the
           current   working   directory,  pkgadd  looks  in  the
           /var/sadm/install/admin directory for the  administra-
           tion file.

     -d device
           Install or copy a package from device. device can be a
           full  path  name to a directory or the identifiers for
           tape, floppy disk, or  removable  disk  (for  example,
           /var/tmp  or   /floppy/floppy_name ). It can also be a
           device alias (for example, /floppy/floppy0).


pkgadd transfers the contents of a software package from the distribution medium or directory to install 
it onto the system. Used without the -d option, pkgadd looks in the default spool directory for 
the package (/var/spool/pkg). Used with the -s option, it writes the package to a spool directory 
instead of installing it.

In general you would run pkgadd as follows:

# pkgadd -a admin-file -d device-name pkgid

Or just

# pkgadd -d device-name pkgid


-a admin-file 
 (Optional) Specifies an administration file that the pkgadd command should consult during the installation. 
-d device-name 
 Specifies the absolute path to the software packages. device-name can be the path to a device, a directory, or a spool directory. 
 If you do not specify the path where the package resides, the pkgadd command checks the default spool directory (/var/spool/pkg). 
 If the package is not there, the package installation fails. 
pkgid 
 (Optional) Is the name of one or more packages (separated by spaces) to be installed. 
 If omitted, the pkgadd command installs all available packages.
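
For unattended installs, the administration file passed with -a is often set up so that pkgadd never
stops to ask. The sketch below writes such a file to a temporary location and prints the pkgadd command
that would use it; the file name and the spool directory and package in the printed command are only
examples. Note that "nocheck" switches off the corresponding safety checks, so use it with care.

```shell
#!/bin/sh
# Write a "never ask" administration file; the location is illustrative.
admin_file=${TMPDIR:-/tmp}/noask.admin.$$

cat > "$admin_file" <<'EOF'
mail=
instance=overwrite
partial=nocheck
runlevel=nocheck
idepend=nocheck
rdepend=nocheck
space=nocheck
setuid=nocheck
conflict=nocheck
action=nocheck
basedir=default
EOF

# Show the command that would use the file; nothing is installed by this sketch.
echo "pkgadd -n -a $admin_file -d /var/spool/pkg SUNWpl5u"
```

The -n flag puts pkgadd in non-interactive mode; the installation stops if a question would still have to be asked.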
   

After installing a package, verify the install with 

# pkgchk -v pkgid


Example 1:

The following example shows how to install the SUNWpl5u package from a mounted Solaris 9 CD. 
The example also shows how to verify that the package files were installed properly. 

# pkgadd -d /cdrom/cdrom0/s0/Solaris_9/Product SUNWpl5u
	.
Installation of <SUNWpl5u> was successful.
# pkgchk -v SUNWpl5u
/usr
/usr/bin
/usr/bin/perl
/usr/perl5
/usr/perl5/5.00503
 

Example 2:
# pkgadd -d /cdrom/cdrom0/s0/Solaris_2.6

Example 3:
# pkgadd -d /tmp/signed_pppd
The following packages are available:
  1  SUNWpppd     Solaris PPP Device Drivers
                  (sparc) 11.10.0,REV=2003.05.08.12.24

Select package(s) you wish to process (or 'all' to process
all packages). (default: all) [?,??,q]: all
Enter keystore password:

Example 4:
# pkgadd -d http://install/signed-video.pkg

## Downloading...
..............25%..............50%..............75%..............100%
## Download Complete

Example 5:
# pkgadd -d . DISsci

The command will create a new directory structure in /opt/DISsci.

Example 6:
Spooling the packages to a spool directory

# pkgadd -d /cdrom/sol_8_sparc/s0/Solaris_8/Product -s /var/spool/pkg SUNWaudio


Example 7:

Installing Software Packages From a Remote Package Server
If the packages you want to install are available from a remote system, you can manually mount the directory that contains the packages 
(in package format) and install packages on the local system.

The following example shows how to install software packages from a remote system. In this example, assume that the remote system 
named package-server has software packages in the /latest-packages directory. The mount command mounts the packages locally on /mnt, 
and the pkgadd command installs the SUNWpl5u package. 

# mount -F nfs -o ro package-server:/latest-packages /mnt
# pkgadd -d /mnt SUNWpl5u
	.
Installation of <SUNWpl5u> was successful. 



Other package related commands:
-------------------------------

- pkgrm
- pkgchk
- pkginfo
- pkgask
- pkgparam

pkgparam displays package parameter values:
# pkgparam -d /cdrom/cdrom0/s0/Solaris_2.8/Product SUNWvolr SUNW_PKGTYPE
The system responds with the value of the SUNW_PKGTYPE parameter, which indicates where the application will be stored.


Using a Response File:
----------------------

A response file contains your answers to specific questions that are asked by an interactive package. An interactive package includes 
a request script that asks you questions prior to package installation, such as whether or not optional pieces of the package should be installed.

If prior to installation, you know that the package you want to install is an interactive package, and you want to store 
your answers to prevent user interaction during future installations of this package, you can use the pkgask command to save your response. 

Once you have stored your responses to the questions asked by the request script, you can use the pkgadd -r command 
to install the package without user interaction.


-- pkginfo
-- -------

# pkginfo
system      SUNWaccr       System Accounting, (Root)
system      SUNWaccu       System Accounting, (Usr)
system      SUNWadmap      System administration applications
system      SUNWadmc       System administration core libraries
.
.
etc..

Example: Displaying Detailed Information About Software Packages


# pkginfo -l SUNWcar

   PKGINST:  SUNWcar
      NAME:  Core Architecture, (Root)
  CATEGORY:  system
      ARCH:  sparc.sun4u
   VERSION:  11.9.0,REV=2001.10.16.17.05
   BASEDIR:  /
    VENDOR:  Sun Microsystems, Inc.
      DESC:  core software for a specific hardware platform group
    PSTAMP:  crash20011016171723
  INSTDATE:  Nov 02 2001 08:53
   HOTLINE:  Please contact your local service provider
    STATUS:  completely installed
     FILES:    111 installed pathnames
                36 shared pathnames
                40 directories
                56 executables
             17626 blocks used (approx) 


# pkginfo -d /export/host1/packages -l SUNWman

For the spool directory, you may use the token spool.


-- pkgrm:
-- ------

Always use the pkgrm command to remove installed packages. Do not use the rm command, which will corrupt 
the system's record-keeping of installed packages. 

Examples:

# pkgrm pkgid ... 

pkgid identifies the name of one or more packages (separated by spaces) to be removed. If omitted, pkgrm removes all available packages.


# pkgrm SUNWctu

The following package is currently installed:
   SUNWctu         Netra ct usr/platform links (64-bit)
                   (sparc.sun4u) 11.9.0,REV=2001.07.24.15.53

Do you want to remove this package? y

## Removing installed package instance <SUNWctu>
## Verifying package dependencies.
## Processing package information.
## Removing pathnames in class <none>

This example shows how to remove a spooled package.

# pkgrm -s /export/pkg SUNWdmfex.u
The following package is currently spooled:
   SUNWdmfex.u           Sun Davicom 10/100Mb Ethernet Driver (64-bit)
                         (sparc.sun4u) 11.9.0,REV=2001.07.24.15.53

Do you want to remove this package? y

Removing spooled package instance <SUNWdmfex.u> 


Some Graphical tools for installing packages:
---------------------------------------------

>>> admintool (Solaris 8,9 Not in Solaris 10)

>>> Solaris Product Registry

The Solaris Product Registry is a GUI tool that enables you to install and uninstall software packages.

To start the Solaris Product Registry to view, install or uninstall software, use the command
/usr/bin/prodreg

>>> Solaris Management Console (smc) Patch Manager

The Solaris Management Console provides a new Patches Tool for managing patches. You can only use the Patches Tool 
to add patches to a system running the Solaris 9 or later release.





Installing Patches:
-------------------

#patchadd
#patchrm

patchadd [-d] [-u] [-B backout_dir] [-C net_install_image| -R client_root_path| -S service] patch 
patchadd [-d] [-u] [-B backout_dir] [-C net_install_image| -R client_root_path| -S service] -M patch_dir| patch_id...
         | patch_dir patch_list 
patchadd [-C net_install_image| -R client_root_path| -S service] -p 

Examples:

Example 1:
Show the patches on your system:
# showrev -p    shows all patches applied to a system
# patchadd -p   same as above
# pkgparam <pkgid> PATCHLIST  shows all patches applied to the package identified by <pkgid>

Example 2:
# patchadd /var/spool/patch/104945-02
# patchadd -R /export/root/client1  /var/spool/patch/104945-02
# patchadd -M /var/spool/patch 104945-02  104946-02 102345-02
# patchadd -M /var/spool/patch patchlist
# patchadd -M /var/spool/patch -R /export/root/client1 -B /export/backoutrepository 104945-02 104946-02 102345-02


The /var/sadm/install/contents file:
------------------------------------

The /var/sadm/install/contents file is the file which Solaris uses to keep track of all the files 
installed on a system, and their corresponding packages.

Every file installed on a Solaris OS using the pkgadd command has an entry in the database
of installed files, /var/sadm/install/contents.
The contents file is a text file that contains one line per installed file.
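
Since the contents file is plain text, it can be searched to find out which package owns a path. The
sketch below assumes the field layout described in the contents(4) man page (packages start at field 10
for regular files, field 7 for directories, field 4 for links). The owner_of helper and the sample lines
are invented for illustration; on a real system "pkgchk -l -p /path/to/file" gives the same answer in a
supported way.

```shell
#!/bin/sh
# Print the package(s) that own a path, one per line.
# $1 = absolute path, $2 = contents file (normally /var/sadm/install/contents)
owner_of() {
    awk -v p="$1" '
        {
            split($1, a, "=")            # links are stored as path=target
            if (a[1] != p) next
            t = $2
            if (t == "f" || t == "e" || t == "v") start = 10
            else if (t == "d" || t == "x")        start = 7
            else if (t == "s" || t == "l")        start = 4
            else next
            for (i = start; i <= NF; i++) print $i
        }' "$2"
}

# Hypothetical sample entries, not taken from a real system.
sample=$(mktemp) || exit 1
cat > "$sample" <<'EOF'
/opt/DISsci d none 0755 root bin DISsci
/opt/DISsci/bin/tool f none 0555 root bin 11972 48513 1003217665 DISsci
/opt/DISsci/bin/t=./tool s none DISsci
EOF

owner_of /opt/DISsci/bin/tool "$sample"    # prints DISsci
rm -f "$sample"
```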



38.2 Software Packages on AIX:
==============================

Installing software, filesets, packages, lpp:
---------------------------------------------

Similar to Solaris, AIX5L also has a specific terminology related to installable software.
There are 4 basic package concepts in AIX5L: fileset, package, LPP, and bundle.

- Fileset: 
A fileset is the smallest individually installable unit. It's a collection of files that provide a specific
function. For example, the "bos.net.tcp.client" is a fileset in the "bos.net" package. 

- Package:
A package contains a group of filesets with a common function. It is a single installable image,
for example "bos.net".

- LPP:
This is a complete software product collection, including all the packages and filesets required.
LPP's are separately orderable products that will run on the AIX operating system, for example 
BOS, DB2, CICS, ADSM and so on.

- Bundle:
A bundle is a collection of packages, products, or individual filesets that suit a specific purpose,
for example providing a server or an application development environment.


-- AIX verifying correct installation:
# lppchk

# lppchk -v     Fileset version consistency check
# lppchk -l     File link verification


P521:/apps $lppchk -l
lppchk:  No link found from /etc/security/mkuser.sys to /usr/lib/security/mkuser.sys.
lppchk:  No link found from /etc/security/mkuser.default to /usr/lib/security/mkuser.default.




-- AIX installing maintenance levels and fixes:
1. Download the fix from the IBM website 
   http://techsupport.services.ibm.com/server/support?view=pSeries
2. Uncompress and untar the software archive
3. Type 

smitty update_all


Install a fix with instfix:
---------------------------

P521:/apps $instfix
Usage: instfix [-T [-M platform]] [-s string] [ -k keyword | -f file ]
        [-d device] [-S] [-p | [-i [-c] [-q] [-t type] [-v] [-F]]] [-a]

Function: Installs or queries filesets associated with keywords or fixes.

        -a Display the symptom text (can be combined with -i, -k, or -f).
        -c Colon-separated output for use with -i. Output includes keyword
           name, fileset name, required level, installed level, status, and
           abstract.  Status values are < (down level), = (correct level),
           + (superseded), and ! (not installed).
        -d Input device (required for all but -i and -a).
        -F Returns failure unless all filesets associated with the fix
           are installed.
        -f Input file containing keywords or fixes. Use '-' for standard input.
           The -T option produces a suitable input file format for -f.
        -i Use with -k or -f option to display whether specified fixes or
           keywords are installed.  Installation is not attempted.
           If neither -k nor -f is specified, all known fixes are displayed.
        -k Install filesets for a keyword or fix.
        -M Use with -T option to display information for fixes present
           on the media that have to do with the platform specified.
        -p Use with -k or -f to print filesets associated with keywords.
           Installation is not attempted when -p is used.
        -q Quiet option for use with -i.  If -c is specified, no heading is
           displayed.  Otherwise, no output is displayed.
        -S Suppress multi-volume processing.
        -s Search for and display fixes on media containing a specified string.
        -T Display fix information for complete fixes present on the media.
        -t Use with -i option to limit search to a given type.  Currently
           valid types are 'f' (fix) and 'p' (preventive maintenance).
        -v Verbose option for use with -i.  Gives information about each
           fileset associated with a fix or keyword.



Another option is to use the instfix command. Any fix can have a single fileset or multiple filesets that
comprise that fix. Fix information is organized in the Table of Contents (TOC) on the installation media.
After a fix is installed, fix information is kept on the system in a fix database.

instfix [ -T ] [ -s String ] [ -S ] [ -k Keyword | -f File ] [ -p ] [ -d Device ] [ -i [ -c ] [ -q ] 
        [ -t Type ] [ -v ] [ -F ] ] [ -a ]

Examples:

- If you want to install only a specific fix, use # instfix -k <keyword> -d <device>, for example
# instfix -k IX75893 -d /dev/cd0
# instfix -k IX75893 -d .
# instfix -k IY63533 -d .

- To list fixes that are on a CD-ROM in /dev/cd0, enter
# instfix -T -d /dev/cd0
IX75893

- To determine whether, for example, APAR IX75893 is installed on the system, enter
# instfix -ik IX75893
Not all filesets for IX75893 were found.

You will always be able to determine if an APAR is installed on your system using the 
command instfix -ivk APAR_NUMBER , whereas installed PTFs are not trackable. 
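
The colon-separated output of the -c flag (keyword, fileset, required level, installed level, status,
abstract) is convenient for scripting. The following sketch reports only the filesets that are down level
or not installed. The function name and the sample records are invented for illustration; on a real
system the input would come from something like "instfix -icqk IX75893".

```shell
#!/bin/sh
# Read "instfix -ic" style records on stdin and report filesets that need attention.
# Status field 5: "<" down level, "=" correct level, "+" superseded, "!" not installed.
report_not_current() {
    awk -F: '
        $5 == "<" { print $1, $2, "down level (installed " $4 ", required " $3 ")" }
        $5 == "!" { print $1, $2, "not installed (required " $3 ")" }
    '
}

# Hypothetical sample records, not real instfix output.
report_not_current <<'EOF'
IX75893:bos.net.tcp.client:4.3.2.1:4.3.2.0:<:hypothetical abstract
IX75893:bos.net.tcp.server:4.3.2.1:4.3.2.1:=:hypothetical abstract
IY63533:bos.rte.lvm:5.2.0.50::!:hypothetical abstract
EOF
```

Filesets with status "=" or "+" are skipped, because they are already at or above the required level.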

- How to determine if all filesets of a ML are installed?

P521:/apps $instfix -i | grep ML
    All filesets for 5.2.0.0_AIX_ML were found.
    All filesets for 5200-01_AIX_ML were found.
    All filesets for 5200-02_AIX_ML were found.
    All filesets for 5200-03_AIX_ML were found.
    All filesets for 5200-04_AIX_ML were found.
    All filesets for 5200-05_AIX_ML were found.
    All filesets for 5200-06_AIX_ML were found.
    All filesets for 5200-07_AIX_ML were found.
    All filesets for 5200-08_AIX_ML were found.
    All filesets for 5200-09_AIX_ML were found.


The command "instfix -i | grep ML" is essentially the same as "instfix -i -tp".

- To detect incomplete AIX maintenance levels: 
# instfix -i |grep ML
 Not all filesets for 4.3.1.0_AIX_ML were found.
 Not all filesets for 4.3.2.0_AIX_ML were found.
 All filesets for 4.3.1.0_AIX_ML were found.
 Not all filesets for 4.3.2.0_AIX_ML were found.
 Not all filesets for 4.3.3.0_AIX_ML were found.
 Not all filesets for 4330-02_AIX_ML were found.
 All filesets for 4320-02_AIX_ML were found.
 Not all filesets for 4330-03_AIX_ML were found.
..
..
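
To see only the incomplete maintenance levels, the "Not all filesets" lines can be filtered out of the
listing. A minimal sketch follows; the function name is made up and the sample lines are modeled on the
output shown above. On a real system you would run "instfix -i | incomplete_levels".

```shell
#!/bin/sh
# Print the names of maintenance levels for which not all filesets were found.
incomplete_levels() {
    awk '/^ *Not all filesets for .*_ML were found\./ { print $5 }'
}

# Sample lines modeled on "instfix -i | grep ML" output.
incomplete_levels <<'EOF'
    All filesets for 4.3.1.0_AIX_ML were found.
    Not all filesets for 4.3.2.0_AIX_ML were found.
    All filesets for 4320-02_AIX_ML were found.
    Not all filesets for 4330-03_AIX_ML were found.
EOF
```

For the sample above this prints 4.3.2.0_AIX_ML and 4330-03_AIX_ML.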

You can also use smitty:

# smitty instfix

                          Update Software by Fix (APAR)

Type or select a value for the entry field.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* INPUT device / directory for software              []                                                                   +



The lslpp command:
------------------


Purpose

Lists installed software products.

Syntax

lslpp { -d | -E | -f | -h | -i | -l | -L | -p } [ -a ] [ -c ] [ -J ] [ -q ] [ -I ]
[ -O { [ r ] [ s ] [ u ] } ] [ [ FilesetName ... | FixID ... | all ] ]

lslpp -w [ -c ] [ -q ] [ -O { [ r ] [ s ] [ u ] } ] [ FileName ... | all ]

lslpp -L -c [ -v]

lslpp -S [A|O]

lslpp -e

Description

The lslpp command displays information about installed filesets or fileset
updates. The FilesetName parameter is the name of a software product. The FixID
(also known as PTF or program temporary fix ID) parameter specifies the
identifier of an update to a formatted fileset.

When only the -l (lowercase L) flag is entered, the lslpp command displays the
latest installed level of the fileset specified for formatted filesets. The base
level fileset is displayed for formatted filesets. When the -a flag is entered
along with the -l flag, the lslpp command displays information about all
installed filesets for the FilesetName specified. The -I (uppercase i) flag
combined with the -l (lowercase L) flag specifies that the output from the lslpp
command should be limited to base level filesets.



        -a Displays additional ("all") information when combined with
           other flags.  (Not valid with -f, only valid with -B when
           combined with -h)
        -B Permits PTF ID input.  (Not valid with -L)
        -c Colon-separated output.
           (Includes all deinstallable levels of software if -Lc)
        -d Dependents (filesets for which this is a requisite).
        -E License Agreements.
        -S Lists Automatically and Optionally installed filesets.
        -e Lists all efixes on the system.
        -f Files that belong to this fileset.
        -h History information.
        -I Limits listings to base level filesets (no updates displayed).
        -i Product Identification information (requested per fileset).
        -J Use list as the output format.  (Valid with -l and -L)
        -L Lists fileset names, latest level, states, and descriptions.
           (Consolidates usr, root and share part information.)
        -l Lists fileset names, latest level, states, and descriptions.
           (Separates usr, root and share part information.)
        -O Data comes from [r] root and/or [s] share and/or [u] usr.
           (Not valid with -L)
        -p Requisites of installed filesets.
        -q Quiet (no column headers).
        -v Lists additional information from vendor database.
           (Valid with -Lc only)
        -w Lists the fileset that owns this file.

         One of the following mutually exclusive flags: d,f,h,i,L,l,p,w,E,S,e
         must be specified.
P521:/apps $


To display information about installed filesets, you can use the lslpp command.

If you need to check whether certain filesets have been installed, use the lslpp command
as in the following example:

# lslpp -h bos.adt.include bos.adt.lib bos.adt.libm \
           bos.net.ncs ifor_ls.compat ifor_ls.base

In the above example, we check whether those filesets have been installed.

lslpp options:

-l: Displays the name, level, state and description of the fileset.
-h: Displays the installation and update history for the fileset.
-p: Displays requisite information for the fileset.
-d: Displays dependent information for the fileset.
-f: Displays the filenames added to the system during installation of the fileset.
-w: Lists the fileset that owns a file or files.

Examples:

- To display the name and level of the bos.adt.include fileset, use
# lslpp -l bos.adt.include

  Fileset                      Level  State      Description
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  bos.adt.include           5.2.0.95  COMMITTED  Base Application Development
                                                 Include Files


- To display all files in the inventory database which include vmstat, use

# lslpp -w "*vmstat*"
  File                                        Fileset               Type
  ----------------------------------------------------------------------------
  /usr/sbin/lvmstat                           bos.rte.lvm           File
  /usr/share/man/info/EN_US/a_doc_lib/cmds/aixcmds6/vmstat.htm
                               infocenter.man.EN_US.commands        File
  /usr/share/man/info/EN_US/a_doc_lib/cmds/aixcmds3/lvmstat.htm
                               infocenter.man.EN_US.commands        File
  /usr/bin/vmstat                             bos.acct              File
  /usr/bin/vmstat64                           bos.acct              File
  /usr/es/sbin/cluster/OEM/VxVM40/cllsvxvmstat
                                     cluster.es.server.utils        File

The same for trying to find out what contains the make command:

# lslpp -w "*make*"

  /usr/bin/makedev                            bos.txt.tfs           File
  /usr/ccs/bin/make                           bos.adt.base          File
  /usr/bin/make                               bos.adt.base          Symlink
  /usr/bin/makekey                            bos.adt.base          Symlink
  /usr/ccs/bin/makekey                        bos.adt.base          File



- To list the installation state for the most recent level of installed filesets for all of the bos.rte filesets, use
# lslpp -l "bos.rte.*"
# lslpp -l | grep bos.rte

So, "lslpp -l" shows all of the filesets

- To display the names of the files added to the system during installation of the filesets whose names
  match *perf* (for example bos.perf.perfstat), use
# lslpp -f "*perf*"

- To check whether certain filesets have been installed, as in the following example:
# lslpp -h bos.adt.include bos.adt.lib bos.adt.libm \
           bos.net.ncs ifor_ls.compat ifor_ls.base

- To check whether you have the SDD driver on your system:
# lslpp -L "devices.sdd.*"


- To check the Java filesets on your system:
# lslpp -l | grep Java


/root:>lslpp -l | grep Java
  Java131.rte.bin           1.3.1.16  COMMITTED  Java Runtime Environment
  Java131.rte.lib           1.3.1.16  COMMITTED  Java Runtime Environment
                                                 Java-based build tool.
                                                 JavaBeans(TM) (EJB(TM)).
                                                 Javadocs
                                                 Java(TM) technology-based Web
                                                 Java(TM) technology-based Web
                                                 Javadocs
  idebug.rte.hpj             9.2.5.0  COMMITTED  High-Performance Java Runtime
  idebug.rte.jre             9.2.5.0  COMMITTED  Java Runtime Environment
  idebug.rte.olt.Java        9.2.5.0  COMMITTED  Object Level Trace Java

# lslpp -l | grep Java13_64


# lslpp -l | grep App
                                                 Application Server Dynamic
                                                 WebSphere Application Server.
                                                 for WebSphere Application
                                                 the WebSphere Application
                                                 Application Profile, and
  X11.adt.bitmaps            5.2.0.0  COMMITTED  AIXwindows Application
  X11.adt.ext               5.2.0.30  COMMITTED  AIXwindows Application
  X11.adt.imake              5.2.0.0  COMMITTED  AIXwindows Application
  X11.adt.include           5.2.0.10  COMMITTED  AIXwindows Application
  X11.adt.lib               5.2.0.40  COMMITTED  AIXwindows Application
  X11.adt.motif              5.2.0.0  COMMITTED  AIXwindows Application
  X11.apps.aixterm          5.2.0.30  COMMITTED  AIXwindows aixterm Application
  X11.apps.clients           5.2.0.0  COMMITTED  AIXwindows Client Applications
                                                 Applications
  X11.apps.msmit            5.2.0.50  COMMITTED  AIXwindows msmit Application
                                                 Configuration Applications
                                                 Applications
  X11.apps.xdm              5.2.0.40  COMMITTED  AIXwindows xdm Application
  X11.apps.xterm             5.2.0.0  COMMITTED  AIXwindows xterm Application
                             5.2.0.0  COMMITTED  AIXwindows Client Application
  X11.msg.en_US.apps.config  5.2.0.0  COMMITTED  AIXwindows Config Application
  bos.adt.base              5.2.0.50  COMMITTED  Base Application Development
  bos.adt.debug             5.2.0.50  COMMITTED  Base Application Development
  bos.adt.graphics          5.2.0.40  COMMITTED  Base Application Development
  bos.adt.include           5.2.0.53  COMMITTED  Base Application Development
  bos.adt.lib               5.2.0.50  COMMITTED  Base Application Development
  bos.adt.libm              5.2.0.50  COMMITTED  Base Application Development
  bos.adt.sccs               5.2.0.0  COMMITTED  SCCS Application Development
  bos.adt.syscalls          5.2.0.50  COMMITTED  System Calls Application
  bos.adt.utils             5.2.0.50  COMMITTED  Base Application Development
  bos.net.tcp.adt           5.2.0.40  COMMITTED  TCP/IP Application Toolkit
                                                 Application Runtime
  xlC.adt.include            6.0.0.0  COMMITTED  C Set ++ Application
  bos.adt.data               5.2.0.0  COMMITTED  Base Application Development
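As the grep output above shows, "lslpp -l | grep" also returns wrapped description lines that carry no fileset name. A minimal sketch of a filter that keeps only real fileset rows follows. The function name filesets_matching is hypothetical, and the list of state keywords is an assumption covering only a few common states:

```shell
# Hypothetical helper: read "lslpp -l"-style text on stdin and print
# name, level and state for fileset rows that match the pattern in $1.
# Wrapped description lines have no state keyword in column 3 and are skipped.
filesets_matching() {
    awk -v pat="$1" '
        # Assumption: only these state keywords are recognised here.
        $3 ~ /^(COMMITTED|APPLIED|BROKEN|OBSOLETE)$/ && $0 ~ pat {
            print $1, $2, $3
        }'
}

# Typical use on an AIX system (not run here):
#   lslpp -l | filesets_matching Java
```

Because the filter only reads standard input, it can be tried on saved output from another machine as well.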




Removing a fix:
---------------

On AIX you can use either the
installp -r             command, or the
smitty install_reject   fast path


Smitty fastpaths:
-----------------

-- AIX software maintenance:
# smitty maintain_software

From here you can commit or reject installed software. You can also copy the filesets from the installation media
to a directory on disk. The default directory for doing this is /usr/sys/inst.images  

-- Install new software:
# smitty install_update
# smitty install_latest

-- To commit software:
# smitty install_commit

-- To reject software:
# smitty install_reject

-- To remove installed and committed software:
# smitty install_remove

-- To see what fixes are installed on your system:
# smitty show_apar_stat

-- To install an individual fix:
# smitty instfix   or
# smitty update_by_fix

-- To update all installed filesets to the latest level available on the media:
# smitty update_all

-- To view already installed software:
# smitty list_installed



The AIX installp command:
-------------------------

installp Command
Purpose
Installs available software products in a compatible installation package.

Syntax
To Install with Apply Only or with Apply and Commit
installp [ -a | -ac [ -N ] ] [ -eLogFile ] [ -V Number ] [ -dDevice ] [ -b ] [ -S ] [ -B ] [ -D ] [ -I ] [ -p ] 
         [ -Q ] [ -q ] [ -v ] [ -X ] [ -F | -g ] [ -O { [ r ] [ s ] [ u ] } ] [ -tSaveDirectory ] [ -w ] [ -zBlockSize ] 
         { FilesetName [ Level ]... | -f ListFile | all }

To Commit Applied Updates
installp -c [ -eLogFile ] [ -VNumber ] [ -b ] [ -g ] [ -p ] [ -v ] [ -X ] [ -O { [ r ] [ s ] [ u ] } ] [ -w ] { FilesetName [ Level ]... | -f ListFile | all } 

To Reject Applied Updates
installp -r [ -eLogFile ] [ -VNumber ] [ -b ] [ -g ] [ -p ] [ -v ] [ -X ] [ -O { [ r ] [ s ] [ u ] } ] [ -w ] { FilesetName [ Level ]... | -f ListFile } 

To Deinstall (Remove) Installed Software
installp -u [ -eLogFile ] [ -VNumber ] [ -b ] [ -g ] [ -p ] [ -v ] [ -X ] [ -O { [ r ] [ s ] [ u ] } ] [ -w ] { FilesetName [ Level ]... | -f ListFile } 

To Clean Up a Failed Installation:
installp -C [ -b ] [ -eLogFile ] 

To List All Installable Software on Media
installp { -l | -L } [ -eLogFile ] [ -d Device ] [ -B ] [ -I ] [ -q ] [ -zBlockSize ] [ -O { [ s ] [ u ] } ] 

To List All Customer-Reported Problems Fixed with Software or Display All Supplemental Information
installp { -A|-i } [ -eLogFile ] [ -dDevice ] [ -B ] [ -I ] [ -q ] [ -z BlockSize ] [ -O { [ s ] [ u ] } ] { FilesetName [ Level ]... | -f ListFile | all } 

To List Installed Updates That Are Applied But Not Committed
installp -s [ -eLogFile ] [ -O { [ r ] [ s ] [ u ] } ] [ -w ] { FilesetName [ Level ]... | -fListFile | all }

A fileset is the smallest installable base unit. For example, bos.net.tcp.client 4.1.0.0 is a fileset. 
A fileset update is an update with a different fix ID or maintenance level. 
For example, bos.net.tcp.client 4.1.0.2 and bos.net.tcp.client 4.1.1.0 are both fileset updates 
for bos.net.tcp.client 4.1.0.0. 

When a base level (fileset) is installed on the system, it is automatically committed. You can remove a fileset 
regardless of the state (committed, broken, committed with applied updates, committed with committed updates, etc.). 

When a fileset update is applied to the system, the update is installed. The current version of that software, 
at the time of the installation, is saved in a special save directory on the disk so that later you can return 
to that version if desired. Once a new version of a software product has been applied to the system, that version 
becomes the currently active version of the software. 

Updates that have been applied to the system can be either committed or rejected at a later time. 
The installp -s command can be used to get a list of applied updates that can be committed or rejected. 

When updates are committed with the -c flag, the user is making a commitment to that version of the software product, 
and the saved files from all previous versions of the software product are removed from the system, thereby making 
it impossible to return to a previous version of the software product. 
Software can be committed at the time of installation by using the -ac flags. Note that committing already 
applied updates does not change the currently active version of a software product. 
It merely removes saved files for previous versions of the software product. 
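The apply, then commit-or-reject cycle described above can be sketched as a few wrapper functions. This is only an illustration: the function names and the DRY_RUN variable are hypothetical, and by default the commands are printed rather than executed. The installp flags are the ones from the syntax section above.

```shell
# run: print the command when DRY_RUN=1 (the default here), else execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Apply an update without committing it, so it can still be rejected.
apply_update()  { run installp -a -X -d "$1" "$2"; }

# List updates that are applied but not yet committed.
list_applied()  { run installp -s; }

# Commit an applied update (the saved files of the old version are removed).
commit_update() { run installp -c -g -X "$1"; }

# Reject an applied update (return to the saved previous version).
reject_update() { run installp -r -g -X "$1"; }
```

For example, "apply_update /usr/sys/inst.images bos.net" prints the installp command it would run. Set DRY_RUN=0 only on a system where you really want to change the installed software.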

Examples:

To install all filesets within the bos.net software package in /usr/sys/inst.images directory in the
applied state, enter

# installp -avX -d/usr/sys/inst.images bos.net

To commit all updates, enter

# installp -cgX all

To list the software that is on your CDROM, enter

# installp -L -d /dev/cd0

A record of the installp output can be found in /var/adm/sw/installp.summary:
# cat /var/adm/sw/installp.summary

Used to cleanup after a failed lpp install/update:
# installp -C 

Commits all applied LPPs or PTFs:                                  
# installp -c -g -X all    

Lists the table of contents for the install/update media and saves it into a file named /tmp/toc.list                  
# installp -q -d/dev/rmt1.1 -l > /tmp/toc.list   

Lists the lpps that have been applied but not yet committed or rejected:                                                
# installp -s    

[P521]root@ol116u106:installp -s
0503-459 installp:  No filesets were found in the Software
        Vital Product Database in the APPLIED state.
                               


The AIX geninstall command:
---------------------------

A generic installer that installs software products of various packaging formats. 
For example, installp, RPM, and ISMP.

With the geninstall command, you can list and install packages from media that contains installation images 
packaged in any of the listed formats. The geninstall and gencopy commands recognize the non-installp 
installation formats and either call the appropriate installers or copy the images, respectively.


Beginning in AIX 5L, you can not only install installp formatted packages, but also RPM and 
InstallShield Multi-Platform (ISMP) formatted packages. Use the Web-based System Manager, 
SMIT, or the geninstall command to install and uninstall these types of packages. 
The geninstall command is designed to detect the format type of a specified package and run the 
appropriate install command.

                                       
Syntax
geninstall -d Media [ -I installpFlags ] [ -E | -T ] [ -t ResponseFileLocation ] 
          [-e LogFile] [ -p ] [ -F ] [ -Y ] [ -Z ] [ -D ] { -f File | Install_List... | all }

OR

geninstall -u [-e LogFile] [ -E | -T ] [ -t ResponseFileLocation ] [ -D ] {-f File | Uninstall_List...}

OR

geninstall -L -d Media [-e LogFile] [ -D ]

Description
Accepts all current installp flags and passes them on to installp. Some flags (for example, -L) are overloaded 
to mean list all products on the media. Flags that don't make sense for ISMP packaged products are ignored. 
This allows programs (like NIM) to continue to always send in installp flags to geninstall, but only the flags 
that make sense are used.

The geninstall command provides an easy way to see what modifications have been made to the configuration files 
listed in /etc/check_config.files. When these files have been changed during a geninstall installation or update 
operation, the differences between the old and new files are recorded in the /var/adm/ras/config.diff file. 
If /etc/check_config.files requests that the old file be saved, the old file can be found in the /var/adm/config 
directory.

The /etc/check_config.files file can be edited and can be used to specify whether old configuration files 
that have been changed should be saved (indicated by s) or deleted (indicated by d), and has the following format: 

d /etc/inittab
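
To see at a glance which configuration files are marked to be saved or deleted, the file can be read with a small filter. This is a sketch that assumes the simple two-column "flag path" format shown above; the function names are hypothetical:

```shell
# List paths flagged "s" (old copy is saved) in a check_config.files-style file.
saved_config_files() {
    awk '$1 == "s" { print $2 }' "${1:-/etc/check_config.files}"
}

# List paths flagged "d" (old copy is deleted).
deleted_config_files() {
    awk '$1 == "d" { print $2 }' "${1:-/etc/check_config.files}"
}
```

Both functions default to /etc/check_config.files and accept another file name as the first argument.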

A summary of the geninstall command's install activity is kept at /var/adm/sw/geninstall.summary. 
This file contains colon-separated lists of filesets installed by installp and components installed 
by ISMP. This is used mainly to provide summary information for silent installs.

Note:
Refer to the README.ISMP file in the /usr/lpp/bos directory to learn more about ISMP-packaged installations 
and using response files.
 
Examples:

- To install all the products on a CD media that is in drive cd0, type:

# geninstall -d /dev/cd0 all

If ISMP images are present on the media, a graphical interface is presented. Any installp or RPM images 
are installed without prompting, unless the installp images are spread out over multiple CDs.


- If you are using the geninstall command to install RPM or ISMP packages, use the prefix type to designate 
to the geninstall command the type of package you are installing. In AIX 5L, the package prefix types 
are the following:

I: installp format 
R: RPM format 
J: ISMP format 

For example, to install the cdrecord RPM package and the bos.games installp package, type the following:

# geninstall -d/dev/cd0 R:cdrecord I:bos.games

The geninstall command detects that the cdrecord package is an RPM package type and runs the rpm command 
to install cdrecord. The geninstall command then detects that bos.games is an installp package type and runs 
the installp command to install bos.games. The process for uninstallation is similar to the installation process.
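
The prefix convention above can be sketched as a small helper that builds the install list for geninstall. The function names and the "type:name" input form are hypothetical; only the I:, R: and J: prefixes come from the text above:

```shell
# Map a package format name to the geninstall prefix letter.
pkg_prefix() {
    case "$1" in
        installp) echo I ;;
        rpm)      echo R ;;
        ismp)     echo J ;;
        *)        echo "unknown package type: $1" >&2; return 1 ;;
    esac
}

# install_list TYPE:NAME ... prints the prefixed list, e.g. "R:cdrecord I:bos.games".
install_list() {
    out=
    for spec in "$@"; do
        p=$(pkg_prefix "${spec%%:*}") || return 1
        out="$out${out:+ }$p:${spec#*:}"
    done
    echo "$out"
}

# Possible use on AIX (not run here):
#   geninstall -d /dev/cd0 $(install_list rpm:cdrecord installp:bos.games)
```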



    
Fixdist:
--------

There is a tool named fixdist you can use to download fixes from IBM.


Maintenance levels:
===================

Notes:

Note 1:
-------

Current maintenance levels of AIX 5L V5.2 are 5200-04, 5200-05, 5200-06 and 5200-07

04: V5.2 with the 5200-04 Recommended Maintenance Package APAR IY56722
plus APAR IY60347

05: V5.2 with the 5200-05 Recommended Maintenance Package   


Note 2: Go from 5200-00 to 5200-05:
-----------------------------------

Use this package to update to 5200-05 (ML 05) an AIX 5.2.0 system whose current ML is 5200-00 (i.e. base level) or higher.
(Note: ML 05 notably brings the fileset bos.mp.5.2.0.54) 

AIX 5200-05 maintenance package:

Recommended maintenance for AIX 5.2.0 

This package, 5200-05, updates AIX 5.2 from base level (no maintenance level) to maintenance level  05 (5200-05). 
This package is a recommended maintenance package for AIX 5.2. IBM recommends that customers install the latest 
available maintenance package for their AIX release.  
 
To determine if AIX 5200-05 is already installed on your system, run the following command:
oslevel -r  
  
General description 

This package contains code corrections for the AIX operating system and many related subsystems. 
Unless otherwise stated, this package is released 
for all languages. For additional information, refer to the Package information   
 
Download and install instructions 
 
Package                   Released Size (Bytes) Checksum 
 520005.tar.gz (See Note) 01/20/05 750,314,420  2116147779 

Additional space needed to extract the filesets 1,034,141,696 
  
Note: IBM recommends that you create a separate file system for /usr/sys/inst.images to prevent the expansion 
of the /usr file system. 
Installation steps:

1. Download the package (a tar.gz file) and put it in /usr/sys/inst.images.
2. Extract the filesets from the package:
   # cd /usr/sys/inst.images
   # gzip -d -c 520005.tar.gz | tar -xvf -
3. Back up your system.
4. Create a table of contents for the installer to use, then update the install subsystem itself,
   and run SMIT to complete the installation:
   # inutoc /usr/sys/inst.images
   # installp -acgXd /usr/sys/inst.images bos.rte.install
   # smit update_all
5. Reboot your system. This maintenance package replaces critical operating system code.
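
The space requirement and the install sequence above can be sketched as follows. This is a hedged illustration: the function names and the IMAGES and PKG variables are hypothetical, and the plan is only printed, not executed. The free-space check uses the POSIX "df -Pk" form so that the Available column is always the fourth field.

```shell
IMAGES=${IMAGES:-/usr/sys/inst.images}   # directory used in the note above
PKG=${PKG:-520005.tar.gz}                # package name used in the note above

# has_space DIR BYTES: succeed if the file system holding DIR has
# at least BYTES free.
has_space() {
    df -Pk "$1" | awk -v need="$2" '
        NR == 2 { ok = ($4 * 1024 >= need) }
        END     { exit !ok }'
}

# ml_update_plan: print the commands in the order the note gives them.
ml_update_plan() {
    echo "cd $IMAGES"
    echo "gzip -d -c $PKG | tar -xvf -"
    echo "inutoc $IMAGES"
    echo "installp -acgXd $IMAGES bos.rte.install"
    echo "smit update_all"
}

# Possible use (1034141696 is the extraction size quoted above):
#   has_space "$IMAGES" 1034141696 && ml_update_plan
```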
 
  
Installation Tips

 * You will need to be logged in as 'root' to perform the
   installation of this package.

 * Creating a system backup is recommended before starting the
   installation procedure. Refer to the mksysb command in the
   AIX 5.2 Commands Reference manual for additional information.

 * The latest AIX 5.2 installation hints and tips are available
   from the eServer Subscription Services web site at:

   https://techsupport.services.ibm.com/server/pseries.subscriptionSvcs


   These tips contain important information that should be
   reviewed before installing this update.


Installation

 To install selected updates from this package, use the command:

   smit update_by_fix

 To install all updates from this package that apply to installed
 filesets on your system, use the command:

   smit update_all

 It is highly recommended that you apply all updates from this
 package.

 After successful installation, a system reboot is required for
 this update to take effect.
 

Note 2b: Go from 5200-04 to 5200-05:
------------------------------------

AIX 5200(04)-05 maintenance package 
Recommended maintenance for AIX 5.2.0 

This package, 5200(04)-05, updates AIX 5.2 from maintenance level 04 (5200-04) to maintenance level 05 (5200-05). 
This package is a recommended maintenance package for AIX 5.2. IBM recommends that customers install the latest available 
maintenance package for their AIX release.  
 
To determine if AIX 5200-05 is already installed on your system, run the following command:
oslevel -r  
  
General description 
 
This package contains code corrections for the AIX operating system and many related subsystems. Unless otherwise stated, 
this package is released for all languages. For additional information, refer to the Package information  
  
Download and install instructions 
 
Package                   Released Size (Bytes) Checksum 
 520405.tar.gz (See Note) 01/20/05 637,751,943  3712904912 
Additional space needed to extract the filesets 856,494,080 
 
 
Note: IBM recommends that you create a separate file system for /usr/sys/inst.images to prevent 
the expansion of the /usr file system. 
Installation steps:

1. Download the package (a tar.gz file) and put it in /usr/sys/inst.images.
2. Extract the filesets from the package:
   # cd /usr/sys/inst.images
   # gzip -d -c 520405.tar.gz | tar -xvf -
3. Back up your system.
4. Create a table of contents for the installer to use, then update the install subsystem itself,
   and run SMIT to complete the installation:
   # inutoc /usr/sys/inst.images
   # installp -acgXd /usr/sys/inst.images bos.rte.install
   # smit update_all
5. Reboot your system. This maintenance package replaces critical operating system code.


Note 3: Go from 5200-05 to 5200-07:
-----------------------------------

Always run the inutoc command to ensure the installation subsystem will recognize the new fix packages 
you download. This command creates a new .toc file for the fix package. Run the inutoc command in 
the same directory where you downloaded the package filesets. For example, if you downloaded the 
filesets to /usr/sys/inst.images, run the following command: 

# inutoc /usr/sys/inst.images 

- For selected updates

To install selected updates from this package, use the following command: 

# smit update_by_fix 


- For all updates

To install all updates from this package that apply to the installed filesets on your system, 
use the following command: 

# smit update_all 

It is highly recommended that you apply all updates from this package. 

Reboot the system. A reboot is required for this update to take effect. 


--

First update the install subsystem itself (bos.rte.install), then rebuild the .toc file and update the rest:

# installp -acgYqXd /software/ML07 bos.rte.install

# inutoc /software/ML07

# smitty update_all




Note 4: About the /usr/sys/inst.images fs:
------------------------------------------

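Create a logical volume first, then create and mount a file system on it (XXX##instlv below stands for the name of that logical volume):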

# crfs -v jfs -a bf=true -dXXX##instlv -m/usr/sys/inst.images -Ayes -prw -tno -a nbpi=4096 -a ag=64 

# mount /usr/sys/inst.images 



Note 5: About the inutoc command:
---------------------------------

inutoc Command

Purpose
Creates a .toc file for directories that have backup format file install images. 
This command is used by the installp command and the install scripts.

Syntax
inutoc [ Directory ]

Description
The inutoc command creates the .toc file in Directory. If a .toc file already exists, it is recreated with new information. 
The default installation image Directory is /usr/sys/inst.images. The inutoc command adds table of contents entries 
in the .toc file for every installation image in Directory.

The installp command and the bffcreate command call this command automatically upon the creation or use 
of an installation image in a directory without a .toc file.

Examples
To create the .toc file for the /usr/sys/inst.images directory, enter: 
# inutoc

To create a .toc file for the /tmp/images directory, enter: 
# inutoc /tmp/images
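
Since a stale .toc file is a common reason for installp not seeing newly downloaded images, a quick check can be sketched as below. The function name toc_is_stale is hypothetical; the check only compares file modification times and does not look inside the .toc file:

```shell
# toc_is_stale [DIR]: succeed if DIR has no .toc file, or if any file in
# DIR is newer than the existing .toc file.
toc_is_stale() {
    dir=${1:-/usr/sys/inst.images}
    [ -f "$dir/.toc" ] || return 0
    find "$dir" -type f ! -name .toc -newer "$dir/.toc" | grep -q .
}

# Possible use on AIX (not run here):
#   toc_is_stale /usr/sys/inst.images && inutoc /usr/sys/inst.images
```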


Note 6: About the bffcreate command:
------------------------------------

bffcreate Command
Purpose
Creates installation image files in backup format. 

Syntax
bffcreate [ -q ] [ -S ] [ -U ] [ -v ] [ -X ] [ -d Device ] [ -t SaveDir ] [ -w Directory ] 
          [ -M Platform ] { [ -l | -L ] | -c [ -s LogFile ] | Package [Level ] ... | -f ListFile | all }

Description
The bffcreate command creates an installation image file in backup file format (bff) to support 
software installation operations.

The bffcreate command creates an installation image file from an installation image file 
on the specified installation media. Also, it automatically creates an installation image file from 
hypertext images (such as those on the operating system documentation CD-ROMs). The installp command 
can use the newly created installation file to install software onto the system. The file is created 
in backup format and saved to the directory specified by SaveDir. The .toc file in the directory 
specified by the SaveDir parameter is updated to include an entry for the image file.

The bffcreate command determines the bff name according to this information:

Neutral packages                 package.v.r.m.f.platform.installtype 
POWER-based platform packages    package.v.r.m.f.installtype 

Image Type                                             Target bff Name 
Installation image for the POWER-based platform        package.v.r.m.f.I 
Installation image for Neutral                         package.v.r.m.f.N.I 
3.1 update for the POWER-based platform                package.v.r.m.f.service# 
3.2 update for the POWER-based platform                package.v.r.m.f.ptf 
4.X** or later updates for the POWER-based platform    package.part.v.r.m.f.U 
Update image for Neutral                               package.v.r.m.f.N.U 
** 4.X or later updates contain one package only. In addition, AIX Version 4 and later updates do not contain ptf IDs.
 

package = the name of the software package as described by the PackageName parameter

v.r.m.f = version.release.modification.fix, the level associated with the software package. 
The PackageName is usually not the same as the fileset name.

ptf = program temporary fix ID (also known as FixID)

The installation image file name has the form Package.Level.I. The Package is the name of the software package, 
as described for the PackageName parameter. Level has the format of v.r.m.f, where v = version, r = release, 
m = modification, f = fix. The I extension means that the image is an installation image rather than an update image.

Update image files containing an AIX 3.1 formatted update have a service number extension following the level. 
The Servicenum parameter can be up to 4 digits in length. One example is xlccmp.3.1.5.0.1234.

Update image files containing an AIX 3.2 formatted update have a ptf extension following the level. 
One example is bosnet.3.2.0.0.U412345.

AIX Version 4 and later update image file names begin with the fileset name, not the PackageName. 
They also have U extensions to indicate that they are indeed update image files, not installation images. 
One example of an update image file is bos.rte.install.4.3.2.0.U.
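
The I and U extensions described above can be recognised with a short sketch like the following. The function names are hypothetical, and only the AIX Version 4 and later naming (name.v.r.m.f.I or name.v.r.m.f.U) is handled; the older AIX 3.1 and 3.2 forms are reported as "other":

```shell
# image_kind NAME: classify a bff image name by its extension.
image_kind() {
    case "$1" in
        *.I) echo install ;;
        *.U) echo update ;;
        *)   echo other ;;
    esac
}

# image_level NAME: print the v.r.m.f level embedded in the image name,
# or nothing if the name does not follow the name.v.r.m.f.I|U pattern.
image_level() {
    echo "$1" | sed -n 's/^.*\.\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\)\.[IU]$/\1/p'
}
```

For the example name bos.rte.install.4.3.2.0.U this gives the kind "update" and the level 4.3.2.0.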

The all keyword indicates that installation image files are created for every installable software package on the device.

You can extract a single update image with the AIX Version 4 and later bffcreate command. 
Then you must specify the fileset name and the v.r.m.f. parameter. As in example 3 in the Examples section, 
the PackageName parameter must be the entire fileset name, bos.net.tcp.client, not just bos.net.

Attention: Be careful when selecting the target directory for the extracted images, especially if 
that directory already contains installable images. If a fileset at a particular level exists as both 
an installation image and as an update image in the same directory, unexpected installation results can occur. 
In cases like this, installp selects the image it finds first in the table of contents (.toc) file. 
The image it selects may not be the one you intended and unexpected requisite failures can result. 
As a rule of thumb, you should extract maintenance levels to clean directories.


Examples
To create an installation image file from the bos.net software package on the tape in the /dev/rmt0 tape drive 
and use /var/tmp as the working directory, type: 
# bffcreate  -d /dev/rmt0.1 -w /var/tmp bos.net

To create an installation image file from the package software package on the diskette in the /dev/rfd0 
diskette drive and print the name of the installation image file without being prompted, type: 
# bffcreate  -q  -v package

To create a single update image file from the bos.net.tcp.client software package on the CD in /dev/cd0, type: 
# bffcreate  -d /dev/cd0 bos.net.tcp.client 4.2.2.1

To list the packages on the CD in /dev/cd0, type: 
# bffcreate  -l -d /dev/cd0

To create installation and/or update images from a CD in /dev/cd0 by specifying a list of PackageNames 
and Levels in a ListFile called MyListFile, type: 
# bffcreate  -d /dev/cd0 -f MyListFile

To create installation or update images of all software packages on the CD-ROM media for the current platform, type: 
# bffcreate -d /dev/cd0 all

To list fileset information for the bos.games software package from a particular device, type: 
# bffcreate -d /usr/sys/inst.images/bos.games -l

To list all the Neutral software packages on the CD-ROM media, type: 
# bffcreate -d /dev/cd0 -MN -l



38.3 Software Packages on Linux:
================================


38.3.1 RPM packages on Linux (1):
---------------------------------

Note 1:
-------

First we show a few simple examples:

- Examples getting software info from your system:

# rpm -q kernel
kernel-2.4.7-10
 
# rpm -q glibc
glibc-2.2.4-19.3
 
# rpm -q gcc
gcc-2.96-98

Show everything:

# rpm -qa

- Examples installing rpm packages:

# rpm -Uvh libpng-1.2.2-22.i386.rpm

# rpm -Uvh gnome-libs-1.4.1.2.90-40.i386.rpm

# rpm -Uvh oracleasm-support-2.0.0-1.i386.rpm \
    oracleasm-lib-2.0.0-1.i386.rpm \
    oracleasm-2.6.9-5.0.5-ELsmp-2.0.0-1.i686.rpm

# rpm -Uvh /mnt/cdrom/RedHat/RPMS/tripwire*.rpm

Note: 
The -U switch means Upgrade, but if no earlier version of the package is installed, a normal installation takes place.
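
The "rpm -q" answers shown earlier in this note have the form name-version-release. A minimal sketch for splitting such a string in a script follows. The function name split_nvr is hypothetical, and it expects the query output form (for example glibc-2.2.4-19.3), not a file name ending in an architecture and .rpm:

```shell
# split_nvr NAME-VERSION-RELEASE: print "name version release".
# The release is the text after the last dash, the version is the text
# between the last two dashes, and the name (which may itself contain
# dashes) is everything before that.
split_nvr() {
    nvr=$1
    release=${nvr##*-}
    rest=${nvr%-*}
    version=${rest##*-}
    name=${rest%-*}
    echo "$name $version $release"
}

# Possible use on a system with rpm installed (not run here):
#   split_nvr "$(rpm -q glibc)"
```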


Note 2:
-------

What is RPM?

RPM is the RPM Package Manager. It is an open packaging system available for anyone to use. 
It allows users to take source code for new software and package it into source and binary form 
such that binaries can be easily installed and tracked and source can be rebuilt easily. 
It also maintains a database of all packages and their files that can be used for verifying packages 
and querying for information about files and/or packages. 

Red Hat, Inc. encourages other distribution vendors to take the time to look at RPM and use it 
for their own distributions. RPM is quite flexible and easy to use, though it provides the base 
for a very extensive system. It is also completely open and available, though we would appreciate 
bug reports and fixes. Permission is granted to use and distribute RPM royalty free under the GPL. 

More complete documentation is available on RPM in the book by Ed Bailey, Maximum RPM. That book is 
available for download or purchase at www.redhat.com. 

RPM is a core component of many Linux distributions, such as Red Hat Enterprise Linux, the Fedora Project, 
SUSE Linux Enterprise, openSUSE, CentOS, Mandriva Linux, and many others. 
It is also used on many other operating systems, and the RPM format is part of the Linux Standard Base. 


Acquiring RPM
The best way to get RPM is to install Red Hat Linux. If you don't want to do that, you can still get 
and use RPM. It can be acquired from ftp.redhat.com. 

RPM Requirements
RPM itself should build on basically any Unix-like system. It has been built and used on Tru64 Unix, 
AIX, Solaris, SunOS, and basically all flavors of Linux. 

To build RPMs from source, you also need everything normally required to build a package, like gcc, make, etc. 


In its simplest form, RPM can be used to install packages: 

# rpm -i foobar-1.0-1.i386.rpm
    
The next simplest command is to uninstall a package: 

# rpm -e foobar
    
One of the more complex but highly useful commands allows you to install packages via FTP. 
If you are connected to the net and want to install a new package, all you need to do is specify 
the file with a valid URL, like so: 

# rpm -i ftp://ftp.redhat.com/pub/redhat/rh-2.0-beta/RPMS/foobar-1.0-1.i386.rpm
 

Note that RPM can both query and install packages via FTP. 

While these are simple commands, rpm can be used in a multitude of ways. To see which options are available 
in your version of RPM, type: 

# rpm --help

You can find more details on what those options do in the RPM man page, found by typing: 

# man rpm

RPM is a very useful tool and, as you can see, has several options. The best way to make sense of them 
is to look at some examples. I covered simple install/uninstall above, so here are some more examples: 

Let's say you delete some files by accident, but you aren't sure what you deleted. If you want to verify 
your entire system and see what might be missing, you would do: 

# rpm -Va 

Let's say you run across a file that you don't recognize. To find out which package owns it, you would do: 

# rpm -qf /usr/X11R6/bin/xjewel
	
The output would be something like: 

xjewel-1.6-1
	
You find a new koules RPM, but you don't know what it is. To find out some information on it, do: 

# rpm -qpi koules-1.2-2.i386.rpm

The output would be: 

Name        : koules                      Distribution: Red Hat Linux Colgate
Version     : 1.2                               Vendor: Red Hat Software
Release     : 2                             Build Date: Mon Sep 02 11:59:12 1996
Install date: (none)                        Build Host: porky.redhat.com
Group       : Games                         Source RPM: koules-1.2-2.src.rpm
Size        : 614939
Summary     : SVGAlib action game with multiplayer, network, and sound support
Description :

This arcade-style game is novel in conception and excellent in execution.
No shooting, no blood, no guts, no gore.  The play is simple, but you
still must develop skill to play.  This version uses SVGAlib to
run on a graphics console.
	
Now you want to see what files the koules RPM installs. You would do: 

# rpm -qpl koules-1.2-2.i386.rpm

The output is: 

/usr/doc/koules
/usr/doc/koules/ANNOUNCE
/usr/doc/koules/BUGS
/usr/doc/koules/COMPILE.OS2
/usr/doc/koules/COPYING
/usr/doc/koules/Card
/usr/doc/koules/ChangeLog
/usr/doc/koules/INSTALLATION
/usr/doc/koules/Icon.xpm
/usr/doc/koules/Icon2.xpm
/usr/doc/koules/Koules.FAQ
/usr/doc/koules/Koules.xpm
/usr/doc/koules/README
/usr/doc/koules/TODO
/usr/games/koules
/usr/games/koules.svga
/usr/games/koules.tcl
/usr/man/man6/koules.svga.6
	
 


Note 3:
-------

NAME
rpm - RPM Package Manager 
SYNOPSIS
QUERYING AND VERIFYING PACKAGES:


rpm {-q|--query} [select-options] [query-options] 
rpm {-V|--verify} [select-options] [verify-options] 
rpm --import PUBKEY ... 
rpm {-K|--checksig} [--nosignature] [--nodigest] 
PACKAGE_FILE ... 


INSTALLING, UPGRADING, AND REMOVING PACKAGES:
rpm {-i|--install} [install-options] PACKAGE_FILE ... 
rpm {-U|--upgrade} [install-options] PACKAGE_FILE ... 
rpm {-F|--freshen} [install-options] PACKAGE_FILE ... 
rpm {-e|--erase} [--allmatches] [--nodeps] [--noscripts] 
[--notriggers] [--repackage] [--test] PACKAGE_NAME ... 


MISCELLANEOUS:
rpm {--initdb|--rebuilddb} 
rpm {--addsign|--resign} PACKAGE_FILE ... 
rpm {--querytags|--showrc} 
rpm {--setperms|--setugids} PACKAGE_NAME ... 

select-options

[PACKAGE_NAME] [-a,--all] [-f,--file FILE] 
[-g,--group GROUP] [-p,--package PACKAGE_FILE] 
[--fileid MD5] [--hdrid SHA1] [--pkgid MD5] [--tid TID] 
[--querybynumber HDRNUM] [--triggeredby PACKAGE_NAME] 
[--whatprovides CAPABILITY] [--whatrequires CAPABILITY] 


query-options

[--changelog] [-c,--configfiles] [-d,--docfiles] [--dump] 
[--filesbypkg] [-i,--info] [--last] [-l,--list] 
[--provides] [--qf,--queryformat QUERYFMT] 
[-R,--requires] [--scripts] [-s,--state] 
[--triggers,--triggerscripts] 


verify-options

[--nodeps] [--nofiles] [--noscripts] 
[--nodigest] [--nosignature] 
[--nolinkto] [--nomd5] [--nosize] [--nouser] 
[--nogroup] [--nomtime] [--nomode] [--nordev] 


install-options

[--aid] [--allfiles] [--badreloc] [--excludepath OLDPATH] 
[--excludedocs] [--force] [-h,--hash] 
[--ignoresize] [--ignorearch] [--ignoreos] 
[--includedocs] [--justdb] [--nodeps] 
[--nodigest] [--nosignature] [--nosuggest] 
[--noorder] [--noscripts] [--notriggers] 
[--oldpackage] [--percent] [--prefix NEWPATH] 
[--relocate OLDPATH=NEWPATH] 
[--repackage] [--replacefiles] [--replacepkgs] 
[--test] 


DESCRIPTION
rpm is a powerful Package Manager, which can be used to build, install, query, verify, update, and erase 
individual software packages. A package consists of an archive of files and meta-data used to install 
and erase the archive files. The meta-data includes helper scripts, file attributes, and descriptive 
information about the package. Packages come in two varieties: binary packages, used to encapsulate 
software to be installed, and source packages, containing the source code and recipe necessary 
to produce binary packages. 

One of the following basic modes must be selected: Query, Verify, Signature Check, Install/Upgrade/Freshen, 
Uninstall, Initialize Database, Rebuild Database, Resign, Add Signature, Set Owners/Groups, Show Querytags, 
and Show Configuration. 

GENERAL OPTIONS
These options can be used in all the different modes. 

-?, --help
Print a longer usage message than normal. 
--version
Print a single line containing the version number of rpm being used. 
--quiet
Print as little as possible - normally only error messages will be displayed. 
-v
Print verbose information - normally routine progress messages will be displayed. 
-vv
Print lots of ugly debugging information. 
--rcfile FILELIST
Each of the files in the colon separated FILELIST is read sequentially by rpm for configuration information. 
Only the first file in the list must exist, and tildes will be expanded to the value of $HOME. 
The default FILELIST is /usr/lib/rpm/rpmrc:/usr/lib/rpm/redhat/rpmrc:~/.rpmrc. 
--pipe CMD
Pipes the output of rpm to the command CMD. 
--dbpath DIRECTORY
Use the database in DIRECTORY rather than the default path /var/lib/rpm. 
--root DIRECTORY
Use the file system tree rooted at DIRECTORY for all operations. Note that this means the database within 
DIRECTORY will be used for dependency checks and any scriptlet(s) (e.g. %post if installing, or %prep if building, 
a package) will be run after a chroot(2) to DIRECTORY. 

INSTALL AND UPGRADE OPTIONS
The general form of an rpm install command is 


rpm {-i|--install} [install-options] PACKAGE_FILE ... 


This installs a new package. 

The general form of an rpm upgrade command is 

rpm {-U|--upgrade} [install-options] PACKAGE_FILE ... 

This upgrades or installs the package currently installed to a newer version. This is the same as install, 
except all other version(s) of the package are removed after the new package is installed. 


rpm {-F|--freshen} [install-options] PACKAGE_FILE ... 


This will upgrade packages, but only if an earlier version currently exists. The PACKAGE_FILE may be specified 
as an ftp or http URL, in which case the package will be downloaded before being installed. See FTP/HTTP OPTIONS 
for information on rpm's internal ftp and http client support. 


--aid
Add suggested packages to the transaction set when needed. 
--allfiles
Installs or upgrades all the missingok files in the package, regardless if they exist. 
--badreloc
Used with --relocate, permit relocations on all file paths, not just those OLDPATH's included in the binary package relocation hint(s). 
--excludepath OLDPATH
Don't install files whose name begins with OLDPATH. 
--excludedocs
Don't install any files which are marked as documentation (which includes man pages and texinfo documents). 
--force
Same as using --replacepkgs, --replacefiles, and --oldpackage. 
-h, --hash
Print 50 hash marks as the package archive is unpacked. Use with -v|--verbose for a nicer display. 
--ignoresize
Don't check mount file systems for sufficient disk space before installing this package. 
--ignorearch
Allow installation or upgrading even if the architectures of the binary package and host don't match. 
--ignoreos
Allow installation or upgrading even if the operating systems of the binary package and host don't match. 
--includedocs
Install documentation files. This is the default behavior. 
--justdb
Update only the database, not the filesystem. 
--nodigest
Don't verify package or header digests when reading. 
--nosignature
Don't verify package or header signatures when reading. 
--nodeps
Don't do a dependency check before installing or upgrading a package. 
--nosuggest
Don't suggest package(s) that provide a missing dependency. 
--noorder
Don't reorder the packages for an install. The list of packages would normally be reordered to satisfy dependencies. 
--noscripts
--nopre
--nopost
--nopreun
--nopostun
Don't execute the scriptlet of the same name. The --noscripts option is equivalent to 
--nopre --nopost --nopreun --nopostun 

and turns off the execution of the corresponding %pre, %post, %preun, and %postun scriptlet(s). 

--notriggers
--notriggerin
--notriggerun
--notriggerpostun
Don't execute any trigger scriptlet of the named type. The --notriggers option is equivalent to 
--notriggerin --notriggerun --notriggerpostun 

and turns off execution of the corresponding %triggerin, %triggerun, and %triggerpostun scriptlet(s). 

--oldpackage
Allow an upgrade to replace a newer package with an older one. 
--percent
Print percentages as files are unpacked from the package archive. This is intended to make rpm easy to run from other tools. 
--prefix NEWPATH
For relocatable binary packages, translate all file paths that start with the installation prefix in the package relocation hint(s) to NEWPATH. 
--relocate OLDPATH=NEWPATH
For relocatable binary packages, translate all file paths that start with OLDPATH in the package relocation hint(s) to NEWPATH. This option can be used repeatedly if several OLDPATH's in the package are to be relocated. 
--repackage
Re-package the files before erasing. The previously installed package will be named according to the macro %_repackage_name_fmt and will be created in the directory named by the macro %_repackage_dir (default value is /var/tmp). 
--replacefiles
Install the packages even if they replace files from other, already installed, packages. 
--replacepkgs
Install the packages even if some of them are already installed on this system. 
--test
Do not install the package, simply check for and report potential conflicts. 
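
As a small illustration of the --test option, the sketch below runs a dry-run first and only upgrades when no conflicts are reported. The function name "safe_upgrade" and the RPM variable are made up for this example; they are not part of rpm.

```shell
#!/bin/sh
# RPM can be overridden (for example RPM=echo) to preview the commands
# without actually running rpm. This variable is an assumption of the sketch.
RPM=${RPM:-rpm}

# safe_upgrade PACKAGE_FILE ...
# Step 1: check for conflicts only (--test installs nothing).
# Step 2: upgrade with verbose output and hash marks if step 1 succeeded.
safe_upgrade() {
    "$RPM" -U --test "$@" || return 1
    "$RPM" -Uvh "$@"
}

# Usage, with the example package file used earlier in this document:
#   safe_upgrade koules-1.2-2.i386.rpm
```

Running the test as a separate first step keeps a failed dependency or file conflict check from leaving a half-finished set of upgrades.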
ERASE OPTIONS
The general form of an rpm erase command is 


rpm {-e|--erase} [--allmatches] [--nodeps] [--noscripts] [--notriggers] [--repackage] [--test] PACKAGE_NAME ... 


The following options may also be used: 

--allmatches
Remove all versions of the package which match PACKAGE_NAME. Normally an error is issued if PACKAGE_NAME matches multiple packages. 
--nodeps
Don't check dependencies before uninstalling the packages. 
--noscripts
--nopreun
--nopostun
Don't execute the scriptlet of the same name. The --noscripts option during package erase is equivalent to 
--nopreun --nopostun 

and turns off the execution of the corresponding %preun, and %postun scriptlet(s). 

--notriggers
--notriggerun
--notriggerpostun
Don't execute any trigger scriptlet of the named type. The --notriggers option is equivalent to 
--notriggerun --notriggerpostun 

and turns off execution of the corresponding %triggerun, and %triggerpostun scriptlet(s). 

--repackage
Re-package the files before erasing. The previously installed package will be named according to the macro %_repackage_name_fmt and will be created in the directory named by the macro %_repackage_dir (default value is /var/tmp). 
--test
Don't really uninstall anything, just go through the motions. Useful in conjunction with the -vv option for debugging. 
QUERY OPTIONS
The general form of an rpm query command is 


rpm {-q|--query} [select-options] [query-options] 


You may specify the format that package information should be printed in. To do this, you use the 


--qf|--queryformat QUERYFMT 

option, followed by the QUERYFMT format string. Query formats are modified versions of the standard printf(3) formatting. The format is made up of static strings (which may include standard C character escapes for newlines, tabs, and other special characters) and printf(3) type formatters. As rpm already knows the type to print, the type specifier must be omitted, however, and replaced by the name of the header tag to be printed, enclosed by {} characters. Tag names are case insensitive, and the leading RPMTAG_ portion of the tag name may be omitted as well. 

Alternate output formats may be requested by following the tag with :typetag. Currently, the following types are supported: 

:armor

Wrap a public key in ASCII armor. 
:base64
Encode binary data using base64. 
:date
Use strftime(3) "%c" format. 
:day
Use strftime(3) "%a %b %d %Y" format. 
:depflags
Format dependency flags. 
:fflags
Format file flags. 
:hex
Format in hexadecimal. 
:octal
Format in octal. 
:perms
Format file permissions. 
:shescape
Escape single quotes for use in a script. 
:triggertype
Display trigger suffix. 
For example, to print only the names of the packages queried, you could use %{NAME} as the format string. To print the packages name and distribution information in two columns, you could use %-30{NAME}%{DISTRIBUTION}. rpm will print a list of all of the tags it knows about when it is invoked with the --querytags argument. 
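
A minimal sketch of the query format in use. The function names and the RPM variable are assumptions of this example; NAME, VERSION, RELEASE and INSTALLTIME are tags that "rpm --querytags" should list on your system.

```shell
#!/bin/sh
RPM=${RPM:-rpm}   # overridable for testing; an assumption of this sketch

# Two columns: name left-justified in 30 characters, then version-release.
# Single quotes stop the shell from touching %{...} and the \n escape;
# rpm itself expands both.
QF='%-30{NAME} %{VERSION}-%{RELEASE}\n'

# List all installed packages in the two-column format.
list_installed() {
    "$RPM" -qa --queryformat "$QF" | sort
}

# Show when one package was installed, using the :date type tag.
pkg_install_date() {
    "$RPM" -q --queryformat '%{NAME} %{INSTALLTIME:date}\n' "$1"
}

# Usage:
#   list_installed
#   pkg_install_date koules
```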

There are two subsets of options for querying: package selection, and information selection. 

PACKAGE SELECTION OPTIONS:

PACKAGE_NAME
Query installed package named PACKAGE_NAME. 
-a, --all
Query all installed packages. 
-f, --file FILE
Query package owning FILE. 
--fileid MD5
Query package that contains a given file identifier, i.e. the MD5 digest of the file contents. 
-g, --group GROUP
Query packages with the group of GROUP. 
--hdrid SHA1
Query package that contains a given header identifier, i.e. the SHA1 digest of the immutable header region. 
-p, --package PACKAGE_FILE
Query an (uninstalled) package PACKAGE_FILE. The PACKAGE_FILE may be specified as an ftp or http style URL, in which case the package header will be downloaded and queried. See FTP/HTTP OPTIONS for information on rpm's internal ftp and http client support. The PACKAGE_FILE argument(s), if not a binary package, will be interpreted as an ASCII package manifest. Comments are permitted, starting with a '#', and each line of a package manifest file may include white space separated glob expressions, including URL's with remote glob expressions, that will be expanded to paths that are substituted in place of the package manifest as additional PACKAGE_FILE arguments to the query. 
--pkgid MD5
Query package that contains a given package identifier, i.e. the MD5 digest of the combined header and payload contents. 
--querybynumber HDRNUM
Query the HDRNUMth database entry directly; this is useful only for debugging. 
--specfile SPECFILE
Parse and query SPECFILE as if it were a package. Although not all the information (e.g. file lists) is available, this type of query permits rpm to be used to extract information from spec files without having to write a specfile parser. 
--tid TID
Query package(s) that have a given TID transaction identifier. A unix time stamp is currently used as a transaction identifier. All package(s) installed or erased within a single transaction have a common identifier. 
--triggeredby PACKAGE_NAME
Query packages that are triggered by package(s) PACKAGE_NAME. 
--whatprovides CAPABILITY
Query all packages that provide the CAPABILITY capability. 
--whatrequires CAPABILITY
Query all packages that requires CAPABILITY for proper functioning. 
PACKAGE QUERY OPTIONS:

--changelog
Display change information for the package. 
-c, --configfiles
List only configuration files (implies -l). 
-d, --docfiles
List only documentation files (implies -l). 
--dump
Dump file information as follows: 


path size mtime md5sum mode owner group isconfig isdoc rdev symlink
        

This option must be used with at least one of -l, -c, -d. 

--filesbypkg
List all the files in each selected package. 
-i, --info
Display package information, including name, version, and description. This uses the --queryformat if one was specified. 
--last
Orders the package listing by install time such that the latest packages are at the top. 
-l, --list
List files in package. 
--provides
List capabilities this package provides. 
-R, --requires
List packages on which this package depends. 
--scripts
List the package specific scriptlet(s) that are used as part of the installation and uninstallation processes. 
-s, --state
Display the states of files in the package (implies -l). The state of each file is one of normal, not installed, or replaced. 
--triggers, --triggerscripts
Display the trigger scripts, if any, which are contained in the package. 
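
The --dump output described above is easy to post-process because every file is one line of whitespace-separated fields. The sketch below prints the path and size of each configuration file (isconfig is the 8th field). The function name is made up for this example, and it assumes no path contains white space.

```shell
#!/bin/sh
# dump_config_files: read "rpm -ql --dump" output on stdin.
# Field order: path size mtime md5sum mode owner group isconfig isdoc rdev symlink
# Print "path size" for every line whose isconfig field is 1.
dump_config_files() {
    awk '$8 == 1 { printf "%s %s\n", $1, $2 }'
}

# Usage:
#   rpm -ql --dump PACKAGE_NAME | dump_config_files
```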
VERIFY OPTIONS
The general form of an rpm verify command is 


rpm {-V|--verify} [select-options] [verify-options] 


Verifying a package compares information about the installed files in the package with information about the files taken from the package metadata stored in the rpm database. Among other things, verifying compares the size, MD5 sum, permissions, type, owner and group of each file. Any discrepancies are displayed. Files that were not installed from the package, for example, documentation files excluded on installation using the "--excludedocs" option, will be silently ignored. 

The package selection options are the same as for package querying (including package manifest files as arguments). Other options unique to verify mode are: 

--nodeps
Don't verify dependencies of packages. 
--nodigest
Don't verify package or header digests when reading. 
--nofiles
Don't verify any attributes of package files. 
--noscripts
Don't execute the %verifyscript scriptlet (if any). 
--nosignature
Don't verify package or header signatures when reading. 
--nolinkto
--nomd5
--nosize
--nouser
--nogroup
--nomtime
--nomode
--nordev
Don't verify the corresponding file attribute. 
The format of the output is a string of 8 characters, a possible attribute marker: 


c %config configuration file.
d %doc documentation file.
g %ghost file (i.e. the file contents are not included in the package payload).
l %license license file.
r %readme readme file.

from the package header, followed by the file name. Each of the 8 characters denotes the result of a comparison of attribute(s) of the file to the value of those attribute(s) recorded in the database. A single "." (period) means the test passed, while a single "?" (question mark) indicates the test could not be performed (e.g. file permissions prevent reading). Otherwise, the (mnemonically emBoldened) character denotes failure of the corresponding --verify test: 


S file Size differs
M Mode differs (includes permissions and file type)
5 MD5 sum differs
D Device major/minor number mis-match
L readLink(2) path mis-match
U User ownership differs
G Group ownership differs
T mTime differs
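
To make the flag string easier to read, the following sketch translates the 8-character form described above into one message per failed test. The function name "explain_verify" is made up for this example; it only interprets the string and does not call rpm.

```shell
#!/bin/sh
# explain_verify FLAGS
# FLAGS is the 8-character result string from "rpm -V", e.g. S.5....T
explain_verify() {
    flags=$1
    case $flags in
        ????????) ;;                                   # exactly 8 characters
        *) echo "expected 8 characters, got: $flags" >&2; return 2 ;;
    esac
    rest=$flags
    while [ -n "$rest" ]; do
        c=${rest%"${rest#?}"}    # first character of $rest
        rest=${rest#?}           # drop it
        case $c in
            .)   ;;              # test passed, nothing to report
            '?') echo "test could not be performed" ;;
            S)   echo "file size differs" ;;
            M)   echo "mode differs (permissions or file type)" ;;
            5)   echo "MD5 sum differs" ;;
            D)   echo "device major/minor number mismatch" ;;
            L)   echo "readlink(2) path mismatch" ;;
            U)   echo "user ownership differs" ;;
            G)   echo "group ownership differs" ;;
            T)   echo "mtime differs" ;;
            *)   echo "unknown flag: $c" ;;
        esac
    done
}

# Usage:
#   explain_verify 'S.5....T'
```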


DIGITAL SIGNATURE AND DIGEST VERIFICATION
The general forms of rpm digital signature commands are 



rpm --import PUBKEY ... 


rpm {--checksig} [--nosignature] [--nodigest] 
PACKAGE_FILE ... 


The --checksig option checks all the digests and signatures contained in PACKAGE_FILE to ensure the integrity and origin of the package. Note that signatures are now verified whenever a package is read, and --checksig is useful to verify all of the digests and signatures associated with a package. 

Digital signatures cannot be verified without a public key. An ascii armored public key can be added to the rpm database using --import. An imported public key is carried in a header, and key ring management is performed exactly like package management. For example, all currently imported public keys can be displayed by: 

rpm -qa 'gpg-pubkey*' 

Details about a specific public key, when imported, can be displayed by querying. Here's information about the Red Hat GPG/DSA key: 

rpm -qi gpg-pubkey-db42a60e 

Finally, public keys can be erased after importing just like packages. Here's how to remove the Red Hat GPG/DSA key 

rpm -e gpg-pubkey-db42a60e 

SIGNING A PACKAGE

rpm --addsign|--resign PACKAGE_FILE ... 


Both of the --addsign and --resign options generate and insert new signatures for each package PACKAGE_FILE given, replacing any existing signatures. There are two options for historical reasons; there is currently no difference in behavior. 

USING GPG TO SIGN PACKAGES
In order to sign packages using GPG, rpm must be configured to run GPG and be able to find a key ring with the appropriate keys. By default, rpm uses the same conventions as GPG to find key rings, namely the $GNUPGHOME environment variable. If your key rings are not located where GPG expects them to be, you will need to configure the macro %_gpg_path to be the location of the GPG key rings to use. 

For compatibility with older versions of GPG, PGP, and rpm, only V3 OpenPGP signature packets should be configured. Either DSA or RSA verification algorithms can be used, but DSA is preferred. 

If you want to be able to sign packages you create yourself, you also need to create your own public and secret key pair (see the GPG manual). You will also need to configure the rpm macros 

%_signature
The signature type. Right now only gpg and pgp are supported. 
%_gpg_name
The name of the "user" whose key you wish to use to sign your packages. 
For example, to be able to use GPG to sign packages as the user "John Doe <jdoe@foo.com>" from the key rings located in /etc/rpm/.gpg using the executable /usr/bin/gpg you would include 


%_signature gpg
%_gpg_path /etc/rpm/.gpg
%_gpg_name John Doe <jdoe@foo.com>
%_gpgbin /usr/bin/gpg

in a macro configuration file. Use /etc/rpm/macros for per-system configuration and ~/.rpmmacros for per-user configuration. 

REBUILD DATABASE OPTIONS
The general form of an rpm rebuild database command is 


rpm {--initdb|--rebuilddb} [-v] [--dbpath DIRECTORY] [--root DIRECTORY] 


Use --initdb to create a new database, use --rebuilddb to rebuild the database indices from the installed package headers. 

SHOWRC
The command 

rpm --showrc 

shows the values rpm will use for all of the options that are currently set in rpmrc and macros configuration file(s). 

FTP/HTTP OPTIONS
rpm can act as an FTP and/or HTTP client so that packages can be queried or installed from the internet. 
Package files for install, upgrade, and query operations may be specified as an ftp or http style URL: 

ftp://USER:PASSWORD@HOST:PORT/path/to/package.rpm 

If the :PASSWORD portion is omitted, the password will be prompted for (once per user/hostname pair). 
If both the user and password are omitted, anonymous ftp is used. In all cases, passive (PASV) ftp transfers 
are performed. 

rpm allows the following options to be used with ftp URLs: 

--ftpproxy HOST
The host HOST will be used as a proxy server for all ftp transfers, which allows users to ftp through firewall machines which use proxy systems. This option may also be specified by configuring the macro %_ftpproxy. 
--ftpport PORT
The TCP PORT number to use for the ftp connection on the proxy ftp server instead of the default port. This option may also be specified by configuring the macro %_ftpport. 
rpm allows the following options to be used with http URLs: 

--httpproxy HOST
The host HOST will be used as a proxy server for all http transfers. This option may also be specified by configuring the macro %_httpproxy. 
--httpport PORT
The TCP PORT number to use for the http connection on the proxy http server instead of the default port. This option may also be specified by configuring the macro %_httpport. 
LEGACY ISSUES
Executing rpmbuild
The build modes of rpm are now resident in the /usr/bin/rpmbuild executable. Although legacy compatibility provided by the popt aliases below has been adequate, the compatibility is not perfect; hence build mode compatibility through popt aliases is being removed from rpm. Install the rpmbuild package, and see rpmbuild(8) for documentation of all the rpm build modes previously documented here in rpm(8). 

Add the following lines to /etc/popt if you wish to continue invoking rpmbuild from the rpm command line: 


rpm     exec --bp               rpmb -bp
rpm     exec --bc               rpmb -bc
rpm     exec --bi               rpmb -bi
rpm     exec --bl               rpmb -bl
rpm     exec --ba               rpmb -ba
rpm     exec --bb               rpmb -bb
rpm     exec --bs               rpmb -bs 
rpm     exec --tp               rpmb -tp 
rpm     exec --tc               rpmb -tc 
rpm     exec --ti               rpmb -ti 
rpm     exec --tl               rpmb -tl 
rpm     exec --ta               rpmb -ta
rpm     exec --tb               rpmb -tb
rpm     exec --ts               rpmb -ts 
rpm     exec --rebuild          rpmb --rebuild
rpm     exec --recompile        rpmb --recompile
rpm     exec --clean            rpmb --clean
rpm     exec --rmsource         rpmb --rmsource
rpm     exec --rmspec           rpmb --rmspec
rpm     exec --target           rpmb --target
rpm     exec --short-circuit    rpmb --short-circuit

SEE ALSO

popt(3),
rpm2cpio(8),
rpmbuild(8),

http://www.rpm.org/ 





39. Simplified overview Kernel parameters Solaris, AIX, Linux:
==============================================================

Throughout this document, you can find many other examples of settings.
This section is only a simplified overview.


39.1 Solaris:
-------------

The "/etc/system" file:

On Solaris, the /etc/system file contains definitions for kernel configuration limits 
such as the maximum number of users allowed on the system at a time, the maximum number of processes per user, 
and the inter-process communication (IPC) limits on size and number of resources. These limits are important because 
they affect, for example, the performance of DB2 or Oracle on a Solaris machine. 

Some examples:

set shmsys:shminfo_shmmax=4294967295
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmmni=100
set shmsys:shminfo_shmseg=10
set semsys:seminfo_semmni=100
set semsys:seminfo_semmsl=100
set semsys:seminfo_semmns=2500
set semsys:seminfo_semopm=100
set semsys:seminfo_semvmx=32767
..
..

You can use, among others, the "ipcs" command and "adb" command to retrieve kernel parameters and mem info.

Some remarks on Shared Memory and Semaphores:

- Shared Memory
Shared memory provides the fastest way for processes to pass large amounts of data to one another. 
As the name implies, shared memory refers to physical pages of memory that are shared by more than one process. 

Of particular interest is the "Intimate Shared Memory" facility, where the translation tables are shared 
as well as the memory. This enhances the effectiveness of the TLB (Translation Lookaside Buffer), 
which is a CPU-based cache of translation table information. Since the same information is used for 
several processes, available buffer space can be used much more efficiently. In addition, ISM-designated memory 
cannot be paged out, which can be used to keep frequently-used data and binaries in memory. 

Database applications are the heaviest users of shared memory. Vendor recommendations should be consulted 
when tuning the shared memory parameters. 

Solaris 10 only uses the shmmax and shmmni parameters. (Other parameters are set dynamically within the 
Solaris 10 IPC model.) 

shmmax (max-shm-memory in Solaris 10+): This is the maximum size of a shared memory segment 
(i.e. the largest value that can be used by shmget). Its theoretical maximum value is 4294967295 (4GB), 
but practical considerations usually limit it to less than this. There is no reason not to tune this value 
as high as possible, since no kernel resources are allocated based on this parameter. Solaris 10 sets shmmax 
to 1/4 physical memory by default, vs 512k for previous versions. 
shmmin: This is the smallest possible shared memory segment size. The default is 1 byte; this parameter 
should probably not be tuned. 
shmmni (max-shm-ids in Solaris 10+): Maximum number of shared memory identifiers at any given time. 
This parameter is used by kernel memory allocation to determine how much size to put aside for shmid_ds structures. 
Each of these is 112 bytes and requires an additional 8 bytes for a mutex lock; if it is set too high, memory usage 
can be a problem. The maximum setting for this variable in Solaris 2.5.1 and 2.6 is 2147483648 (2GB), and the 
default is 100. For Solaris 10, the default is 128 and the maximum is MAXINT. 
shmseg: Maximum number of segments per process. It is usually set to shmmni, but it should always be less 
than 65535. Sun documentation suggests a maximum for this parameter of 32767 and a default of 8 for 
Solaris 2.5.1 and 2.6. 
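
The arithmetic behind two of the remarks above can be sketched in shell. It only restates the figures quoted in this section (112 + 8 bytes per identifier, shmmax defaulting to 1/4 of physical memory on Solaris 10). The function names are made up for this example.

```shell
#!/bin/sh
# shm_id_overhead SHMMNI
# Bytes set aside for shmid_ds structures: 112 bytes each plus 8 for the mutex.
shm_id_overhead() {
    echo $(( $1 * (112 + 8) ))
}

# default_shmmax PHYSICAL_MEMORY_BYTES
# Solaris 10 default quoted above: one quarter of physical memory.
default_shmmax() {
    echo $(( $1 / 4 ))
}

# Usage:
#   shm_id_overhead 100          # overhead for the pre-Solaris 10 default shmmni
#   default_shmmax 4294967296    # a machine with 4 GB of memory
```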

- Semaphores
Semaphores are a shareable resource that take on a non-negative integer value. They are manipulated 
by the P (wait) and V (signal) functions, which decrement and increment the semaphore, respectively. When a 
process needs a resource, a "wait" is issued and the semaphore is decremented. When the semaphore contains 
a value of zero, the resources are not available and the calling process spins or blocks (as appropriate) 
until resources are available. When a process releases a resource controlled by a semaphore, it increments 
the semaphore and the waiting processes are notified. 

Solaris 10 only uses the semmni, semmsl and semopm parameters. (Other parameters are dynamic within 
the Solaris 10 IPC model.) 

semmap: This sets the number of entries in the semaphore map. This should never be greater than semmni. If the number 
of semaphores per semaphore set used by the application is "n" then set semmap = ((semmni + n - 1)/n)+1
or more. Alternatively, we can set semmap to semmni x semmsl. An undersized semmap leads to "WARNING: 
rmfree map overflow" errors. The default setting is 10; the maximum for Solaris 2.6 is 2GB. The default for 
Solaris 9 was 25; Solaris 10 increased the default to 512. The limit is SHRT_MAX. 
semmni (max-sem-ids in Solaris 10+): Maximum number of systemwide semaphore sets. Each control structure consumes 
84 bytes. For Solaris 2.5.1-9, the default setting is 10; for Solaris 10, the default setting is 128. 
The maximum is 65535. 
semmns: Maximum number of semaphores in the system. Each structure uses 16 bytes. This parameter should be set 
to semmni x semmsl. The default is 60; the maximum is 2GB. 
semmnu: Maximum number of undo structures in the system. This should be set to semmni so that each control structure 
has an undo structure. The default is 30, the maximum is 2 GB. 
semmsl (max-sem-nsems in Solaris 10+): Maximum number of semaphores per semaphore set. The default is 25, 
the maximum is 65535. 
semopm (max-sem-ops in Solaris 10+): Maximum number of semaphore operations that can be performed in each 
semop call. The default in Solaris 2.5.1-9 is 10, the maximum is 2 GB. Solaris 10 increased the default to 512. 
semume: Maximum number of undo structures per process. This should be set to semopm times the number of processes 
that will be using semaphores at any one time. The default is 10; the maximum is 2 GB. 
semusz: Number of bytes required for semume undo structures. This should not be tuned; it is set to 
semume x (1 + sizeof(undo)). The default is 96; the maximum is 2 GB. 
semvmx: Maximum value of a semaphore. This should never exceed 32767 (default value) unless SEM_UNDO 
is never used. The default is 32767; the maximum is 65535. 
semaem: Maximum adjust-on-exit value. This should almost always be left alone. The default is 16384; 
the maximum is 32767. 
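
The sizing rules of thumb above (semmns = semmni x semmsl, semmap = ((semmni + n - 1)/n)+1, semmnu = semmni) can be turned into /etc/system lines with a few lines of shell. This is only a sketch: the function name is made up, the semmap and semmnu entry names are assumed to follow the same "semsys:seminfo_" pattern as the examples at the top of this section, and these tunables apply to releases before Solaris 10.

```shell
#!/bin/sh
# sem_sizing SEMMNI SEMMSL N
#   SEMMNI  number of semaphore sets
#   SEMMSL  semaphores per set
#   N       semaphores per set actually used by the application
sem_sizing() {
    semmni=$1 semmsl=$2 n=$3
    [ "$n" -gt 0 ] || return 2                   # avoid division by zero
    semmns=$(( semmni * semmsl ))                # total semaphores in the system
    semmap=$(( (semmni + n - 1) / n + 1 ))       # semaphore map entries
    echo "set semsys:seminfo_semmni=$semmni"
    echo "set semsys:seminfo_semmsl=$semmsl"
    echo "set semsys:seminfo_semmns=$semmns"
    echo "set semsys:seminfo_semmap=$semmap"
    echo "set semsys:seminfo_semmnu=$semmni"     # one undo structure per set
}

# Usage: review the output first, then merge it into /etc/system by hand.
#   sem_sizing 100 100 100
```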



39.2 Linux:
-----------

Kernel parameters used for system configuration are found in "/etc/sysctl.conf" and on a running system also in "/proc/sys/kernel", where you 
will find an individual file for each configuration parameter. Because these parameters have a direct effect on system 
performance and viability, you must have root access in order to modify them.

Occasionally, a prerequisite to a package installation requires the modification of kernel parameters. 
Since each parameter file contains a single line of data consisting of either a text 
string or numeric values, it is often easy to modify a parameter by simply using the echo command:

# echo 2048 > /proc/sys/kernel/msgmax

The aforementioned command will set the value of the msgmax parameter to 2048.

-- More on the proc File System:

The Linux kernel has two primary functions: to control access to physical devices on the computer 
and to schedule when and how processes interact with these devices. The /proc/ directory contains 
a hierarchy of special files which represent the current state of the kernel - allowing applications 
and users to peer into the kernel's view of the system. 

Within the /proc/ directory, one can find a wealth of information about the system hardware and any processes 
currently running. In addition, some of the files within the /proc/ directory tree can be manipulated by users 
and applications to communicate configuration changes to the kernel. 

Under Linux, all data are stored as files. Most users are familiar with the two primary types of files: 
text and binary. But the /proc/ directory contains another type of file called a virtual file. 
It is for this reason that /proc/ is often referred to as a virtual file system. 
These virtual files have unique qualities. Most of them are listed as zero bytes in size and yet when one 
is viewed, it can contain a large amount of information. In addition, most of the time and date settings 
on virtual files reflect the current time and date, indicative of the fact that they are constantly changing. 

Virtual files such as /proc/interrupts, /proc/meminfo, /proc/mounts, and /proc/partitions provide an 
up-to-the-moment glimpse of the system's hardware. Others, like /proc/filesystems and the /proc/sys/ 
directory provide system configuration information and interfaces. 

For organizational purposes, files containing information on a similar topic are grouped into virtual 
directories and sub-directories. For instance, /proc/ide/ contains information for all physical IDE devices. 
Likewise, process directories contain information about each running process on the system. 

By using the cat, more, or less commands on files within the /proc/ directory, you can immediately access 
an enormous amount of information about the system. For example, if you want to see what sort of CPU 
your computer has, type "cat /proc/cpuinfo" and you will see something similar to the following: 

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 5
model		: 9
model name	: AMD-K6(tm) 3D+ Processor
stepping	: 1
cpu MHz		: 400.919
cache size	: 256 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu vme de pse tsc msr mce cx8 pge mmx syscall 3dnow k6_mtrr
bogomips	: 799.53
 

When viewing different virtual files in the /proc/ file system, you will notice some of the information is 
easily understandable while some is not human-readable. This is in part why utilities exist to pull data 
from virtual files and display it in a useful way. Some examples of such applications are 
lspci, apm, free, and top. 
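
As a small illustration of how such utilities work, the sketch below pulls a single field out of 
/proc/cpuinfo with awk. The function name cpu_model is made up for this example, and it assumes the 
file has a "model name" line, which is the case on x86 but not on every architecture. 

```shell
#!/bin/sh
# cpu_model: print the "model name" of the first processor listed in a
# cpuinfo-style file (defaults to /proc/cpuinfo). Hypothetical helper.
cpu_model() {
    # Fields are separated by a colon followed by optional spaces.
    awk -F': *' '/^model name/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Only read the real file when it exists (i.e. on Linux).
if [ -r /proc/cpuinfo ]; then
    cpu_model
fi
```

The optional file argument is only there so the function can be tried against a saved copy of the file. 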

As a general rule, most virtual files within the /proc/ directory are read only. However, some can be used 
to adjust settings in the kernel. This is especially true for files in the /proc/sys/ subdirectory. 

To change the value of a virtual file, use the echo command and a > symbol to redirect the new value to the file. 
For instance, to change your hostname on the fly, you can type: 

echo bob.subgenius.com > /proc/sys/kernel/hostname 
 
Other files act as binary or boolean switches. For instance, if you type cat /proc/sys/net/ipv4/ip_forward, 
you will see either a 0 or a 1. A 0 indicates the kernel is not forwarding network packets. By using the 
echo command to change the value of the ip_forward file to 1, you can immediately turn packet forwarding on. 
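
The switch described above can be read from a script as well. The following is only a sketch; 
the function name forwarding_state is an assumption, and the write itself is left as a comment 
because it needs root privileges. 

```shell
#!/bin/sh
# forwarding_state: translate the 0/1 switch into words.
# Reads the given file, or /proc/sys/net/ipv4/ip_forward by default.
forwarding_state() {
    case "$(cat "${1:-/proc/sys/net/ipv4/ip_forward}")" in
        0) echo "packet forwarding is off" ;;
        1) echo "packet forwarding is on" ;;
        *) echo "unexpected value" ;;
    esac
}

if [ -r /proc/sys/net/ipv4/ip_forward ]; then
    forwarding_state
fi

# To switch forwarding on (as root), one would run:
#   echo 1 > /proc/sys/net/ipv4/ip_forward
```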

Another command used to alter settings in the /proc/sys/ subdirectory is /sbin/sysctl.


-- sysctl:

Linux also provides the sysctl command to modify kernel parameters at runtime. 
Settings that should persist are stored in a file called /etc/sysctl.conf, which sysctl reads at boot time 
or when it is run with the -p option. If, for example, we wanted to change the value of the msgmax parameter 
as we did above, but this time using sysctl, the command would look like this:

# sysctl -w kernel.msgmax=2048
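
A sysctl name corresponds directly to a path below /proc/sys/: the dots in the name become slashes. 
A minimal sketch of that mapping follows. The helper name sysctl_key_to_path is an assumption made for 
this note, and names with a dot inside a single component (some network interface names) are not handled. 

```shell
#!/bin/sh
# sysctl_key_to_path: print the /proc/sys file behind a sysctl name.
# Hypothetical helper; simply replaces every "." with "/".
sysctl_key_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

sysctl_key_to_path kernel.msgmax
```

So "sysctl -w kernel.msgmax=2048" and "echo 2048 > /proc/sys/kernel/msgmax" change the same setting. 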


- About the kernel:

Finding the Kernel
Locate the kernel image on your hard disk. It should be in the file /vmlinuz, /vmlinux, or /boot/vmlinuz.
In some installations, /vmlinuz is a soft link to the actual kernel, so you may need to track down 
the kernel by following the links. On Redhat 6.1 it is in "/boot/vmlinuz". To find the kernel being used 
look in "/etc/lilo.conf".

You can also type "uname -a" to see the kernel version. 

/proc/cmdline

This file shows the parameters passed to the kernel at the time it is started. A sample /proc/cmdline file 
looks like this: 

ro root=/dev/hda2

This tells us the root file system is mounted read-only - signified by (ro) - from the second partition 
on the first IDE device (/dev/hda2). 
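
To pick a single parameter out of the command line in a script, something like the following could be used. 
This is a sketch only; the function name kernel_arg is made up, and it assumes the parameters are separated 
by whitespace and written as name=value. 

```shell
#!/bin/sh
# kernel_arg NAME [CMDLINE]: print the value of NAME=value from a kernel
# command line. CMDLINE defaults to the contents of /proc/cmdline.
kernel_arg() {
    cmdline=${2:-$(cat /proc/cmdline)}
    # Unquoted on purpose: split the command line into words.
    for word in $cmdline; do
        case "$word" in
            "$1"=*) printf '%s\n' "${word#*=}"; return 0 ;;
        esac
    done
    return 1
}

if [ -r /proc/cmdline ]; then
    kernel_arg root || echo "no root= argument on this kernel command line"
fi
```

With the sample line above, "kernel_arg root" would report the device the root file system is mounted from. 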


- Kernel, memory tuning:

Most memory and kernel tuning is done through the "/etc/sysctl.conf" file:

In most distributions, the "/etc/sysctl.conf" determines the limits and/or behaviour of the kernel 
and memory.

If you type "sysctl -a |more" you will see a long list of kernel parameters. 
You can use this sysctl program to modify these parameters, for example:

# sysctl -w kernel.shmmax=100000000
# sysctl -w fs.file-max=65536
# echo "kernel.shmmax = 100000000" >> /etc/sysctl.conf


Example configuration: setting kernel parameters before installing Oracle 10g:
------------------------------------------------------------------------------

Most out-of-the-box kernel parameters (of RHEL 3, 4, 5) are set correctly for Oracle,
except a few.

You should have the following minimal configuration:

net.ipv4.ip_local_port_range	1024  65000
kernel.sem			250  32000  100  128
kernel.shmmni			4096
kernel.shmall			2097152
kernel.shmmax			2147483648
fs.file-max			65536


You can check the most important parameters using the following command:

# /sbin/sysctl -a | egrep 'sem|shm|file-max|ip_local'

net.ipv4.ip_local_port_range = 1024  65000
kernel.sem = 250  32000  100  128
kernel.shmmni = 4096
kernel.shmall = 2097152
kernel.shmmax = 2147483648
fs.file-max = 65536

If some value should be changed, you can change the "/etc/sysctl.conf" file and run the "/sbin/sysctl -p" command
to change the value immediately.
Every time the system boots, the init program runs the /etc/rc.d/rc.sysinit script. This script contains 
a command to execute sysctl using /etc/sysctl.conf to dictate the values passed to the kernel. 
Any values added to /etc/sysctl.conf will take effect each time the system boots. 
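
Appending with ">>" every time a value changes leaves several lines for the same parameter in the file. 
The sketch below replaces an existing line instead. The function name set_sysctl_conf is an assumption, 
and the demonstration runs against a temporary file rather than the real /etc/sysctl.conf. 

```shell
#!/bin/sh
# set_sysctl_conf KEY VALUE FILE: make sure FILE holds exactly one
# "KEY = VALUE" line. Hypothetical helper for sysctl.conf-style files.
set_sysctl_conf() {
    key=$1 value=$2 file=$3
    [ -f "$file" ] || : > "$file"
    tmp=$(mktemp) || return 1
    # Copy every line except existing (uncommented) assignments of KEY.
    awk -v k="$key" '
        { line = $0; sub(/^[ \t]+/, "", line) }
        index(line, k) == 1 && substr(line, length(k) + 1) ~ /^[ \t]*=/ { next }
        { print }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Write back through the original file so its permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demonstration on a scratch file.
demo=$(mktemp)
printf 'kernel.shmmax = 100000000\n' > "$demo"
set_sysctl_conf kernel.shmmax 2147483648 "$demo"
cat "$demo"
rm -f "$demo"
```

On a real system this would be followed by "/sbin/sysctl -p" to load the changed file. 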



Example configuration: from: Installing Oracle 9i on Linux
-----------------------------------------------------------

For Linux, use the ipcs command to obtain a list of the system's current shared memory segments and 
semaphore sets, and their identification numbers and owner. 

Perform the following steps to modify the kernel parameters by using the /proc file system. 

Log in as the root user. 

Change to the /proc/sys/kernel directory. 

Review the current semaphore parameter values in the sem file by using the cat or more utility. 
For example, using the cat utility, enter the following command: 

# cat sem

The output lists, in order, the values for the SEMMSL, SEMMNS, SEMOPM, and SEMMNI parameters. 
The following example shows how the output appears: 

250 32000 32 128

In the preceding output example, 250 is the value of the SEMMSL parameter, 32000 is the value of the 
SEMMNS parameter, 32 is the value of the SEMOPM parameter, and 128 is the value of the SEMMNI parameter. 
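
Because the four numbers are easy to mix up, a small helper that labels them can be handy. 
This is a sketch; the function name show_sem is made up for this note. 

```shell
#!/bin/sh
# show_sem [FILE]: label the four semaphore values in the order the kernel
# prints them. FILE defaults to /proc/sys/kernel/sem.
show_sem() {
    read -r semmsl semmns semopm semmni < "${1:-/proc/sys/kernel/sem}"
    printf 'SEMMSL=%s SEMMNS=%s SEMOPM=%s SEMMNI=%s\n' \
        "$semmsl" "$semmns" "$semopm" "$semmni"
}

if [ -r /proc/sys/kernel/sem ]; then
    show_sem
fi
```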

Modify the parameter values by using the following command syntax: 

# echo SEMMSL_value SEMMNS_value SEMOPM_value SEMMNI_value > sem

Replace the parameter variables with the values for your system in the order that they are entered 
in the preceding example. For example: 

# echo 100 32000 100 100 > sem

Review the current shared memory parameters by using the cat or more utility. For example, using the cat utility, 
enter the following command: 

# cat shared_memory_parameter

In the preceding example, the variable shared_memory_parameter is either the SHMMAX or SHMMNI parameter. 
The parameter name must be entered in lowercase letters. 

Modify the shared memory parameter by using the echo utility. For example, to modify the SHMMAX parameter, 
enter the following command: 

# echo 2147483648 > shmmax

Likewise, to modify the SHMMNI parameter, enter the following command: 

# echo 4096 > shmmni

To modify the SHMALL parameter, enter the following command: 

# echo 2097152 > shmall

Write a script to initialize these values during system startup, and include the script in your system init files. 
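
A minimal sketch of such a startup script is shown below. The variable PROC_SYS and the function names 
are assumptions made for this note, and the values are simply the ones used in the examples of this section; 
adjust them for your system. Nothing is written unless the script is called with the argument "apply". 

```shell
#!/bin/sh
# Sketch of a boot-time script that sets the parameters discussed here.
# PROC_SYS is a hypothetical override so the script can be tried on a
# scratch directory instead of the real /proc/sys.
PROC_SYS=${PROC_SYS:-/proc/sys}

# set_param RELATIVE_PATH VALUE...: write the value(s) to one kernel file.
set_param() {
    target=$PROC_SYS/$1
    shift
    if [ -w "$target" ]; then
        echo "$*" > "$target"
    else
        echo "skipping $target (not writable)" >&2
    fi
}

apply_kernel_params() {
    set_param kernel/sem 100 32000 100 100    # SEMMSL SEMMNS SEMOPM SEMMNI
    set_param kernel/shmmax 2147483648
    set_param kernel/shmmni 4096
    set_param kernel/shmall 2097152
    set_param fs/file-max 65536
    set_param net/ipv4/ip_local_port_range 1024 65000
}

# Only touch the kernel when explicitly asked to.
if [ "${1:-}" = "apply" ]; then
    apply_kernel_params
fi
```

On distributions that read /etc/sysctl.conf at boot, putting the same values in that file achieves the same result. 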

See Also: 
Your system vendor's documentation for more information on script files and init files.  

Set the File Handles by using ulimit -n and /proc/sys/fs/file-max. 

# echo 65536 > /proc/sys/fs/file-max
# ulimit -n 65536

Set the local port range for sockets by using /proc/sys/net/ipv4/ip_local_port_range. 

# echo 1024 65000 > /proc/sys/net/ipv4/ip_local_port_range

Set the Process limit by using ulimit -u. This will give you the number of processes per user. 

# ulimit -u 16384


39.4 Linux modules:
-------------------


Modules on Linux (1):
---------------------

- insmod, rmmod, lsmod

lsmod:
------

lsmod - list loaded modules.   

SYNOPSIS
lsmod [-hV]   
DESCRIPTION
lsmod shows information about all loaded modules. 
The format is name, size, use count, list of referring modules. The information displayed is identical 
to that available from "/proc/modules". 

If the module controls its own unloading via a can_unload routine then the user count displayed by lsmod 
is always -1, irrespective of the real use count.   
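
Since the information comes from "/proc/modules", an lsmod-like listing can be produced with a few lines 
of shell. This is a sketch under the assumption of the 2.6-style file format (name, size, use count, 
comma-separated list of referring modules, state, address); the function name list_modules is made up. 

```shell
#!/bin/sh
# list_modules [FILE]: print name, size, use count and referring modules
# from a /proc/modules-style file (defaults to /proc/modules).
list_modules() {
    printf '%-24s %8s  %s\n' "Module" "Size" "Used by"
    while read -r name size used deps _rest; do
        # A lone "-" means no other module refers to this one.
        if [ "$deps" = "-" ]; then
            deps=""
        fi
        # Strip the trailing comma left after the last module name.
        printf '%-24s %8s  %s %s\n' "$name" "$size" "$used" "${deps%,}"
    done < "${1:-/proc/modules}"
}

if [ -r /proc/modules ]; then
    list_modules
fi
```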

insmod:
-------

insmod - install loadable kernel module 

SYNOPSIS
insmod [-fhkLmnpqrsSvVxXyYN] [-e persist_name] [-o module_name] [-O blob_name] [-P prefix] module [ symbol=value ... ] 
DESCRIPTION
insmod installs a loadable module in the running kernel. 
insmod tries to link a module into the running kernel by resolving all symbols from the kernel's 
exported symbol table. 

If the module file name is given without directories or extension, insmod will search for the module 
in some common default directories. The environment variable MODPATH can be used to override this default. 
If a module configuration file such as /etc/modules.conf exists, it will override the paths defined in MODPATH. 

The environment variable MODULECONF can also be used to select a different configuration file from the 
default /etc/modules.conf (or /etc/conf.modules (deprecated)). This environment variable will override 
all the definitions above. 

When environment variable UNAME_MACHINE is set, modutils will use its value instead of the machine field 
from the uname() syscall. This is mainly of use when you are compiling 64 bit modules in 32 bit user space 
or vice versa, set UNAME_MACHINE to the type of the modules. Current modutils does not support full 
cross build mode for modules, it is limited to choosing between 32 and 64 bit versions of the host architecture. 

rmmod:
------

rmmod - unload loadable modules   
SYNOPSIS
rmmod [ -aehrsvV ] module ...   
DESCRIPTION
rmmod unloads loadable modules from the running kernel. 
rmmod tries to unload a set of modules from the kernel, with the restriction that they are not in use 
and that they are not referred to by other modules. 

If more than one module is named on the command line, the modules will be removed in the given order. 
This supports unloading of stacked modules. 

With the option '-r', a recursive removal of modules will be attempted. This means that if a top module 
in a stack is named on the command line, all modules that are used by this module will be removed as well, 
if possible. 



More info about the mod commands:
---------------------------------

- Hardware Detection with the Help of hwinfo
hwinfo can detect the hardware of your system and select the drivers needed to run this hardware. 
Get a small introduction to this command with hwinfo --help. If you, for example, need information about 
your SCSI devices, use the command hwinfo --scsi.

All this information is also available in YaST in the hardware information module. 

- Handling Modules
The following commands are available:

insmod
insmod loads the requested module after searching for it in a subdirectory of /lib/modules/<version>. 
It is better, however, to use modprobe rather than insmod. 

rmmod
Unloads the requested module. This is only possible if this module is no longer needed. For example, 
the isofs module cannot be unloaded while a CD is still mounted. 

depmod
Creates the file modules.dep in /lib/modules/<version> that defines the dependencies of all the modules. 
This is necessary to ensure that all dependent modules are loaded with the selected ones. 
This file will be built after the system is started if it does not exist.

modprobe
Loads or unloads a given module while taking into account dependencies of this module. This command 
is extremely powerful and can be used for a lot of things (e.g., probing all modules of a given type 
until one is successfully loaded). In contrast to insmod, modprobe checks /etc/modprobe.conf and therefore 
is the preferred method of loading modules. For detailed information about this topic, refer to the 
corresponding man page. 

lsmod
Shows which modules are currently loaded as well as how many other modules are using them. Modules started 
by the kernel daemon are tagged with autoclean. This label denotes that these modules will automatically 
be removed once they reach their idle time limit. 

modinfo
Shows module information.
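
The modules.dep file written by depmod is plain text and can be inspected directly. The sketch below looks up 
the dependencies recorded for one module. It assumes the 2.6-style format "path/module.ko: dep1 dep2 ..."; 
the function name module_deps is an assumption, and the module name must be spelled as in the file name. 

```shell
#!/bin/sh
# module_deps MODULE [FILE]: print the dependency list recorded for MODULE
# in a modules.dep-style file (defaults to the running kernel's file).
module_deps() {
    dep_file=${2:-/lib/modules/$(uname -r)/modules.dep}
    awk -v m="$1" -F': *' '
        {
            n = $1
            sub(/.*\//, "", n)      # drop the directory part
            sub(/\.ko.*$/, "", n)   # drop .ko (and any compression suffix)
        }
        n == m { print $2 }
    ' "$dep_file"
}
```

For example, "module_deps isofs" prints the modules that have to be loaded before isofs, or an empty line if there are none. 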

/etc/modprobe.conf
The loading of modules is affected by the files /etc/modprobe.conf and /etc/modprobe.conf.local 
and the directory /etc/modprobe.d. See man modprobe.conf. Parameters for modules that access hardware directly
must be entered in this file. Such modules may need system-specific options (e.g., CD-ROM driver or network driver). 
The parameters used here are described in the kernel sources. Install the package kernel-source and read the 
documentation in the directory /usr/src/linux/Documentation. 

Kmod - the Kernel Module Loader
The kernel module loader is the most elegant way to use modules. Kmod performs background monitoring 
and makes sure the required modules are loaded by modprobe as soon as the respective functionality is needed 
in the kernel. 

To use Kmod, activate the option `Kernel module loader' (CONFIG_KMOD) in the kernel configuration. 
Kmod is not designed to unload modules automatically; in view of today's RAM capacities, the potential memory savings 
would be marginal. For reasons of performance, monolithic kernels may be more suitable for servers 
that are used for special tasks and need only a few drivers. 


modprobe.conf:
--------------

Example 1:

# This file is autogenerated from /etc/modules.conf using generate-modprobe.conf command

alias eth1 sk98lin
alias eth0 ipw2200
alias sound-slot-0 snd-hda-intel
install scsi_hostadapter /sbin/modprobe ahci; /bin/true
remove snd-hda-intel /sbin/modprobe -r snd-pcm-oss; /sbin/modprobe --first-time -r --ignore-remove snd-hda-intel
install snd-hda-intel /sbin/modprobe --first-time --ignore-install snd-hda-intel && { /sbin/modprobe snd-pcm-oss; /bin/true; }
install usb-interface /sbin/modprobe uhci-hcd; /sbin/modprobe ehci-hcd; /bin/true
#alias eth1 eth1394
alias ieee1394-controller ohci1394
alias net-pf-10 off

#irda
alias tty-ldisc-11 irtty
alias char-major-161-* ircomm-tty

# Para nsc 383 SIO:
alias char-major-160-* nsc-ircc
alias irda0 nsc-ircc
options nsc-ircc io=0x2f8 irq=3 dma=0
install nsc-ircc { /bin/setserial /dev/ttyS1 uart none; } ; /sbin/modprobe --first-time --ignore-install nsc-ircc

#irda: 0x2f8, irq 3, dma 0
#lpt: 0x3f8, irq 7, dma 1

options parport_pc io=0x378 irq=7 dma=1

Example 2:

alias ieee1394-controller ohci1394
alias eth0 eepro100
alias sound-slot-0 emu10k1
alias net-pf-10 off
install snd-emu10k1 /sbin/modprobe --first-time --ignore-install snd-emu10k1 && { /sbin/modprobe snd-pcm-oss; /bin/true; }
install usb-interface /sbin/modprobe usb-uhci; /sbin/modprobe ehci-hcd; /bin/true
remove snd-emu10k1 { /sbin/modprobe -r snd-pcm-oss; } ; /sbin/modprobe -r --first-time --ignore-remove snd-emu10k1 


/etc/sysconfig:
---------------

Note 1:
-------

SuSEconfig and /etc/sysconfig
The main configuration of SUSE LINUX can be made with the configuration files in /etc/sysconfig. 
Former versions of SUSE LINUX relied on /etc/rc.config for system configuration, but it became obsolete 
in previous versions. /etc/rc.config is not created at installation time, as all system configuration 
is controlled by /etc/sysconfig. However, if /etc/rc.config exists at the time of a system update, 
it remains intact.

The individual files in /etc/sysconfig are only read by the scripts to which they are relevant. This ensures 
that network settings, for instance, need to be parsed only by network-related scripts. Apart from that, 
there are many other system configuration files that are generated according to the settings in /etc/sysconfig. 
This task is performed by SuSEconfig. For example, if you change the network configuration, SuSEconfig is likely 
to make changes to the file /etc/host.conf as well, as this is one of the files relevant for the 
network configuration. 

If you change anything in these files manually, run SuSEconfig afterwards to make sure all the necessary 
changes are made in all the relevant places. If you change the configuration using the YaST sysconfig editor, 
all changes are applied automatically - YaST automatically starts SuSEconfig to update the configuration 
files as needed.

This concept enables you to make basic changes to your configuration without needing to reboot the system. 
Because some changes are rather complex, some programs must be restarted for the changes to take effect. 
For instance, changes to the network configuration may require a restart of the network programs concerned. 
This can be achieved by entering the commands rcnetwork stop and rcnetwork start.

Note 2:
-------

The Linux sysconfig directory
The /etc/sysconfig directory is where many of the files that control the system configuration are stored. 
This section lists these files and many of the optional values in the files used to make system changes. 
To get complete information on these files read the file /usr/doc/initscripts-4.48/sysconfig.txt. 

/etc/sysconfig/clock
Used to configure the system clock to Universal or local time and set some other clock parameters. An example file: 
UTC=false
ARC=false

Options: 
UTC - true means the clock is set to UTC time otherwise it is at local time 
ARC - Set true on alpha stations only. It indicates the ARC console's 42-year time offset is in effect. If not set to true, the normal Unix epoch is assumed. 
ZONE="filename" - indicates the zonefile under the directory /usr/share/zoneinfo that the /etc/localtime file is a copy of. This may be set to: 
ZONE="US/Eastern" 

/etc/sysconfig/init
This file is used to set some terminal characteristics and environment variables. A sample listing: 
# color => new RH6.0 bootup
# verbose => old-style bootup
# anything else => new style bootup without ANSI colors or positioning
BOOTUP=color
# column to start "[  OK  ]" label in 
RES_COL=60
# terminal sequence to move to that column. You could change this
# to something like "tput hpa ${RES_COL}" if your terminal supports it
MOVE_TO_COL="echo -en \\033[${RES_COL}G"
# terminal sequence to set color to a 'success' color (currently: green)
SETCOLOR_SUCCESS="echo -en \\033[1;32m"
# terminal sequence to set color to a 'failure' color (currently: red)
SETCOLOR_FAILURE="echo -en \\033[1;31m"
# terminal sequence to set color to a 'warning' color (currently: yellow)
SETCOLOR_WARNING="echo -en \\033[1;33m"
# terminal sequence to reset to the default color.
SETCOLOR_NORMAL="echo -en \\033[0;39m"
# default kernel loglevel on boot (syslog will reset this)
LOGLEVEL=1
# Set to something other than 'no' to turn on magic sysrq keys...
MAGIC_SYSRQ=no
# Set to anything other than 'no' to allow hotkey interactive startup...
PROMPT=yes

Options: 
BOOTUP=bootupmode - Choices are color, or verbose. The choice color sets new boot display. The choice verbose sets old style display. Anything else sets a new display without ANSI formatting. 
LOGLEVEL=number - Sets the initial console logging level for the kernel. The default is 7. The values are: 
0 - emergency, panic - System is unusable 
1 - alert - Action must be taken immediately 
2 - crit - Critical conditions 
3 - err, error (deprecated) - Error conditions 
4 - warning, warn (deprecated) - Warning conditions 
5 - notice - Normal but significant conditions 
6 - info - Informational message 
7 - debug - Debug level message 
RES_COL=number - Screen column to start status labels at. The Default is 60. 
MOVE_TO_COL=command - A command to move the cursor to $RES_COL. 
SETCOLOR_SUCCESS=command - Set the color used to indicate success. 
SETCOLOR_FAILURE=command - Set the color used to indicate failure. 
SETCOLOR_WARNING=command - Set the color used to indicate warning. 
SETCOLOR_NORMAL=command - Set the color used for the normal color. 
MAGIC_SYSRQ=yes|no - Set to 'no' to disable the magic sysrq key. 
PROMPT=yes|no - Set to 'no' to disable the key check for interactive mode. 


/etc/sysconfig/keyboard
Used to configure the keyboard. Used by the startup script /etc/rc.d/rc.sysinit. An example file: 
KEYTABLE="us"

Options: 
KEYTABLE="keytable file" - The line [ KEYTABLE="/usr/lib/kbd/keytables/us.map" ] tells the system to use the file shown for keymapping. 
KEYBOARDTYPE=sun|pc - The selection, "sun", indicates attached on /dev/kbd is a sun keyboard. The selection "pc" indicates a PS/2 keyboard is on the ps/2 port. 


/etc/sysconfig/mouse
This file is used to configure the mouse. An example file: 
FULLNAME="Generic - 2 Button Mouse (PS/2)"
MOUSETYPE="ps/2"
XEMU3="yes"
XMOUSETYPE="PS/2"

Options: 
MOUSETYPE=type - Choices are microsoft, mouseman, mousesystems, ps/2, msbm, logibm, atibm, logitech, mmseries, or mmhittab. 
XEMU3=yes|no - If yes, emulate three buttons, otherwise not. 


/etc/sysconfig/network
Used to configure networking options. All IPX options default to off. An example file: 
NETWORKING=yes
FORWARD_IPV4="yes"
HOSTNAME="mdct-dev3"
GATEWAY="10.1.0.25"
GATEWAYDEV="eth0"

Options: 
NETWORKING=yes|no - Sets network capabilities on or off. 
HOSTNAME="hostname". To work with old software, the /etc/HOSTNAME file should contain the same hostname. 
FORWARD_IPV4=yes|no - Turns the ability to perform IP forwarding on or off. Turn it on if you want to use the machine as a router (IP masquerading also requires it). Turn it off on a host that should not route packets between its interfaces. 
DEFRAG_IPV4=yes|no - Set this to automatically defragment IPv4 packets. This is good for masquerading, and a bad idea otherwise. It defaults to 'no'. 
GATEWAY="gateway IP" 
GATEWAYDEV="gateway device" Possible values include eth0, eth1, or ppp0. 
NISDOMAIN="nis domain name" 
IPX=yes|no - Turn IPX ability on or off. 
IPXAUTOPRIMARY=on|off - Must not be yes or no. 
IPXAUTOFRAME=on|off 
IPXINTERNALNETNUM="netnum" 
IPXINTERNALNODENUM="nodenum" 


/etc/sysconfig/static-routes
Configures static routes on a network. An example file: 
eth1 net 192.168.199.0 netmask 255.255.255.0 gw 192.168.199.1
eth0 net 10.1.0.0 netmask 255.255.0.0 gw 10.1.0.153
eth1 net 255.255.255.255 netmask 255.255.255.255

The syntax is: 
device net network netmask netmask gw gateway 

The device may be a device name such as eth0 which is used to have the route brought up and down as the device is brought up or down. The value can also be "any" to let the system calculate the correct devices at run time. 


/etc/sysconfig/routed 
Sets up dynamic routing policies. An example file: 
EXPORT_GATEWAY="no"
SILENT="yes"

Options: 
SILENT=yes|no 
EXPORT_GATEWAY=yes|no 


/etc/sysconfig/pcmcia
Used to configure pcmcia network cards. An example file: 
PCMCIA=no
PCIC=
PCIC_OPTS=
CORE_OPTS=

Options: 
PCMCIA=yes|no 
PCIC=i82365|tcic 
PCIC_OPTS=socket driver (i82365 or tcic) timing parameters 
CORE_OPTS=pcmcia_core options 
CARDMGR_OPTS=cardmgr options 


/etc/sysconfig/amd
Used to configure the auto mount daemon. An example file: 
ADIR=/.automount
MOUNTPTS='/net /etc/amd.conf'
AMDOPTS=

Options: 
ADIR=/.automount (normally never changed) 
MOUNTPTS='/net /etc/amd.conf' (standard automount stuff) 
AMDOPTS= (extra options for AMD) 


/etc/sysconfig/tape
Used for backup tape device configuration. Options: 
DEV=/dev/nst0 - The tape device. Use the non-rewinding tape for these scripts. For SCSI tapes the device is /dev/nst#, where # is the number of the tape drive you want to use. If you only have one then use nst0. For IDE tapes the device is /dev/ht#. For floppy tape drives the device is /dev/ftape. 
ADMIN=root - The person to mail to if the backup fails for any reason 
SLEEP=5 - The time to sleep between tape operations. 
BLOCKSIZE=32768 - This worked fine for 8mm, then 4mm, and now DLT. An optimal setting is probably the amount of data your drive writes at one time. 
SHORTDATE=$(date +%y:%m:%d:%H:%M) - A short date string, used in backup log filenames. 
DAY=$(date +log-%y:%m:%d) - Used for the log file directory. 
DATE=$(date) - Date string, used in log files. 
LOGROOT=/var/log/backup - Root of the logging directory 
LIST=$LOGROOT/incremental-list - This is the file name the incremental backup will use to store the incremental list. It will be $LIST-{some number}. 
DOTCOUNT=$LOGROOT/.count - For counting as you go to know which incremental list to use. 
COUNTER=$LOGROOT/counter-file - For rewinding when done...might not use. 
BACKUPTAB=/etc/backuptab - The file in which we keep our list of backup(s) we want to make. 


/etc/sysconfig/sendmail
An example file: 
DAEMON=yes
QUEUE=1h

Options: 
DAEMON=yes|no - yes implies -bd 
QUEUE=1h - Given to sendmail as -q$QUEUE. The -q option is not given to sendmail if /etc/sysconfig/sendmail exists and QUEUE is empty or undefined. 


/etc/sysconfig/i18n
Controls the system font settings. The language variables are used in /etc/profile.d/lang.sh. An example i18n file: 
LANG="en_US"
LC_ALL="en_US"
LINGUAS="en_US"

Options: 
LANG= set locale for all categories, can be any two letter ISO language code. 
LC_CTYPE= localedata configuration for classification and conversion of characters. 
LC_COLLATE= localedata configuration for collation (sort order) of strings. 
LC_MESSAGES= localedata configuration for translation of yes and no messages. 
LC_NUMERIC= localedata configuration for non-monetary numeric data. 
LC_MONETARY= localedata configuration for monetary data. 
LC_TIME= localedata configuration for date and time. 
LC_ALL= localedata configuration overriding all of the above. 
LANGUAGE= can be a : separated list of ISO language codes. 
LINGUAS= can be a ' ' separated list of ISO language codes. 
SYSFONT= any font that is legal when used as /usr/bin/consolechars -f $SYSFONT ... (See console-tools package for consolechars command) 
UNIMAP= any SFM (screen font map, formerly called Unicode mapping table - see consolechars(8)) 
/usr/bin/consolechars -f $SYSFONT --sfm $UNIMAP 

SYSFONTACM= any ACM (application charset map - see consolechars(8)) 
/usr/bin/consolechars -f $SYSFONT --acm $SYSFONTACM 

The above is used by the /sbin/setsysfont command (which is run by rc.sysinit at boot time.) 



/etc/sysconfig/network-scripts/ifup:
/etc/sysconfig/network-scripts/ifdown:
These are symbolic links to /sbin/ifup and /sbin/ifdown, respectively. These symlinks are here for legacy purposes only. They will probably be removed in future versions. These scripts take one argument normally: the name of the device (e.g. eth0). They are called with a second argument of "boot" during the boot sequence so that devices that are not meant to be brought up on boot (ONBOOT=no, see below) can be ignored at that time. 


/etc/sysconfig/network-scripts/network-functions
This is not really a public file. Contains functions which the scripts use for bringing interfaces up and down. In particular, it contains most of the code for handling alternative interface configurations and interface change notification through netreport. 


/etc/sysconfig/network-scripts/ifcfg-interface
/etc/sysconfig/network-scripts/ifcfg-interface-clone
Defines an interface. An example file called ifcfg-eth0: 
DEVICE="eth0"
IPADDR="10.1.0.153"
NETMASK="255.255.0.0"
ONBOOT="yes"
BOOTPROTO="none"
IPXNETNUM_802_2=""
IPXPRIMARY_802_2="no"
IPXACTIVE_802_2="no"
IPXNETNUM_802_3=""
IPXPRIMARY_802_3="no"
IPXACTIVE_802_3="no"
IPXNETNUM_ETHERII=""
IPXPRIMARY_ETHERII="no"
IPXACTIVE_ETHERII="no"
IPXNETNUM_SNAP=""
IPXPRIMARY_SNAP="no"
IPXACTIVE_SNAP="no"

The /etc/sysconfig/network-scripts/ifcfg-interface-clone file only contains the parts of the definition that are different in a "clone" (or alternative) interface. For example, the network numbers might be different, but everything else might be the same, so only the network numbers would be in the clone file, but all the device information would be in the base ifcfg file.
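
Because these files are plain shell variable assignments, the "clone overrides base" idea can be sketched by 
sourcing the base file first and the clone file second. The function name show_interface is made up for this 
note; the real logic lives in the network scripts. Sourcing runs whatever the files contain, so only use it 
on files you trust. 

```shell
#!/bin/sh
# show_interface BASE [CLONE]: print the effective settings after reading
# the base ifcfg file and, optionally, a clone file on top of it.
# The body runs in a subshell so the sourced variables do not leak out.
show_interface() (
    . "$1" || exit 1
    if [ -n "${2:-}" ]; then
        # Values in the clone file replace those from the base file.
        . "$2" || exit 1
    fi
    printf 'DEVICE=%s IPADDR=%s NETMASK=%s\n' \
        "${DEVICE:-}" "${IPADDR:-}" "${NETMASK:-}"
)
```

Called with only the base file it shows the base settings; with a clone file as second argument, any value 
set there wins. Pass full path names, since the "." command searches PATH for names without a slash. 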

Base items in the above two files: 

NAME="friendly name for users to see" - Most important for PPP. Only used in front ends. 
DEVICE="name of physical device" 
IPADDR= 
NETMASK= 
GATEWAY= 
ONBOOT=yes|no 
USERCTL=yes|no 
BOOTPROTO=none|bootp|dhcp - If BOOTPROTO is not "none", then the only other item that must be set is the DEVICE item; all the rest will be determined by the boot protocol. No "dummy" entries need to be created. 
Base items being deprecated: 
NETWORK="will be calculated automatically with ifcalc" 
BROADCAST="will be calculated automatically with ifcalc" 
Ethernet-only items: 
{IPXNETNUM,IPXPRIMARY,IPXACTIVE}_{802_2,802_3,ETHERII,SNAP} configuration matrix for IPX. Only used if IPX is active. Managed from /etc/sysconfig/network-scripts/ifup-ipx 
PPP/SLIP items: 
PERSIST=yes|no 
MODEMPORT=device - An example device is /dev/modem. 
LINESPEED=speed - An example speed is 115200. 
DEFABORT=yes|no - Tells netcfg whether or not to put default abort strings in when creating/editing the chat script and/or dip script for this interface. 
PPP-specific items 
WVDIALSECT="list of sections from wvdial.conf to use" - If this variable is set, then the chat script (if it exists) is ignored, and wvdial is used to open the PPP connection. 
PEERDNS=yes|no - Modify /etc/resolv.conf if peer uses msdns extension. 
DEFROUTE=yes|no - Set this interface as default route? 
ESCAPECHARS=yes|no - The simplified interface here doesn't let people specify which characters to escape; almost everyone can use asyncmap 00000000 anyway, and they can set PPPOPTIONS to asyncmap foobar if they want to set options perfectly. 
HARDFLOWCTL=yes|no - Yes implies "modem crtscts" options. 
PPPOPTIONS="arbitrary option string" - It is placed last on the command line, so it can override other options like asyncmap that were specified differently. 
PAPNAME="name $PAPNAME" - On pppd command line. Note that the "remotename" option is always specified as the logical ppp device name, like "ppp0" (which might perhaps be the physical device ppp1 if some other ppp device was brought up earlier...), which makes it easy to manage pap/chap files -- name/password pairs are associated with the logical ppp device name so that they can be managed together. 
REMIP="remote ip address" - Normally unspecified. 
MTU= 
MRU= 
DISCONNECTTIMEOUT="number of seconds" - The current default is 5. This is the time to wait after a successfully connected session terminates before attempting to establish a new connection. 
RETRYTIMEOUT="number of seconds" - The current default is 60. This is the time to wait before re-attempting to establish a connection after a previous attempt fails. 
/etc/sysconfig/network-scripts/chat-interface - This is the chat script for PPP or SLIP connection intended to establish the connection. For SLIP devices, a DIP script is written from the chat script; for PPP devices, the chat script is used directly.


/etc/sysconfig/network-scripts/dip-interface
A write-only script created from the chat script by netcfg. Do not modify this. In the future, this file may disappear by default and be created on-the-fly from the chat script if it does not exist.


/etc/sysconfig/network-scripts/ifup-post
Called when any network device EXCEPT a SLIP device comes up. Calls /etc/sysconfig/network-scripts/ifup-routes to bring up static routes that depend on that device. Calls /etc/sysconfig/network-scripts/ifup-aliases to bring up aliases for that device. Sets the hostname if it is not already set and a hostname can be found for the IP for that device. Sends SIGIO to any programs that have requested notification of network events. It could be extended to fix up nameservice configuration, call arbitrary scripts, etc, as needed.


/etc/sysconfig/network-scripts/ifup-routes
Set up static routes for a device. An example file: 
#!/bin/sh

# adds static routes which go through device $1

if [ "$1" = "" ]; then
	echo "usage: $0 <net-device>"
	exit 1
fi

if [ ! -f /etc/sysconfig/static-routes ]; then
	exit 0
fi

#note the trailing space in the grep gets rid of aliases
grep "^$1 " /etc/sysconfig/static-routes | while read device args; do
	/sbin/route add -$args $device
done


/etc/sysconfig/network-scripts/ifup-aliases
Bring up aliases for a device.


/etc/sysconfig/network-scripts/ifdhcpc-done
Called by dhcpcd once dhcp configuration is complete; sets up /etc/resolv.conf from the version dhcpcd dropped in /etc/dhcpc/resolv.conf 


Note 3:
-------

Red Hat Linux 8.0: The Official Red Hat Linux Reference Guide 
Chapter 3. Boot Process, Init, and Shutdown 

--------------------------------------------------------------------------------

The /etc/sysconfig/ Directory
The following information outlines some of the files found in the /etc/sysconfig/ directory, their function, 
and their contents. This information is not intended to be complete, as many of these files have a variety 
of options that are only used in very specific or rare circumstances.

The /usr/share/doc/initscripts-<version-number>/sysconfig.txt file contains a more authoritative listing 
of the files found in the /etc/sysconfig directory and the configuration options available.

Files in the /etc/sysconfig/ Directory
The following files are normally found in the /etc/sysconfig/ directory:

amd
apmd
arpwatch
authconfig
cipe
clock
desktop
dhcpd
firstboot
gpm
harddisks
hwconf
i18n
identd
init
ipchains
iptables
irda
keyboard
kudzu
mouse
named
netdump
network
ntpd
pcmcia
radvd
rawdevices
redhat-config-users
redhat-logviewer
samba
sendmail
soundcard
squid
tux
ups
vncservers
xinetd

It is possible that your system may be missing a few of them if the corresponding program that would need 
that file is not installed.

Next, we will take a look at each one.

/etc/sysconfig/amd
The /etc/sysconfig/amd file contains various parameters used by amd allowing for the automounting and 
automatic unmounting of file systems.

/etc/sysconfig/apmd
The /etc/sysconfig/apmd file is used by apmd as a configuration for what things to start/stop/change 
on suspend or resume. It is set up to turn on or off apmd during startup, depending on whether your hardware 
supports Advanced Power Management (APM) or if you choose not to use it. apmd is a monitoring daemon that works 
with power management code within the Linux kernel. It can alert you to a low battery if you are using 
Red Hat Linux on a laptop, among other things.

/etc/sysconfig/arpwatch
The /etc/sysconfig/arpwatch file is used to pass arguments to the arpwatch daemon at boot time. 
The arpwatch daemon maintains a table of Ethernet MAC addresses and their IP address pairings. 
For more information about what parameters you can use in this file, type man arpwatch. By default, 
this file sets the owner of the arpwatch process to the user pcap.

/etc/sysconfig/authconfig
The /etc/sysconfig/authconfig file sets the kind of authorization to be used on the host. 
It contains one or more of the following lines:

USEMD5=<value>, where <value> is one of the following:

yes - MD5 is used for authentication.
no - MD5 is not used for authentication.

USEKERBEROS=<value>, where <value> is one of the following:

yes - Kerberos is used for authentication.
no - Kerberos is not used for authentication.

USELDAPAUTH=<value>, where <value> is one of the following:

yes - LDAP is used for authentication.
no - LDAP is not used for authentication.

/etc/sysconfig/clock
The /etc/sysconfig/clock file controls the interpretation of values read from the system hardware clock.

The correct values are:

UTC=<value>, where <value> is one of the following boolean values:

true or yes - Indicates that the hardware clock is set to Universal Time.
false or no - Indicates that the hardware clock is set to local time.

ARC=<value>, where <value> is the following:

true or yes - Indicates the ARC console's 42-year time offset is in effect. This setting is only 
for ARC- or AlphaBIOS-based Alpha systems. Any other value indicates that the normal UNIX epoch is in use.

SRM=<value>, where <value> is the following:

true or yes - Indicates the SRM console's 1900 epoch is in effect. This setting is only for SRM-based 
Alpha systems. Any other value indicates that the normal UNIX epoch is in use.

ZONE=<filename> - Indicates the timezone file under /usr/share/zoneinfo that /etc/localtime is a copy of, such as:

ZONE="America/New_York"


Earlier releases of Red Hat Linux used the following values (which are deprecated):

CLOCKMODE=<value>, where <value> is one of the following:

GMT - Indicates that the clock is set to Universal Time (Greenwich Mean Time).

ARC - Indicates the ARC console's 42-year time offset is in effect (for Alpha-based systems only).
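
A minimal sketch of how a boot script might turn the UTC setting into hwclock options. The sample values are assumptions, and the real startup script may build its flags differently; the command is only printed, not run.

```shell
# Values as they might appear in /etc/sysconfig/clock (illustrative).
UTC=true
ZONE="America/New_York"

clock_flags="--hctosys"
case "$UTC" in
    yes|true)  clock_flags="$clock_flags --utc" ;;        # hardware clock holds Universal Time
    no|false)  clock_flags="$clock_flags --localtime" ;;  # hardware clock holds local time
esac
echo "would run: /sbin/hwclock $clock_flags"
echo "/etc/localtime should be a copy of /usr/share/zoneinfo/$ZONE"
```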

/etc/sysconfig/desktop
The /etc/sysconfig/desktop file specifies the desktop manager to be run, such as:

DESKTOP="GNOME"

/etc/sysconfig/dhcpd
The /etc/sysconfig/dhcpd file is used to pass arguments to the dhcpd daemon at boot time. 
The dhcpd daemon implements the Dynamic Host Configuration Protocol (DHCP) and the Internet Bootstrap 
Protocol (BOOTP). DHCP and BOOTP assign IP addresses and related settings, such as hostnames, 
to machines on the network. For more information about what parameters you can use in this file, type man dhcpd.

/etc/sysconfig/firstboot
Beginning with Red Hat Linux 8.0, the first time you boot the system, the /sbin/init program calls 
the /etc/rc.d/init.d/firstboot script. This allows the user to install additional applications 
and documentation before the boot process completes.

The /etc/sysconfig/firstboot file tells the firstboot command not to run on subsequent reboots. 
If you want firstboot to run the next time you boot the system, simply remove /etc/sysconfig/firstboot 
and execute chkconfig --level 5 firstboot on.

/etc/sysconfig/gpm
The /etc/sysconfig/gpm file is used to pass arguments to the gpm daemon at boot time. The gpm daemon is the 
mouse server which allows mouse acceleration and middle-click pasting. For more information about what 
parameters you can use in this file, type man gpm. By default, it sets the mouse device to /dev/mouse.

/etc/sysconfig/harddisks
The /etc/sysconfig/harddisks file allows you to tune your hard drive(s). You can also use 
/etc/sysconfig/harddiskhd[a-h] to configure parameters for specific drives.

 Warning 
  Do not make changes to this file lightly. If you change the default values stored here, you could 
  corrupt all of the data on your hard drive(s).
 
The /etc/sysconfig/harddisks file may contain the following:

USE_DMA=1, where setting this to 1 enables DMA. However, with some chipsets and hard drive combinations, 
DMA can cause data corruption. Check with your hard drive documentation or manufacturer before enabling this.

MULTIPLE_IO=16, where a setting of 16 allows for multiple sectors per I/O interrupt. When enabled, 
this feature reduces operating system overhead by 30-50%. Use with caution.

EIDE_32BIT=3 enables (E)IDE 32-bit I/O support to an interface card.

LOOKAHEAD=1 enables drive read-lookahead.

EXTRA_PARAMS= specifies where extra parameters can be added.

/etc/sysconfig/hwconf
The /etc/sysconfig/hwconf file lists all the hardware that kudzu detected on your system, as well as 
the drivers used, vendor ID and device ID information. The kudzu program detects and configures new and/or 
changed hardware on a system. The /etc/sysconfig/hwconf file is not meant to be manually edited. 
If you do edit it, devices could suddenly show up as being added or removed.

/etc/sysconfig/i18n
The /etc/sysconfig/i18n file sets the default language, such as:

LANG="en_US"

/etc/sysconfig/identd
The /etc/sysconfig/identd file is used to pass arguments to the identd daemon at boot time. 
The identd daemon returns the username of processes with open TCP/IP connections. Some services on 
the network, such as FTP and IRC servers, will complain and cause slow responses if identd is not running. 
But in general, identd is not a required service, so if security is a concern, you should not run it. 
For more information about what parameters you can use in this file, type man identd. By default, 
the file contains no parameters.

/etc/sysconfig/init
The /etc/sysconfig/init file controls how the system will appear and function during the boot process.

The following values may be used:

BOOTUP=<value>, where <value> is one of the following:

BOOTUP=color means the standard color boot display, where the success or failure of devices and services starting up is shown in different colors.

BOOTUP=verbose means an old style display, which provides more information than purely a message of success or failure.

Anything else means a new display, but without ANSI-formatting.

RES_COL=<value>, where <value> is the number of the column of the screen to start status labels. Defaults to 60.

MOVE_TO_COL=<value>, where <value> moves the cursor to the value in the RES_COL line. Defaults to ANSI sequences output by echo -e.

SETCOLOR_SUCCESS=<value>, where <value> sets the color to a color indicating success. Defaults to ANSI sequences output by echo -e, setting the color to green.

SETCOLOR_FAILURE=<value>, where <value> sets the color to a color indicating failure. Defaults to ANSI sequences output by echo -e, setting the color to red.

SETCOLOR_WARNING=<value>, where <value> sets the color to a color indicating warning. Defaults to ANSI sequences output by echo -e, setting the color to yellow.

SETCOLOR_NORMAL=<value>, where <value> sets the color to 'normal'. Defaults to ANSI sequences output by echo -e.

LOGLEVEL=<value>, where <value> sets the initial console logging level for the kernel. The default is 7; 8 means everything (including debugging); 1 means nothing except kernel panics. syslogd will override this once it starts.

PROMPT=<value>, where <value> is one of the following boolean values:

yes - Enables the key check for interactive mode.

no - Disables the key check for interactive mode.

/etc/sysconfig/ipchains
The /etc/sysconfig/ipchains file contains information used by the kernel to set up ipchains packet filtering rules at boot time or whenever the service is started.

This file is modified by typing the command /sbin/service ipchains save when valid ipchains rules are in place. You should not manually edit this file. Instead, use the /sbin/ipchains command to configure the necessary packet filtering rules and then save the rules to this file using /sbin/service ipchains save.

Use of ipchains to set up firewall rules is not recommended as it is deprecated and may disappear from future releases of Red Hat Linux. If you need a firewall, you should use iptables instead.

/etc/sysconfig/iptables
Like /etc/sysconfig/ipchains, the /etc/sysconfig/iptables file stores information used by the kernel to set up packet filtering services at boot time or whenever the service is started.

You should not modify this file by hand unless you are familiar with how to construct iptables rules. The simplest way to add rules is to use the /usr/sbin/lokkit command or the gnome-lokkit graphical application to create your firewall. Using these applications will automatically edit this file at the end of the process.

If you wish, you can manually create rules using /sbin/iptables and then type /sbin/service iptables save to add the rules to the /etc/sysconfig/iptables file.

Once this file exists, any firewall rules saved there will persist through a system reboot or a service restart.

For more information on iptables see Chapter 13.

/etc/sysconfig/irda
The /etc/sysconfig/irda file controls how infrared devices on your system are configured at startup.

The following values may be used:

IRDA=<value>, where <value> is one of the following boolean values:

yes - irattach will be run, which periodically checks to see if anything is trying to connect to the infrared port, such as another notebook computer trying to make a network connection. For infrared devices to work on your system, this line must be set to yes.

no - irattach will not be run, preventing infrared device communication.

DEVICE=<value>, where <value> is the device (usually a serial port) that handles infrared connections.

DONGLE=<value>, where <value> specifies the type of dongle being used for infrared communication. This setting exists for people who use serial dongles rather than real infrared ports. A dongle is a device that is attached to a traditional serial port to communicate via infrared. This line is commented out by default because notebooks with real infrared ports are far more common than computers with add-on dongles.

DISCOVERY=<value>, where <value> is one of the following boolean values:

yes - Starts irattach in discovery mode, meaning it actively checks for other infrared devices. This needs to be turned on for the machine to be actively looking for an infrared connection (meaning the peer that does not initiate the connection).

no - Does not start irattach in discovery mode.

/etc/sysconfig/keyboard
The /etc/sysconfig/keyboard file controls the behavior of the keyboard. The following values may be used:

KEYBOARDTYPE=sun|pc, which is used on SPARCs only. sun means a Sun keyboard is attached on /dev/kbd, and pc means a PS/2 keyboard connected to a PS/2 port.

KEYTABLE=<file>, where <file> is the name of a keytable file.

For example: KEYTABLE="us". The files that can be used as keytables start in /lib/kbd/keymaps/i386 and branch into different keyboard layouts from there, all labeled <file>.kmap.gz. The first file found beneath /lib/kbd/keymaps/i386 that matches the KEYTABLE setting is used.

/etc/sysconfig/kudzu
The /etc/sysconfig/kudzu file allows you to specify a safe probe of your system's hardware by kudzu at boot time. A safe probe is one that disables serial port probing.

SAFE=<value>, where <value> is one of the following:

yes - kudzu does a safe probe.

no - kudzu does a normal probe.

/etc/sysconfig/mouse
The /etc/sysconfig/mouse file is used to specify information about the available mouse. The following values may be used:

FULLNAME=<value>, where <value> refers to the full name of the kind of mouse being used.

MOUSETYPE=<value>, where <value> is one of the following:

microsoft - A Microsoft™ mouse.

mouseman - A MouseMan™ mouse.

mousesystems - A Mouse Systems™ mouse.

ps/2 - A PS/2 mouse.

msbm - A Microsoft™ bus mouse.

logibm - A Logitech™ bus mouse.

atibm - An ATI™ bus mouse.

logitech - A Logitech™ mouse.

mmseries - An older MouseMan™ mouse.

mmhittab - An mmhittab mouse.

XEMU3=<value>, where <value> is one of the following boolean values:

yes - The mouse only has two buttons, but three mouse buttons should be emulated.

no - The mouse already has three buttons.




XMOUSETYPE=<value>, where <value> refers to the kind of mouse used when X is running. The options here are the same as the MOUSETYPE setting in this same file.

DEVICE=<value>, where <value> is the mouse device.

In addition, /dev/mouse is a symbolic link that points to the actual mouse device.

/etc/sysconfig/named
The /etc/sysconfig/named file is used to pass arguments to the named daemon at boot time. The named daemon is a Domain Name System (DNS) server which implements the Berkeley Internet Name Domain (BIND) version 9 distribution. This server maintains a table of which hostnames are associated with IP addresses on the network.

Currently, only the following values may be used:

ROOTDIR="</some/where>", where </some/where> refers to the full directory path of a configured chroot environment under which named will run. This chroot environment must first be configured. Type info chroot for more information on how to do this.

OPTIONS="<value>", where <value> is any option listed in the man page for named except -t. In place of -t, use the ROOTDIR line above instead.

For more information about what parameters you can use in this file, type man named. For detailed information on how to configure a BIND DNS server, see Chapter 16. By default, the file contains no parameters.

/etc/sysconfig/netdump
The /etc/sysconfig/netdump file is the configuration file for the /etc/init.d/netdump service. The netdump service sends both oops data and memory dumps over the network. In general, netdump is not a required service, so you should only run it if you absolutely need to. For more information about what parameters you can use in this file, type man netdump.

/etc/sysconfig/network
The /etc/sysconfig/network file is used to specify information about the desired network configuration. The following values may be used:

NETWORKING=<value>, where <value> is one of the following boolean values:

yes - Networking should be configured.

no - Networking should not be configured.

HOSTNAME=<value>, where <value> should be the Fully Qualified Domain Name (FQDN), such as hostname.domain.com, but can be whatever hostname you want.

 Note 
  For compatibility with older software that people might install (such as trn), the /etc/HOSTNAME file should contain the same value as here.
 

GATEWAY=<value>, where <value> is the IP address of the network's gateway.

GATEWAYDEV=<value>, where <value> is the gateway device, such as eth0.

NISDOMAIN=<value>, where <value> is the NIS domain name.
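
Because the file consists of plain shell assignments, startup scripts can simply read it into the shell. The sketch below does this with made-up values; the hostname, gateway and device are placeholders, not recommended settings.

```shell
# Illustrative contents of /etc/sysconfig/network (all values are placeholders).
network_conf='NETWORKING=yes
HOSTNAME=host1.example.com
GATEWAY=192.168.1.1
GATEWAYDEV=eth0'

# Evaluate the assignments, much as a script would after sourcing the real file.
eval "$network_conf"

if [ "$NETWORKING" = yes ]; then
    short_name=${HOSTNAME%%.*}     # host part of the FQDN
    echo "host $short_name uses gateway $GATEWAY via $GATEWAYDEV"
fi
```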

/etc/sysconfig/ntpd
The /etc/sysconfig/ntpd file is used to pass arguments to the ntpd daemon at boot time. The ntpd daemon sets and maintains the system clock to synchronize with an Internet standard time server. It implements version 4 of the Network Time Protocol (NTP). For more information about what parameters you can use in this file, point a browser at the following file: /usr/share/doc/ntp-<version>/ntpd.htm (where <version> is the version number of ntpd). By default, this file sets the owner of the ntpd process to the user ntp.

/etc/sysconfig/pcmcia
The /etc/sysconfig/pcmcia file is used to specify PCMCIA configuration information. The following values may be used:

PCMCIA=<value>, where <value> is one of the following:

yes - PCMCIA support should be enabled.

no - PCMCIA support should not be enabled.

PCIC=<value>, where <value> is one of the following:

i82365 - The computer has an i82365-style PCMCIA socket chipset.

tcic - The computer has a tcic-style PCMCIA socket chipset.

PCIC_OPTS=<value>, where <value> is the socket driver (i82365 or tcic) timing parameters.

CORE_OPTS=<value>, where <value> is the list of pcmcia_core options.

CARDMGR_OPTS=<value>, where <value> is the list of options for the PCMCIA cardmgr (such as -q for quiet mode; -m to look for loadable kernel modules in the specified directory, and so on). Read the cardmgr man page for more information.

/etc/sysconfig/radvd
The /etc/sysconfig/radvd file is used to pass arguments to the radvd daemon at boot time. The radvd daemon listens for router requests and sends router advertisements for the IP version 6 protocol. This service allows hosts on a network to dynamically change their default routers based on these router advertisements. For more information about what parameters you can use in this file, type man radvd. By default, this file sets the owner of the radvd process to the user radvd.

/etc/sysconfig/rawdevices
The /etc/sysconfig/rawdevices file is used to configure raw device bindings, such as:

/dev/raw/raw1 /dev/sda1
/dev/raw/raw2 8 5
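
Each line binds a raw device either to a block device name or to a major/minor number pair (on Linux, major 8 minor 5 is /dev/sda5). As a rough sketch of how such a file could be turned into raw commands, the snippet below parses sample content and prints the commands without running them. The comment line in the sample is an assumption added for illustration.

```shell
# Sample /etc/sysconfig/rawdevices content.
rawdevices='# format: <rawdev> <blockdev>  or  <rawdev> <major> <minor>
/dev/raw/raw1 /dev/sda1
/dev/raw/raw2 8 5'

bindings=$(printf '%s\n' "$rawdevices" | while read -r raw rest; do
    # Skip blank lines and comments.
    case "$raw" in
        ''|'#'*) continue ;;
    esac
    echo "raw $raw $rest"
done)
printf '%s\n' "$bindings"
```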

 

/etc/sysconfig/redhat-config-users
The /etc/sysconfig/redhat-config-users file is the configuration file for the graphical application, User Manager. Under Red Hat Linux 8.0 this file is used to filter out system users such as root, daemon, or lp. This file is edited by the Preferences => Filter system users and groups pull-down menu in the User Manager application and should not be edited by hand. For more information on using this application, see the chapter called User and Group Configuration in the Official Red Hat Linux Customization Guide.

/etc/sysconfig/redhat-logviewer
The /etc/sysconfig/redhat-logviewer file is the configuration file for the graphical, interactive log viewing application, Log Viewer. This file is edited by the Edit => Preferences pull-down menu in the Log Viewer application and should not be edited by hand. For more information on using this application, see the chapter called Log Files in the Official Red Hat Linux Customization Guide.

/etc/sysconfig/samba
The /etc/sysconfig/samba file is used to pass arguments to the smbd and the nmbd daemons at boot time. The smbd daemon offers file sharing connectivity for Windows clients on the network. The nmbd daemon offers NetBIOS over IP naming services. For more information about what parameters you can use in this file, type man smbd. By default, this file sets smbd and nmbd to run in daemon mode.

/etc/sysconfig/sendmail
The /etc/sysconfig/sendmail file allows messages to be sent to one or more recipients, routing the message over whatever networks are necessary. The file sets the default values for the Sendmail application to run. Its default values are to run as a background daemon, and to check its queue once an hour in case something has backed up.

The following values may be used:

DAEMON=<value>, where <value> is one of the following boolean values:

yes - Sendmail should be configured to listen to port 25 for incoming mail. yes implies the use of Sendmail's -bd option.

no - Sendmail should not be configured to listen to port 25 for incoming mail.

QUEUE=1h which is given to Sendmail as -q$QUEUE. The -q option is not given to Sendmail if /etc/sysconfig/sendmail exists and QUEUE is empty or undefined.
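
The two settings map onto Sendmail's command line roughly as sketched below. This is an illustration of the rules just described, not the actual init script, and the command is only printed.

```shell
# Values as they might appear in /etc/sysconfig/sendmail.
DAEMON=yes
QUEUE=1h

sendmail_args=""
# DAEMON=yes implies -bd (listen on port 25).
if [ "$DAEMON" = yes ]; then
    sendmail_args="-bd"
fi
# -q is only passed when QUEUE is set and non-empty.
if [ -n "$QUEUE" ]; then
    sendmail_args="${sendmail_args:+$sendmail_args }-q$QUEUE"
fi
echo "would run: /usr/sbin/sendmail $sendmail_args"
```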

/etc/sysconfig/soundcard
The /etc/sysconfig/soundcard file is generated by sndconfig and should not be modified. The sole use of this file is to determine what card entry in the menu to pop up by default the next time sndconfig is run. Sound card configuration information is located in the /etc/modules.conf file.

It may contain the following:

CARDTYPE=<value>, where <value> is set to, for example, SB16 for a Soundblaster 16 sound card.

/etc/sysconfig/squid
The /etc/sysconfig/squid file is used to pass arguments to the squid daemon at boot time. The squid daemon is a proxy caching server for Web client applications. For more information on configuring a squid proxy server, use a Web browser to open the /usr/share/doc/squid-<version>/ directory (replace <version> with the squid version number installed on your system). By default, this file sets squid to start in daemon mode and sets the amount of time before it shuts itself down.

/etc/sysconfig/tux
The /etc/sysconfig/tux file is the configuration file for the Red Hat Content Accelerator (formerly known as TUX), the kernel-based web server. For more information on configuring the Red Hat Content Accelerator, use a Web browser to open the /usr/share/doc/tux-<version>/tux/index.html (replace <version> with the version number of TUX installed on your system). The parameters available for this file are listed in /usr/share/doc/tux-<version>/tux/parameters.html.

/etc/sysconfig/ups
The /etc/sysconfig/ups file is used to specify information about any Uninterruptible Power Supplies (UPS) connected to your system. A UPS can be very valuable for a Red Hat Linux system because it gives you time to correctly shut down the system in the case of power interruption. The following values may be used:

SERVER=<value>, where <value> is one of the following:

yes - A UPS device is connected to your system.

no - A UPS device is not connected to your system.

MODEL=<value>, where <value> must be one of the following or set to NONE if no UPS is connected to the system:

apcsmart - For an APC SmartUPS™ or similar device.

fentonups - For a Fenton UPS™.

optiups - For an OPTI-UPS™ device.

bestups - For a Best Power™ UPS.

genericups - For a generic brand UPS.

ups-trust425+625 - For a Trust™ UPS.

DEVICE=<value>, where <value> specifies where the UPS is connected, such as /dev/ttyS0.

OPTIONS=<value>, where <value> is a special command that needs to be passed to the UPS.

/etc/sysconfig/vncservers
The /etc/sysconfig/vncservers file configures the way the Virtual Network Computing (VNC) server starts up.

VNC is a remote display system which allows you to view a desktop environment not only on the machine where it is running but across different networks on a variety of architectures.

It may contain the following:

VNCSERVERS=<value>, where <value> is set to something like "1:fred", to indicate that a VNC server should be started for user fred on display :1. User fred must have set a VNC password using vncpasswd before attempting to connect to the remote VNC server.
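
Each entry has the form display:user. A simplified sketch of how a startup script could split the entries is shown below; the second entry (wilma) is a hypothetical addition, and the real init script may do more checks before starting a server. The commands are printed, not executed.

```shell
# Value as it might appear in /etc/sysconfig/vncservers.
VNCSERVERS="1:fred 2:wilma"

vnc_plan=$(for entry in $VNCSERVERS; do
    display=${entry%%:*}    # text before the colon: display number
    user=${entry##*:}       # text after the colon: user name
    echo "su $user -c 'vncserver :$display'"
done)
printf '%s\n' "$vnc_plan"
```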

Note that when you use a VNC server, your communication with it is unencrypted, and so it should not be used on an untrusted network. For specific instructions concerning the use of SSH to secure the VNC communication, please read the information found at http://www.uk.research.att.com/vnc/sshvnc.html. To find out more about SSH, see Chapter 9 or Official Red Hat Linux Customization Guide.

/etc/sysconfig/xinetd
The /etc/sysconfig/xinetd file is used to pass arguments to the xinetd daemon at boot time. 
The xinetd daemon starts programs that provide Internet services when a request to the port for that service 
is received. For more information about what parameters you can use in this file, type man xinetd. 
For more information on the xinetd service, see the Section called Access Control Using xinetd in Chapter 8.

Directories in the /etc/sysconfig/ Directory
The following directories are normally found in /etc/sysconfig/ and a basic description of what they contain:

apm-scripts - This contains the Red Hat APM suspend/resume script. You should not edit this file directly. If you need customization, simply create a file called /etc/sysconfig/apm-scripts/apmcontinue and it will be called at the end of the script. Also, you can control the script by editing /etc/sysconfig/apmd.

cbq - This directory contains the configuration files needed to do Class Based Queuing for bandwidth management on network interfaces.

networking - This directory is used by the Network Administration Tool (redhat-config-network) and its contents should not be edited manually. For more information about configuring network interfaces using the Network Administration Tool, see the chapter called Network Configuration in the Official Red Hat Linux Customization Guide.

network-scripts - This directory contains the following network-related configuration files:

Network configuration files for each configured network interface, such as ifcfg-eth0 for the eth0 Ethernet interface.

Scripts used to bring up and down network interfaces, such as ifup and ifdown.

Scripts used to bring up and down ISDN interfaces, such as ifup-isdn and ifdown-isdn.

Various shared network function scripts which should not be edited directly.

For more information on the network-scripts directory, see Chapter 12.

rhn - This directory contains the configuration files and GPG keys for the Red Hat Network. No files in this directory should be edited by hand. For more information on the Red Hat Network, see the Red Hat Network website at the following URL: https://rhn.redhat.com.




39.5 More on AIX kernel parameters:
-----------------------------------

Throughout this document, you can find many AIX kernel parameter statements.
Most commands are related to retrieving or changing attributes on the sys0 object.

Please see section 9.2 for a complete description.

For example, take a look at the maxuproc parameter:

  maxuproc:    Specifies the maximum number of processes per user ID. 
  Values:      Default: 40; Range: 1 to 131072 
  Display:     lsattr -E -l sys0 -a maxuproc 
  Change:      chdev -l sys0 -a maxuproc=NewValue 
               Change takes effect immediately and is preserved over boot. If value is reduced, 
               then it goes into effect only after a system boot. 
  Diagnosis:   Users cannot fork any additional processes. 
  Tuning:      This is a safeguard to prevent users from creating too many processes. 
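
The Display/Change pair above can be combined in a small script. The sketch below works on a sample output line so it can be tried anywhere; the value 128 and the target 256 are assumptions. On a real AIX system the sample line would be replaced by the output of "lsattr -E -l sys0 -a maxuproc".

```shell
# Sample line in the layout printed by lsattr -E: attribute, value, description, settable.
lsattr_line='maxuproc 128 Maximum number of PROCESSES allowed per user True'

current=$(printf '%s\n' "$lsattr_line" | awk '{print $2}')
wanted=256

if [ "$wanted" -gt "$current" ]; then
    effect="takes effect immediately"
elif [ "$wanted" -lt "$current" ]; then
    effect="takes effect after the next system boot"
else
    effect="no change needed"
fi
# Print the change command instead of running it.
echo "chdev -l sys0 -a maxuproc=$wanted   ($effect)"
```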


Kernel Tunable Parameters
The kernel parameters are grouped into the following sections:

-Scheduler and Memory Load Control Tunable Parameters
-Virtual Memory Manager Tunable Parameters
-Synchronous I/O Tunable Parameters
-Asynchronous I/O Tunable Parameters
-Disk and Disk Adapter Tunable Parameters
-Interprocess Communication Tunable Parameters

Scheduler and Memory Load Control Tunable Parameters
Most of the scheduler and memory load control tunable parameters are fully described in the schedo man page.
The following are a few other related parameters:



40. NFS:
========

On Solaris:
-----------

NFS uses a number of daemons to handle its services. These services are initialized at startup
from the "/etc/init.d/nfs.server" and "/etc/init.d/nfs.client" startup scripts.

nfsd:		handles filesystem exporting and file access from remote systems
mountd:		handles mount requests from nfs clients. provides also info about which filesystems
		are mounted by which clients. use the showmount command to view this information.
lockd:		runs on nfs server and nfs clients and provides locking services
statd:		runs on nfs server and nfs clients and provides crash and recovery functions for lockd
rpcbind:	facilitates the initial connection between client and server
nfslogd:	provides logging 


On AIX:
-------

To start the NFS daemons for each system, whether client or Server, you can use either

# smitty mknfs
# mknfs -N  (or -B or -I)

The mknfs command configures the system to run the NFS daemons. The command also adds an entry 
to the /etc/inittab file, so that the /etc/rc.nfs file is executed on system restart.

mknfs flags:

-B: adds an entry to the inittab and also executes /etc/rc.nfs to start the daemons now.
-I: adds an entry to the inittab to execute rc.nfs at system restart.
-N: executes rc.nfs now to start the daemons.

The NFS daemons can be started individually or all at once. To start individual daemons, you can use
the System Resource Controller:

# startsrc -s <daemon>
for example:
# startsrc -s nfsd

To start the complete nfs system:
(good command)

# startsrc -g nfs


Exporting NFS directories:

To export filesystems using smitty, follow this procedure:

1. Verify that NFS is already running using the command "lssrc -g nfs". The output should indicate
that the nfsd and rpc.mountd daemons are active.

# lssrc -g nfs
Subsystem           Group         PID        Status
biod                nfs           1234       active
nfsd                nfs           5678       active
rpc.mountd          nfs           9101       active
rpc.statd           nfs           1213       active
rpc.lockd           nfs           1516       active

2. To export the directory use either

# smitty mknfsexp       or
# mknfsexp              or
# edit the /etc/exports file, like for example
  vi /etc/exports

  /home1
  /home2
  etc..
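
Step 1 of the procedure above can be scripted. The sketch below checks sample "lssrc -g nfs" output for the two daemons that must be active before exporting; on a real system the sample text would be replaced by lssrc_out=$(lssrc -g nfs). The PIDs are the illustrative ones from the listing above.

```shell
# Sample output in the layout shown in step 1.
lssrc_out='Subsystem           Group         PID        Status
biod                nfs           1234       active
nfsd                nfs           5678       active
rpc.mountd          nfs           9101       active
rpc.statd           nfs           1213       active
rpc.lockd           nfs           1516       active'

missing=""
for daemon in nfsd rpc.mountd; do
    # The status is the last column of the line whose first column is the daemon name.
    state=$(printf '%s\n' "$lssrc_out" | awk -v d="$daemon" '$1 == d {print $NF}')
    [ "$state" = active ] || missing="$missing $daemon"
done

if [ -z "$missing" ]; then
    echo "NFS server daemons are active, exports can be added"
else
    echo "not active:$missing"
fi
```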

  



41. NETWORK COMMANDS AND FILES:
===============================


41.1 SOLARIS:
=============

ifconfig:
---------

ifconfig enables or disables a network interface, sets its IP address, subnet mask, and sets
various other options.

syntax: 
ifconfig interface address options .. up

Examples:

# ifconfig -a
Displays the IP address and MAC address of each of the system's interfaces.

# ifconfig en0 128.138.240.1 netmask 255.255.255.0 up
# ifconfig lo0 127.0.0.1 up
# ifconfig en0 128.138.243.151 netmask 255.255.255.192 broadcast 128.138.243.191 up

An identifier such as en0 identifies the network interface to which the command applies.
Some common names are ie0, le0, le1, en0, we0, qe0, hme0, eth0, lan0, lo0

Under Solaris, network interfaces must be attached with "ifconfig interface plumb"
before they become configurable.
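
The broadcast address in the third ifconfig example follows from the address and the netmask: keep the network bits and set all host bits to 1. A small sketch of that arithmetic is shown below; the function name broadcast_of is made up for this example.

```shell
# Compute the broadcast address for a dotted-quad address and netmask.
broadcast_of() {
    old_ifs=$IFS
    IFS=.
    # Split both arguments on dots: $1..$4 = address octets, $5..$8 = netmask octets.
    set -- $1 $2
    IFS=$old_ifs
    # Per octet: (address AND mask) OR (inverted mask).
    echo "$(( ($1 & $5) | (255 - $5) )).$(( ($2 & $6) | (255 - $6) )).$(( ($3 & $7) | (255 - $7) )).$(( ($4 & $8) | (255 - $8) ))"
}

broadcast_of 128.138.243.151 255.255.255.192
```

For the last octet: 151 AND 192 is 128, the inverted mask is 63, and 128 OR 63 gives 191, which matches the broadcast address used in the example.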

rpcinfo:
--------

This utility can list all registered RPC services running on a system, for example 

# rpcinfo -p 192.168.1.21

You can also unregister an rpc service using the -d option, for example

# rpcinfo -d sprayd 1

which would unregister version 1 of sprayd, so that clients can no longer reach it.


route:
------

The route command defines static routes.

Syntax:
route [-f] add/delete destination gateway [hop-count]

# route add default gateway_ipaddress


files:
------

- /etc/hostname.interface
The file contains the hostname or IP address associated with the network interface.
Suppose the system is called system1 and the interface is le0
then the file would be "hostname.le0" and contains the entry "system1".

- /etc/nodename
The file should contain one entry: the hostname of the local machine.

- /etc/defaultdomain
The file is present if the network uses a name service. The file should contain
one entry: the fully qualified Domain name of the administrative domain to which
the local host belongs.

- /etc/inet/hosts or /etc/hosts
This is the well known local hosts file, which resolves names to IP addresses.
The /etc/hosts is a symbolic link to /etc/inet/hosts.

- /etc/defaultrouter
This file should contain an entry for each router directly connected to the network.

- /etc/inetd.conf
The inetd daemon runs on behalf of other network services. It starts the appropriate server process
when a request for that service is received. The /etc/inetd.conf file lists the services that
inetd is to provide.

- /etc/services
This file lists the well known ports.

- /etc/hosts.equiv
This file contains a list of trusted hosts for a remote system, one per line.
It has the following structure:
system1
system2 user_a

If the user attempts to login remotely by using rlogin from one of the hosts listed
in this file, the system allows the user to login without a password.

~/.rhosts

This file is the user equivalent of /etc/hosts.equiv file. This is normally regarded as a security hole.
This file could be found in a user home directory. It could contain the name of a remote host
that want, for example, copy files to this host.

- /etc/resolv.conf

Create or edit /etc/resolv.conf

Here you tell it three things:

 - What domain we're in
 - Any additional search domains (put them all on one "search" line; if the keyword
   is repeated, only the last line is used)
 - What the nameservers are (it will use them in the order you put them in the file)

When you're done it should look something like this:


# cat resolv.conf
domain yourdomain.com
search yourdomain.com client1.com
nameserver 192.168.0.9
nameserver 192.168.0.11
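
A quick way to see which name servers a host will try, and in what order, is to pull the
"nameserver" lines out of the file. The function below is only a small sketch; the name
list_nameservers is made up for this note and is not a system command.

```shell
#!/bin/sh
# list_nameservers FILE   (hypothetical helper, not a system command)
# Print the addresses on the "nameserver" lines of a resolv.conf-style file,
# in the order they appear. Commented-out lines never match, because their
# first field is "#" or "#nameserver" rather than "nameserver".
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$1"
}

# Example:
#   list_nameservers /etc/resolv.conf
```

The first address printed is the one the resolver tries first.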



41.2 AIX:
=========

41.2.1 Network initialization at boot:
------------------------------------

At IPL time, the init process will run the /etc/rc.tcpip after starting the SRC.
This is so because in /etc/inittab the following record is present:

rctcpip:23456789:wait:/etc/rc.tcpip > /dev/console 2>&1 # Start TCP/IP daemons

The /etc/rc.tcpip file is a shell script that uses SRC commands to initialize selected daemons.
It can also be executed at any time from the command line.
These daemons are:

inetd (started by default), gated, routed, named, timed, rwhod

There are also daemons specific to the BOS or to other applications that can be started through
the rc.tcpip file. These daemons are lpd, portmap, sendmail, syslogd (started by default)

The subsystems started from rc.tcpip can be stopped and restarted using the stopsrc and startsrc commands.

Example:
# stopsrc -s inetd

To configure tcp/ip use the command

# mktcpip

or use smitty

# smitty mktcpip (only for the first time)
# smitty tcpip
# smitty inet  OR smitty chgenet (for configuring the network interface)
# smitty configtcp (many advanced options)

or use the Web-based System manager. 

Smitty uses a number of screens to guide you through the process. As an example of the mktcpip command
itself, take a look at the following:

# mktcpip -h server1 -a 10.10.10.5 -m 255.255.255.0 -i en0 \
-n 10.10.10.254 -d abc.xyz.nl -g 10.10.10.254 -s -C -A no

If you need to further configure your network, use

# smitty configtcp


41.2.2 resolving hostnames and /etc/netsvc.conf:
-----------------------------------------

The default order in resolving host names is: 

- BIND/DNS (named) 
- Network Information Service (NIS) 
- Local /etc/hosts file 

The default order can be overridden by creating the configuration file /etc/netsvc.conf and specifying
the desired order. Both the default and /etc/netsvc.conf can be overridden with the environment variable NSORDER.

If /etc/netsvc.conf does not exist, the system behaves as if it contained the following entry:

hosts = bind,nis,local

You can also override the order by setting the NSORDER environment variable. If it is not set,
the system behaves as if you had issued the command:

export NSORDER=bind,nis,local
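
The precedence described above (NSORDER first, then /etc/netsvc.conf, then the built-in default)
can be sketched as a small shell function. This is only an illustration of the rule, not how the
resolver is implemented; the function name effective_host_order is made up, and taking the last
"hosts" line when the file has several is an assumption.

```shell
#!/bin/sh
# effective_host_order [NETSVC_FILE]   (hypothetical helper)
# Print the host resolution order using this precedence:
#   1. the NSORDER environment variable, if set and non-empty
#   2. the "hosts = ..." entry in the netsvc.conf file, if present
#   3. the default order bind,nis,local
effective_host_order() {
    conf=${1:-/etc/netsvc.conf}
    if [ -n "${NSORDER:-}" ]; then
        echo "$NSORDER"
        return 0
    fi
    if [ -r "$conf" ]; then
        # Assumption: if several "hosts" lines exist, use the last one.
        order=$(sed -n 's/^[[:space:]]*hosts[[:space:]]*=[[:space:]]*//p' "$conf" \
                | tail -n 1 | tr -d '[:space:]')
        if [ -n "$order" ]; then
            echo "$order"
            return 0
        fi
    fi
    echo "bind,nis,local"
}

# Example:
#   effective_host_order                 # uses /etc/netsvc.conf
#   NSORDER=local,bind effective_host_order
```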


the /etc/resolv.conf file:
--------------------------

If you use name services, you can provide the minimal information needed through the mktcpip command.
Typically, the "/etc/resolv.conf" file stores your domain name and name server ip addresses.
The mktcpip command creates or updates the /etc/resolv.conf file for you.


41.2.3 Adapter:
---------------

When an adapter is added to the system, a logical device is created in the ODM, for example 
Ethernet adapters as follows:

# lsdev -Cc adapter | grep ent
ent0   Available 10-80   IBM PCI Ethernet Adapter (22100020)
ent1   Available 20-60   Gigabit Ethernet-SX PCI Adapter (14100401)


So you will have an adapter, and a corresponding interface, like for example
The Adapter is       : ent0
Then the interface is: en0

To list all interfaces on the system, use:

# lsdev -Cc if
en0 Defined   10-80  Standard Ethernet Network Interface
en1 Defined   20-60  Standard Ethernet Network Interface
et0 Defined   10-80  IEEE 802.3 Ethernet Network Interface
et1 Defined   20-60  IEEE 802.3 Ethernet Network Interface
lo0 Available        Loopback Network Interface

A corresponding network interface will allow tcpip to use the adapter.
Most of the time, we will deal with auto-detectable adapters, but in some cases an interface might 
need to be created manually with
# smitty inet  or   smitty mkinet

To change or view attributes like duplex settings, use
# smitty chgenet 

more info:

An Ethernet adapter can have 2 interfaces: Standard Ethernet (enX) or IEEE 802.3 (etX). X is the same number
as in the entX adapter name, so for example adapter ent0 has the interfaces en0 and et0.
Only one of these interfaces can be using TCP/IP at a time.
An ATM adapter (atmX) can have only one atm interface (atX). For example ATM adapter atm0 has an at0 interface.
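
The naming rule above can be written down as a small shell function, which is handy in scripts
that loop over the adapters reported by "lsdev -Cc adapter". The function name
interfaces_for_adapter is made up for this note, and only the two adapter types mentioned above
are covered.

```shell
#!/bin/sh
# interfaces_for_adapter ADAPTER   (hypothetical helper)
# Print the interface name(s) that belong to an adapter name:
#   entX -> enX etX    (Standard Ethernet and IEEE 802.3)
#   atmX -> atX
interfaces_for_adapter() {
    case "$1" in
        ent[0-9]*) n=${1#ent}; echo "en$n et$n" ;;
        atm[0-9]*) n=${1#atm}; echo "at$n" ;;
        *) echo "unknown adapter type: $1" >&2; return 1 ;;
    esac
}

# Example:
#   interfaces_for_adapter ent1
```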


41.2.4 Other stuff:
-------------------

iptrace:
--------

The iptrace command can be used to record the packets that are exchanged on an interface to and from
a remote host. This is like a Solaris snoop facility.

Examples

  1. To start the iptrace daemon with the System Resource Controller (SRC),
     enter:
     startsrc -s iptrace -a "/tmp/nettrace"

     To stop the iptrace daemon with SRC enter the following:
     stopsrc -s iptrace

  2. To record packets coming in and going out to any host on every interface,
     enter the command in the following format:

     iptrace /tmp/nettrace

     The recorded packets are received on and sent from the local host. All
     packet flow between the local host and all other hosts on any interface is
     recorded. The trace information is placed into the /tmp/nettrace file.
  3. To record packets received on an interface from a specific remote host,
     enter the command in the following format:

     iptrace -i en0 -p telnet -s airmail /tmp/telnet.trace

     The packets to be recorded are received on the en0 interface, from remote
     host airmail, over the telnet port. The trace information is placed into the
     /tmp/telnet.trace file.
  4. To record packets coming in and going out from a specific remote host,
     enter the command in the following format:

     iptrace -i en0 -s airmail -b /tmp/telnet.trace

     The packets to be recorded are those exchanged in both directions (-b) on
     the en0 interface with remote host airmail. The trace information is placed
     into the /tmp/telnet.trace file.



Adding routes:
--------------

Use smitty mkroute
or use the route add command, like for example:

# route add -net 192.168.1 -netmask 255.255.255.0 9.3.1.124

Changing the IP Address:
------------------------

You can check whether the interfaces have IP addresses assigned to them with
# ifconfig -a
# ifconfig <interface>

Changing the IP address:

# smitty mktcpip
# smitty chinet

or use the ifconfig command, like for example:

# ifconfig tr0 up                                   # activate interface
# ifconfig tr0 down                                 # deactivate interface
# ifconfig tr0 detach                               # removes the interface
# ifconfig tr0                                      # put it back again
# ifconfig tr0 delete                               # delete the IP address
# ifconfig en0 10.1.2.3 netmask 255.255.255.0 up    # configure IP params on the interface

You can even use the chdev command like:

# chdev -l en0 -a netaddr='9.3.240.58' -a netmask='255.255.255.0'

Smitty and chdev update the ODM database and so make the changes permanent, while changes made with ifconfig
are lost at the next reboot.


hosts.equiv and .rhosts files:
------------------------------

- /etc/hosts.equiv
This file contains a list of trusted hosts for a remote system, one per line.
It has the following structure:
system1
system2 user_a

If a user attempts to log in remotely using rlogin from one of the hosts listed
in this file, the system allows the user to log in without a password.

- ~/.rhosts

This file is the per-user equivalent of the /etc/hosts.equiv file. It is normally regarded as a security hole.

For example, to allow all the users on the hosts toaster and starboss to log in to the local host,
you would have a hosts.equiv file like

toaster
starboss

To allow only the user bob to login from starboss, you would have

toaster
starboss bob

To allow the user lester to login from any host, you would have

toaster
starboss bob
+ lester

Show statistics and collisions of an interface:
-----------------------------------------------

# entstat -d en0

This command shows the adapter's transmit and receive statistics, including collisions, and with -d also
device-specific details such as the media speed.


Check the current routing table:
--------------------------------

# netstat -nr

Adding or changing routes can be done by using "smitty mkroute".

If your system is going to be configured as a static router (it has 2 or more network interface cards),
then it needs to be enabled as a router by the no command, that is the network option command, for example

# no -o ipforwarding=1

note:
-----

The no command is used to configure network attributes. It sets or displays current
network attributes in the kernel, and only operates on the currently running kernel.
Whether the command sets or displays an attribute is determined by the accompanying flag:
the -o flag does both (with "attribute=value" it sets the attribute, with just "attribute" it displays it).

Some examples:

# no -o thewall=3072
# no -o tcp_sendspace=16384
# no -o ipqmaxlen=512       (controls the number of incoming packets that can exist on the IP interrupt queue)


# no -a

                 arpqsize = 12
               arpt_killc = 20
              arptab_bsiz = 7
                arptab_nb = 149
                bcastping = 0
      clean_partial_conns = 1
                 delayack = 0
            delayackports = {}
         dgd_packets_lost = 3
            dgd_ping_time = 5
           dgd_retry_time = 5
       directed_broadcast = 0
         extendednetstats = 0
                 fasttimo = 200
        icmp6_errmsg_rate = 10
          icmpaddressmask = 0
ie5_old_multicast_mapping = 0
                   ifsize = 256
          inet_stack_size = 16
               ip6_defttl = 64
                ip6_prune = 1
            ip6forwarding = 0
       ip6srcrouteforward = 0
       ip_ifdelete_notify = 0
                 ip_nfrag = 200
             ipforwarding = 0
                ipfragttl = 2
        ipignoreredirects = 1
                ipqmaxlen = 100
          ipsendredirects = 1
        ipsrcrouteforward = 0
           ipsrcrouterecv = 0
           ipsrcroutesend = 0
          llsleep_timeout = 3
                  lo_perf = 1
                lowthresh = 90
                 main_if6 = 0
               main_site6 = 0
                 maxnip6q = 20
                   maxttl = 255
                medthresh = 95
               mpr_policy = 1
              multi_homed = 1
                nbc_limit = 891289
            nbc_max_cache = 131072
            nbc_min_cache = 1
         nbc_ofile_hashsz = 12841
                 nbc_pseg = 0
           nbc_pseg_limit = 1048576
           ndd_event_name = {all}
        ndd_event_tracing = 0
            ndp_mmaxtries = 3
            ndp_umaxtries = 3
                 ndpqsize = 50
                ndpt_down = 3
                ndpt_keep = 120
               ndpt_probe = 5
           ndpt_reachable = 30
             ndpt_retrans = 1
             net_buf_size = {all}
             net_buf_type = {all}
        net_malloc_police = 0
           nonlocsrcroute = 0
                 nstrpush = 8
              passive_dgd = 0
         pmtu_default_age = 10
              pmtu_expire = 10
 pmtu_rediscover_interval = 30
              psebufcalls = 20
                 psecache = 1
             pseintrstack = 24576
                psetimers = 20
           rfc1122addrchk = 0
                  rfc1323 = 0
                  rfc2414 = 1
             route_expire = 1
          routerevalidate = 0
                 rto_high = 64
               rto_length = 13
                rto_limit = 7
                  rto_low = 1
                     sack = 0
                   sb_max = 1048576
       send_file_duration = 300
              site6_index = 0
               sockthresh = 85
                  sodebug = 0
              sodebug_env = 0
                somaxconn = 1024
                 strctlsz = 1024
                 strmsgsz = 0
                strthresh = 85
               strturncnt = 15
          subnetsarelocal = 1
       tcp_bad_port_limit = 0
                  tcp_ecn = 0
       tcp_ephemeral_high = 65535
        tcp_ephemeral_low = 32768
             tcp_finwait2 = 1200
           tcp_icmpsecure = 0
          tcp_init_window = 0
    tcp_inpcb_hashtab_siz = 24499
              tcp_keepcnt = 8
             tcp_keepidle = 14400
             tcp_keepinit = 150
            tcp_keepintvl = 150
     tcp_limited_transmit = 1
              tcp_low_rto = 0
             tcp_maxburst = 0
              tcp_mssdflt = 1460
          tcp_nagle_limit = 65535
        tcp_nagleoverride = 0
               tcp_ndebug = 100
              tcp_newreno = 1
           tcp_nodelayack = 0
        tcp_pmtu_discover = 0
            tcp_recvspace = 16384
            tcp_sendspace = 16384
            tcp_tcpsecure = 0
             tcp_timewait = 1
                  tcp_ttl = 60
           tcprexmtthresh = 3
                  thewall = 1048576
         timer_wheel_tick = 0
       udp_bad_port_limit = 0
       udp_ephemeral_high = 65535
        udp_ephemeral_low = 32768
    udp_inpcb_hashtab_siz = 24499
        udp_pmtu_discover = 0
            udp_recvspace = 42080
            udp_sendspace = 9216
                  udp_ttl = 30
                 udpcksum = 1
                 use_isno = 1
           use_sndbufpool = 1
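
Because the listing above is plain "name = value" text, a script can pick out a single attribute
with awk instead of scrolling through the whole list. The helper below is a sketch; the name
no_value is made up, and it only assumes the "name = value" layout shown above.

```shell
#!/bin/sh
# no_value NAME   (hypothetical helper)
# Read "no -a" style output ("name = value" lines) on stdin and print the
# value of attribute NAME. Exit status is non-zero if NAME is not found.
no_value() {
    awk -v name="$1" '
        $1 == name && $2 == "=" { print $3; found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Example on AIX:
#   no -a | no_value ipforwarding
```

For a single attribute "no -o ipforwarding" gives the same information directly; the function is
mostly useful when the output of "no -a" was saved to a file earlier and you want to compare values.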



rcp command:
------------

Purpose
Transfers files between a local and a remote host or between two remote hosts.

Syntax

rcp [ -p] [ -F] [ -k realm ] { { User@Host:File | Host:File | File } 
    { User@Host:File | Host:File | File | User@Host:Directory | Host:Directory | Directory } | 
    [ -r] { User@Host:Directory | Host:Directory |Directory } { User@Host:Directory | Host:Directory | Directory } }

-r Recursively copies 

Description
The /usr/bin/rcp command is used to copy one or more files between the local host and a remote host, 
between two remote hosts, or between files at the same remote host.

Remote destination files and directories require a specified Host: parameter. If a remote host name is not 
specified for either the source or the destination, the rcp command is equivalent to the cp command. 
Local file and directory names do not require a Host: parameter

- Using Standard Authentication
The remote host allows access if one of the following conditions is satisfied:

The local host is included in the remote host /etc/hosts.equiv file and the remote user is not the root user. 
The local host and user name is included in a $HOME/.rhosts file on the remote user account.
Although you can set any permissions for the $HOME/.rhosts file, it is recommended that the permissions 
of the .rhosts file be set to 600 (read and write by owner only).

In addition to the preceding conditions, the rcp command also allows access to the remote host if the 
remote user account does not have a password defined. However, for security reasons, the use of a password 
on all user accounts is recommended.

- For Kerberos 5 Authentication
The remote host allows access only if all of the following conditions are satisfied:

The local user has current DCE credentials. 
The local and remote systems are configured for Kerberos 5 authentication (On some remote systems, 
this may not be necessary. It is necessary that a daemon is listening to the klogin port). 
The remote system accepts the DCE credentials as sufficient for access to the remote account. 
See the kvalid_user function for additional information.

Examples:

In the following examples, the local host is listed in the /etc/hosts.equiv file at the remote host.

- To copy a local file to a remote host, enter: 

# rcp localfile host2:/home/eng/jane
The file localfile from the local host is copied to the remote host host2.

- The following example uses rcp to copy the local file, YTD_sum from the directory /usr/reports 
on the local host to the file year-end in the directory /usr/acct on the remote host moon: 

# rcp /usr/reports/YTD_sum  moon:/usr/acct/year-end 

- To copy a remote file from one remote host to another remote host, enter:  

# rcp host1:/home/eng/jane/newplan host2:/home/eng/mary
The file /home/eng/jane/newplan is copied from remote host host1 to remote host host2.

- To send the directory subtree from the local host to a remote host and preserve the modification times and modes, 
enter: 
# rcp  -p  -r report jane@host2:report

The directory subtree report is copied from the local host to the home directory of user jane 
at remote host host2 and all modes and modification times are preserved. 
The remote file /home/jane/.rhosts includes an entry specifying the local host and user name. 

Note:
rcp is of course used to copy files between unix systems. On nt/w2k/xp computers, rcp could be available
with some different syntax, like
rcp [{-a | -b}] [-h] [-r] [Host][.User:] [Source] [Host][.User:] [Path\Destination]


Notes on the FTP services:
==========================

Note 1:
=======

Have a look at '/usr/lpp/tcpip/samples/anon.ftp'. It is a shell script
that will set up an anonymous ftp site on your local RS/6000.  Note: the
ftpd that comes with AIX does not support displaying messages each
time a user changes directory, or even when they log in.

Note 2:
=======

ftpd Daemon
Purpose
Provides the server function for the Internet FTP protocol.

Syntax
Note: The ftpd daemon is normally started by the inetd daemon. It can also be controlled from the command line, 
using SRC commands.
/usr/sbin/ftpd [ -d ] [ -k ] [ -l ] [ -t TimeOut ] [ -T MaxTimeOut ] [ -s ] [ -u OctalVal ]


Description
The /usr/sbin/ftpd daemon is the DARPA Internet File Transfer Protocol (FTP) server process. The ftpd daemon 
uses the Transmission Control Protocol (TCP) to listen at the port specified with the ftp command service 
specification in the /etc/services file. 

Changes to the ftpd daemon can be made using the System Management Interface Tool (SMIT) or 
System Resource Controller (SRC), by editing the /etc/inetd.conf or /etc/services file. 
Entering ftpd at the command line is not recommended. The ftpd daemon is started by default when it is 
uncommented in the /etc/inetd.conf file.

The inetd daemon gets its information from the /etc/inetd.conf file and the /etc/services file.

- The ftpaccess.ctl file:

The /etc/ftpaccess.ctl file is searched for lines that start with allow:, deny:, readonly:, writeonly:, 
readwrite:, useronly:, grouponly:, herald: and/or motd:. Other lines are ignored. If the file doesn't exist, 
then ftp access is allowed for all hosts. The allow: and deny: lines are for restricting host access. 
The readonly:, writeonly: and readwrite: lines are for restricting ftp reads (get) and writes (put). 
The useronly: and grouponly: lines are for defining anonymous users. The herald: and motd: lines are 
for multiline messages before and after login.

- If the current authentication method is the Standard Operating system authentication method:
Before the ftpd daemon can transfer files for a client process, it must authenticate the client process. 
The ftpd daemon authenticates client processes according to these rules:

The user must have a password in the password database, /etc/security/passwd. 
(If the user's password is not null, the client process must provide that password.) 
The user name must not appear in the /etc/ftpusers file. 
The user's login shell must appear in the shells attribute of the /etc/security/login.cfg file. 
If the user name is anonymous, ftp or is a defined anonymous user in the /etc/ftpaccess.ctl file, 
an anonymous FTP account must be defined in the password file. In this case, the client process 
is allowed to log in using any password. By convention, the password is the name of the client host. 
The ftpd daemon takes special measures to restrict access by the client process to the anonymous account.


Note 3:
=======

FTP memory-to-memory transfer 
This is useful for testing network performance between two machines while eliminating 
disk I/O (1 GB transfer example):

ftp> bin
ftp> put "| dd if=/dev/zero bs=512k count=2000" /dev/null


Note 4:
=======

Subject:	ftp, anonymous setup, troubleshooting - hp

Document Text
Title	    : How to setup anonymous ftp, and troubleshooting ftp
Date	    : 970828
Type	    : EN
Document ID : A4786122

Problem Description

Can you explain the proper setup of anonymous FTP and how to
troubleshoot any problems?

Configuration Info

Operating System -HP-UX
    Version -10.10
Hardware System - HP 9000
    Series -K400

Solution

Verification and setup of services:

1.   Verify that the following line is in /etc/inetd.conf and not
     commented out (there should be no # in the first column):

     10.X:
	ftp	     stream tcp nowait root /usr/lbin/ftpd	ftpd

     9.X:
	ftp	     stream tcp nowait root /etc/ftpd		ftpd

     or
     netstat -a |grep ftp
     the output should look like:

     tcp      0	    0  *.ftp		    *.*

2.   Verify the following services are in /etc/services and not
     commented out (with no # in the first column):

     ftp-data	   20/tcp	     # File Transfer Protocol (Data)
     ftp	   21/tcp	     # File Transfer Protocol (Control)

    *Note: If you are using NIS (Network Information Services)
	   then verify on the master server that these services
	   are available, or do 'ypcat services |grep ftp'
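
The two checks above (an uncommented ftp line in /etc/inetd.conf and in /etc/services) can also
be scripted. The functions below are a sketch; service_enabled and service_port are names made up
for this note, and they only look at the first two fields of each line.

```shell
#!/bin/sh
# service_enabled NAME FILE   (hypothetical helper)
# Succeeds if FILE (inetd.conf or services format) contains a line that is
# not commented out and whose first field is NAME.
service_enabled() {
    awk -v svc="$1" '
        /^[ \t]*#/ { next }            # skip commented-out lines
        $1 == svc  { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$2"
}

# service_port NAME PROTO FILE   (hypothetical helper)
# Print the port number of service NAME for protocol PROTO (tcp or udp)
# from a services-format file, e.g. "ftp 21/tcp" gives 21.
service_port() {
    awk -v svc="$1" -v proto="$2" '
        /^[ \t]*#/ { next }
        $1 == svc {
            split($2, a, "/")
            if (a[2] == proto) { print a[1]; exit }
        }
    ' "$3"
}

# Example:
#   service_enabled ftp /etc/inetd.conf || echo "ftp is disabled in inetd.conf"
#   service_port ftp tcp /etc/services
```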

Creation of anonymous FTP:

If possible use SAM to create anonymous ftp by entering SAM Areas:
Networking and Communications, and then Networking Services.  Select
the desired service then choose Actions and Enable.  If this method is
either undesirable or you are experiencing difficulties with SAM
then do the following steps:

1.   Create an ftp user in /etc/passwd:

     10.X:
	ftp:*:500:1:Anonymous FTP user:/home/ftp:/usr/bin/false

     9.X:
	ftp:*:500:1:Anonymous FTP user:/users/ftp:/bin/false

	*Note: If UID 500 is not available, use a UID that
	 is not currently being used.
	*Note: GID 1 is usually group 'other', verify that group 'other'
	 does exist, and match its group ID in this field.

2.   Create a home directory for the ftp user that is owned by ftp and
     has permissions set to 0555:

     10.X:
	mkdir /home/ftp
	chmod 555 /home/ftp
	chown ftp:other /home/ftp

     9.X:
	mkdir /users/ftp
	chmod 555 /users/ftp
	chown ftp:other /users/ftp

3.   Create a bin directory that is owned by root and has
     permissions set to	 0555:

     10.X:
	mkdir -p /home/ftp/usr/bin
	chmod 555 /home/ftp/usr/bin /home/ftp/usr
	chown root /home/ftp/usr/bin /home/ftp/usr

	*Note: ftp structure has changed from 9.X to 10.x, there is
	 no longer a /home/ftp/bin.  The bin directory was moved to
	 be under /home/ftp/usr:

     9.X:
	mkdir /users/ftp/bin
	chmod 555 /users/ftp/bin
	chown root /users/ftp/bin

4.   Copy 'ls' to the new bin directory with permissions set to 0111:

     10.X:
	cp /sbin/ls /home/ftp/usr/bin/ls
	chmod 111 /home/ftp/usr/bin/ls

     9.X:
	cp /bin/ls /users/ftp/bin/ls
	chmod 111 /users/ftp/bin/ls

5.   Create an etc directory that is owned by root and has permissions
     of 0555:

     10.X:
	mkdir /home/ftp/etc
	chmod 555 /home/ftp/etc
	chown root /home/ftp/etc

     9.X:
	mkdir /users/ftp/etc
	chmod 555 /users/ftp/etc
	chown root /users/ftp/etc

     This directory should contain versions of the files passwd and
     group.  These files must be owned by root and have
     permissions of 0444:

     10.X:
	cp /etc/passwd /etc/group /home/ftp/etc
	chown root /home/ftp/etc/passwd /home/ftp/etc/group
	chmod 444 /home/ftp/etc/passwd /home/ftp/etc/group

     9.X:
	cp /etc/passwd /etc/group /users/ftp/etc
	chown root /users/ftp/etc/passwd /users/ftp/etc/group
	chmod 444 /users/ftp/etc/passwd /users/ftp/etc/group

6.   OPTIONAL:
     Create a dist directory that is owned by root and has permissions
     of 755.  Superuser can put read-only files in this directory to
     make them available to anonymous ftp users.

     10.X:
	mkdir /home/ftp/dist
	chown root /home/ftp/dist
	chmod 755 /home/ftp/dist

     9.X:
	mkdir /users/ftp/dist
	chown root /users/ftp/dist
	chmod 755 /users/ftp/dist

7.   OPTIONAL:
     Create a pub directory that is owned by ftp and writable by all.
     Anonymous ftp users can put files in this directory to make them
     available to other anonymous ftp users.

     10.X:
	mkdir /home/ftp/pub
	chown ftp:other /home/ftp/pub
	chmod 777 /home/ftp/pub

     9.X:
	mkdir /users/ftp/pub
	chown ftp:other /users/ftp/pub
	chmod 777 /users/ftp/pub
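
After working through steps 1 to 5 it is easy to miss one chmod. The sketch below compares the
permission strings of the 10.X layout against the values given above. The function names are made
up for this note, ownership is not checked, and the optional dist and pub directories are left out.

```shell
#!/bin/sh
# perm_string PATH   (hypothetical helper)
# Print the 10-character permission string of PATH as shown by "ls -ld".
perm_string() {
    ls -ld "$1" | cut -c1-10
}

# check_perm PATH EXPECTED
# Report whether PATH has the EXPECTED permission string.
check_perm() {
    actual=$(perm_string "$1")
    if [ "$actual" = "$2" ]; then
        echo "ok    $1 ($actual)"
    else
        echo "WRONG $1 (is '$actual', expected '$2')"
        return 1
    fi
}

# check_anon_ftp_tree FTPHOME
# Check the 10.X anonymous ftp tree below FTPHOME (for example /home/ftp).
check_anon_ftp_tree() {
    home=$1
    rc=0
    check_perm "$home"            dr-xr-xr-x || rc=1   # step 2: 0555
    check_perm "$home/usr/bin"    dr-xr-xr-x || rc=1   # step 3: 0555
    check_perm "$home/usr/bin/ls" ---x--x--x || rc=1   # step 4: 0111
    check_perm "$home/etc"        dr-xr-xr-x || rc=1   # step 5: 0555
    check_perm "$home/etc/passwd" -r--r--r-- || rc=1   # step 5: 0444
    check_perm "$home/etc/group"  -r--r--r-- || rc=1   # step 5: 0444
    return $rc
}

# Example:
#   check_anon_ftp_tree /home/ftp
```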


Troubleshooting FTP:

1.   Verify the installation steps.

2.   If receiving message: ftp: connect: Connection refused.

     Verify that inetd is running by entering 'ps -ef|grep inetd'.
     You should see output like:

     root  3730	 2217  1 13:54:57 ttyp2	    0:00 grep inetd
     root  2324	    1  0 13:43:28 ?	    0:00 inetd

     *Note: You may not see the grep process.
     If inetd is not currently running, then as root type 'inetd'

3.   If receiving either message: 530 access denied login failed,
     or 530 User [name] access denied.

     A.	  Verify .netrc in the user's home directory.
	  If the .netrc file contains password or account information
	  for use other than for anonymous ftp, its owner must match
	  the effective user ID of the current process.	 Its read,
	  write, and execute permission bits for group and other must
	  all be zero, and it must be readable by its owner.
	  Otherwise, the file is ignored.

	  So if you are unsure about this file, rename it to .netrc.old
	  for troubleshooting purposes.

     B.	  Check /etc/ftpusers.
	  ftpd rejects remote logins to local user accounts that are
	  named in /etc/ftpusers.  Each restricted account name must
	  appear alone on a line in the file.  The line cannot contain
	  any white space.  User accounts that specify a restricted
	  login shell in /etc/passwd should be listed in /etc/ftpusers
	  because ftpd accesses local accounts without using their
	  login shells.

     C.	  You need to add or verify /etc/shells.
	  /etc/shells is an ASCII file containing a list of legal shells
	  on the system.  Each shell is listed in the file by its
	  absolute path name. To learn more about this file, run 'man
	  shells'.  To see the legal shells for your system run 'man
	  getusershell'.  This will list all valid shells for your
	  system.  If you use both 9.X and 10.X environments, include
	  the shells for both operating systems.

	  Example entries:

	  /bin/sh	   <<<-
	  /bin/rsh	       |
	  /bin/ksh	       |
	  /bin/rksh		> 9.X valid shells
	  /bin/csh	       |
	  /bin/pam	       |
	  /usr/bin/keysh       |
	  /bin/posix/sh	   <<<-

	  /sbin/sh	   <<<-
	  /usr/bin/sh	       |
	  /usr/bin/rsh	       |
	  /usr/bin/ksh		> 10.X valid shells
	  /usr/bin/rksh	       |
	  /usr/bin/csh	       |
	  /usr/bin/keysh   <<<-

	  All shells referred to in /etc/passwd or in the NIS passwd map
	  should be valid shells or links on this system and be listed
	  in /etc/shells.

4.   If receiving message: ftp: ftp/tcp: unknown service.

     Check your /etc/services file.  If you make a change to
     /etc/services, you must force the system to recognize the new
     changes by typing:
	  inetd -c

     Verify that permissions for /etc/services are 444 (-r--r--r--).

5.   If receiving message: 421 Service not available, remote server
     has closed connection.

     Verify that /var/adm/inetd.sec does not contain an ftp entry of
     either deny or allow.  When you allow one user, you deny all other
     users.  For troubleshooting purposes you could rename
     /var/adm/inetd.sec to /var/adm/inetd.sec.old.  inetd.sec is not
     needed unless you have a need for tightened security beyond login
     verification.

6.   If receiving message: 150 Opening ASCII mode data connection for
     /usr/bin/ls. crt0: ERROR couldn't open /usr/lib/dld.sl
     errno:000000002.

     You have the wrong version of the command ls in /home/ftp/usr/bin.
     To resolve this execute:
	  cp /sbin/ls /home/ftp/usr/bin/ls
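
For step 3.C above, a quick way to find accounts whose login shell is missing from /etc/shells is
to compare the two files. This is only a sketch: the function name unlisted_shells is made up,
accounts with an empty shell field are skipped, and the NIS passwd map is not consulted.

```shell
#!/bin/sh
# unlisted_shells PASSWD_FILE SHELLS_FILE   (hypothetical helper)
# Print "user shell" for every passwd entry whose login shell (field 7)
# is not listed in SHELLS_FILE.
unlisted_shells() {
    awk -F: '
        FILENAME == ARGV[1] {                 # first file: the shells list
            if ($0 != "" && $0 !~ /^#/) ok[$0] = 1
            next
        }
        $7 != "" && !($7 in ok) { print $1, $7 }
    ' "$2" "$1"
}

# Example:
#   unlisted_shells /etc/passwd /etc/shells
```

Any account printed by this function would be refused by ftpd with a 530 message until its shell
is added to /etc/shells.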


Note 5:
=======

ftpd(1M), the file transfer protocol server, is run by the Internet daemon (see inetd(1M)) when a service request 
is received at the port indicated in /etc/services.

ftpd rejects remote logins to local user accounts named in /etc/ftpusers. Each restricted account name must appear 
by itself on a line in the file. The line cannot contain any spaces or tabs. User accounts with restricted 
login shells in /etc/passwd should be listed in /etc/ftpusers, because ftpd accesses local accounts without 
using their login shells. uucp accounts also should be listed in /etc/ftpusers. If /etc/ftpusers does not exist, 
ftpd skips the security check.
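
The /etc/ftpusers rule above can be checked from a script as follows. This is a sketch, and the
function name ftp_user_restricted is made up. It mirrors the two points in the text: the name must
be alone on its line, and a missing file means nobody is restricted.

```shell
#!/bin/sh
# ftp_user_restricted USER [FTPUSERS_FILE]   (hypothetical helper)
# Succeeds (exit 0) if USER appears alone on a line of the ftpusers file.
# If the file does not exist, no user is restricted.
ftp_user_restricted() {
    file=${2:-/etc/ftpusers}
    [ -f "$file" ] || return 1
    # -F fixed string, -x whole line must match, -q no output
    grep -Fqx -- "$1" "$file"
}

# Example:
#   if ftp_user_restricted uucp; then echo "uucp may not use ftp"; fi
```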


Note 6:
=======

On HP-UX:

Symptom: Some or all users can't ftp to an HP-UX system. 

If no users can ftp to a given system, check first of all that inetd is running on that system:

# ps -ef | grep inetd 
 
If inetd is not running, start it (as root):

# /usr/sbin/inetd

It is also possible that the FTP service is disabled. Check /etc/inetd.conf for the following line: 
 
ftp stream tcp nowait root /usr/lbin/ftpd ftpd -l

If this line does not exist, or is commented out (preceded by a pound sign, #), add it (or remove the pound sign)
and make inetd reread its configuration:

# /usr/sbin/inetd -c 


Note 7:
=======

There are five files used to hold FTP configuration information. These files are listed here:

/etc/ftpd/ftpaccess       The primary configuration file defining the operation of the ftpd daemon. 
/etc/ftpd/ftpconversions  Defines options for compression/decompression and tar/untar operations.  
/etc/ftpd/ftphosts        Lets you allow/deny FTP account access according to source IP addresses and host names. 
/etc/ftpd/ftpusers        Restricts FTP access for specified users. For more information see ftpusers(4).
/etc/ftpd/ftpgroups       The group password file for use with the SITE GROUP and SITE GPASS commands.  


The /etc/ftpd/ftpaccess configuration file is the primary configuration file for defining how the 
ftpd daemon operates. It is not necessary to enable the ftpaccess file in order to run ftpd.

The configuration files allow you to configure FTP features, such as the number of FTP login tries permitted, 
FTP banner displays, logging of incoming and outgoing file transfers, access permissions, 
use of regular expressions, etc. For complete details on these files, see the ftpaccess(4), ftpgroups(4), 
ftpusers(4), ftphosts(4), and ftpconversions(4) manpages.

- If the ftpaccess file is enabled:

Settings in the ftpaccess file override any similar settings in the other files.
Any settings in the other files that are not present in ftpaccess are treated as supplemental or additional 
configuration information.

- If the ftpaccess file is disabled:

The settings in the ftpusers, ftphosts, and ftpconversions files will be used.
The ftpgroups file will not be used.

Enabling/Disabling the /etc/ftpd/ftpaccess Configuration File 
 
-- To enable the /etc/ftpd/ftpaccess file, specify the -a option for the ftp entry in the /etc/inetd.conf file. 
For example, 

ftp  stream tcp nowait root /usr/lbin/ftpd ftpd -a -l -d
(The -l option logs all commands sent to the ftpd server into syslog. The -d option logs debugging information 
into syslog.)

-- To disable the /etc/ftpd/ftpaccess file, specify the -A option for the ftp entry in the /etc/inetd.conf file. 
For example,

ftp  stream tcp nowait root /usr/lbin/ftpd ftpd -A -L -d


Note 8: ftp commandline and batches:
------------------------------------

It can be useful to transfer a file with ftp from a scheduled script.
Here are some examples of how to do this:

Example 1:
----------

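#!/usr/bin/ksh
# Replace the host, user, password, directories and file name with your own values.
# Use "put" instead of "get" to upload a file.
ftp -v -n "YOUR.IP.ADD.RESS" << cmd
user "user" "passwd"
cd /distant/directory
lcd /local/directory
get ssh_install
quit
cmd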



Example 2:
----------

autounix.sh 

#!/bin/ksh 

# Declaring all the variables 
s_filepath='/sap/usr/sap/trans/data/' 
s_backuppath='/sap/usr/sap/trans/data/autozip/' 
s_unixfile1=$s_filepath'FILE1' 
s_unixfile2=$s_filepath'FILE2' 
s_unixfile3=$s_filepath'FILE3' 
  

# This has been changed to accepting parameter pass in as date 
#s_date=`date '+%Y%m%d'` 
s_date=$1 
s_filename='SAP.'$s_date'.ZIP' 
s_donefilename=$s_filename'.DONE' 

# Execute the zip command 

/usr/local/bin/pkzip -add -pass=test123 $s_backuppath$s_filename $s_unixfile1 $s_unixfile2 $s_unixfile3 

# Execute the FTP transfer 
user='ftp' 
passwd='ftp1234' 
destdir='data/test' 

cd $s_backuppath 
ftp -in ftp-out.sapservx.com << EndHere 
   user $user $passwd 
   cd $destdir 
   bin 
   put $s_filename 
   rename $s_filename $s_donefilename 
   quit 
EndHere 
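
A here-document ftp session like the ones above does not make the script fail when a transfer fails. 
A minimal sketch of one way to check: run ftp with -v, save its output to a log file, and look for 
the FTP reply code 226 (transfer complete). The function name and the log file path are made up for this note.

```shell
# Return 0 if the ftp session log in file $1 contains a "226" reply
# (transfer complete), 1 otherwise.
# Note: with several transfers in one session this only proves that
# at least one of them completed.
ftp_transfer_ok() {
    grep -q '^226[ -]' "$1"
}
```

Usage: add "> /tmp/ftp.log 2>&1" to the ftp command line in the script, then call 
"ftp_transfer_ok /tmp/ftp.log || echo transfer failed" after the session.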




41.3 Linux:
===========

Many of the network-related commands above, such as ifconfig, apply to Linux distributions as well.
But many items in sections 41.1 (Solaris) and 41.2 (AIX) are specific to those operating systems.
 
Here we describe some specifics for Linux.


41.3.1 About TCP Wrappers:
--------------------------

- What is it?

TCP wrappers and xinetd control access to services by hostname and IP addresses. In addition, these tools 
also include logging and utilization management capabilities that are easy to configure. 
TCP wrappers is installed by default with a server-class installation of Red Hat Linux 8.0, and provides 
access control to a variety of services. Most modern network services, such as SSH, Telnet, and FTP, 
make use of TCP wrappers, a program that is designed to stand guard between an incoming request 
and the requested service. 

The idea behind TCP wrappers is that client requests to server applications are "wrapped" by an 
authenticating service, allowing a greater degree of access control and logging for anyone attempting 
to use the service. 
The functionality behind TCP wrappers is provided by libwrap.a, a library that network services, 
such as xinetd, sshd, and portmap, are compiled against. Additional network services, even networking programs 
you may write, can be compiled against libwrap.a to provide this functionality. Red Hat Linux bundles 
the necessary TCP wrapper programs and library in the tcp_wrappers-<version> RPM file. 

- Host-Based Access Control Lists

Host-based access for services that use TCP wrappers is controlled by two files: 

/etc/hosts.allow and /etc/hosts.deny. 

These files use a simple format to control access to services on a server. 
If no rules are specified in either hosts.allow or hosts.deny, then the default rule is to allow anyone 
to access the services. 
Order is important since rules in hosts.allow take precedence over rules specified in hosts.deny. 
Even if a rule specifically denying all access to a particular service is defined in hosts.deny, 
hosts specifically given access to the service in hosts.allow are allowed to access the service. 
In addition, the rules in each file are read from the top down, and the first matching rule is the one applied. 
Any changes to these files take effect immediately, so restarting services is not required. 

Formatting Rules
All access control rules are placed on lines within hosts.allow and hosts.deny, and any blank lines 
or lines that start with the comment character (#) are ignored. Each rule needs to be on its own line. 

The rules must be formatted in the following manner: 

<daemon_list>: <client_list>[: spawn <shell_command> ]
 
Patterns are particularly helpful when specifying groups of clients that may or may not access a certain service. 
By placing a "." character at the beginning of a string, all hosts that share the end of that string 
are applied to that rule. So, .domain.com would catch both system1.domain.com and system2.domain.com. 
The "." character at the end of a string has the same effect, except going the other direction. 
This is primarily used for IP addresses, as a rule pertaining to 192.168.0. would apply to the entire 
class C block of IP addresses. Netmask expressions can also be used as a pattern to control access to a 
particular group of IP addresses. You can even use asterisks (*) or question marks (?) to select entire 
groups of hostnames or IP addresses, so long as you do not use them in the same string as the other 
types of patterns. 

This access control "language" can be extended with the following wildcards. They may be used in the access 
control rules instead of using specific hosts or groups of hosts: 

ALL      - Matches every client with a service. To allow a client access to all services, 
           use the ALL in the daemons section. 
LOCAL    - Matches any host whose name does not contain a "." character. 
KNOWN    - Matches any host where the hostname and host address are known or where the user is known. 
UNKNOWN  - Matches any host where the hostname or host address are unknown or where the user is unknown. 
PARANOID - Matches any host where the hostname does not match the host address. 

You can use the above wildcards in combination with the EXCEPT operator.

Example:

# all domain.com hosts are allowed to connect
# to all services except cracker.domain.com
ALL: .domain.com EXCEPT cracker.domain.com

# 123.123.123.* addresses can use all services except FTP
ALL EXCEPT in.ftpd: 123.123.123.

Users that wish to prevent any hosts other than specific ones from accessing services usually place 
ALL: ALL in hosts.deny. Then, they place lines in hosts.allow, such as: 

in.telnetd: 10.0.1.24
in.ftpd: 10.0.1. EXCEPT 10.0.1.1
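
The leading-dot and trailing-dot patterns described above can be illustrated with a few lines of shell. 
This is only a sketch of the matching idea, not how tcpd is implemented: the function name is made up, 
and netmasks, the * and ? wildcards, EXCEPT and case-insensitive hostname comparison are left out.

```shell
# Return 0 if client $2 (hostname or IP address) matches pattern $1.
tcpd_dot_match() {
    pattern=$1
    client=$2
    case "$pattern" in
        .*) # leading dot: the client name must end with the pattern
            case "$client" in
                *"$pattern") return 0 ;;
            esac ;;
        *.) # trailing dot: the client address must start with the pattern
            case "$client" in
                "$pattern"*) return 0 ;;
            esac ;;
        *)  # anything else: exact match only
            [ "$pattern" = "$client" ] && return 0 ;;
    esac
    return 1
}
```

For example, "tcpd_dot_match .domain.com system1.domain.com" and "tcpd_dot_match 10.0.1. 10.0.1.24" 
both succeed, while "tcpd_dot_match 10.0.1. 10.0.2.24" fails.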
 
- Shell commands:

Beyond simply allowing or denying access to services for certain hosts, TCP wrappers also supports 
the use of shell commands. These shell commands are most commonly used with deny rules to set up booby traps, 
which usually trigger actions that log information about failed attempts to a special file or email 
an administrator. Below is an example of a booby trap in the hosts.deny file which will write a log line 
containing the date and client information every time a host from the IP range 10.0.1.0 to 10.0.1.255 
attempts to connect via Telnet: 

in.telnetd: 10.0.1.: spawn (/bin/echo `date` %c >> /var/log/telnet.log) &
 
The following expansions can be used:

%a - The client's IP address.
%A - The server's IP address.
%c - Supplies a variety of client information, such as the username and hostname, or the username and IP address. 
%d - The daemon process name.
%h - The client's hostname (or IP address, if the hostname is unavailable). 
%H - The server's hostname (or IP address, if the hostname is unavailable). 
%n - The client's hostname. If unavailable, unknown is printed.  
%N - The server's hostname. If unavailable, unknown is printed.  
%p - The daemon process ID.
%s - Various types of server information, such as the daemon process and the host or IP address of the server. 
%u - The client's username. If unavailable, unknown is printed. 


41.3.2 About xinetd:
--------------------

- Access Control Using xinetd
The benefits offered by TCP wrappers are enhanced when the libwrap.a library is used in conjunction 
with xinetd, a super-daemon that provides additional access, logging, binding, redirection and resource 
utilization control. 

Red Hat Linux configures a variety of popular network services to be used with xinetd, including FTP, 
IMAP, POP, and Telnet. When any of these services are accessed via their port numbers in /etc/services, 
the xinetd daemon handles the request. Before bringing up the requested network service, xinetd ensures 
that the client host information meets the access control rules, the number of instances of this service 
is under a particular threshold, and any other rules specified for that service or all xinetd services 
are followed. Once the target service is brought up for the connecting client, xinetd goes back to sleep, 
waiting for additional requests for the services it manages. 

- xinetd Configuration Files
The xinetd service is controlled by the "/etc/xinetd.conf" file, as well as the various service-specific 
files in the "/etc/xinetd.d/" directory. 
The xinetd.conf file is the parent of all xinetd-controlled service configuration files, as the 
service-specific files are also parsed every time xinetd starts. By default, xinetd.conf contains some basic 
configuration settings that apply to every service. Below is an example of a typical xinetd.conf: 

defaults
{
        instances               = 60
        log_type                = SYSLOG authpriv
        log_on_success          = HOST PID
        log_on_failure          = HOST
        cps                     = 25 30
}

includedir /etc/xinetd.d
 
- Files in the /etc/xinetd.d/ Directory
The files in the /etc/xinetd.d/ directory are read every time xinetd starts, due to the includedir 
/etc/xinetd.d/ statement at the bottom of /etc/xinetd.conf. These files, with names such as finger, 
ipop3, and rlogin, correlate to the services controlled by xinetd. 
The files in /etc/xinetd.d/ use the same conventions as /etc/xinetd.conf. The primary reason they are stored 
in separate configuration files is to make it easier to add and remove a service from xinetd without affecting 
other services. 

To get an idea of how these files are structured, consider the wu-ftp file: 

service ftp
{
        socket_type             = stream
        wait                    = no
        user                    = root
        server                  = /usr/sbin/in.ftpd
        server_args             = -l -a
        log_on_success          += DURATION USERID
        log_on_failure          += USERID
        nice                    = 10
        disable                 = yes
}
 

The first line defines the service's name. The lines within the braces contain settings that define how this 
service is supposed to be started and used. The wu-ftp file states that the FTP service uses a 
stream socket type (rather than dgram), the binary executable file to use, the arguments to pass 
to the binary, the information to log in addition to the /etc/xinetd.conf settings, the priority with which 
to run the service, and more. 

The use of xinetd with a service also can serve as a basic level of protection from a 
Denial of Service (DoS) attack. The max_load option takes a floating point value that sets a load average 
threshold above which no more connections for a particular service will be accepted, preventing certain services 
from overwhelming the system. The cps option limits the rate of incoming connections. It takes two integer 
values: the number of connections per second to accept, and the number of seconds to keep the service 
disabled once that rate is exceeded (as in "cps = 25 30" in the xinetd.conf example above). Setting the first 
value to something low, such as 3, helps prevent attackers from flooding your system with requests 
for a particular service. 
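
As an illustration of these options, a hedged example of a service file that uses them. The values are 
arbitrary and chosen only for this note; they are not recommendations. The comments are kept outside the block.

```
# cps:       accept at most 3 new connections per second; when the rate
#            is exceeded, keep the service disabled for 30 seconds
# max_load:  refuse new connections while the load average is 2.5 or higher
# instances: never run more than 10 servers for this service at once
service telnet
{
        socket_type     = stream
        wait            = no
        user            = root
        server          = /usr/sbin/in.telnetd
        cps             = 3 30
        max_load        = 2.5
        instances       = 10
}
```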

The xinetd host access control available through its various configuration files is different from 
the method used by TCP wrappers. While TCP wrappers places all of the access configuration within two files, 
/etc/hosts.allow and /etc/hosts.deny, each service's file in /etc/xinetd.d can contain access control rules 
based on the hosts that will be allowed to use that service. 

For example, the following /etc/xinetd.d/telnet file can be used to block telnet access to a system 
by a particular network group and restrict the overall time range that even legitimate users can log in: 

service telnet
{
        disable         = no
        flags           = REUSE
        socket_type     = stream
        wait            = no
        user            = root
        server          = /usr/sbin/in.telnetd
        log_on_failure  += USERID
        no_access       = 10.0.1.0/24
        log_on_success  += PID HOST EXIT
        access_times    = 09:45-16:15
}
 

In this example, when any system from the 10.0.1.0/24 subnet, such as 10.0.1.2, tries to telnet into the server, 
it receives the message "Connection closed by foreign host." In addition, the login attempt 
is logged in /var/log/secure.



41.3.3 Linux Network files:
---------------------------

- Network Scripts

Using Red Hat Linux, all network communications occur between configured interfaces and physical 
networking devices connected to the system. The different types of interfaces that exist are as varied 
as the physical devices they support. 
The configuration files for network interfaces and the scripts to activate and deactivate them are located in the 

"/etc/sysconfig/network-scripts/" directory. 

While the existence of interface files can differ from system to system, the three different types of files 
that exist in this directory, interface configuration files, interface control scripts, and network 
function files, work together to enable Red Hat Linux to use various network devices. 
This chapter will explore the relationship between these files and how they are used. 

- Network Configuration Files

Before we review the interface configuration files themselves, let us itemize the primary configuration files 
used by Red Hat Linux to configure networking. Understanding the role these files play in setting up the 
network stack can be helpful when customizing your system. 

The primary network configuration files are as follows: 

/etc/hosts - The main purpose of this file is to resolve hostnames that cannot be resolved any other way. 
             It can also be used to resolve hostnames on small networks with no DNS server. Regardless of the 
             type of network the computer is on, this file should contain a line specifying the IP address 
             of the loopback device (127.0.0.1) as localhost.localdomain.  

/etc/resolv.conf - This file specifies the IP addresses of DNS servers and the search domain. 
                   Unless configured to do otherwise, the network initialization scripts populate this file. 

/etc/sysconfig/network - Specifies routing and host information for all network interfaces.  

/etc/sysconfig/network-scripts/ifcfg-<interface-name> - For each network interface on a Red Hat Linux system, 
                                                        there is a corresponding interface configuration script. 
                                                        Each of these files provide information specific to a 
                                                        particular network interface.
Caution 
The "/etc/sysconfig/networking/" directory is used by the Network Administration Tool (redhat-config-network) 
and its contents should not be edited manually.
 

- Interface Configuration Files

Interface configuration files control the operation of individual network interface devices. 
As your Red Hat Linux system boots, it uses these files to determine what interfaces to bring up and how 
to configure them. These files are usually named "ifcfg-<name>", where <name> refers to the name of the device 
that the configuration file controls. 

Ethernet Interfaces
One of the most common interface files is ifcfg-eth0, which controls the first network interface card or 
NIC in the system. In a system with multiple NICs, you will also have multiple ifcfg-eth files, 
each one with a unique number at the end of the file name. Because each device has its own configuration file, 
you can control how each interface functions individually. 

Below is a sample "/etc/sysconfig/network-scripts/ifcfg-eth0" file for a system using a fixed IP address: 

DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
NETWORK=10.0.1.0
NETMASK=255.255.255.0
IPADDR=10.0.1.27
USERCTL=no
 
The values required in an interface configuration file can change based on other values. 
For example, the ifcfg-eth0 file for an interface using DHCP looks quite a bit different, 
because IP information is provided by the DHCP server: 

DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes

Most of the time you will probably want to use a GUI utility, such as Network Administration Tool 
(redhat-config-network) to make changes to the various interface configuration files. 

You can also edit the configuration file for a given network interface by hand. Below is a listing of the parameters 
one can expect to configure in an interface configuration file. 

Within each of the interface configuration files, the following values are common: 

BOOTPROTO=<protocol>, where <protocol> is one of the following: 
 none - No boot-time protocol should be used. 
 bootp - The BOOTP protocol should be used. 
 dhcp - The DHCP protocol should be used. 

BROADCAST=<address>, where <address> is the broadcast address. This directive is deprecated. 

DEVICE=<name>, where <name> is the name of the physical device (except dynamically-allocated PPP devices 
              where it is the logical name). 

DNS{1,2}=<address>, where <address> is a name server address to be placed in /etc/resolv.conf if the PEERDNS 
                    directive is set to yes. 

IPADDR=<address>, where <address> is the IP address. 

NETMASK=<mask>, where <mask> is the netmask value. 

NETWORK=<address>, where <address> is the network address. This directive is deprecated. 

ONBOOT=<answer>, where <answer> is one of the following: 

 yes - This device should be activated at boot-time. 
 no - This device should not be activated at boot-time. 

PEERDNS=<answer>, where <answer> is one of the following: 

 yes - Modify /etc/resolv.conf if the DNS directive is set. If you are using DHCP, then yes is the default. 
 no - Do not modify /etc/resolv.conf. 

SRCADDR=<address>, where <address> is the specified source IP address for outgoing packets. 

USERCTL=<answer>, where <answer> is one of the following: 

 yes - Non-root users are allowed to control this device. 
 no - Non-root users are not allowed to control this device. 
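
NETWORK and BROADCAST are marked as deprecated in the list above. A likely reason is that both values 
follow directly from IPADDR and NETMASK. A minimal sketch of that calculation in shell (the function name 
ip_calc is made up for this note and is not part of the Red Hat network scripts):

```shell
# Print the network and broadcast address for an IP address ($1)
# and a netmask ($2), both in dotted-quad notation.
# The body runs in a subshell, so the IFS change stays local.
ip_calc() (
    ipaddr=$1
    netmask=$2
    # Split both dotted quads on "." into $1..$4 (address) and $5..$8 (mask).
    IFS=.
    set -- $ipaddr $netmask
    # Network: AND each octet with the mask octet.
    echo "NETWORK=$(($1 & $5)).$(($2 & $6)).$(($3 & $7)).$(($4 & $8))"
    # Broadcast: OR each octet with the inverted mask octet.
    echo "BROADCAST=$(($1 | (255 - $5))).$(($2 | (255 - $6))).$(($3 | (255 - $7))).$(($4 | (255 - $8)))"
)
```

With the values from the sample ifcfg-eth0 file, "ip_calc 10.0.1.27 255.255.255.0" prints 
NETWORK=10.0.1.0 and BROADCAST=10.0.1.255.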


- Network Functions

Red Hat Linux makes use of several files that contain important functions that are used in various ways 
to bring interfaces up and down. Rather than forcing each interface control file to contain the same functions 
as another, these functions are grouped together in a few files that can be sourced when needed. 

The most common network functions file is network-functions, located in the /etc/sysconfig/network-scripts/ directory. 
This file contains a variety of common IPv4 functions useful to many interface control scripts, such as 
contacting running programs that have requested information about changes in an interface's status, setting 
host names, finding a gateway device, seeing if a particular device is down or not, and adding a default route. 

As the functions required for IPv6 interfaces are different than IPv4 interfaces, a network-functions-ipv6 file 
exists specifically to hold this information. IPv6 support must be enabled in the kernel in order to communicate 
via that protocol. A function is present in this file that checks for the presence of IPv6 support. 
Additionally, functions that configure and delete static IPv6 routes, create and remove tunnels, add and 
remove IPv6 addresses to an interface, and test for the existence of an IPv6 address on an interface can also 
be found in this file. 


41.3.4 Linux packet filtering :
-------------------------------

Linux comes with advanced tools for packet filtering - the process of controlling network packets as they enter, 
move through, and exit the network stack within the kernel. Pre-2.4 kernels relied on ipchains for 
packet filtering and used lists of rules applied to packets at each step of the filtering process. 
The introduction of the 2.4 kernel brought with it iptables (also called netfilter), which is similar 
to ipchains but greatly expands on the scope and control available for filtering network packets. 

This chapter focuses on packet filtering basics, defines the differences between ipchains and iptables, 
explains various options available with iptables commands, and shows how filtering rules can be preserved 
between system reboots. 

Warning 
The default firewall mechanism under the 2.4 kernel is iptables, but iptables cannot be used if ipchains 
are already running. If ipchains are present at boot time, the kernel will issue an error and fail 
to start iptables. 

- Packet Filtering

Traffic moves through a network in packets. A network packet is a collection of data in a specific size 
and format. In order to transmit a file over a network, the sending computer must first break the file 
into packets using the rules of the network protocol. Each of these packets holds a small part of the file data. 
Upon receiving the transmission, the target computer reassembles the packets into the file. 

Every packet contains information which helps it navigate the network and move toward its destination. 
The packet can tell computers along the way, as well as the destination machine, where it came from, 
where it is going, and what type of packet it is, among other things. Most packets are designed to carry data, 
although some protocols use packets in special ways. For example, the Transmission Control Protocol (TCP) 
uses a SYN packet, which contains no data, to initiate communication between two systems. 

The Linux kernel contains the built-in ability to filter packets, allowing some of them into the system 
while stopping others. The 2.4 kernel's netfilter has three built-in tables or rules lists. They are as follows: 

 filter - This is the default table for handling network packets. 
 nat - This table is used to alter packets that create a new connection. 
 mangle - This table is used for specific types of packet alteration. 

Each of these tables in turn have a group of built-in chains which correspond to the actions performed 
on the packet by the netfilter. 

The built-in chains for the filter table are as follows: 

 INPUT - This chain applies to packets received via a network interface. 
 OUTPUT - This chain applies to locally-generated packets before they are sent out via a network interface. 
 FORWARD - This chain applies to packets received on one network interface and sent out on another. 

The built-in chains for the nat table are as follows: 

 PREROUTING - This chain alters packets received via a network interface when they arrive. 
 OUTPUT - This chain alters locally-generated packets before they are routed via a network interface. 
 POSTROUTING - This chain alters packets before they are sent out via a network interface. 

The built-in chains for the mangle table are as follows: 

 PREROUTING - This chain alters packets received via a network interface before they are routed. 
 OUTPUT - This chain alters locally-generated packets before they are routed via a network interface. 

Every network packet received by or sent out of a Linux system is subject to at least one table. 
A packet may be checked against multiple rules within each rules list before emerging at the end of the chain. 
The structure and purpose of these rules may vary, but they usually seek to identify a packet coming from 
or going to a particular IP address or set of addresses when using a particular protocol and network service. 
Regardless of their destination, when packets match a particular rule on one of the tables, they are 
designated for a particular target or action to be applied to them. If the rule specifies an ACCEPT target 
for a matching packet, the packet skips the rest of the rule checks and is allowed to continue to 
its destination. If a rule specifies a DROP target, that packet is refused access to the system and nothing 
is sent back to the host that sent the packet. If a rule specifies a REJECT target, the packet is dropped, 
but an error packet is sent to the packet's originator. 

Every built-in chain has a default policy, which is either ACCEPT or DROP. 
If none of the rules in the chain apply to the packet, then the packet is dealt with in accordance 
with the default policy. 

The iptables command allows you to configure these rule lists, as well as set up new chains to be used 
for your particular situation. 

- iptables command:
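
A minimal sketch of how the chains, targets and default policy described above fit together on the 
command line. The rule set is an example made up for this note, not a recommended firewall. The wrapper 
function ipt only prints the commands unless RUN=1 is set, so the sketch can be tried without root rights.

```shell
# Print each iptables command; execute it only when RUN=1 (needs root).
ipt() {
    if [ "${RUN:-0}" = "1" ]; then
        iptables "$@"
    else
        echo "iptables $*"
    fi
}

# Example rules for the INPUT chain of the filter table
# (filter is the default table, so -t filter is not needed).
filter_rules() {
    # Accept everything on the loopback interface.
    ipt -A INPUT -i lo -j ACCEPT
    # Accept replies to connections this host opened itself.
    ipt -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    # Accept incoming ssh.
    ipt -A INPUT -p tcp --dport 22 -j ACCEPT
    # Refuse telnet from 10.0.1.0/24 and send an error packet back.
    ipt -A INPUT -p tcp --dport 23 -s 10.0.1.0/24 -j REJECT
    # Default policy: silently drop whatever no rule matched.
    ipt -P INPUT DROP
}
```

Calling "filter_rules" prints the five commands. The policy is set last so that a remote session 
is not cut off before the ACCEPT rules are in place.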


41.3.5 Redhat and BIND:
-----------------------

BIND as a Nameserver:
Red Hat Linux includes BIND, which is a very popular, powerful, open source nameserver. BIND uses the named 
daemon to provide name resolution services. 

BIND version 9 also includes a utility called /usr/sbin/rndc which allows the administration of the running 
named daemon. More information about rndc can be found in the Section called Using rndc. 



41.4: tcpip timeouts:
---------------------

Note 1:
-------

The defaults for TCP Timeouts are:
AIX: 75 seconds
Solaris: 180 Seconds
NT: 9 Seconds

To view: 
# /usr/sbin/no -o tcp_keepinit 
The output should be something like:
tcp_keepinit = 150

To set (the value is in half seconds, so 100 means 50 seconds):
# /usr/sbin/no -o tcp_keepinit=100


Note 2:
-------

Changing the TCP/IP timeout setting on your event server
If the Situation Update Forwarder cannot reach a monitoring server to send an update, depending on the TCP/IP settings 
for the computer where your event server is running, it could be up to 15 minutes before the Situation Update Forwarder tries 
to connect to the monitoring server again. This might occur if your event server is running on an AIX, Solaris, or HP-UX computer.

Use the following steps to change the TCP/IP timeout for your computer.

On AIX, run the following command:

no -o tcp_keepinit=<timeout_value>

where <timeout_value> is the length of the timeout period, in half seconds. To configure a timeout of 30 seconds, 
set the <timeout_value> value to 60.

On Solaris and HP-UX, run the following command:

ndd -set /dev/tcp tcp_ip_abort_cinterval <timeout_value>

where <timeout_value> is the length of the timeout period, in milliseconds. To configure a timeout of 30 seconds, 
set the <timeout_value> value to 30000.
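
The two commands above use different units, which is easy to mix up. A small sketch of the conversion 
from seconds (the function names are made up for this note):

```shell
# Seconds -> half seconds, the unit used by "no -o tcp_keepinit" on AIX.
to_half_seconds() {
    echo $(( $1 * 2 ))
}

# Seconds -> milliseconds, the unit used by
# "ndd -set /dev/tcp tcp_ip_abort_cinterval" on Solaris and HP-UX.
to_milliseconds() {
    echo $(( $1 * 1000 ))
}

to_half_seconds 30     # prints 60
to_milliseconds 30     # prints 30000
```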







========================
42. SOME NOTES ON IPSEC:
========================


This section describes some important features of the IPSec implementations on AIX, HP-UX and Linux Redhat.


42.1 What is IPSec?
===================


IP Security, known commonly as IPSec, is a protocol developed by the Internet Engineering Task Force (IETF), 
designed to provide "end-to-end" Authentication and/or cryptographically-based security for IP network connections. 
Though not yet an official standard, compatible IPSec implementations are available for almost 
all modern operating systems. Inclusion of IPSec is required in every IPv6 implementation, 
and it has been designed to work equally well with the more common IPv4 system currently in use 
by most public and private networks.

All IP Security implementations include a common set of protocols and tools to enable interoperability 
between different platforms, and provide the following three benefits:

- Authentication: proof that the identity of the host on the other end of the connection is valid and correct. 
- Integrity Checking: assurance that no data sent over the network connection was modified in transit. 
- Encryption: the rendering of network communications indecipherable to anyone who might intercept the transmitted data. 

IPSec implementations also include a method of restricting connections to various services, 
based on their origin and destination. This feature, often present in firewall devices, 
is known as packet filtering.

IPsec protocols operate at the network layer, layer 3 of the OSI model. Other Internet security protocols 
in widespread use, such as SSL, TLS and SSH, operate from the transport layer up (OSI layers 4 - 7). 
This makes IPsec more flexible, as it can be used for protecting layer 4 protocols, including both TCP and UDP, 
the most commonly used transport layer protocols. IPSec has an advantage over SSL and other methods that operate 
at higher layers. For an application to use IPsec no code change in the applications is required whereas 
to use SSL and other higher level protocols, applications must undergo code changes.

IPsec was intended to provide either "transport mode" (end-to-end) security of packet traffic in which 
the end-point computers do the security processing, or "tunnel mode" (portal-to-portal) communications security 
in which security of packet traffic is provided to several machines (even to whole LANs) by a single node.

IPsec can be used to create Virtual Private Networks (VPN) in either mode, and this is the dominant use. 
Note, however, that the security implications are quite different between the two operational modes.

End-to-end communication security on an Internet-wide scale has been slower to develop than many had expected. 
Part of the reason is that no universal, or universally trusted, Public Key Infrastructure (PKI) has emerged 
(DNSSEC was originally envisioned for this); another part is that many users understand neither their needs 
nor the available options well enough to promote inclusion in vendors' products.
This is why a "shared key" (or symmetric key) is used in IPSec. Both the sender and receiver must use the same key.



-- Transport mode
-- --------------

In transport mode, only the payload (the data you transfer) of the IP packet is authenticated and/or encrypted. 
The routing is intact, since the IP header is neither modified nor encrypted; however, when the authentication 
header is used, the IP addresses cannot be translated, as this will invalidate the hash value. The transport 
and application layers are always secured by hash, so they cannot be modified in any way (for example by 
translating the port numbers). Transport mode is used for host-to-host communications.

In its most simple form, using only an Authentication Header (AH) for identifying your communication
partner, the packet looks like this:

  ---------------------------------------
  | Original IP header | AH | TCP| DATA |
  ---------------------------------------

In transport mode, IPSec inserts the AH header after the IP header. The IP data and header are used to calculate 
the AH authentication value. 


-- Tunnel mode
-- -----------

In tunnel mode, the entire IP packet (data plus the message headers) is encrypted and/or authenticated. 
It must then be encapsulated into a new IP packet for routing to work. Tunnel mode is used for 
network-to-network communications (secure tunnels between routers) or host-to-network and host-to-host 
communications over the Internet.

You should be aware that tunnel mode is probably the most widely used mode.
Many organizations use the Internet to tunnel their traffic from site to site.

In its most simple form, using only an Authentication Header (AH) for identifying your communication
partner, the packet looks like this:

  --------------------------
  |NEW IP Header | Payload |
  --------------------------

  which is

  ----------------------------------------------------
  |NEW IP Header| AH | Original IP header| TCP| DATA |
  ----------------------------------------------------

In Tunnel mode, IPSec traffic can pass transparently through existing IP routers.



AH and/or ESP: or, just Authentication and/or Authentication plus Data Encryption:
-------------------------------------------------------------------------------

The IPSec Authentication Header (AH) provides integrity and authentication but no privacy--
the IP data is not encrypted. The AH contains an authentication value based on a symmetric-key hash function. 

Symmetric key hash functions are a type of cryptographic hash function that take the data and a key as input 
to generate an authentication value. Cryptographic hash functions are usually one-way functions, 
so that starting with a hash output value, it is difficult to create an input value that would generate 
the same output value. This makes it difficult for a third party to intercept a message and replace 
it with a new message that would generate the same authentication value. 

Symmetric key hash functions are also known as shared key hash functions because the sender and receiver 
must use the same (symmetric) key for the hash functions. In addition, the key must only be known by the 
sender and receiver, so this class of hash functions is sometimes referred to as secret key hash functions.

So, "secret key" here must not be confused with the well-known public/private key encryption.
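
As a rough illustration of such a symmetric-key hash, the sketch below uses Python's standard hmac module with SHA-1. The key and the message are made-up values, and the helper names auth_value and verify are hypothetical; a real AH also covers parts of the IP header, which is left out here.

```python
import hashlib
import hmac

def auth_value(key: bytes, data: bytes) -> bytes:
    """Compute a keyed hash (HMAC-SHA1) over the data."""
    return hmac.new(key, data, hashlib.sha1).digest()

def verify(key: bytes, data: bytes, received: bytes) -> bool:
    """Recompute the value with the shared key and compare in constant time."""
    return hmac.compare_digest(auth_value(key, data), received)

shared_key = b"example-shared-secret"   # hypothetical key, known to both ends
message = b"IP payload"
tag = auth_value(shared_key, message)

print(verify(shared_key, message, tag))          # True
print(verify(shared_key, b"IP payl0ad", tag))    # False: the data was altered
print(verify(b"some-other-key", message, tag))   # False: wrong key
```

Only a party holding the same key can produce a matching value, which is what gives the receiver both integrity and origin authentication.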

-- Most implementations support the following for the AH:

HMAC-SHA1 (Hashed Message Authentication Code-Secure Hash Algorithm 1, 160-bit key)
HMAC-MD5 (HMAC-Message Digest 5, 128-bit key)

Of course, encryption of the DATA is also possible, instead of only using the AH.
The IPSec Encapsulating Security Payload (ESP) provides data privacy. The ESP protocol also defines 
an authenticated format that provides data authentication and integrity, together with data privacy.

-- Most implementations support the following for ESP:

DES-CBC (Data Encryption Standard Cipher Block Chaining Mode, 56-bit key length)
3DES-CBC (Triple-DES CBC, three encryption iterations, each with a different 56-bit key)
AES128-CBC (Advanced Encryption Standard CBC, 128-bit key length).

To be exact: with authenticated ESP (ESP carrying its own authentication data, as opposed to a separate AH), 
IPSec encrypts the payload using one symmetric key, then calculates an authentication value for the 
encrypted data using a second symmetric key.
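
The order of operations (encrypt with one key, then authenticate the encrypted data with a second key) can be sketched as follows. Note the assumptions: Python's standard library has no DES or AES, so keystream_xor is a toy stand-in for the real cipher and is not secure; the function names and keys are made up for the example.

```python
import hashlib
import hmac

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stand-in for a real cipher such as 3DES-CBC or AES128-CBC.
    XORs the data with a SHA-256 based keystream. NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def protect(enc_key: bytes, auth_key: bytes, payload: bytes):
    """Encrypt first, then compute the authentication value over the ciphertext."""
    ciphertext = keystream_xor(enc_key, payload)
    tag = hmac.new(auth_key, ciphertext, hashlib.sha1).digest()
    return ciphertext, tag

def unprotect(enc_key: bytes, auth_key: bytes, ciphertext: bytes, tag: bytes) -> bytes:
    """Check the authentication value before decrypting anything."""
    expected = hmac.new(auth_key, ciphertext, hashlib.sha1).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authentication failed")
    return keystream_xor(enc_key, ciphertext)

ct, tag = protect(b"encryption-key", b"authentication-key", b"secret data")
print(unprotect(b"encryption-key", b"authentication-key", ct, tag))
```

Because the authentication value covers the ciphertext, the receiver can reject a tampered packet without decrypting it.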


How the shared key is generated:
--------------------------------

The Internet Key Exchange (IKE) protocol is used for automatically generating and distributing cryptography keys 
for ESP and AH. IKE also authenticates the identity of the remote system, so AH and authenticated ESP 
with IKE keys provide data origin authentication.

Internet Key Exchange (IKE) is an automated protocol for dynamically negotiating the IPSec parameters. 
IKE provides dynamic secret key generation and exchange for IPSec and allows for scalability.
Before IPSec sends authenticated or encrypted IP data, both the sender and receiver must agree on the 
protocols, encryption algorithms and keys to use. IPSec uses the Internet Key Exchange (IKE) protocol 
to negotiate the encryption and authentication methods, and generate shared encryption keys. 
The IKE protocol also provides primary authentication - verifying the identity of the remote system 
before negotiating the encryption algorithm and keys.

The IKE protocol is a hybrid of three other protocols: Internet Security Association and 
Key Management Protocol (ISAKMP), Oakley, and Versatile Secure Key Exchange Mechanism for 
Internet protocol (SKEME). ISAKMP provides a framework for authentication and key exchange, but does not 
define them (neither authentication nor key exchange). The Oakley protocol describes a series of modes 
for key exchange and the SKEME protocol defines key exchange techniques.

Manual keys are an alternative to IKE. Instead of dynamically generating and distributing cryptography keys 
for ESP and AH, the cryptography keys are static and manually distributed. Manual keys are typically used only 
when the remote system does not support IKE.


So IPSec uses "shared key" technology. If you use manual keys, it's clear how they get
generated: by you. But even if you use IKE, you still have a "negotiation phase" before the
keys are actually determined. In this phase, two models can be used:

 -> IKE Preshared Key Authentication
 With preshared key authentication, you must manually configure the same, shared symmetric key 
 on both systems, a preshared key. The preshared key is used only for the primary authentication. 
 The two negotiating entities then generate dynamic shared keys for the IKE SAs and IPSec/QM SAs.
 Preshared keys do not require a Certificate Authority or Public Key Infrastructure.

 -> Digital Signatures
 Digital signatures are based on security certificates, and are managed using a Public Key Infrastructure (PKI). 
 So, here you have a Public key infrastructure, only used in the "negotiation phase" before the 
 actual shared key is constructed.

 Two well known PKI products are:
 -VeriSign Managed PKI (formerly VeriSign OnSite for VPNs)
 -Baltimore UniCERT 3.5
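
To make the preshared key model a bit more concrete: the sketch below shows two peers that authenticate each other with a preshared key and then derive a dynamic key that is unrelated to it. This is a loose illustration only. It assumes a Diffie-Hellman style exchange (which the Oakley modes build on), uses a toy-sized prime, and does not follow the real IKE message formats or key derivation. All names and values are made up.

```python
import hashlib
import hmac
import secrets

P = 2**127 - 1   # toy prime modulus, far too small for real use
G = 3

def make_keypair():
    """Pick a random private value and compute the matching public value."""
    private = secrets.randbelow(P - 3) + 2      # random value in [2, P-2]
    return private, pow(G, private, P)

def auth_tag(psk: bytes, a_public: int, b_public: int) -> bytes:
    """Prove knowledge of the preshared key over both public values."""
    transcript = f"{a_public}|{b_public}".encode()
    return hmac.new(psk, transcript, hashlib.sha256).digest()

def session_key(own_private: int, peer_public: int) -> bytes:
    """Derive the dynamic shared key from the exchanged values."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(str(shared).encode()).digest()

psk = b"hypothetical-preshared-key"
a_priv, a_pub = make_keypair()
b_priv, b_pub = make_keypair()

# B accepts A only if the tag matches what B computes with its own copy of the key.
tag_from_a = auth_tag(psk, a_pub, b_pub)
b_accepts_a = hmac.compare_digest(tag_from_a, auth_tag(psk, a_pub, b_pub))

# Both ends arrive at the same dynamic key; the preshared key never protects data.
key_a = session_key(a_priv, b_pub)
key_b = session_key(b_priv, a_pub)
print(b_accepts_a, key_a == key_b)
```

With digital signatures, only the auth_tag step would change: the proof of identity would come from a certificate and a private key instead of a preshared key.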


Notes:
-----

Note 1:
-------

IPSec can be employed between hosts (that is, end nodes), between gateways, or between a host and a gateway 
in an IP network. Some implementations, like HP-UX IPSec, can only be installed on end nodes.

Note 2:
-------

Besides authentication and/or data encryption, IPSec implementations also provide "filter rules"
on a host or gateway (router), which "allow/permit" or "deny" traffic based on IP addresses, masks, port numbers etc..
Basically, this looks like what you can find in firewall implementations.
These rules are collected in so-called IPSec policies.

Note 3:
-------

In IPSec, you will often see the term "SA". This stands for "Security Association", which is
a term describing and collecting all relevant parameters like Destination Address, Security Parameter Index (SPI), Key, 
Authentication Algorithm, Key lifetime etc..



42.2 IPSec and AIX:
===================


- Installing IPSec:

Installing the IP Security pieces
The software components needed to implement IPSec are included with AIX on the base installation media. 
To determine if the required filesets are already installed, run the command:

lslpp -L '*ipsec*'

The output from that command should contain the following filesets:

Fileset                      Level  State  Description  
----------------------------------------------------------------------------  
bos.msg.en_US.net.ipsec    4.3.3.0    C    IP Security Messages - U.S.                                             
bos.net.ipsec.keymgt      4.3.3.50    C    IP Security Key Management  
bos.net.ipsec.rte         4.3.3.50    C    IP Security  
bos.net.ipsec.websm       4.3.3.25    C    IP Security WebSM


One additional piece of software is required: the bos.crypto fileset, found on the AIX Bonus Pack CD. 
The name of this fileset may differ, depending on the country. To determine if this fileset is installed 
on the system, run the command:

lslpp -L 'bos.crypto*'

- Set up IPSec logging:

The IP Security software uses syslog to process messages and errors that it generates. 
Messages are sent to syslogd at the local4 facility. It is a good idea to set up logging of these messages 
before activating IPSec, to make troubleshooting easier. 

To have syslogd write all messages received at the local4 facility to the logfile /var/adm/ipsec.log, 
add the following line to the /etc/syslog.conf file:

local4.debug                    /var/adm/ipsec.log 

Create the empty log file by running the command touch /var/adm/ipsec.log, and then make syslogd aware 
of the changes to its configuration by running the command refresh -s syslogd.

- Using IPSec to create "rules":
--------------------------------

You can use smitty:

# smitty ips4_basic   for basic configuration for IP version 4 
# smitty ips6_basic   for basic configuration for IP version 6

or use the commandline with, for example, the "genfilt", "lsfilt" and other commands.


1. The genfilt Command

Purpose
Adds a filter rule. 

Syntax
genfilt -v 4|6 [ -n fid] [ -a D|P] -s s_addr -m s_mask [-d d_addr] [ -M d_mask] [ -g Y|N ] 
               [ -c protocol] [ -o s_opr] [ -p s_port] [ -O d_opr] [ -P d_port] [ -r R|L|B ] [ -w I|O|B ] [ -l Y|N ] 
               [ -f Y|N|O|H ] [ -t tid] [ -i interface] 


Description
Use the genfilt command to add a filter rule to the filter rule table. The filter rules generated by this command 
are called manual filter rules. IPsec filter rules can be configured using the genfilt command, 
IPsec smit (IP version 4 or IP version 6), or Web-based System Manager in the Virtual Private Network submenu.

Examples:

# genfilt -v 4 -a D -s 0.0.0.0 -m 0.0.0.0 -d 0.0.0.0 -M 0.0.0.0 -c udp -o any -O eq -P 123 -l n -w I -i all
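
Reading the flags against the syntax summary, the example above appears to deny (-a D) inbound (-w I) UDP traffic from any source port to destination port 123 (NTP) on all interfaces, without logging. If you generate many such rules from a script, a small helper like the sketch below keeps the flags readable. The helper name and parameter names are made up; only flags that appear in the syntax summary above are used, and nothing is executed.

```python
def genfilt_args(version, action, src, src_mask, dst, dst_mask,
                 protocol, src_op, dst_op, dst_port, log, direction, interface):
    """Assemble a genfilt argument list from named values (does not run it)."""
    return [
        "genfilt", "-v", str(version), "-a", action,
        "-s", src, "-m", src_mask, "-d", dst, "-M", dst_mask,
        "-c", protocol, "-o", src_op, "-O", dst_op, "-P", str(dst_port),
        "-l", log, "-w", direction, "-i", interface,
    ]

args = genfilt_args(4, "D", "0.0.0.0", "0.0.0.0", "0.0.0.0", "0.0.0.0",
                    "udp", "any", "eq", 123, "n", "I", "all")
print(" ".join(args))
```

The printed line is the same command as the example above.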


2. The lsfilt Command

Purpose
Lists filter rules from either the filter table or the IP Security subsystem. 

Syntax
lsfilt -v 4|6 [-n fid_list] [-a] [-d] 

Description
Use the lsfilt command to list filter rules and their status. 


Example using IPSec on AIX:
---------------------------

To configure IP Sec, tunnels and filters must be configured. When a simple tunnel is defined for all traffic 
to use, the filter rules can be automatically generated. If more complex filtering is desired, filter rules 
can be configured separately.

You can configure IP Sec using the Web-based System Manager application Network or SMIT. If using SMIT, 
the following fastpaths will take you directly to the configuration panels you need:

- ips4_basic 
Basic configuration for IP version 4 
- ips6_basic 
Basic configuration for IP version 6

This section on IP Security Configuration discusses the following topics:

.Tunnels versus Filters 
.Tunnels and Security Associations 
.Choosing a Tunnel Type 
.Basic Configuration 
.Static Filter Rules and Examples 
.Advanced Manual Tunnel Configuration 
.Configuring IKE Tunnels 
.Predefined Filter Rules 
.Logging Facilities 
.Coexistence of IP Security and IBM Secured Network Gateway 2.2/IBM Firewall 3.1 or 3.2 


=> Tunnels versus Filters:

There are two related but distinct parts of IP Security: tunnels and filters. Tunnels require filters, 
but filters do not require tunnels.

Filtering is a basic function in which incoming and outgoing packets can be accepted or denied based 
on a variety of characteristics. This allows a system administrator to configure the host to control 
the traffic between this host and other hosts. Filtering is done on a variety of packet properties, 
such as source and destination addresses, IP Version (4 or 6), subnet masks, protocol, port, 
routing characteristics, fragmentation, interface, and tunnel definition. This filtering is done 
at the IP layer, so no changes are required to the applications. 

Tunnels define a security association between two hosts. These security associations involve specific 
security parameters that are shared between end points of the tunnel.

A packet comes in the network adapter to the IP stack. From there, the filter module is called to determine 
if the packet should be permitted or denied. If a tunnel ID is specified, the packet will be checked against 
the existing tunnel definitions. If the decapsulation from the tunnel is successful, the packet will be passed 
to the upper layer protocol. This function will occur in reverse order for outgoing packets. The tunnel 
relies on a filter rule to associate the packet with a particular tunnel, but the filtering function can occur 
without passing the packet to the tunnel. 

=> Tunnels and Security Associations

Tunnels are used whenever it is desired to have data authenticated, or authenticated and encrypted. 
Tunnels are defined by specifying a security association between two hosts (see figure). The security 
association SA, defines the parameters for the encryption and authentication algorithms and characteristics 
of the tunnel.

  -----------                              ---------
  |Host A   |                              |Host B |
  |         |------------------------------|       |
  |         |------------------------------|       |
  |         |                              |       |
  -----------  SA A------------------->    ---------
                   <------------------ SA B

SA = Security Association, consisting of {Destination Address, SPI, Key, Authentication Algorithm, Key lifetime}

The Security Parameter Index (SPI) and the destination address identify a unique security association. 
Therefore, these two parameters are required for uniquely specifying a tunnel. Other parameters such as 
cryptographic algorithm, authentication algorithm, keys, and lifetime can be specified or defaults can be used.
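
The role of the (destination address, SPI) pair as a unique key can be sketched as a simple lookup table. This is a conceptual model, not how the AIX kernel stores its tunnels; the class names, field names and default values are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class SecurityAssociation:
    destination: str
    spi: int
    auth_algorithm: str = "HMAC_MD5"    # example defaults, used when not specified
    encr_algorithm: str = "DES_CBC_8"
    lifetime: int = 480                 # example value

class SATable:
    """Security associations keyed by (destination address, SPI)."""

    def __init__(self):
        self._entries = {}

    def add(self, sa: SecurityAssociation) -> None:
        key = (sa.destination, sa.spi)
        if key in self._entries:
            raise ValueError(f"duplicate SA for {key}")
        self._entries[key] = sa

    def lookup(self, destination: str, spi: int):
        """Return the matching SA, or None when the pair is unknown."""
        return self._entries.get((destination, spi))

table = SATable()
table.add(SecurityAssociation("5.5.5.8", 23567))
print(table.lookup("5.5.5.8", 23567))
print(table.lookup("5.5.5.8", 300))     # None: no SA with this SPI
```

A second SA with the same destination and SPI is rejected, which mirrors the requirement that the pair be unique.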

=> Choosing a Tunnel Type

The decision to use IBM tunnels, manual tunnels, or, for AIX versions 4.3.2 and later, IKE tunnels, 
depends on the tunnel support of the remote end and the type of key management desired. IKE tunnels 
are preferable (when available) because they offer secure key negotiation and key refreshment in an 
industry-standard way. They also take advantage of the new IETF ESP and AH header types and support 
anti-replay protection.

IBM tunnels offer similar security, although their support is limited to a smaller set of encryption and 
authentication algorithms. In return, they provide backward compatibility and ease of use through their 
import/export functions with the IBM Firewall.

If the remote end does not support IBM tunnels, or uses one of the algorithms requiring manual tunnels,
manual tunnels should be used. Manual tunnels ensure interoperability with a large number of hosts. 
Because their keys are static and cumbersome to update, they are not as secure.

IBM Tunnels may be used between any two AIX machines running AIX Version 4.3 or higher, or between an AIX 4.3 host and 
a host running IBM Secure Network Gateway 2.2 or IBM Firewall 3.1/3.2. Manual tunnels may be used between a host 
running AIX Version 4.3 and any other machine running IP Security and having a common set of cryptographic 
and authentication algorithms. Almost all vendors offer Keyed MD5 with DES, or HMAC MD5 with DES. 
This is a base subset that works with almost all implementations of IP Security.

When setting up manual or IBM tunnels, the procedure depends on whether you are setting up the first host 
of the tunnel or setting up the second host, which must have parameters matching the first host's setup. 
When setting up the first host, the keys may be autogenerated, and the algorithms can be defaulted. 
When setting up the second host, it is best to import the tunnel information from the remote end, if possible.

Another important consideration is determining whether the remote system is behind a firewall. If it is, 
the setup must include information about the intervening firewall.


=>Basic Configuration (Manual or IBM Tunnels)

- Setting Up Tunnels and Filters
For the simplest case, setting up a manual tunnel, it is not necessary to separately configure the filter rules. 
As long as all traffic between two hosts goes through the tunnel, the necessary filter rules are automatically 
generated. The process of setting up a tunnel is to define the tunnel on one end, import the definition 
on the other end, and activate the tunnel and filter rules on both ends. Then the tunnel is ready to use.

Information about the tunnel must be made to match on both sides if it is not explicitly supplied (see figure). 
For instance, the encryption and authentication algorithms specified for the source will be used for the destination 
if the destination values are not specified. This makes creating the tunnel much simpler.

- Creating a Manual Tunnel on Host A
You can configure a tunnel using the Web-based System Manager application Network, the SMIT fast path ips4_basic 
(for IP Version 4) or ips6_basic (for IP version 6), or you can use the following procedure.

The following is a sample of the gentun command used to create a manual tunnel: 

# gentun -v 4 -t manual -s 5.5.5.19 -d 5.5.5.8 -a HMAC_MD5 -e DES_CBC_8 -N 23567 

This will create a tunnel with output (using lstun -v 4) that looks similar to: 

Tunnel ID            : 1
IP Version           : IP Version 4
Source               : 5.5.5.19
Destination          : 5.5.5.8
Policy               : auth/encr
Tunnel Mode          : Tunnel
Send AH Algo         : HMAC_MD5 
Send ESP Algo        : DES_CBC_8 
Receive AH Algo      : HMAC_MD5 
Receive ESP Algo     : DES_CBC_8 
Source AH SPI        : 300
Source ESP SPI       : 300
Dest AH SPI          : 23567
Dest ESP SPI         : 23567
Tunnel Life Time     : 480
Status               : Inactive
Target               : -
Target Mask          : -
Replay               : No
New Header           : Yes
Snd ENC-MAC Algo     : -
Rcv ENC-MAC Algo     : -

The tunnel will be activated when the mktun command is used: 

# mktun -v 4 -t1

The filter rules associated with the tunnel are automatically generated and output (using lsfilt -v 4) 
looks similar to: 

Rule 4:

Rule action           : permit 
Source Address        : 5.5.5.19 
Source Mask           : 255.255.255.255 
Destination Address   : 5.5.5.8 
Destination Mask      : 255.255.255.255 
Source Routing        : yes 
Protocol              : all 
Source Port           : any 0 
Destination Port      : any 0 
Scope                 : both  
Direction             : outbound 
Logging control       : no 
Fragment control      : all packets 
Tunnel ID number      : 1 
Interface             : all 
Auto-Generated        : yes 

Rule 5: 

Rule action           : permit 
Source Address        : 5.5.5.8 
Source Mask           : 255.255.255.255 
Destination Address   : 5.5.5.19 
Destination Mask      : 255.255.255.255 
Source Routing        : yes 
Protocol              : all 
Source Port           : any 0 
Destination Port      : any 0 
Scope                 : both  
Direction             : inbound 
Logging control       : no 
Fragment control      : all packets 
Tunnel ID number      : 1 
Interface             : all 
Auto-Generated        : yes 

These filter rules in addition to the default filter rules are activated by the mktun -v 4 -t 1 command. 

To set up the other side (when it is another AIX machine), the tunnel definition can be exported on host A 
then imported to host B.

To export:

# exptun -v 4 -t 1 -f /tmp

This will export the tunnel definition into a file named ipsec_tun_manu.exp and any associated filter rules 
to the file ipsec_fltr_rule.exp in the directory indicated by the -f flag.

- Creating a manual tunnel on Host B
To create the matching end of the tunnel, the export files are copied to the remote side and imported into 
that remote AIX 4.3 machine by using the command:

# imptun -v 4 -t 1 -f /tmp

where 1 is the tunnel to be imported and /tmp is the directory where the import files reside. This tunnel number 
is system generated and must be referenced from the output of the gentun command, or by using the lstun command 
to list the tunnels and determine the correct tunnel number to import. If there is only one tunnel in the 
import file, or if all the tunnels are to be imported, then the -t option is not needed.

If the remote machine is not AIX 4.3, the export file can be used as a reference for setting up the algorithm, 
keys, and SPI values for the other end of the tunnel.

Export files from the IBM Secure Network Gateway (SNG) can be imported to create tunnels in AIX 4.3. To do this, 
use the -n option when importing the file:

# imptun -v 4 -f /tmp -n

- Creating an IBM tunnel on Host A
Setting up an IBM tunnel is similar to a manual tunnel, but some of the choices are different for the crypto 
algorithms and the keys are negotiated dynamically, so there is no need to import keys. IBM tunnels are limited 
to Keyed MD5 for authentication. If the HMAC MD5 or HMAC SHA algorithms are desired, a manual tunnel must be used.

# gentun -s 9.3.100.1 -d 9.3.100.245 -t IBM -e DES_CBC_8 -n 35564 

As with manual tunnels, from this point the tunnel and filter table must be activated to make the tunnel active:

# mktun -v 4 -t1

To set up the other side, if the other host is an AIX 4.3 IP Security machine, the tunnel definition can be exported 
on host A, then imported to host B. 

To export: 

# exptun -v 4 -f /tmp

This will export the tunnel definition into a file named ipsec_tun_ibm.exp and any associated filter rules 
to the file ipsec_fltr_rule.exp in the directory indicated by the -f flag. 

- Creating an IBM tunnel on Host B
The procedure is the same for creating the second end of the tunnel on host B for an IBM tunnel. 
The tunnel definition is exported from host A and imported onto host B. The -n flag can be used for a file exported 
by an IBM Secure Network Gateway or an IBM Firewall 3.1/3.2.

- Static Filter Rules and Examples
Filtering can be set up to be simple, using mostly autogenerated filter rules, or can be complex by defining 
very specific filter functions based on the properties of the IP packets. Matches on incoming packets are done 
by comparing the source address and SPI value to those listed in the filter table. Therefore, this pair 
must be unique. 

Each line in the filter table is known as a rule. A collection of rules will determine what packets are accepted 
in and out of the machine, and how they will be directed. Filter rules can be written based on source and destination 
addresses and masks, protocol, port number, direction, fragment control, source routing, tunnel, and interface.

Below is a sample set of filter rules. Within each rule, fields are shown in the following order 
(an example of each field from rule 1 is shown in parentheses): Rule_number (1), Action (permit), 
Source_addr (0.0.0.0), Source_mask (0.0.0.0), Dest_addr (0.0.0.0), Dest_mask (0.0.0.0), Source_routing (no), 
Protocol (udp), Src_prt_operator (eq), Src_prt_value (4001), Dst_prt_operator (eq), Dst_prt_value (4001), 
Scope (both), Direction (both), Logging (no), Fragment (all packets), Tunnel (0), and Interface (all).

1 permit 0.0.0.0 0.0.0.0 0.0.0.0 0.0.0.0 no udp eq 4001 eq 4001 both both no all packets 0 all

2 permit 0.0.0.0 0.0.0.0 0.0.0.0 0.0.0.0 no ah any 0 any 0 both both no all packets 0 all

3 permit 0.0.0.0 0.0.0.0 0.0.0.0 0.0.0.0 no esp any 0 any 0 both both no all packets 0 all

4 permit 10.0.0.1 255.255.255.255 10.0.0.2 255.255.255.255 no all any 0 any 0 both outbound no all packets 1 all

5 permit 10.0.0.2 255.255.255.255 10.0.0.1 255.255.255.255 no all any 0 any 0 both inbound no all packets 1 all

6 permit 10.0.0.1 255.255.255.255 10.0.0.3 255.255.255.255 no tcp lt 1024 eq 514 local outbound yes all packets 2 all

7 permit 10.0.0.3 255.255.255.255 10.0.0.1 255.255.255.255 no tcp/ack eq 514 lt 1024 local inbound yes all packets 2 all

8 permit 10.0.0.1 255.255.255.255 10.0.0.3 255.255.255.255 no tcp/ack lt 1024 lt 1024 local outbound yes all packets 2 all

9 permit 10.0.0.3 255.255.255.255 10.0.0.1 255.255.255.255 no tcp lt 1024 lt 1024 local inbound yes all packets 2 all

10 permit 10.0.0.1 255.255.255.255 10.0.0.4 255.255.255.255 no icmp any 0 any 0 local outbound yes all packets 3 all

11 permit 10.0.0.4 255.255.255.255 10.0.0.1 255.255.255.255 no icmp any 0 any 0 local inbound yes all packets 3 all

12 permit 10.0.0.1 255.255.255.255 10.0.0.5 255.255.255.255 no tcp gt 1023 eq 21 local outbound yes all packets 4 all

13 permit 10.0.0.5 255.255.255.255 10.0.0.1 255.255.255.255 no tcp/ack eq 21 gt 1023 local inbound yes all packets 4 all

14 permit 10.0.0.5 255.255.255.255 10.0.0.1 255.255.255.255 no tcp eq 20 gt 1023 local inbound yes all packets 4 all

15 permit 10.0.0.1 255.255.255.255 10.0.0.5 255.255.255.255 no tcp/ack gt 1023 eq 20 local outbound yes all packets 4 all

16 permit 10.0.0.1 255.255.255.255 10.0.0.5 255.255.255.255 no tcp gt 1023 gt 1023 local outbound yes all packets 4 all

17 permit 10.0.0.5 255.255.255.255 10.0.0.1 255.255.255.255 no tcp/ack gt 1023 gt 1023 local inbound yes all packets 4 all

18 permit 0.0.0.0 0.0.0.0 0.0.0.0 0.0.0.0 no all any 0 any 0 both both yes all packets

Rule 1 is for the IBM Session Key daemon and will only appear in IP Version 4 filter tables. It uses port number 
4001 to control packets for refreshing the session key. It is an example of how the port number can be used 
for a specific purpose. This filter rule should not be modified except for logging purposes.

Rules 2 and 3 are used to allow processing of Authentication Headers (AH) and Encapsulating Security Payload 
(ESP) headers. They should not be modified except for logging purposes.

Rules 4 and 5 are a set of autogenerated rules that filter traffic between addresses 10.0.0.1 and 10.0.0.2 
through tunnel #1. Rule 4 is for outbound traffic and rule 5 is for inbound traffic.

Rules 6 through 9 are a set of user-defined rules that filter outbound rsh, rcp, rdump, rrestore, and rdist 
services between addresses 10.0.0.1 and 10.0.0.3 through tunnel #2. Note that logging is set to yes so the 
administrator can monitor this type of traffic.

Rules 10 and 11 are a set of user-defined rules that filter both inbound and outbound icmp services of any type 
between addresses 10.0.0.1 and 10.0.0.4 through tunnel #3.

Rules 12 through 17 are user-defined filter rules that filter outbound FTP service from 10.0.0.1 to 10.0.0.5 
through tunnel #4.

Rule 18 is an autogenerated rule always placed at the end of the table. In this case, it permits all packets 
that do not match the other filter rules. It may be set to deny all traffic not matching the other filter rules.

Each rule may be viewed separately (using lsfilt) to make each field clear. 
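
To see how such a table is evaluated, the sketch below parses a flattened rule line (using the field order given above) and returns the first rule that matches a packet. It is a simplified model, not the AIX filter code. The assumptions: rules are checked top to bottom and the first match wins, the fragment field is the only multi-word field, every line carries all fields (rule 18 as printed above does not), a protocol such as "tcp/ack" is compared literally, and rule 99 is a hypothetical catch-all added for the demo.

```python
import ipaddress

def parse_rule(line):
    """Split one flattened rule line into named fields."""
    t = line.split()
    return {
        "number": int(t[0]), "action": t[1],
        "src": t[2], "src_mask": t[3], "dst": t[4], "dst_mask": t[5],
        "source_routing": t[6], "protocol": t[7],
        "src_op": t[8], "src_port": int(t[9]),
        "dst_op": t[10], "dst_port": int(t[11]),
        "scope": t[12], "direction": t[13], "logging": t[14],
        "fragment": " ".join(t[15:-2]),          # e.g. "all packets"
        "tunnel": int(t[-2]), "interface": t[-1],
    }

def addr_matches(addr, rule_addr, mask):
    """Compare only the bits selected by the mask."""
    a = int(ipaddress.IPv4Address(addr))
    r = int(ipaddress.IPv4Address(rule_addr))
    m = int(ipaddress.IPv4Address(mask))
    return (a & m) == (r & m)

def port_matches(op, value, port):
    if op == "any":
        return True
    return {"eq": port == value, "lt": port < value, "gt": port > value}[op]

def first_match(rules, pkt):
    """Return the first rule matching the packet, or None."""
    for r in rules:
        if (r["protocol"] in ("all", pkt["protocol"])
                and r["direction"] in ("both", pkt["direction"])
                and addr_matches(pkt["src"], r["src"], r["src_mask"])
                and addr_matches(pkt["dst"], r["dst"], r["dst_mask"])
                and port_matches(r["src_op"], r["src_port"], pkt["src_port"])
                and port_matches(r["dst_op"], r["dst_port"], pkt["dst_port"])):
            return r
    return None

RULE_LINES = [
    "1 permit 0.0.0.0 0.0.0.0 0.0.0.0 0.0.0.0 no udp eq 4001 eq 4001 both both no all packets 0 all",
    "12 permit 10.0.0.1 255.255.255.255 10.0.0.5 255.255.255.255 no tcp gt 1023 eq 21 local outbound yes all packets 4 all",
    # Hypothetical catch-all, not taken from the table above:
    "99 deny 0.0.0.0 0.0.0.0 0.0.0.0 0.0.0.0 no all any 0 any 0 both both yes all packets 0 all",
]
RULES = [parse_rule(line) for line in RULE_LINES]

ftp = {"src": "10.0.0.1", "dst": "10.0.0.5", "protocol": "tcp",
       "src_port": 40000, "dst_port": 21, "direction": "outbound"}
hit = first_match(RULES, ftp)
print(hit["number"], hit["action"], "tunnel", hit["tunnel"])
```

A mask of 0.0.0.0 makes the address comparison always succeed, which is why rules 1 to 3 and the last rule apply to any host.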



42.3 IPSEC and HP:
===================


As you have read in section 42.1, you should know beforehand if you want AH or AH plus ESP,
manual keys or IKE, transport mode or tunnel mode, and what "filter rules" you want to apply.
Depending on the number of NICs in your host, and what traffic you want to permit or deny,
you will have to invest a certain amount of effort to create those rules.


Introducing Configuring IPSec:
------------------------------

You configure HP-UX IPSec using several command-line utilities, such as:

ipsec_config
ipsec_report
ipsec_admin
ipsec_policy

To configure security certificates (used in the negotiation phase in IKE), use the "ipsec_mgr" utility, 
which has a graphical user interface (GUI). So you need an X terminal.
You can also use preshared key instead of certificates (the preshared key is used only for the 
primary authentication).

As an example of using the commandline, take a look at the following command:

# ipsec_config add host my_host_policy -source 10.1.1.1 \
  -destination 10.0.0.0/8/TELNET -pri 100 \
  -action ESP_AES128_HMAC_SHA1

The above creates a "rule" or policy in the policy database "/var/adm/ipsec/config.db".

The syntax with respect to addresses and ports somewhat resembles the common syntax found in many 
types of router, gateway, and firewall products.

For example 
0.0.0.0   means here all possible IPv4 addresses
10.0.0.0  means here (with the /8 prefix length) all possible IPv4 addresses starting with 10.
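
The address notation from the command above (10.0.0.0/8/TELNET) can be read as address/prefix/service. The sketch below shows one way to interpret such a string; it is not HP's parser. The assumptions: a missing prefix means a single host (/32), except for 0.0.0.0 which means everything, the service part may also be a plain port number, and the SERVICES table only knows the one name used in the example.

```python
import ipaddress

SERVICES = {"TELNET": 23}   # only the service name used in the example above

def parse_spec(spec):
    """Parse 'addr[/prefix[/service]]' into (network, port or None)."""
    parts = spec.split("/")
    addr = parts[0]
    if len(parts) > 1:
        prefix = parts[1]
    else:
        prefix = "0" if addr == "0.0.0.0" else "32"   # assumed defaults
    port = None
    if len(parts) > 2:
        service = parts[2]
        port = SERVICES[service] if service in SERVICES else int(service)
    return ipaddress.ip_network(f"{addr}/{prefix}", strict=False), port

def spec_matches(spec, addr, port):
    """True when the address is inside the network and the port (if any) agrees."""
    network, want_port = parse_spec(spec)
    return (ipaddress.ip_address(addr) in network
            and (want_port is None or want_port == port))

print(spec_matches("10.0.0.0/8/TELNET", "10.2.3.4", 23))   # True
print(spec_matches("10.0.0.0/8/TELNET", "10.2.3.4", 80))   # False: other port
print(spec_matches("10.1.1.1", "10.1.1.2", 23))            # False: other host
```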

Instead of using a series of individual commands to configure IPSec, HP recommends creating a "batchfile" 
with statements. All statements are parsed first, and either all statements pass and are executed, or all fail, 
even if only one statement is incorrect.

For the above example, a batchfile would look like:

add host my_host_policy -source 10.1.1.1 \
-destination 10.0.0.0/8/TELNET -pri 100 \
-action ESP_AES128_HMAC_SHA1
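
The all-or-nothing behaviour of a batchfile can be pictured as "check everything first, apply only if nothing failed". The sketch below models that idea only. The check_statement function is a made-up, very crude syntax check and has nothing to do with the real ipsec_config parser; the database is just a Python list.

```python
def check_statement(stmt):
    """Hypothetical check: accept only 'add host <name> ...' with an -action."""
    words = stmt.split()
    return (len(words) >= 3 and words[0] == "add"
            and words[1] == "host" and "-action" in words)

def run_batch(statements, database):
    """Apply all statements, or none of them if any statement fails the check."""
    bad = [s for s in statements if not check_statement(s)]
    if bad:
        return False, bad          # nothing is applied
    database.extend(statements)
    return True, []

db = []
good = "add host my_host_policy -source 10.1.1.1 -destination 10.0.0.0/8/TELNET -pri 100 -action ESP_AES128_HMAC_SHA1"
broken = "add host other_policy -source 10.1.1.2"      # no -action given

print(run_batch([good, broken], db), len(db))   # rejected, database still empty
print(run_batch([good], db), len(db))           # accepted, one record stored
```

The practical benefit is that a typo in the last statement cannot leave the policy database half configured.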

Notice that we have used the "add" option of the ipsec_config command, which adds 
to the config DB. This suggests that there are other options as well, which is true:

You can use:

ipsec_config add        to add to the db
ipsec_config batch      to use a batchfile
ipsec_config delete     to delete from the db
ipsec_config show       to show information from the db

For example, the "ipsec_config show all" command displays the entire contents of the database.


profiles:

An ipsec_config profile file contains default argument values that are evaluated in ipsec_config add commands 
if the user does not specify the values in the command. The values are evaluated once, when the policy is 
added to the configuration database. Values used from the profile file become part of the configuration record 
for the policy.

You can specify a profile file name with the -profile argument as part of an ipsec_config command. By default, 
ipsec_config uses the /var/adm/ipsec/.ipsec_profile profile file, which is shipped with HP-UX IPSec. 
In most topologies, you can use the default values supplied in the /var/adm/ipsec/.ipsec_profile file.
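
The "evaluated once" behaviour of a profile can be modelled as copying the defaults into the record at the moment the policy is added. The sketch below illustrates only that idea; the function name, the argument names and the profile values are made up and are not taken from the real .ipsec_profile file.

```python
def add_policy(database, name, args, profile):
    """Store a policy record: profile defaults first, explicit arguments on top."""
    record = dict(profile)      # copy the defaults as they are right now
    record.update(args)         # values given on the command line win
    database[name] = record

profile = {"pri": 100, "action": "PASS"}          # hypothetical defaults
db = {}
add_policy(db, "my_host_policy", {"action": "ESP_AES128_HMAC_SHA1"}, profile)

profile["pri"] = 50             # a later change to the profile ...
print(db["my_host_policy"])     # ... does not alter the stored record
```

The stored record keeps priority 100 because the value became part of the record when the policy was added.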



Installation:
-------------

The software takes about 110MB. Most of the software goes into /var/adm/ipsec.

As usual for an installation on HP-UX, run the swinstall program as root:

# swinstall

This opens the "Software Selection" window and the "Specify Source" window. 
On the Specify Source window, change the Source Host Name if necessary. 
Enter the mount point of the drive in the Source Depot Path field and click OK to return to the 
Software Selection window. 

The Software Selection window now contains a list of available software bundles to install.
Highlight the HP-UX IPSec software for your system type. 

Choose Mark for Install from the Actions menu to choose the product to be installed. With the exception of 
the manpages and user's manual, you must install the complete IPSec product.

swinstall loads the fileset, runs the control scripts for the fileset, and builds the kernel. 
Estimated time for processing: 3 to 5 minutes.

Click OK on the Note window to reboot the system.

When the system reboots, check the log files "/var/adm/sw/swinstall.log" and 
"/var/adm/sw/swagent.log" to make sure the installation was successful.  

-- Setting the HP-UX IPSec Password:

When you install HP-UX IPSec, the HP-UX IPSec password is set to ipsec. You must change the HP-UX IPSec password 
after installing the product to use the autoboot feature and to load and configure security certificates. 
HP-UX IPSec uses the password to encrypt certificate files that contain cryptography keys for 
security certificates, and to control access to the ipsec_mgr security certificate configuration GUI.

To set the password, run the following command:

# ipsec_admin -newpasswd

The ipsec_admin utility prompts you to establish the HP-UX IPSec password.


Configuring IPSec (2):
----------------------

The HP-UX documentation lists the following steps:

Step 1: Configuring Host IPSec Policies
Step 2: Configuring Tunnel IPSec Policies
Step 3: Configuring IKE Policies
Step 4: Configuring Preshared Keys Using Authentication Records (Or do Step 5)
Step 5: Configuring Certificates
Step 6: Configuring the Bypass List (Local IPv4 Addresses)
Step 7: Verify Batch File Syntax
Step 8: Committing the Batch File Configuration and Verifying Operation
Step 9: Configuring HP-UX IPSec to Start Automatically
Step 10: Creating Backup Copies of the Batch File and Configuration Database
 





43. SOLARIS OpenBoot PROM commands:
===================================

-- Getting help
-- ------------
ok help / ok help [category] / ok help command

For example, if you want to see the help messages for all commands in the category "diag", type the following:

ok help diag

-- Display your physical devices
-- -----------------------------
ok show-devs [device path]

-- Create or show device aliases
-- -----------------------------
Device pathnames can be long and hard to enter. A device alias allows a short name to represent
an entire device pathname. For example the alias "disk0" might represent the device
/sbus@1,f8000000/esp@0,40000/sd@3,0:a

ok devalias                         displays all current device aliases
ok devalias <alias> <device path>   creates the alias corresponding to the physical device

The following example creates a device alias named "disk3" which represents a SCSI disk
with a target ID of 3.

ok devalias disk3 /iommu/sbus/espdma@f,400000/esp@f,800000/sd@3,0

To make this permanent in NVRAM use:
ok nvalias disk3 /iommu/sbus/espdma@f,400000/esp@f,800000/sd@3,0

-- OpenBoot Diagnostics
-- --------------------
Various hardware diagnostics can be run in OpenBoot.

ok probe-scsi      identifies devices attached to a SCSI bus
ok probe-ide       identifies IDE devices attached to the PCI bus
ok test device     executes the self-test method of the device
ok test-all        tests all devices that have a built-in self-test method
ok watch-clock     tests the clock function
ok watch-net       monitors the network connection

-- OpenBoot NVRAM
-- --------------
System configuration parameters, like "auto-boot", are stored in NVRAM.
You can list or modify these configuration parameters and any changes you make
remain in effect, even after a power cycle, because they are stored in NVRAM.

Some of the most important parameters:
auto-boot?     default true        if true, the machine boots automatically
boot-command   default boot        the command that is executed if auto-boot is true
boot-device    disk or net         device from which to start up
input-device   keyboard            console input device, usually keyboard, ttya, ttyb
security-mode  none                none, command, or full
etc..

To show a parameter: ok printenv <parameter>
To set a parameter : ok setenv <parameter> <value>

ok setenv auto-boot? false
ok printenv auto-boot?

Once unix is loaded, root can also use the /usr/sbin/eeprom command to view or change an OpenBoot parameter.
/usr/sbin/eeprom auto-boot?=true



44. Process priority:
===================== 


Solaris:
--------

NICE and PRIOCNTL commands:

nice:
-----

A high nice value means a low priority for your process: you are going to be nice.
A low or negative value means a high priority: you are not very nice.

Examples:

# nice -n 10 ~/bin/longtask
# renice -5 8829

The nice command takes the program to start as its argument. The renice command takes the PID of an already running process.

System     Range
------     -----
Solaris    0 to 39
HP-UX      0 to 39
Red Hat    -20 to 19
FreeBSD    -20 to 20
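
The two scales in the table can be related with simple arithmetic. The sketch below assumes that the System V style scale (Solaris, HP-UX) is centred on a default of 20 and the BSD/Linux style scale on a default of 0; the function name sysv_to_bsd_nice is made up for this illustration.

```shell
# Hypothetical helper: map a System V style nice value (0-39, default 20)
# to the BSD/Linux style scale, which is centred on 0.
sysv_to_bsd_nice() {
    echo $(( $1 - 20 ))
}

sysv_to_bsd_nice 20   # prints 0  (the default on both scales)
sysv_to_bsd_nice 39   # prints 19 (the "nicest" System V value)
sysv_to_bsd_nice 0    # prints -20 (the highest priority)
```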

priocntl:
---------

Solaris provides the priocntl command, intended as an improvement over the nice command,
to modify process priorities.

Syntax:
# priocntl -s -p <new_priority> -i pid <process_id>

Example:
# priocntl -s -p -5 -i pid 8200


AIX:
----

In AIX we can use the nice and renice commands as well.

About the schedtune Command:
Purpose
Sets parameters for CPU scheduler and Virtual Memory Manager processing.

Syntax
schedtune [ -D | { [ -d n ] [ -e n ] [ -f n ] [ -h n ] [ -m n ] [ -p n ] [ -r n ] [ -t n ] [ -w n ] } ]

Description
Priority-Calculation Parameters
The priority of most user processes varies with the amount of CPU time the process has used recently. The CPU scheduler's priority calculations are based on two parameters that are set with schedtune: -r and -d. The r and d values are in thirty-seconds (1/32); that is, the formula used by the scheduler to calculate the amount to be added to a process's priority value as a penalty for recent CPU use is:

CPU penalty = (recently used CPU value of the process) * (r/32)
and the once-per-second recalculation of the recently used CPU value of each process is:

new recently used CPU value = (old recently used CPU value of the process) * (d/32)
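
The two formulas above can be tried out with shell integer arithmetic. This is only a sketch: the function names and the sample values (a recent CPU value of 64, r and d of 16 or 4) are assumptions chosen for illustration, and integer division here simply truncates.

```shell
# cpu_penalty RECENT_CPU R
# Penalty added to the priority value: recent CPU value * (r/32).
cpu_penalty() {
    echo $(( $1 * $2 / 32 ))
}

# decay_cpu RECENT_CPU D
# Once-per-second recalculation: old recent CPU value * (d/32).
decay_cpu() {
    echo $(( $1 * $2 / 32 ))
}

cpu_penalty 64 16   # prints 32: half of the recent CPU value becomes penalty
cpu_penalty 64 4    # prints 8:  a smaller r penalises recent CPU use less
decay_cpu   64 16   # prints 32: the recent CPU value halves every second
```

A smaller r therefore makes CPU-bound processes lose priority more slowly, and a smaller d makes the scheduler forget past CPU use more quickly.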



44. ttymon and terminals:
=========================

Solaris:
--------

The configuration of terminals in Solaris 8,9 is somewhat more elaborate than
adding such a device on AIX, for example with the mkdev command.
Here we shall only show the configuration in Solaris 8,9.

Note 1:
-------

In Solaris, the role of the traditional getty is taken over by the port monitor ttymon.

$ cd /etc
$ ls -al get*
lrwxrwxrwx   1 root     root          21 Aug 10  2004 getty -> ../usr/lib/saf/ttymon


/var/saf/zsmon >sacadm -l
PMTAG          PMTYPE         FLGS RCNT STATUS     COMMAND
zsmon          ttymon         -    0    ENABLED    /usr/lib/saf/ttymon #


$ pmadm -l
PMTAG          PMTYPE         SVCTAG         FLGS ID       <PMSPECIFIC>
zsmon          ttymon         ttya           u    root     /dev/term/a I - /usr/bin/login - 9600 ldterm,ttcompat ttya login:  - tvi925 y  #
zsmon          ttymon         ttyb           u    root     /dev/term/b I - /usr/bin/login - 9600 ldterm,ttcompat ttyb login:  - tvi925 y  #


$ ls -al /dev/term

lrwxrwxrwx   1 root     root          48 Aug 10  2004 a -> ../../devices/pci@1e,600000/isa@7/serial@0,3f8:a
lrwxrwxrwx   1 root     root          48 Aug 10  2004 b -> ../../devices/pci@1e,600000/isa@7/serial@0,2e8:b


Note 2:
-------

Solaris 2.x systems come with a ttymon port monitor named zsmon and with serial ports A and B already 
configured with default settings for terminals, as shown in the following example:

castle% /usr/sbin/sacadm -l
PMTAG          PMTYPE         FLGS RCNT STATUS     COMMAND
zsmon          ttymon         -    0    ENABLED    /usr/lib/saf/ttymon #

castle% /usr/sbin/pmadm -l
PMTAG         PMTYPE         SVCTAG         FLGS ID       <PMSPECIFIC>
tcp      listen   lp          - root    - p -

$ sacadm -l
PMTAG          PMTYPE         FLGS RCNT STATUS     COMMAND
zsmon          ttymon         -    0    ENABLED    /usr/lib/saf/ttymon #

Note 3:
-------

$ tail -30 /var/saf/zsmon/log

Wed Mar 16 13:13:59 2005; 453; ********** ttymon starting **********
Wed Mar 16 13:13:59 2005; 453; PMTAG:            zsmon
Wed Mar 16 13:13:59 2005; 453; Starting state: enabled
Wed Mar 16 13:13:59 2005; 453; Got SC_ENABLE message
Wed Mar 16 13:13:59 2005; 453; max open files    = 1024
Wed Mar 16 13:13:59 2005; 453; max ports ttymon can monitor = 1017
Wed Mar 16 13:13:59 2005; 453; *ptr == 0
Wed Mar 16 13:13:59 2005; 453; SUCCESS
Wed Mar 16 13:13:59 2005; 453; *ptr == 0
Wed Mar 16 13:13:59 2005; 453; SUCCESS
Wed Mar 16 13:13:59 2005; 453; Initialization Completed
Mon Mar 21 08:02:27 2005; 453; caught SIGTERM
Mon Mar 21 08:02:27 2005; 453; ********** ttymon exiting ***********
Mon Mar 21 08:05:43 2005; 453;
Mon Mar 21 08:05:43 2005; 453; ********** ttymon starting **********
Mon Mar 21 08:05:43 2005; 453; PMTAG:            zsmon
Mon Mar 21 08:05:43 2005; 453; Starting state: enabled
Mon Mar 21 08:05:43 2005; 453; Got SC_ENABLE message
Mon Mar 21 08:05:43 2005; 453; max open files    = 1024
Mon Mar 21 08:05:43 2005; 453; max ports ttymon can monitor = 1017
Mon Mar 21 08:05:43 2005; 453; *ptr == 0
Mon Mar 21 08:05:43 2005; 453; SUCCESS
Mon Mar 21 08:05:43 2005; 453; *ptr == 0
Mon Mar 21 08:05:43 2005; 453; SUCCESS
Mon Mar 21 08:05:43 2005; 453; Initialization Completed

Note 4:
-------

     ttymon is a STREAMS-based TTY port monitor.  Its function is
     to  monitor  ports,  to  set terminal modes, baud rates, and
     line disciplines for the ports, and   to  connect  users  or
     applications  to  services  associated  with the ports. Nor-
     mally, ttymon is configured  to run under the Service Access
     Controller, sac(1M), as part of the  Service Access Facility
     (SAF). It is configured using the  sacadm(1M) command.  Each
     instance  of  ttymon  can  monitor multiple ports. The ports
     monitored by an instance of ttymon are specified in the port
     monitor's  administrative  file.  The administrative file is
     configured using the pmadm(1M) and ttyadm(1M) commands. When
     an  instance  of  ttymon  is  invoked by the sac command, it
     starts to monitor its ports. For  each  port,  ttymon  first
     initializes the line disciplines, if they are specified, and
     the speed and terminal settings. For ports with  entries  in
     /etc/logindevperm,  device  owner, group and permissions are
     set. (See logindevperm(4).) The values used for  initializa-
     tion  are  taken  from the appropriate entry in the TTY set-
     tings file. This file is maintained by the sttydefs(1M) com-
     mand.  Default  line disciplines on ports are usually set up
     by the autopush(1M) command of the Autopush Facility.

     ttymon then writes the prompt and waits for user  input.  If
     the user indicates that the speed is inappropriate by press-
     ing the BREAK key, ttymon tries the next  speed  and  writes
     the  prompt  again.  When  valid  input  is received, ttymon
     interprets the per-service configuration file  for the port,
     if  one  exists,  creates  a  utmpx  entry  if required (see
     utmpx(4)), establishes the  service  environment,  and  then
     invokes  the  service  associated with the port. Valid input
     consists of a string of at least one non-newline  character,
     terminated  by  a  carriage  return.  After the service ter-
     minates,  ttymon cleans up the utmpx entry, if  one  exists,
     and returns the port to its initial state.

     If autobaud is enabled for a port, ttymon will try to deter-
     mine  the  baud  rate on the port automatically.  Users must
     enter a carriage return before ttymon can recognize the baud
     rate  and  print  the prompt. Currently, the baud rates that
     can be determined by autobaud are 110, 1200, 2400, 4800, and
     9600.

SunOS 5.9           Last change: 11 Dec 2001                    1

System Administration Commands                         ttymon(1M)

     If a port is configured as a bidirectional port, ttymon will
     allow  users  to  connect  to a service, and, if the port is
     free, will allow uucico(1M), cu(1C), or ct(1C) to use it for
     dialing out. If a port is bidirectional, ttymon will wait to
     read a character before it prints a prompt.

     If the connect-on-carrier flag is set  for  a  port,  ttymon
     will immediately invoke the port's associated service when a
     connection request is received. The prompt message will  not
     be sent.

     If a port is disabled, ttymon will not start any service  on
     that  port.  If a disabled message is specified, ttymon will
     send out the disabled message when a connection  request  is
     received.  If  ttymon  is  disabled,  all  ports  under that
     instance of ttymon will also be disabled.

SERVICE INVOCATION
     The service ttymon invokes for a port is  specified  in  the
     ttymon  administrative file.  ttymon will scan the character
     string giving the service to be invoked for this port, look-
     ing for a %d or a %% two-character sequence. If %d is found,
     ttymon will modify the service command  to  be  executed  by
     replacing those two characters by the full path name of this
     port (the device  name).  If  %%  is  found,  they  will  be
     replaced  by  a  single %. When the service is invoked, file
     descriptor 0, 1, and 2 are opened to  the  port  device  for
     reading  and  writing.  The service is invoked with the user
     ID, group ID and current home directory set to that  of  the
     user  name  under  which  the  service  was  registered with
     ttymon. Two environment variables, HOME and  TTYPROMPT,  are
     added to the service's environment by ttymon. HOME is set to
     the home directory of the user name under which the  service
     is invoked. TTYPROMPT is set to the prompt string configured
     for the service on the port. This is provided so that a ser-
     vice  invoked  by  ttymon  has  a  means of determining if a
     prompt was actually issued by ttymon and, if so,  what  that
     prompt actually was.

     See ttyadm(1M) for options that can be set for  ports  moni-
     tored by ttymon under  the Service Access Controller.

SECURITY
     ttymon uses pam(3PAM) for session management.  The PAM  con-
     figuration  policy,  listed through /etc/pam.conf, specifies
     the modules to  be  used  for  ttymon.  Here  is  a  partial
     pam.conf file with entries for ttymon using the UNIX session
     management module.

     ttymon  session   required  /usr/lib/security/pam_unix.so.1

     If there are no entries for the  ttymon  service,  then  the
     entries for the "other" service will be used.

Note 5:
-------

To add a login service on an existing port, follow these steps to configure the SAF for 
a character terminal:

1.  Become superuser. 
2.  Type sacadm -l and press Return. Check the output to make sure that a ttymon port monitor is configured. 
    It is unlikely that you will need to add a new port monitor. If you do need to add one, type 

    sacadm -a -p pmtag -t ttymon -c /usr/lib/saf/ttymon -v `ttyadm -V` and press Return. 

3.  Type 

    pmadm -a -p pmtag -s svctag -i root -fu -v `ttyadm -V` -m "`ttyadm -T terminfo-type -d dev-path \
    -l ttylabel -s /usr/bin/login`" 

    and press Return. The port is configured for a login service. 
4.  Attach all of the cords and cables to the terminal and turn it on. 


In this example, a ttymon port monitor called ttymon0 is created and a login is 
enabled for serial port /dev/term/00:


oak% su
Password:
# sacadm -l
PMTAG          PMTYPE         FLGS RCNT STATUS     COMMAND
zsmon          ttymon         -    0    ENABLED    /usr/lib/saf/ttymon #
# sacadm -a -p ttymon0 -t ttymon -c /usr/lib/saf/ttymon -v `ttyadm -V`
# sacadm -l
PMTAG          PMTYPE         FLGS RCNT STATUS     COMMAND
ttymon0        ttymon         -    0    STARTING   /usr/lib/saf/ttymon #
zsmon          ttymon         -    0    ENABLED    /usr/lib/saf/ttymon #
# pmadm -a -p ttymon0 -s tty00 -i root -fu -v `ttyadm -V` \
  -m "`ttyadm -T tvi925 -d /dev/term/00 -l 9600 -s /usr/bin/login`"
# pmadm -l
PMTAG          PMTYPE         SVCTAG         FLGS ID       <PMSPECIFIC>
zsmon          ttymon         ttya           u    root     /dev/term/a I - /usr/bin/login - 9600 ldterm,ttcompat ttya login:  - tvi925 y  #
zsmon          ttymon         ttyb           u    root     /dev/term/b I - /usr/bin/login - 9600 ldterm,ttcompat ttyb login:  - tvi925 y  #
ttymon0        ttymon         tty00          u    root     /dev/term/00 - - - /usr/bin/login - 9600 login: - tvi925 - #
#


Add a port monitor                    sacadm -a -p pmtag -t ttymon -c /usr/lib/saf/ttymon -v `ttyadm -V` -y "comment"
Disable a port monitor                sacadm -d -p pmtag
Enable a port monitor                 sacadm -e -p pmtag
Kill a port monitor                   sacadm -k -p pmtag
List status of a port monitor         sacadm -l -p pmtag
Remove a port monitor                 sacadm -r -p pmtag
Start a port monitor                  sacadm -s -p pmtag
Add a listen port monitor             sacadm -a -p pmtag -t listen -c /usr/lib/saf/listen -v `ttyadm -V` -y "comment"


Add a standard terminal service       pmadm -a -p pmtag -s svctag -i root -v `ttyadm -V` -m "`ttyadm -i 'terminal disabled.' -l contty -m ldterm,ttcompat -d dev-path -s /usr/bin/login`"
Disable a ttymon service              pmadm -d -p pmtag -s svctag
Enable a ttymon service               pmadm -e -p pmtag -s svctag
List all services                     pmadm -l
List status of one ttymon service     pmadm -l -p pmtag -s svctag
Add a listen service                  pmadm -a -p pmtag -s lp -i root -v `nlsadmin -V` -m "`nlsadmin -o /var/spool/lp/fifos/listenS5`"
Disable a listen service              pmadm -d -p pmtag -s lp
Enable a listen service               pmadm -e -p pmtag -s lp
List services of one port monitor     pmadm -l -p pmtag


Note 7:
-------

3.23) What has happened to getty? What is pmadm and how do you use it? 
I was hoping you wouldn't ask. PMadm stands for Port Monitor Admin, and it's part of a ridiculously complicated 
bit of software over-engineering that is destined to make everybody an expert. 

Best advice for workstations: don't touch it! It works out of the box. For servers, you'll have to read the manual. 
This should be in admintool in Solaris 2.3 and later. For now, here are some basic instructions from Davy Curry. 

"Not guaranteed, but they worked for me." 

To add a terminal to a Solaris system: 

1. Do a "pmadm -l" to see what's running. The serial ports on the CPU board are probably already being monitored by "zsmon". 


PMTAG          PMTYPE         SVCTAG         FLGS ID       <PMSPECIFIC>
zsmon          ttymon         ttya           u    root     \
	    /dev/term/a I - /usr/bin/login - 9600 ldterm,ttcompat ttya \
	    login:  - tvi925 y  #

2. If the port you want is not being monitored, you need to create a new port monitor with the command 


	    sacadm -a -p PMTAG -t ttymon -c /usr/lib/saf/ttymon -v VERSION

where PMTAG is the name of the port monitor, e.g. "zsmon" or "alm1mon", and VERSION is the output of "ttyadm -V". 

3. If the port you want is already being monitored, and you want to change something, you need to delete the current instance of the port monitor. To do this, use the command 


	    pmadm -r -p PMTAG -s SVCTAG

where PMTAG and SVCTAG are as given in the output from "pmadm -l". Note that if the "I" is present in the <PMSPECIFIC> field (as it is above), you need to get rid of it. 

4. Now, to create a specific instance of ttymon for a port, issue the command: 


pmadm -a -p PMTAG -s SVCTAG -i root -fu -v 1 -m \
	    "`ttyadm -m ldterm,ttcompat -p 'PROMPT' -S YORN -T TERMTYPE \
	    -d DEVICE -l TTYID -s /usr/bin/login`"

Note the assorted quotes; Bourne shell (sh) and Korn (ksh) users leave off the second backslash! 

In the above: 

PMTAG is the port monitor name you made with "sacadm", e.g. "zsmon". 
SVCTAG is the service tag, which can be the name of the port, e.g., "ttya" or "tty21". 
PROMPT is the prompt you want to print, e.g. "login: ". 
YORN is "y" to turn software carrier on (you want this for directly connected terminals) and "n" to leave it off 
(you want this for modems). 
TERMTYPE is the value you want in $TERM. 
DEVICE is the name of the device, e.g. "/dev/term/a" or "/dev/term/21". 
TTYID is the line you want from /etc/ttydefs that sets the baud rate and stuff. I suggest you use one of the 
"contty" ones for directly connected terminals. 

5. To disable ("turn off") a terminal, run 


	    pmadm -d -p PMTAG -s SVCTAG

To enable ("turn on") a terminal, run 


	    pmadm -e -p PMTAG -s SVCTAG

Ports are enabled by default when you "create" them as above. 
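
Step 1 above relies on reading the "pmadm -l" listing by eye. As a sketch, the listing can be reduced to "service tag, device" pairs with awk. It assumes the column layout of the sample output shown earlier (PMTAG, PMTYPE, SVCTAG, FLGS, ID, then the device as the first port-monitor-specific field); the function name list_tty_ports is invented for this example.

```shell
# Hypothetical helper: read "pmadm -l" output on stdin and print
# "<svctag> <device>" for every ttymon service.
list_tty_ports() {
    awk '$2 == "ttymon" && $6 ~ /^\/dev\// { print $3, $6 }'
}

# Sample listing copied from step 1. On a live Solaris system you would
# run:  pmadm -l | list_tty_ports
list_tty_ports <<'EOF'
PMTAG          PMTYPE         SVCTAG         FLGS ID       <PMSPECIFIC>
zsmon          ttymon         ttya           u    root     /dev/term/a I - /usr/bin/login - 9600 ldterm,ttcompat ttya login:  - tvi925 y  #
EOF
```

The header line is skipped because its second column is PMTYPE rather than ttymon.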


Note 8:
-------


You use three SAF commands to administer modems and alphanumeric terminals: sacadm, pmadm, and ttyadm.

-- The sacadm command adds and removes port monitors. This command is your main link with the Service Access Controller (SAC) 
and its administrative file (/etc/saf/_sactab).

-- The pmadm command adds or removes a service and associates a service with a particular port monitor.

-- The ttyadm command formats information for inclusion in various SAF administrative files. A ttyadm command often is embedded 
within a sacadm or pmadm command to provide some of the data needed by those commands. 



Function                            Program                  Description
Overall administration              sacadm                   Command for adding and removing port monitors
Service Access Controller           sac                      SAF's master program
Port monitors                       ttymon                   Monitors serial port login requests
                                    listen                   Monitors requests for network services
Port monitor service administrator  pmadm                    Command for controlling port monitors' services
Services                            logins;
                                    remote procedure calls;
                                    other                    Services to which SAF provides access



45. CDE:
========

Start Login Manager:
--------------------

The Login Server, also called the Login Manager, usually starts up the CDE environment when the system
is booted and the "/etc/rc2.d/S99dtlogin" script is run.
The Login Server is responsible for displaying a graphical login screen, authenticating users,
and starting a user session.
It can display a login screen on local or network bitmap displays.

The Login Server can also be started from the command line, using either:

# /etc/init.d/dtlogin start
or
# /usr/dt/bin/dtlogin -daemon; exit

To set the Login Manager to start CDE the next time the system is booted, give the command

# /usr/dt/bin/dtconfig -e


Stop Login manager:
-------------------

To stop the Login Manager, use

# /etc/init.d/dtlogin stop
or
# /usr/dt/bin/dtconfig -kill

If you do not want CDE to start when the system is booted, use

# /usr/dt/bin/dtconfig -d


Other facts of the Login manager:
---------------------------------

By default the Login manager stores its PID in /var/dt/Xpid

The Login Manager is configurable through a number of files like "Xconfig".
You should copy "/usr/dt/config" to "/etc/dt/config" and make modifications there.
To tell the Login Manager to reread Xconfig, use

# /usr/dt/bin/dtconfig -reset


Displaying a Login screen:
--------------------------

Upon startup, the Login Server checks the Xservers file to determine if an X server needs to be
started and to determine if and how login screens should be displayed on local or network displays.
To modify Xservers, copy Xservers from /usr/dt/config to /etc/dt/config.
After modifying, tell the login server to reread Xservers by
# /usr/dt/bin/dtconfig -reset

The format of a record in Xservers is:

display_name display_class display_type X_server_command

display_name     = the connection name to use when connecting to the X server (:0)
                   An * is expanded to hostname:0
display_class    = identifies resources specific to this display (for example Local)
display_type     = tells the Login manager whether the display is local or a network display.
X_server_command = identifies the commandline, connection number, and other options the
                   Login server will use to start the X server (/usr/bin/X11/X :0)
                   The connection number must match the number specified in display_name.

The default Xservers line is similar to:

:0 Local local@console /usr/bin/X11/X :0
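
To make the record format concrete, the sketch below splits one Xservers record into the four parts described above. The function name parse_xservers_line is hypothetical; the only real input is the default line quoted from the text.

```shell
# Hypothetical helper: split one Xservers record into its four fields.
# The X server command can contain spaces, so it receives the rest of the line.
parse_xservers_line() {
    read -r display_name display_class display_type x_server_command <<EOF
$1
EOF
    printf 'display_name=%s\n'     "$display_name"
    printf 'display_class=%s\n'    "$display_class"
    printf 'display_type=%s\n'     "$display_type"
    printf 'x_server_command=%s\n' "$x_server_command"
}

parse_xservers_line ':0 Local local@console /usr/bin/X11/X :0'
```

For the default record this prints :0, Local, local@console and "/usr/bin/X11/X :0", which shows that the connection number appears twice: once in display_name and once at the end of the X server command.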


Running the Login Server without a Local bitmap display:
--------------------------------------------------------

If your login server has no bitmap display, you should comment out the line shown above like:

# :0 Local local@console /usr/bin/X11/X :0

So when the login server starts, it runs in the background waiting for requests from
network displays.


46. Make command:
=================

Note 1: (Not geared to any particular unix version):
----------------------------------------------------

ABOUT MAKE

The make utility executes a list of shell commands associated with each target, typically to create 
or update a file of the same name. makefile contains entries that describe how to bring a target 
up to date with respect to those on which it depends, which are called dependencies.

SYNTAX

/usr/ccs/bin/make [ -d ] [ -dd ] [ -D ] [ -DD ] [ -e ] [ -i ] [ -k ] [ -n ] [ -p ] [ -P ] [ -q ] 
[ -r ] [ -s] [ -S ] [ -t ] [ -V ] [ -f makefile ] ... [-K statefile ] ... [ target ... ] [ macro = value ... ]

/usr/xpg4/bin/make [ -d ] [ -dd ] [ -D ] [ -DD ] [ -e ] [ -i ] [ -k ] [ -n ] [ -p ] [ -P ] [ -q ] 
[ -r ] [ -s] [ -S ] [ -t ] [ -V ] [ -f makefile ] ... [ target... ] [ macro = value ... ]



DESCRIPTION
     The make utility executes a list of shell  commands  associ-
     ated  with each target, typically to create or update a file
     of the same name. makefile contains  entries  that  describe
     how  to  bring  a target up to date with respect to those on
     which it depends, which are called dependencies. Since  each
     dependency is a target, it may have dependencies of its own.
     Targets, dependencies, and sub-dependencies comprise a  tree
     structure  that  make traces when deciding whether or not to
     rebuild a target.

     The make utility recursively checks each target against  its
     dependencies,  beginning  with  the  first  target  entry in
     makefile if no target argument is supplied  on  the  command
     line. If, after processing all of its dependencies, a target
     file is found either to be missing, or to be older than  any
     of  its dependencies, make rebuilds it. Optionally with this
     version of make, a target can be treated as out-of-date when
     the commands used to generate it have changed since the last
     time the target was built.

     To build a given target, make executes the list of commands,
     called  a  rule.  This  rule may be listed explicitly in the
     target's makefile entry, or it may be supplied implicitly by
     make.

     If no target is specified on the command line, make uses the
     first target defined in makefile.

     If a target has no makefile entry, or if its  entry  has  no
     rule,  make attempts to derive a rule by each of the follow-
     ing methods, in turn, until a suitable rule is  found.  Each
     method is described under USAGE below.


Note 2: An example
------------------

# find . -name "make" -print
./usr/ccs/bin/make
./usr/share/lib/make
./usr/xpg4/bin/make
./usr/appserver/samples/rmi-iiop/cpp/src/client/make

/opt/app/oracle/product/9.2/sqlplus/lib >/usr/ccs/bin/make -f ins_sqlplus.mk install


If you want to do compilations on Solaris, it is best not to have /usr/ucb
in your PATH. If you want to have /usr/ucb in the PATH, it must be the last
entry. You should also put /usr/ccs/bin/ before /usr/xpg4/bin/ in the PATH
to make sure that /usr/ccs/bin/make is used and not /usr/xpg4/bin/make.

To be able to use 'make' 'as' and 'ld' you need to make sure that 
/usr/ccs/bin is in your path.

Alan Coopersmith <alanc@alum.calberkeley.org> wrote:
> rhugga@yahoo.com (Keg) writes in comp.sys.sun.admin:
> |Just curious what the stuff under /usr/ucb is for? I was looking at
> |the ps utility and apparently they are the same file in 2 different
> |places:

> For users and scripts that expect the BSD style options, in cases such
> as ps & ls where they are incompatible with the SysV options found in
> the /usr/bin versions.

It's there for historical reasons.  SunOS 4.x was based on BSD unix.
Solaris 2.x (= SunOS 5.x) was based on SYSV, with a bunch of commands
having different syntax and behavior.  To ease the transition, the
/usr/ucb directory was created to hold the incompatible BSD versions.
People who really wanted BSD could put /usr/ucb before /usr/bin in their
PATH.

Note 3:
-------

How to write a simple makefile.
Let us start with a very simple example. Suppose the executable sortit depends on the main Fortran source file
"sortit_main.f90" and 2 additional files "readN.f90" and "sortarray.f90". 
The source files can be compiled and linked in 1 f90 command:

f90 -o sortit sortit_main.f90 readN.f90 sortarray.f90

Now suppose only one file changes, and the files are not small but contain many lines of code. Then
a better approach is to separate the compilation and linking stages:

- compile into object files:
f90 -c sortit_main.f90 readN.f90 sortarray.f90

- link the files:
f90 -o sortit sortit_main.o readN.o sortarray.o

Suppose there were many source files, and thus many object files.
In this case it's better to make one definition file (a makefile) which explains it all. So if one source changes,
only the corresponding object file is out of date and needs to be recreated. 
All that information can be in a makefile, for example:

sortit:  sortit_main.o readN.o sortarray.o
	f90 -o sortit sortit_main.o readN.o sortarray.o

sortit_main.o: sortit_main.f90
	f90 -c sortit_main.f90

readN.o: readN.f90
	f90 -c readN.f90

sortarray.o: sortarray.f90
	f90 -c sortarray.f90

By default, make looks for a makefile called "makefile" in the current directory. Alternative files can
be specified with the -f option followed by the name of the makefile, for example:

make -f makefile1.mk

or

make -f makefile1.mk install

In the second form, "install" is one of the targets (labels) defined in the makefile.

Further explanation:
--------------------

The make utility is embedded in UNIX history. It is designed to decrease a programmer's need to remember things. 
I guess that is actually the nice way of saying it decreases a programmer's need to document. In any case, 
the idea is that if you establish a set of rules to create a program in a format make understands, you don't have 
to remember them again. 

To make this even easier, the make utility has a set of built-in rules so you only need to tell it what new things 
it needs to know to build your particular utility. For example, if you typed in make love, make would first look 
for some new rules from you. If you didn't supply it any then it would look at its built-in rules. One of those 
built-in rules tells make that it can run the linker (ld) on a program name ending in .o to produce the 
executable program. 

So, make would look for a file named love.o. But, it wouldn't stop there. Even if it found the .o file, 
it has some other rules that tell it to make sure the .o file is up to date. In other words, newer than 
the source program. The most common source program on Linux systems is written in C and its file name ends in .c. 

If make finds the .c file (love.c in our example) as well as the .o file, it would check their timestamps 
to make sure the .o was newer. If it was not newer or did not exist, it would use another built-in rule to 
build a new .o from the .c (using the C compiler). This same type of situation exists for other 
programming languages. The end result, in any case, is that when make is done, assuming it can find the 
right pieces, the executable program will be built and up to date. 
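
The timestamp check described above can be imitated in a few lines of shell. This is a sketch of the idea only, not how make is implemented: the function name needs_rebuild is hypothetical, and the "-nt" (newer than) test is an extension that the common shells (bash, ksh, dash) provide.

```shell
# Hypothetical helper: succeed (exit status 0) when TARGET is missing or
# older than any of the listed dependencies, i.e. when it must be rebuilt.
needs_rebuild() {
    target=$1
    shift
    [ -e "$target" ] || return 0              # missing target: rebuild
    for dep in "$@"; do
        [ "$dep" -nt "$target" ] && return 0  # dependency is newer: rebuild
    done
    return 1                                  # target is up to date
}

# Demonstration with throw-away files: love.c is one day newer than love.o.
dir=$(mktemp -d)
touch -t 202001010000 "$dir/love.o"
touch -t 202001020000 "$dir/love.c"
if needs_rebuild "$dir/love.o" "$dir/love.c"; then
    echo "love.o is out of date"
fi
rm -rf "$dir"
```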

The old UNIX joke, by the way, is what early versions of make said when it could not find the necessary files. 
In the example above, if there was no love.o, love.c or any other source format, the program would have said:
make: don't know how to make love. Stop. 

Getting back to the task at hand, the default file for additional rules is Makefile in the current directory. 
If you have some source files for a program and there is a Makefile file there, take a look. It is just text. 
The lines that have a word followed by a colon are targets. That is, these are words you can type following 
the make command name to do various things. If you just type make with no target, the first target will be executed. 

What you will likely see at the beginning of most Makefile files are what look like some assignment statements. 
That is, lines with a couple of fields with an equal sign between them. Surprise, that is what they are. 
They set internal variables in make. Common things to set are the location of the C compiler (yes, there is a default), 
version numbers of the program and such. 

This now brings us back to configure. On different systems, the C compiler might be in a different place, you might 
be using ZSH instead of BASH as your shell, the program might need to know your host name, it might use a 
dbm library and need to know if the system had gdbm or ndbm and a whole bunch of other things. 
You used to do this configuring by editing Makefile. Another pain for the programmer and it also meant that 
any time you wanted to install software on a new system you needed to do a complete inventory of what was where. 

As more and more software became available and more and more POSIX-compliant platforms appeared, this got harder 
and harder. This is where configure comes in. It is a shell script (generally written by GNU Autoconf) that goes out 
and looks for software and even tries various things to see what works. It then takes its instructions 
from Makefile.in and builds Makefile (and possibly some other files) that work on the current system. 

Background work done, let me put the pieces together. 

1. Run configure (you usually have to type ./configure, as most people don't have the current directory in their 
   search path). This builds a new Makefile. 
2. Type make. This builds the program. That is, make is executed, looks for the first target in Makefile 
   and does what the instructions say. The expected end result is an executable program. 
3. Now, as root, type make install. This again invokes make; make finds the target install in Makefile and follows 
   the directions to install the program. 

This is a very simplified explanation but, in most cases, this is what you need to know. With most programs, 
there will be a file named INSTALL that contains installation instructions that will fill you in on 
other considerations. For example, it is common to supply some options to the configure command to change the 
final location of the executable program. There are also other make targets, such as clean, which removes unneeded 
files after an install, and, in some cases, test, which allows you to test the software between the make and 
make install steps.


47. mkitab:
===========

AIX:

mkitab Command
Purpose
Makes records in the /etc/inittab file.

Syntax
mkitab [ -i Identifier ] { [ Identifier ] : [ RunLevel ] : [ Action ] : [ Command ] }

Description
The mkitab command adds a record to the /etc/inittab file. 
The Identifier:RunLevel:Action:Command parameter string specifies the new entry to the /etc/inittab file. 
You can insert a record after a specific record using the -i Identifier flag. The command finds the field 
specified by the Identifier parameter and inserts the new record after the one identified by 
the -i Identifier flag.

Example:

To add a new record to the /etc/inittab file, telling the init command to handle a login on tty2, 
enter: 

mkitab "tty002:2:respawn:/usr/sbin/getty /dev/tty2"
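
To make the Identifier:RunLevel:Action:Command layout concrete, here is a minimal sketch that splits such 
a record into its four fields with plain shell parameter expansion. The function name parse_itab is made up 
for this example; it does not call mkitab and runs on any POSIX shell.

```shell
#!/bin/sh
# parse_itab RECORD: split an inittab-style record into its four fields.
# parse_itab is a hypothetical helper, used only to illustrate the format.
parse_itab() {
    record=$1
    id=${record%%:*};       rest=${record#*:}
    runlevel=${rest%%:*};   rest=${rest#*:}
    action=${rest%%:*}
    # Everything after the third colon is the command, colons included.
    command=${rest#*:}
    printf 'id=%s runlevel=%s action=%s command=%s\n' \
        "$id" "$runlevel" "$action" "$command"
}

parse_itab "tty002:2:respawn:/usr/sbin/getty /dev/tty2"
```

For the record used above this prints id=tty002 runlevel=2 action=respawn command=/usr/sbin/getty /dev/tty2.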

To change an existing entry in the file, use the chitab command. For example, to change 
tty2's runlevel, enter the command:

chitab "tty002:23:respawn:/usr/sbin/getty /dev/tty2"

chitab "rcnfs:23456789:off:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons"


This is also why an /etc/inittab is usually much bigger in AIX compared to Solaris.

rmitab Command
Purpose
Removes records in the /etc/inittab file. 

Syntax
rmitab Identifier


Description
The rmitab command removes an /etc/inittab record. You can specify a record to remove by 
using the Identifier parameter. The Identifier parameter specifies a field of one to fourteen 
characters used to uniquely identify an object. If the Identifier field is not unique, the command is unsuccessful. 

Examples
To remove the tty entry for tty2, enter:

rmitab "tty002"



48. Starting and stopping daemons:
==================================

AIX:
----

AIX has a unique way of managing processes: the System Resource Controller (SRC). The SRC takes 
the form of a daemon, "/usr/sbin/srcmstr", which is started by init via /etc/inittab. srcmstr manages requests 
to start, stop, or refresh a daemon or a group of daemons. Instead of typing the name of a 
daemon to start it, or instead of using the kill command to stop a daemon, you use an SRC command 
that does it for you. In this way you don't have to remember, for example, whether to use an ampersand 
when starting a daemon, or what signal to use when killing one. SRC also allows you to stop and start 
groups of related daemons with one command. 

AIX has a hierarchical organization of system processes, and this organization is configured into the ODM 
in the form of the SRCsubsys and SRCsubsvr object classes. Daemons at the lowest levels are subservers. 
On a newly loaded system the only subservers are those of the inetd subsystem: 
ftp, telnet, login, finger, etc. To view these subservers, use the odmget command: 

# odmget SRCsubsvr

To start a subsystem, for example
# startsrc -s lpd

To stop a subsystem, for example
# stopsrc -s lpd

You can also use the refresh command when, for example, you have edited a .conf file and need the
subsystem to reread its configuration.
For example, suppose you have started the httpd daemon:

# startsrc -s httpd

Now you have edited the /etc/httpd.conf file. To refresh the daemon, use the following command:

# refresh -s httpd


To list the status of a single subsystem or of a subsystem group, use for example
# lssrc -s sshd
# lssrc -g nfs

Subsystem     Group    Pid    Status
biod          nfs      11354  active
rpc.lockd     nfs      11108  active
nfsd          nfs             inoperative
rpc.statd     nfs             inoperative
rpc.mountd    nfs             inoperative
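
The listing can also be filtered in a script. The following sketch picks out the subsystems that are 
reported as inoperative. The function name inoperative_subsystems is made up for this example, and the 
sample input is copied from the listing above so that the snippet runs without lssrc.

```shell
#!/bin/sh
# List subsystems reported as inoperative in lssrc-style output.
# inoperative_subsystems is a hypothetical helper; it reads stdin.
inoperative_subsystems() {
    # Skip the header line. The status is always the last column,
    # because the Pid column is empty for inoperative subsystems.
    awk 'NR > 1 && $NF == "inoperative" { print $1 }'
}

# Sample input; on AIX you would pipe "lssrc -g nfs" into the function.
inoperative_subsystems <<'EOF'
Subsystem     Group    Pid    Status
biod          nfs      11354  active
rpc.lockd     nfs      11108  active
nfsd          nfs             inoperative
rpc.statd     nfs             inoperative
rpc.mountd    nfs             inoperative
EOF
```

On a real system: lssrc -g nfs | inoperative_subsystems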



Starting and stopping daemons in general:
-----------------------------------------

In general, daemons which are not under the control of some resource controller can be
stopped or started as shown in the following "stanza":

# <script_name> stop
# <script_name> start

On many occasions, a script associated with the daemon is available that takes "stop" or "start"
as an argument.
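
Such a control script is typically built around a case statement. The skeleton below is only a sketch: 
the daemon path and the function name daemon_ctl are placeholders, and the echo lines stand in for the 
real start and stop commands.

```shell
#!/bin/sh
# Skeleton of a start/stop control script.
# DAEMON and daemon_ctl are placeholders for illustration only.
DAEMON=/usr/local/sbin/mydaemon

daemon_ctl() {
    case "$1" in
        start)
            echo "Starting $DAEMON"
            # the real start command would go here
            ;;
        stop)
            echo "Stopping $DAEMON"
            # the real stop command would go here
            ;;
        *)
            echo "Usage: daemon_ctl {start|stop}" >&2
            return 1
            ;;
    esac
}

daemon_ctl stop
daemon_ctl start
```

In a real script the last two lines would be replaced by a single call that passes the script's first 
argument to the function.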
 


49. Inodes, the superblock and related items:
=============================================


49.1 Solaris:
-------------

Following is a "light weight" discussion about the superblock and inodes in the UFS filesystem in Solaris:

When you create a UFS filesystem, the disk slice is divided into cylinder groups. Each cylinder group is then
divided into blocks to control and organize the structure of files within the cylinder group.
Each block performs a specific function in the filesystem. 
A UFS filesystem has the following types of blocks:

Boot block: stores information used when booting the system, and is the first 8KB in a slice (partition).
Superblock: stores much of the information about the filesystem. It is located after the boot block.
Inode     : stores all information about a file except its name
Data block: stores data for each file

The boot block stores the procedures used in booting the system. Without a boot block the system does not boot.
If a filesystem is not used for booting, the boot block is left blank. The boot block appears only
in the first cylinder group (cylinder group 0) and is the first 8KB in a slice.

The superblock stores much of the information about the filesystem. Following are the items 
contained in a superblock:
- size and status of the fs
- label (filesystem name and volume name)
- size of the fs logical block
- date and time of the last update
- cylinder group size
- number of datablocks in a cylinder group
- summary data block
- fs state (clean, stable, or active)
- pathname of the last mount point

The superblock is located at the beginning of the disk slice and is replicated in each cylinder group.
Because it contains critical data, multiple superblocks are made when the fs is created.
A copy of the superblock for each filesystem is kept up-to-date in memory.
The sync command forces every superblock in memory to write its data to disk.

An inode contains all the information about a file except its name which is kept in a directory.
An inode is 128 bytes. For each file there corresponds one inode. 
The inode information is kept in the cylinder information block and contains the
following:

- the type of file (regular file, directory, block special, character special, link)
- mode of the file (rwxrwxrwx)
- number of hard links to the file
- userid of the owner
- groupid
- number of bytes in the file
- an array of 15 disk-block addresses
- date and time the file was last accessed
- date and time the file was last modified
- date and time the inode was last changed (UFS does not record a creation time)

The maximum number of files per UFS file system is determined by the number of inodes
allocated for a filesystem. The number of inodes depends on the amount of diskspace that
is allocated for each inode and the total size of the filesystem.
By default, one inode is allocated for each 2KB of dataspace. You can change this default
with the newfs command. 
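
As a rough illustration of that ratio, the sketch below divides the filesystem size by the number of bytes 
per inode. The function name estimate_inodes is made up for this example, and the result is only an 
estimate: a real filesystem rounds the figure to fit its cylinder group layout.

```shell
#!/bin/sh
# Rough estimate of the inode count: filesystem size / bytes per inode.
# estimate_inodes is a hypothetical helper, not a system command.
estimate_inodes() {
    fs_bytes=$1
    bytes_per_inode=${2:-2048}      # default: one inode per 2KB
    echo $(( fs_bytes / bytes_per_inode ))
}

one_gb=$(( 1024 * 1024 * 1024 ))
estimate_inodes "$one_gb"           # one inode per 2KB
estimate_inodes "$one_gb" 8192      # fewer inodes, for a few large files
```

For a 1 GB slice this gives 524288 inodes at 2KB per inode and 131072 at 8KB per inode.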

Inodes include pointers to the data blocks. Each inode contains 15 pointers: 


the first 12 pointers point directly to data blocks 
the 13th pointer points to an indirect block, a block containing pointers to data blocks 
the 14th pointer points to a doubly-indirect block, a block containing addresses of singly indirect blocks (2048 addresses per 8KB block, assuming 4-byte addresses) 
the 15th pointer points to a triply indirect block (which contains pointers to doubly indirect blocks, etc.) 

-------------------------------
| | | | | | | | | | | | | | | |
-------------------------------
 | | | | | | | | | | | | | | |--------------------------
      data blocks        | |-----------|               |
                         |             |               |
                       -----         -----           -----
                       |   |         |   |           |   |
                       -----         -----           -----
                        |||           |||             |||
                        data         -----           -----
                                     |   |           |   |
                                     -----           -----
                                      |||             |||
                                      data           -----
                                                     |   |
                                                     -----
                                                      |||
                                                      data
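
How much data each pointer level can reach follows directly from the block size. The sketch below works 
this out under two assumptions that are not stated in the text above: block addresses are 4 bytes long, 
and every block is a full logical block (fragments are ignored). The function name pointer_reach is made 
up for this example. The figures are addressing limits only; the real maximum file size may be lower.

```shell
#!/bin/sh
# Bytes reachable through each level of the 15-pointer scheme.
# Assumption: 4-byte block addresses, so one indirect block holds
# block_size / 4 pointers. pointer_reach is a hypothetical helper.
pointer_reach() {
    block_size=${1:-8192}
    ptrs=$(( block_size / 4 ))
    echo "pointers_per_block $ptrs"
    echo "direct $(( 12 * block_size ))"
    echo "single_indirect $(( ptrs * block_size ))"
    echo "double_indirect $(( ptrs * ptrs * block_size ))"
    echo "triple_indirect $(( ptrs * ptrs * ptrs * block_size ))"
}

pointer_reach 8192
```

With 8KB blocks the 12 direct pointers cover 96KB, the single indirect block adds 16MB, and the double 
indirect block adds 32GB. The shell must support 64-bit arithmetic for the larger values.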



---------------------------------------------------------------------------
|          |           | | | | | | | |        |  |  |  |      |  |        |
| B. B.    | S. B.     | Inodes  | | | ...    |  Many Data Blocks ......  |
|          |           | | | | | | | |        |  |  |  |      |  |        |
---------------------------------------------------------------------------

In order to create a UFS filesystem on a formatted disk that already has been divided into slices
you need to know the raw device filename of the slice that will contain the filesystem.
Example:

# newfs /dev/rdsk/c0t3d0s7

defaults on UFS on Solaris: 
blocksize 8192
fragmentsize 1024
one inode for each 2K of diskspace


49.2 AIX:
---------

Although we use the LVM to create Volume Groups, and Logical Volumes within a Volume Group,
a file system resides on a single logical volume. 
Every file and directory belongs to a file system within a logical volume. 

The mkfs (make file system) command, or crfs command, or the System Management Interface Tool (smit command) 
creates a file system on a logical volume. 

- crfs
The crfs command creates a file system on a logical volume within a previously created volume group. 
A new logical volume is created for the file system unless the name of an existing logical volume 
is specified using the -d flag. An entry for the file system is put into the /etc/filesystems file.

By the way, a newly installed AIX 5.x system has the following filesystem structure:

"/" root is a filesystem. Certain standard directories are present within "/", like for example /bin.
In addition, a set of separate filesystems like hd2=/usr, hd3=/tmp, hd9var=/var, are MOUNTED over the 
correspondingly named directories or mountpoints.

                              /
                              |
                 ----------------------------------------
                 |      |     |      |      |     |     |
                /bin   /dev  /etc    /usr   /tmp  /var  /home
               directories           file systems


So, when you unmount all additional filesystems that were defined later (like /export, /software, etc.),
you still have / (with its standard directories like /etc, /bin, etc.) and the standard filesystems 
like /usr.


inodes:
-------

-- Working with JFS i-nodes:
-- -------------------------


Files in the journaled file system (JFS) are represented internally as index nodes (i-nodes). Journaled file system 
i-nodes exist in a static form on disk and contain access information for the file as well as pointers to the 
real disk addresses of the file's data blocks. The number of disk i-nodes available to a file system is 
dependent on the size of the file system, the allocation group size (8 MB by default), and the number of bytes 
per i-node ratio (4096 by default). These parameters are given to the mkfs command at file system creation. 
When enough files have been created to use all the available i-nodes, no more files can be created, even if 
the file system has free space. The number of available i-nodes can be determined by using the df -v command. 
Disk i-nodes are defined in the /usr/include/jfs/ino.h file. 

When a file is opened, an in-core i-node is created by the operating system. The in-core i-node contains 
a copy of all the fields defined in the disk i-node, plus additional fields for tracking the in-core i-node. 
In-core i-nodes are defined in the /usr/include/jfs/inode.h file. 

Disk i-node Structure for JFS

Each disk i-node in the journaled file system (JFS) is a 128-byte structure. 

The offset of a particular i-node within the i-node list of the file system produces the unique number 
(i-number) by which the operating system identifies the i-node. A bit map, known as the i-node map, tracks the 
availability of free disk i-nodes for the file system. 

Disk i-nodes include the following information: 

Field         Contents  
i_mode        Type of file and access permission mode bits  
i_size        Size of file in bytes  
i_uid         Access permissions for the user ID  
i_gid         Access permissions for the group ID  
i_nblocks     Number of blocks allocated to the file  
i_mtime       Last time file was modified  
i_atime       Last time file was accessed  
i_ctime       Last time i-node was modified  
i_nlink       Number of hard links to the file  
i_rdaddr[8]   Real disk addresses of the data  
i_rindirect   Real disk address of the indirect block, if any  


It is impossible to change the data of a file without changing the i-node, but it is possible to change the i-node 
without changing the contents of the file. For example, when permission is changed, the information within the 
i-node (i_ctime) is modified, but the data in the file remains the same. 

The i_rdaddr field within the disk i-node contains 8 disk addresses. These addresses point to the first 
8 data blocks assigned to the file. The i_rindirect field address points to an indirect block. 
Indirect blocks are either single indirect or double indirect. Thus, there are three possible geometries 
of block allocation for a file: direct, indirect, or double indirect. Use of the indirect block and other 
file space allocation geometries are discussed in the article JFS File Space Allocation. 

Disk i-nodes do not contain file or path name information. Directory entries are used to link file names to 
i-nodes. Any i-node can be linked to many file names by creating additional directory entries with the 
link or symlink subroutine. To discover the i-node number assigned to a file, use the ls -i command. 

The i-nodes that represent files that define devices contain slightly different information from i-nodes 
for regular files. Files associated with devices are called special files. There are no data block addresses 
in special device files, but the major and minor device numbers are included in the i_rdev field. 

In normal situations, a disk i-node is released when the link count (i_nlink) to the i-node equals 0. 
Links represent the file names associated with the i-node. When the link count to the disk i-node is 0, 
all the data blocks associated with the i-node are released to the bit map of free data blocks for the file system. 
The i-node is then placed on the free i-node map. 
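
The relation between file names, i-nodes and the link count can be tried out on any Unix filesystem, not 
only JFS. The sketch below works in a scratch directory; it assumes that the mktemp -d command is available 
on your system. The ln command creates the second directory entry, as the link subroutine does.

```shell
#!/bin/sh
# Show that two names can share one i-node, and that the data stays
# reachable until the last name is removed.
# Assumption: mktemp -d exists on this system.
tmpdir=$(mktemp -d) || exit 1

echo "hello" > "$tmpdir/a"
ln "$tmpdir/a" "$tmpdir/b"            # second directory entry, same i-node

ino_a=$(ls -i "$tmpdir/a" | awk '{ print $1 }')
ino_b=$(ls -i "$tmpdir/b" | awk '{ print $1 }')
echo "i-node of a: $ino_a, i-node of b: $ino_b"

rm "$tmpdir/a"                        # link count drops from 2 to 1
content=$(cat "$tmpdir/b")            # data is still reachable through b
echo "b still contains: $content"

rm -rf "$tmpdir"                      # last link removed, i-node released
```

Both names report the same i-number, and the data only goes away when the last name is removed.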

In-core i-node Structure

When a file is opened, the information in the disk i-node is copied into an in-core i-node for easier access. 
The in-core i-node structure contains additional fields which manage access to the disk i-node's valuable data. 
The fields of the in-core i-node are defined in the inode.h file. Some of the additional information tracked 
by the in-core i-node is: 

-Status of the in-core i-node, including flags that indicate: 
  An i-node lock 
  A process waiting for the i-node to unlock 
  Changes to the file's i-node information 
  Changes to the file's data 
-Logical device number of the file system that contains the file 
-i-number used to identify the i-node 
-Reference count. When the reference count field equals 0, the in-core i-node is released. 

When an in-core i-node is released (for instance with the close subroutine), the in-core i-node 
reference count is reduced by 1. If this reduction results in the reference count to the in-core i-node 
becoming 0, the i-node is released from the in-core i-node table, and the contents of the in-core i-node 
are written to the disk copy of the i-node (if the two versions differ). 



-- Working with JFS2 i-nodes:
-- --------------------------

Files in the enhanced journaled file system (JFS2) are represented internally as index nodes (i-nodes). 
JFS2 i-nodes exist in a static form on the disk and they contain access information for the files as well as 
pointers to the real disk addresses of the file's data blocks. The i-nodes are allocated dynamically by JFS2. 

When a file is opened, an in-core i-node is created by the operating system. The in-core i-node contains 
a copy of all the fields defined in the disk i-node, plus additional fields for tracking the in-core i-node. 
In-core i-nodes are defined in the /usr/include/j2/j2_inode.h file. 


Disk i-node Structure for JFS2
Each disk i-node in JFS2 is a 512 byte structure. The index of a particular i-node allocation map of the 
file system produces the unique number (i-number) by which the operating system identifies the i-node. 
The i-node allocation map tracks the location of the i-nodes on the disk as well as their availability. 

Disk i-nodes include the following information: 

Field      Contents  
di_mode    Type of file and access permission mode bits  
di_size    Size of file in bytes  
di_uid     Access permissions for the user ID  
di_gid     Access permissions for the group ID  
di_nblocks Number of blocks allocated to the file  
di_mtime   Last time file was modified  
di_atime   Last time file was accessed  
di_ctime   Last time i-node was modified  
di_nlink   Number of hard links to the file  
di_btroot  Root of B+ tree describing the disk addresses of the data  





50. sendmail:
=============

Solaris:
--------


To receive SMTP mail from the network, run sendmail as a daemon during system startup. The sendmail daemon listens 
to TCP port 25 and processes incoming mail. In most cases the code to start sendmail is already in one of 
your boot scripts. If it isn't, add it. 


# Start the sendmail daemon:
if [ -x /usr/sbin/sendmail ]; then
  echo "Starting sendmail daemon (/usr/sbin/sendmail -bd -q15m)..."
  /usr/sbin/sendmail -bd -q15m
fi

First, this code checks for the existence of the sendmail program. If the program is found, the code displays 
a startup message on the console and runs sendmail with two command-line options. 
One option, the -q option, tells sendmail how often to process the mail queue. In the sample code, the queue is 
processed every 15 minutes (-q15m), which is a good setting to process the queue frequently. 
Don't set this time too low. Processing the queue too often can cause problems if the queue grows very large, 
due to a delivery problem such as a network outage. For the average desktop system, every hour (-q1h) or 
half hour (-q30m) is an adequate setting.

The other option relates directly to receiving SMTP mail. The option (-bd) tells sendmail to run as a daemon 
and to listen to TCP port 25 for incoming mail. Use this option if you want your system to accept incoming TCP/IP mail.

The example above, taken from a Linux system, is a simple one. Some systems have a more complex startup script. 
Solaris 2.5, which dedicates the entire /etc/init.d/sendmail script to starting sendmail, is a notable example. 
The mail queue directory holds mail that has not yet been delivered. It is possible that the system went down while 
the mail queue was being processed. Versions of sendmail prior to sendmail V8, such as the version that comes 
with Solaris 2.5, create lock files when processing the queue. Therefore lock files may have been left 
behind inadvertently and should be removed during the boot. Solaris checks for the existence of the mail queue directory 
and removes any lock files found there. If a mail queue directory doesn't exist, it creates one. The additional 
code found in some startup scripts is not required when running sendmail V8. 

All you really need is the sendmail command with the -bd option.


nlih30207858-08:/etc/rc2.d $ ps -ef | grep "sendmail"
   smmsp   412     1  0   Jan 09 ?        0:00 /usr/lib/sendmail -Ac -q15m
    root   413     1  0   Jan 09 ?        0:03 /usr/lib/sendmail -bd -q15m


Setup sendmail user and group
Before doing anything else, check that the mail user and group are set up. 
Look in /etc/passwd for user smmsp with uid 25. Then check in /etc/group for group smmsp with gid 25. 
If they are there, good. If not, add them with: 

groupadd -g 25 smmsp 
useradd -u 25 -g smmsp -d / smmsp 

Then edit /etc/passwd and remove the shell. You want the line to look something like "smmsp:x:25:25::/:". 
I notice that Slackware has the line set to "smmsp:x:25:25:smmsp:/var/spool/clientmqueue:", and that's okay too, 
so I leave it at that. 

In Solaris you should have an entry in passwd as follows:
smmsp:x:25:25:SendMail Message Submission Program:/:/sbin/noshell
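
The check for the user and the group can be scripted. The sketch below looks for an entry with a given name 
and numeric ID in a passwd-style or group-style file. The function name has_entry is made up for this 
example; it only reads the files and changes nothing.

```shell
#!/bin/sh
# has_entry FILE NAME ID: succeed if FILE (passwd or group format)
# contains an entry NAME whose third field is ID.
# has_entry is a hypothetical helper, not a system command.
has_entry() {
    awk -F: -v name="$2" -v id="$3" \
        '$1 == name && $3 == id { found = 1 } END { exit !found }' "$1"
}

if has_entry /etc/passwd smmsp 25 && has_entry /etc/group smmsp 25; then
    echo "smmsp user and group are present"
else
    echo "smmsp user or group is missing"
fi
```

If the second message appears, add the group and the user with the groupadd and useradd commands shown above.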


Stopping and starting sendmail:
On Sun systems:   /etc/rc2.d/S88sendmail stop, then /etc/rc2.d/S88sendmail start
On Linux systems: /etc/rc.d/init.d/sendmail stop, then /etc/rc.d/init.d/sendmail start


Note: About mail:
-----------------

mail -f    = show the mail in your personal mailbox file ($HOME/mbox)
enter the number at the ? prompt to read the mail

examples:

# mail -f
Mail [5.2 UCB] [AIX 5.X]  Type ? for help.
"/root/mbox": 0 messages


# mail -f
Mail [5.2 UCB] [AIX 5.X]  Type ? for help.
"/root/mbox": 3 messages
>   1 root              Tue Nov  1 17:05  13/594
    2 MAILER-DAEMON     Sun Oct 30 07:59 109/3527 "Postmaster notify: see trans"
    3 daemon            Wed Jan 26 10:59  34/1618
? 1
Message  1:
From root Tue Nov  1 17:05:34 2005
Date: Tue, 1 Nov 2005 17:05:34 +0100
From: root
To: root

..
..

Example on how to send a mail:

echo "${FIXED_TEXT}${LOGT}" |mail -s "${SM_MAIL_ONDERWERP}" ${EMAIL_TAB}
echo "Hello" | mail -s "not important" ${EMAIL_TAB}



51. SAR:
========

AIX:
----

sar Command 
Purpose
Collects, reports, or saves system activity information. 

Syntax
/usr/sbin/sar [ { -A | [ -a ] [ -b ] [ -c ] [ -k ] [ -m ] [ -q ] [ -r ] [ -u ] [ -V ] [ -v ] [ -w ] [ -y ] } ] 
[ -P ProcessorIdentifier, ... | ALL ] [ -ehh [ :mm [ :ss ] ] ] [ -fFile ] [ -iSeconds ] [ -oFile ] [ -shh [ :mm [ :ss ] ] ]
[ Interval [ Number ] ]

The sar command writes to standard output the contents of selected cumulative activity counters in the operating system. 
The accounting system, based on the values in the Number and Interval parameters, writes information 
the specified number of times spaced at the specified intervals in seconds. The default sampling interval 
for the Number parameter is 1 second. The collected data can also be saved in the file specified by the -o File flag.

The sar command extracts and writes to standard output records previously saved in a file. This file can be either 
the one specified by the -f flag or, by default, the standard system activity daily data file, 
the /var/adm/sa/sadd file, where the dd parameter indicates the current day.

To report system unit activity, enter: 
# sar

To report current tty (-y) and paging (-r) activity every 2 seconds for the next 40 seconds (20 samples), enter: 
# sar -y -r 2 20

To watch the system unit for 10 minutes and save the data in the file temp, enter: 
# sar -o temp 60 10

To report cpu activity for the first two processors, enter: 
# sar -u -P 0,1
cpu  %usr  %sys  %wio  %idle
0      45    45     5      5
1      27    65     3      5

To report message, semaphore, and cpu activity for all processors and system-wide, enter: 
# sar -mu -P ALL
On a four-processor system, this produces output similar to the following (the last line indicates 
system-wide statistics for all processors): 
cpu  msgs/s  sema/s  %usr  %sys  %wio  %idle
0      7       2       45    45     5     5
1      5       0       27    65     3     5
2      3       0       55    40     1     4
3      4       1       48    41     4     7
-     19       3       44    48     3     5
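
The system-wide line is easy to pick out in a script, because its cpu column contains a dash. The sketch 
below prints the non-idle percentage (100 minus %idle) for that line. The function name system_busy is made 
up for this example, and the sample input is the output shown above, so the snippet runs without sar.

```shell
#!/bin/sh
# Report the non-idle percentage from the system-wide line (cpu column
# "-") of "sar -mu -P ALL" style output.
# system_busy is a hypothetical helper; it reads stdin.
system_busy() {
    # %idle is the last column of the line.
    awk '$1 == "-" { print 100 - $NF }'
}

system_busy <<'EOF'
cpu  msgs/s  sema/s  %usr  %sys  %wio  %idle
0      7       2       45    45     5     5
1      5       0       27    65     3     5
2      3       0       55    40     1     4
3      4       1       48    41     4     7
-     19       3       44    48     3     5
EOF
```

For the sample above this prints 95.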

To collect all the statistics that sar monitors at 60-second intervals for a 10-hour period, 
while redirecting console output to the null device, enter:

# nohup sar -A -o /tmp/SAR.STATS 60 600 > /dev/null &

The -A switch will cause all of the data collected by sar to be reported. The -ubcwyaqvm switch prevents some 
data from being reported.
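
The Number argument follows from the interval and the period you want to cover. A minimal sketch of that 
arithmetic (the function name sar_samples is made up for this example):

```shell
#!/bin/sh
# sar_samples INTERVAL_SECONDS HOURS: number of samples needed to
# cover HOURS at the given interval.
# sar_samples is a hypothetical helper, not part of sar.
sar_samples() {
    echo $(( $2 * 3600 / $1 ))
}

sar_samples 60 10     # 600, the value used in the nohup example above
sar_samples 2 1       # one hour at 2-second intervals
```

The first call gives 600 and the second gives 1800.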

On the obsolete AIX versions 4.2 through 5.1, you should also make sure that the schedtune and vmtune utilities 
can be found in /usr/samples/kernel. If they're not there, install bos.adt.samples. These utilities are used 
to report on the tunable parameters for the VMM and the scheduler, and SarCheck is much more useful if it can 
analyze the values of these parameters. On newer versions of AIX, this is not necessary because SarCheck looks at 
ioo, schedo, vmo, and vmstat -v for the data it needs.


Solaris:
--------

Some specifics for Solaris with regards to the sar command:

How to check File Access:
# sar -a

How to check Buffer Activity: (metadata= inodes, cylinder group blocks etc..)
# sar -b

How to check System Call Statistics:
# sar -c

How to check Disk Activity:
# sar -d

How to check Page-Out and memory:
# sar -g

How to check Kernel Memory Allocation:
# sar -k

How to check Interprocess Communication:
# sar -m

How to check Page-In activity:
# sar -p

How to check Queue Activity:
# sar -q

How to check Unused Memory:
# sar -r

How to check CPU Utilization:
# sar -u



52. Xwindows:
=============

52.1 About the XWindows system:
-------------------------------

The X Window System is a graphics system primarily used on Unix systems (and, less commonly, on VMS, MVS, 
and MS-Windows systems) that provides an inherently client/server oriented base for displaying windowed graphics. 
It provides a public protocol by which client programs can query and update information on X servers. 

The representation of "client" and "server" appears a little bit backwards from most client/server systems. 
Usually, people expect the "local" programs to be called a "client," and for the "server" to be something off 
in the back room. Which nicely represents the way database applications usually work, with many "clients" 
connecting to a central database "server." 

X reverses these roles, which, as the locations of the hosts are reversed, is quite appropriate: 


An X server is a program that manages a video system (and possibly other "interactive" I/O devices such as mice, 
keyboards, and some more unusual devices).

The X server thus typically runs on a user's desktop, typically a relatively non-powerful host that would commonly 
be termed a "client system." It is, in this context, nonetheless acting as a server as it provides graphics services. 

On the other hand, an X client is typically an application program which must connect to an X Server 
in order to display things. 

The client will often run on another host, often a powerful Unix box that would commonly be known as a "server." 
The X client might itself also be a "server process" from some other point of view; there is no contradiction here. 
(Although calling it such may be unwise as it will naturally result in further confusion.)

X nomenclature treats anything that provides display services as an X server. Which is not particularly different 
from someone saying that a program that provides database services is a database server. 

The upshot (and the point) of all this is that X allows processes on various computers on a network 
to display output on display devices elsewhere on the network.

- GNOME:

GNOME - GNU Network Object Model Environment
GNOME is not a window manager. 

GNOME is an application framework that consists of libraries to assist in application development and a set 
of applications that use those libraries. 

It seeks to provide: 

An API for interapplication communications. This will represent a set of objects running via a CORBA 
Object Request Broker called ORBit.

This is a crucial piece of the infrastructure, with which they intend to implement a component architecture 
to build "compound documents" not entirely unlike OpenDoc; without this, GNOME is merely a "pretty face," 
consuming memory and disk space for relatively little value. 

This description strongly parallels that of CDE... 

- K Desktop Environment - KDE

The KDE (K Desktop Environment) Project is building an integrated desktop environment including a window manager, 
file manager/web browser, and other components using the Trolltech "Qt" toolset, a development toolset written 
for C++ that allows applications to be deployed atop either X11 or Win32. 

KDE had been using the MICO CORBA ORB to construct an application embedding framework known as KOM and OpenParts. 
According to "KDE-Two: Second KDE Developers Conference", they found themselves unable to use 
the standardized CORBA framework, citing problems with concurrency, reliability and performance, and have 
instead decided to create Yet Another IPC Framework involving a shared library called libICE. 

On the other hand, the KDE Technology Overview for Version 2.0 provides a somewhat different story, 
so it's not completely clear just what is going on; they indicate the use of an IPC scheme called DCOP, 
indicating it to be a layer atop libICE, with the option of also using XML-RPC as an IPC scheme.


52.2 Running Cygwin on a PC, to have a Xwin Server:
---------------------------------------------------

Example of starting an XWin session:

C:\cygwin\usr\X11R6\bin\XWin.exe -query hostname -fullscreen -fp tcp/hostname:7100


X &
xhost +
export DISPLAY=:0

When using X from a terminal server session, take note of the right ip and port.

-- Enabling x on your terminal:

$ cd ~
$ chmod 755 .
$ chmod 777 .Xauthority
$ set |grep DISPLAY
DISPLAY=localhost:10.0

As the other user:
$ sudo su - otheruser
$ export DISPLAY=localhost:10.0
$ xauth merge /home/beab_krn/sela/.Xauthority
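
A DISPLAY value has the form host:display.screen. A value such as localhost:10.0 is typical of a session 
where X is forwarded through ssh. The sketch below splits a value into its parts; the function name 
parse_display is made up for this example.

```shell
#!/bin/sh
# Split a DISPLAY value of the form host:display.screen into its parts.
# parse_display is a hypothetical helper for illustration.
parse_display() {
    value=$1
    host=${value%:*}            # text before the last colon (may be empty)
    rest=${value##*:}           # display[.screen]
    display=${rest%%.*}
    case "$rest" in
        *.*) screen=${rest#*.} ;;
        *)   screen=0 ;;        # the screen defaults to 0 when omitted
    esac
    echo "host=${host:-local} display=$display screen=$screen"
}

parse_display "localhost:10.0"
parse_display ":0"
```

The first call prints host=localhost display=10 screen=0, the second prints host=local display=0 screen=0.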



52.3 XWin on AIX:
-----------------

The xdm (X Display Manager) command manages a collection of X displays, which may be on the local host 
or remote servers. The design of the xdm command was guided by the needs of X terminals as well as 
the X Consortium standard XDMCP, the X Display Manager Control Protocol. The xdm command provides services 
similar to those provided by the init, getty, and login commands on character terminals: prompting for 
login name and password, authenticating the user, and running a session.

Starting xdm 
xdm is typically started at system boot time. This is typically done in either an rc file in the /etc directory, 
or in the inittab file. 

Starting xdm in an rc file is usually simply a matter of adding the desired command line to the file, 
as in the example below. 

/usr/bin/X11/xdm -daemon -config /usr/lib/X11/xdm/xdm-config &

IBM wants xdm to integrate into their src subsystem. The AIX version of the above command is a bit different. 

start /usr/bin/X11/xdm $src_running

The problem with this is that since xdm is not supported in R4 under AIX, it is not really integrated into 
the src subsystem, so the attendant startup, shutdown, and other src commands do not work properly. 
An alternative, which works on many other systems as well, is to start xdm from the inittab file. 

xdm:2:respawn:/usr/bin/X11/xdm -nodaemon -config /usr/lib/X11/xdm/xdm-config

The -nodaemon flag keeps xdm from starting a daemon and exiting, which would cause the respawn option 
to start another copy of xdm, whereupon the process would repeat itself, quickly filling up your 
process table and dragging your system to its knees attempting to run oodles of managers and servers. 
xdm attempts to use system lock calls to prevent this from happening. It nevertheless happens on some systems. 


52.4 XWin on Linux:
-------------------

52.4.1 Redhat:
--------------

While the heart of Red Hat Linux is the kernel, for many users, the face of the operating system is the 
graphical environment provided by the X Window System, also called simply X. 
This chapter is an introduction to the behind-the-scenes world of XFree86, the open-source implementation 
of X provided with Red Hat Linux. 

X uses a client-server architecture. An X server process is started and X client processes can connect to it 
via a network or local loopback interface. The server process handles the communication with the hardware, 
such as the video card, monitor, keyboard, and mouse. The X client exists in the user-space, issuing requests 
to the X server. 

The X server performs many difficult tasks using a wide array of hardware, requiring detailed configuration. 
If some aspect of your system changes, such as the monitor or video card, XFree86 will need 
to be reconfigured. In addition, if you are troubleshooting a problem with XFree86 that cannot 
be solved using a configuration utility, such as the X Configuration Tool (redhat-config-xfree86), 
you may need to access its configuration file directly. 

Red Hat Linux 8.0 uses XFree86 version 4.2 as the base X Window System, which includes the various 
necessary X libraries, fonts, utilities, documentation, and development tools. 

- The X Window System resides primarily in two locations in the file system: 

/usr/X11R6/ directory 
A directory containing X client binaries (the bin directory), assorted header files (the include directory), 
libraries (the lib directory), manual pages (the man directory), and various other X documentation 
(the /usr/X11R6/lib/X11/doc/ directory). 

/etc/X11/ directory 
The /etc/X11/ directory hierarchy contains all of the configuration files for the various components 
that make up the X Window System. This includes configuration files for the X server itself, 
the X font server (xfs), the X Display Manager (xdm), and many other base components. 
Display managers such as gdm and kdm, as well as various window managers, and other X tools also store their 
configuration in this hierarchy. 


- The Red Hat X configuration tool:

from command line: # redhat-config-xfree86
from X: go to the Main Menu Button (on the Panel) => System Tools => Display

- XFree86 configuration file "/etc/X11/XF86Config"

XFree86 version 4 server is a single binary executable - /usr/X11R6/bin/XFree86. This server dynamically 
loads various X server modules at runtime from the "/usr/X11R6/lib/modules/" directory including video drivers, 
font engine drivers, and other modules as needed. Some of these modules are automatically loaded by the server, 
whereas some are optional features that you must specify in the XFree86 server's configuration file, 
"/etc/X11/XF86Config", before they can be used. The video drivers are located in the 
/usr/X11R6/lib/modules/drivers/ directory. The DRI hardware accelerated 3D drivers are located in the 
/usr/X11R6/lib/modules/dri/ directory. 

- Running a simple X client:

You do not have to run a complicated window manager in conjunction with a particular desktop environment 
to use X client applications. Assuming that you are not already in an X environment and do not have 
an .xinitrc file in your home directory, type the xinit command to start X with a basic terminal window 
(the default xterm application). You will see that this basic environment utilizes your keyboard, mouse,
video card, and monitor with the XFree86 server, using the server's hardware preferences. 
Type exit at the xterm prompt to leave this basic X environment.

- Running X: The startx command

When you start X using the "startx" command, a pre-specified desktop environment is utilized. 
To change the default desktop environment used when X starts, open a terminal and type the 
switchdesk command. This brings up a graphical utility that allows you to select the desktop environment 
or window manager to use the next time X starts.

Most users run X from runlevels 3 or 5. Runlevel 3 places your system in multi-user mode with full 
networking capabilities. The machine will boot to a text-based login prompt with all necessary 
preconfigured services started. Most servers are run in runlevel 3, as X is not necessary to provide 
any services utilized by most users. Runlevel 5 is similar to 3, except that it automatically starts X 
and provides a graphical login screen. Many workstation users prefer this method, because it never forces 
them to see a command prompt. 

The default runlevel used when your system boots can be found in the /etc/inittab file. 
If you have a line in that file that looks like id:3:initdefault:, then your system will boot 
to runlevel 3. If you have a line that looks like id:5:initdefault:, your system is set to boot 
into runlevel 5. As root, change the runlevel number in this file to set a different default. 
Save the file and restart your system to verify that it boots to the correct runlevel.
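
The default runlevel can also be read from a script instead of by eye. The following is a minimal sketch, assuming the classic SysV inittab layout described above (colon-separated fields, "#" for comments); the function name default_runlevel and the sample file contents are illustrative only.

```shell
# default_runlevel FILE: print the runlevel of the first active
# (not commented out) initdefault entry in an inittab-style file.
default_runlevel() {
    awk -F: '!/^#/ && $3 == "initdefault" { print $2; exit }' "$1"
}

# Demonstration against a throwaway sample file (hypothetical contents).
sample=$(mktemp)
printf '%s\n' '# sample inittab' 'id:5:initdefault:' \
    'si::sysinit:/etc/rc.d/rc.sysinit' > "$sample"
default_runlevel "$sample"
rm -f "$sample"
```

On a real system you would call it as "default_runlevel /etc/inittab".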

When in runlevel 3, the preferred way to start an X session is to type the startx command. 
startx, a front-end to the xinit program, launches the XFree86 server and connects the X clients to it.



53. TAPE DRIVES:
================

53.1 AIX:
---------

Some useful examples, using a tape:
-----------------------------------

# mksysb -i /dev/rmt0
# backup -0 -uf /dev/rmt0 /data
# tctl -f /dev/rmt0 rewind
# savevg -if /dev/rmt0 uservg

# lsdev -Cc tape
rmt0  Available  10-60-00-5,0  SCSI 8mm Tape Drive

# lsattr -El rmt0
mode           yes     Use DEVICE BUFFERS during writes    True
block_size     1024    Block size (0=variable length)      True
extfm          no      Use EXTENDED file marks             True
ret            no      RETENSION on tape change or reset   True
..
..
To list the default values for that tape device (-D flag), use
# lsattr -D -l rmt0

# lscfg -vl rmt0
Manufacturer...............EXABYTE
Machine Type and Model.....IBM-20GB
Device Specific(Z1)........38zA
Serial Number..............60089837
..
..

It's very important which /dev/rmtx.y special file you use in a backup command such as tar. See the following table:

special file       rewind on close         retension on open     density setting
--------------------------------------------------------------------------------
/dev/rmtx          yes                     no                     #1
/dev/rmtx.1        no                      no                     #1
/dev/rmtx.2        yes                     yes                    #1
/dev/rmtx.3        no                      yes                    #1
/dev/rmtx.4        yes                     no                     #2
/dev/rmtx.5        no                      no                     #2
/dev/rmtx.6        yes                     yes                    #2
/dev/rmtx.7        no                      yes                    #2
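
In the AIX rmt special-file scheme the suffix works like a three-bit code: 1 means no rewind on close, 2 means retension on open, and 4 selects density setting #2. The sketch below derives the special file name from those three choices; the function name rmt_special and its argument order are made up for illustration.

```shell
# rmt_special DEVICE REWIND RETENSION DENSITY
#   DEVICE     e.g. rmt0
#   REWIND     yes|no  (rewind on close)
#   RETENSION  yes|no  (retension on open)
#   DENSITY    1|2     (density setting #1 or #2)
# Prints the matching special file, e.g. /dev/rmt0.1
rmt_special() {
    dev=$1
    n=0
    if [ "$2" = "no" ];  then n=$((n + 1)); fi   # bit 1: no rewind on close
    if [ "$3" = "yes" ]; then n=$((n + 2)); fi   # bit 2: retension on open
    if [ "$4" = "2" ];   then n=$((n + 4)); fi   # bit 4: density setting #2
    if [ "$n" -eq 0 ]; then
        echo "/dev/$dev"
    else
        echo "/dev/$dev.$n"
    fi
}

rmt_special rmt0 no no 1
```

The no-rewind file is the one to use when several archives are written back to back on one tape, for example "tar -cvf $(rmt_special rmt0 no no 1) /data".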


54. WSM Web based System Manager:
=================================

AIX only:
---------

Web based System manager is a graphical user interface administration tool for AIX 5.x systems.
This is a Java based suite of system management tools. 
To start WSM, use the following command from the command line of a graphical console:
# wsm

- The WSM can be run in stand-alone mode, that is, you can use the tool to perform system administration
on the AIX system you are currently running on. 
- However, the WSM also supports a client-server environment.
In this environment, it is possible to administer an AIX system from a remote PC or from another AIX system
using a graphics terminal.
In this environment, the AIX system being administered is the Server and the system you are
performing the administration functions from is the client.

The client can operate in either application mode on AIX with Java 1.3, or in applet mode
on platforms that support Java 1.3. Thus, the AIX system can be managed from another AIX system
or from a PC with a browser and Java.



55. SOFTWARE INSTALLATIONS ON AIX 5.x:
======================================


55.1 Installing VisualAge C++ / C compiler on AIX 5.x:
======================================================

IBM VisualAge is a commandline C and C++ compiler for the AIX operating system.
You can use VisualAge as a C compiler for files with a .c suffix, or as a C++ compiler
for files with a .C, .cc, .cpp or .cxx suffix. The compiler processes your text-based
program source files to create an executable object module.
In most cases you should use the xlC command to compile your C++ source files, 
and the xlc command to compile C source files.
You can use VisualAge to develop both 32 bit and 64 bit applications.

If you want to install VisualAge C++ for AIX, check first if the following required filesets are installed.

bos.adt.include                 Base Application Development Include Files
bos.adt.lib                     Base Application Development Libraries
bos.adt.libm                    Base Application Development Math Libraries
bos.net.ncs                     Base Network Computing Services
ifor_ls.compat                  License Use Management version 4 compatibility
ifor_ls.base                    License Use Management version 4 Base

Use the following command to see whether these are installed:

# lslpp -h bos.adt.include bos.adt.lib bos.adt.libm \
           bos.net.ncs ifor_ls.compat ifor_ls.base
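
To see only the filesets that are still missing, the check can be wrapped in a small loop. This is a sketch, not a tested procedure: it assumes that "lslpp -l <fileset>" returns a non-zero exit status when the fileset is not installed, and the function name missing_filesets is hypothetical. The checker command is passed in as an argument so the loop can also be tried on a system without lslpp.

```shell
# Required filesets for VisualAge C++ (from the list above).
required="bos.adt.include bos.adt.lib bos.adt.libm bos.net.ncs ifor_ls.compat ifor_ls.base"

# missing_filesets CMD [ARGS...]
# Runs "CMD ARGS <fileset>" for each required fileset and prints
# the name of every fileset for which the command fails.
missing_filesets() {
    for fs in $required; do
        "$@" "$fs" >/dev/null 2>&1 || echo "$fs"
    done
}

# On AIX (assumption: lslpp -l fails for a fileset that is not installed):
#   missing_filesets lslpp -l
```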

For some components, the following needs to be installed as well:
X11.base.rte, bos.rte.libpthreads, ipfx.rte, ifor_ls.base.gui, ifor_ls.client.gui

Make sure the AppDev package has been installed in order to have access to commands like "make" etc...


Notes:
======

Note 1:
-------

IBM C and C++ Compilers

  Usage:
     xlC [ option | inputfile ]...
     xlc [ option | inputfile ]...
     cc [ option | inputfile ]...
     c89 [ option | inputfile ]...
     xlC128 [ option | inputfile ]...
     xlc128 [ option | inputfile ]...
     cc128 [ option | inputfile ]...
     xlC_r [ option | inputfile ]...
     xlc_r [ option | inputfile ]...
     cc_r [ option | inputfile ]...
     xlC_r4 [ option | inputfile ]...
     xlc_r4 [ option | inputfile ]...
     cc_r4 [ option | inputfile ]...
     CC_r4 [ option | inputfile ]...
     xlC_r7 [ option | inputfile ]...
     xlc_r7 [ option | inputfile ]...
     cc_r7 [ option | inputfile ]...

  Description:
     The xlC and related commands compile C and C++ source files.
     They also processes assembler source files and object files. Unless the
     -c option is specified, xlC calls the linkage editor to produce a
     single object file. Input files may be any of the following:
       1. file name with .C suffix: C++ source file
       2. file name with .i suffix: preprocessed C or C++ source file
       3. file name with .c suffix: C source file
       4. file name with .o suffix: object file for ld command
       5. file name with .s suffix: assembler source file
       6. file name with .so suffix: shared object file


xlc : ANSI C compiler with UNIX header files. Use this command for most new C programs. 

cc  : Extended C compiler. This command invokes a non-ANSI compliant compiler. Use it for legacy C programs. 

c89 : Strict ANSI C compiler with ANSI header files. Use this command for maximum portability of your C programs. 

xlC : Native (i.e., non-cfront) C++ compiler. Use this command for compiling and linking all C++ code. 


The following additional command names, plus their "-tst" and "-old" variants, are also available at SLAC 
for compiling and linking reentrant programs: 
xlc_r, cc_r; xlC_r            : For use with POSIX threads 
xlc_r4, cc_r4; xlC_r4, CC_r4  : For use with DCE threads 


Note 2:
-------

install VisualAge C++:

- insert CD
- smitty install_latest
- press F4 to display all devices
- select CDROM device
- press F4 to select the filesets you want to install

After you have installed VisualAge C++ for AIX, you need to enroll your license for the product
before using it.

VisualAge C++ is not automatically installed in /usr/bin. To invoke the compiler without
having to specify the full path, do one of the following steps:
- create symbolic links for the specific driver contained in /usr/vacpp/bin and
  /usr/vac/bin to /usr/bin
- add /usr/vacpp/bin and /usr/vac/bin to your path
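
For the second option, a minimal sketch of what could go into a login profile such as ~/.profile is shown here. The helper name path_append is made up; it only appends a directory when it is not already in PATH, so sourcing the profile repeatedly does not grow the variable.

```shell
# path_append DIR: add DIR to the end of PATH unless it is already there.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

path_append /usr/vacpp/bin
path_append /usr/vac/bin
export PATH
```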


Note 3:
-------

Note: usage of vac examples:

Example 1:

xlc -I/usr/local/include -L/usr/local/lib simple.c -lcurl -lz 

Example 2:

The commands listed below invoke versions of the XL C compiler, which then translates C source code statements 
into object code, sends .s files to the assembler, and links the resulting object files with object files 
and libraries specified on the command line in the order in which they are listed, producing a single executable file 
called "a.out" by default. The -o flag may be used to rename the resulting executable file. 
Where commands are shown, they are generally given as generic examples. In any case, you type the appropriate 
command and press the Return (or Enter) key as usual. 

You compile a source program and/or subprograms by typing the following command: 

xlc cmd_line_opts input_files

input_files are source files (.c or .i), object files (.o), or assembler files (.s) 

For example, to compile a C program whose source is in source file "prog.c" you would enter the following command: 

xlc prog.c

After the xlc command completes, you will see a new executable file named "a.out" in your directory. 

If you specify -c as a compiler option, XL C only compiles the source program, producing an object file 
whose default name is that of the program with a .o extension. Before running the program, 
you must invoke the linkage editor phase. Either invoke the linker using the ld command or issue 
the xlc command a second time without the -c option, using the desired object (.o) filenames. 

For example, you may compile a subprogram "second.c" and then use it in your main program "prog.c" 
with the following sequence of commands: 

xlc -c second.c
xlc prog.c second.o 

Some important files on a test system:

# find / -name "crt0_64.o" -print

/usr/lib/crt0_64.o
/usr/ccs/lib/crt0_64.o


# find / -name "crt0_32.o" -print

/usr/lib/crt0_64.o
/usr/ccs/lib/crt0_64.o


Check whether vac is installed:

root@zd110l02:/root#lslpp -l vacpp*
lslpp: Fileset vacpp* not installed.


root@zd110l02:/root#lslpp -l xlC*
  Fileset                      Level  State      Description
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  xlC.aix50.rte              7.0.0.6  COMMITTED  C Set ++ Runtime for AIX 5.0
  xlC.cpp                    6.0.0.0  COMMITTED  C for AIX Preprocessor
  xlC.rte                    7.0.0.1  COMMITTED  C Set ++ Runtime



Note 4:
-------

At a certain organisation, the installation goes as follows:

install:

# cd /prj/tmp
# tar xv       (tape in rmt0)
# ./driver

configure license:

# /usr/vac/bin/vac6_licentie
# i4blt -r6
# /usr/opt/ifor/ls/os/aix/bin/i4blt -r6

test:

- using existing sourcefile:

# cd /prj/vac/cctst
# cc fac.c -o fac
# ./fac

Or...

- make a simple c source and compile it:

#include <stdio.h>
int main(void)
{
   printf("Hello World!\n");
   return 0;
}

now compile it
# /usr/vac/bin/xlc hello.c -o hello

now run it
# ./hello



Note 5: LUM
-----------

i4lmd - Network License Server Subsystem

The i4lmd subsystem starts the network license server on the local node. 

Examples
Start a license server and do not log checkin, vendor, product, timeout, or message events: 

startsrc -s i4lmd -a "-no cvptm"

Start a license server changing the default log-file: 

startsrc -s i4lmd -a "-l /ifor/ls/my_log"


On an example p520 system:
--------------------------

In /etc/inittab:

i4ls:2:wait:/etc/i4ls.rc > /dev/null 2>&1 # Start i4ls

cat /etc/i4ls.rc

#!/bin/ksh
# IBM_PROLOG_BEGIN_TAG
# This is an automatically generated prolog.
#
# bos520 src/bos/usr/opt/ifor/var/i4ls.rc 1.8
#
# Licensed Materials - Property of IBM
#
# (C) COPYRIGHT International Business Machines Corp. 1996,2001
# All Rights Reserved
#
# US Government Users Restricted Rights - Use, duplication or
# disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
#
# IBM_PROLOG_END_TAG
/usr/opt/ifor/ls/os/aix/bin/i4cfg -start -nopause
exit 0

On an example p550 system 29-12-2006, all apps down:
----------------------------------------------------

# ps -ef

     UID     PID    PPID   C    STIME    TTY  TIME CMD
    root       1       0   0   Dec 11      -  3:08 /etc/init
    root  327918       1   0   Dec 11      -  0:00 /usr/lib/errdemon
    root  352504       1   0   Dec 11      -  0:00 /usr/ccs/bin/shlap64
    root  360466       1   0   Dec 11      - 253:18 /usr/sbin/syncd 60
    root  548880 1724510   0 08:33:45  pts/0  0:00 -ksh
    root  585948  548880   1 09:11:19  pts/0  0:00 ps -ef
  cissys  880788 1060964   0 09:07:51      -  0:00 /usr/sbin/sftp-server
    root  983044 1011962   0   Dec 11      -  0:00 /usr/sbin/qdaemon
    root  999432       1   0   Dec 11      -  0:00 /usr/sbin/uprintfd
    root 1003764       1   0   Dec 11      -  0:34 /usr/sbin/cron
    root 1011962       1   0   Dec 11      -  0:00 /usr/sbin/srcmstr
    root 1024034       1   0   Dec 11      -  0:00 /usr/local/sbin/syslog-ng -f /usr/local/etc/syslog-ng.conf
    root 1028102       1   0   Dec 11      -  0:00 ./mflm_manager
    root 1036402 1011962   0   Dec 11      -  0:00 /etc/ncs/llbd
    root 1040402 1052716   0   Dec 11      -  0:00 /usr/opt/ifor/bin/i4lmd -l /var/ifor/logdb -n clwts
    root 1052716 1011962   0   Dec 11      -  0:44 /usr/opt/ifor/bin/i4lmd -l /var/ifor/logdb -n clwts
    root 1056788 1011962   0   Dec 11      -  0:00 /usr/sbin/rsct/bin/IBM.AuditRMd
  cissys 1060964 1532138   0 09:07:51      -  0:01 sshd: cissys@notty
    root 1065016 1011962   0   Dec 11      -  0:05 /usr/sbin/rsct/bin/IBM.CSMAgentRMd
    root 1073192 1011962   0   Dec 11      -  0:00 /usr/sbin/rsct/bin/IBM.ServiceRMd
    root 1077274       1   0   Dec 11      -  0:01 /opt/hitachi/HNTRLib2/bin/hntr2mon -d
    root 1081378 1011962   0   Dec 11      -  0:28 /usr/DynamicLinkManager/bin/dlmmgr
    root 1085478 1011962   0   Dec 11      -  0:06 /etc/ncs/glbd
    root 1089574 1101864   0   Dec 11      -  0:00 /usr/opt/ifor/bin/i4llmd -b -n wcclwts -l /var/ifor/llmlg
    root 1101864 1011962   0   Dec 11      -  3:14 /usr/opt/ifor/bin/i4llmd -b -n wcclwts -l /var/ifor/llmlg
    root 1110062 1011962   0   Dec 11      -  0:01 /usr/sbin/rsct/bin/rmcd -a IBM.LPCommands -r
    root 1114172 1011962   0   Dec 11      -  0:00 /usr/sbin/rsct/bin/IBM.ERrmd
    root 1122532 1167500   0 08:23:22      -  0:00 sshd: reserve [priv]
    root 1126476       1   0   Dec 27   lft0  0:00 -ksh
    root 1167500 1011962   0 03:17:38      -  0:00 /usr/sbin/sshd -D
  oracle 1175770       1   0   Dec 11      - 12:29 /apps/oracle/product/9.2/bin/tnslsnr listener -inherit
    root 1532138 1167500   0 09:07:50      -  0:00 sshd: cissys [priv]
    root 1708224 1126476   4 08:40:14   lft0  0:45 tar -cvf /dev/rmt0 /prj/was
 reserve 1724510 1786036   0 08:23:34  pts/0  0:00 -ksh
 reserve 1786036 1122532   0 08:23:34      -  0:00 sshd: reserve@pts/0


inittab:
--------

init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
powerfail::powerfail:/etc/rc.powerfail 2>&1 | alog -tboot > /dev/console # Power Failure Detection
mkatmpvc:2:once:/usr/sbin/mkatmpvc >/dev/console 2>&1
atmsvcd:2:once:/usr/sbin/atmsvcd >/dev/console 2>&1
load64bit:2:wait:/etc/methods/cfg64 >/dev/console 2>&1 # Enable 64-bit execs
tunables:23456789:wait:/usr/sbin/tunrestore -R > /dev/console 2>&1 # Set tunables
rc:23456789:wait:/etc/rc 2>&1 | alog -tboot > /dev/console # Multi-User checks
rcemgr:23456789:once:/usr/sbin/emgr -B > /dev/null 2>&1
fbcheck:23456789:wait:/usr/sbin/fbcheck 2>&1 | alog -tboot > /dev/console # run /etc/firstboot
srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller
rctcpip:23456789:wait:/etc/rc.tcpip > /dev/console 2>&1 # Start TCP/IP daemons
sniinst:2:wait:/var/adm/sni/sniprei > /dev/console 2>&1
: rcnfs:23456789:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
cron:23456789:respawn:/usr/sbin/cron
: piobe:2:wait:/usr/lib/lpd/pio/etc/pioinit >/dev/null 2>&1  # pb cleanup
qdaemon:23456789:wait:/usr/bin/startsrc -sqdaemon
: writesrv:23456789:wait:/usr/bin/startsrc -swritesrv
uprintfd:23456789:respawn:/usr/sbin/uprintfd
shdaemon:2:off:/usr/sbin/shdaemon >/dev/console 2>&1 # High availability daemon
l2:2:wait:/etc/rc.d/rc 2
logsymp:2:once:/usr/lib/ras/logsymptom # for system dumps
: itess:23456789:once:/usr/IMNSearch/bin/itess -start search >/dev/null 2>&1
diagd:2:once:/usr/lpp/diagnostics/bin/diagd >/dev/console 2>&1
: httpdlite:23456789:once:/usr/IMNSearch/httpdlite/httpdlite -r /etc/IMNSearch/httpdlite/httpdlite.conf & >/dev/console 2>&1
ha_star:h2:once:/etc/rc.ha_star >/dev/console 2>&1
cons:0123456789:respawn:/usr/sbin/getty /dev/console
hntr2mon:2:once:/opt/hitachi/HNTRLib2/etc/D002start
dlmmgr:2:once:startsrc -s DLMManager
ntbl_reset:2:once:/usr/bin/ntbl_reset_datafiles
rcml:2:once:/usr/sni/aix52/rc.ml > /dev/console 2>&1
perfstat:2:once:/usr/lib/perf/libperfstat_updt_dictionary >/dev/console 2>&1
ctrmc:2:once:/usr/bin/startsrc -s ctrmc > /dev/console 2>&1
tty1:2:off:/usr/sbin/getty /dev/tty1
tty0:2:off:/usr/sbin/getty /dev/tty0
: i4ls:2:wait:/etc/i4ls.rc > /dev/null 2>&1 # Start i4ls
mF:2345:wait:sh /etc/mflmrcscript > /dev/null 2>&1
i4ls:2:wait:/etc/i4ls.rc > /dev/null 2>&1 # Start i4ls
documentum:2:once:/etc/rc.documentum start >/dev/null 2>&1 


Note 7:
-------


IBM C/C++ Compilers
This describes the IBM implementation of the C and C++ compilers. 

Contents
Invoking the Compiler 
C Compiler Modes 
C++ Compiler Modes 
Source Files and Preprocessing 
Default Datatype Sizes 
Distributed-memory parallelism 
Shared-memory parallelism 
64-bit addressing 
Optimization 
Related Information 
Memory Management 
Porting programs from the Crays to the SP 
Mixing C and Fortran 

--------------------------------------------------------------------------------

Invoking the Compiler
The IBM C compiler is described in the IBM C for AIX User's Manual and the IBM C++ compiler is described 
in the IBM Visual Age C++ Batch Compiler manual. Both of these manuals are on line. 

As with the IBM XL Fortran compiler, there are several different commands that invoke the C or C++ compilers, 
each of which is really an alias for the main C or C++ command packaged with a set of commonly used options. 

The most basic C compile is of the form 

% xlc source.c

This will produce an executable named a.out. The other C Compiler modes are described below in the 
section C Compiler Modes. 

The most basic C++ compile is of the form 

% xlC source.C

This will produce an executable named a.out. The other C++ Compiler modes are described below 
in the section C++ Compiler Modes. 

Note: There is no on-line man page for the C++ compiler. "man xlC" brings up the man page for the C compiler. 
For complete documentation of C++ specific options and conventions see the on-line C++ manual. 
The commands xlc, mpcc, and mpCC all have on-line man pages. 

C Compiler Modes
There are four basic compiler invocations for C compiles: xlc, cc, c89, and mpcc. All but c89 have one or more 
subinvocations with different defaults. 

xlc
xlc invokes the compiler for C with an ansi language level. This is the basic invocation that IBM recommends. 

These are the two most useful subinvocations of xlc: 

xlc_r 
This invokes the thread safe version of xlc. It should be used when any kind of multi-threaded code is being built. 
This is equivalent to invoking the compiler as xlc -D_THREAD_SAFE and the loader as 
xlc -L/usr/lib/threads -L/usr/lib/dce -lc_r -lpthreads. 

xlc128 
This is equivalent to invoking the compiler as xlc -qldbl128 -lC128. It increases the size of long double data types 
from 64 to 128 bits. 

cc
cc invokes the compiler for C with an extended language level. This is for source files with legacy C code 
that IBM refers to as "RT compiler extensions". This includes older pre-ansi features such as those in 
Kernighan and Ritchie's "The C Programming Language". 

The two most useful subinvocations are cc_r which is the cc equivalent of xlc_r and cc128 which is the cc equivalent 
of xlc128. 

c89
c89 should be used when strict conformance to the ANSI C standard (ISO/IEC 9899:1990) is desired. 
There are no subinvocations associated with this compiler invocation. 

mpcc
mpcc is a shell script that compiles C programs with the cc compiler while linking in the Partition Manager, 
Message Passing Interface (MPI), and/or Message Passing Library (MPL). Flags are passed by mpcc to the xlc command, 
so any of the xlc options can be used with mpcc as well. When mpcc is used to link a program the Partition Manager 
and message passing interface are automatically linked in. The script creates an executable that dynamically binds 
with the message passing libraries. 

There is one subinvocation with mpcc, mpcc_r which is the mpcc equivalent of cc_r. This invocation also links 
in the Partition Manager, the threaded implementation of Message Passing Interface (MPI), and Low-level 
Applications Programming Interface (LAPI). 

ANSI compliance can be achieved by compiling with the option -qlanglvl=ansi. 

Compiler summary
This table summarizes the features of several different C compiler invocations: 

                Functionality
Compiler Name   C defaults    DM Parallel   SM Parallel
-------------------------------------------------------
xlc             ansi          No            No
xlc_r           ansi          No            Yes
xlc128          ansi          No            No
cc              extended      No            No
cc_r            extended      No            Yes
cc128           extended      No            No
c89             strict        No            No
mpcc            extended*     Yes           No
mpcc_r          extended*     Yes           Yes

* ANSI compliance can be achieved by compiling with the option -qlanglvl=ansi. 

In the table above, C defaults indicates the default C standards behavior of the compiler. 

DM Parallel refers to distributed-memory parallelism through the MPI library. 

SM Parallel refers to shared-memory parallelism, available through OpenMP, IBM tasking directives, 
automatic parallelization by the compiler, or the pthreads API. 

C++ Compiler Modes
There are two basic compiler invocations for C++ compiles: xlC and mpCC. If a program consists of source code modules 
in different program languages, it must be linked with a form of one of these invocations in order to use the 
correct C++ run time libraries. 

All of the C++ invocations will compile source files with a .c suffix as ansi C source files unless the 
-+ option to the C++ compiler is specified. Any of the C compiler invocations will also compile a file with 
the appropriate suffix as a C++ file. 

xlC
Among the subinvocations of xlC are: 

xlC_r: the xlC equivalent of xlc_r 
xlC128: the xlC equivalent of xlc128 
xlC128_r: this combines the features of the xlC_r and xlC128 subinvocations. 

mpCC
mpCC is a shell script that compiles C++ programs with the xlC compiler while linking in the Partition Manager, 
Message Passing Interface (MPI), and/or Message Passing Library (MPL). Flags are passed by mpCC to the xlC command, 
so any of the xlC options can be used on the mpCC shell script. When mpCC is used to link a program the 
Partition Manager and message passing interface are automatically linked in. The script creates an executable 
that dynamically binds with the message passing libraries. 

By default, the mpCC compiler uses the regular C program MPI bindings. In order to use the full C++ MPI bindings 
use the compiler flag -cpp 

There is one mpCC subinvocation, mpCC_r. This invokes a shell script that compiles C++ programs while linking 
in the Partition Manager, the threaded implementation of Message Passing Interface (MPI), and Low-level Applications 
Programming Interface (LAPI). 

Source Files and Preprocessing
All of the C and C++ compiler invocations process assembler source files and object files as well as preprocessing 
and compiling C and C++ source files. Unless the -c option is specified, they also call the linkage editor to produce 
a single executable object file. 

All invocations of the C or C++ compilers follow these suffix conventions for input files: 

.C, .cc, .cpp, or .cxx - C++ source file. 
.c - C source file 
.i - preprocessed C source file 
.so - shared object file 
.o - object file for ld command 
.s - assembler source file 

By default, the preprocessor is run on both C and C++ source files. 

Default Datatype Sizes
These are the default sizes of the standard C/C++ datatypes. 

Type           Length (bytes)
-----------------------------
bool (1)       1
char           1
wchar_t (1)    2
short          2
int            4
long           4 / 8 (2)
float          4
double         8
long double    8 / 16 (3)

(1) C++ only.
(2) 64 bit mode -q64.
(3) 128 suffix compiling mode.

Distributed-Memory Parallelism

Invoking any of the compilers starting with "mp" enables the program for running across several nodes. 
Of course, you are responsible for using a library such as MPI to arrange communication and coordination 
in such a program. Any of the mp compilers sets the include path and library paths to pick up the MPI library. 

To use the MPI with C++ or to use the MPI I/O subroutines, the thread-safe version of the compiler must be used. 

% mpcc_r a.c
% mpCC_r -cpp a.C

The example, hello.c, demonstrates the use of MPI from a C code. 

The example, hello.C, demonstrates the use of MPI from a C++ code. 

Shared-Memory Parallelism
The IBM C and C++ compilers support a variety of shared-memory parallelism. 

OpenMP
OpenMP directives are fully supported by the IBM C and C++ compilers when one of the invocations with _r suffix 
is used. See Using OpenMP on seaborg for details. 

Automatic Parallelization
The IBM C compiler will attempt to automatically parallelize simple loop constructs. Use the option "-qsmp" 
with one of the _r invocations: 

% xlc_r -qsmp a.c

64 Bit Addressing
Both the IBM C and C++ compilers support 64 bit addressing through the -q64 option. The default mode can be set 
through the environment variable OBJECT_MODE. On Bassi, OBJECT_MODE=64 has been set to make 64-bit mode the default. 
On Seaborg the default is 32-bit addressing mode. In 64-bit mode all pointers are 64 bits in length and the length 
of long datatypes increases from 32 to 64 bits. The default size of the other datatypes does not change. 

The following points should be kept in mind if 64-bit is used: 

- If you have some object files that were compiled in 32-bit mode and others compiled in 64-bit mode the objects 
  will not bind. You must recompile to ensure that all objects are in the same mode. 
- Your link options must reflect the type of objects you are linking. If you compiled 64-bit objects, you must 
  also link these objects with the -q64 option. 
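
One way to keep compile and link steps in the same mode is to derive the flag from OBJECT_MODE in one place and reuse it everywhere. The sketch below assumes only the values 32 and 64 are in use and treats an unset OBJECT_MODE as 32-bit; the function name object_mode_flag is hypothetical.

```shell
# object_mode_flag: print the addressing-mode flag (-q32 or -q64)
# that matches the current OBJECT_MODE setting.
# Assumption: an unset OBJECT_MODE means 32-bit mode.
object_mode_flag() {
    case "${OBJECT_MODE:-32}" in
        64) echo "-q64" ;;
        32) echo "-q32" ;;
        *)  echo "unsupported OBJECT_MODE: $OBJECT_MODE" >&2
            return 1 ;;
    esac
}

# Usage sketch: pass the same flag to every compile and link step.
#   xlc $(object_mode_flag) -c second.c
#   xlc $(object_mode_flag) prog.c second.o
```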

Optimization
The default for all IBM compilers is for there to be no optimization. The NERSC/IBM recommended optimization options 
for both C and C++ compiles are -O3 -qstrict -qarch=auto -qtune=auto. 






55.2 Installing Tuxedo 8.1 or 9:
================================

Before installing make sure you understand the BEA and Tuxedo home dirs, and give appropriate
ownership/permissions to a dedicated BEA account.

GUI mode or console mode are available.

GUI:
====

Go to the directory where you downloaded the installer and invoke the installation procedure by entering 
the following command: 
prompt> sh filename.bin

where filename is the name of the BEA Tuxedo installer file.

Select the install set that you want installed on your system. The following seven choices are available:

Full Install (the default)-all Tuxedo server and client software components
Server Install-Tuxedo server software components only
Full Client Install-Tuxedo client software components only
Jolt Client Install-Jolt client software components only
ATMI (/WS) Client Install-Tuxedo ATMI client software components only
CORBA Client Install-Tuxedo CORBA client software components only
Custom Install-select specific Tuxedo server and client software components. The following table entry provides 
a summary of options for the Custom Install.

For a detailed list of software components for each install set, see Install Sets.

Select (add) or deselect (clear) one or more software components from the selected install set, 
or choose one of the other five install sets or Custom Set from the drop-down list menu and customize 
its software components. For a description of the JRLY component, see Jolt Internet Relay.

Observe the following software component mappings:

Server-contains ATMI server software; CORBA C++ server software; BEA Jolt server software; BEA SNMP Agent software, 
and BEA Tuxedo Administration Console software
ATMI Client-contains BEA ATMI Workstation (/WS) client software
CORBA Client-contains BEA CORBA C++ client software (C++ client ORB) including environmental objects
Jolt JRLY-contains BEA Jolt Relay software
Jolt Client-contains BEA Jolt client software

After selecting or deselecting one or more software components from the selected install set, 
click Next to continue with the installation. The appropriate encryption software for LLE and/or SSL 
is automatically included.

Specify the BEA Home directory that will serve as the central support directory for all BEA products 
installed on the target system. If you already have a BEA Home directory on your system, you can select 
that directory (recommended) or create a new BEA Home directory. If you choose to create a new directory, 
the BEA Tuxedo installer program automatically creates the directory for you. For details about the 
BEA Home directory, see BEA Home Directory.

Choose a BEA Home directory and then click Next to continue with the installation.

Console mode:
=============

Console-mode installation is the text-based method of executing the BEA Installation program. 
It can be run only on UNIX systems and is intended for UNIX systems with non-graphics consoles. 
Console-mode installation offers the same capabilities as graphics-based installation.

Go to the directory where you downloaded the installer and invoke the installation procedure 
by entering the following command: 

prompt> sh filename.bin -i console 

where filename is the name of the BEA Tuxedo installer file.
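A minimal sketch of wrapping that command in a script, for instance when the installer is started from a plain terminal session. The helper name console_install_cmd and the file name in the usage note are hypothetical; only the "sh filename.bin -i console" form comes from the text above. The helper only prints the command, so nothing is installed by accident:

```shell
#!/bin/sh
# Print the console-mode install command for a BEA installer file.
# Hypothetical helper: it only builds the command line, it does not run it.
console_install_cmd() {
    installer=$1
    case $installer in
        *.bin) ;;                                   # expected installer suffix
        *) echo "not a .bin installer: $installer" >&2
           return 1 ;;
    esac
    printf 'sh %s -i console\n' "$installer"
}
```

Usage: "console_install_cmd ./tux_installer.bin" prints the command; pipe the output to sh once it looks right.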

From then on, the text-based installation resembles the GUI installation.

Tuxedo 8.1 binaries and what you can do with them:
==================================================

/spl/SPLDEV1/product/tuxedo8.1/bin:>ls
AUTHSVR              TMNTSFWD_T           dmadmin              snmp_integrator.pbk  tpaclcvt
AUTHSVR.pbk          TMQFORWARD           dmadmin.pbk          snmp_version         tpacldel
BBL                  TMQUEUE              dmloadcf             snmp_version.pbk     tpaclmod
BBL.pbk              TMS                  dmloadcf.pbk         snmpget              tpaddusr
BRIDGE               TMS.pbk              dmunloadcf           snmpget.pbk          tpdelusr
BRIDGE.pbk           TMSYSEVT             dmunloadcf.pbk       snmpgetnext          tpgrpadd
BSBRIDGE             TMSYSEVT.pbk         epifreg              snmpgetnext.pbk      tpgrpdel
BSBRIDGE.pbk         TMS_D                epifregedt           snmptest             tpgrpmod
CBLDCLNT             TMS_QM               epifunreg            snmptest.pbk         tpmigldap
CBLDSRVR             TMS_QM.pbk           esqlc                snmptrap             tpmodusr
CBLVIEWC             TMS_SQL              evt2trapd            snmptrap.pbk         tpusradd
CBLVIEWC32           TMS_SQL.pbk          evt2trapd.pbk        snmptrapd            tpusrdel
DBBL                 TMUSREVT             genicf               snmptrapd.pbk        tpusrmod
DMADM                TMUSREVT.pbk         idl                  snmpwalk             tux_snmpd
DMADM.pbk            WSH                  idl2ir               snmpwalk.pbk         tux_snmpd.pbk
GWADM                WSH.pbk              idltojava            sql                  tuxadm
GWTDOMAIN            WSL                  idltojava.pbk        stop_agent           tuxadm.pbk
GWTDOMAIN.pbk        bldc_dce             ir2idl               stop_agent.pbk       tuxwsvr
GWTOPEND             blds_dce             irdel                tidl                 txrpt
ISH                  build_dgw            jrly                 tlisten              ud
ISH.pbk              buildclient          jrly.pbk             tlisten.pbk          ud32
ISL                  buildish             mkfldhdr             tlistpwd             uuidgen
ISL.pbk              buildobjclient       mkfldhdr32           tmadmin              viewc
JRAD                 buildobjserver       ntsadmin             tmadmin.pbk          viewc.pbk
JRAD.pbk             buildserver          qmadmin              tmboot               viewc32
JREPSVR              buildtms             reinit_agent         tmboot.pbk           viewc32.pbk
JSH                  buildwsh             reinit_agent.pbk     tmconfig             viewdis
JSH.pbk              cleanupsrv           restartsrv           tmipcrm              viewdis32
JSL                  cleanupsrv.pbk       restartsrv.pbk       tmipcrm.pbk          wgated
LAUTHSVR             cns                  rex                  tmloadcf             wgated.pbk
TMFFNAME             cnsbind              rmskill              tmloadcf.pbk         wlisten
TMFFNAME.pbk         cnsls                sbbl                 tmshutdown           wlisten.pbk
TMIFRSVR             cnsunbind            show_agent           tmshutdown.pbk       wtmconfig
TMNTS                cobcc                show_agent.pbk       tmunloadcf           wud
TMNTSFWD_P           cobcc.pbk            snmp_integrator      tpacladd             wud32


txrpt:
------

Name
txrpt-BEA TUXEDO system server/service report program 

Synopsis
txrpt [-t]  [-n names]  [-d mm/dd]  [-s time]  [-e time]
Description
txrpt analyzes the standard error output of a BEA TUXEDO system server to provide a summary 
of service processing time within the server. The report shows the number of times dispatched 
and average elapsed time in seconds of each service in the period covered. txrpt takes its input 
from the standard input or from a standard error file redirected as input. Standard error files 
are created by servers invoked with the -r option from the servopts(5) selection; the file can be 
named by specifying it with the -e servopts option. Multiple files can be concatenated into a single 
input stream for txrpt. Options to txrpt have the following meaning: 


-t 
order the output report by total time usage of the services, with those consuming the most total time printed first. 
If not specified, the report is ordered by total number of invocations of a service. 

-n names 
restrict the report to those services specified by names. names is a comma-separated list of service names. 

-d mm/dd 
limit the report to service requests on the month, mm, and day, dd, specified. The default is the current day. 

-s time 
restrict the report to invocations starting after the time given by the time argument. 
The format for time is hr[:min[:sec]]. 

-e time 
restrict the report to invocations that finished before the specified time. The format for time is the 
same as for the -s flag. 

The report produced by txrpt covers only a single day. If the input file contains records from more than one day, 
the -d option controls the day reported on.
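A small sketch of how the options above combine, assuming several server stderr files are concatenated into one input stream. The helper names and the file names in the usage note are hypothetical, and the time check assumes one or two digits per field (the text only gives the form hr[:min[:sec]]). The command is printed rather than executed:

```shell
#!/bin/sh
# Check a txrpt time argument against the documented form hr[:min[:sec]].
# Assumption: each field has one or two digits.
valid_txrpt_time() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]{1,2}(:[0-9]{1,2}(:[0-9]{1,2})?)?$'
}

# Print (do not run) a txrpt command for one day and one time window.
# $1 = mm/dd, $2 = start time, $3 = end time, remaining args = stderr files.
txrpt_cmd() {
    day=$1; start=$2; end=$3; shift 3
    valid_txrpt_time "$start" || { echo "bad start time: $start" >&2; return 1; }
    valid_txrpt_time "$end"   || { echo "bad end time: $end" >&2; return 1; }
    # -t orders the report by total time used per service
    printf 'cat %s | txrpt -t -d %s -s %s -e %s\n' "$*" "$day" "$start" "$end"
}
```

Usage: "txrpt_cmd 09/18 9 17:30 stderr.1 stderr.2" prints a pipeline that reports on 18 September between 9:00 and 17:30.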

tuxadm:
-------

Name

tuxadm-BEA Tuxedo Administration Console CGI gateway.

Synopsis

http://cgi-bin/tuxadm[TUXDIR=tuxedo_directory | INIFILE=initialization_file][other_parameters]
Description

tuxadm is a common gateway interface (CGI) process used to initialize the Administration Console from a browser. 
As shown in the "Synopsis" section, this program can be used only as a location, or URL from a Web browser; 
normally it is not executed from a standard command line prompt. Like other CGI programs, 
tuxadm uses the QUERY_STRING environment variable to parse its argument list.

tuxadm parses its arguments and finds an Administration Console initialization file. If the TUXDIR parameter 
is present, the initialization file is taken to be $TUXDIR/udataobj/webgui/webgui.ini by default. 
If the INIFILE option is present, then the value of that parameter is taken to be the full path to the 
initialization file. Other parameters may also be present. 

Any additional parameters can be used to override values in the initialization file. See the wlisten 
reference page for a complete list of initialization file parameters. The ENCRYPTBITS parameter may not be 
overridden by the tuxadm process unless the override is consistent with the values allowed in the actual 
initialization file.

The normal action of tuxadm is to generate, to its standard output, HTML commands that build a Web page 
that launches the Administration Console applet. The general format of the Web page is controlled by 
the TEMPLATE parameter of the initialization file, which contains arbitrary HTML commands, 
with the special string %APPLET% on a line by itself in the place where the Administration Console applet 
should appear. Through the use of other parameters from the initialization file 
(such as CODEBASE, WIDTH, HEIGHT, and so on) a correct APPLET tag is generated that contains 
all the parameters necessary to create an instance of the Administration Console.
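To illustrate the %APPLET% mechanism only (this is not how tuxadm is implemented), the following sketch replaces a line consisting solely of %APPLET% in a template file with a given piece of text. The function name render_template and the replacement text are hypothetical; a real APPLET tag would be built by tuxadm from CODEBASE, WIDTH, HEIGHT and the other initialization file parameters:

```shell
#!/bin/sh
# Replace the line that is exactly "%APPLET%" in template file $1 with text $2.
# All other lines are copied unchanged.
# Note: awk interprets backslash escapes in the -v value.
render_template() {
    awk -v tag="$2" '
        $0 == "%APPLET%" { print tag; next }   # the marker on a line by itself
        { print }' "$1"
}
```

Usage: "render_template page.tmpl '<!-- applet tag goes here -->'" writes the resulting page to standard output. A marker that shares its line with other text is left untouched, which mirrors the "on a line by itself" rule.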

Errors

tuxadm generates HTML code that contains an error message if a failure occurs. Because of the way CGI 
programs operate, there is no reason to return an error code of any kind from tuxadm.

See Also

tuxwsvr(1), wlisten(1) 

MSTMACH:
--------

MSTMACH is the machine name; it usually corresponds to the LMID, the logical machine ID.
The hostname should have an entry in /etc/hosts.
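A quick way to verify the /etc/hosts entry, as a sketch. The function name host_in_hosts is hypothetical; the second argument lets you point it at a copy of the hosts file. Comments in the file are ignored and only the name columns (not the address column) are compared:

```shell
#!/bin/sh
# Return 0 if host name $1 appears as a name or alias in hosts file $2
# (default /etc/hosts), ignoring comments.
host_in_hosts() {
    awk -v h="$1" '
        { sub(/#.*/, "") }                                   # strip comments
        { for (i = 2; i <= NF; i++) if ($i == h) found = 1 } # skip address field
        END { exit !found }' "${2:-/etc/hosts}"
}
```

Usage: host_in_hosts "$(uname -n)" || echo "no hosts entry for this machine"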


tmboot:
-------

tmboot(1)

Name

tmboot-Brings up a BEA Tuxedo configuration.

Synopsis

tmboot [-l lmid] [-g grpname] [-i srvid] [-s aout] [-o sequence] [-S] [-A] [-b] [-B lmid] [-T grpname] [-e command] 
       [-w] [-y] [-q] [-n] [-c] [-M] [-d1]

Description

tmboot brings up a BEA Tuxedo application in whole or in part, depending on the options specified. tmboot can be invoked 
only by the administrator of the bulletin board (as indicated by the UID parameter in the configuration file) 
or by root. The tmboot command can be invoked only on the machine identified as MASTER in the RESOURCES section 
of the configuration file, or the backup acting as the MASTER, that is, with the DBBL already running 
(via the master command in tmadmin(1)). The exception is the -b option: with it, the system can be booted 
from the backup machine without that machine having been designated as the MASTER.

With no options, tmboot executes all administrative processes and all servers listed in the SERVERS section 
of the configuration file named by the TUXCONFIG and TUXOFFSET environment variables. If the MODEL is MP, 
a DBBL administrative server is started on the machine indicated by the MASTER parameter in the RESOURCES section. 
An administrative server (BBL) is started on every machine listed in the MACHINES section. For each group 
in the GROUPS section, TMS servers are started based on the TMSNAME and TMSCOUNT parameters for each entry. 
All administrative servers are started followed by servers in the SERVERS sections. Any TMS or gateway servers 
for a group are booted before the first application server in the group is booted. The TUXCONFIG file is propagated 
to remote machines as necessary. tmboot normally waits for a booted process to complete its initialization 
(that is, tpsvrinit()) before booting the next process.

Booting a gateway server implies that the gateway advertises its administrative service, and also advertises 
the application services representing the foreign services based on the CLOPT parameter for the gateway. 
If the instantiation has the concept of foreign servers, these servers are booted by the gateway at this time.

Booting an LMID is equivalent to booting all groups on that LMID.

Application servers are booted in the order specified by the SEQUENCE parameter, or in the order of server entries 
in the configuration file (see the description in UBBCONFIG(5)). If two or more servers in the SERVERS section 
of the configuration file have the same SEQUENCE parameter, then tmboot may boot these servers in parallel and 
will not continue until they all complete initialization. Each entry in the SERVERS section can have a 
MIN and MAX parameter. tmboot boots MIN application servers (the default is 1 if MIN is not specified for 
the server entry) unless the -i option is specified; using the -i option causes individual servers to be 
booted up to MAX occurrences.
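The SEQUENCE rule can be pictured with a small sketch: servers are grouped by SEQUENCE value, lower values first, and servers sharing a value appear together because tmboot may boot them in parallel. The function name boot_order and the "name sequence" input format are made up for illustration; this is not output from tmboot, and servers without a SEQUENCE parameter are not modelled:

```shell
#!/bin/sh
# Read lines of the form "<server> <sequence>" on stdin and print one line
# per SEQUENCE value, in boot order, listing the servers that share it.
boot_order() {
    LC_ALL=C sort -k2,2n -k1,1 | awk '
        NR == 1 || $2 != prev {
            if (NR > 1) printf "\n"
            printf "%s:", $2
            prev = $2
        }
        { printf " %s", $1 }
        END { if (NR > 0) printf "\n" }'
}
```

Usage: printf 'SRV_A 10\nSRV_B 20\nSRV_C 10\n' | boot_order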

If a server cannot be started, a diagnostic is written on the central event log (and to the standard output, 
unless -q is specified), and tmboot continues-except that if the failing process is a BBL, servers that depend 
on that BBL are silently ignored. If the failing process is a DBBL, tmboot ignores the rest of the 
configuration file. If a server is configured with an alternate LMID and fails to start on its primary machine, 
tmboot automatically attempts to start the server on the alternate machine and, if successful, sends a message 
to the DBBL to update the server group section of TUXCONFIG.

For servers in the SERVERS section, only CLOPT, SEQUENCE, SRVGRP, and SRVID are used by tmboot. Collectively, 
these are known as the server's boot parameters. Once the server has been booted, it reads the configuration file 
to find its run-time parameters. (See UBBCONFIG(5) for a description of all parameters.)

All administrative and application servers are booted with APPDIR as their current working directory. 
The value of APPDIR is specified in the configuration file in the MACHINES section for the machine on which 
the server is being booted.

The search path for the server executables is APPDIR, followed by TUXDIR/bin, followed by /bin and /usr/bin, 
followed by any PATH specified in the ENVFILE for the MACHINE. The search path is used only if an absolute pathname 
is not specified for the server. Values placed in the server's ENVFILE are not used for the search path.
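The search order above can be reproduced with a short sketch, which is handy when a server unexpectedly starts from the wrong directory. The function name find_server is hypothetical; APPDIR, TUXDIR and the PATH taken from the machine ENVFILE are passed as arguments rather than read from a real configuration:

```shell
#!/bin/sh
# Print the first executable found for server $1, searching
# $2 (APPDIR), $3/bin (TUXDIR/bin), /bin, /usr/bin, then $4 (PATH from the
# machine ENVFILE, optional). An absolute pathname bypasses the search.
find_server() {
    name=$1; appdir=$2; tuxdir=$3; envpath=${4:-}
    case $name in
        /*) [ -x "$name" ] && printf '%s\n' "$name"
            return ;;
    esac
    search="$appdir:$tuxdir/bin:/bin:/usr/bin${envpath:+:$envpath}"
    old_ifs=$IFS
    IFS=:
    set -f                                  # no globbing while splitting
    for dir in $search; do
        [ -n "$dir" ] || continue
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            IFS=$old_ifs; set +f
            return 0
        fi
    done
    IFS=$old_ifs; set +f
    return 1
}
```

Usage: find_server simpserv "$APPDIR" "$TUXDIR" (simpserv is just an example server name).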

When a server is booted, the variables TUXDIR, TUXCONFIG, TUXOFFSET, and APPDIR, with values specified in the 
configuration file for that machine, are placed in the environment. The environment variable LD_LIBRARY_PATH 
is also placed in the environment of all servers. Its value defaults to $APPDIR:$TUXDIR/lib:/lib:/usr/lib:<lib> 
where <lib> is the value of the first LD_LIBRARY_PATH= line appearing in the machine ENVFILE. See UBBCONFIG(5) 
for a description of the syntax and use of the ENVFILE. Some Unix systems require different environment variables. 
For HP-UX systems, use the SHLIB_PATH environment variable. For AIX systems, use the LIBPATH environment variable.
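When writing a .profile or ENVFILE that has to work on several platforms, the variable name can be selected from the operating system name. A sketch, with hypothetical function names; the mapping itself is taken from the paragraph above, and the default value omits the trailing ":<lib>" part when no <lib> is given:

```shell
#!/bin/sh
# Print the name of the shared-library path variable for OS $1
# (default: the current system as reported by uname -s).
libpath_var() {
    case ${1:-$(uname -s)} in
        HP-UX) echo SHLIB_PATH ;;
        AIX)   echo LIBPATH ;;
        *)     echo LD_LIBRARY_PATH ;;
    esac
}

# Print the default value described above; $1 is the optional <lib> part.
default_libpath() {
    printf '%s:%s/lib:/lib:/usr/lib%s\n' "$APPDIR" "$TUXDIR" "${1:+:$1}"
}
```

Usage: eval "export $(libpath_var)=\"$(default_libpath)\"" sets the right variable for the current platform.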

The ULOGPFX for the server is also set up at boot time based on the parameter for the machine in the 
configuration file. If not specified, it defaults to $APPDIR/ULOG.

All of these operations are performed before the application initialization function, tpsvrinit(), is called.

Many of the command line options of tmboot serve to limit the way in which the system is booted and can be used 
to boot a partial system. The following options are supported.


-l lmid 

For each group whose associated LMID parameter is lmid, all TMS and gateway servers associated with the group 
are booted and all servers in the SERVERS section associated with those groups are executed.

-g grpname 


All TMS and gateway servers for the group whose SRVGRP parameter is grpname are started, followed by all servers 
in the SERVERS section associated with that group. TMS servers are started based on the TMSNAME and TMSCOUNT 
parameters for the group entry.

-i srvid 

All servers in the SERVERS section whose SRVID parameter is srvid are executed.

-s aout 

All servers in the SERVERS section with name aout are executed. This option can also be used to boot TMS and 
gateway servers; normally this option is used in this way in conjunction with the -g option.

-o sequence

All servers in the SERVERS section with SEQUENCE parameter sequence are executed.

-S

All servers in the SERVERS section are executed.

-A

All administrative servers for machines in the MACHINES section are executed. Use this option to guarantee 
that the DBBL and all BBL and BRIDGE processes are brought up in the correct order. (See also the description 
of the -M option.)

-b

Boot the system from the BACKUP machine (without making this machine the MASTER).

-B lmid 

A BBL is started on a processor with logical name lmid.

-M

This option starts administrative servers on the master machine. If the MODEL is MP, a DBBL administrative server 
is started on the machine indicated by the MASTER parameter in the RESOURCES section. A BBL is started on the 
MASTER machine, and a BRIDGE is started if the LAN option and a NETWORK entry are specified in the configuration file.

-d1

Causes command line options to be printed on the standard output. Useful when preparing to use sdb to debug 
application services.

-T grpname 

All TMS servers for the group whose SRVGRP parameter is grpname are started (based on the TMSNAME and TMSCOUNT 
parameters associated with the group entry). This option is the same as booting based on the TMS server name 
(-s option) and the group name (-g).

-e command 

Causes command to be executed if any process fails to boot successfully. command can be any program, script, 
or sequence of commands understood by the command interpreter specified in the SHELL environment variable. 
This allows an opportunity to bail out of the boot procedure. If command contains white space, the entire 
string must be enclosed in quotes. This command is executed on the machine on which tmboot is being run, 
not on the machine on which the server is being booted.

Note: If you choose to do redirection or piping on a Windows 2000 system, you must use one of the following methods:


Do redirection or piping from within a command file or script. 

To do redirection from within the queue manager administration program, precede the command with cmd. For example:
cmd /c ipconfig > out.txt 

If you choose to create a binary executable, you must allocate a console within the binary executable using 
the Windows AllocConsole() API function.

-w 

Informs tmboot to boot another server without waiting for servers to complete initialization. This option 
should be used with caution. BBLs depend on the presence of a valid DBBL; ordinary servers require a running BBL 
on the processor on which they are placed. These conditions cannot be guaranteed if servers are not started 
in a synchronized manner. This option overrides the waiting that is normally done when servers have sequence numbers.

-y 

Assumes a yes answer to a prompt that asks if all administrative and server processes should be booted. 
(The prompt appears only when the command is entered with none of the limiting options.)

-q 

Suppresses the printing of the execution sequence on the standard output. It implies -y.

-n

The execution sequence is printed, but not performed.

-c

Minimum IPC resources needed for this configuration are printed.

When the -l, -g, -i, -o, and -s options are used in combination, only servers that satisfy all qualifications 
specified are booted. The -l, -g, -s, and -T options cause TMS servers to be booted; the -l, -g, and -s options 
cause gateway servers to be booted; the -l, -g, -i, -o, -s, and -S options apply to application servers. 
Options that boot application servers fail if a BBL is not available on the machine. The -A, -M, and -B options 
apply only to administrative processes.

The standard input, standard output, and standard error file descriptors are closed for all booted servers.

Interoperability

tmboot must run on the master node, which in an interoperating application must be the highest release available. 
tmboot detects and reports configuration file conditions that would lead to the booting of administrative servers 
such as Workstation listeners on sites that cannot support them.

Portability 

tmboot is supported on any platform on which the BEA Tuxedo server environment is supported.

Environment Variables

During the installation process, an administrative password file is created. When necessary, the BEA Tuxedo system 
searches for this file in the following directories (in the order shown): APPDIR/.adm/tlisten.pw and 
TUXDIR/udataobj/tlisten.pw. To ensure that your password file will be found, make sure you have set the 
APPDIR and/or TUXDIR environment variables.
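The lookup order for the password file can be checked by hand with a sketch like the following. The function name find_tlisten_pw is hypothetical; the two locations and their order are the ones listed above:

```shell
#!/bin/sh
# Print the first tlisten.pw found, looking in APPDIR ($1) and then TUXDIR ($2).
find_tlisten_pw() {
    for f in "$1/.adm/tlisten.pw" "$2/udataobj/tlisten.pw"; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}
```

Usage: find_tlisten_pw "$APPDIR" "$TUXDIR" || echo "no tlisten.pw found; check APPDIR and TUXDIR"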

Link-Level Encryption

If the link-level encryption feature is in operation between tmboot and tlisten, link-level encryption will be 
negotiated and activated first to protect the process through which messages are authenticated.

Diagnostics

If TUXCONFIG is set to a non-existent file, two fatal error messages are displayed: 

error processing configuration file 

configuration file not found 

If tmboot fails to boot a server, it exits with exit code 1 and the user log should be examined for further details. 
Otherwise tmboot exits with exit code 0.

If tmboot is run on an inactive non-master node, a fatal error message is displayed:

tmboot cannot run on a non-master node.

If tmboot is run on an active node that is not the acting master node, the following fatal error message is displayed:

tmboot cannot run on a non acting-master node in an active application.

If the same IPCKEY is used in more than one TUXCONFIG file, tmboot fails with the following message: 

Configuration file parameter has been changed since last tmboot

If there are multiple node names in the MACHINES section in a non-LAN configuration, the following fatal error 
message is displayed:

Multiple nodes not allowed in MACHINES for non-LAN application.

If tlisten is not running on the MASTER machine in a LAN application, a warning message is printed. 
In this case, tmadmin(1) cannot run in administrator mode on remote machines; it is limited to read-only operations. 
This also means that the backup site cannot reboot the master site after failure.
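Since tmboot signals a failed server boot through its exit code, a start script can test that code and point the operator at the user log. A minimal sketch, with these assumptions: the wrapper name boot_and_report is hypothetical, the TMBOOT variable exists only so the command can be replaced by a stub, and the user log file is assumed to be named <ULOGPFX>.mmddyy:

```shell
#!/bin/sh
TMBOOT=${TMBOOT:-tmboot}        # override with a stub for a dry run

# Run tmboot -y with any extra options; on failure, show the tail of the
# user log. Assumption: the log is $ULOGPFX.mmddyy, with ULOGPFX defaulting
# to $APPDIR/ULOG.
boot_and_report() {
    ulog="${ULOGPFX:-${APPDIR:-.}/ULOG}.$(date +%m%d%y)"
    if "$TMBOOT" -y "$@"; then
        echo "boot ok"
        return 0
    fi
    echo "boot failed; inspect $ulog" >&2
    [ -r "$ulog" ] && tail -n 20 "$ulog" >&2
    return 1
}
```

Usage: "boot_and_report -g DBG1" boots one group; called without arguments it boots the whole configuration.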

Examples

To start only those servers located on the machines logically named CS0 and CS1, enter the following command:

tmboot -l CS0 -l CS1

To start only those servers named CREDEB that belong to the group called DBG1, enter the following command:

tmboot -g DBG1 -s CREDEB

To boot a BBL on the machine logically named PE8, as well as all those servers with a location specified as PE8, 
enter the following command.

tmboot -B PE8 -l PE8

To view minimum IPC resources needed for the configuration, enter the following command.

tmboot -c

The minimum IPC requirements can be compared to the parameters set for your machine. See the system administration 
documentation for your machine for information about how to change these parameters. If the -y option is used, 
the display will differ slightly from the previous example.

Notices

The tmboot command ignores the hangup signal (SIGHUP). If a signal is detected during boot, the process continues.

Minimum IPC resources displayed with the -c option apply only to the configuration described in the configuration 
file specified; IPC resources required for a resource manager or for other BEA Tuxedo configurations are not 
considered in the calculation.

See Also

tmadmin(1), tmloadcf(1), tmshutdown(1), UBBCONFIG(5) 

Administering BEA Tuxedo Applications at Run Time




 
Notes on Tuxedo:
================


Note 1 CDX or ETM application (middleware component, based on Tuxedo):
----------------------------------------------------------------------

Recompiling the tuxconfig.bin file after changes to the ubb file.

ETM on AIX is installed in the directory "/prj/spl/<instance_name>", for example
"/prj/spl/ivocf01" or "/prj/spl/SPLS3".

The behavior of Tuxedo is determined almost entirely by the configuration file
"/prj/spl/<ETM_Instance_Name>/etc/tuxconfig.bin".

The source of tuxconfig.bin is the ASCII file "/prj/spl/<ETM_Instance_Name>/etc/ubb".
This means that after any change to the ubb file (e.g. increasing the number of servers),
a new tuxconfig.bin file must be generated.

For this purpose, SPL has provided a shell script named "gentuxedo.sh" in the directory
"/prj/spl/<ETM_Instance_Name>/bin".

The script accepts several flags. See item 2 of the remarks below.

1. Log on as the ETM software owner (e.g. ccbsys or etmsys).
2. Check your environment (are all your environment variables set correctly?).
3. Attach yourself to the correct environment. Usually an alias exists whose name
   is the same as the Instance Name. You then only need to enter the alias
   at the prompt.
   Example: suppose the instance name is "IVOCF01". Right after logging in
   as the ETM owner (or via "su - owneraccount"), you can invoke the alias
   IVOCF01 from the prompt, and you are attached to the IVOCF01 instance.

4. Change directory to "/prj/spl/<ETM_Instance_Name>/bin".

5. To recompile the tuxconfig.bin file after a change to the ubb file,
   use the command

   ./gentuxedo.sh -m

   Use "./gentuxedo.sh -u" to generate the ubb and the bin file from the template.

Remarks:

1. The same result can be achieved with the Tuxedo utility "tmloadcf":

   tmloadcf -y $SPLEBASE/etc/ubb

2. The flags that can be passed to gentuxedo.sh:

#%       USAGE:   gentuxedo.sh
#%       USAGE:    -h = HELP
#%       USAGE:    -r = Recreate the default tuxedo server
#%       USAGE:       This will recreate all of the default service lists
#%       USAGE:       ( see option -n ) as well as create UBB files
#%       USAGE:    -u = Create the UBB file from template only
#%       USAGE:    -m = use tmloadcf to recreate ubb binary from $SPLEBASE/etc/ubb
#%       USAGE:        Once modifications have been made to the $SPLEBASE/etc/ubb
#%       USAGE:        file it is necessary to compile those changes. use the -m
#%       USAGE:        option to do this.
#%       USAGE:    -s = Create the Servers
#%       USAGE:       This will create only the Servers as defined in the -n option.
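A sketch of a check that can run before a restart, to see whether the ubb file has been edited since tuxconfig.bin was last generated. The function name needs_recompile is hypothetical and the check relies on file modification times only; the recompile command itself is the one given in this note:

```shell
#!/bin/sh
# Return 0 if ubb file $1 is newer than binary config $2 (or $2 is missing),
# 1 if the binary is up to date, 2 if the ubb file itself is missing.
needs_recompile() {
    ubb=$1; bin=$2
    [ -f "$ubb" ] || { echo "ubb file not found: $ubb" >&2; return 2; }
    [ -f "$bin" ] || return 0
    [ -n "$(find "$ubb" -newer "$bin")" ]
}
```

Usage: needs_recompile "$SPLEBASE/etc/ubb" "$SPLEBASE/etc/tuxconfig.bin" && ./gentuxedo.sh -m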


Note 2:
-------

Connecting, or attaching, to an AIX ETM instance

One or more ETM instances run on an AIX LPAR (a logical partition, i.e. a virtual machine or a fully 
independent AIX machine). An ETM instance is middleware consisting of Tuxedo services, a 
Cobol application server, and Cobol business objects.

The ETM user (or software owner) can "connect" to such an instance, for example to perform 
administrative tasks such as starting or stopping the instance.

To connect (or attach) to a particular ETM instance, you can use the "splenviron.sh" script, 
which is located in the "/prj/spl/<Instance_name>/bin" directory. The path may have a slightly different form, 
for example "/prj/etm_1520/IVOOCF/bin". The important thing to know is that the directory tree 
belonging to an instance contains a "bin" directory with a number of .sh shell scripts, 
including the "splenviron.sh" script. 

The .profile of the ETM user must, however, already set a number of environment variables correctly, 
so that they point to the right Cobol, Tuxedo, and DB2 locations.
Assuming the .profile is correct, the ETM user can connect to an instance with:

splenviron.sh -e <Instance_Name>


Example:

Suppose an AIX machine (or LPAR) hosts the ETM instance "SPLDEV1", which is installed in the directory 
"/spl/SPLDEV1".

You can then attach with the command:

/spl/SPLDEV1/bin $ ./splenviron.sh -e SPLDEV1

Version ................ (SPLVERSION) : V1.5.20.0
Database Type ............... (SPLDB) : oracle
ORACLE_SID ............. (ORACLE_SID) : SPLDEV1
NLS_LANG ............... (NLS_LANG)   : AMERICAN_AMERICA.WE8ISO8859P15
App Dir - Logs ............. (SPLAPP) : /spl/splapp/SPLDEV1
Environment Name ....... (SPLENVIRON) : SPLDEV1
Environment Code Directory (SPLEBASE) : /spl/SPLDEV1
Build Directory .......... (SPLBUILD) : /spl/SPLDEV1/cobol/build
Runtime Directory .......... (SPLRUN) : /spl/SPLDEV1/runtime/oracle
Cobol Copy Path ......... (SPLCOBCPY) : /spl/SPLDEV1/cobol/source/cm:/spl/SPLDEV1/tuxedo/templates:
                                        /spl/SPLDEV1/tuxedo/genSources:/spl/SPLDEV1/cobol/source:
                                        /spl/SPLDEV1/product/tuxedo8.1/cobinclude

The main function of "splenviron.sh" here is to set a number of variables correctly, 
so that all properties of this application (such as the build directory) are in place.

Besides using splenviron.sh, it is quite possible that a number of aliases are already defined 
in the .profile of the ETM user.
If aliases are indeed defined, attaching to an instance is very easy: you only need to 
enter the alias at the Unix prompt.

Example:

Suppose the .profile of the ETM user contains the following:

alias SPLDEV1='/spl/SPLDEV1/bin/splenviron.sh -e SPLDEV1'

The ETM user can then attach directly to SPLDEV1 with the command: SPLDEV1

So, to stop and restart a number of ETM instances (e.g. in a backup script):

# STOP ETM instances:
su - cissys -c '/spl/SPLDEV1/bin/splenviron.sh -e SPLDEV1 -c "spl.sh -t stop"'
sleep 2
su - cissys -c '/spl/SPLDEV2/bin/splenviron.sh -e SPLDEV2 -c "spl.sh -t stop"'
sleep 2
su - cissys -c '/spl/SPLCONF/bin/splenviron.sh -e SPLCONF -c "spl.sh -t stop"'
sleep 2

# START ETM instances:
su - cissys -c '/spl/SPLDEV1/bin/splenviron.sh -e SPLDEV1 -c "spl.sh -t start"'
sleep 2
su - cissys -c '/spl/SPLDEV2/bin/splenviron.sh -e SPLDEV2 -c "spl.sh -t start"'
sleep 2
su - cissys -c '/spl/SPLCONF/bin/splenviron.sh -e SPLCONF -c "spl.sh -t start"'
sleep 2
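The repeated lines above can be generated from a list of instance names. A sketch: the function name etm_ctl and the ETM_OWNER variable are hypothetical, while the command text itself follows the lines above (owner cissys, instances installed under /spl). The function only prints the commands, so the result can be reviewed before it is executed:

```shell
#!/bin/sh
ETM_OWNER=${ETM_OWNER:-cissys}      # owner account used in the lines above

# Print one "su" command per instance. $1 = start or stop, rest = instances.
etm_ctl() {
    action=$1; shift
    case $action in
        start|stop) ;;
        *) echo "usage: etm_ctl start|stop instance..." >&2; return 1 ;;
    esac
    for inst in "$@"; do
        printf "su - %s -c '/spl/%s/bin/splenviron.sh -e %s -c \"spl.sh -t %s\"'\n" \
            "$ETM_OWNER" "$inst" "$inst" "$action"
    done
}
```

Usage: "etm_ctl stop SPLDEV1 SPLDEV2 SPLCONF" prints the three stop commands; append "| sh" (as root) to run them.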


Note 3:
-------

How to (re)compile the ETM Cobol objects

You can use the "co.sh" script or the customized "co_BD.sh" script for this.
The co_BD.sh script is a copy of the co.sh script in the /prj/spl/<INSTANCE_NAME>/bin directory, 
and it prompts the user for the DB2USER / DB2PASSWORD credentials.

Syntax:

co_BD.sh -p <CobolSourceName>.cbl

How to use:
1. Log on to AIX as the correct ETM user.
2. Run the alias of the correct instance to set up the environment and to connect to the 
   correct ETM instance.
3. Make sure you know the correct DB2 user and DB2 password.

Now you can compile Cobol objects, as in the following example:

Example on the "S3" partition on AIX:

After AIX logon as "ccbsys" (the ETM instance owner on the S3 partition):

Type SPLS3 or SPLUI to set the environment
/home/ccbsys >SPLS3

060918.13:37:40 <info> DB2DIR Environment set to /prj/db2/admin/iinvu02/sqllib/
Version ................ (SPLVERSION) : V1.5.15.0
Database Type ............... (SPLDB) : db2
App Dir - Logs ............. (SPLAPP) : /prj/spl/splapp/SPLS3
Environment Name ....... (SPLENVIRON) : SPLS3
Environment Code Directory (SPLEBASE) : /prj/spl/SPLS3
Build Directory .......... (SPLBUILD) : /prj/spl/SPLS3/cobol/build
Runtime Directory .......... (SPLRUN) : /prj/spl/SPLS3/runtime/db2
Cobol Copy Path ......... (SPLCOBCPY) : /prj/spl/SPLS3/cobol/source/cm:/prj/spl/SPLS3/tuxedo/templates:/prj/spl/SPLS3/tuxedo/genSources:/prj/spl/SPLS3/cobol/source:/prj/spl/SPLS3/product/tuxedo8.1/cobinclude

/prj/spl/SPLS3 >cd /prj/spl/SPLS3/cobol/source/cm


/prj/spl/SPLS3/cobol/source/cm >co_BD.sh -p CMPCSU2B.cbl

060918.13:37:57 <info> co_BD.sh : Compile Started Mon Sep 18 13:37:57 CDT 2006
060918.13:37:57 <info> Build Directory = /prj/spl/SPLS3/cobol/build
060918.13:37:57 <info> Compiling for db2 database
060918.13:37:57 <info> Compilation requested by ccbsys for version V1.5.15.0
060918.13:37:57 <info> Environment SPLS3
060918.13:37:57 <info> Using cobol directory /opt/microfocus/cobol
060918.13:37:57 <info> DB2DIR Environment set to /prj/db2/admin/iinvu02/sqllib/

Please, type DBUSER userid to connect to database : 
Please, type in a password of c userid to connect to database : 

060918.13:38:27 <info> DB2 Compile : Local DB ALIAS = IVOOIS01
060918.13:38:27 <info> DB2 Compile : collection     = IVOOIS

060918.13:38:28 <info> Compiling one object only - CMPCSU2B.cbl
060918.13:38:29 <info> Program : CMPCSU2B ; Expand Return Code.. : 0
060918.13:38:29 <info> Program : CMPCSU2B ; Prepare Return Code. : 0
060918.13:38:30 <info> Program : CMPCSU2B ; Compile Return Code. : 0
060918.13:38:30 <info> FINISHED COMPILATION
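When several objects are compiled in one go, the return codes in the output can be checked automatically. A sketch that relies only on the "... Return Code... : <n>" line format shown above; the function name check_compile_log is hypothetical:

```shell
#!/bin/sh
# Read compile output on stdin, print every "Return Code" line whose code
# is not 0, and return non-zero if any such line was found.
check_compile_log() {
    awk -F' : ' '
        /Return Code/ && $NF + 0 != 0 { print; bad = 1 }
        END { exit bad }'
}
```

Usage: co_BD.sh -p CMPCSU2B.cbl 2>&1 | tee compile.log | check_compile_log || echo "compile problems, see compile.log"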

Note 4:
-------

How to test a DB2 connection from an AIX partition.
If "DB2 Connect ESE" is correctly installed on an AIX partition, you may want to test the connection from AIX 
to DB2 on Z. This can be done as in the following example:

1. Log in as the correct ETM instance owner (e.g. iinvu02)
2. Type db2 <enter> 

The DB2 client utility starts and its prompt appears:

db2 =>

Now enter:

db2=> Connect to <alias_name> user <user>

You are then asked for the password, and this also tests whether the connection works.

Example:

db2=> connect to IVOOCF01 user $SCCB60 using sa876dfy

As an extra test, you can also try to retrieve the current date from a DB2 dummy table with the command:

db2=> select current date from sysibm.sysdummy1

Opmerking:

De ETM instance owner dient wel in zijn .profile een aantal DB2 environment variables te hebben staan, 
zodat DB2 correct werkt, zoals:

export DB2_HOME=/prj/db2/admin/<db2_user>
. $DB2_HOME/sqllib/db2profile


Note 5:
-------

Environment variables for the ETM instance owner on UNIX

The ETM software owner, also called the ETM instance owner, needs a number of 
environment variables in the .profile file on UNIX / AIX. 

As an example, take the instance SPLDEV1, which is installed in "/spl/SPLDEV1".
In that case the instance owner needs at least the following variables. You can adapt them to your own 
environment and copy them directly into the .profile file.

1. General variables that point to support software such as Java, Perl, DB2 Connect, etc.

export COBDIR=/opt/SPLcobAS40		# or higher version
export COBMODE=64
export JAVA_HOME=/usr/java131		# or higher version
export LC_MESSAGES=C
export LANG=C
LD_LIBRARY_PATH=/spl/V1515_SFix2_BASE_SUN_DB2/runtime/db2:/spl/V1515_SFix2_BASE_SUN_DB2/product/tuxedo8.1/lib:/opt/SPLcobAS40/lib:/usr/local/lib:/opt/IBMdb2/db282/sqllib/lib::

2. Variables relating to "this" instance

SPLAPP=/spl/splapp/V1515_SFix2_BASE_SUN_DB2
SPLBCKLOGDIR=/tmp
SPLBUILD=/spl/V1515_SFix2_BASE_SUN_DB2/cobol/build
SPLCOBCPY=/spl/V1515_SFix2_BASE_SUN_DB2/cobol/source/cm:/spl/V1515_SFix2_BASE_SUN_DB2/tuxedo/templates:/spl/V1515_SFix2_BASE_SUN_DB2/tuxedo/genSources:/spl/V1515_SFix2_BASE_SUN_DB2/cobol/source:/spl/V1515_SFix2_BASE_SUN_DB2/product/tuxedo8.1/cobinclude
SPLCOMMAND='ksh -o vi'
SPLCOMP=microfocus
SPLDB=db2
SPLEBASE=/spl/V1515_SFix2_BASE_SUN_DB2
SPLENVIRON=V1515_SFix2_BASE_SUN_DB2
SPLFUNCGETOP=''
SPLGROUP=cisusr
SPLHOST=sf-sunapp-22
SPLLOCALLOGS=/spl/vInd/local/logs
SPLLOGS=/spl/V1515_SFix2_BASE_SUN_DB2/logs
SPLQUITE=N
SPLRUN=/spl/V1515_SFix2_BASE_SUN_DB2/runtime/db2
SPLSOURCE=/spl/V1515_SFix2_BASE_SUN_DB2/cobol/source
SPLSUBSHELL=ksh
SPLSYSTEMLOGS=/spl/V1515_SFix2_BASE_SUN_DB2/logs/system
SPLUSER=cissys
SPLVERS=1
SPLVERSION=V1.5.15.1
SPLWEB=/spl/V1515_SFix2_BASE_SUN_DB2/cisdomain/applications
T=/spl/V135_MASTERTEMPLATE_UNIX
TERM=ansi
THREADS_FLAG=native
TUXCONFIG=/spl/V1515_SFix2_BASE_SUN_DB2/etc/tuxconfig.bin
TUXDIR=/spl/V1515_SFix2_BASE_SUN_DB2/product/tuxedo8.1
ULOGPFX=/spl/V1515_SFix2_BASE_SUN_DB2/logs/system/ULOG
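
After editing the .profile it can be useful to verify that the variables are really set in a new login shell. 
The following is a minimal sketch of such a check. The function name check_env_vars is hypothetical, and the 
list of names passed to it is only a selection from the variables listed in this note.

```shell
# Hypothetical helper: report which of the named variables are unset or empty.
# Returns 0 when all are set, 1 otherwise.
check_env_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return $missing
}

# Example call:
# check_env_vars COBDIR JAVA_HOME SPLEBASE SPLENVIRON TUXDIR TUXCONFIG
```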




THREADSTACKSIZE:
----------------

Each dispatched thread is created with the stack size specified by THREADSTACKSIZE (or TA_THREADSTACKSIZE). 
If this parameter is not specified or has a value of 0, the operating system default is used. On a few operating systems 
on which the default is too small to be used by the BEA Tuxedo system, a larger default is used.
If you change the default of 0, you should specify the parameter in bytes.



55.3 Installing Micro focus:
============================


55.4 Installing Java or JRE:
============================

What is it?:
------------

- Java Compiler (javac):  Compiles programs written in the Java programming language into bytecodes.

- Java Interpreter (java):  Executes Java bytecodes.  In other words, it runs 
  programs written in the Java programming language.

- The Java 2 Runtime Environment is intended for software developers 
  and vendors to redistribute with their applications.

  The Java(TM) 2 Runtime Environment contains the Java virtual machine, 
  runtime class libraries, and Java application launcher that are 
  necessary to run programs written in the Java programming language. 
  It is not a development environment and does not contain development 
  tools such as compilers or debuggers.  For development tools, see the 
  Java 2 SDK, Standard Edition.

- SDK, JDK
  Java 2 Platform, Standard Edition (J2SE) provides a complete environment for applications development 
  on desktops and servers and for deployment in embedded environments. It also serves as the foundation 
  for the Java 2 Platform, Enterprise Edition (J2EE) and Java Web Services.

- The PATH statement enables a system to find the executables (javac, java, javadoc, etc.) 
  from any current directory.

- The CLASSPATH tells the Java virtual machine and other applications (which are located in the 
  "jdk_<version>\bin" directory) where to find the class libraries, such as classes.zip file 
  (which is in the lib directory). 

The LIBPATH environment variable tells AIX applications, such as the JVM where to find shared libraries. 
This is equivalent to the use of LD_LIBRARY_PATH in other Unix-based systems. 

On AIX, LIBPATH must be set instead of LD_LIBRARY_PATH. On HP UX, SHLIB_PATH must be set instead of 
LD_LIBRARY_PATH. On Windows NT, no variable for shared libraries is required.
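
The platform rule above can be captured in a small shell function. This is a minimal sketch; the function name 
shlib_var_for_os is hypothetical, and it assumes the operating system name is passed in the form that 
"uname -s" reports it ("AIX", "HP-UX"). Every other name falls through to LD_LIBRARY_PATH.

```shell
# Hypothetical helper: print the name of the shared-library search path
# variable for a given operating system name (as reported by "uname -s").
shlib_var_for_os() {
    case "$1" in
        AIX)   echo "LIBPATH" ;;
        HP-UX) echo "SHLIB_PATH" ;;
        *)     echo "LD_LIBRARY_PATH" ;;   # other Unix-based systems
    esac
}

# Example: show the variable that applies on the current machine.
# echo "Set $(shlib_var_for_os "$(uname -s)") on this system"
```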



For AIX, a number of Java SDK's and JRE's are available, e.g.
December 2004 - SDK 1.3.1 32-bit PTF (APAR IY65310) released and JRE 1.3.1 32-bit refreshed, 
both using ca131-20041210 build (SR8). 
December 2004 - SDK 1.3.1 64-bit PTF (APAR IY65311) released and JRE 1.3.1 64-bit refreshed, 
both using caix64131-20041210 build (SR8). 



How to install?:
----------------

- Question: Can all these java releases co-exist on a machine? In which directories are these releases installed? 
    Answer: 
Yes, releases can co-exist. 
Java 1.1.8 installs in /usr/jdk_base 
Java 1.2.2 installs in /usr/java_dev2 
Java 1.3.0 installs in /usr/java130 
Java 1.3.1 64-bit installs in /usr/java13_64 
Java 1.3.1 installs in /usr/java131 
Java 1.4 64-bit installs in /usr/java14_64 
Java 1.4 installs in /usr/java14

- Question: What AIX levels are required for Java releases? 
    Answer: 
To take advantage of latest AIX fixes it is recommended/required that latest AIX Recommended Maintenance Level 
be used. The following is the minimum AIX level required at the time when a Java release was first released: 
Java 1.1.8 requires AIX 4.2.1 
Java 1.2.2 requires AIX 4.3.3 PLUS fixes 
Java 1.3.0 requires AIX 4.3.3.10 PLUS fixes 
Java 1.3.1 64-bit requires AIX 5.1.0.10 
Java 1.3.1 requires AIX 4.3.3.75 
Java 1.4 64-bit requires at least AIX 5.1.0.75 or AIX 5.2.0.10 
Java 1.4 requires at least AIX 5.1.0.75 or AIX 5.2.0.10
Java 5 requires at least AIX 5.2.0.75 or AIX 5.3.0.30



- Question: What paths do I need to set to use a specific Java release on my system? 
    Answer: 
Java 1.1.8: 
PATH=/usr/jdk_base/bin:$PATH 
Java 1.2.2: 
PATH=/usr/java_dev2/jre/sh:/usr/java_dev2/sh:$PATH 

Java 1.3.0 
PATH=/usr/java130/jre/bin:/usr/java130/bin:$PATH 

Java 1.3.1 64-bit: 
PATH=/usr/java13_64/jre/bin:/usr/java13_64/bin:$PATH 

Java 1.3.1 
PATH=/usr/java131/jre/bin:/usr/java131/bin:$PATH 

Java 1.4 64-bit: 
PATH=/usr/java14_64/jre/bin:/usr/java14_64/bin:$PATH 

Java 1.4 
PATH=/usr/java14/jre/bin:/usr/java14/bin:$PATH
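
When several releases are installed side by side, the PATH settings listed above can be wrapped in one function. 
This is a minimal sketch; the function name java_path_prefix and the release labels with a "-64" suffix are 
hypothetical, while the directories are the ones listed above.

```shell
# Hypothetical helper: print the PATH prefix for a given AIX Java release.
java_path_prefix() {
    case "$1" in
        1.1.8)    echo "/usr/jdk_base/bin" ;;
        1.2.2)    echo "/usr/java_dev2/jre/sh:/usr/java_dev2/sh" ;;
        1.3.0)    echo "/usr/java130/jre/bin:/usr/java130/bin" ;;
        1.3.1)    echo "/usr/java131/jre/bin:/usr/java131/bin" ;;
        1.3.1-64) echo "/usr/java13_64/jre/bin:/usr/java13_64/bin" ;;
        1.4)      echo "/usr/java14/jre/bin:/usr/java14/bin" ;;
        1.4-64)   echo "/usr/java14_64/jre/bin:/usr/java14_64/bin" ;;
        *)        echo "unknown Java release: $1" >&2; return 1 ;;
    esac
}

# Example: put Java 1.4 (32-bit) first in the search path.
# PATH=$(java_path_prefix 1.4):$PATH; export PATH
```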


Install JDK or SDK:

For base images after you downloaded either packagename.tar or the packagename.tar.gz file 
(the latter is recommended if you have gunzip utility available), you need to extract packagename from 
the downloaded file: 
# tar -xvf packagename.tar  (example: tar -xvf Java14.sdk.tar), or 
# gunzip -c packagename.tar.gz | tar -xvf -  (example: gunzip -c Java14.sdk.tar.gz | tar -xvf - ) 

For update images the .bff files are ready to be installed. Before installing, remove the old .toc file (if it exists) 
in the directory containing the .bff images. 

You can use the smitty command to install (both base and update images): 

        Run "smitty install"
        Select "Install and Update Software"
        Select "Install Software"
        Specify directory containing the images


Install JRE:

The JRE installation is simple. After downloading the package, create the directory where you want to install it, 
then unpack the files. In the commands below, /java_home is a directory of your choice and jre## refers to 
the specific JRE image from the download page. 
mkdir -p /java_home
cd /java_home 
tar -xvpf jre##.tar 
or 
gunzip -c < jre##.tar.gz | tar -xvpf -

 
How to check your java version?
-------------------------------

/software/java:>java -version
java version "1.3.1"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1)
Classic VM (build 1.3.1, J2RE 1.3.1 IBM AIX build ca131ifx-20040721a SR7P (JIT enabled: jitc))

/software/java:>java -fullversion
java full version "J2RE 1.3.1 IBM AIX build ca131ifx-20040721a SR7P"

/root:>which java
/usr/java131/bin/java
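
In scripts it is often enough to extract only the version string. This is a minimal sketch, assuming the first 
line of the output has the form shown above (java version "..."). The function name parse_java_version is 
hypothetical.

```shell
# Hypothetical helper: read "java -version" output on stdin and print
# only the version string between the double quotes.
parse_java_version() {
    sed -n 's/^java version "\([^"]*\)".*/\1/p'
}

# Example ("java -version" writes to stderr, hence the 2>&1):
# java -version 2>&1 | parse_java_version
```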

To check the Java filesets on your system:
# lslpp -l | grep Java


/root:>lslpp -l | grep Java
  Java131.rte.bin           1.3.1.16  COMMITTED  Java Runtime Environment
  Java131.rte.lib           1.3.1.16  COMMITTED  Java Runtime Environment
                                                 Java-based build tool.
                                                 JavaBeans(TM) (EJB(TM)).
                                                 Javadocs
                                                 Java(TM) technology-based Web
                                                 Java(TM) technology-based Web
                                                 Javadocs
  idebug.rte.hpj             9.2.5.0  COMMITTED  High-Performance Java Runtime
  idebug.rte.jre             9.2.5.0  COMMITTED  Java Runtime Environment
  idebug.rte.olt.Java        9.2.5.0  COMMITTED  Object Level Trace Java


Notes:
------

Note 1:
-------

thread

Q:

Unable to install Java 1.4 due to License Problem 
I am trying to install Java14_64.sdk package on AIX 5.3 ML4. The install fails with the message below, 
the license file IS on the system/media but for some reason is not recognized. 
When I try to install the license on its own it fails with a pre-requisite failure, the pre-requisite 
being the above package - Java14_64.sdk. 

Any ideas out there ?


Selected Filesets
-----------------
Java14_64.sdk 1.4.0.1 # Java SDK 64-bit

<< End of Success Section >>

FILESET STATISTICS
------------------
1 Selected to be installed, of which:
1 Passed pre-installation verification
----
1 Total to be installed

LICENSE AGREEMENT FAILURES
------------------
The installation cannot proceed because the following filesets
require software license agreement files which could not be
found
on the system or installation media:

Java14_64.sdk 


A:

The downloadable install images at:
http://www-128.ibm.com/developerworks/java/jdk/aix/service.html
have both the 1.4.2.0 base images (ca1420-20040626) and the
current 'latest' level (ca142-20060824). You should be able
to use the 1.4.2.0 from that site plus your 1.4.2.1 update.

Paul Landay


Some jre versions on AIX:
-------------------------

AIX 5.1 ML5 comes with APAR IY52512
IBM SDK 1.3.1 SR7 32-bit (APAR IY52512) Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1) Classic VM 
(build 1.3.1, J2RE 1.3.1 IBM AIX build ca131-20040517 (JIT enabled: jitc)) 

AIX 5.2 ML5 comes with APAR IY58350
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1)
Classic VM (build 1.3.1, J2RE 1.3.1 IBM AIX build ca131ifx-20040721a SR7P (JIT enabled: jitc))


IY65305: JAVA142 32-BIT PTF : CA142IFX-20041203
== IY58350 : JAVA 1.3.1 32-bit SR7P : ca131ifx-20040721a

SDK 1.3.1 32-bit PTFs since GA:
ca131-20040517
                                        Java131.rte.bin
        APAR #    Fullversion           fileset level      "SR" #
        ------    -----------           ---------------    ------
        IY76252   ca131-20051025        1.3.1.18           SR 9
        IY65310   ca131-20041210        1.3.1.17           SR 8
  ->    IY58350   ca131ifx-20040721a    1.3.1.16           SR 7P
  ->    IY52512   ca131-20040517        1.3.1.15           SR 7
        IY50443   ca131-20031105        1.3.1.13           SR 6a
        IY49074   ca131-20031021        1.3.1.12           SR 6
        IY47055   ca131-20030630a       1.3.1.11           N/A
        IY45632   ca131-20030630        1.3.1.10           SR 5
        IY45288   ca131-20030329        1.3.1.9            N/A
        IY40440   ca131-20030329        1.3.1.8            SR 4
        IY39508   ca131-20030122a       1.3.1.7            N/A
        IY38011   ca131-20021107        1.3.1.6            SR 3W
        IY33957   ca131-20021102        1.3.1.5            SR 3
        IY30887   ca131-20020706        1.3.1.2            SR 2


SDK 1.3.1 64-bit PTFs since GA:
                                          Java13_64.rte.bin
        APAR #    Fullversion             fileset level        "SR" #
        ------    -----------             -----------------    ------
        IY76253   caix64131-20051025      1.3.1.10             SR 9
        IY65311   caix64131-20041210      1.3.1.9              SR 8
        IY58414   caix64131ifx-20040721   1.3.1.8              SR 7P
        IY57370   caix64131-20040517      1.3.1.7              SR 7
        IY49076   caix64131-20031021      1.3.1.6              SR 6
        IY45633   caix64131-20030618      1.3.1.5              SR 5
        IY42844   caix64131-20030329      1.3.1.4              SR 4
        IY34010   caix64131-20021102      1.3.1.3              SR 3
        IY30923   caix64131-20020706      1.3.1.2              SR 2


SDK 1.4 64-bit PTFs since 1.4.0 GA:
                                        Java14_64.sdk
    APAR #   Fullversion                fileset level   "SR" #
    ------   -----------                -------------   ------

    IY84054  caix64142-20060421           1.4.2.75      142 SR5 
    IY81444  caix64142ifx-20060209        1.4.2.51      142 SR4 (repackaged)
    IY77461  caix64142-20060120           1.4.2.50      142 SR4 (bad)
    IY75004  caix64142-20050929           1.4.2.20      142 SR3
    IY72502  caix64142-20050609           1.4.2.10      142 SR2
    IY70332  caix64142sr1aifx-20050414    1.4.2.5       N/A
    IY68122  caix64142sr1a-20050209       1.4.2.3       142 SR1a
 ==   IY62851  (IY63533 for download)
             caix64142-20040917           1.4.2.1       1.4.2 SR 1
    IY54664  caix641420-20040626          1.4.2.0       N/A (1.4.2 GA code)
    IY58415  caix641411ifx-20040810       1.4.1.4       1.4.1 SR 3
    IY52686  caix641411-20040301          1.4.1.3       1.4.1 SR 2
    IY48526  caix641411-20030930          1.4.1.2       1.4.1 SR 1
    IY47538  caix64141-20030703a          1.4.1.1       N/A
    IY43716  caix64141-20030522           1.4.1.0       N/A

Latest: 1.4.2 Service Release 5 (caix64142-20060421)


Other notes:
------------

jre 131 32 bit:
installs in /usr/java131

5100-08 (APAR IY70781)  - min AIX51
5200-06 (APAR IY67913)  - min AIX52
5300-02 (APAR IY69190)  - min AIX53

jre 131 64 bit:
installs in /usr/java13_64

5100-08 (APAR IY70781)  - min AIX51
5200-06 (APAR IY67913)  - min AIX52
5300-02 (APAR IY69190)  - min AIX53


Java 1.1.8, 1.2.2, and 1.4.1 are no longer supported by IBM. 

For AIX 4.3.3, which is out of support, Java 1.3.1 requires the AIX 4330-10 Recommended Maintenance Level.
For AIX 5.1, Java 1.3.1 requires the AIX 5100-03 Recommended Maintenance Level.
For AIX 5.2, Java 1.3.1 requires the AIX 5200-01 Recommended Maintenance Level.
For AIX 5.3, Java 1.3.1 requires Version 5.3.0.1 (APAR IY58143) or later.


Java version on AIX 5.3:
========================

The latest Java technology is included with base AIX 5L V5.3. 
The IBM 32-bit SDK for AIX 5L, Java 2 Technology Edition V1.4 ships with AIX 5L V5.3. 
The IBM 64-bit SDK for AIX 5L, Java 2 Technology Edition V1.4 is available on the AIX 5L V5.3 Expansion Pack 
and the AIX 5L Java Web site at ibm.com/developerworks/java/jdk/aix.


JVM problems and AIX Environment Variables in relation to Java:
===============================================================

Default Behavior of Java on AIX

This section describes the settings as they are right now. These settings may, and in most cases will, 
change over time. The README or SDK Guide accompanying the SDK are always the most up-to-date references 
for such settings.

Java uses the following environment settings:

AIXTHREAD_SCOPE=S 
This setting is used to ensure that each Java thread maps 1x1 to a kernel thread. The advantage of this approach 
is seen in several places; a notable example is how Java exploits Dynamic Logical Partitioning (DLPAR); 
when a new CPU is added to the partition, a Java thread can be scheduled on it. This setting should not be 
changed under normal circumstances. 

AIXTHREAD_COND_DEBUG, AIXTHREAD_MUTEX_DEBUG and AIXTHREAD_RWLOCK_DEBUG 
These flags are used for kernel debugging purposes. These may sometimes be set to OFF. If not, switching 
them off can provide a good performance boost.

LDR_CNTRL=MAXDATA=0x80000000 
This is the default setting on Java 1.3.1, and controls how large the Java heap can be allowed to grow. 
Java 1.4 decides the LDR_CNTRL setting based on requested heap. See "Getting more memory in AIX for your 
Java applications" for details on how to manipulate this variable.

JAVA_COMPILER 
This decides what the Just-In-Time compiler will be. The default is jitc, which points to the IBM JIT compiler. 
It can be changed to jitcg for the debug version of JIT compiler, or to NONE for switching the JIT compiler off 
(which in most cases is the absolute worst thing you can do for performance).

IBM_MIXED_MODE_THRESHOLD 
This decides the number of invocations after which the JVM JIT-compiles a method. This setting varies 
by platform and version; for example, it is 600 for Java 1.3.1 on AIX. 


Note 1:
-------

About o_maxdata and LDR_CNTRL:

... space for the native heap. Moving the fence down allows the native heap to grow, while reducing shared memory. 
For a setting of o_maxdata = N, the fence is placed at 0x30000000+N. For several good reasons, 
it is recommended to set o_maxdata to a value that is the start of a particular segment, 
such as 0xn0000000. In this case, the fence sits between segments 2+n and 3+n, which translates 
to n segments for the native heap, and 10-n segments for shared memory.

o_maxdata=8: 8 seg for native, 2 seg for shared
o_maxdata=7: 7 seg for native, 3 seg for shared
o_maxdata=6: 6 seg for native, 4 seg for shared
o_maxdata=5: 5 seg for native, 5 seg for shared
o_maxdata=4: 4 seg for native, 6 seg for shared
o_maxdata=3: 3 seg for native, 7 seg for shared *
o_maxdata=2: 2 seg for native, 8 seg for shared
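
The table above follows directly from the rule "n segments for the native heap, 10-n segments for shared memory", 
with each segment being 256 MB and the fence at 0x30000000+N. This is a minimal sketch that prints the split for 
a given n. The function name maxdata_split and the accepted range 1 to 8 are assumptions made for illustration.

```shell
# Hypothetical helper: for o_maxdata = 0xn0000000, print the fence address
# and how the ten 256 MB segments are divided between native heap and
# shared memory.
maxdata_split() {
    n=$1
    case "$n" in
        [1-8]) ;;
        *) echo "n must be between 1 and 8" >&2; return 1 ;;
    esac

    # Fence = 0x30000000 + N, where N = 0xn0000000.
    fence=$(( 0x30000000 + n * 0x10000000 ))

    printf 'fence=0x%X native=%d segments (%d MB) shared=%d segments (%d MB)\n' \
        "$fence" "$n" "$(( n * 256 ))" "$(( 10 - n ))" "$(( (10 - n) * 256 ))"
}

# Example: the default o_maxdata of 0x80000000 corresponds to n = 8.
# maxdata_split 8
```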


By default, o_maxdata is set to 0x80000000, leaving 2 GB for native heap and 512 MB for shared memory. 
If you attempt to allocate a Java heap larger than 1 GB, it fails because Java tries to use shared memory 
for heap, and there is only 512 MB of shared memory available. If you set IBM_JAVA_MMAP_JAVA_HEAP 
in the environment and try to allocate a heap larger than 512 MB, JVM will be unable to allocate the heap. 
The solution is to adjust o_maxdata in such a way that the size of shared memory grows large enough 
to accommodate the Java heap. The next section shows you how to do this. 


So how do you go to a larger Java heap? You need to change o_maxdata to increase the amount of 
shared memory address space. You can use the following calculations to come up with the appropriate value 
for o_maxdata. Supposing you need a maximum heap size of J bytes, you would invoke Java as 

java -mxJ <other arguments> 

If J is less than 1 GB, and IBM_JAVA_MMAP_JAVA_HEAP is not set, the default setup will suffice. 
If J is > 1 GB, or if IBM_JAVA_MMAP_JAVA_HEAP is set, use o_maxdata = 0xn0000000 

where  n = (10 - ceil(J/256M)) or 8 

whichever is smaller. The function ceil rounds up the argument to the next integer. 

For example, if you need to allocate 1500 MB of heap, we have 

n = (10 - ceil(1500M/256M)) = (10 - 6) = 4. If you set o_maxdata = 0x40000000, 

you will be able to allocate the needed size of heap. To change o_maxdata, set the following 
environment variable: LDR_CNTRL=MAXDATA=<new o_maxdata value> 

The above example would set the following environment variable: LDR_CNTRL=MAXDATA=0x40000000
 

To verify that your calculation is accurate, you can try the following commands: 
$ export LDR_CNTRL=MAXDATA=0x40000000 
$ java -mx1500m -version
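
The calculation n = (10 - ceil(J/256M)) or 8, whichever is smaller, can be done in the shell with integer 
arithmetic. This is a minimal sketch; the function name maxdata_for_heap_mb is hypothetical, the heap size is 
assumed to be given in MB, and the error for results below 1 is an added safeguard that is not part of the 
formula above.

```shell
# Hypothetical helper: given the maximum Java heap size in MB, print the
# o_maxdata value 0xn0000000 with n = min(8, 10 - ceil(J / 256 MB)).
maxdata_for_heap_mb() {
    heap_mb=$1

    # Integer ceiling of heap_mb / 256.
    segments=$(( (heap_mb + 255) / 256 ))

    n=$(( 10 - segments ))
    if [ "$n" -gt 8 ]; then
        n=8
    fi
    if [ "$n" -lt 1 ]; then
        # Assumption: treat a result below 1 as "heap too large".
        echo "heap of ${heap_mb} MB does not fit this scheme" >&2
        return 1
    fi

    printf '0x%d0000000\n' "$n"
}

# Example: export LDR_CNTRL=MAXDATA=$(maxdata_for_heap_mb 1500)
```

For the 1500 MB example this gives ceil(1500/256) = 6 and n = 4, so the function prints 0x40000000.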
 
Setting the IBM_JAVA_MMAP_JAVA_HEAP variable

# export IBM_JAVA_MMAP_JAVA_HEAP=true


So, if you need to increase the memory for WebSphere 5.x 32-bit, put the following lines
into the startServer.sh script, or in /prj/was/omgeving.rc:

export LDR_CNTRL=MAXDATA=0xn0000000
export IBM_JAVA_MMAP_JAVA_HEAP=true

try:

export AIXTHREAD_SCOPE=S
export AIXTHREAD_MUTEX_DEBUG=OFF
export AIXTHREAD_RWLOCK_DEBUG=OFF
export AIXTHREAD_COND_DEBUG=OFF
export LDR_CNTRL=MAXDATA=0x40000000 
export IBM_JAVA_MMAP_JAVA_HEAP=TRUE

or

export IBM_JAVA_MMAP_JAVA_HEAP=true
export LDR_CNTRL=MAXDATA=0x80000000



Note 2:
-------

I think the problem is that there are typically a lot of JNI allocations in
the heap that are pinned and are allocated for the life of the application.
Most of these are allocated during startup. If the min and max heap sizes
are the same, these pinned allocations are scattered throughout the heap.
Whereas if the min heap size is quite low, most of these allocations will be
closer together at the start of the heap, leaving the bulk of the heap (when
it's expanded) more free of pinned memory.


Note 3:
-------





55.5 Installing Perl:
=====================

AIX supports dynamically loadable objects as well as shared libraries. Shared libraries by convention 
end with the suffix .a, which is a bit misleading, as an archive can contain static as well as 
dynamic members. For perl dynamically loaded objects we use the .so suffix also used on many other platforms.

Note that starting from Perl 5.7.2 (and consequently 5.8.0) and AIX 4.3 or newer, Perl uses the AIX native 
dynamic loading interface in the so-called runtime linking mode instead of the emulated interface that 
was used in Perl releases 5.6.1 and earlier, or on AIX releases 4.2 and earlier. This change does break 
backward compatibility with compiled modules from earlier perl releases. The change was made to make 
Perl more compliant with other applications like Apache/mod_perl which are using the AIX native interface. 
This change also enables the use of C++ code with static constructors and destructors in perl extensions, 
which was not possible using the emulated interface.


Starting from AIX 4.3.3 Perl 5 ships standard with AIX. (Perl 5.8.0 with AIX 5L V5.2, 5.6.0 with AIX 5L V5.1, 
5.005_03 with AIX 4.3.3.)

You either get the source code and compile Perl, or in some situations you might be happy with installing
a binary build.




55.6 Installing DB2 Connect Enterprise Edition 8.x:
===================================================

DB2 Connect 
DB2(R) Connect provides fast and robust connectivity to IBM(R) mainframe databases for e-business 
and other applications running under UNIX(R) and Windows(R) operating systems. 

DB2 Connect Personal Edition provides direct connectivity to host and iSeries DB2 servers, while 
DB2 Connect Enterprise Edition provides indirect connectivity that allows clients to access 
host and iSeries DB2 servers through the DB2 Connect server. DB2 Connect Unlimited Edition 7 and 
DB2 Connect Application Server Edition provide unique packaging solutions that make product selection 
and licensing easier. 



Note 1:
-------

Log on to the system as a user with root authority. 
Refer to the CD-ROM label to ensure that you are using the CD-ROM with your appropriate language. 
Change to the directory where the CD-ROM is mounted by entering the following command: 
   cd /cdrom 


- For AIX 4.3.3, HP-UX and Linux 

Enter the ./db2setup command to start the DB2 Setup wizard.
 
- For Solaris Operating Environment and AIX 5L 

Copy product.tar.Z, where product represents the product you are licensed to install, to a temporary filesystem. 
Enter the following command to start the DB2 Setup wizard: 

# zcat product.tar.Z | tar -xf - ; ./product/db2setup 

For example, if the product name for DB2 Enterprise Server Edition is ese, then enter the following command: 

# zcat ese.tar.Z | tar -xf - ; ./ese/db2setup 

After a moment, the IBM DB2 Setup Launchpad opens. 

When you have completed your installation, DB2 will be installed in one of the following directories: 

For AIX: 
/usr/opt/db2_08_01 

For HP-UX, Linux, Solaris Operating Environment: 
/opt/IBM/db2/V8.1 

The installation logs db2setup.his, db2setup.log, and db2setup.err are located, by default, 
in the /tmp directory. You can specify the location of the log files. 
The db2setup.log file captures all DB2 installation information including errors. 
The db2setup.his records all DB2 installations on your machine. 
DB2 appends the db2setup.log file to the db2setup.his file. The db2setup.err file captures any error output 
that is returned by Java (for example, exceptions and trap information). 

If you want your DB2 product to have access to DB2 documentation either on your 
local computer or on another computer on your network, then you must install the DB2 Information Center. 
The DB2 Information Center contains documentation for DB2 Universal Database and DB2 related products. 


Note 2: db2admin
----------------

db2admin - DB2 Administration Server Command 
This utility is used to manage the DB2 Administration Server. 

Authorization 
Local administrator on Windows, or DASADM on UNIX based systems. 

Required connection 
None 

Command syntax 
>>-db2admin----------------------------------------------------->

>--+-----------------------------------------------------------------+-><
   +-START-----------------------------------------------------------+
   +-STOP--+--------+------------------------------------------------+
   |       '-/FORCE-'                                                |
   +-CREATE--+----------------------+--+---------------------------+-+
   |         '-/USER:--user-account-'  '-/PASSWORD:--user-password-' |
   +-DROP------------------------------------------------------------+
   +-SETID--user-account--user-password------------------------------+
   +-SETSCHEDID--sched-user--sched-password--------------------------+
   +- -?-------------------------------------------------------------+
   '- -q-------------------------------------------------------------'

Note: 
If no parameters are specified, and the DB2 Administration Server exists, this command returns the name 
of the DB2 Administration Server. 

START 
Start the DB2 Administration Server. 

STOP /FORCE 
Stop the DB2 Administration Server. The force option is used to force the DB2 Administration Server to stop, 
regardless of whether or not it is in the process of servicing any requests. 

CREATE /USER: user-account /PASSWORD: user-password 
Create the DB2 Administration Server. If a user name and password are specified, the DB2 Administration Server 
will be associated with this user account. If the specified values are not valid, the utility returns 
an authentication error. The specified user account must be a valid SQL identifier, and must exist in the 
security database. It is recommended that a user account be specified to ensure that all 
DB2 Administration Server functions can be accessed. 

Note: 
To create a DAS on UNIX systems, use the dascrt command. 

Starting and stopping the DAS:

db2admin stop
db2admin start 


Note 3: db2start
----------------

db2start - Start DB2 Command 
Starts the current database manager instance background processes on a single database partition 
or on all the database partitions defined in a partitioned database environment. Start DB2 at the server 
before connecting to a database, precompiling an application, or binding a package to a database. 

db2start can be executed as a system command or a CLP command


Note 4: example cronjobs
------------------------

30 20 * * 1-6 /usr/opt/db2_08_01/adm/db2stop force >> /home/db2inst1/DBMaintenance/dbbkup.log 2>&1
31 20 * * 1-6 /usr/opt/db2_08_01/adm/db2start >> /home/db2inst1/DBMaintenance/dbbkup.log 2>&1


Note 5: sample db2 connect processes:
-------------------------------------

On AIX, use the command ps -ef to examine processes. On Solaris and HP-UX, ps -ef 
shows only the db2sysc process (the main DB2 engine process) in place of all server-side processes 
(e.g. agents, loggers, page cleaners, and prefetchers). On Solaris or HP-UX, you can see these 
server-side processes with the command /usr/ucb/ps -axw. Both versions of the ps command work on Linux.

When performing this command on a computer running the DB2 Universal Database client or server software, 
you may see several DB2 processes listed. 

Example 1:

/root:#ps -ef | grep db2
 iinvu02 188456 422094   0 13:53:02      -  0:00 db2agent (idle) 0
   db2as 266468      1   0 13:52:10      -  0:00 /prj/db2/admin/db2as/das/adm/db2dasrrm
 iinvu02 282624 417996   0 13:53:03      -  1:13 db2disp 0
    root 295060      1   0 13:52:24      -  0:00 db2wdog 0
 iinvu01 299158 303256   0 13:52:26      -  0:00 db2resync 0
 iinvu01 303256 295060   0 13:52:24      -  0:00 db2sysc 0
    root 307350 303256   0 13:52:24      -  0:00 db2ckpwd 0
    root 311448 303256   0 13:52:24      -  0:00 db2ckpwd 0
    root 315546 303256   0 13:52:24      -  0:00 db2ckpwd 0
 iinvu01 319644 303256   0 13:52:24      -  0:00 db2gds 0
 iinvu01 323742 303256   0 13:52:24      -  0:00 db2ipccm 0
 iinvu01 327840 303256   0 13:52:25      -  0:00 db2tcpcm 0
 iinvu01 331938 303256   0 13:52:25      -  0:00 db2tcpcm 0
 iinvu01 336036 303256   0 13:52:25      -  0:00 db2tcpcm 0
 iinvu01 340134 303256   0 13:52:25      -  0:00 db2tcpcm 0
 iinvu01 344232 319644   0 13:52:26      -  0:00 db2srvlst 0
 iinvu01 348330 303256   0 13:52:26      -  0:00 db2spmrsy 0
 iinvu01 352428 319644   0 13:52:26      -  0:00 db2spmlw 0
 iinvu02 356606 401604   0 13:52:46      -  0:16 db2hmon 0
 iinvu01 360624 303256   0 13:52:26      -  0:18 db2hmon 0
 iinvu01 377016      1   0 13:52:32      -  3:00 /prj/db2/admin/iinvu01/sqllib/bin/db2fmd -i iinvu01 -m /prj/db2/admin/iinvu01/sqllib/lib/libdb2gcf.a
 iinvu02 389128 417996   0 13:52:46      -  0:00 db2srvlst 0
    root 393408      1   0 13:52:38      -  0:00 db2wdog 0
 iinvu02 397514 401604   0 13:52:46      -  0:00 db2resync 0
 iinvu02 401604 393408   0 13:52:38      -  0:00 db2sysc 0
    root 405702 401604   0 13:52:38      -  0:00 db2ckpwd 0
    root 409800 401604   0 13:52:38      -  0:00 db2ckpwd 0
    root 413898 401604   0 13:52:38      -  0:00 db2ckpwd 0
 iinvu02 417996 401604   0 13:52:38      -  0:00 db2gds 0
 iinvu02 422094 401604   0 13:52:39      -  0:00 db2ipccm 0
 iinvu02 426192      1   5 13:52:52      -  2:55 /prj/db2/admin/iinvu02/sqllib/bin/db2fmd -i iinvu02 -m /prj/db2/admin/iinvu02/sqllib/lib/libdb2gcf.a
    root 475370      1   0 13:53:50      -  0:18 /usr/opt/db2_08_01/bin/db2fmcd
   db2as 528466      1   0 13:55:52      -  0:00 /prj/db2/admin/db2as/das/bin/db2fmd -i db2as -m /prj/db2/admin/db2as/das/lib/libdb2dasgcf.a
 iinvu01 561230 323742   0 13:57:29      -  0:03 db2agent (idle) 0
 iinvu02 573686 417996   0 15:11:41      -  0:02 db2agent (idle) 0

Example 2:

    root 49504     1   0 13:13:07    -  0:00 db2wdog 
db2inst1 25844 35124   0 16:04:50    -  0:00 db2pfchr 
db2inst1 35124 65638   0 16:04:17    -  0:00 db2gds 
db2inst1 35540 35124   0 16:04:50    -  0:00 db2loggr (SAMPLE) 
db2inst1 41940 65638   0 16:04:19    -  0:00 db2resync 
db2inst1 45058 35124   0 16:04:50    -  0:00 db2pfchr 
db2inst1 49300 35124   0 16:04:19    -  0:00 db2srvlst 
db2inst1 49626 35124   0 16:04:50    -  0:00 db2dlock (SAMPLE) 
db2inst1 55852 65638   0 16:04:17    -  0:00 db2ipccm 
db2inst1 58168 35124   0 16:04:50    -  0:00 db2loggw (SAMPLE) 
db2inst1 59048 35124   0 16:04:50    -  0:00 db2pfchr 
db2inst1 64010 55852   0 16:04:50    -  0:00 db2agent (SAMPLE) 
db2inst1 65638 22238   0 16:04:17    -  0:00 db2sysc 
db2inst1 70018 35124   0 16:04:50    -  0:00 db2pclnr 
db2inst1 72120 35124   0 16:04:51    -  0:00 db2event (DB2DETAILDEADLOCK) 
db2inst1 74198 65638   0 16:04:17    -  0:00 db2syslog 
db2inst1 74578     1   0 16:04:47    -  0:00 /home/db2inst1/sqllib/bin/db2bp 50112C14631 5 
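
On a machine with several instances, as in Example 1, a per-owner count gives a quick overview. This is a 
minimal sketch that reads "ps -ef" output on stdin; the function name count_db2_procs is hypothetical, and it 
assumes the owning user is in the first column, as in the listings above.

```shell
# Hypothetical helper: read "ps -ef" output on stdin and print the number
# of DB2 processes per owning user, sorted by user name.
# Lines that belong to a "grep" command are skipped.
count_db2_procs() {
    awk '/db2/ && !/grep/ { count[$1]++ }
         END { for (user in count) print user, count[user] }' | sort
}

# Example:
# ps -ef | count_db2_procs
```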


- db2dasrrm: The DB2 Admin Server process. This process supports both local and remote administration requests 
  using the DB2 Control Center 

- db2pclnr: I/O cleaners, associated with data cache buffers

- db2rebal: This process is used to perform a rebalancing of the data when a container is added to a DMS table space.

- db2disp: The DB2 agent dispatcher process. This process dispatches application connections between the 
  logical agent assigned to the application and the available coordinating agents when connection concentration 
  is enabled.

  This process will only exist when connection concentration is enabled.

- db2ckpwd: Verifies usernames and passwords against the operating system. 
  db2ckpwd takes a file descriptor as a command line argument and reads the username and password information 
  from that file descriptor.

- db2gds: The DB2 Global Daemon Spawner process that starts all DB2 EDUs (processes) on UNIX. 
  There is one db2gds per instance or database partition.

- db2ipccm: listener process for local applications.

- db2tcpcm: A remote client establishes TCP/IP communications through the db2tcpcm listener process. 

- db2sysc: The main DB2 system controller or engine. Without this process, the database server cannot function.

- db2resync: The resync manager process used to support applications that are using two-phase commit  

- db2wdog: The DB2 watchdog. This process is required since processes in UNIX can only track their 
  parent process ID. Each time a new process is started, db2gds notifies the DB2 watchdog. 
  If any DB2 process receives a ctrl-c or other abnormal signal, the process sends the signal 
  to the watchdog, which propagates it to all of the other processes in the instance. 

- db2ca: Starts the Configuration Assistant. The Configuration Assistant is a graphical interface 
  that is used to manage DB2 database configuration such as database manager configuration, DB2 registry, 
  node directory, database directory and DCS directory

- Agents

  An agent can be thought of as a 'worker' that performs all database operations on behalf of an application. 
  There are two main types of DB2 agents:

  Coordinator Agent (db2agent)
  A coordinator agent (or a coordinating agent) coordinates the work on behalf of an application and communicates 
  to other agents using interprocess communication (IPC) or remote communication protocols. 
  All connection requests from client applications, whether they are local or remote, are allocated a corresponding 
  coordinator agent. 

  Subagent (db2agntp)
  When the intra_parallel database manager configuration parameter is enabled, the coordinator agent distributes 
  the database requests to subagents (db2agntp). These agents perform the requests for the application. 
  Once the coordinator agent is created, it handles all database requests on behalf of its application 
  by coordinating subagents (db2agntp) that perform requests on the database. 
  When an agent or subagent completes its work it becomes idle. When a subagent becomes idle, its name changes 
  from db2agntp to db2agnta.

  For example:

  db2agntp processes are active subagents which are currently performing work for the coordinator agent. 
  These processes will only exist when intra-partition parallelism is enabled.

  db2agnta processes are idle subagents that were used in the past by a coordinator agent.

- db2hmon: The db2hmon process has changed in DB2 Universal Database Version 8.2 and is no longer associated 
  with the HEALTH_MON database manager configuration parameter.  
  
  In DB2 Universal Database (DB2 UDB) Version 8.1, the db2hmon process was controlled by the HEALTH_MON 
  database manager configuration parameter. When HEALTH_MON was set to ON, a single-threaded independent 
  coordinator process named db2hmon would start. This process would terminate if HEALTH_MON was set to OFF. 
  In DB2 UDB Version 8.2, the db2hmon process is no longer controlled by the HEALTH_MON database manager 
  configuration parameter. Rather, it is a stand-alone process that is part of the database server 
  so when DB2 is started, the db2hmon process starts. db2hmon is a special multi-threaded DB2FMP process 
  that is named db2hmon on UNIX/Linux platforms and DB2FMP on Windows. 
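
To see which of the processes described above are actually running on a host, the "ps -ef" listing 
can be summarized per owner and process name. The following is only a minimal sketch, not an IBM tool: 
the function name db2_proc_summary is made up for illustration, and it assumes the "ps -ef" column layout 
of the examples above (UID PID PPID C STIME TTY TIME CMD) with a one-word STIME, so processes started 
on an earlier day may be missed or misread.

```shell
# Summarize DB2 processes per owner from "ps -ef" style input on stdin.
# Assumption: the command name is field 8 (one-word STIME column).
db2_proc_summary() {
    awk '$8 ~ /db2/ {
             n = split($8, parts, "/")      # strip any leading path
             count[$1 " " parts[n]]++       # key = "owner processname"
         }
         END {
             for (key in count) print key, count[key]
         }' | sort
}

# Typical use on a live system:
#   ps -ef | db2_proc_summary
```

Each output line has the form "owner processname count", which makes it easy to spot an instance 
that has a db2wdog but no db2sysc, for example.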


Note 6: db2icrt
--------------- 

On UNIX-based systems, the db2icrt utility is located in the DB2DIR/instance directory, 
where DB2DIR represents /usr/opt/db2_08_01 on AIX, and /opt/IBM/db2/V8.1 on all other UNIX-based systems. 
If you have a FixPak or modification level installed in an alternate path, 
the DB2DIR directory is /usr/opt/db2_08_FPn on AIX and /opt/IBM/db2/V8.FPn on all other 
UNIX-based systems, where n represents the number of the FixPak or modification level.
The db2icrt utility creates an instance on the directory from which you invoke it. 

>>-db2icrt--+-----+--+-----+--+---------------+----------------->
            +- -h-+  '- -d-'  '- -a--AuthType-'
            '- -?-'

>--+---------------+--+---------------+--+----------------+----->
   '- -p--PortName-'  '- -s--InstType-'  '- -w--WordWidth-'

>--+---------------+--InstName---------------------------------><
   '- -u--FencedID-'

On an AIX machine, to create an instance called "db2inst1" on the directory /u/db2inst1/sqllib/bin, issue 
the following command from that directory: 

Example 1 

On a client machine: 
/usr/opt/db2_08_01/instance/db2icrt db2inst1 
On a server machine: 
/usr/opt/db2_08_01/instance/db2icrt -u db2fenc1 db2inst1 
where db2fenc1 is the user ID under which fenced user-defined functions and fenced stored procedures will run. 

Example 2 
On an AIX machine, if you have Alternate FixPak 1 installed, run the following command to create an instance 
running FixPak 1 code from the Alternate FixPak install path: 
/usr/opt/db2_08_FP1/instance/db2icrt -u db2fenc1 db2inst1 


Note 7: What's an instance? Unix compared to z/OS
-------------------------------------------------

1. What's an Instance? And where is my Subsystem?

Posted 11/8/2005 by Chris Eaton 
If you are new to DB2 LUW from a DB2 for z/OS background then the first thing you probably noticed 
is that there is no subsystem on the LUW platform. Instead you will hear the term Instance used in a similar manner. 
An instance in DB2 for LUW is like a copy of the RDBMS including all the processes that run DB2 and memory 
(address spaces) associated with that instance of DB2 and some configuration parameters (ZPARMS) to control 
that instance. Think of it as a copy of the DB2 code running on a server. You can have as many instances 
as you like running on a single server. Associated with an instance is the concept of an instance owner. 
This is the user that "owns" that instance and has SYSADM authority over the instance and all databases 
inside that instance. SYSADM authority is the highest level of authority in DB2 and lets this user do anything 
within the databases it manages (create, drop, access all data, grant, revoke, etc). 

You can have one or more databases in each instance but a database is not exactly the same as you have 
on z/OS either. On z/OS you have one catalog per subsystem and a database is merely a logical collection of tables, 
indexes that usually have a distinct relationship to a given application. On the LUW platform each database has 
its own catalogs associated with it which stores all the metadata about that database. 

Why the difference? Well, as with many of the differences you will find at the server or storage layer, 
they are mostly due to the "culture" or "industry standard terms" that are typically used in a Linux, 
UNIX or for that matter a Windows environment. An Instance is a common term across a number of distributed 
platform RDBMSs to represent a copy of the database management code running on a server. And you won't likely 
find the term subsystem used to describe anything on a distributed platform (except for maybe some people 
talking about storage but if you dig a bit you will likely find that in a past life these people 
worked on a mainframe).

The other important distinction in this area is that your application connects to a database in the LUW 
environment (not a subsystem or instance). As well if you want to join tables across different databases 
you would use the federated query support built into DB2.


On MVS, OS/390, z/OS:                       On UNIX/Windows:

----------------------------                ----------     -----------------------
| Subsystem                |                |INSTANCE|     |INSTANCE             |
----------------------------                ----------     -----------------------
 |             |       |                        |              |               |
---------     ----    ----                    ------------    ------------    ------------
|CATALOG|     |DB|    |DB|                    |DB+catalog|    |DB+catalog|    |DB+catalog|
---------     ----    ----                    ------------    ------------    ------------


   So, a z/OS "Subsystem" <=corresponds to=> a Unix "Instance".


2. Additional info on Unix:

After installing DB2, you can only communicate with DB2 by instantiating it. 
In other words, you create an object (read: Database Manager) within DB2
that handles the communication for you.

  So an "Instance" <=corresponds to=> "Database Manager"

Suppose you have created an instance of a Database Manager. This Database Manager handles the communication 
with both local and remote databases. You must instruct the Database Manager how certain databases 
can be reached. You also specify the 'simple' name under which this set of 
instructions can be used. This is the so-called alias. 

The figure below shows schematically how the communication between the platforms 
is set up.


       AIX                                                 z/OS
                                                                   
------------------------------              -------------------------------------     
|      -------------          |             |                                   |
|      |Application|          |             | A partition                       |
|      ------|-------         |             |                ----------------   |
|            |                |             |                | DBMS 1 port A |  |      
| ------------------------    |             |                |        ----   |  |
| | Instance =           |    |        |------------------------>     |DB|   |  |
| | Database Manager     |    |        |    |                |        ----   |  |
| |                      |    |        |    |                | ----          |  |
| |         ----------   |    |        |---------------------->|DB|          |  |
| |         |Alias 1 |   |    |        |    |                | ----          |  |
| |         |        |------------------    |                ----------------   |
| |         ----------   |    |             |             ----------------      |
| |                      |    |             |             |DBMS 2 port B |      |
| |         ----------   |    |             |             |              |      |
| |         |Alias 2 |   |    |      alias  |             |       ----   |      |
| |         |        |--------------------------------------->    |DB|   |      |
| |         ----------   |    |             |             |       ----   |      |
| |                      |    |             |             |              |      |
| ------------------------    |             |             -----------------     |
-------------------------------             --------------------------------------


As you can see, communication between an application, e.g. WebSphere Application Server (WAS),
and the mainframe platform runs through DB2. For simplicity we assume for now that the application inside 
WAS sets up the communication directly. Within DB2, an instance of a Database Manager takes care of 
the predefined connections. The Database Manager makes these connections available 
by means of an alias.

The connections between the two platforms, drawn 'simply' as arrows, need a closer 
look. The arrow from an alias to a database on the mainframe is called a node. 
This node must be provided in advance with the information needed to gain access to the 
database concerned on the mainframe. The package of information linked to a node is called 
a catalog. The node is built up as follows:

 ----------------------------------------------------------
 |Alias (links the database on the mainframe to the node) |
 ----------------------------------------------------------
                         |
                         |
 -----------------------------------------------------------------------------------------------
 |Node (knows the IP address of the mainframe and the port number of the DBMS on the partition)|
 -----------------------------------------------------------------------------------------------


One alias has one connection to one DBMS on one partition on the mainframe. 
Several databases 'live' within the DBMS. If a connection must be made 
between AIX and another database in another DBMS on the same partition, a new alias 
(and thus a new node) must be created.


When configuring a remote database, we are therefore talking about the connection between DB2 
and a database on a partition on the mainframe. We have to go through the following steps to create 
a working connection:

- Create the node;
- Link the node to the IP address of the mainframe;
- Link the node to the port number of a partition on the mainframe;
- Link the database on the mainframe to an alias on AIX

We will show the implementation by means of an example:

Catalog= ( node={IP + port} ( {Alias=DB} ) )

------------------------------------------------
	
db2=> Catalog tcpip node <nodename> remote <IP address of mainframe> server <port number>

This command effectively creates the node, linked to the IP address 
of the mainframe and the port number of the partition, where:

Nodename: The name of the node. You can choose this yourself (e.g. NOO49 : NOde Ontwikkeling 49).
IP address mainframe:	T partition: 10.73.64.183
Port number mainframe:	T partition: 447 or 448 (depending on the DBMS): BACDB2O = 447 (Development environment)  
                                                                       BACDB2I = 448 (Integration environment)

-- To verify:


db2=> list node directory

Adds a Transmission Control Protocol/Internet Protocol (TCP/IP) node entry to the node directory. 
The TCP/IP communications protocol is used to access the remote node. 
The CATALOG TCPIP NODE command is run on a client. 

-----------------------------------------------

Next we link the database on the mainframe to an alias by way of a node. Run:

db2=> Catalog database databasename as alias at node nodename authentication dcs

Databasename:	An existing database within the DBMS on the mainframe
Alias:       	A name of your own choosing
Nodename:	The name of the node created above

-- To verify:

db2=> list db directory

Now we run

db2=> Catalog dcs database databasename as DBMS

Databasename	The existing database, as above, within the DBMS on the mainframe
DBMS	        This is the DataBase Management System on the mainframe.
                E.g. T partition: BACDB2O (Development environment)                
                                  BACDB2I (Integration environment)

-- To verify:

db2=> list dcs directory

Next we log in to the mainframe to test the connection:

db2=> connect to aliasname user user using password

Aliasname	The alias created above for a connection to the mainframe
User	        Your userid, or a userid with sufficient privileges on the mainframe (e.g. a BDN account)
Password	Password of the userid used

  So establishing a session goes as in the example below:

  connect to pscrx user u@mnx01 using PSWVDB2C;
  set current sqlid = 'F@MNX01'



So how does it all fit together:
================================

You want to connect via DB2 Connect to a remote DB on a mainframe. On the client you run:

db2=> Catalog tcpip node <nodename> remote <IP address of mainframe> server <port number>
      db2=> list node directory         (=verification statement)
db2=> Catalog database databasename as alias at node nodename authentication dcs
      db2=> list db directory           (=verification statement)
db2=> Catalog dcs database databasename as DBMS
      db2=> list dcs directory          (=verification statement)

You pick any convenient node name (your own choice) and link it
to the remote IP address and port.
Then you link the real database name to a convenient alias (again your own choice), and link that
to the node name as well.

From then on you can set up a connection using the alias!
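
The three catalog steps can be generated from a handful of values instead of being typed by hand. 
The sketch below only prints the commands; it does not run them. The function name gen_catalog_cmds 
and the names MYDB and MYALIAS are made up for illustration; the node name, IP address, port and DBMS name 
are the sample values from this note.

```shell
# Print the three CATALOG commands for one remote (mainframe) database.
# Nothing is executed here; review the output before giving it to the db2 CLP.
gen_catalog_cmds() {
    node=$1 host=$2 port=$3 dbname=$4 dbalias=$5 target=$6
    echo "catalog tcpip node $node remote $host server $port"
    echo "catalog database $dbname as $dbalias at node $node authentication dcs"
    echo "catalog dcs database $dbname as $target"
}

# MYDB and MYALIAS are placeholders, not real database names.
gen_catalog_cmds NOO49 10.73.64.183 447 MYDB MYALIAS BACDB2O
```

The output can be saved to a file and reviewed first, which is safer than cataloging interactively 
when several aliases have to be created in one go.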



Note: Connection via DB2 Connect:
---------------------------------

First of all, install the DB2 client (for me it was DB2 Connect 7.1) and register it 
with the proper license (using db2licm).

Now you are ready to register your remote database.
I'll need to provide:
hostname,
port,
database name,
authentication method.

For every DB, I need three registrations: tcp/ip node, database and DCS.

Let's start from the tcp/ip node.

Log in as your DB2 instance user (by default db2inst1) and start the command line processor:

db2inst1@brepredbls01:~> db2
(c) Copyright IBM Corporation 1993,2001
Command Line Processor for DB2 SDK 7.2.0

db2 =>


-- Now from the db2 client command prompt:

catalog tcpip node <nodename> remote <hostname> server <port>

where nodename is an alias you choose, hostname is the remote DB2 host name and port is the DB2 listening port.

example:

catalog tcpip node RIHEP remote rihep.rit server 5023

to unregister it:

uncatalog node RIHEP 

and to list the registered nodes:

db2 => list node directory

 Node Directory

 Number of entries in the directory = 3

Node 1 entry:

 Node name                      = AMDSPT
 Comment                        =
 Protocol                       = TCPIP
 Hostname                       = amdahlsvil.ras
 Service name                   = 5023

Node 2 entry:

 Node name                      = AMSVIL
 Comment                        =
 Protocol                       = TCPIP
 Hostname                       = amdahlsvil.ras
 Service name                   = 6021

Node 3 entry:

 Node name                      = RIHEP
 Comment                        =
 Protocol                       = TCPIP
 Hostname                       = rihep.rit
 Service name                   = 5023


-- Now you need to catalog your remote DB2 database:

catalog database <DBname> as <DBalias> at node <nodename> authentication DCS

Where DBname is the name of the remote database, DBalias is the name you are going to use in your connection 
and nodename is the node alias you registered above.
In my environment the authentication type is DCS.

Example:

catalog database ITFINDB2 as ITFINDB2 at node RIHEP authentication DCS

If you wish to unregister the DB:

uncatalog database ITFINDB2 

for the list:

db2 => list db directory

 System Database Directory

 Number of entries in the directory = 3

Database 1 entry:

 Database alias                  = ITFINDB2
 Database name                   = ITFINDB2
 Node name                       = RIHEP
 Database release level          = 9.00
 Comment                         =
 Directory entry type            = Remote
 Authentication                  = DCS
 Catalog node number             = -1

Database 2 entry:

 Database alias                  = DB2PROD
 Database name                   = DB2PROD
 Node name                       = AMSVIL
 Database release level          = 9.00
 Comment                         =
 Directory entry type            = Remote
 Authentication                  = DCS
 Catalog node number             = -1

Database 3 entry:

 Database alias                  = DB2DSPT
 Database name                   = DB2DSPT
 Node name                       = AMDSPT
 Database release level          = 9.00
 Comment                         =
 Directory entry type            = Remote
 Authentication                  = DCS
 Catalog node number             = -1

-- Last registration step: the DCS.

catalog dcs database <DBname> as <DBalias>

example:

catalog dcs database ITFINDB2 as ITFINDB2

to unregister:

uncatalog dcs database ITFINDB2

For the list:

db2 => list dcs directory

 Database Connection Services (DCS) Directory

 Number of entries in the directory = 3

DCS 1 entry:

 Local database name                = DB2DSPT
 Target database name               = DB2DSPT
 Application requestor name         =
 DCS parameters                     =
 Comment                            =
 DCS directory release level        = 0x0100

DCS 2 entry:

 Local database name                = DB2PROD
 Target database name               = DB2PROD
 Application requestor name         =
 DCS parameters                     =
 Comment                            =
 DCS directory release level        = 0x0100

DCS 3 entry:

 Local database name                = ITFINDB2
 Target database name               = ITFINDB2
 Application requestor name         =
 DCS parameters                     =
 Comment                            =
 DCS directory release level        = 0x0100


Now you can check if your configuration is correct:

db2 => connect to ITFINDB2 user sisbanc
Enter current password for sisbanc:

   Database Connection Information

 Database server        = DB2 OS/390 7.1.1
 SQL authorization ID   = SISBANC
 Local database alias   = ITFINDB2

This indicates a successful connection.
An error or a command prompt without output indicates a failure.

Example of a failed attempt:

db2 => connect to ITFINDB2 user sisbanc
Enter current password for sisbanc:

db2 => db2 => 




Note 9: license for DB2 Connect 8.x
----------------------------------- 

To license DB2 Connect 8.x, you typically use a statement like

/usr/opt/db2_08_01/adm/db2licm -a /prj/db2/install/udb/8.1/db2ese.lic


Note 10: DB2 Connect Configuration files:
-----------------------------------------

- db2nodes.cfg 

This topic provides information about the format of the node configuration file (db2nodes.cfg). 
The db2nodes.cfg file is used to define the database partition servers that participate in a DB2 instance. 
The db2nodes.cfg file is also used to specify the IP address or host name of a high-speed interconnect, 
if you want to use a high-speed interconnect for database partition server communication. 

The format of the db2nodes.cfg file is as follows: 

nodenum    hostname    logical port   netname    resourcesetname 
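
As an illustration of that column layout, the sketch below lists the partitions defined in a 
db2nodes.cfg-style file. It is an assumption-laden example: the function name list_partitions and the 
host names serverA and serverB are made up, and only the first three columns (nodenum, hostname, logical port) 
are read; netname and resourcesetname are ignored.

```shell
# List database partitions from db2nodes.cfg-style input on stdin.
# Columns used: nodenum hostname logical-port (further columns are ignored).
list_partitions() {
    awk 'NF >= 3 {
             printf "partition %s on %s (logical port %s)\n", $1, $2, $3
         }'
}

# Hypothetical three-partition layout: two partitions on serverA, one on serverB.
printf '%s\n' '0 serverA 0' '1 serverA 1' '2 serverB 0' | list_partitions

# On a real system the input would be the instance's db2nodes.cfg file:
#   list_partitions < db2nodes.cfg
```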


Note 11: Most important Err messages:
-------------------------------------

1.

db2=> connect to <> user <> using <>

SQL1032N No start database manager command was issued.  SQLSTATE=57019

I keep getting the following error:
[IBM][CLI Driver] SQL1032N No start database manager command was issued. SQLSTATE=57019

I tried starting the database, and I still get the above error.
Exactly how am I supposed to start the database, and how do I get rid of the above error?
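
SQL1032N means that the database manager (the instance) is not running, so it is the instance 
that has to be started, with db2start as the instance owner, not the database. A quick way to check is to look 
for the db2sysc engine process of that instance. The helper below is a minimal sketch: the function name 
instance_started is made up, db2inst1 is just an example owner, and it assumes "ps -ef" output 
with the command name in field 8 (one-word STIME column).

```shell
# Succeed (exit 0) if a db2sysc process owned by the given user appears
# in "ps -ef" style input on stdin, fail (exit 1) otherwise.
instance_started() {
    awk -v owner="$1" '$1 == owner && $8 == "db2sysc" { found = 1 }
                       END { exit !found }'
}

# Typical use, logged in as the instance owner (db2inst1 is an example):
#   ps -ef | instance_started db2inst1 || db2start
```

Matching on the command field rather than on the whole line avoids counting the check itself as a hit.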




56. Setting up an ASCII terminal on AIX:
========================================

The 3151 display can connect directly, or through a modem, to an AIX system.
The connection to the AIX system can be made to one of the native serial ports,
or to an asynchronous adapter. 
To add a TTY, use the following procedure:

- use "smitty tty" and select "Add a TTY" 
  or use "smitty maktty"

- or use mkdev

# mkdev -c tty -t tty -s rs232 -p sa0 -w s1 -a login=enable -a term=ibm3151
# mkdev -c tty -t tty -s rs232 -p sa0 -w s0 -a login=enable -a term=ibm3151

To validate that the tty has been added to the customized VPD object class, enter
# lscfg -vp | grep tty
tty0      01-S1-00-00       Asynchronous Terminal

To display the name of the system console that will be in effect at the next startup, enter
# lscons -b
/dev/tty0


You can remove a terminal with
# rmdev -l tty_name -d

On the ASCII terminal, set the communications options as follows:
Line speed (baud rate) = 9600
Word Length (bits per character) = 8
Parity = no (none)
Number of Stop Bits = 1
Interface = RS-232C or RS-422A
Line Control = IPRTS


57: chroot:
===========

chroot

Run a command with a different root directory
'chroot' runs a command with a specified root directory. On many systems, only the super-user can do this. 


SYNTAX
     chroot NEWROOT [COMMAND [ARGS]...]

     chroot OPTION

Ordinarily, filenames are looked up starting at the root of the directory structure, i.e. '/'. 
'chroot' changes the root to the directory NEWROOT (which must exist) and then runs COMMAND with optional ARGS. 

If COMMAND is not specified, the default is the value of the `SHELL' environment variable or `/bin/sh' if not set, 
invoked with the `-i' option. 

The only options are `--help' and `--version' 

AIX:
----

chroot Command

Purpose
Changes the root directory of a command.

Syntax
chroot Directory Command

Description

Attention: If special files in the new root directory have different major and minor device numbers than the 
real root directory, it is possible to overwrite the file system.
The chroot command can be used only by a user operating with root user authority. 
If you have root user authority, the chroot command changes the root directory to the directory 
specified by the Directory parameter when performing the Command. The first / (slash) in any path name 
changes to Directory for the specified Command and any of its children.

The Directory path name is always relative to the current root. Even if the chroot command is in effect, 
the Directory path name is relative to the current root of the running process.

A majority of programs may not operate properly after the chroot command runs. For example, the commands 
that use the shared libraries are unsuccessful if the shared libraries are not in the new root file system. 
The most commonly used shared library is the /usr/ccs/lib/libc.a library.

The ls -l command is unsuccessful in giving user and group names if the current root location makes 
the /etc/passwd file beyond reach. In addition, utilities that depend on localized files (/usr/lib/nls/*) 
may also be unsuccessful if these files are not in the new root file system. It is your responsibility 
to ensure that all vital data files are present in the new root file system and that the path names 
accessing such files are changed as necessary.

Examples

Attention: The commands in the following examples may depend on shared libraries. Ensure that the shared 
libraries are in the new root file system before you run the chroot command.

To run the pwd command with the /usr/bin directory as the root file system, enter: 

# mkdir /usr/bin/lib
# cp /usr/ccs/lib/libc.a /usr/bin/lib
# chroot /usr/bin pwd

To run a Korn shell subshell with another file system as the root file system, enter: 

# chroot /var/tmp /usr/bin/ksh

This makes the directory name / (slash) refer to /var/tmp for the duration of the /usr/bin/ksh command. 
It also makes the original root file system inaccessible. The /var/tmp directory must contain 
the standard directories of a root file system. In particular, the shell looks for commands in the 
/bin and /usr/bin directories under /var/tmp.

Running the /usr/bin/ksh command creates a subshell that runs as a separate process from your original shell.
Press the END OF FILE (Ctrl-d) key sequence to end the subshell and go back to where you were 
in the original shell. This restores the environment of the original shell, including the meanings 
of the . (current directory) and the / (root directory).


58. The date command:
=====================

The date command can be very useful in shell scripts, for example for testing purposes.
You can devise a test like

daynumber=`date -u +%d`
export daynumber

if [ "$daynumber" = "31" ]; then
  :   # commands for the 31st go here
fi

The following shows what can be done using date.

NAME
       date - print or set the system date and time

SYNOPSIS
       date [OPTION]... [+FORMAT]
       date [-u|--utc|--universal] [MMDDhhmm[[CC]YY][.ss]]

DESCRIPTION
       Display the current time in the given FORMAT, or set the system date.

       -d, --date=STRING
	      display time described by STRING, not `now'

       -f, --file=DATEFILE
	      like --date once for each line of DATEFILE

       -ITIMESPEC, --iso-8601[=TIMESPEC]
	      output  date/time	 in ISO 8601 format.  TIMESPEC=`date' for date
	      only, `hours', `minutes', or `seconds' for date and time to  the
	      indicated	 precision.   --iso-8601  without TIMESPEC defaults to
	      `date'.

       -r, --reference=FILE
	      display the last modification time of FILE

       -R, --rfc-822
	      output RFC-822 compliant date string

       -s, --set=STRING
	      set time described by STRING

       -u, --utc, --universal
	      print or set Coordinated Universal Time

       --help display this help and exit

       --version
	      output version information and exit

       FORMAT controls the output.  The only valid option for the second  form
       specifies Coordinated Universal Time.  Interpreted sequences are:

       %%     a literal %

       %a     locale's abbreviated weekday name (Sun..Sat)

       %A     locale's full weekday name, variable length (Sunday..Saturday)

       %b     locale's abbreviated month name (Jan..Dec)

       %B     locale's full month name, variable length (January..December)

       %c     locale's date and time (Sat Nov 04 12:02:33 EST 1989)

       %C     century  (year  divided  by  100	and  truncated	to an integer)
	      [00-99]

       %d     day of month (01..31)

       %D     date (mm/dd/yy)

       %e     day of month, blank padded ( 1..31)

       %F     same as %Y-%m-%d

       %g     the 2-digit year corresponding to the %V week number

       %G     the 4-digit year corresponding to the %V week number

       %h     same as %b

       %H     hour (00..23)

       %I     hour (01..12)

       %j     day of year (001..366)

       %k     hour ( 0..23)

       %l     hour ( 1..12)

       %m     month (01..12)

       %M     minute (00..59)

       %n     a newline

       %N     nanoseconds (000000000..999999999)

       %p     locale's upper case AM or PM indicator (blank in many locales)

       %P     locale's lower case am or pm indicator (blank in many locales)

       %r     time, 12-hour (hh:mm:ss [AP]M)

       %R     time, 24-hour (hh:mm)

       %s     seconds since `00:00:00 1970-01-01 UTC' (a GNU extension)

       %S     second (00..60); the 60 is necessary to accommodate a leap second

       %t     a horizontal tab

       %T     time, 24-hour (hh:mm:ss)

       %u     day of week (1..7);  1 represents Monday

       %U     week number of year with Sunday as first day of week (00..53)

       %V     week number of year with Monday as first day of week (01..53)

       %w     day of week (0..6);  0 represents Sunday

       %W     week number of year with Monday as first day of week (00..53)

       %x     locale's date representation (mm/dd/yy)

       %X     locale's time representation (%H:%M:%S)

       %y     last two digits of year (00..99)

       %Y     year (1970...)

       %z     RFC-822 style numeric timezone (-0500) (a nonstandard extension)

       %Z     time zone (e.g., EDT), or nothing if no time zone is determinable

       By  default, date pads numeric fields with zeroes.  GNU date recognizes
       the following modifiers between `%' and a numeric directive.

	      `-' (hyphen) do not pad the field
	      `_' (underscore) pad the field with spaces

ENVIRONMENT
       TZ     Specifies the timezone, unless overridden by command line parameters.
	      If neither is specified, the setting from /etc/localtime is used.





DATE=$(date +%d"-"%B"-"%Y) 
ERRORDATE=$(date +%m%d0000%y) 
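
The two assignments above can be wrapped in small functions. This is only a sketch: the function
names report_date and error_date are made up here, and the mmddHHMMyy layout (with the time fixed
at 0000) is assumed to be meant for commands that take a start time in that form.

```sh
#!/bin/sh
# report_date: day-month-year with the full month name (locale dependent)
report_date() {
    date +%d-%B-%Y
}

# error_date: today at 00:00 as mmddHHMMyy (month, day, "0000", 2-digit year)
error_date() {
    date +%m%d0000%y
}

echo "report date: $(report_date)"
echo "error date : $(error_date)"
```

Only %d, %B, %Y, %m and %y are used, all of which are in the list above.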

 


==================================
59. SOME NOTES ON LPARS ON POWER5:
==================================

This section is about pSeries and AIX only.

59.1 General architecture:
--------------------------

Before the POWER5 architecture, you could only use lpars with dedicated cpu's, and disks dedicated to an lpar.
As from POWER5 you can use "Micro Partitioning" (assign cpu power to lpars in fractions of a processor,
starting at 10% of one cpu), you can use "Dynamic LPAR" (reassign resources to and from lpars without
a reboot of lpars), and every resource (SCSI, network cards etc.) can be virtualized.
Note that DLPAR itself was already available before POWER5.

- "Virtual IO Server" (VIOS) must be installed on a partition (the management partition)
  to enable virtualization services.
  The other partitions can be AIX52, AIX53, Linux (Redhat, Suse) and i5/OS (AIX52 cannot use virtualized services,
  so you have to assign dedicated cpu's and disks to that partition).

  Also, if you do not have VIOS, you can only use the traditional lpars.
  VIOS provides the IO and ethernet resources to the other lpars.
  You cannot use VIOS as a general-purpose operating system for applications. It is only used to provide
  virtual resources to other partitions. You must use the HMC or IVM to assign resources to lpars.

- You can use HMC to define partitions and administer partitions (e.g. start, shutdown an lpar)
  The HMC is a desktop connected with ethernet to the pSeries machine.

- You can use the Integrated Virtualization Manager (IVM) on systems where an HMC is not used.
  You can use the local IVM to create and administer lpars. This is a web-based interface.
  If you want or need to use IVM, you need to install the VIOS on a nonpartitioned Server first.
  Then you can use a PC with a LAN connection to the Server, and use the browser interface.

- The Partition Load Manager (PLM) makes it possible to reassign resources from lpars
  that need fewer resources at a given time to lpars that need more at that time.
  Policies can be defined on how to manage that.


HMC makes use of "partition profiles", in which you can define, for example, the desired,
minimum and maximum resource values for an lpar. The IVM does not make use of profiles.
You can create a "system profile" that lists which partition profiles are to be used when the
Server is restarted.
Take notice of the fact that the HMC has the lpar configuration information in the form of saved profiles.
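
A profile only makes sense when the minimum, desired and maximum values are in that order.
A minimal sketch of such a sanity check follows; check_profile is a made-up helper name, not an
HMC command, and the numbers are just sample values (for example MB of memory).

```sh
#!/bin/sh
# check_profile MIN DESIRED MAX
# Succeeds when MIN <= DESIRED <= MAX (whole numbers only).
check_profile() {
    min=$1 desired=$2 max=$3
    if [ "$min" -le "$desired" ] && [ "$desired" -le "$max" ]; then
        echo "ok"
    else
        echo "invalid: need min <= desired <= max"
        return 1
    fi
}

check_profile 1024 2048 4096             # prints "ok"
check_profile 2048 1024 4096 || true     # prints the "invalid" message
```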

IVM does not have a commandline interface. You can telnet or ssh from your PC to the VIOS lpar, and
use the "mkvt" command to create a vt to another lpar.

In order to use the PLM, you need to have a HMC connected to the managed Server, and you must have
an AIX 5.2 ML4 or 5.3 lpar or Server where PLM will be running.

You can create a Virtual Ethernet and VLAN's with VID's which enables lpars to communicate
with each other through this "internal" network.

Server Operating Systems can be placed in LPARS, like AIX 5.2, AIX 5.3, Linux and some others.
For AIX, only 5.3 can be a virtual client of virtualized resources.

Access to real storage devices is implemented through the Virtual SCSI services, a part of the VIOS.
Logical volumes that are created and exported on the Virtual I/O Server partition are shown at the
virtual storage client partition as a SCSI disk. 
The Virtual I/O Server supports logical mirroring and RAID. Logical Volumes created on RAID or JBOD
are bootable.

The VIOS and PLM are delivered on CD.

To enable Power5 Partitioning, you must have obtained a key from IBM. But on the 570 and above,
this feature is enabled by default.

An AIX 5.2 lpar needs dedicated resources. AIX 5.3 can use all virtualization features.



59.2 Create an AIX logical partition and profile:
-------------------------------------------------

- logon to HMC
- Choose "Server and Partition"
- Choose "Server management"
- Choose your Server from list
- Rightclick on Partitions -> Click Create -> Click Logical Partition


59.3 Create a virtual ethernet adapter for AIX:
-----------------------------------------------

- logon to HMC
- Choose "Server and Partition"
- Choose "Server management"
- Choose your Server from list
- click on Partitions -> rightclick the partition profile of the
  partition that is going to use the virtual ethernet adapter
  -> Select Dynamic Logical Partitions -> Virtual adapter resources -> Add/Remove
  -> Choose the tab Virtual I/O -> Choose Ethernet -> Create
  -> A Properties dialog will be displayed
  -> Fill in the slot number and the Port Virtual LAN ID (PVID)



59.4 Installation of PLM:
-------------------------

Preparation:
============

1. Set the hostname of every lpar to its fully qualified name, like
lpar1.domain.com
lpar2.domain.com

2. If you do not use DNS, put in the hosts file (/etc/hosts) of every lpar
   the hostname of the PLM Server,
   the hostnames of all other lpars,
   and the hostname of the HMC, like for example

172.16.0.30   lpar1.domain.com        lpar1
172.16.0.33   lpar2.domain.com        lpar2
172.16.0.100  plmserver1.domain.com   plmserver1
172.16.0.3    p5hmc1.domain.com       p5hmc1

3. Check whether Dynamic partitioning is possible for an lpar

# lssrc -a | grep rsct

If the daemon IBM.DRM is started, then an active RMC session is present on this lpar with the HMC.
RMC stands for Resource Monitoring and Control.

In order for DLPAR to work on an lpar, you need to see the following subsystems installed and active:

Subsystem	
ctrmc		Resource monitoring and control subsystem
IBM.CSMAgentRM	is for handshaking between the lpar and hmc		
IBM.ServiceRM		
IBM.DRM		is for executing the dlpar commands on the lpar 	
IBM.HostRM	is for obtaining OS information

On the HMC, you can check which lpars are ready for DLPAR with the following command:

# lspartition -dlpar


4. You need to have rsh and rcp access for all lpars.
If those are not enabled, then do the following:

- edit the .rhosts file on any lpar, and type in the lines

plmserver1 root
plmserver1.domain.com root

- chmod 4554 /usr/sbin/rshd
- chmod 4554 /usr/bin/rcp

- edit /etc/inetd.conf and make sure that this line is not commented out:
shell stream tcp6 nowait root /usr/sbin/rshd rshd

- Make the inetd daemon reread its configuration with
refresh -s inetd

- Test the rsh access from the PLM Server with:
rsh root@lpar1 date
rsh root@lpar2 date

- Create the account "plmuser" on the PLM Server

- You need to have an ssh connection between the HMC and the PLM Server. 
Install Openssh on the PLM Server, and create a ssh user on the HMC.
To install Openssh on AIX, you need to have Openssl as well.
Create the ssh keys to make communication possible from HMC to PLM Server.
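
The subsystem check of step 3 can be scripted. The sketch below assumes the usual "lssrc -a" layout,
where the subsystem name is the first column and the status the last one; the function name
inactive_subsystems and the sample PIDs are made up for illustration.

```sh
#!/bin/sh
# inactive_subsystems: read "lssrc -a" style output on stdin and print every
# subsystem needed for DLPAR that is missing or not in the "active" state.
inactive_subsystems() {
    awk -v required="ctrmc IBM.CSMAgentRM IBM.ServiceRM IBM.DRM IBM.HostRM" '
        # The status is the last column; inoperative rows have no PID column.
        { state[$1] = $NF }
        END {
            n = split(required, want, " ")
            for (i = 1; i <= n; i++)
                if (state[want[i]] != "active")
                    print want[i]
        }'
}

# Illustrative input only; on a real lpar use:  lssrc -a | inactive_subsystems
inactive_subsystems <<'EOF'
Subsystem         Group            PID          Status
 ctrmc            rsct             21044        active
 IBM.CSMAgentRM   rsct_rm          21045        active
 IBM.ServiceRM    rsct_rm          11836        active
 IBM.DRM          rsct_rm          20011        active
 IBM.HostRM       rsct_rm                       inoperative
EOF
```

No output means all five subsystems are active; with the sample input only IBM.HostRM is reported.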

Installation:
=============


1. Place the PLM CD in the drive
2. smitty install_latest
3. The following filesets will be installed:
plm.license
plm.server.rte
plm.sysmgt.websm
plm.msg.en_US.server
plm.msg.en_US.websm


IOSCLI:
=======

The command line interface of the VIOS is called the IOSCLI.
The shell is a restricted shell, for example, you cannot change directories or change your PATH env. variable.

You can either work in the traditional mode or interactive mode.

- In traditional mode, you start a command with "ioscli", like in

# ioscli lsdev -virtual  (to list all virtual devices)

- In interactive mode, you use the aliases for the ioscli subcommands.
  That is, start the ioscli, and then just type the subcommand, like in

# ioscli
# lsdev -virtual

You cannot run external commands from interactive mode, like grep or sed.
First leave the interactive mode with "exit".

To escape from the limitations of ioscli, run "oem_setup_env" and you have access
to regular commands.



59.5 Important IOSCLI commands:
-------------------------------

Only on the VIOS:

lsmap command:
--------------

Displays the mapping between physical, logical, and virtual devices.

Syntax
lsmap { -vadapter ServerVirtualAdapter | -plc PhysicalLocationCode | -all }

lsmap [ -type BackingDeviceType | -net ]

lsmap [ -fmt Delimiter ] [ -field FieldNames ]

Description
The lsmap command displays the mapping between virtual host adapters and the physical devices they are backed to. 
Given a device name (ServerVirtualAdapter) or physical location code (PhysicalLocationCode) of a 
server virtual adapter, the device name of each connected virtual target device (child devices), 
its logical unit number, backing device(s) and the backing devices physical location code is displayed. 
If the -net flag is specified the supplied device must be a virtual server Ethernet adapter.

The -fmt flag divides the output by a user-specified delimiter/character (delimiter). The delimiter can be 
any non-white space character. This format is provided to facilitate scripting.

Examples:

- To list all virtual target devices and backing devices mapped to the server virtual SCSI adapter vhost2, type: 

# lsmap -vadapter vhost2 

The system displays a message similar to the following: 
 SVSA         Physloc                                     Client Partition ID
------------ -------------------------------------------- ------------------
vhost0       U9111.520.10004BA-V1-C2                      0x00000004

VTD                   vtscsi0
LUN                   0x8100000000000000
Backing device        vtd0-1
Physloc

VTD                   vtscsi1
LUN                   0x8200000000000000
Backing device        vtd0-2
Physloc

VTD                   vtscsi2
LUN                   0x8300000000000000
Backing device        hdisk2
Physloc               U787A.001.0397658-P1-T16-L5-L0


- To list the shared Ethernet adapter and backing device mapped to the virtual server Ethernet adapter ent4, type: 

# lsmap -vadapter ent4 -net

The system displays a message similar to the following: 
SVEA   Physloc
------ --------------------------------------------
ent4   P2-I1/E1

SEA                   ent5
Backing device        ent1
Physloc               P2-I4/E1


- To list the shared Ethernet adapter and backing device mapped to the virtual server Ethernet adapter ent5 
in script format separated by a : (colon), type: 

# lsmap -vadapter ent5 -fmt ":"

The system displays a message similar to the following: 
ent5:ent8:ent2


- To list all virtual target devices and backing devices, where the backing devices are of type disk or lv, type: 

# lsmap -all -type disk lv

The system displays a message similar to the following: 
SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost0          U9117.570.10D1B0E-V4-C3                      0x00000000

VTD                   vtscsi0
LUN                   0x8100000000000000
Backing device        hdisk0
Physloc               U7879.001.DQD0KN7-P1-T12-L3-L0

VTD                   vtscsi2
LUN                   0x8200000000000000
Backing device        lv04
Physloc                

SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost1          U9117.570.10D1B0E-V4-C4                      0x00000000

VTD                   vtscsi1
LUN                   0x8100000000000000
Backing device        lv03
Physloc 
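
Since lsmap output is meant to be scriptable, here is a small sketch that post-processes it after
it has been captured (for example from "oem_setup_env" or on another machine). The function names
split_sea_map and map_table are made up, and the field order SVEA:SEA:backing device for the -fmt
output is an assumption based on the ent5:ent8:ent2 example above.

```sh
#!/bin/sh
# split_sea_map: read one colon-delimited line (lsmap ... -fmt ":") and label
# the three fields. Field order is assumed, see the text above.
split_sea_map() {
    IFS=: read -r svea sea backing
    echo "SVEA=$svea SEA=$sea backing=$backing"
}

# map_table: condense "lsmap -all" style output into one line per virtual
# target device:  <vhost> <VTD> <backing device>
map_table() {
    awk '
        $1 ~ /^vhost[0-9]+$/              { vhost = $1 }   # adapter row
        $1 == "VTD"                       { vtd = $2 }
        $1 == "Backing" && $2 == "device" { print vhost, vtd, $3 }
    '
}

echo "ent5:ent8:ent2" | split_sea_map

# Sample taken from the output above
map_table <<'EOF'
vhost0          U9117.570.10D1B0E-V4-C3                      0x00000000

VTD                   vtscsi0
LUN                   0x8100000000000000
Backing device        hdisk0
Physloc               U7879.001.DQD0KN7-P1-T12-L3-L0

VTD                   vtscsi2
LUN                   0x8200000000000000
Backing device        lv04
Physloc
EOF
```

With the sample input, map_table prints "vhost0 vtscsi0 hdisk0" and "vhost0 vtscsi2 lv04".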



mkvdev command:
---------------

Purpose
Adds a virtual device to the system.

Syntax
To create a virtual target device:

mkvdev [ -f ] {-vdev TargetDevice | -dplc TDPhysicalLocationCode } { -vadapter VirtualServerAdapter | 
              -aplc VSAPhysicalLocationCode} [ -dev DeviceName ]

- To create a Shared Ethernet Adapter:

# mkvdev -sea TargetDevice -vadapter VirtualEthernetAdapter... -default DefaultVirtualEthernetAdapter 
       -defaultid SEADefaultPVID [ -attr Attribute=Value [ Attribute=Value... ] ]

- To create a Link Aggregation adapter:

# mkvdev -lnagg TargetAdapter... [ -attr Attribute=Value [ Attribute=Value... ] ]

- To create a VLAN Ethernet adapter:

# mkvdev -vlan TargetAdapter -tagid TagID

Description
The mkvdev command creates a virtual device. The name of the virtual device will be automatically generated 
and assigned unless the -dev DeviceName flag is specified, in which case DeviceName will become 
the device name. If the -lnagg flag is specified, a Link Aggregation or IEEE 802.3 Link Aggregation 
(automatic Link Aggregation) device is created. To create an IEEE 802.3 Link Aggregation set the mode attribute 
to 8023ad. If the -sea flag is specified, a Shared Ethernet Adapter is created. The TargetDevice may be a 
Link Aggregation adapter (note, however, that the VirtualEthernetAdapter may not be Link Aggregation adapters). 
The default virtual Ethernet adapter, DefaultVirtualEthernetAdapter, must also be included as one of the 
virtual Ethernet adapters, VirtualEthernetAdapter. The -vlan flag is used to create a VLAN device and 
the -vdev flag creates a virtual target device which maps the VirtualServerAdapter to the TargetDevice.

If the backing device that is specified by the -vdev or -dplc flags is already in use, an error will be 
returned unless the -f flag is also specified.


Examples:
---------

Example 1:
----------

Suppose you have VIOS running, and you want to create three AIX53 client lpars, LPS1, LPS2 and LPS3.
Suppose from VIOS, you have created a number of virtual scsi controllers:

- Listing virtual scsi controllers.

# lsdev -virtual

You will see a listing of virtual scsi controllers: vhost0, vhost1, and vhost2

- From VIOS, create Volume Groups.

Suppose hdisk2, hdisk3, and hdisk4 are not yet assigned, and thus are free to create VG's.

# mkvg -f -vg rootvg_lpar1  hdisk2
# mkvg -f -vg rootvg_lpar2  hdisk3
# mkvg -f -vg rootvg_lpar3  hdisk4


- Now create LV's.

# mklv -lv rootvg_lps1 rootvg_lpar1 15G
# mklv -lv rootvg_lps2 rootvg_lpar2 15G
# mklv -lv rootvg_lps3 rootvg_lpar3 15G

   Note: this could also have been done with all LV's in a single VG:
   # mklv -lv rootvg_lps1 rootvg_lpar1 15G
   # mklv -lv rootvg_lps2 rootvg_lpar1 15G
   # mklv -lv rootvg_lps3 rootvg_lpar1 15G


The lv's rootvg_lps1, rootvg_lps2, and rootvg_lps3 will become the rootvg's for the AIX53 client partitions.

- Create mappings.

# mkvdev -vdev rootvg_lps1 -vadapter vhost0
# mkvdev -vdev rootvg_lps2 -vadapter vhost1
# mkvdev -vdev rootvg_lps3 -vadapter vhost2


vhostx = LV \
vhosty = LV -> VG {disk(s)}
vhostz = LV /
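
The steps of Example 1 follow a fixed pattern per client lpar, so they can be generated. The sketch
below only prints the commands (a dry run), it does not execute them. The function name
plan_client_disks is made up, and it assumes the naming of the example: lpar i uses hdisk(i+1),
VG rootvg_lpar<i>, LV rootvg_lps<i> and adapter vhost(i-1).

```sh
#!/bin/sh
# plan_client_disks SIZE N
# Print (do not run) the VIOS commands of Example 1 for client lpars 1..N.
plan_client_disks() {
    size=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        echo "mkvg -f -vg rootvg_lpar$i hdisk$((i + 1))"
        echo "mklv -lv rootvg_lps$i rootvg_lpar$i $size"
        echo "mkvdev -vdev rootvg_lps$i -vadapter vhost$((i - 1))"
        i=$((i + 1))
    done
}

plan_client_disks 15G 3
```

Review the printed lines first, then paste them into the padmin shell on the VIOS.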


More examples:
--------------

- From an AIX 5.3 client partition run the lsdev command, like

# lsdev -Cc disk -s vscsi
hdisk2 Available Virtual SCSI Disk Drive

# lscfg -vpl hdisk2
hdisk2 11.520.10DDEDC-V3-C5-T1-L810000000 Virtual SCSI Disk Drive

root@zd110l06:/root#lscfg -vpl hdisk2
  hdisk2           U9117.570.65B61FE-V6-C7-T1-L810000000000  Virtual SCSI Disk Drive

  PLATFORM SPECIFIC

  Name:  disk
    Node:  disk
    Device Type:  block


- To create the mapping of a virtual scsi adapter vhost0, to a logical volume (rootvg_nim) that an AIX partition
will use later as a disk, use

# mkvdev -vdev rootvg_nim -vadapter vhost0 -dev vnim

- To create a virtual target device that maps the logical volume lv20 as a virtual disk for a client partition 
hosted by the vhost0 virtual server adapter, type: 

# mkvdev -vdev lv20 -vadapter vhost0

The system displays a message similar to the following: 
vtscsi0 available

- To create a virtual target device that maps the physical volume hdisk6 as a virtual disk for a client partition 
served by the vhost2 virtual server adapter, type: 

# mkvdev -vdev hdisk6 -vadapter vhost2

The system displays a message similar to the following: 
vtscsi1 available

- To create a Shared Ethernet Adapter that maps the physical Ethernet adapter "ent4" as a virtual Ethernet adapter 
for the client partitions served by the virtual Ethernet adapters ent6, ent7, and ent9, using ent6 as the 
default adapter and 8 as the default ID, type: 

# mkvdev -sea ent4 -vadapter ent6,ent7,ent9 -default ent6 -defaultid 8

The system displays a message similar to the following: 
ent10 available           (which is the sea)

	Remember how to create a SEA on the VIOS:

	- To create a Shared Ethernet Adapter:

	# mkvdev -sea PhysTargetDevice -vadapter VirtualEthernetAdapter... -default DefaultVirtualEthernetAdapter 
       		-defaultid SEADefaultPVID [ -attr Attribute=Value [ Attribute=Value... ] ]


- To create an automatic Link Aggregation with primary adapters ent4 and ent5 and backup adapter ent6, type: 

# mkvdev -lnagg ent4,ent5 -attr backup_adapter=ent6 mode=8023ad

The system displays a message similar to the following: 
ent10 available
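
The mkvdev description says the default virtual Ethernet adapter must be one of the adapters given
with -vadapter. A small sketch that checks this before printing the command line; sea_command is
a made-up helper name and it only echoes the command, it does not run mkvdev.

```sh
#!/bin/sh
# sea_command PHYS VADAPTERS DEFAULT PVID
# Print the "mkvdev -sea" command line, but only if DEFAULT is one of the
# comma-separated VADAPTERS.
sea_command() {
    phys=$1
    vadapters=$2
    default=$3
    pvid=$4
    case ",$vadapters," in
        *",$default,"*)
            echo "mkvdev -sea $phys -vadapter $vadapters -default $default -defaultid $pvid"
            ;;
        *)
            echo "error: default adapter $default is not in $vadapters" >&2
            return 1
            ;;
    esac
}

sea_command ent4 ent6,ent7,ent9 ent6 8
```

The call shown reproduces the SEA example above; with a default adapter such as ent8 it would
return an error instead.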





lsdev command (on VIOS):
------------------------

The lsdev command on a VIO Server has a bit of a different syntax compared to a regular AIX partition.
Commands like "lsdev -Cc tape" do not work on VIO.
Instead, you have a limited number of parameters you can give to the lsdev command.


Usage: lsdev [-type DeviceType ...] [-virtual] [-state DeviceState]
             [-field FieldName ...] [-fmt delimiter]
       lsdev {-dev DeviceName | -plc PhysicalLocationCode} [-child]
             [-field FieldName ...] [-fmt delimiter]
       lsdev {-dev DeviceName | -plc PhysicalLocationCode} [-parent |
             -attr [Attribute] | -range Attribute | -slot | -vpd]
       lsdev -slots
       lsdev -vpd

So normally you will use the following on a VIO Server:

lsdev -dev [options]  
                        like "lsdev -dev" 
                             "lsdev -dev <device>" 
                             "lsdev -dev <device> -vpd"
lsdev -slots 
lsdev -vpd
lsdev -virtual



>> Examples of Usage of lsdev on VIOS:


# tn vioserver1
Trying...
Connected to vioserver1.
Escape character is '^T'.

telnet (vioserver1)

IBM Virtual I/O Server

login: padmin
padmin's Password:
Last unsuccessful login: Mon Sep 24 04:25:04 CDT 2007 on /dev/vty0
Last login: Wed Nov 21 05:10:29 CST 2007 on /dev/pts/0 from starboss.antapex.org


Suppose you have logged on as padmin on a VIO server. Now you try the following commands
to retrieve information of the system:


$ lsdev -dev fcs*

name            status                                            description
fcs0            Available  FC Adapter
fcs1            Available  FC Adapter
fcs2            Available  FC Adapter
fcs3            Available  FC Adapter

$  lsdev -dev fcs0

name            status                                            description
fcs0            Available  FC Adapter


$ lsdev -dev fcs* -vpd|grep Z8
        Device Specific.(Z8)........20000000C95CDDEE
        Device Specific.(Z8)........20000000C95C88F1
        Device Specific.(Z8)........20000000C95AB49A
        Device Specific.(Z8)........20000000C95CDBFD



$ lsdev -dev fcs0 -vpd

  fcs0             U7879.001.DQDTZXG-P1-C6-T1  FC Adapter

        Part Number.................03N7069
        EC Level....................A
        Serial Number...............1B64505069
        Manufacturer................001B
        Feature Code/Marketing ID...280B
        FRU Number.................. 03N7069
        Device Specific.(ZM)........3
        Network Address.............10000000C95CDBFD
        ROS Level and ID............02881955
        Device Specific.(Z0)........1001206D
        Device Specific.(Z1)........00000000
        Device Specific.(Z2)........00000000
        Device Specific.(Z3)........03000909
        Device Specific.(Z4)........FF801413
        Device Specific.(Z5)........02881955
        Device Specific.(Z6)........06831955
        Device Specific.(Z7)........07831955
        Device Specific.(Z8)........20000000C95CDBFD
        Device Specific.(Z9)........TS1.91A5
        Device Specific.(ZA)........T1D1.91A5
        Device Specific.(ZB)........T2D1.91A5
        Device Specific.(YL)........U7879.001.DQDTZXG-P1-C6-T1

  PLATFORM SPECIFIC

  Name:  fibre-channel
    Model:  LP10000
    Node:  fibre-channel@1
    Device Type:  fcp
    Physical Location: U7879.001.DQDTZXG-P1-C6-T1
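
For SAN zoning you usually need the adapter address in colon-separated form. The sketch below pulls
the "Network Address" field out of saved "lsdev -dev fcsN -vpd" output. The function name wwpn_list
is made up, and treating the Network Address as the port name (WWPN) is an assumption you should
verify against your SAN switch.

```sh
#!/bin/sh
# wwpn_list: read "lsdev ... -vpd" style output on stdin and print
# "<adapter> <address>" with the address split into colon-separated bytes.
wwpn_list() {
    awk '
        $1 ~ /^fcs[0-9]+$/ { adapter = $1 }       # header row of an adapter
        /Network Address/ {
            addr = $0
            sub(/^.*\.\./, "", addr)              # drop label and dot leader
            gsub(/[ \t]/, "", addr)
            out = ""
            for (i = 1; i <= length(addr); i += 2)
                out = out (i > 1 ? ":" : "") substr(addr, i, 2)
            print adapter, tolower(out)
        }'
}

# Sample lines taken from the fcs0 output above
wwpn_list <<'EOF'
  fcs0             U7879.001.DQDTZXG-P1-C6-T1  FC Adapter

        Part Number.................03N7069
        Network Address.............10000000C95CDBFD
        Device Specific.(Z8)........20000000C95CDBFD
EOF
```

For the sample this prints "fcs0 10:00:00:00:c9:5c:db:fd".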



$ lsdev -slots

# Slot                      Description       Device(s)
U7311.D11.655157B-P1-C4     Logical I/O Slot  pci12 ent0 ent1
U7311.D11.655157B-P1-C5     Logical I/O Slot  pci13 fcs2
U7311.D11.655158B-P1-C6     Logical I/O Slot  pci14 fcs3
U7311.D20.655159B-P1-C04    Logical I/O Slot  pci9 sisscsia0
U7879.001.DQDTPAK-P1-C5     Logical I/O Slot  pci10 fcs1
U7879.001.DQDTZXG-P1-C6     Logical I/O Slot  pci8 fcs0
U7879.001.DQDTPAK-P1-T12    Logical I/O Slot  pci11 sisscsia1
U9117.570.65B61FE-V17-C0    Virtual I/O Slot  vsa0
U9117.570.65B61FE-V17-C11   Virtual I/O Slot  vhost0
U9117.570.65B61FE-V17-C12   Virtual I/O Slot  vhost1
..
U9117.570.65B61FE-V17-C324  Virtual I/O Slot  vhost33
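
The location code of a virtual slot carries two useful numbers: the ID of the partition that owns
the adapter (after -V) and the virtual slot number (after -C). A minimal sketch to pull them out;
vslot is a made-up function name.

```sh
#!/bin/sh
# vslot: print the partition ID and virtual slot number encoded in a
# virtual location code such as U9117.570.65B61FE-V17-C11.
# Prints nothing for physical location codes (no -V<n>-C<n> part).
vslot() {
    echo "$1" | sed -n 's/^.*-V\([0-9][0-9]*\)-C\([0-9][0-9]*\).*$/partition=\1 slot=\2/p'
}

vslot U9117.570.65B61FE-V17-C11      # prints: partition=17 slot=11
vslot U9117.570.65B61FE-V17-C324     # prints: partition=17 slot=324
vslot U7879.001.DQDTZXG-P1-C6        # physical slot: prints nothing
```

The slot number is what you compare with the virtual adapter definition in the partition profile
on the HMC.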


$ lsdev -type disk

name            status                                            description
hdisk0          Available  16 Bit LVD SCSI Disk Drive
hdisk1          Available  16 Bit LVD SCSI Disk Drive
..
hdisk10         Available  16 Bit LVD SCSI Disk Drive
hdisk11         Available  SAN Volume Controller MPIO Device
hdisk12         Available  SAN Volume Controller MPIO Device
..
hdisk35         Available  SAN Volume Controller MPIO Device
vg01sanl02      Available  Virtual Target Device - Disk
vg01sanl03      Available  Virtual Target Device - Disk
..
vg04sanl14      Available  Virtual Target Device - Disk
vg05sanl14      Available  Virtual Target Device - Disk
vzd110l01       Available  Virtual Target Device - Logical Volume
vzd110l02       Available  Virtual Target Device - Logical Volume
..
vzd110l14       Available  Virtual Target Device - Logical Volume


$ lsdev -virtual

name            status                                            description
vhost0          Available  Virtual SCSI Server Adapter
vhost1          Available  Virtual SCSI Server Adapter
vhost2          Available  Virtual SCSI Server Adapter
..
vhost33         Available  Virtual SCSI Server Adapter
vsa0            Available  LPAR Virtual Serial Adapter
vg01sanl02      Available  Virtual Target Device - Disk
vg01sanl03      Available  Virtual Target Device - Disk
..
vg05sanl14      Available  Virtual Target Device - Disk
..
vzd110l01       Available  Virtual Target Device - Logical Volume
vzd110l14       Available  Virtual Target Device - Logical Volume


$ lsdev -vpd          # gives a huge list of output

INSTALLED RESOURCE LIST WITH VPD

The following resources are installed on your machine.

  Model Architecture: chrp
  Model Implementation: Multiple Processor, PCI bus

  sys0                                                                           System Object
  sysplanar0                                                                     System Planar
  vio0                                                                           Virtual I/O Bus
  vhost33          U9117.570.65B61FE-V17-C324                                    Virtual SCSI Server Adapter

        Device Specific.(YL)........U9117.570.65B61FE-V17-C324

  vg05sanl14       U9117.570.65B61FE-V17-C324-L2                                 Virtual Target Device - Disk
  vg03sanl14       U9117.570.65B61FE-V17-C324-L1                                 Virtual Target Device - Disk
  vhost32          U9117.570.65B61FE-V17-C323                                    Virtual SCSI Server Adapter

        Device Specific.(YL)........U9117.570.65B61FE-V17-C323

  vg04sanl14       U9117.570.65B61FE-V17-C323-L2                                 Virtual Target Device - Disk
  vg03sanl13       U9117.570.65B61FE-V17-C323-L1                                 Virtual Target Device - Disk
  vhost31          U9117.570.65B61FE-V17-C224                                    Virtual SCSI Server Adapter

        Device Specific.(YL)........U9117.570.65B61FE-V17-C224

  vg02sanl14       U9117.570.65B61FE-V17-C224-L1                                 Virtual Target Device - Disk
  vhost30          U9117.570.65B61FE-V17-C223                                    Virtual SCSI Server Adapter

        Device Specific.(YL)........U9117.570.65B61FE-V17-C223

..
..
  vg01sanl05       U9117.570.65B61FE-V17-C115-L1                                 Virtual Target Device - Disk
  vhost15          U9117.570.65B61FE-V17-C113                                    Virtual SCSI Server Adapter

        Device Specific.(YL)........U9117.570.65B61FE-V17-C113
  vg04sanl03       U9117.570.65B61FE-V17-C113-L3                                 Virtual Target Device - Disk
  vg03sanl03       U9117.570.65B61FE-V17-C113-L2                                 Virtual Target Device - Disk
  vg01sanl03       U9117.570.65B61FE-V17-C113-L1                                 Virtual Target Device - Disk
  vhost14          U9117.570.65B61FE-V17-C112                                    Virtual SCSI Server Adapter

        Device Specific.(YL)........U9117.570.65B61FE-V17-C112
..
..
        Device Specific.(YL)........U9117.570.65B61FE-V17-C0

  vty0             U9117.570.65B61FE-V17-C0-L0                                   Asynchronous Terminal
  pci6             U7311.D11.655158B-P1                                          PCI Bus

        Device Specific.(YL)........U7311.D11.655158B-P1

  pci14            U7311.D11.655158B-P1                                          PCI Bus

        Device Specific.(YL)........U7311.D11.655158B-P1

  fcs3             U7311.D11.655158B-P1-C6-T1                                    FC Adapter

        Part Number.................03N7069
        EC Level....................A
        Serial Number...............1B64504CA3
        Manufacturer................001B
        Feature Code/Marketing ID...280B
        FRU Number.................. 03N7069
        Device Specific.(ZM)........3
        Network Address.............10000000C95CDDEE
        ROS Level and ID............02881955
        Device Specific.(Z0)........1001206D
        Device Specific.(Z1)........00000000
        Device Specific.(Z2)........00000000
        Device Specific.(Z3)........03000909
        Device Specific.(Z4)........FF801413
        Device Specific.(Z5)........02881955
        Device Specific.(Z6)........06831955
        Device Specific.(Z7)........07831955
        Device Specific.(Z8)........20000000C95CDDEE
        Device Specific.(Z9)........TS1.91A5
        Device Specific.(ZA)........T1D1.91A5
        Device Specific.(ZB)........T2D1.91A5
        Device Specific.(YL)........U7311.D11.655158B-P1-C6-T1

  fcnet3           U7311.D11.655158B-P1-C6-T1                                    Fibre Channel Network Protocol Device
  fscsi3           U7311.D11.655158B-P1-C6-T1                                    FC SCSI I/O Controller Protocol Device
  pci5             U7311.D11.655157B-P1                                          PCI Bus

        Device Specific.(YL)........U7311.D11.655157B-P1




Other Example:
==============

See the differences between 1 and 2:


1. LPAR using only storage via VIO

root@zd110l06:/root#lspv
hdisk0          00cb61fe223c3926                    rootvg          active
hdisk1          00cb61fe2360b1b7                    rootvg          active
hdisk2          00cb61fe3339af9f                    appsvg          active
hdisk3          00cb61fe3339b066                    datavg          active

root@zd110l06:/root#lsdev -Cc disk -s vscsi
hdisk0 Available  Virtual SCSI Disk Drive
hdisk1 Available  Virtual SCSI Disk Drive
hdisk2 Available  Virtual SCSI Disk Drive
hdisk3 Available  Virtual SCSI Disk Drive

root@zd110l06:/root#lsdev -Cc disk
hdisk0 Available  Virtual SCSI Disk Drive
hdisk1 Available  Virtual SCSI Disk Drive
hdisk2 Available  Virtual SCSI Disk Drive
hdisk3 Available  Virtual SCSI Disk Drive

root@zd110l06:/root#lsdev -Cc adapter
ent0   Available       Virtual I/O Ethernet Adapter (l-lan)
ent1   Available 02-08 2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
ent2   Available 02-09 2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
ent3   Available 03-08 2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
ent4   Available 03-09 2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
vsa0   Available       LPAR Virtual Serial Adapter
vscsi0 Available       Virtual SCSI Client Adapter
vscsi1 Available       Virtual SCSI Client Adapter
vscsi2 Available       Virtual SCSI Client Adapter
vscsi3 Available       Virtual SCSI Client Adapter
vscsi4 Available       Virtual SCSI Client Adapter
vscsi5 Available       Virtual SCSI Client Adapter



2. LPAR using storage via VIO and dedicated FC cards

root@zd110l01.nl.eu.abnamro.com:/root#lspv
hdisk0          00cb61fe09fe92bd                    rootvg          active
hdisk1          00cb61fe0a47a802                    rootvg          active
hdisk2          00cb61fe336bc95b                    appsvg          active
hdisk3          00cb61fe321664d1                    datavg          active

root@zd110l01.nl.eu.abnamro.com:/root#lsdev -Cc disk -s vscsi
hdisk0 Available  Virtual SCSI Disk Drive
hdisk1 Available  Virtual SCSI Disk Drive

root@zd110l01.nl.eu.abnamro.com:/root#lsdev -Cc disk
hdisk0 Available          Virtual SCSI Disk Drive
hdisk1 Available          Virtual SCSI Disk Drive
hdisk2 Available 02-08-02 SAN Volume Controller MPIO Device
hdisk3 Available 02-08-02 SAN Volume Controller MPIO Device

root@zd110l01.nl.eu.abnamro.com:/root#lsdev -Cc adapter
ent0   Available       Virtual I/O Ethernet Adapter (l-lan)
fcs0   Available 02-08 FC Adapter
fcs1   Available 03-08 FC Adapter
vsa0   Available       LPAR Virtual Serial Adapter
vscsi0 Available       Virtual SCSI Client Adapter
vscsi1 Available       Virtual SCSI Client Adapter






>> More on lsdev on a VIOS:


Purpose

Displays Virtual I/O Server devices and their characteristics.

Syntax
To list devices

lsdev [ -type DeviceType... ] [ -virtual ] [ -field FieldName... ] [ -fmt Delimiter ] [-state State ]

To display information about a specific device:

lsdev { -dev DeviceName | -plc PhysicalLocationCode } [ -child ] [ -field FieldName... ] [ -fmt Delimiter ]

lsdev { -dev DeviceName | -plc PhysicalLocationCode } [ -attr [ Attribute ] | -range Attribute | -slot | -vpd | -parent]

lsdev -vpd

lsdev -slots

Description:

The lsdev command displays information about devices in the Virtual I/O Server. If no flags are specified, 
a list of all devices, both physical and virtual, in the Virtual I/O Server is displayed. 
To list devices, both physical and virtual, of a specific type use the -type DeviceType flag. 
Use the -virtual flag to list only virtual devices. Combining both the -type and -virtual flags 
will list the virtual devices of the specified type.

To display information about a specific device, use the -dev DeviceName or -plc PhysicalLocationCode. 
Use either the -child, -parent, -attr, -range, -slot, or -vpd flag to specify what type of information 
is displayed. If none of these flags are used, the name, status, and description of the device will be displayed.

Using the -vpd flag, without specifying a device, displays platform-specific information for all devices.


Examples

- To list all virtual adapters and display the name and status fields, type: 

# lsdev -type adapter -virtual -field name status

The system displays a message similar to the following: 
name  status

vhost0  Available
vhost1  Available
vhost2  Available
ent6    Available
ent7    Available
ent8    Available
ent9    Available

- To list all devices of type disk and display the name and physical location fields, type: 

# lsdev -type disk -field name physloc

The system displays a message similar to the following: 
name    physloc

hdisk0 U9111.520.10004BA-T15-L5-L0
hdisk1 U9111.520.10004BA-T15-L8-L0
hdisk2 U9111.520.10004BA-T16-L5-L0
hdisk3 U9111.520.10004BA-T16-L8-L0
hdisk4 UTMP0.02E.00004BA-P1-C4-T1-L8-L0
hdisk5 UTMP0.02E.00004BA-P1-C4-T2-L8-L0
hdisk6 UTMP0.02F.00004BA-P1-C8-T2-L8-L0
hdisk7 UTMP0.02F.00004BA-P1-C4-T2-L8-L0
hdisk8 UTMP0.02F.00004BA-P1-C4-T2-L11-L0
vtscsi0 U9111.520.10004BA-V1-C2-L1
vtscsi1 U9111.520.10004BA-V1-C3-L1
vtscsi2 U9111.520.10004BA-V1-C3-L2
vtscsi3 U9111.520.10004BA-V1-C4-L1
vtscsi4 U9111.520.10004BA-V1-C4-L2
vtscsi5 U9111.520.10004BA-V1-C5-L1


- To display the parent of a device, type: 

# lsdev -dev hdisk0 -parent

The system displays a message similar to the following: 
parent

scsi0

- To display all I/O slots that are not hot-pluggable but can have DLPAR operations performed on them, type: 

# lsdev -slots

The system displays a message similar to the following: 
U787A.001.DNZ00Y1-P1-C1  Logical I/O Slot  pci4 sisscsia0   
U787A.001.DNZ00Y1-P1-T5  Logical I/O Slot  pci3 ent0 ent1   
U787A.001.DNZ00Y1-P1-T7  Logical I/O Slot  pci2 usbhc0 usbhc1   
U9111.520.10DFD8C-V2-C0  Virtual I/O Slot  vsa0   
U9111.520.10DFD8C-V2-C2  Virtual I/O Slot  vhost0   
U9111.520.10DFD8C-V2-C4  Virtual I/O Slot  Unknown


- The client partition accesses its assigned disks through a virtual SCSI client adapter.
The virtual SCSI client adapter sees standard SCSI devices and LUNs through this virtual adapter.
The commands in the following example show how the disks appear on an AIX 5.3 partition:

# lsdev -Cc disk -s vscsi
hdisk2 Available Virtual SCSI Disk Drive

# lscfg -vpl hdisk2
hdisk2 111.530.10DDEDC-V3-C5-T1  Virtual SCSI Disk Drive

- Configuring an optical device as a virtual SCSI device works the same way as configuring
a disk or logical volume as a virtual SCSI device.
Using either a new or a previously defined vhost adapter associated with the client partition, 
run the following command:

# mkvdev -vdev cd0 -vadapter vhost0
vtopt0  Available Virtual Target Device - Optical Media

On the client partition, run the cfgmgr command and a cd0 device will be configured for use.
Mounting the CD device is now possible, as is using the mkdvd command.





rmvdev command:
---------------

Purpose
To remove the connection between a physical device and its associated virtual SCSI adapter.

Syntax
rmvdev [ -f ] { -vdev TargetDevice | -vtd VirtualTargetDevice } [-rmlv]

Description
The rmvdev command removes the connection between a physical device and its associated virtual SCSI adapter. 
The connection can be identified by specifying the backing (physical) device or the virtual target device.
If the connection is specified by the device name and there are multiple connections between the 
physical device and virtual SCSI adapters, an error is returned unless the -f flag is also specified.
If -f is included then all connections associated with the physical device are removed.

If the backing (physical) device is a logical volume and the -rmlv flag is specified, 
then the logical volume is removed as well.

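Example (this one uses the VIOS rmdev command, not rmvdev, to remove a vhost adapter
together with its child devices; rmvdev itself accepts only -vdev or -vtd, as shown in the syntax above):

# rmdev -dev vhost0 -recursive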

Example: how to remove a dynamically allocated I/O slot from a DLPAR partition in IBM AIX

To remove a dynamically allocated I/O slot (it must be a desired component) from a partition
on an IBM pSeries server:

1) Find the slot you wish to remove from the partition:

# lsslot -c slot
# Slot Description Device(s)
U1.5-P2/Z2 Logical I/O Slot pci15 scsi2 
U1.9-P1-I8 Logical I/O Slot pci13 ent0 
U1.9-P1-I10 Logical I/O Slot pci14 scsi0 scsi1 

In our case, it is pci14.

2) Delete the PCI adapter and all of its children in AIX before removal:

# rmdev -l pci14 -d -R
cd0 deleted
rmt0 deleted
scsi0 deleted
scsi1 deleted
pci14 deleted

3) Now, you can remove the PCI I/O slot device using the HMC:

a) Log in to the HMC

b) Select "Server and Partition", and then "Server Management"

c) Select the appropriate server and then the appropriate partition

d) Right click on the partition name, and then on "Dynamic Logical Partitioning"

e) In the menu, select "Adapters"

f) In the newly created popup, select the task "Remove resource from this partition"

g) Select the appropriate adapter from the list (only desired ones will appear)

h) Select the "OK" button

i) You should have a popup window which tells you if it was successful. 


Example

lsslot -c slot; rmdev -l pci14 -d -R 
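
Step 1 above (finding which pciN device sits in the slot of a given adapter) can be
scripted. The following is a minimal sketch only: it reuses the sample "lsslot -c slot"
output from step 1 as canned input, and the function names sample_lsslot and slot_parent
are made up for illustration. It assumes the description column is always three words
("Logical I/O Slot" or "Virtual I/O Slot"), so that the device list starts at field 5.

```shell
#!/bin/sh
# Canned copy of the "lsslot -c slot" output shown in step 1.
# On a real partition, pipe the live command output into slot_parent instead.
sample_lsslot() {
cat <<'EOF'
# Slot Description Device(s)
U1.5-P2/Z2 Logical I/O Slot pci15 scsi2
U1.9-P1-I8 Logical I/O Slot pci13 ent0
U1.9-P1-I10 Logical I/O Slot pci14 scsi0 scsi1
EOF
}

# slot_parent DEVICE: read lsslot output on stdin and print the slot location
# code and the pciN parent of the slot whose device list contains DEVICE.
slot_parent() {
    awk -v dev="$1" '
        /^#/ { next }                       # skip the header line
        {
            # Fields 1-4: location code plus three description words (assumed);
            # fields 5..NF: the devices in that slot, parent pciN first.
            for (i = 5; i <= NF; i++)
                if ($i == dev) { print $1, $5; exit }
        }'
}

sample_lsslot | slot_parent scsi1     # prints: U1.9-P1-I10 pci14
```

The printed pciN name is the one to pass to "rmdev -l ... -d -R" in step 2. Review it
before running rmdev, because that command deletes the adapter and all of its children.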


mkdvd command:
-------------- 

Examples of the mkdvd command:

To generate a bootable system backup to the DVD-R device named /dev/cd1, enter: 

# mkdvd -d /dev/cd1

To generate a system backup to the DVD-R or DVD-RAM device named /dev/cd1, enter: 

# mkdvd -d /dev/cd1

To generate a non-bootable volume group backup of the volume group myvg to /dev/cd1, enter: 

# mkdvd -d /dev/cd1 -v myvg

Note:
All savevg backup images are non-bootable.

To generate a non-bootable system backup, but stop mkdvd before the DVD is created and save 
the final images to the /mydata/my_cd file system, and create the other mkdvd file systems in myvg, enter: 

# mkdvd -B -I /mydata/my_cd -V myvg -S

To create a DVD that duplicates an existing directory structure 
/mycd/a
/mycd/b/d
/mycd/c/f/g
use the following command:

# mkdvd -r /mycd -d /dev/cd1

After mounting with mount -o ro /dev/cd1 /mnt, cd to /mnt; a find . -print command displays:

./a
./b
./b/d
./c
./c/f
./c/f/g


lparstat command:
-----------------

From the AIX prompt in an LPAR, enter the lparstat -i command to list the partition's
name and resource configuration, for example whether the partition is capped or uncapped.

# lparstat -i


cfgdev command:
---------------

On the VIOS partition, run the "cfgdev" command to rebuild the list of visible devices.
This is necessary after you have created the partition and have added virtual controllers.

The virtual SCSI server adapters are now available to the VIOS. 
The names of these adapters are vhostx, where x is a number assigned by the system.

Use the following command to make sure your adapters are available:

$ lsdev -virtual
name		status		description
ent2		Available	Virtual Ethernet Adapter
vhost0		Available	Virtual SCSI Server Adapter
vhost1		Available	Virtual SCSI Server Adapter
vhost2		Available	Virtual SCSI Server Adapter
vhost3		Available	Virtual SCSI Server Adapter
vsa0		Available	LPAR Virtual Serial Adapter


lspath command:
---------------

lspath Command
Purpose
Displays information about paths to a MultiPath I/O (MPIO) capable device.

Syntax
lspath [ -dev DeviceName ] [ -pdev Parent ] [ -status Status ] [ -conn Connection ] [ -field FieldName ] 
       [ -fmt Delimiter ]

lspath -dev DeviceName -pdev Parent [ -conn Connection ] -lsattr [ -attr Attribute... ]

lspath -dev DeviceName -pdev Parent [ -conn Connection ] -range -attr Attribute

Description
The lspath command displays one of three types of information about paths to an MPIO capable device. It either 
displays the operational status for one or more paths to a single device, or it displays one or more attributes 
for a single path to a single MPIO capable device. The first syntax shown above displays the operational status 
for one or more paths to a particular MPIO capable device. The second syntax displays one or more attributes 
for a single path to a particular MPIO capable device. Finally, the third syntax displays the possible range 
of values for an attribute for a single path to a particular MPIO capable device.

Displaying Path Status with the lspath Command
When displaying path status, the set of paths to display is obtained by searching the device configuration database 
for paths that match the following criteria:

The target device name matches the device specified with the -dev flag. If the -dev flag is not present, then the 
target device is not used in the criteria. 
The parent device name matches the device specified with the -pdev flag. If the -pdev flag is not present, then 
parent is not used in the criteria. 
The connection matches the connection specified with the -conn flag. If the -conn flag is not present, then 
connection is not used in the criteria. 
The path status matches status specified with the -status flag. If the -status flag is not present, the path 
status is not used in the criteria.
If none of the -dev, -pdev, -conn, or -status flags are specified, then all paths known to the system are displayed.

By default, this command will display the information in columnar form. When no flags are specified that qualify 
the paths to display, the format of the output is:

status device  parent

Possible values that can appear for the status column are:

-enabled 
Indicates that the path is configured and operational. It will be considered when paths are selected for IO. 
-disabled 
Indicates that the path is configured, but not currently operational. It has been manually disabled and will 
not be considered when paths are selected for IO. 
-failed 
Indicates that the path is configured, but it has had IO failures that have rendered it unusable. It will not be considered when paths are selected for IO. 
-defined 
Indicates that the path has not been configured into the device driver. 
-missing 
Indicates that the path was defined in a previous boot, but it was not detected in the most recent boot of the system. 
-detected 
Indicates that the path was detected in the most recent boot of the system, but for some reason it was not configured. A path should only have this status during boot and so this status should never appear as a result of the lspath command. 

Displaying Path Attributes with the lspath Command
When displaying attributes for a path, the path must be fully qualified. Multiple attributes for a path can be displayed, but attributes belonging to multiple paths cannot be displayed in a single invocation of the lspath command. Therefore, in addition to the -lsattr, -dev, and -pdev flags, the -conn flags are required to uniquely identify a single path. For example:

if only one path between a device and a specific parent, the -conn flag is not required 
if there are multiple paths between a device and a specific parent, the -conn flag is required
Furthermore, the -status flag is not allowed.

By default, this command will display the information in columnar form.

attribute   value    description         user_settable

Flags
-attr Attribute Identifies the specific attribute to list. The 'Attribute' is the name of a path specific attribute. 
 When this flag is provided, only the identified attribute is displayed. Multiple instances of this flag may be 
 used to list multiple attributes. If this flag is not specified at all, all attributes associated with the 
 identified path will be listed. 
-lsattr Displays the attribute names, current values, descriptions, and user-settable flag values for a specific path. 
-dev Name Specifies the logical device name of the target device whose path information is to be displayed. 
-field FieldNames Specifies the list of fields to display. The following fields are supported: 
status 
Status of the path 
name 
Name of the device 
parent 
Name of the parent device 
conn 
Path connection.  
-fmt Delimiter Specifies a delimiter character to separate output fields. 
-pdev Parent Indicates the logical device name of the parent device of the path(s) to be displayed. 
-range Displays the legal values for an attribute name. The -range flag displays the list attribute values in a vertical column as follows: 
Value1
Value2
.
.
ValueN
The -range flag displays the range attribute values as x...n(+i) where x is the start of the range, n is the end of the range, and i is the increment. 
-status Status The -status Status flag indicates the status to use in qualifying the paths to be displayed. When displaying path information, the allowable values for this flag are: 
enabled 
Display paths that are enabled for MPIO path selection. 
disabled 
Display paths that are disabled from MPIO path selection. 
failed 
Display paths that are failed due to IO errors. 
available 
Display paths whose path_status is PATH_AVAILABLE (that is, paths that are configured in the system, includes enabled, disabled, and failed paths). 
defined 
Display paths whose path_status is PATH_DEFINED. 
missing 
Display paths whose path_status is PATH_MISSING.  
-conn Connection Indicates the connection information to use in qualifying the paths to be displayed. 
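
The x...n(+i) notation that -range prints for range attributes can be expanded into the
individual legal values. The following is a minimal sketch, not part of lspath itself:
expand_range is a hypothetical helper name, and the sketch assumes non-negative integer
bounds and a positive integer increment. Spaces in the input are ignored, because the exact
spacing of the real output is not shown above.

```shell
#!/bin/sh
# expand_range 'X...N(+I)': print every value from X to N in steps of I,
# separated by single spaces.
expand_range() {
    spec=$(printf '%s' "$1" | tr -d ' ')   # "1...9 (+2)" -> "1...9(+2)"
    start=${spec%%...*}                    # text before the "..."
    rest=${spec#*...}                      # e.g. "9(+2)"
    end=${rest%%\(*}                       # text before the "("
    step=${rest#*+}                        # e.g. "2)"
    step=${step%\)}                        # e.g. "2"
    [ "$step" -gt 0 ] || return 1          # guard against an endless loop
    out=""
    v=$start
    while [ "$v" -le "$end" ]; do
        out="$out${out:+ }$v"              # append, space-separated
        v=$((v + step))
    done
    printf '%s\n' "$out"
}

expand_range '1...9 (+2)'     # prints: 1 3 5 7 9
```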

Exit Status
Return code Description 
1 Invalid status value. 

Examples:

To display, without column headers, the set of paths whose operational status is disabled, enter: 

# lspath -status disabled

The system will display a message similar to the following: 

disabled  hdisk1   scsi1 
disabled  hdisk2   scsi1 
disabled  hdisk23  scsi8 
disabled  hdisk25  scsi8

To display the set of paths whose operational status is failed, enter: 

# lspath -status failed

The system will display a message similar to the following: 
failed  hdisk1   scsi1 
failed  hdisk2   scsi1 
failed  hdisk23  scsi8 
failed  hdisk25  scsi8

If the target device is a SCSI disk, to display all attributes for the path to parent scsi0 at connection 5,0, 
use the command: 

# lspath -dev hdisk10 -pdev scsi0 -conn "5,0" -lsattr

The system will display a message similar to the following: 

weight     1      Order of path failover selection  true


To display the status of all paths to hdisk1 with column headers and I/O counts, type: 

# lspath -l hdisk1 -H

The system displays a message similar to the following: 

STATUS          PARENT          CONNECTION
enabled   (4)   scsi0           5,0
disabled  (0)   scsi1           5,0
missing         scsi2           5,0


To display without column headers, the set of paths whose operational status is disabled, type: 

# lspath -s disabled

The system displays a message similar to the following: 

hdisk1        scsi1        5, 0
hdisk2        scsi1        6, 0
hdisk23       scsi8        3, 0
hdisk25       scsi8        4, 0


chpath command:
---------------

chpath Command
Purpose
Changes the operational status of paths to a MultiPath I/O (MPIO) capable device, or changes an attribute 
associated with a path to an MPIO capable device.

Syntax
chpath -l Name -s OpStatus [ -p Parent ] [ -w Connection ]

chpath -l Name -p Parent [ -w Connection ] [ -P ] -a Attribute=Value [ -a Attribute=Value ... ]

chpath -h

Description
The chpath command either changes the operational status of paths to the specified device (the -l Name flag) 
or it changes one, or more, attributes associated with a specific path to the specified device. The required syntax 
is slightly different depending upon the change being made.

The first syntax shown above changes the operational status of one or more paths to a specific device. 
The set of paths to change is obtained by taking the set of paths which match the following criteria:

The target device matches the specified device. 
The parent device matches the specified parent (-p Parent), if a parent is specified. 
The connection matches the specified connection (-w Connection), if a connection is specified. 
The path status is PATH_AVAILABLE.

The operational status of a path refers to the usage of the path as part of MPIO path selection. The value of enable indicates that the path is to be used while disable indicates that the path is not to be used. It should be noted that setting a path to disable impacts future I/O, not I/O already in progress. A path can therefore be disabled but still have outstanding I/O until all of the I/O that was already in progress completes. If -s disable is specified for a path and I/O is outstanding on the path, this fact will be output.

Disabling a path affects path selection at the device driver level. The path_status of the path is not changed in the device configuration database. The lspath command must be used to see current operational status of a path.

The second syntax shown above changes one or more path specific attributes associated with a particular path to a particular device. Note that multiple attributes can be changed in a single invocation of the chpath command; but all of the attributes must be associated with a single path. In other words, you cannot change attributes across multiple paths in a single invocation of the chpath command. To change attributes across multiple paths, separate invocations of chpath are required; one for each of the paths that are to be changed.

Flags
-a Attribute=Value Identifies the attribute to change as well as the new value for the attribute. The Attribute is the name of a path specific attribute. The Value is the value which is to replace the current value for the Attribute. More than one instance of the -a Attribute=Value can be specified in order to change more than one attribute. 
-h Displays the command usage message. 
-l Name Specifies the logical device name of the target device for the path(s) affected by the change. This flag is required in all cases. 
-p Parent Indicates the logical device name of the parent device to use in qualifying the paths to be changed. This flag is required when changing attributes, but is optional when changing operational status. 
-P Changes the path's characteristics permanently in the ODM object class without actually changing the path. The change takes effect on the path the next time the path is unconfigured and then configured (possibly on the next boot). 
-w Connection Indicates the connection information to use in qualifying the paths to be changed. This flag is optional when changing operational status. When changing attributes, it is optional if the device has only one path to the indicated parent. If there are multiple paths from the parent to the device, then this flag is required to identify the specific path being changed. 
-s OpStatus Indicates the operational status to which the indicated paths should be changed. The operational status of a path is maintained at the device driver level. It determines if the path will be considered when performing path selection. The allowable values for this flag are: 
enable 
Mark the operational status as enabled for MPIO path selection. A path with this status will be considered for use when performing path selection. Note that enabling a path is the only way to recover a path from a failed condition. 
disable 
Mark the operational status as disabled for MPIO path selection. A path with this status will not be considered for use when performing path selection. 
This flag is required when changing operational status. When used in conjunction with the -a Attribute=Value flag, a usage error is generated. 

Security
Privilege Control: Only the root user and members of the system group have execute access to this command.

Auditing Events:

Event Information 
DEV_Change The chpath command line. 

Examples

To disable the paths between scsi0 and the hdisk1 disk device, enter: 

# chpath -l hdisk1 -p scsi0 -s disable

The system displays a message similar to one of the following: 

paths disabled
or 
some paths enabled

The first message indicates that all PATH_AVAILABLE paths from scsi0 to hdisk1 have been successfully disabled. 
The second message indicates that only some of the PATH_AVAILABLE paths from scsi0 to hdisk1 have been 
successfully disabled.
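
Since enabling a path is the only way to recover it from a failed condition, a list of
failed paths can be turned into the matching chpath commands. The sketch below is an
illustration under assumptions: it uses the sample "failed" listing from the lspath
examples as canned input (columns: status, device, parent), the function names are made up,
and the status word is compared case-insensitively because the exact capitalisation of real
output may differ. It only prints the commands and does not run them.

```shell
#!/bin/sh
# Canned copy of the failed-path listing shown in the lspath examples.
sample_failed_paths() {
cat <<'EOF'
failed  hdisk1   scsi1
failed  hdisk2   scsi1
failed  hdisk23  scsi8
failed  hdisk25  scsi8
EOF
}

# recovery_commands: read "status device parent" lines on stdin and print,
# without executing anything, one "chpath ... -s enable" line per failed path.
recovery_commands() {
    awk 'tolower($1) == "failed" {
             printf "chpath -l %s -p %s -s enable\n", $2, $3
         }'
}

sample_failed_paths | recovery_commands
```

Printing instead of executing keeps the decision with the administrator: a path should
normally be re-enabled only after the underlying fault has been repaired.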


59.5 Example of usage of virtualization commands:
-------------------------------------------------


Suppose we have the following lpar:

# uname -L
12 zd110l12

# oslevel -r
5300-05

# lsdev -Cc disk
hdisk0 Available          Virtual SCSI Disk Drive
hdisk1 Available          Virtual SCSI Disk Drive <--------------------------------------
hdisk2 Available 02-08-02 SAN Volume Controller MPIO Device                            |
hdisk3 Available 02-08-02 SAN Volume Controller MPIO Device                            |
                                                                                       |
# lsdev -Cc disk -s vscsi                                                              |
hdisk0 Available  Virtual SCSI Disk Drive                                              |
hdisk1 Available  Virtual SCSI Disk Drive                                              |
                                                                                       |
# lscfg -vpl hdisk1                                                                    |
  hdisk1        U9117.570.65B61FE-V12-C6-T1-L810000000000  Virtual SCSI Disk Drive <----
                                                                                       |
# lsslot -c slot                                                                       |
# Slot                    Description       Device(s)                                  |
U7879.001.DQDTZXG-P1-C2   Logical I/O Slot  pci2 fcs0                                  |
U7879.001.DQDTPAK-P1-C6   Logical I/O Slot  pci3 fcs1                                  |
U9117.570.65B61FE-V12-C0  Virtual I/O Slot  vsa0                                       |
U9117.570.65B61FE-V12-C2  Virtual I/O Slot  ent0                                       |
U9117.570.65B61FE-V12-C5  Virtual I/O Slot  vscsi0                                     |
U9117.570.65B61FE-V12-C6  Virtual I/O Slot  vscsi1 <------------------------------------
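
The link drawn by the arrows above (disk location code -> slot -> adapter) can be followed
mechanically: drop the "-T<port>-L<lun>" tail from the disk's location code and look up what
remains in the lsslot listing. This is a sketch under assumptions: the helper names are
hypothetical, the lsslot listing is the canned sample from above, and the rule "cut at the
first -T" is assumed to hold for location codes of the form shown here, not for every
hardware type.

```shell
#!/bin/sh
# Canned copy of the "lsslot -c slot" output shown above.
sample_lsslot_lpar() {
cat <<'EOF'
# Slot                    Description       Device(s)
U7879.001.DQDTZXG-P1-C2   Logical I/O Slot  pci2 fcs0
U7879.001.DQDTPAK-P1-C6   Logical I/O Slot  pci3 fcs1
U9117.570.65B61FE-V12-C0  Virtual I/O Slot  vsa0
U9117.570.65B61FE-V12-C2  Virtual I/O Slot  ent0
U9117.570.65B61FE-V12-C5  Virtual I/O Slot  vscsi0
U9117.570.65B61FE-V12-C6  Virtual I/O Slot  vscsi1
EOF
}

# adapter_for_location LOCATION: read lsslot output on stdin and print the
# device(s) in the slot that the disk location code LOCATION belongs to.
adapter_for_location() {
    slot=${1%%-T*}                 # strip "-T1-L8100..." and anything after it
    awk -v slot="$slot" '
        $1 == slot {
            # Devices start at field 5 (after the three description words).
            for (i = 5; i <= NF; i++)
                printf "%s%s", $i, (i < NF ? " " : "\n")
        }'
}

# Location code of hdisk1 as reported by "lscfg -vpl hdisk1" above.
sample_lsslot_lpar | adapter_for_location U9117.570.65B61FE-V12-C6-T1-L810000000000
# prints: vscsi1
```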

# lscfg -vpl hdisk3
  hdisk3           U7879.001.DQDTZXG-P1-C2-T1-W50050768013029E5-L1000000000000  SAN Volume Controller MPIO Device

        Manufacturer................IBM
        Machine Type and Model......2145
        ROS Level and ID............0000
        Device Specific.(Z0)........0000043268101002
        Device Specific.(Z1)........0200640
        Serial Number...............600507680190014E3000000000000199   (LUN)


  PLATFORM SPECIFIC

  Name:  disk
    Node:  disk
    Device Type:  block






59.6 HMC commands:
==================


HMC commands:
lssyscfg	List the hardware resource configuration
mksyscfg	Creates the hardware resource configuration
chsyscfg	Changes the hardware resource configuration
rmsyscfg	Removes the hardware resource configuration

Example:

$ lssyscfg -r sys --all -z
name=ITSO_p690
state=Ready
model=7040-681
serial_number=021768A
..
..


Detail on lssyscfg:
-------------------


NAME 

lssyscfg - list system resources 

SYNOPSIS 

lssyscfg -r {lpar | prof | sys | sysprof | cage | frame} [-m managed-system | -e managed-frame] 
            [--filter "filter-data"] [-F [attribute-names] [--header]] [--help] 

DESCRIPTION 

lssyscfg lists the attributes of partitions, partition profiles, or system profiles for the managed-system. 
It can also list the attributes of the managed-system, and of all of the systems managed by this 
Hardware Management Console (HMC).

lssyscfg can also list the attributes of cages in the managed-frame, the attributes of the managed-frame, 
or the attributes of all of the frames managed by this HMC.

OPTIONS 

-r 
The type of resources to list. Valid values are lpar for partitions, prof for partition profiles, sys for 
managed systems, sysprof for system profiles, cage for managed frame cages, and frame for managed frames. 

-m 
The name of either the managed system to list, or the managed system which has the system resources to list. 
The name may either be the user-defined name for the managed system, or be in the form tttt-mmm*ssssssss, 
where tttt is the machine type, mmm is the model, and ssssssss is the serial number of the managed system. 
The tttt-mmm*ssssssss form must be used if there are multiple managed systems with the same user-defined name. 
This option is required when listing partitions, partition profiles, or system profiles. This option is optional 
when listing managed systems, and if it is omitted, then all of the systems managed by this HMC will be listed. 
This option is not valid when listing managed frame cages or managed frames. 

-e 
The name of either the managed frame to list, or the managed frame which contains the cages to list. 
The name may either be the user-defined name for the managed frame, or be in the form tttt-mmm*ssssssss, 
where tttt is the type, mmm is the model, and ssssssss is the serial number of the managed frame. 
The tttt-mmm*ssssssss form must be used if there are multiple managed frames with the same user-defined name. 
This option is required when listing managed frame cages. This option is optional when listing managed frames, 
and if it is omitted, then all of the frames managed by this HMC will be listed. This option is not valid when 
listing partitions, partition profiles, system profiles, or managed systems. 

--filter 
The filter(s) to apply to the resources to be listed. Filters are used to select which resources of the specified 
resource type are to be listed. If no filters are used, then all of the resources of the specified resource type 
will be listed. For example, specific partitions can be listed by using a filter to specify the names or IDs of the partitions to list. Otherwise, if no filter is used, then all of the partitions in the managed system will be listed. The filter data consists of filter name/value pairs, which are in comma separated value (CSV) format. The filter data must be enclosed in double quotes. 
The format of the filter data is as follows:

"filter-name=value,filter-name=value,..."

Note that certain filters accept a comma separated list of values, as follows:

""filter-name=value,value,...",..."

When a list of values is specified, the filter name/value pair must be enclosed in double quotes. Depending on the 
shell being used, nested double quote characters may need to be preceded by an escape character, which is usually 
a '\' character. Unless otherwise indicated, multiple values can be specified for each filter.

Valid filter names for partitions:

lpar_names | lpar_ids | work_groups

Only one of these three filters may be specified.

Valid filter names for partition profiles:

lpar_names | lpar_ids, profile_names

Either the name or the ID of the partition which has the partition profiles to be listed must be specified. 
Only one partition name or ID can be specified.

Valid filter names for system profiles:

profile_names

This option is required when listing partition profiles. This option is not valid when listing managed systems, 
managed frame cages, or managed frames.

-F 
A delimiter separated list of attribute names for the desired attribute values to be displayed for each resource. 
If no attribute names are specified, then values for all of the attributes for the resource will be displayed. 
When this option is specified, only attribute values will be displayed. No attribute names will be displayed. 
The attribute values displayed will be separated by the delimiter which was specified with this option. 
This option is useful when only attribute values are desired to be displayed, or when the values of only 
selected attributes are desired to be displayed. 

--header 
Display a header record, which is a delimiter separated list of attribute names for the attribute values that 
will be displayed. This header record will be the first record displayed. This option is only valid when used 
with the -F option. 

--help 
Display the help text for this command and exit. 

EXAMPLES 

List all systems managed by this HMC:

lssyscfg -r sys

List only the user-defined name, machine type and model, and serial number for all of the systems managed by this HMC, 
and separate the output values with a colon:

lssyscfg -r sys -F name:type_model:serial_num

List the managed system system1:

lssyscfg -r sys -m system1

List all partitions in the managed system, and only display attribute values for each partition, following a header 
of attribute names:

lssyscfg -r lpar -m 9406-570*12345678 -F --header

List the partitions lpar1, lpar2, and lpar3:

lssyscfg -r lpar -m system1 --filter ""lpar_names=lpar1,lpar2,lpar3""

List only the names, IDs, and states of partitions lpar1, lpar2, and lpar3, and separate the output values 
with a comma:

lssyscfg -r lpar -m system1 --filter ""lpar_names=lpar1,lpar2,lpar3"" -F name,lpar_id,state

List all partition profiles defined for partition lpar2:

lssyscfg -r prof -m system1 --filter "lpar_names=lpar2"

List the partition profiles prof1 and prof2 defined for the partition that has an ID of 2:

lssyscfg -r prof -m system1 --filter "lpar_ids=2,"profile_names=prof1,prof2""

List all system profiles defined for the managed system:

lssyscfg -r sysprof -m 9406-520*100128A

List the system profile sysprof1:

lssyscfg -r sysprof -m system1 --filter "profile_names=sysprof1"

List all frames managed by this HMC:

lssyscfg -r frame

List the managed frame myFrame:

lssyscfg -r frame -e myFrame

List all cages in the managed frame:

lssyscfg -r cage -e 9119-59*000012C
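
The quoting rule for --filter (a name/value pair that carries a list of values must itself
be enclosed in double quotes) is easy to get wrong by hand. The sketch below builds such
filter data as a plain string. It is an illustration only: join_csv and list_filter are
hypothetical helper names, and wrapping the result in single quotes is just one way of
protecting the inner double quotes from the shell; depending on the shell, escaping them
with '\' as described above may be needed instead. The lssyscfg line is printed, not run.

```shell
#!/bin/sh
# join_csv A B C: print the arguments joined with commas (A,B,C).
join_csv() {
    ( IFS=,; printf '%s\n' "$*" )          # subshell keeps the IFS change local
}

# list_filter NAME VALUE...: print filter data for one filter with a list of
# values, with the name/value pair wrapped in literal double quotes.
list_filter() {
    name=$1
    shift
    printf '"%s=%s"\n' "$name" "$(join_csv "$@")"
}

filter=$(list_filter lpar_names lpar1 lpar2 lpar3)
printf '%s\n' "$filter"       # prints: "lpar_names=lpar1,lpar2,lpar3"

# Show the resulting command line; system1 is the example name used above.
printf 'lssyscfg -r lpar -m system1 --filter %s\n' "'$filter'"
```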


Power 4 HMC Commands:
---------------------

To view partition state: 

get_partition_state 

To pop a hung partition into the debugger (aka 'soft reset'): 

reset_partition -m <machine> -p <partition> -t soft 

To force a reboot of a hung partition (aka 'hard reset'): 

reset_partition -m <machine> -p <partition> -t hard 

To force a reboot of a "full system partition" (i.e. a system that is not partitioned): 

chsysstate -r sys -n <machine> -o off --immed --restart 

To start a partition: 

start_partition -p <partition> -f <profile name> -m <machine> 

To get a listing of boot profiles: 

query_profile_names -m <machine> -p <partition> 


Power 5 HMC Commands:
---------------------

Viewing system state
To list all of the HMC-managed systems, issue the command: 

lssyscfg -r sys 

To list only the "name" field of all of the systems, use the -F flag, together with the name of the field 
(in this case, name): 

lssyscfg -r sys -F name 

To see system state for only a single system, issue: 

lssyscfg -r sys -m <machine> 

The above may be combined with the -F flag as well, to list only one attribute for one machine. 

To view the partition state, issue the command: 

lssyscfg -r lpar -m <machine> 

To see just the names and state: 

lssyscfg -r lpar -m <machine> -F name,state --header 

All frames managed by the HMC may be listed as: 

lssyscfg -r frame 

All cages in a frame may be listed by: 

lssyscfg -r cage -e <frame-name> 

Cages may be processor cages (CPU, memory, PCI slots), identified as contents=sys, or they may be I/O drawers, 
identified as contents=io. 

To view the various profiles a partition can be booted into: 

lssyscfg -r prof -m <machine> --filter lpar_names=<partition> 


Changing system state, rebooting:
---------------------------------

To power on an lpar with a profile: 

chsysstate -m <machine> -o on -r lpar -n <lpar name> -f <profile> 

For example: 

chsysstate -m alpha -o on -r lpar -n alpha-lp1 -f default 

To power on a whole machine (CEC): 

chsysstate -m alpha -o on -r sys 

chsysstate, lssyscfg and other commands print good explanations of their usage if they're run without arguments. 

Issuing a 'soft reset', to push a hung machine into KDB/XMON, is not obvious. The magic incantation is: 

chsysstate -r lpar -m <machine> -n <partition> -o dumprestart 

To issue a 'hard reset', to turn off a partition, no matter what: 

chsysstate -r lpar -m <machine> -n <partition> -o shutdown --immed --restart 


Controlling virtual cpus:
-------------------------

To add one virtual CPU (note that chhwres uses -p instead of -n for the partition name): 

chhwres -r proc -m <machine> -p <partition> -o a --procs 1 

To add one-tenth of a CPU of processing entitlement: 

chhwres -r proc -m <machine> -p <partition> -o a --procunits 0.1 

To see a nice report of MACHINE,LPAR,PROFILE,STATE: 

lssyscfg -r sys -F name | while read mngsys; do lssyscfg -r lpar -F name,curr_profile,state -m $mngsys | 
sed "s/^/$mngsys,/"; done 
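
The report loop above can be tried without an HMC by replacing lssyscfg with a stub. The
following is a sketch for experimenting only: the stub function and every system, partition
and profile name in it are made up, and it answers just the two invocations the loop uses.
The loop itself is the one shown above, with the variable quoted and read -r added.

```shell
#!/bin/sh
# Stub standing in for the HMC lssyscfg command (hypothetical sample data).
lssyscfg() {
    case "$*" in
        "-r sys -F name")
            printf '%s\n' alpha beta ;;
        "-r lpar -F name,curr_profile,state -m alpha")
            printf '%s\n' 'alpha-lp1,default,Running' \
                          'alpha-lp2,default,Not Activated' ;;
        "-r lpar -F name,curr_profile,state -m beta")
            printf '%s\n' 'beta-lp1,prod,Running' ;;
    esac
}

# report: print MACHINE,LPAR,PROFILE,STATE for every partition on every
# managed system by prefixing each partition line with its system name.
report() {
    lssyscfg -r sys -F name | while read -r mngsys; do
        lssyscfg -r lpar -F name,curr_profile,state -m "$mngsys" |
            sed "s/^/$mngsys,/"
    done
}

report
```

Because the system name is pasted into the sed expression, a managed-system name
containing "/" would break the substitution; a different sed delimiter would be needed
in that case.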


chhwres command:
----------------

Run this command on the HMC. (A chhwres command is also available on a VIOS that is managed by the Integrated Virtualization Manager.)


NAME 

chhwres - change hardware resources 

SYNOPSIS 

To add, remove, or move a physical I/O slot:

chhwres -r io -m managed-system -o {a | r | m} 
{-p partition-name | --id partition-ID} 
[{-t target-partition-name | 
--tid target-partition-ID}] 
-l slot-DRC-index [-a "attributes"] 
[-w wait-time] [-d detail-level] [--force]

To set physical I/O attributes:

chhwres -r io -m managed-system -o s 
{-p partition-name | --id partition-ID} 
--rsubtype {iopool | taggedio} 
-a "attributes"

To add or remove a virtual I/O adapter:

chhwres -r virtualio -m managed-system -o {a | r} 
{-p partition-name | --id partition-ID} 
[--rsubtype {eth | scsi | serial}] 
[-s virtual-slot-number] [-a "attributes"] 
[-w wait-time] [-d detail-level] [--force]

To set virtual I/O attributes:

chhwres -r virtualio -m managed-system -o s 
[{-p partition-name | --id partition-ID}] 
--rsubtype {eth | hsl | virtualopti} 
-a "attributes"

To add, remove, or move memory:

chhwres -r mem -m managed-system -o {a | r | m} 
{-p partition-name | --id partition-ID} 
[{-t target-partition-name | 
--tid target-partition-ID}] 
-q quantity 
[-w wait-time] [-d detail-level] [--force]

To set memory attributes:

chhwres -r mem -m managed-system -o s -a "attributes"

To add, remove, or move processing resources:

chhwres -r proc -m managed-system -o {a | r | m} 
{-p partition-name | --id partition-ID} 
[{-t target-partition-name | 
--tid target-partition-ID}] 
[--procs quantity] [--procunits quantity] 
[--5250cpwpercent percentage] 
[-w wait-time] [-d detail-level] [--force]

To set processing attributes:

chhwres -r proc -m managed-system -o s 
{-p partition-name | --id partition-ID} 
-a "attributes" 

DESCRIPTION 

chhwres changes the hardware resource configuration of the managed-system. chhwres is used to perform 
dynamic logical partitioning (DLPAR) operations. 

OPTIONS 

-r 

The type of hardware resources to change. Valid values are io for physical I/O, virtualio for virtual I/O, 
mem for memory, and proc for processing resources. 

--rsubtype 

The subtype of hardware resources to change. Valid physical I/O resource subtypes are slot for I/O slots, 
iopool for I/O pools, and taggedio for tagged I/O resources. Valid virtual I/O resource subtypes are eth 
for virtual ethernet, scsi for virtual SCSI, serial for virtual serial, hsl for High Speed Link (HSL) 
OptiConnect, and virtualopti for virtual OptiConnect resources. 
This option is required for physical I/O or virtual I/O set operations, and for virtual I/O add operations.

This option is not valid for memory or processor operations. 

-m 

The name of the managed system for which the hardware resource configuration is to be changed. 
The name may either be the user-defined name for the managed system, or be in the form tttt-mmm*ssssssss, 
where tttt is the machine type, mmm is the model, and ssssssss is the serial number of the managed system. 
The tttt-mmm*ssssssss form must be used if there are multiple managed systems with the same user-defined name. 
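
Because the tttt-mmm*ssssssss form contains a "*", it should be quoted when typed into a shell. The sketch below 
splits such a name into its three parts with plain parameter expansion. The function name split_mtms is hypothetical.

```shell
# split_mtms "tttt-mmm*ssssssss"  ->  prints "type model serial"
split_mtms() {
  mtms=$1
  case "$mtms" in
    ?*-?*\*?*) ;;   # looks like type-model*serial
    *) echo "not in tttt-mmm*ssssssss form: $mtms" >&2; return 1 ;;
  esac
  mtype=${mtms%%-*}     # text before the first "-"
  rest=${mtms#*-}       # text after the first "-"
  model=${rest%%\**}    # text before the "*"
  serial=${rest#*\*}    # text after the "*"
  echo "$mtype $model $serial"
}

# Single quotes stop the shell from treating "*" as a wildcard.
split_mtms '9406-520*1234321A'
```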

-o 

The operation to perform. Valid values are a to add hardware resources to a partition, r to remove hardware 
resources from a partition, m to move hardware resources from one partition to another, and s to set 
hardware resource related attributes for a partition or the managed-system. 

-p 

The name of the partition for which the operation is to be performed. For a move operation, this is the source 
partition (the partition the resources will be moved from) for the operation. To perform an add, remove, 
or move operation, the partition must be in the running state. 
You can either use this option to specify the name of the partition for which the operation is to be performed, 
or use the --id option to specify the partition's ID. The -p and the --id options are mutually exclusive.

A partition is required to be specified with this option or the --id option for all operations except 
a virtual ethernet or memory set operation. 

--id 

The ID of the partition for which the operation is to be performed. For a move operation, this is the source 
partition (the partition the resources will be moved from) for the operation. To perform an add, remove, 
or move operation, the partition must be in the running state. 
You can either use this option to specify the ID of the partition for which the operation is to be performed, 
or use the -p option to specify the partition's name. The --id and the -p options are mutually exclusive.

A partition is required to be specified with this option or the -p option for all operations except a virtual 
ethernet or memory set operation. 

-t 

The name of the target partition for a move operation. The partition must be in the running state. 
You can either use this option to specify the name of the target partition, or use the --tid option to specify 
the ID of the partition. The -t and the --tid options are mutually exclusive.

A target partition is required to be specified with this option or the --tid option for a move operation. 
This option is not valid for any other operation.

--tid 

The ID of the target partition for a move operation. The partition must be in the running state. 
You can either use this option to specify the ID of the target partition, or use the -t option to specify 
the name of the target partition. The --tid and the -t options are mutually exclusive.

A target partition is required to be specified with this option or the -t option for a move operation. 
This option is not valid for any other operation.

-l 

The DRC index of the physical I/O slot to add, remove, or move. 

-s 

The virtual slot number of the virtual I/O adapter to add or remove. 
When adding a virtual I/O adapter, if this option is not specified then the next available virtual slot number 
will be assigned to the virtual I/O adapter.

When removing a virtual I/O adapter, this option is required. 

-q 

The quantity of memory to add, remove, or move. The quantity specified must be in megabytes, it must be a multiple 
of the memory region size for the managed-system, and it must be greater than 0. 
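
The rule above (megabytes, greater than 0, a multiple of the memory region size) can be checked before chhwres is 
called. This is a minimal sketch; the function names are hypothetical, and the region size is assumed to have been 
looked up on the managed system beforehand.

```shell
# is_pos_int VALUE: succeed only for a whole number greater than 0
# written without a leading zero.
is_pos_int() {
  case "$1" in
    ''|0*|*[!0-9]*) return 1 ;;   # empty, zero or leading zero, or a non-digit
  esac
}

# valid_mem_qty QTY_MB REGION_MB
# Succeed if QTY_MB is greater than 0 and a multiple of REGION_MB.
valid_mem_qty() {
  is_pos_int "$1" && is_pos_int "$2" && [ $(($1 % $2)) -eq 0 ]
}

valid_mem_qty 128 16 && echo "128 MB is acceptable with a 16 MB region"
valid_mem_qty 100 16 || echo "100 MB is not a multiple of 16 MB"
```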

--procs 

When adding or removing processing resources to or from a partition using dedicated processors, or when moving 
processing resources from a partition using dedicated processors to another partition using dedicated processors, 
use this option to specify the quantity of dedicated processors to add, remove, or move. 
When adding or removing processing resources to or from a partition using shared processors, or when moving processing 
resources from a partition using shared processors to another partition using shared processors, use this option 
to specify the quantity of virtual processors to add, remove, or move.

When moving processing resources from a partition using dedicated processors to a partition using shared processors, 
use this option to specify the quantity of dedicated processors to be moved from the source partition and added 
as shared processors to the target partition.

This option is not valid when moving processing resources from a partition using shared processors to a partition 
using dedicated processors. The --procunits option must be used instead.

The quantity of processing resources specified with this option must be a whole number greater than 0. 

--procunits 

When adding or removing processing resources to or from a partition using shared processors, or when moving 
processing resources from a partition using shared processors to another partition using shared processors, 
use this option to specify the quantity of processing units to add, remove, or move. 
When moving processing resources from a partition using shared processors to a partition using dedicated processors, 
use this option to specify the quantity of shared processors to be moved from the source partition and added as 
dedicated processors to the target partition.

This option is not valid when moving processing resources from a partition using dedicated processors to a partition 
using shared processors. The --procs option must be used instead.

When moving processing resources from a partition using shared processors to a partition using dedicated 
processors, the quantity of processing units specified with this option must be a whole number. Otherwise, 
the quantity of processing units specified with this option can have up to 2 decimal places. In either case, 
the quantity specified must be greater than 0. 
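
The --procs and --procunits rules for move operations can be summarized as a small lookup, together with a format 
check for processing units. This is a sketch of the rules as described above, not part of chhwres itself; the 
function names are hypothetical.

```shell
# move_proc_option SRC_MODE DST_MODE
# Print which option(s) apply when moving processing resources.
move_proc_option() {
  case "$1:$2" in
    dedicated:dedicated) echo "--procs" ;;             # whole dedicated processors
    dedicated:shared)    echo "--procs" ;;             # dedicated processors become shared
    shared:dedicated)    echo "--procunits" ;;         # must be a whole number
    shared:shared)       echo "--procs --procunits" ;; # virtual processors and/or processing units
    *) echo "modes must be dedicated or shared" >&2; return 1 ;;
  esac
}

# valid_procunits QTY [whole]
# Succeed if QTY is greater than 0 and has at most 2 decimal places.
# With "whole" as second argument, QTY must be a whole number.
valid_procunits() {
  if [ "$2" = whole ]; then
    printf '%s\n' "$1" | grep -Eq '^[0-9]+$' || return 1
  else
    printf '%s\n' "$1" | grep -Eq '^([0-9]+(\.[0-9]{1,2})?|\.[0-9]{1,2})$' || return 1
  fi
  # Reject zero in any spelling (0, 0.0, .00, ...).
  ! printf '%s\n' "$1" | grep -Eq '^0*\.?0*$'
}

move_proc_option shared dedicated
valid_procunits .5 && echo ".5 is a valid quantity of processing units"
```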

--5250cpwpercent 

The percentage of 5250 Commercial Processing Workload (CPW) to add, remove, or move. The percentage specified 
can have up to 2 decimal places, and it must be greater than 0. 
This option is only valid for i5/OS(TM) partitions and can only be used when the managed-system supports 
the assignment of 5250 CPW percentages to partitions. 

-w 

The elapsed time, in minutes, after which an add, remove, or move operation will be aborted. 
wait-time must be a whole number. If wait-time is 0, the operation will not be timed out.

If this option is not specified, a default value of 5 minutes is used.

This option is valid for all add, remove, and move operations for AIX(R), Linux(TM), and virtual I/O server 
partitions. This option is also valid for memory add, remove, and move operations for i5/OS partitions. 

-d 

The level of detail to be displayed upon return of an add, remove, or move operation. Valid values are 0 (none) 
through 5 (highest). 
If this option is not specified, a default value of 0 is used.

This option is valid for all add, remove, and move operations for AIX, Linux, and virtual I/O server partitions. 

--force 

This option allows you to force a remove or move operation to be performed for a physical I/O slot that is currently 
in use (varied on) by an i5/OS partition. 
This option also allows you to force an add, remove, or move operation to be performed for an AIX, Linux, 
or virtual I/O server partition that does not have an RMC connection to the HMC. If this command completes 
successfully, you will need to restart your operating system for the change to take effect. You should only use 
this option if you intentionally configured your LAN to isolate the HMC from the operating system of your partition. 

-a 

The configuration data needed to create virtual I/O adapters or set hardware resource related attributes. 
The configuration data consists of attribute name/value pairs, which are in comma separated value (CSV) format. 
The configuration data must be enclosed in double quotes. 

The format of the configuration data is as follows:

attribute-name=value,attribute-name=value,...

Note that certain attributes accept a comma separated list of values, as follows:

"attribute-name=value,value,...",...

When a list of values is specified, the attribute name/value pair must be enclosed in double quotes. Depending on 
the shell being used, nested double quote characters may need to be preceded by an escape character, 
which is usually a '\' character.

If '+=' is used in the attribute name/value pair instead of '=', then the specified value is added to the existing 
value for the attribute if the attribute is numerical. If the attribute is a list, then the specified value(s) 
is added to the existing list. 

If '-=' is used in the attribute name/value pair instead of '=', then the specified value is subtracted from 
the existing value for the attribute if the attribute is numerical. If the attribute is a list, then the 
specified value(s) is deleted from the existing list. 
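
The nested quoting described above is easy to get wrong. The sketch below shows two equivalent ways to write a 
list-valued attribute in a POSIX shell, and assembles an attribute string from name/value pairs. The helper names 
attr_pair and join_attrs are hypothetical; the attribute names are the ones used in the examples of this man page.

```shell
# Two equivalent spellings of a list-valued attribute in a POSIX shell.
escaped="\"lpar_io_pool_ids+=2,3\""   # backslash-escaped inner quotes
single='"lpar_io_pool_ids+=2,3"'      # single quotes keep the inner quotes literally
[ "$escaped" = "$single" ] && echo "same string: $single"

# attr_pair NAME VALUE: print NAME=VALUE, wrapped in double quotes
# when VALUE is a comma separated list.
attr_pair() {
  case "$2" in
    *,*) printf '"%s=%s"' "$1" "$2" ;;
    *)   printf '%s=%s' "$1" "$2" ;;
  esac
}

# join_attrs PAIR...: join already formatted pairs with commas.
join_attrs() {
  out=
  for pair in "$@"; do
    out=${out:+$out,}$pair
  done
  printf '%s\n' "$out"
}

attrs=$(join_attrs \
  "$(attr_pair ieee_virtual_eth 1)" \
  "$(attr_pair port_vlan_id 4)" \
  "$(attr_pair addl_vlan_ids 5,6)" \
  "$(attr_pair is_trunk 1)" \
  "$(attr_pair trunk_priority 1)")
echo "$attrs"
# The value would then be passed as one argument: chhwres ... -a "$attrs"
```

If the command is sent through ssh, the remote shell parses the line a second time, so one more level of quoting 
is needed around the attribute string.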

Valid attribute names for attributes that can be set when adding, removing, or moving a physical I/O slot: 

  slot_io_pool_id 

Valid attribute names for setting I/O pool attributes: 

  lpar_io_pool_ids 
      comma separated 

Valid attribute names for setting tagged I/O resources (i5/OS partitions only): 

  load_source_slot 
      DRC index of I/O slot, or virtual slot number 
  alt_restart_device_slot 
      DRC index of I/O slot, or virtual slot number 
  console_slot 
      DRC index of I/O slot, virtual slot number, or the value hmc 
  alt_console_slot 
      DRC index of I/O slot, or virtual slot number 
  op_console_slot 
      DRC index of I/O slot, or virtual slot number 

Valid attribute names for adding a virtual ethernet adapter: 

  ieee_virtual_eth 
      Valid values: 
      0 - not IEEE 802.1Q compatible 
      1 - IEEE 802.1Q compatible 
      Required 
  port_vlan_id 
      Required 
  addl_vlan_ids 
  is_trunk 
      Valid values: 
      0 - no 
      1 - yes 
  trunk_priority 
      Valid values are integers between 1 and 15, inclusive 
      Required for a trunk adapter 

Valid attribute names for adding a virtual SCSI adapter: 

  adapter_type 
      Valid values are client or server (server adapters can only be added to i5/OS partitions on IBM(R) 
      eServer(TM) i5 servers, or virtual I/O server partitions) 
      Required 
  remote_lpar_id | remote_lpar_name 
      One of these attributes is required for a client adapter 
  remote_slot_num 
      Required for a client adapter 

Valid attribute names for adding a virtual serial adapter: 

  adapter_type 
      Valid values are client or server (client adapters cannot be added to i5/OS partitions on IBM System p5 or 
      eServer p5 servers, and server adapters can only be added to i5/OS or virtual I/O server partitions) 
      Required 
  remote_lpar_id | remote_lpar_name 
      One of these attributes is required for a client adapter 
  remote_slot_num 
      Required for a client adapter 
  supports_hmc 
      The only valid value is 0 for no 

Valid attribute names for setting virtual ethernet attributes: 

  mac_prefix

Valid attribute names for setting HSL OptiConnect attributes (i5/OS partitions only): 

  hsl_pool_id 
      Valid values are: 
      0 - HSL OptiConnect is disabled 
      1 - HSL OptiConnect is enabled 

Valid attribute names for setting virtual OptiConnect attributes (i5/OS partitions only): 

  virtual_opti_pool_id 
      Valid values are: 
      0 - virtual OptiConnect is disabled 
      1 - virtual OptiConnect is enabled 

Valid attribute names for setting memory attributes: 

  requested_num_sys_huge_pages 

Valid attribute names for setting processing attributes: 

  sharing_mode 
      Valid values are: 
      keep_idle_procs - valid with dedicated processors 
      share_idle_procs - valid with dedicated processors 
      cap - valid with shared processors 
      uncap - valid with shared processors 
  uncap_weight 

--help 

Display the help text for this command and exit. 


EXAMPLES 

Add the I/O slot with DRC index 21010001 to partition p1 and set the I/O pool ID for the slot to 3:

chhwres -r io -m sys1 -o a -p p1 -l 21010001 
-a "slot_io_pool_id=3"

Add I/O pools 2 and 3 to the I/O pools in which partition p1 is participating:

chhwres -r io --rsubtype iopool -m 9406-520*1234321A -o s 
-p p1 -a "\"lpar_io_pool_ids+=2,3\""

Add a virtual ethernet adapter to the partition with ID 3:

chhwres -r virtualio -m 9406-520*1234321A -o a --id 3 
--rsubtype eth -a "ieee_virtual_eth=1,port_vlan_id=4,\"addl_vlan_ids=5,6\",is_trunk=1,trunk_priority=1"

Remove the virtual adapter in slot 3 from partition p1:

chhwres -r virtualio -m sys1 -o r -p p1 -s 3

Enable HSL OptiConnect for the i5/OS partition i5_p1:

chhwres -r virtualio -m sys1 -o s -p i5_p1 
--rsubtype hsl -a "hsl_pool_id=1"

Add 128 MB of memory to the partition with ID 1, and time out after 10 minutes:

chhwres -r mem -m sys1 -o a --id 1 -q 128 -w 10

Remove 512 MB of memory from the AIX partition aix_p1, return a detail level of 5:

chhwres -r mem -m 9406-520*1234321A -o r -p aix_p1 -q 512 
-d 5

Set the number of pages of huge page memory requested for the managed system to 2 (the managed system must be 
powered off):

chhwres -r mem -m sys1 -o s -a "requested_num_sys_huge_pages=2"

Move 1 processor from partition p1 to partition p2 (both partitions are using dedicated processors):

chhwres -r proc -m 9406-520*1234321A -o m -p p1 -t p2 
--procs 1

Move .5 processing units from the partition with ID 1 to the partition with ID 2 (both partitions are using 
shared processors):

chhwres -r proc -m sys1 -o m --id 1 --tid 2 --procunits .5

Add .25 processing units to the i5/OS partition i5_p1 and add 10 percent 5250 CPW:

chhwres -r proc -m sys1 -o a -p i5_p1 --procunits .25 
--5250cpwpercent 10


lshwres command:
----------------

lshwres -m "managed-system" [-p "partition-name" | -all ] -r [ resource-type ] [ -y "led-type" ] 
                            [ -F < format > ] [-help]

List system level memory information and include the minimum memory required to support a maximum of 1024 MB: 

# lshwres -r mem --level sys --maxmem 1024

List all memory information for partitions lpar1 and lpar2, and only display attribute values, following a 
header of attribute names: 

# lshwres -r mem --level lpar --filter "\"lpar_names=lpar1,lpar2\"" -F --header

List all I/O units on the system: 

# lshwres -r io --rsubtype unit

List all virtual Ethernet adapters on the managed system: 

# lshwres -r virtualio --rsubtype eth --level lpar

List all virtual slots for partition lpar1: 

# lshwres -r virtualio --rsubtype slot --level slot --filter "lpar_names=lpar1"

List only the installed and configurable processors on the system: 

# lshwres -r proc --level sys -F installed_sys_proc_units,configurable_sys_proc_units
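
With -F, lshwres prints the requested attribute values as one comma-separated line, which is convenient for 
scripting. As a sketch, the difference between installed and configurable processing units can be computed with awk. 
The function name and the sample values are assumptions made for illustration, not output captured from a real system.

```shell
# unconfigured_proc_units: read one "installed,configurable" line on stdin
# and print installed minus configurable processing units.
unconfigured_proc_units() {
  awk -F, 'NF == 2 { printf "%.2f\n", $1 - $2 }'
}

# Hypothetical sample line standing in for the lshwres output above.
echo "8.0,6.0" | unconfigured_proc_units
```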


lpar_netboot command:
---------------------

NAME 

lpar_netboot - retrieve MAC address and physical location code from network adapters for a partition or 
instruct a partition to network boot 

SYNOPSIS 

-- To retrieve MAC address and physical location code:

lpar_netboot -M -n [-v] [-x] [-f] [-i] [-A] -t ent [-D -s speed -d duplex -S server -G gateway -C client] 
             partition-name partition-profile managed-system

-- To perform network boot:

lpar_netboot [-v] [-x] [-f] [-i] [-g args] [{-A -D | [-D] -l physical-location-code | [-D] -m MAC-address}] 
             -t ent -s speed -d duplex -S server -G gateway -C client partition-name partition-profile managed-system

-- To retrieve MAC address and physical location code on a system supporting a full system partition:

lpar_netboot -M -n [-v] [-x] [-f] [-i] [-A] -t ent [-D -s speed -d duplex -S server -G gateway -C client] 
             managed-system managed-system

-- To perform network boot on a system supporting a full system partition:

lpar_netboot [-v] [-x] [-f] [-i] [-g args] [{-A -D | [-D] -l physical-location-code | [-D] -m MAC-address}] 
             -t ent -s speed -d duplex -S server -G gateway -C client managed-system managed-system

DESCRIPTION 

lpar_netboot instructs a logical partition to network boot by having it send out a bootp request to a server 
specified with the -S option. The server can be an AIX(R) NIM server serving SPOT resources or any server 
serving network boot images. If specified with the -M and -n options, lpar_netboot will return the 
Media Access Control (MAC) address and the physical location code for a network adapter of the type specified 
with the -t option. When the -m option is specified, lpar_netboot will boot a partition using the network adapter 
which has the specified MAC address. When the -l option is specified, lpar_netboot will boot a partition using 
the network adapter which has the specified physical location code. The MAC address and physical location code 
of a network adapter depend on the hardware resource allocation in the partition profile the partition 
was booted with. The lpar_netboot command requires arguments for partition name, partition profile, and the name 
of the managed system which has the partition. 

OPTIONS 

-A 

Return all adapters of the type specified with the -t option. 

-C 

The IP address of the partition to network boot. 

-D 

Perform a ping test and use the adapter that successfully pings the server specified with the -S option. 

-G 

The gateway IP address of the partition specified with the -C option. 

-M 

Discover network adapter MAC address and physical location code. 

-S 

The IP address of the machine from which to retrieve the network boot image during network boot. 

-d 

The duplex setting of the partition specified with the -C option. Valid values are full, half, and auto. 

-f 

Force close the virtual terminal session for the partition. 

-g 

Specify generic arguments for booting the partition. 

-i 

Force immediate shutdown of the partition. If this option is not specified, a delayed shutdown will be performed. 

-l 

The physical location code of the network adapter to use for network boot. 

-m 

The MAC address of the network adapter to use for network boot. 

-n 

Instruct the partition to not network boot. 

-s 

The speed setting of the partition specified with the -C option. Valid values are 10, 100, 1000, and auto. 

-t 

The type of adapter for MAC address or physical location code discovery or for network boot. The only valid value is 
ent for ethernet. 

-v 

Display additional information during command execution. 

-x 

Display debug output during command execution. 

partition-name

The name of the partition.

partition-profile

The name of the partition profile.

managed-system

The name of the managed system which has the partition.

--help 

Display the help text for this command and exit. 

EXAMPLES 

To retrieve the MAC address and physical location code for partition machA with partition profile machA_prof 
on managed system test_sys:

$ lpar_netboot -M -n -t ent "machA" "machA_prof" "test_sys"

To network boot the partition machA with partition profile machA_prof on managed system test_sys:

$ lpar_netboot -t ent -s auto -d auto -S 9.3.6.49 -G 9.3.6.1 -C 9.3.6.234 "machA" "machA_prof" "test_sys"

To network boot the partition machA using the network adapter with a MAC address of 00:09:6b:dd:02:e8 with 
partition profile machA_prof on managed system test_sys:

$ lpar_netboot -t ent -m 00096bdd02e8 -s auto -d auto -S 9.3.6.49 -G 9.3.6.1 -C 9.3.6.234 "machA" "machA_prof" "test_sys"
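
Note that the MAC address is passed to -m without the ":" separators. The sketch below converts the usual colon 
notation into that form. The function name mac_for_netboot is hypothetical, and the lowercasing is only done to match 
the example above; whether lpar_netboot requires lowercase is not stated here.

```shell
# mac_for_netboot MAC: strip ":" separators, lowercase the hex digits,
# and check that exactly 12 hex digits remain.
mac_for_netboot() {
  mac=$(printf '%s' "$1" | tr -d ':' | tr 'A-F' 'a-f')
  case "$mac" in
    *[!0-9a-f]*) echo "not a MAC address: $1" >&2; return 1 ;;
  esac
  [ ${#mac} -eq 12 ] || { echo "not a MAC address: $1" >&2; return 1; }
  printf '%s\n' "$mac"
}

mac_for_netboot 00:09:6b:dd:02:e8
```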

To network boot the partition machA using the network adapter with a physical location code of 
U1234.121.A123456-P1-T6 with partition profile machA_prof on managed system test_sys:

$ lpar_netboot -t ent -l U1234.121.A123456-P1-T6 -s auto -d auto -S 9.3.6.49 -G 9.3.6.1 -C 9.3.6.234 "machA" 
  "machA_prof" "test_sys"

To perform a ping test along with a network boot of the partition machA with partition profile machA_prof on 
managed system test_sys:

$ lpar_netboot -t ent -D -s auto -d auto -S 9.3.6.49 -G 9.3.6.1 -C 9.3.6.234 "machA" "machA_prof" "test_sys"



Other HMC commands:
-------------------

Activate a logical partition						chsysstate  
Change the default partition profile for a logical partition		chsyscfg   
Close a virtual terminal session for an AIX, Linux, or 
Virtual I/O Server partition						rmvterm   
Create a logical partition on a managed system				mksyscfg   
Create a logical partition profile on a managed system			mksyscfg  
Create a Virtual I/O Server						mksyscfg
Issue a command to a Virtual I/O Server					viosvrcmd   
Modify memory resources of a logical partition				chhwres   
Modify processing resources of a logical partition			chhwres   
Modify the properties of a logical partition profile			chsyscfg 
Modify the hardware resource configuration of a logical partition	chhwres   
Modify the I/O resources of a logical partition				chhwres   
Modify the keylock position on a logical partition			chsysstate   
Modify the properties of a logical partition				chsyscfg   
Modify virtual I/O resources of a logical partition			chhwres  
Open a virtual terminal session for an AIX, Linux, or 
Virtual I/O Server partition 						mkvterm   
Perform a Dynamic Logical Partitioning task				chhwres   
Perform a network boot of a logical partition				lpar_netboot   
Perform an operator panel function on a logical partition		chsysstate  
Remove a logical partition from the managed system			rmsyscfg   
Remove a logical partition profile					rmsyscfg  
Restart a logical partition						chsysstate   
Retrieve MAC address and location code for a partition			lpar_netboot   
Shut down a logical partition						chsysstate
View HCA adapter resources of a logical partition			lshwres
View I/O resources of a logical partition				lshwres
View logical partition profiles						lssyscfg
View logical partitions							lssyscfg
View processing resources of a logical partition			lshwres
View memory resources of a logical partition				lshwres
View reference codes for a logical partition				lsrefcode
View SNI adapter resources of a logical partition			lshwres
View virtual I/O resources of a logical partition			lshwres


Example mksyscfg:
-----------------

Get the configuration data from an existing LPAR:

If you already have LPARs created, you can use this command to get their configuration, which can be reused as a template:
 
 lssyscfg -r prof -m SERVERNAME --filter "lpar_ids=X,profile_names=normal"

Create a new LPAR using the command line:

Here is an example; for more information see "man mksyscfg". The attribute list is a single comma-separated string 
without spaces, enclosed in double quotes:
 
 mksyscfg -r lpar -m MACHINE -i "name=LPARNAME,profile_name=normal,lpar_env=aixlinux,min_mem=512,desired_mem=2048,max_mem=4096,proc_mode=shared,min_proc_units=0.2,desired_proc_units=0.5,max_proc_units=2.0,min_procs=1,desired_procs=2,max_procs=2,sharing_mode=uncap,uncap_weight=128,boot_mode=norm,conn_monitoring=1,shared_proc_pool_util_auth=1"

Create several LPARs using a configuration file:

If you want to create several LPARs at once, you can use a configuration file and provide it as input to mksyscfg.
Here is an example for 3 LPARs, each definition starting on a new line: 
 
name=LPAR1,profile_name=normal,lpar_env=aixlinux,all_resources=0,min_mem=1024,desired_mem=9216,max_mem=9216,proc_mode=shared,min_proc_units=0.3,desired_proc_units=1.0,max_proc_units=3.0,min_procs=1,desired_procs=3,max_procs=3,sharing_mode=uncap,uncap_weight=128,lpar_io_pool_ids=none,max_virtual_slots=10,"virtual_scsi_adapters=6/client/4/vio1a/11/1,7/client/9/vio2a/11/1","virtual_eth_adapters=4/0/3//0/1,5/0/4//0/1",boot_mode=norm,conn_monitoring=1,auto_start=0,power_ctrl_lpar_ids=none,work_group_id=none,shared_proc_pool_util_auth=1
name=LPAR2,profile_name=normal,lpar_env=aixlinux,all_resources=0,min_mem=1024,desired_mem=9216,max_mem=9216,proc_mode=shared,min_proc_units=0.3,desired_proc_units=1.0,max_proc_units=3.0,min_procs=1,desired_procs=3,max_procs=3,sharing_mode=uncap,uncap_weight=128,lpar_io_pool_ids=none,max_virtual_slots=10,"virtual_scsi_adapters=6/client/4/vio1a/12/1,7/client/9/vio2a/12/1","virtual_eth_adapters=4/0/3//0/1,5/0/4//0/1",boot_mode=norm,conn_monitoring=1,auto_start=0,power_ctrl_lpar_ids=none,work_group_id=none,shared_proc_pool_util_auth=1
name=LPAR3,profile_name=normal,lpar_env=aixlinux,all_resources=0,min_mem=1024,desired_mem=15360,max_mem=15360,proc_mode=shared,min_proc_units=0.4,desired_proc_units=1.0,max_proc_units=4.0,min_procs=1,desired_procs=4,max_procs=4,sharing_mode=uncap,uncap_weight=128,lpar_io_pool_ids=none,max_virtual_slots=10,"virtual_scsi_adapters=6/client/4/vio1a/13/1,7/client/9/vio2a/13/1","virtual_eth_adapters=4/0/3//0/1,5/0/4//0/1",boot_mode=norm,conn_monitoring=1,auto_start=0,power_ctrl_lpar_ids=none,work_group_id=none,shared_proc_pool_util_auth=1

Copy this file to HMC and run: 
 
mksyscfg -r lpar -m SERVERNAME -f /tmp/profiles.txt
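
When many similar LPARs are needed, the configuration file can be generated instead of typed. The following is a 
minimal sketch that prints one definition line per LPAR. The function names are hypothetical, and the attribute set 
is simply the one from the command-line example in this section with an assumed memory size; a real profile will 
usually need the virtual adapter attributes as well.

```shell
# gen_profile_line NAME DESIRED_MEM_MB
# Print one LPAR definition line in the "attribute=value,..." format.
gen_profile_line() {
  name=$1 mem=$2
  printf '%s\n' "name=$name,profile_name=normal,lpar_env=aixlinux,min_mem=512,desired_mem=$mem,max_mem=$mem,proc_mode=shared,min_proc_units=0.2,desired_proc_units=0.5,max_proc_units=2.0,min_procs=1,desired_procs=2,max_procs=2,sharing_mode=uncap,uncap_weight=128,boot_mode=norm,conn_monitoring=1,shared_proc_pool_util_auth=1"
}

# gen_profiles COUNT DESIRED_MEM_MB: print definitions for LPAR1..LPARn.
gen_profiles() {
  i=1
  while [ "$i" -le "$1" ]; do
    gen_profile_line "LPAR$i" "$2"
    i=$((i + 1))
  done
}

# Prints three lines; redirect to a file to use it with "mksyscfg -f".
gen_profiles 3 2048
```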
 




Installation of the Virtual I/O Server software on the VIOS partition:
----------------------------------------------------------------------

You can install in one of three ways:
1. Using the CD/DVD drive allocated to the VIOS partition and booting from it
2. Installing the VIOS software from the HMC
3. Installing the media using NIM and the HMC

In this example we assume you use the allocated CD/DVD drive.

1. Put the DVD in the drive.
2. Activate the VIOS partition by right-clicking the partition name and selecting the
   Activate choice.
3. Select the profile, check "Open a terminal window", and click the "Advanced" button.
   Under the Boot mode choice, select "SMS" boot mode.
4. After booting the partition, the SMS menu appears.

   Main menu
   1. Select Language
   2. Setup Remote IPL
   3. Change SCSI Settings
   4. Select Console
   5. Select Boot Options

Choose 5 for "Select Boot Options".
Next, Choose 1 for "Select Install/Boot Device".
Next, choose 3 for CD/DVD.
Next, choose 4 for IDE.
Next, choose 1 for IDE CD-ROM.
Next, choose 2 for Normal Mode Boot.

When the installation has finished, use the padmin user to login.
After logging in, you will be placed in the IOSCLI. Type the following command
to accept the license:

# license -accept


Installing AIX using the CD-ROM device to install a partition with an HMC:
--------------------------------------------------------------------------

This information contains procedures to install the AIX operating system. For more information on concepts 
and considerations involved when performing a base operating system installation of AIX, 
or concepts and requirements involved when using the Network Installation Manager (NIM) to install 
and maintain AIX, refer to the AIX 5L Installation Guide and Reference.

Note:
For the installation method that you choose, ensure that you follow the sequence of steps as shown. 
Within each procedure, you must use AIX to complete some installation steps, while other steps are completed 
using the HMC interface.
In this procedure, you will perform a New and Complete Base Operating System Installation on 
a logical partition using the partition's CD-ROM device. This procedure assumes that there is an HMC 
attached to the managed system.

Prerequisites
Before you begin this procedure, you should have already used the HMC to create a partition 
and partition profile for the client. Assign the SCSI bus controller attached to the CD-ROM device, 
a network adapter, and enough disk space for the AIX operating system to the partition. 
Set the boot mode for this partition to be SMS mode. After you have successfully created the partition 
and partition profile, leave the partition in the Ready state. For instructions about how to create 
a logical partition and partition profile, refer to the Creating logical partitions and partition profiles 
article in the IBM eServer Hardware Information Center.

1. Activate and install the partition (perform these steps in the HMC interface)

__  Step 1. Activate the partition, as follows: 

Insert the AIX 5L Volume 1 CD into the CD device of the managed system. 
Right-click on the partition to open the menu. 
Select Activate. The Activate Partition menu opens with a selection of partition profiles. 
Be sure the correct profile is highlighted. 
Select Open a terminal window or console session at the bottom of the menu to open 
a virtual terminal (vterm) window. 

Select Advanced to open the Advanced options menu. 
For the Boot mode, select SMS. 
Select OK to close the Advanced options menu. 
Select OK. A vterm window opens for the partition.

__  Step 2. In the SMS menu on the vterm, do the following: 

Press the 5 key and press Enter to select 5. Select Boot Options.

PowerPC Firmware
Version SF220_001
SMS 1.5 (c) Copyright IBM Corp. 2000, 2003  All rights reserved.
-------------------------------------------------------------------------------
Main Menu

1. Select Language
2. Setup Remote IPL (Initial Program Load)
3. Change SCSI Settings
4. Select Console
5. Select Boot Options


-------------------------------------------------------------------------------


Press the 2 key and press Enter to select 2. Select Boot Devices. 
Press the 1 key and press Enter to select 1. Select 1st Boot Device. 
Press the 3 key and press Enter to select 3. CD/DVD. 
Select the media type that corresponds to the CD-ROM device and press Enter. 
Select the device number that corresponds to the CD-ROM device and press Enter. 
 The CD-ROM device is now the first device in the Current Boot Sequence list. 
Press the ESC key until you return to the Configure Boot Device Order menu. 
Press the 2 key to select 2. Select 2nd Boot Device. 
Press the 5 key and press Enter to select 5. Hard Drive. 
If you have more than one hard disk in your partition, determine which hard disk you will use 
to perform the AIX installation. Select the media type that corresponds to the hard disk and press Enter. 
Select the device number that corresponds to the hard disk and press Enter. 
Press the x key to exit the SMS menu. Confirm that you want to exit SMS.

__  Step 3. Boot from the AIX 5L Volume 1, as follows: 

Select console and press Enter. 
Select language for BOS Installation menus, and press Enter to open the Welcome to Base Operating System 
Installation and Maintenance menu. 
Type 2 to select Change/Show Installation Settings and Install in the Choice field and press Enter. 

                     Welcome to Base Operating System
                      Installation and Maintenance

Type the number of your choice and press Enter.  Choice is indicated by >>>.

    1 Start Install Now with Default Settings  

    2 Change/Show Installation Settings and Install

    3 Start Maintenance Mode for System Recovery

    88  Help ?
    99  Previous Menu
>>> Choice [1]: 2


__  Step 4. Verify or Change BOS Installation Settings, as follows: 

Type 1 in the Choice field to select the System Settings option. 
Type 1 for New and Complete Overwrite in the Choice field and press Enter. 
Note:
The installation methods available depend on whether your disk has a previous version of AIX installed.
When the Change Disk(s) screen displays, you can change the destination disk for the installation. 
If the default shown is correct, type 0 in the Choice field and press Enter. 
To change the destination disk, do the following: 

 Type the number for each disk you choose in the Choice field and press Enter. Do not press Enter 
 a final time until you have finished selecting all disks. If you must deselect a disk, type its number 
 a second time and press Enter. 

 When you have finished selecting the disks, type 0 in the Choice field and press Enter. 
 The Installation and Settings screen displays with the selected disks listed under System Settings.

If needed, change the primary language environment. Use the following steps to select the language and 
cultural convention that this installation will use. 


Note:
Changes to the primary language environment do not take effect until after the Base Operating System Installation 
has completed and your system is rebooted.

Type 2 in the Choice field on the Installation and Settings screen to select the Primary Language Environment 
Settings option. 
Select the appropriate set of cultural convention, language, and keyboard options. 
Most of the options are a predefined combination; however, you can define your own combination of options. 
To choose a predefined Primary Language Environment, type that number in the Choice field and press Enter. 


Monitoring VIOS:
----------------

Note 1:
-------

With Virtual I/O Server fix pack 8.1.0, you can install and configure the 
"IBM Tivoli Monitoring System Edition for System p agent" on the Virtual I/O Server. 
IBM Tivoli Monitoring System Edition for System p enables you to monitor the health and availability 
of multiple IBM System p servers (including the Virtual I/O Server) from the Tivoli Enterprise Portal.

IBM Tivoli Monitoring System Edition (SE) for System p V6.1 is a new offering of the popular IBM Tivoli 
Monitoring (ITM) product specifically designed for IBM System p AIX customers. ITM SE for System p V6.1 monitors 
the health and availability of System p servers, providing rich graphical views of your AIX, LPAR, CEC, 
and VIOS resources in a single console, delivering robust monitoring and quick time to value.

ITM SE for System p includes out-of-the-box best practice solutions created by AIX and VIOS developers. 
These best practice solutions include predefined thresholds for alerting on key metrics, Expert Advice that 
provides an explanation of the alert and recommends potential actions to take to resolve the issue, and the 
ability to take resolution actions directly from the Tivoli Enterprise Portal or set up automated actions. 
In addition, users have the ability to visualize the monitoring data in the Tivoli Enterprise Portal to determine 
the current state of the AIX, LPAR, CEC, and VIOS resources.

Note 2:
-------

How to monitor IBM's Virtual-IO-Server (VIO) with OpenSMART
The following steps show how IBM's Virtual-IO-Server can be monitored by OpenSMART.

Download the latest agent- (or whole source-) pack from the OpenSMART home page.

Transfer this tar file to the VIO-Server (we assume to /tmp/opensmart-client-0.4.tar.gz) and do:

telnet vio-server

IBM Virtual I/O Server

login: padmin
padmin's Password: 
Last unsuccessful login: Tue Feb 28 03:08:08 CST 2006 on /dev/vty0 
Last login: Wed Mar 15 16:14:11 CST 2006 on /dev/pts/0 from 192.168.1.1

$ oem_setup_env
# mkdir /home/osmart
# useradd -c "OpenSMART Monitoring" -d /home/osmart osmart
# chown -R osmart:staff /home/osmart
# passwd osmart
Changing password for "osmart"
osmart's New password: ******
Enter the new password again: *****

# su - osmart
$ mkdir ostemp
$ cd ostemp
$ gunzip /tmp/opensmart-client-0.4.tar.gz
$ tar -xf /tmp/opensmart-client-0.4.tar
$ ./agent/install_agent ~
[ ... ]
Copy ../lib/opensmartresponse.dtd -> /usr/local/debis/os/etc/opensmartresponse.dtd
chmod 644 /usr/local/debis/os/etc/opensmartresponse.dtd



     **********************************************
     *   OpenSMART agent installed successfully   *
     **********************************************

$ cd ~
$ rm -rf ostemp
        
That's it - your installation is complete. Now you can configure your osagent (and do not forget to set up 
a cronjob for your osagent).

We recommend the following checks:


DISK Section 9.12, "Configuration for the disk check."
LOGS Section 9.20, "Configuration for the logs check."
ERRPT Section 9.14, "Configuration for the errpt check"
LOAD Section 9.19, "Configuration for the load check."
PROC Section 9.35, "Configuration for the proc check."
AIXSWRAID Section 9.8, "Configuration for the aixswraid check."


Note 3: lpar2rrd
----------------

LPAR2RRD Micro-Partitioning statistics tool
The tool can produce historical graphs of shared CPU usage on micro-partitioned systems.
The idea and rough design were initiated by Ondrej Plachy, IBM Czech Republic. The tool itself was written 
by Pavel Hampl, IBM Czech Republic.

FEATURES

it is intended only for Micro-Partitioned systems 
it creates charts based on utilization data collected on HMCs 
it does not need any clients (agents) on LPARs 
it automatically creates a menu-based WWW front end for viewing charts 
it is easy to install, configure and use (initial configuration should not take more than an hour; adding another HMC 
   takes up to 5 minutes) 
it needs no additional management when the LPAR configuration changes (the tool recognizes new LPARs automatically) 
it supports all types of LPAR micro-partitions and OSes (pSeries/iSeries, AIX/Linux/AS400) 
it is supported only on HMC >= V5R2.1 (the HMC must support utilization data collection; check the prerequisites for more) 
when utilization data collection is supported but disabled, the tool prompts you to enable it (the HMC then needs at 
   least 2 hours to collect data before the tool can show anything in the charts) 
graphs go back 1 year if historical utilization data is available on the HMCs (note that the HMC saves hourly averages 
   for the last 2 months and daily averages for the last 2 years when data collection is enabled) 
initially the tool loads all historical data going back 1 year; after that it loads only new data every hour, saves it 
   in RRDTool databases and redraws the graphs 
it uses ssh-key based access to the HMC servers to get data automatically 
optionally, for one-time chart creation, any HMC account with password authentication can be used (you will be prompted 
   to type the password a couple of times while the tool is running) 
it does not put considerable load on the server hosting the tool (it runs once an hour for a couple of seconds) 
the tool can be hosted on any *NIX platform; it just needs a web server, ssh, perl and RRDTool installed (check the 
   prerequisites) 
it creates 4 kinds of graphs for each LPAR, shared pool and memory pool. The first 3 (last day, week and month) are based 
   on hourly average data; the last one (yearly chart) is based on daily averages 







More information about pSeries lpars and AIX:
---------------------------------------------

www.ibm.com/servers/eserver/pseries/lpar/
publib.boulder.ibm.com/infocenter/pseries/index.jsp?topic=/com.ibm.help.doc/welcome.htm


Errors at VIOS:
---------------

Note 1:
-------

Procedure: Install/Update HDLM drivers


# login to vio server as "padmin".
# Switch to "oem" prompt.
oem_setup_env
umount /mnt
mount bosapnim01:/export/lpp_source/hitachi /mnt

#  Install and update all filesets from the directories below.
#  "smitty install_all"
cd /mnt/hdlm_5601/aix_odm/V5.0.0.1
cd /mnt/hdlm_5601/aix_odm/V5.0.0.4u
cd /mnt/hdlm_5601/aix_odm/V5.0.1.4U
cd /mnt/hdlm_5601/aix_odm/V5.0.52.1U

#  Copy license file.
cd /mnt/hdlm_5601/license/enterprise
cp *.plk /var/tmp/hdlm_license

#  install and update all filesets from the above directory
#  "smitty install_all"
#  Fileset DLManager 5.60.1.100  Hitachi Dynamic Link Manager
cd /mnt/hdlm_5601

#  Leave the current Directory and unmount Driver Source Directory.
cd /
umount /mnt


Procedure: Install/Update VIO fixpack:
======================================

# Login to VIO server as "padmin"
# Obtain current IOS level
ioslevel

# Update VIO to latest IOS level
mount bosapnim01:/export/lpp_source/aix/vio_1200 /mnt
updateios -dev /mnt
	** Enter "y" to continue install

# Return to "root" shell prompt and HALT system.
oem_setup_env
shutdown -Fh

# Activate LPAR from HMC WebSM


Procedure: Configure VIO Server to utilize Boot Disks:
======================================================


# Login as "padmin"
# Switch to "oem" prompt
oem_setup_env

# Run in korn shell 93
  ksh93
  
# Remove any vhost adapter configuration settings
  for (( i=0; i<=48; ++i ))
  do
    /usr/ios/cli/ioscli rmdev -pdev vhost${i}
  done

# Remove all HDLM disks
  for i in $( lsdev -Cc disk -F name | grep dlmfdrv )
  do
    rmdev -Rdl ${i}
  done

# Remove all hdisks except for hdisk0 and hdisk1 - assumed to be rootvg
  for i in $( lsdev -Cc disk -F name | grep hdisk | egrep -v 'hdisk0$|hdisk1$' )
  do
    rmdev -Rdl ${i}
  done

# If an HDLM unconfig file exists, rename it 
  [[ -f /usr/DynamicLinkManager/drv/dlmfdrv.unconf ]] &&
  mv /usr/DynamicLinkManager/drv/dlmfdrv.unconf \
     /usr/DynamicLinkManager/drv/$( date +"%Y%m%d").dlmfdrv.unconf

#  Verify "dlmfdrv.unconf" was renamed.  
   ls /usr/DynamicLinkManager/drv
	
# Set fast fail Parameter for SCSI Adapters and Reconfigure FC Adapters
  chdev -l fscsi0 -a fc_err_recov=fast_fail
  chdev -l fscsi1 -a fc_err_recov=fast_fail
  chdev -l fscsi2 -a fc_err_recov=fast_fail
  cfgmgr -vl fcs0
  cfgmgr -vl fcs1
  cfgmgr -vl fcs2

# Change HDLM settings
  cd /usr/DynamicLinkManager/bin
  print y | ./dlmodmset -e on
  print y | ./dlmodmset -b 68608

# Reconfigure HDLM disks
  ./dlmcfgmgr

# Turn off reserve settings on HDLM Driver
  ./dlnkmgr set -rsv on 0 -s

# Remove HDLM disks
  for i in $( lsdev -Cc disk -F name | grep dlmfdrv )
  do
    rmdev -Rdl ${i}
  done

# Change reserve policy on hdisks to "no_reserve"
  for i in $( lsdev -Cc disk -F name |
              grep hdisk |
              egrep -v 'hdisk0$|hdisk1$' )
  do
    chdev -l ${i} -a reserve_policy=no_reserve
  done

# Reconfigure HDLM disks
  ./dlmcfgmgr

# Verify all HDLM disks have an assigned PVID
  for i in $( lsdev -Cc disk -F name | grep dlmfdrv )
  do
    chdev -l ${i} -a pv=yes
  done
  lspv

# List the vhost adapter mappings
 /usr/ios/cli/ioscli lsmap -all

# Verify that all vhost adapters exist without devices.
SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost0          U9119.590.51A432C-V3-C10                     0x00000000

VTD                   NO VIRTUAL TARGET DEVICE FOUND

# Reboot VIO Server.
shutdown -Fr


# End of Final Procedure




# Do not perform this step as part of this procedure
 for (( i=0; i<=48; ++i ))
  do
    /usr/ios/cli/ioscli rmdev -pdev vhost${i}
  done
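
The disk-selection pipeline used in the procedure above only protects the rootvg disks if the egrep pattern 
is anchored and contains no stray spaces. The following is a minimal sketch, runnable without lsdev, that feeds 
sample device names through the same kind of filter. The function name non_rootvg_hdisks and the sample names 
are made up for illustration, and the pattern here is anchored at both ends, which is slightly stricter than 
the one in the procedure.

```shell
#!/bin/sh
# Keep only hdisk names that are NOT hdisk0 or hdisk1 (assumed to be rootvg).
# Reads one device name per line on stdin, the way "lsdev -Cc disk -F name"
# prints them.
non_rootvg_hdisks() {
    grep hdisk | grep -E -v '^(hdisk0|hdisk1)$'
}

# Sample names standing in for real lsdev output (illustrative only).
printf '%s\n' hdisk0 hdisk1 hdisk10 hdisk11 dlmfdrv0 hdisk2 | non_rootvg_hdisks
```

Note that a pattern written with spaces around the bar ('hdisk0$ | hdisk1$') matches no device name at all, 
so "egrep -v" would let every disk through, including hdisk0 and hdisk1.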



Other notes on VIOS:
--------------------

Note 1:
-------

Getting a Root Shell on VIOS 
IBM don't like people using the root account on their VIOS servers, but it is kind of useful for setting up 
things like the correct date. Just try the oem_setup_env command. 

Note 2:
-------

VIOS Install 
On our p5-550, I have allocated most physical devices to the VIOS LPAR so it can be used to divide these 
amongst AIX/Linux LPARs. The VIOS LPAR has four gigabit ethernet adapters allocated to it. 
Presently only two are in use as an aggregated link to the "real world". It also has a virtual ethernet adapter 
which connects internally to the p5-550. 

As for storage, there are 6 copper SCSI controllers/chains allocated and 3 FC HBAs. The ordinary SCSI stuff 
has 6 disks (2 per chain) with 4 36Gbyte disks and 2 144Gbyte disks. Two of the 36Gbyte disks are allocated 
to the rootvg volume group and are mirrored (with mirrorios). The remainder of the disks are allocated to a 
clients volume group. It is intended that new clients will have a logical volume for the operating system allocated 
out of this pool. 

Two of the Fiber Channel HBAs are assigned to the VIO partition and connected to port 13 of both switch ports 
in the SAN fabric. The SAN has been configured to attach an 860Gbyte RAID5 LUN to the IBM. Due to lack of 
multipathing support in VIOS, there are multiple apparent disks (hdisk6 ... hdisk13) which are in fact one. 
The first (hdisk6) was used to create the client_data volume group. It is intended that this volume group will 
be used for /data filesystems. 

Note: To add virtual devices dynamically to the VIOS partition, use the "Dynamic" option in the HMC. 


Networking on the VIOS LPAR:
---------------------------- 
Two (at present) of the gigabit adapters assigned to the VIOS LPAR are channelled together for redundancy. 
Telecomms will deliver all the relevant VLANS down this interface which can be bridged to internal Ethernets. 
Note that VLAN14 is configured as the native VLAN of the channel. To do this :- 


- Channel the two Ethernet NICs attached to the network: mkvdev -lnagg ent2 ent3 which produced ent5 
- Bridge between the channelled adapter and the internal network. 
  mkvdev -sea ent5 -vadapter ent4 -default ent4 -defaultid 1 which produced ent6 
- Configure the new bridge with an IP address: 
  mktcpip -hostname name -inetaddr 148.197.14.x -netmask 255.255.255.0 -gateway 148.197.14.254 -interface ent6 
- VLAN interfaces are unlikely to be necessary on the VIOS, but can be created :- 
  mkvdev -vlan ent6 -tagid 240. 

Creating a Client Logical Volume for System Disks:
-------------------------------------------------- 

- Create a logical volume for the relevant client. Its name should be easily identifiable as 
  being associated with the relevant client: mklv -lv clientname_sys clients 18G. This creates a logical 
  volume 18Gbytes in size (enough for an AIX or Linux operating system) on the clients volume group. 
- Mirror the logical volume for safety: mklvcopy lv_name. Warning, this is SLOW. 
- Assign the logical volume to a virtual adaptor: 
  mkvdev -vdev logical-volume -vadapter vhostN -dev name_for_target 

Creating a Client Logical Volume for Data Disks:
------------------------------------------------ 
- Find the virtual scsi adapter by running 
 lsdev -dev vvodka -parent 
 (of course this finds the vhost for vodka and not necessarily the one you are hunting for). 
- Create a logical volume for the relevant client. Its name should be easily identifiable 
  as being associated with the relevant client: 
  mklv -lv clientname_data clients 100G. This creates a logical volume 100Gbytes in size on the 
  clients volume group. 
- Assign the logical volume to a virtual adaptor: 
  mkvdev -vdev logical-volume -vadapter vhostN -dev name_for_target 
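
The system-disk and data-disk recipes above differ only in the volume suffix and the size. The following is a 
dry-run sketch that prints the commands for review instead of executing them. The function name client_lv_plan, 
the client name absinthe, the adapter vhost0 and the "vt" prefix for the target device are assumptions made for 
illustration; the mklv and mkvdev forms are the ones shown above.

```shell
#!/bin/sh
# Print (do not run) the VIOS commands that carve a client logical volume out
# of the "clients" volume group and map it to a vhost adapter.
# usage: client_lv_plan <client> <sys|data> <size> <vhost>
client_lv_plan() {
    client=$1 kind=$2 size=$3 vhost=$4
    case $kind in
        sys|data) ;;
        *) echo "kind must be sys or data" >&2; return 1 ;;
    esac
    lv="${client}_${kind}"
    echo "mklv -lv $lv clients $size"
    # The target device name is a made-up convention for this sketch.
    echo "mkvdev -vdev $lv -vadapter $vhost -dev vt$lv"
}

client_lv_plan absinthe sys 18G vhost0
client_lv_plan absinthe data 100G vhost0
```

Printing the plan first makes it easy to check the names and the vhost adapter before pasting the commands 
into the padmin shell.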


Note 3:
-------

Installing AIX on an LPAR via NIM 

- On vodka (the NIM server), configure a hostname for the machine in /etc/hosts. The hostname should be the final 
  hostname of the machine to install with a '-i' added to the end (absinthe becomes absinthe-i) 
  as the connection to the NIM server is on subnet 14. This also means the LPAR to be installed needs 
  an IP address on subnet 14. 
- Go to the NIM smitty menu (smitty nim) and "Perform NIM Administration Tasks". 
- Select "Manage Machines", and "Define a Machine". Give the installation hostname of the machine (absinthe-i), 
  and press Enter. 
- Select "ent" as the primary install interface. 
- On the large form, leave most things alone. But change the "Cable type" to "N/A" and hit Enter. On a previous attempt 
  it was also necessary to change the "Subnet Mask" to 255.255.255.0, and to change the "Default gateway" 
  to 148.197.14.254. 
- Back at the shell prompt, enter smitty nim_bosinst 
- Select the appropriate machine (if it isn't listed something went wrong). 
- Select an "rte" install type. 
- Select "lpp_source_530" as the LPP_SOURCE (package source) to use. 
- Select "spot_530" as the SPOT (install root) to use. 
- A long form then appears, scroll down to change the following parameters :- 
  RESOLV_CONF to use: (No choices!!) 
  ACCEPT new licenses: Change to "yes" (use Tab). 
  ACCEPT new license agreements: Change to "yes" (use Tab) 
  Press Enter to accept the changes. 
- Boot the LPAR to be installed via the HMC. Ensure that you select the "Advanced" button and specify "SMS" as the boot mode. 
- Once you have the console at the SMS menu, select "Setup Remote IPL". 
- Select the "Interpartition Logical LAN" device. 
- Select "IP parameters" 
- Specify the relevant IP addresses. 
- Go back to the main menu ("M") and select "Boot options". 
- Select "Configure Boot Device order" 
- Select "Select 1st boot device" 
- Select "Network" 
- Select "Virtual Ethernet" 
- After the virtual ethernet is specified as the boot device, exit SMS by entering "X" 
- The system should then boot over the network ... you will see lots of "IBM"'s appear on the screen followed by various messages preceded by "BOOTP". The machine waits 60s for "Spanning Tree" ... this is normal (unless of course you have turned it off!). 
- The boot process should go through a BOOTP phase (when it obtains an address and various parameters) followed by a TFTP stage when the kernel is loaded. 
- After the kernel has booted, you will be asked to enter a digit (either '0' or '1') to select the system console for the install process. 
- Then you will be asked to enter a digit for the preferred installation language. 
- Finally you will be into the standard AIX installation process ... just accept the default settings. 

NIM Hacking 

Create an lpp_source (source of packages) without copying from CDs with: 
 nim -o define -t lpp_source -a server=master -a location=/nim/lpp_source/lpp_source_530 lpp_source_530 
You can create a SPOT resource using a suitable lpp_source (i.e. a full AIX source) as the source. 

Note 4: Errors in VIOS:
-----------------------

Error ED995F18
--------------

VSCSI_ERR3
ED995F18
000DRCFFFF FFF9

The Virtual SCSI server adapter (partition number and slot number) specified in the client adapter definition 
does not exist

On the HMC, correct the client adapter definition to associate it with a valid server adapter.

Error BFE4C025
--------------

BFE4C025   0222073404 P H sysplanar0     UNDETERMINED ERROR
BFE4C025   0113124506 P H sysplanar0     UNDETERMINED ERROR
BFE4C025   0126122306 P H sysplanar0     UNDETERMINED ERROR
BFE4C025   0112174806 P H sysplanar0     UNDETERMINED ERROR
BFE4C025   0919215904 P H sysplanar0     UNDETERMINED ERROR
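
The second column of these errpt summary lines is a timestamp in MMDDhhmmYY form. The following small sketch 
decodes it so the entries can be sorted chronologically. The function name decode_errpt_ts is hypothetical, 
and the two-digit year is assumed to belong to the 2000s.

```shell
#!/bin/sh
# Decode an errpt summary timestamp (MMDDhhmmYY) into "20YY-MM-DD hh:mm".
# Assumption: the two-digit year belongs to the 2000s.
decode_errpt_ts() {
    ts=$1
    case $ts in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "not a 10-digit errpt timestamp: $ts" >&2; return 1 ;;
    esac
    mm=$(echo "$ts" | cut -c1-2)
    dd=$(echo "$ts" | cut -c3-4)
    hh=$(echo "$ts" | cut -c5-6)
    mi=$(echo "$ts" | cut -c7-8)
    yy=$(echo "$ts" | cut -c9-10)
    echo "20$yy-$mm-$dd $hh:$mi"
}

# The five timestamps listed above, printed in chronological order.
for ts in 0222073404 0113124506 0126122306 0112174806 0919215904
do
    decode_errpt_ts "$ts"
done | sort
```

Decoded this way, the entries span February 2004 to January 2006.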


thread:

A:

 I had the error on AIX 5.1; here is what IBM said: 

This is corrected with APAR IY46874, which ships devices.chrp.base.rte 
5.1.0.53. 

Apply that APAR and reboot the system. 

http://www-912.ibm.com/eserver/support/fixes/fcgui.jsp


Q:

 I get an error "BFE4C025" from errpt, and this error often happens on 
almost all types of RS/6000. 

When this happens, the system keeps running OK, and the 'diag' tools 
report no error information either. 
I don't know what is happening on the system. Who can help me? 


A:

Diags show the SRC, B700 F104, look it up here:
http://publib.boulder.ibm.com/infocenter/eserver/v1r3s/index.jsp?topic=/ipha6/refcodelist.htm

Operating System error 
Platform Licensed Internal Code terminated a partition.

If SRC word 3 is 0007, then a user may have initiated a function 22 prior to the operating system completing the IPL. 
If a function 22 was not performed, or if SRC word 3 is not 0007, look for other serviceable errors 
which occurred at same time frame.

Your word 3 is 0007 so it looks like some one forced a dump from the O/S or the Op Panel or the HMC, 
either by powering off an LPAR configured to dump or the LPAR crashed and dumped.

http://publib.boulder.ibm.com/infocenter/eserver/v1r3s/index.jsp?topic=/iphb5/f22msdc.htm

No indication of a hardware problem or any need to call support nagger.


Note:
-----

Error BFE4C025 can in fact be related to a hardware fault. We have seen that faulty system memory can lead to this
error in the error report:

error detail:

LABEL:          SCAN_ERROR_CHRP
IDENTIFIER:     BFE4C025

Date/Time:       Sat Apr 25 20:06:22 ZOM 2009
Sequence Number: 1110
Machine Id:      00CDA84C4C00
Node Id:         poon
Class:           H
Type:            PERM
Resource Name:   sysplanar0
Resource Class:  planar
Resource Type:   sysplanar_rspc
Location:

Description
UNDETERMINED ERROR

Failure Causes
UNDETERMINED

        Recommended Actions
        RUN SYSTEM DIAGNOSTICS.

Detail Data
PROBLEM DATA
..
lots of hex numbers
..




DLPAR scripts:
==============


Note 1:
-------

Abstract 
 
DLPAR scripts are written by system administrators or software vendors to automate system resources in a dynamic 
LPAR environment. Scripts can be implemented in any scripting language, such as perl or shell, or they can be 
compiled programs. They are maintained by the system administrator using the drmgr command. 
The following Tip provides an overview of how to craft a script.

For related information about this topic, refer to the following IBM Redbooks publication:
AIX 5L Differences Guide Version 5.2 Edition, SG24-5765-02 

Contents 

The syntax of the drmgr command is as follows: 

drmgr { -i script_name [-w minutes ] [ -f ] | -u script_name } [ -D hostname ]
drmgr [ -b ]
drmgr [ -R script_install_root_directory ]
drmgr [ -S syslog_ID ]
drmgr [ -l ] 

Descriptions of the most important flags for the drmgr command are provided in the following table. 
For a complete reference, refer to the man page or the documentation. 

Table 1. Flags of the drmgr command 

Flag            Description 

-i script_name  This flag is used to install a script specified by the script_name parameter. By default, 
                scripts are installed in the /usr/lib/dr/scripts/all directory. 
-w minutes      This flag is used to override the time limit value specified by the vendor for the script. 
                The script will be ended if it exceeds the specified time limit. 
-f              Using this flag forces an installed script to be overwritten. 
-u script_name  This flag is used to uninstall a script specified by the script_name parameter. 
-l              This option will display the details regarding the DLPAR scripts that are currently installed. 

For example, to install the /root/root_dlpar_test.sh script in the default directory, the following command could be used: 

drmgr -i /root/root_dlpar_test.sh 

To list the details, the drmgr -l command is used. The output is similar to the following: 

DR Install Root Directory: /usr/lib/dr/scripts
Syslog ID: DRMGR
------------------------------------------------------------
/usr/lib/dr/scripts/all/root_dlpar_test.sh DLPAR test script
Vendor:IBM, Version:1.0, Date:19092002
Script Timeout:10, Admin Override Timeout:0
Resources Supported:
Resource Name: cpu Resource Usage: root_dlpar_test.sh command [parameter]
------------------------------------------------------------ 

DLPAR scripts get notified at each of the DLPAR operation phases explained previously. Notifying DLPAR scripts 
involves invoking the scripts in the appropriate environment with the appropriate parameters. 

The environment the script is executed in is as follows: 

The execution user ID and group ID are set to the uid or gid of the script. 
The PATH environment is set to /usr/bin:/etc:/usr/sbin. 
The working directory is /tmp. 
Environment variables that describe the DLPAR event are set. 

DLPAR scripts can write any necessary output to stdout. The format of the output should be name=value pair strings 
separated by newline characters to relay specific information to the drmgr. For example, the output DR_VERSION=1.0 
could be produced with the following ksh command: 

echo "DR_VERSION=1.0" 

Error and logging messages are provided by DLPAR scripts in the same way as regular output by writing 
name=value pairs to stdout. The DR_ERROR=message pair should be used to provide error descriptions. 
The name=value pairs contain information to be used to provide error and debug output for the syslog. 

DLPAR scripts can also write additional information to stdout that will be reflected to the HMC. 
The level of information that should be provided is based on the detail level passed to the script 
in the DR_DETAIL_LEVEL=N environment variable. N must be in the range of 0 to 5, where the default value 
of zero (0) signifies no information. A value of one (1) is reserved for the operating system and is used 
to present the high-level flow. The remaining levels (2-5) can be used by the scripts to provide information 
with the assumption that larger numbers provide greater detail. 
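
As a sketch of how a script might honour DR_DETAIL_LEVEL, the helper below prints a name=value pair only when 
the requested detail level is high enough. The function name dr_emit is hypothetical, and the pair names 
DR_LOG_INFO and DR_LOG_DEBUG are not taken from the text above; they are assumed from the AIX DLPAR script 
documentation and should be checked against your AIX level.

```shell
#!/bin/sh
# Emit a name=value pair only when the caller asked for at least this much
# detail. Levels 2-5 are the ones available to scripts.
# usage: dr_emit <level> <NAME=value>
dr_emit() {
    level=$1 pair=$2
    # Treat an unset DR_DETAIL_LEVEL as 0, the default (no information).
    if [ "${DR_DETAIL_LEVEL:-0}" -ge "$level" ]; then
        echo "$pair"
    fi
}

# Simulate drmgr asking for detail level 3.
DR_DETAIL_LEVEL=3
dr_emit 2 "DR_LOG_INFO=stopping worker threads"     # printed: 2 <= 3
dr_emit 5 "DR_LOG_DEBUG=thread table dump follows"  # suppressed: 5 > 3
```

Keeping the level check in one helper means each command section of the script can log freely without 
repeating the comparison.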

The syntax the DLPAR script is invoked with is as follows: 

[ input_name1=value1 ... ] scriptname command [ input_parameter1 ... ] 

Input variables are set as environment variables on the command line, followed by the script to be invoked that 
is provided with a command and with further parameters. A description of the function the commands should perform 
is provided in the following table. If the script is called with a command that is not implemented, 
it should exit with a return code of 10.


Table 2. DLPAR script commands 

Command and parameter         Description 

scriptinfo                    Identifies the version, date, and vendor of the script. It is called when the 
                              script is installed. 
register                      Identifies the resources managed by the script. If the script returns the resource 
                              name (cpu or mem), the script will be automatically invoked when DLPAR attempts to 
                              reconfigure processors and memory, respectively. The register command is called 
                              when the script is installed with the DLPAR subsystem. 
usage resource_name           Returns information describing how the resource is being used by the application. 
                              The description should be relevant so that the user can determine whether to 
                              install or uninstall the script. It should identify the software capabilities of 
                              the application that are impacted. The usage command is called for each resource 
                              that was identified by the register command. 
checkrelease resource_name    Indicates whether the DLPAR subsystem should continue with the removal of the 
                              named resource. A script might indicate that the resource should not be removed 
                              if the application is not DLPAR-aware and the application is considered critical 
                              to the operation of the system. 
prerelease resource_name      Reconfigures, suspends, or terminates the application so that its hold on the 
                              named resource is released. 
postrelease resource_name     Reconfigures, resumes, or restarts the application. 
undoprerelease resource_name  Invoked if an error is encountered and the resource is not released. Operations 
                              done in the prerelease command should be undone. 
checkacquire resource_name    Indicates whether the DLPAR subsystem should proceed with the resource addition. 
                              It might be used by a license manager to prevent the addition of a new resource, 
                              for example, cpu, until the resource is licensed. 
preacquire resource_name      Used to prepare for a resource addition. 
undopreacquire resource_name  Invoked if an error is encountered in the preacquire phase or when the event is 
                              acted upon. Operations performed in the preacquire command should be undone. 
postacquire resource_name     Reconfigures, resumes, or starts the application. 

The input variables that are provided as environment variables are dependent on the resource that is operated on. 
For memory add and remove operations, the variables in the following table are provided (one frame is equal to 4 KB): 

Table 3. Input variables for memory add/remove operations 

Input variable 			Description 

DR_FREE_FRAMES=0xFFFFFFFF 	The number of free frames currently in the system, in hexadecimal format. 
DR_MEM_SIZE_COMPLETED=n 	The number of megabytes that were successfully added or removed, in decimal format. 
DR_MEM_SIZE_REQUEST=n 		The size of the memory request in megabytes, in decimal format. 
DR_PINNABLE_FRAMES=0xFFFFFFFF  	The total number of pinnable frames currently in the system, in hexadecimal format. 
				This parameter provides valuable information when removing memory in that it can be used 
				to determine when the system is approaching the limit of pinnable memory, 
				which is the primary cause of failure for memory remove requests. 
DR_TOTAL_FRAMES=0xFFFFFFFF 	The total number of frames currently in the system, in hexadecimal format. 
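
Because the frame counts arrive as hexadecimal strings and one frame is 4 KB, a script usually has to convert 
them before comparing them with the megabyte values. The following is a minimal sketch of that arithmetic; 
the function name frames_to_mb and the sample values are invented for illustration.

```shell
#!/bin/sh
# Convert a hexadecimal frame count (as in DR_FREE_FRAMES) to megabytes.
# One frame is 4 KB, so MB = frames * 4 / 1024 (integer division).
frames_to_mb() {
    frames=$(printf '%d' "$1")    # printf accepts the 0x prefix
    echo $(( frames * 4 / 1024 ))
}

# Sample values, invented for illustration.
DR_TOTAL_FRAMES=0x100000
DR_PINNABLE_FRAMES=0x40000
echo "total:    $(frames_to_mb "$DR_TOTAL_FRAMES") MB"
echo "pinnable: $(frames_to_mb "$DR_PINNABLE_FRAMES") MB"
```

With these sample values the script reports 4096 MB total and 1024 MB pinnable. A prerelease handler could 
compare the pinnable figure with DR_MEM_SIZE_REQUEST before agreeing to a memory removal.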

The environment variables provided in the following table are set for processor add and remove operations:


Table 4. Input variables for processor add/remove operations 

Input variable 		Description 

DR_BCPUID=N		The bind CPU ID of the processor that is being added or removed in decimal format. 
			A bindprocessor attachment to this processor does not necessarily mean that the attachment 
			has to be undone. This is only true if it is the Nth processor in the system, because 
			the Nth processor position is the one that is always removed in a CPU remove operation. 
			Bind IDs are consecutive in nature, ranging from 0 to N and are intended to identify only 
			online processors. Use the bindprocessor command to determine the number of online CPUs. 
DR_LCPUID=N 		The logical CPU ID of the processor that is being added or removed in decimal format. 

The following example shows a Korn shell script that can be installed. For simplicity and demonstration 
purposes this script does not take any action. The actions for the process to control would need to be included 
in the appropriate command section: 

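#!/usr/bin/ksh
# Skeleton DLPAR script: answers the informational commands and accepts
# every reconfiguration phase without taking any action.

# drmgr always passes a command as the first argument.
if [[ $# -eq 0 ]]
then
    echo "DR_ERROR= Script usage error"
    exit 1
fi

ret_code=0
command=$1
case $command in
    scriptinfo )
        # Identify the script to the DLPAR subsystem.
        echo "DR_VERSION=1.0"
        echo "DR_DATE=19092002"
        echo "DR_SCRIPTINFO=DLPAR test script"
        echo "DR_VENDOR=IBM";;
    usage )
        echo "DR_USAGE=root_dlpar_test.sh command [parameter]";;
    register )
        # Ask to be called for processor reconfigurations.
        echo "DR_RESOURCE=cpu";;
    # Resource addition phases: nothing to do in this skeleton.
    checkacquire )
        :;;
    preacquire )
        :;;
    undopreacquire )
        :;;
    postacquire )
        :;;
    # Resource removal phases: nothing to do in this skeleton.
    checkrelease )
        :;;
    prerelease )
        :;;
    undoprerelease )
        :;;
    postrelease )
        :;;
    * )
        # Command not implemented.
        ret_code=10;;
esac

exit $ret_code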
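
Because a command that is not implemented simply falls through to the catch-all and returns 10, a misspelled 
case label is easy to miss. The following is a small test harness, offered as a sketch, that calls a DLPAR 
script with every command from Table 2 and lists the ones it does not handle. The function name 
unimplemented_commands is hypothetical, and "cpu" is passed as a stand-in for the resource_name parameter.

```shell
#!/bin/sh
# Call a DLPAR script with every command name and print the ones it does not
# implement (signalled by exit code 10).
# usage: unimplemented_commands <path-to-script>
unimplemented_commands() {
    script=$1
    for cmd in scriptinfo register usage \
               checkacquire preacquire undopreacquire postacquire \
               checkrelease prerelease undoprerelease postrelease
    do
        # "cpu" stands in for the resource_name parameter.
        "$script" "$cmd" cpu >/dev/null 2>&1
        [ $? -eq 10 ] && echo "$cmd"
    done
    return 0
}

# Example (path taken from the drmgr example earlier in this note):
#   unimplemented_commands /root/root_dlpar_test.sh
```

An empty result means every command is handled; running this before "drmgr -i" is cheaper than discovering a 
gap during a real DLPAR operation.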
 
 


=======================================
60. SOME NOTES ON VIRTUALIZATION HP-UX:
=======================================


60.1 General information:
-------------------------

HP has had nPar hard partitions in the HP 9000 midrange and Superdome computers since the September 2000 launch 
of the Superdomes. These servers are based on a four-way cell board, and each cell board can be logically 
and electronically isolated from the others in the system, have its own HP-UX operating system installed on it, 
and function like a free-standing Unix server. In August 2001, HP announced vPar virtual partitions, 
which it rolled out first with the Superdomes and then cascaded down the HP 9000 server line. 
The Itanium-based Integrity server line has had static partitions for HP-UX and Windows operating systems 
at the high-end, and has supported HP-UX, Linux, and Windows at the low end. Only two weeks ago, HP announced 
that Linux was available on eight-way partitions on the 16-way and 64-way variants of the Integrity Superdome boxes 
through eight-way nPars. (Linux was not supported on the Superdomes until then.) 


1. nPar allows physical partitioning of a server. 
2. vPar allows logical partitioning of a server. 

In both cases one server box can be divided into multiple servers, thus allowing consolidation. 

Each npar or vpar is a separate machine. You can transfer CPUs between vpars on the fly, but in a serious hardware 
failure you can lose all vpars. npar is more solid than vpar but you cannot transfer CPUs on the fly, it needs reboot 
and you can transfer only cell boards, I mean single CPU cannot be transfered to another npar. 

nPar: Node Partition. 
Basically, distributing the I/O, CPU and memory, and creating a virtual node within a single box. 
Superdome, V-Class, RP84XX and 86XX are nPar capable. 

vPar: Virtual Partition. 
With Virtual Partitions (vPars) you can take almost any HP 9000 server and turn it into many "virtual" computers. 
These virtual computers can each be running their own instance of HP-UX and associated applications. 
The virtual computers are isolated from one another at the software level. Software running on one Virtual Partition 
will not affect software running in any other Virtual Partition. In the Virtual Partitions you can run different 
revisions of HP-UX, different patch levels of HP-UX, different applications, or any software you want and not affect 
other partitions. 


- Virtual Partitions versus Hard Partitions

A hard partition is a physical partition of a computer that divides the computer into groups of cell boards 
where each group operates independently of other groups. A hard partition can run a single instance of HP-UX 
or be further divided into virtual partitions.

A virtual partition is a software partition of a computer or hard partition where each virtual partition 
contains an instance of HP-UX. Though a hard partition can contain multiple virtual partitions, the inverse is 
not true. A virtual partition cannot span a hard partition boundary.
 

60.2 Boot sequence of vPar:
---------------------------

Boot Sequence: Quick Reference 

-- On a computer without vPars, a simplified boot sequence is:

1. ISL
   (Initial System Loader)
 
2. hpux
   (secondary system loader)
 
3. /stand/vmunix
   (kernel)
 

-- Adding vPars adds the monitor layer, so now hpux loads the monitor and then the monitor boots the kernels 
   of the virtual partitions. The boot sequence becomes

1. ISL
 
2. hpux
 
3. /stand/vpmon
  (vPars monitor and partition database) 
4. /stand/vmunix
  (kernels of the virtual partitions)
 

With or without vPars, the firmware loads and launches ISL.

ISL>

In a computer without vPars, at the ISL prompt, the secondary system loader hpux loads the kernel /stand/vmunix:

ISL> hpux /stand/vmunix

However, in a computer with vPars, at the ISL prompt, the secondary system loader hpux loads the 
vPars monitor /stand/vpmon:

ISL> hpux /stand/vpmon

The monitor loads the partition database (the default is /stand/vpdb) and internally creates (but does not boot) 
each virtual partition according to the resource assignments in the database.

Next, the vPars monitor runs in interactive mode (when no options to /stand/vpmon are given) with a 
command line interface.

MON>

To boot a kernel in a virtual partition (that is, to launch a virtual partition), use the monitor command 
vparload. For example, to launch the virtual partition named szilva1:

MON> vparload -p szilva1

In this example, the vPars monitor would load the virtual partition szilva1 and launch the kernel from the 
boot device specified for szilva1. (The boot device is assigned when the virtual partition is created and is 
recorded in the monitor database.)

HP-UX is now booted on the virtual partition szilva1.

Once a partition is running, you will be at the virtual console of a partition. Subsequent virtual partitions 
can be booted using the vPars command vparboot at the UNIX shell prompt of szilva1.
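
Booting the remaining partitions from the shell of the first one can be wrapped in a small loop. 
A minimal sketch, assuming two further partitions named szilva2 and szilva3 (made-up names). The function name 
and the DRYRUN switch are also just for illustration; by default the commands are only printed:

```shell
# boot_vpars: boot the named virtual partitions with vparboot.
# With DRYRUN=yes (the default here) the commands are printed, not run.
boot_vpars() {
    for vp in "$@"; do
        if [ "${DRYRUN:-yes}" = "yes" ]; then
            echo "vparboot -p $vp"
        else
            vparboot -p "$vp"
        fi
    done
}

boot_vpars szilva2 szilva3
```

Run it with DRYRUN=no on the running partition once the printed commands look right.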




61. Alternate disk install AIX:
===============================

It's possible to install AIX onto another disk on the same system. This is not partitioning,
it's just a second install of the BOS, on another disk.

You need to have the following filesets installed:

"bos.alt_disk_install.rte": this fileset ships the "alt_disk_install" command, 
which allows cloning of the rootvg and installing an AIX mksysb to an alternate disk.

"bos.alt_disk_install.boot_images": this fileset ships the boot images,
which are required to install mksysb images to an alternate disk.

Once you have installed these filesets, the alternate disk installation functions are available
to you. 

You can use the "smitty alt_install" or "smitty alt_clone" or "smitty alt_mksysb" fastpath:

# smitty alt_install

-----------------------------------------------

               Alternate Disk Installation

Move cursor to desired item and press Enter.

  Install mksysb on an Alternate Disk
  Clone the rootvg to an Alternate Disk

F1=Help  F2=Refresh   etc..
-----------------------------------------------

So, the Alternate Disk Installation can be used in one of two ways:
- Cloning the current rootvg to an alternate disk.
- Installing a mksysb image on another disk.



# smitty alt_mksysb

-----------------------------------------------
         Install mksysb on an Alternate Disk

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

 Target Disk(s) to install          []
 Device or image name               []
 Phase to execute                    all
 image.data file                    []
 Customization script               []
 Set bootlist to boot from this disk
 on next reboot?                     yes
 Reboot when complete                no
 Verbose output?                     no
 Debug output?                       no
 resolv.conf file                   []

-----------------------------------------------


You can also use the "alt_disk_install" command to clone the rootvg to another disk.
The command creates an "altinst_rootvg" volumegroup on the destination disk and prepares
the same logical volumes as in the rootvg, except the names are prepended with "alt_",
for example, alt_hd1. Similarly, the filesystems are renamed to "/alt_inst/filesystemname",
and the original data (mksysb or rootvg) is copied.

After this first phase, a second phase begins in which an optional configuration action 
can be performed: either a custom script or, when cloning rootvg, an update of software.

The third phase unmounts the /alt_inst filesystems and renames the filesystems and logical volumes
by removing the alt names. Then the bootlist is altered to boot from the new disk.
After the system is rebooted, the original rootvg is renamed to old_rootvg.

Example:

# lspv
hdisk0      00fa7377474    rootvg
hdisk1      00hdgfh6374    None


# alt_disk_install -BC hdisk1


This clones hdisk0 to hdisk1; hdisk1 will hold the new rootvg.
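
Picking the target disk can be scripted from the lspv listing. A minimal sketch, assuming the 
volume group name is in the third column of the lspv output, as in the listing above; the function 
name free_disks is made up:

```shell
# free_disks: read lspv-style output on stdin and print the disks that
# belong to no volume group (third column is "None").
free_disks() {
    awk '$3 == "None" { print $1 }'
}

# On AIX you would feed it the real listing:
#   target=$(lspv | free_disks | head -n 1)
#   alt_disk_install -BC "$target"
# Here the sample listing is used instead; this prints hdisk1.
printf '%s\n' 'hdisk0      00fa7377474    rootvg' \
              'hdisk1      00hdgfh6374    None' | free_disks
```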


Installing a second AIX52 partition using alt_disk_install:
-----------------------------------------------------------

You can use the alt_disk_install command to clone a system image to another disk, and you may use 
the -O option to remove references in the object data manager (ODM) and device (/dev) entries 
to the existing system. The -O flag tells the alt_disk_install command to call the devreset command, 
which resets the device database. The cloned disk can now be booted as if it were a new system.

An example of this scenario is as follows:

1. Boot the managed system as a Full System Partition so you have access to all the disks in the managed system. 
2. Configure the system and install the necessary applications. 
3. Run the alt_disk_install command to begin cloning the rootvg on hdisk0 to hdisk1, as follows: 

   # /usr/sbin/alt_disk_install -O -B -C hdisk1

   The cloned disk (hdisk1) will be named altinst_rootvg by default. 
4. Rename the cloned disk (hdisk1) to alt1, so you can repeat the operation with another disk: 

   # /usr/sbin/alt_disk_install -v alt1 hdisk1

5. Run the alt_disk_install command again to clone to another disk and rename the cloned disk, as follows: 

   # /usr/sbin/alt_disk_install -O -B -C hdisk2
   # /usr/sbin/alt_disk_install -v alt2 hdisk2

6. Repeat steps 3 through 5 for all of the disks that you want to clone. 
7. Use the HMC to partition the managed system with the newly cloned disks. 
   Each partition you create will now have a rootvg with a boot image. 
8. Boot the partition into SMS mode. Use the SMS MultiBoot menu to configure the 
   first boot device to be the newly installed disk. Exit the SMS menus and boot the system. 
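
The clone-and-rename commands for a whole series of disks can be generated with a loop. A minimal sketch 
that only prints the commands (nothing is executed); the function name clone_cmds is made up, and the 
alt1, alt2, ... naming simply continues the pattern used above:

```shell
# clone_cmds: print the clone and rename commands for each disk given,
# numbering the renamed volume groups alt1, alt2, ...
clone_cmds() {
    n=1
    for disk in "$@"; do
        echo "/usr/sbin/alt_disk_install -O -B -C $disk"
        echo "/usr/sbin/alt_disk_install -v alt$n $disk"
        n=$((n + 1))
    done
}

clone_cmds hdisk1 hdisk2 hdisk3
```

Review the printed list, then run the commands one pair at a time, since each clone has to finish 
and be renamed before the next one starts.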


62. IBM LPAR FAQ:
=================

Logical partitioning
Frequently asked questions
   
  
DLPAR 

 
 What is required to enable dynamic capable LPARs? 
 Does the upgrade of the HMC or Platform Firmware affect my AIX 5.1 partitions? 
 What is the order for AIX, HMC, and Platform Hardware updates? 
 Where would I find latest versions or upgrades for: AIX or HMC or Platform Firmware? 
 Can dynamic and non-dynamic LPARs co-exist on the same pSeries? 
 Is Linux DLPAR capable? 
 Do all DLPAR operations have to be done through the HMC GUI? 
 What conditions may impede DLPAR operations? 
 Are there special rules for DLPAR operations? 
 How much time does it take for a DLPAR operation to complete? 
 How is the "detail level" option in the HMC used? 
 How is the timeout value for DLPAR operations used by the HMC? 
 With a timeout limit of zero, how can I stop a command that may not complete because the DLPAR command will not succeed? 
 If we do dynamic configuration, what will happen to the process pinned or accessing direct memory? 
 Are there special AIX filesets or PTF levels required for DLPAR? 
 Are applications affected by DLPAR operations? 
 What is a "DLPAR aware" application? 
 What is the relationship between DLPAR and Capacity Upgrade on Demand (CUoD)? 
 How does Dynamic Processor Deallocation work with Dynamic Processor Sparing? 
 How does affinity partitioning relate to DLPAR? 
 Are there any examples of using the HMC command line to automate DLPAR? 
 

Question: What is required to enable dynamic capable LPARs?

Answer: An upgrade of AIX, HMC, and Platform Firmware is required. The required levels are as follows:

AIX: 5.2
HMC: Release 3, Version 1.0
Platform Firmware: 10/2002 system firmware or later.

To determine platform firmware level, on any AIX partition type: 
lscfg -vp | grep -p Platform. 
The last 6 digits of the ROM Level represent the Platform Firmware date in the format: "YYMMDD". 
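
This check is easy to script. A minimal sketch, assuming the ROM Level value ends in the six date digits 
as described, and assuming that two-digit years mean 20YY. The sample values RH021025 and RH020305 and both 
function names are made up for illustration:

```shell
# fw_date: print the trailing six digits (YYMMDD) of a ROM level string,
# or nothing if the string does not end in six digits.
fw_date() {
    printf '%s\n' "$1" | sed -n 's/.*\([0-9]\{6\}\)$/\1/p'
}

# fw_ok: succeed if the firmware date is 1 October 2002 (021001) or later.
fw_ok() {
    d=$(fw_date "$1")
    [ -n "$d" ] && awk -v d="$d" 'BEGIN { exit !(d + 0 >= 21001) }'
}

for level in RH021025 RH020305; do
    if fw_ok "$level"; then
        echo "$level: new enough for DLPAR"
    else
        echo "$level: firmware upgrade needed"
    fi
done
```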


Question: Does the upgrade of the HMC or Platform Firmware affect my AIX 5.1 partitions?

Answer: The upgrade of Platform Firmware on some 5.1 systems may cause some systems difficulty in reboot. 
Thus, users are encouraged to apply APAR IY31961 on their AIX 5.1 partitions before upgrading Platform Firmware.


Question: What is the order for AIX, HMC, and Platform Hardware updates?

Answer: The recommended order is:
1. Install APAR IY31961 on AIX 5.1 partitions, if needed.
2. Upgrade the HMC to version 3.1.
3. Upgrade the Platform Firmware to 10/2002 or later.
4. For 5.2 partitions, perform AIX migration (from 5.1) or install.


Question: Where would I find latest versions or upgrades for: AIX or HMC or Platform Firmware?

Answer: Users should visit the software support sites:
AIX: techsupport.services.ibm.com/server/support?view=pSeries
HMC: techsupport.services.ibm.com/server/hmc
Users should consult their IBM Customer Engineers regarding latest Platform Firmware availability.


Question: Can dynamic and non-dynamic LPARs co-exist on the same pSeries?

Answer: Yes. The HMC GUI will not display Dynamic LPAR menus for partitions that are not DLPAR enabled.
  

Question: Is Linux DLPAR capable?

Answer: Yes. Linux distributions that use the Linux 2.6 kernel or higher are capable of supporting DLPAR on POWER5 systems. Currently both Novell/SUSE Linux for Power and Red Hat Linux for Power support DLPAR capabilities.

 

Question: Do all DLPAR operations have to be done through the HMC GUI?

Answer: While it is recommended that users use the HMC GUI for dynamic resource re-allocation, it is possible for a user or script to execute commands on the HMC command line to perform dynamic resource operations on a dynamic capable partition.


Question: What conditions may impede DLPAR operations?

Answer: There may be cases where the resources that users wish to deallocate are not available because they are in use by the operating system or applications. In those cases, the operation may not complete until these resources are freed. Dynamic LPAR operations are also constrained by the resource specifications in the active LPAR profile, such as maximum/minimum processors or memory, or required I/O slots.


Question: Are there special rules for DLPAR operations?

Answer: Dynamic operations with processors and memory typically require no special actions. However, the movement of "slots" does require special handling. When the user is moving a "slot", they are attempting to reallocate a resource that is attached to an adapter that is inserted in a slot. An example of this might be a CDROM drive or ethernet adapter that is used by one DLPAR partition that the administrator would like moved to another DLPAR partition. For cases involving slots, the user should:

1. Deconfigure the child device connected to the parent adapter. 
2. Use the SMIT PCI Hot Plug procedures to remove the adapter (but don't physically remove the card). 
3. Use the HMC GUI to move the slot from one dynamic-capable partition to another. 
4. After the slot has been moved, re-enable the adapter via the "Hot-Plug" process and reconfigure the parent adapter and then the child device. 


Question: How much time does it take for a DLPAR operation to complete?

Answer: In general, on a non-loaded system, a single processor move can take less than a minute. Memory moves may take a few more minutes than a processor move.

Question: How is the "detail level" option in the HMC used?

Answer: This sets the level of debug output displayed during DLPAR operations. It also lets the user see all the steps that AIX performed in the DLPAR operation, providing tracing/logging information for debugging and problem determination.


Question: How is the timeout value for DLPAR operations used by the HMC?

Answer: The user can set a time limit (in minutes) so that the DLPAR operation request is canceled if the limit is exceeded. An example is a situation requiring memory moves: when memory cannot be reallocated because it is pinned to physical memory, certain operations can take a very long time to complete. A time limit can then be used to limit the number of retries that take place. A time limit of zero means that there is no time limit.


Question: With a timeout limit of zero, how can I stop a command that may not complete because the DLPAR command will not succeed?

Answer: Although a user may set the timeout limit to zero, HMC and AIX each have a set of default behaviors that ensure that a DLPAR command that will eventually fail returns with the appropriate error message.


Question: If we do dynamic configuration, what will happen to the process pinned or accessing direct memory?

Answer: Nothing. If a process has pinned memory, the virtual memory manager transparently migrates the data to a new pinned physical page and atomically updates the virtual to real page mappings to point to the new physical page.


Question: Are there special AIX filesets or PTF levels required for DLPAR?

Answer: The installation of AIX 5.2 is adequate for current pSeries LPARs to perform dynamic operations.


Question: Are applications affected by DLPAR operations?

Answer: A large majority of applications should be DLPAR unaware, which means they are not programmed to take advantage of DLPAR capabilities from within the application. Thus, they should not be affected by DLPAR. Only programs considered "DLPAR aware" might be affected by DLPAR actions.


Question: What is a "DLPAR aware" application?

Answer: A DLPAR aware application cares about the resource levels allocated to the partition and can alter its behavior based on changes in the resource levels. AIX provides APIs for applications that wish to be DLPAR aware.


Question: What is the relationship between DLPAR and Capacity Upgrade on Demand (CUoD)?

Answer: DLPAR can be used to bring online a resource that has been activated through CUoD.


Question: How does Dynamic Processor Deallocation work with Dynamic Processor Sparing?

Answer: If spare (unlicensed CUoD) processors are available, the partition should be able to assign and bring online these processors before it deactivates a failing processor.

 
Question: How does affinity partitioning relate to DLPAR?

Answer: Users can perform DLPAR operations on I/O slots with affinity partitions, but not with processor or memory resources.


Question: Are there any examples of using the HMC command line to automate DLPAR?

Answer: The DLPAR toolset available on alphaWorks provides tools that automate DLPAR operations using the HMC command line.


63. bosinst.data file:
======================

AIX only.

The bosinst.data file is an ascii file which controls the installation of AIX.
It can function as a sort of "response file" in an unattended install.

If you are customizing the /bosinst.data file in order to have it become part of a system backup (mksysb), 
please note that starting with AIX Version 4.3.3, the mksysb command always updates the target_disk_data stanzas 
to reflect the current disks in the rootvg. If you do not want this update to occur you must create the file 
/save_bosinst.data_file. The existence of this file is checked by the mksysb command, before the 
target_disk_data stanzas are updated.

If you are editing the bosinst.data file, use one of the following procedures:


1. Create and Use a Backup Tape:
--------------------------------

Customize the bosinst.data file:

Change your directory, with the cd command, to the /var/adm/ras directory.

Copy the /var/adm/ras/bosinst.data file to a new name, such as bosinst.data.orig. 
This step preserves the original bosinst.data file.

Edit the bosinst.data file with an ASCII editor.

Verify the contents of the edited bosinst.data file using the bicheck command:

/usr/lpp/bosinst/bicheck filename

Copy the edited file to the root directory:

cp /var/adm/ras/bosinst.data /bosinst.data

If you do not want the target_disk_data stanzas updated to reflect the current rootvg, 
create the file /save_bosinst.data_file by using the following command:

touch /save_bosinst.data_file

Create a backup image of the system:

Back up the system, using one of the following: the Web-based System Manager Backups application, 
the System Management Interface Tool (SMIT), or mksysb command. 

BOS installations from this backup will behave according to your customized bosinst.data file.
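
The steps of procedure 1 that follow the edit could be scripted roughly as shown here. This is only a sketch: 
the function name publish_bosinst and its arguments are assumptions. The optional root prefix allows a trial run 
in a scratch directory, and the editing itself is left to you:

```shell
# publish_bosinst [ROOT] [KEEP_DISKS]
#   ROOT        directory prefix, empty on a real system (hypothetical option)
#   KEEP_DISKS  "yes" (default) also creates /save_bosinst.data_file so that
#               mksysb leaves the target_disk_data stanzas alone
publish_bosinst() {
    root=${1:-}
    keep=${2:-yes}
    src=$root/var/adm/ras/bosinst.data
    bicheck=/usr/lpp/bosinst/bicheck

    # Verify the edited file where bicheck is available (AIX only).
    if [ -x "$bicheck" ]; then
        "$bicheck" "$src" || return 1
    fi

    # Copy the edited file to the root directory.
    cp "$src" "$root/bosinst.data" || return 1

    if [ "$keep" = "yes" ]; then
        touch "$root/save_bosinst.data_file" || return 1
    fi
}

# Example call on the real system, after editing /var/adm/ras/bosinst.data:
#   publish_bosinst
```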


2. Create and Use a Client File:
--------------------------------

Create one customized bosinst.data file for each client and, using the Network Installation Manager (NIM), 
define the files as resources. Refer to AIX Version 4.3 Network Installation Management Guide and Reference 
for more information about how to use the bosinst.data file as a resource in network installations.


3. Create and Use a Supplementary Diskette:
-------------------------------------------

This procedure describes how to create the supplementary diskette and use it in future installations:

Customize the bosinst.data file:

Change your directory, with the cd command, to the /var/adm/ras directory.

Copy the /var/adm/ras/bosinst.data file to a new name, such as bosinst.data.orig. 
This step preserves the original bosinst.data file.

Edit the bosinst.data file with an ASCII editor.

Create an ASCII file consisting of one word:

data

Save the new ASCII file, naming it signature.

Create the diskette and use it for installation:

Back up the edited bosinst.data file and the new signature file to diskette with the following command:

ls ./bosinst.data ./signature | backup -iqv

OR

If you create a bundle file named mybundle, back up the edited bosinst.data file, 
the new signature file, and the bundle file to diskette with the following command:

ls ./bosinst.data ./signature ./mybundle | backup -iqv

Put the diskette in the diskette drive of the target machine you are installing.

Boot the target machine from the install media (tape, CD-ROM, or network) and install AIX.

The BOS installation program will use the diskette file, rather than the default bosinst.data file 
shipped with the installation media.
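
Creating the signature file and the file list for the backup command can be put in one small helper. 
A minimal sketch; the function name diskette_files is made up, and it has to be run from the directory 
that holds the edited bosinst.data:

```shell
# diskette_files [EXTRA_FILE...]
# Creates the one-word signature file in the current directory and prints
# the list of files to back up, ready to be piped into "backup -iqv".
diskette_files() {
    echo data > signature || return 1
    ls ./bosinst.data ./signature "$@"
}

# Example calls on AIX:
#   cd /var/adm/ras && diskette_files | backup -iqv
#   cd /var/adm/ras && diskette_files ./mybundle | backup -iqv
```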

Example bosinst.data file:
--------------------------

The following is an example of a modified bosinst.data file that might be used in a nonprompted network installation:

control_flow:
   CONSOLE = Default
   INSTALL_METHOD = overwrite
   PROMPT = no
   EXISTING_SYSTEM_OVERWRITE = yes
   RUN_STARTUP = no
   RM_INST_ROOTS = yes
   ERROR_EXIT = 
   CUSTOMIZATION_FILE = 
   TCB = no
   BUNDLES = 
   RECOVER_DEVICES = Default
   BOSINST_DEBUG = no
   ACCEPT_LICENSES = yes
   INSTALL_CONFIGURATION = 
   DESKTOP = CDE
   INSTALL_DEVICES_AND_UPDATES = yes
   IMPORT_USER_VGS = yes
   ENABLE_64BIT_KERNEL = yes
   CREATE_JFS2_FS = yes
   ALL_DEVICES_KERNELS = yes
   GRAPHICS_BUNDLE = no
   DOC_SERVICES_BUNDLE = no
   NETSCAPE_BUNDLE = yes
   HTTP_SERVER_BUNDLE = yes
   KERBEROS_5_BUNDLE = yes
   SERVER_BUNDLE = yes
   ALT_DISK_INSTALL_BUNDLE = yes
   REMOVE_JAVA_118 = no

target_disk_data:
   PVID = 
   CONNECTION = 
   LOCATION =
   SIZE_MB =
   HDISKNAME = hdisk0

locale:
   BOSINST_LANG = en_US
   CULTURAL_CONVENTION = en_US
   MESSAGES = en_US
   KEYBOARD = en_US
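
To check what a given bosinst.data will do without reading the whole file, a single field can be pulled 
out of a stanza with awk. A minimal sketch, assuming the "stanza:" / "FIELD = value" layout shown above; 
the function name bosinst_get is made up, and with several target_disk_data stanzas only the first match 
is printed:

```shell
# bosinst_get FILE STANZA FIELD
# Prints the value of FIELD in STANZA (an empty line if the field is blank).
bosinst_get() {
    awk -v stanza="$2" -v field="$3" '
        # A stanza header is a name followed by a colon on a line of its own.
        /^[A-Za-z_]+:[ \t]*$/ { sub(/:[ \t]*$/, "", $1); cur = $1; next }
        cur == stanza && $1 == field && $2 == "=" {
            sub(/^[^=]*=[ \t]*/, "")   # drop everything up to the equals sign
            sub(/[ \t]+$/, "")         # drop trailing blanks
            print
            exit
        }
    ' "$1"
}

# Example calls:
#   bosinst_get /bosinst.data control_flow PROMPT
#   bosinst_get /bosinst.data target_disk_data HDISKNAME
```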



64. NIM:
========

64.1 Some notes about NIM:
==========================

=======
Note 1:
=======

http://freeunixtips.com/2009/02/create-aix-nim-master/

NIM is better in that you can install a new OS from a mksysb in as little as 5 minutes versus installing 
from CD which can take 40 minutes or longer. You can also keep your builds consistent if you have 
one mksysb image to build an AIX server. This also makes server builds easier at remote locations, 
no more swapping CDs.

The following shows a quick way to build a NIM master that you can use to install the AIX OS. 
This allows for more customization than using eznim if you use “Configure a Basic NIM Environment (Easy Startup).”

You can use the following instructions to build a NIM master from AIX 4.3 on. Keep in mind that the newer 
the AIX release, the more (and better) NIM functions it offers. I did not like NIM previous to AIX 5.2.

Install the AIX release that you want the NIM master to serve (i.e., 5.1, 5.2, 5.3 or 6.1). 
If you want it to be AIX 6.1 TL 02 SP2, you will need to patch the NIM master to that level after installing the base OS.

Once you are satisfied with the hostname, IP, network interface, interface speed/setup 
(etherchannel, trunked, etc.), you can set up the NIM master. You can make changes to the network configuration 
later in NIM; however, it’s easier if you take care of any configuration before making the server a NIM master.

I would also make /tftpboot a separate filesystem. This can get quite large depending on how many clients you are building.

Install the following filesets if they are not installed already:

bos.sysmgt.nim.master
bos.sysmgt.nim.spot
bos.sysmgt.nim.client
bos.net.nfs.server
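
A quick way to see which of these are still missing is to ask lslpp for each one. A minimal sketch, 
relying on "lslpp -l" returning a non-zero status for a fileset that is not installed; the function name 
missing_filesets is made up:

```shell
# missing_filesets FILESET...
# Prints the names of the filesets that lslpp does not report as installed.
missing_filesets() {
    for fs in "$@"; do
        lslpp -l "$fs" >/dev/null 2>&1 || echo "$fs"
    done
}

# Example call on AIX:
#   missing_filesets bos.sysmgt.nim.master bos.sysmgt.nim.spot \
#                    bos.sysmgt.nim.client bos.net.nfs.server
```

An empty result means all four are in place and you can go on to initialize the master.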

smitty nim / Configure the NIM Environment / Advanced Configuration / Initialize the NIM Master Only

The following 2 fields are all that is required, everything else can be left as default:

* Network Name []
* Primary Network Install Interface [] +

NOTE: “Network Name” can be anything you choose.
“Primary Network Install Interface” is the en# of the network you get to the NIM master on. 
You can setup other networks later when you define NIM clients on networks that are different than the NIM master’s.

Once you select “enter” you should see the following indicating that you now have a NIM master:

0513-071 The nimesis Subsystem has been added.
0513-071 The nimd Subsystem has been added.
0513-059 The nimesis Subsystem has been started. Subsystem PID is 282780.

Now you need to create the lpp_source and spot resources so the NIM master can build an OS:

Create a filesystem that you will use to lay down your base lpp_source and spot directories.  
These can be large since each lpp_source is usually 4G or larger and each spot is usually around 700 MB.  
I use /export/nim as a filesystem.

You can use “smitty bffcreate” or copy the installp/ppc directory from the install CD/DVD’s 
to /export/nim/6100/lpp_source

I get rid of any filesets that are not en_US or EN_US since I don’t need other languages; 
this will save 1 to 1.5 GB of disk space.

Copy the patch set you want to install to /export/nim/6100/lpp_source
If you have the base install CD of 6100-01 and want to patch to 6100-02-02 you would get the patch set 
for 6100-02-02 and copy or ftp them to /export/nim/6100/lpp_source/installp/ppc

Get rid of any unnecessary filesets by using lppmgr:

# /usr/lib/instl/lppmgr -d /<installp/ppc_directory>  -u -b -x -r

>> this will remove
-u = removes duplicate update images
-b = removes duplicate base images
-x = removes superseded updates
-r = actually does the removal. If the ‘-r’ flag is not included, the lppmgr command will run in “preview mode”.

>> Create your lpp_source on the NIM master:
smitty nim / Perform NIM Administration Tasks / Manage Resources / Define a Resource / lpp_source = source device for optional product images

The following fields are required as a minimum:
* Resource Name [TL02-SP2]
* Resource Type lpp_source
* Server of Resource [master] +
* Location of Resource [/export/nim/6100/lpp_source] /

This should only take a few minutes to complete; it is finished once you see “ok”.

>> Create your spot resource on the NIM master:
smitty nim / Perform NIM Administration Tasks / Manage Resources / Define a Resource / spot = Shared Product Object Tree – equivalent to /usr file

The following fields are required as a minimum:
* Resource Name [TL02-SP2]
* Resource Type spot
* Server of Resource [master] +
* Source of Install Images [TL02-SP2] +
* Location of Resource [/export/nim/6100] /

For “Source of Install Images” press F4 and choose the lpp_source you created in the previous step.
This can take from 15 to 30 minutes to complete; it is finished once you see “ok”.
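
The same two resources can also be defined from the command line with "nim -o define". A minimal sketch 
that only prints the commands. Since every NIM object needs a unique name, it uses two different resource 
names; the names TL02-SP2_lpp and TL02-SP2_spot and the function name are assumptions, and the directory is the one 
used above:

```shell
# nim_define_cmds LPP_NAME SPOT_NAME BASE_DIR
# Prints the nim commands that define an lpp_source under BASE_DIR/lpp_source
# and a spot built from it under BASE_DIR. Nothing is executed.
nim_define_cmds() {
    lpp=$1
    spot=$2
    base=$3
    echo "nim -o define -t lpp_source -a server=master -a location=$base/lpp_source $lpp"
    echo "nim -o define -t spot -a server=master -a source=$lpp -a location=$base $spot"
}

nim_define_cmds TL02-SP2_lpp TL02-SP2_spot /export/nim/6100
```

The spot definition has to come second, because it names the lpp_source as its source.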

You now have a NIM master that you can use to build out the OS on an AIX server.

If you have a mksysb of a server you want to use, copy it to the NIM master and create a mksysb NIM resource from it. 
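
Defining the mksysb resource from the command line could look like this. A minimal sketch that checks 
the image file is there and then prints the nim command instead of running it; the function name, the resource 
name and the image path in the example call are all made up:

```shell
# mksysb_define_cmd NAME IMAGE_FILE
# Prints the nim command that defines IMAGE_FILE as a mksysb resource NAME.
mksysb_define_cmd() {
    name=$1
    image=$2
    if [ ! -f "$image" ]; then
        echo "no such image: $image" >&2
        return 1
    fi
    echo "nim -o define -t mksysb -a server=master -a location=$image $name"
}

# Example call (hypothetical name and path):
#   mksysb_define_cmd server1_mksysb /export/nim/mksysb/server1.mksysb
```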


=======
Note 2:
=======

The SPOT (Shared Product Object Tree) is created from the "lpp_source".

=======
Note 3:
=======


AIX only.

Network Installation Management, or NIM, means that from a Server, via the network, clients can be
installed with AIX and possibly other software.

With NIM, you can have unattended installation of clients. The NIM Server also provides you with
the backup images of all your Servers (the NIM clients).

NIM objects:
------------
This topic explains the objects concept as it is used in the NIM environment.
The machines you want to manage in the NIM environment, their resources, and the networks through 
which the machines communicate are all represented as objects within a central database that resides 
on the master. Network objects and their attributes reflect the physical characteristics 
of the network environment. This information does not affect the running of a physical network 
but is used internally by NIM for configuration information. 
Each object in the NIM environment has a unique name that you specify when the object is defined. 
The NIM name is independent of any of the physical characteristics of the object it identifies 
and is only used for NIM operations. The benefit of unique names is that an operation can be performed 
using the NIM name without having to specify which physical attribute should be used. 
NIM determines which object attributes to use. For example, to easily identify NIM clients, 
the host name of the system can be used as the NIM object name, but these names are independent 
of each other. When an operation is performed on a machine, the NIM name is used, and all other data 
for the machine (including the host name) is retrieved from the NIM database. 
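
The object database can be inspected with the lsnim command. A minimal sketch that groups a listing 
by object class, assuming the default three-column lsnim output (name, class, type). The function name 
nim_by_class and the sample rows (including the network name net_en0) are made up for illustration:

```shell
# nim_by_class CLASS
# Reads lsnim-style output on stdin and prints "name (type)" for every
# object of the given class (machines, networks or resources).
nim_by_class() {
    awk -v class="$1" '$2 == class { print $1 " (" $3 ")" }'
}

# On a NIM master: lsnim | nim_by_class resources
# Here a made-up sample listing is used instead:
printf '%s\n' \
    'master          machines     master' \
    'boot            resources    boot' \
    'nim_script      resources    nim_script' \
    'net_en0         networks     ent' | nim_by_class resources
```

For the full attribute list of one object, "lsnim -l" followed by the NIM name shows what NIM has 
stored for it.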

NIM machines:
-------------
The types of machines that can be managed in the NIM environment are standalone, diskless, 
and dataless clients. This section describes the differences between the machines, the attributes required 
to define the machines, and the operations that can be performed on them. 

The NIM environment is composed of two basic machine roles: master and client. The NIM master manages 
the installation of the rest of the machines in the NIM environment. The master is the only machine 
that can remotely run NIM commands on the clients. All other machines participating in the NIM environment 
are clients to the master, including machines that may also serve resources.


NIM Resources (from somewhat older source):
--------------
NIM allows you to customize installations and maintain clients on the network from a centralized location 
(the NIM master) or the NIM client itself. The master contains the NIM database and can serve resources. 
Resources in NIM are files or directories containing data that NIM will use to install, customize, 
and maintain NIM clients. A NIM client is any machine configured and defined in the NIM database. 
Some key NIM resources used in our setup are:

- Licensed Program Product Source Directory (lpp_source): This directory contains backup file format 
(BFF) images, which AIX installp uses to load software. One way to understand the role of the 
lpp_source directory in a BOS installation is to compare it to all the installation images needed 
to support any configuration (specifically different device configurations) along with a base core set 
of software (called simages) that are on the BASE installation CDs. We created a base 433 lpp_source, 
multiple lpp_sources containing different maintenance levels, and separate lpp_sources for our 32-bit 
and 64-bit third-party application software.

- Shared Product Object Tree (SPOT): This directory is created from an lpp_source and is equivalent 
in content to a /usr file-system on AIX. The purpose of a SPOT in a NIM installation is similar to the 
boot images and BOS installation scripts (bi_main, rc.boot, and rc.bosinst) on volume 1 of the 
BASE install CD. The SPOT must contain support for all boot environments (platform, network type, kernel type). 
We created several different SPOTs for the different data centers and maintenance levels we use to support our systems.

- bosinst_data: This data file contains information that drives the BOS install 
(e.g., prompt vs. no-prompt, which disk to install the OS on, and the type of installation 
(Overwrite, Preservation, or Migration) to name a few). First, we created separate bosinst_data resources 
for each machine type (S80, H70, B50, M80, P680, and 43P). Then, by specifying two disks to target 
in our bosinst_data resource and specifying copies in the image_data resource, we could set up 
mirroring during the initial load.

- image_data: This data file contains information about the characteristics of the OS being installed. 
For example, it includes the size of file systems, whether or not to mirror, and whether or not to 
disk stripe. We created separate image_data resources for each machine type (S80, H70, B50, M80, P680 and 43P).

- Installp_bundles: This data file contains a customized list of additional software to install 
after the base AIX software is loaded. If you have different configurations that you need to duplicate 
on a repeatable basis, this resource is very useful. In our environment, we have different OS software 
requirements for development, QA, and production above the minimal AIX software needed to support 
different hardware systems. The easiest way to facilitate and maintain these different requirements, 
which need to be consistent, is to use installp_bundles.

- mksysb: This is a backup archive file that contains a system image of rootvg. 
Because of our network security restrictions (no one machine could be connected to all the networks 
within our organization), we used mksysb and savevg tapes to replicate the NIM master to the other data centers. 
If we had one machine connected to the different data centers, we could have used NIM to replicate 
and update the NIM masters in the different data centers by BOS-installing a NIM mksysb resource and 
using a NIM script to restore the other volume group data.

- mac_group: This is a logical grouping of machine types (standalone, diskless, or dataless) 
that enables the systems administrator to target one or more machines with a single command or 
NIM operation. We did not use this feature, but we could have taken advantage of this by grouping all like 
systems and like configurations to install to more than one machine at a time.


We used the 43P systems as our NIM masters for each data center because they could complete remote 
installations of machines or be moved and directly connected to a server for OS installations. 
These NIM masters were also designated as the resource servers in our environment. 
To ensure consistency and standardization of each NIM master (for the different data centers), 
we created a standard NIM master machine, which we cloned. We made a stacked tape containing a mksysb image 
and a savevg image of the standard NIM master to sync up and update the other NIM masters. 
Here are the commands we ran on the standard NIM master to create this stacked single tape:


# mksysb -i /dev/rmt0 
# tctl -f /dev/rmt0.1 fsf 4 
# savevg -i -m -f /dev/rmt0.1 {volume_group_name} 
# mt -f /dev/rmt0 rewind 

To restore the tape to the other NIM masters, we did the following:

Booted from the stacked tape and restored the mksysb image, then:
# tctl -f /dev/rmt0.1 fsf 4 
# restvg -f /dev/rmt0.1 


Setup NIM:
----------

Needed Filesets:

You should have the following installed on your master

# lslpp -l | grep bos.sysmgt.nim

bos.sysmgt.nim.client  5.1.0.25  COMMITTED Network Install Manager
bos.sysmgt.nim.master  5.1.0.25  COMMITTED Network Install Manager
bos.sysmgt.nim.spot    5.1.0.25  COMMITTED Network Install Manager

These are available on the AIX Product CD 1.
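
The fileset check above can also be scripted. The following is a minimal sketch, not an AIX tool: missing_nim_filesets is a hypothetical helper that reads "lslpp -l"-style output on standard input and prints any of the three NIM filesets that do not appear in it.

```shell
#!/bin/sh
# Sketch only: missing_nim_filesets is a hypothetical helper, not an AIX command.
# It reads "lslpp -l"-style output on stdin and prints each NIM fileset
# that is not listed there.
missing_nim_filesets() {
    # Keep the listing so it can be searched once per fileset.
    listing=$(cat)
    for fs in bos.sysmgt.nim.client bos.sysmgt.nim.master bos.sysmgt.nim.spot
    do
        # The fileset name is the first whitespace-separated field of a row.
        if ! printf '%s\n' "$listing" |
            awk -v f="$fs" '$1 == f { found = 1 } END { exit !found }'
        then
            echo "$fs"
        fi
    done
}

# On a real master the call would look like:  lslpp -l | missing_nim_filesets
```

No output means all three filesets were found in the listing.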

If you need to install the NIM client, master and spot filesets

# installp -qaX -d /dev/cd0 bos.sysmgt.nim.master bos.sysmgt.nim.client bos.sysmgt.nim.spot 

At the end of the install you should see the below

Installation Summary
Name Level Part Event Result

bos.sysmgt.nim.client 5.3.0.0 USR APPLY SUCCESS
bos.sysmgt.nim.spot 5.3.0.0 USR APPLY SUCCESS
bos.sysmgt.nim.master 5.3.0.0 USR APPLY SUCCESS
bos.sysmgt.nim.client 5.3.0.0 ROOT APPLY SUCCESS

To install NIM:

You can use the fast path

# smitty nim_config_env

to setup the basic NIM environment for the first time. It needs a minimum of two pieces of information.
- Input device for installation images
- Primary network interface

Default values are provided for the remaining options. Once this smitty panel has been completed successfully,
the following actions will have been completed:
. NIM master initialized on the primary interface
. NIM daemons running
. lpp_source created and available
. SPOT resource created and available (Shared Product Object Tree)

# smitty nim_config_env

                 Configure a Basic NIM Environment (Easy Startup)

     Initialize the NIM Master:
     * Primary Network Interface for the NIM Master            []
     Basic Installation Resources:
     * Input device for installation images                    []
     * LPP_SOURCE Name                                         [lpp_source]
     * LPP_SOURCE Directory                                    [/export/lpp_source]
       Create new filesystem for LPP_SOURCE?                   [yes]
       Filesystem SIZE (MB)                                    [650]
       VOLUME GROUP for new filesystem                         [rootvg]
     * SPOT Name                                               [spot1]
     * SPOT Directory                                          [/export/spot]
       Create new filesystem for SPOT?                         [yes]
       Filesystem SIZE (MB)                                    [350]                   
       VOLUME GROUP for new filesystem                         [rootvg]
     ..
     ..

  

EZNIM:
------

The "smit eznim" option installs the "bos.sysmgt.nim.master" fileset and configures the NIM environment.
The configuration involves creating the NIM database and populating it with several entries.
Several basic NIM resources will then be created and defined in the NIM database.

1. smitty eznim
2. Select "Configure as a NIM Master"
3. Select "Setup the NIM Master Environment"
4. Verify the default selections for software source, volume group, etc.

To display the NIM resources that have been created, do the following:
use "smit eznim_master_panel" fast path, or select "Show the NIM environment".


The nim_master_setup command:
-----------------------------

The nim_master_setup command installs the bos.sysmgt.nim.master fileset, configures the NIM master, 
and creates the required resources for installation, including a mksysb system backup. 

The nim_master_setup command uses the rootvg volume group and creates an "/export/nim" file system, by default. 
You can change these defaults using the volume_group and file_system options. The nim_master_setup command 
also allows you to optionally not create a system backup, if you plan to use a mksysb image 
from another system. The nim_master_setup usage is as follows:

Usage nim_master_setup: Setup and configure NIM master.
      nim_master_setup [-a mk_resource={yes|no}]
	[-a file_system=fs_name]
	[-a volume_group=vg_name]
	[-a disk=disk_name]
	[-a device=device]
	[-B] [-v]

	-B    Do not create mksysb resource.
	-v    Enable debug output.

	Default values:
	mk_resource = yes
	file_system = /export/nim
	volume_group = rootvg
	device = /dev/cd0

To install the NIM master fileset and initialize the NIM environment using install media located 
in device /dev/cd1, type: 
# nim_master_setup -a device=/dev/cd1

To initialize the NIM environment without creating NIM install resources, type: 
# nim_master_setup -a mk_resource=no

To initialize the NIM environment, create NIM install resources without creating a backup image, 
using install media located under mount point /cdrom, type: 
# nim_master_setup -a device=/cdrom -B

To define NIM resources in an existing NIM environment, using install media located in device /dev/cd0, 
and create a new file system named /export/resources/NIM under volume group nimvg, type: 
# nim_master_setup -a volume_group=nimvg -a file_system=/export/resources/NIM


The nim_clients_setup command:
------------------------------

The nim_clients_setup command is used to define your NIM clients, allocate the installation resources, 
and initiate a NIM BOS installation on the clients.

The nim_clients_setup command uses the definitions in the basic_res_grp resource to allocate 
the necessary NIM resources to perform a mksysb restore operation on the selected clients. 
The usage for nim_clients_setup is as follows: 

Usage nim_clients_setup: Setup and Initialize BOS install for NIM clients.
       nim_clients_setup [-m mksysb_resource]
	[-c] [-r] [-v] client_objects
-m    specify mksysb resource object name -OR- absolute file path.
-c    define client objects from client.defs file.
-r    reboot client objects for BOS install.
-v    Enables debug output.

Note: If no client object names are given, all clients in the NIM environment are enabled for 
BOS installation; unless clients are defined using the -c option. 

Examples:
To define client objects from /export/nim/client.defs file, initialize the newly defined clients 
for BOS install using resources from the basic_res_grp resource group, and reboot the clients to begin install, type: 
# nim_clients_setup -c -r

To initialize clients client1 and client2 for BOS install, using the backup file 
/export/resource/NIM/530mach.sysb as the restore image, type: 
# nim_clients_setup -m /export/resource/NIM/530mach.sysb client1 client2

To initialize all clients in the NIM environment for native (rte) BOS install using resources 
from the basic_res_grp resource group, type: 
# nim_clients_setup -n


How to define a standalone machine in NIM.

      nim -o define -t standalone \
                -a platform=chrp \
                -a if1="subnet-74 FQDN of Machine 0" \
                -a cable_type1=tp \
                -a net_settings1="speed duplex" \
                -a netboot_kernel="up or mp" \
                name of resource
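
The template above can be filled in by a small helper. This is a sketch under assumptions: nim_define_cmd is a hypothetical function that only prints the command line, it assumes the NIM network object already exists, and it hard-codes platform=chrp as in the template.

```shell
#!/bin/sh
# Sketch only: nim_define_cmd is a hypothetical helper that PRINTS a
# "nim -o define" command line; it does not run nim.
# Usage: nim_define_cmd NAME NETWORK FQDN [up|mp]
nim_define_cmd() {
    name=$1; network=$2; fqdn=$3; kernel=${4:-mp}
    # netboot_kernel accepts only "up" or "mp".
    case $kernel in
        up|mp) ;;
        *) echo "netboot_kernel must be up or mp" >&2; return 1 ;;
    esac
    printf 'nim -o define -t standalone -a platform=chrp -a netboot_kernel=%s -a if1="%s %s 0" %s\n' \
        "$kernel" "$network" "$fqdn" "$name"
}
```

Printing the command first makes it easy to review before pasting it on the NIM master.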

How to initiate an install of a machine from a mksysb image.

      nim -o bos_inst \
                -a source=mksysb \
                -a spot=aix520-01_spot \
                -a mksysb=base520-02-64bit_mksysb or base520-02-32bit_mksysb \
                -a accept_licenses=yes \
                -a preserve_res=yes \
                -a installp_flags="cNgXY" \
                -a fb_script=osg-mksysb-install_firstboot \
                name of resource

If you do not want the machine to be rebooted right now, then add the following:

     -a no_client_boot=yes

How to reset the NIM state of a machine.

      nim -o reset \
                name of resource
  
You can add the following to force a reset

                -a force=yes

If, after resetting the state and trying the install again, you are told that the resource is 
still allocated, run the following: 

      nim -Fo deallocate \
                -a subclass=all \
                name of resource
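
The reset and deallocate steps are usually run together. A minimal sketch, assuming a hypothetical wrapper nim_reset_client and a NIM_CMD variable (both are illustration names, not part of NIM). By default it only echoes the commands; set NIM_CMD=nim on a real master.

```shell
#!/bin/sh
# Sketch only: NIM_CMD and nim_reset_client are hypothetical names.
# With the default "echo nim" the commands are printed, not executed.
NIM_CMD=${NIM_CMD:-echo nim}

nim_reset_client() {
    client=$1
    # Force the client back to a usable state first ...
    $NIM_CMD -o reset -a force=yes "$client" || return 1
    # ... then release every resource still allocated to it.
    $NIM_CMD -Fo deallocate -a subclass=all "$client"
}
```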


How to take a mksysb of a machine.

        nim -o define -t mksysb \
                  -a server=master \
                  -a location=/export/nim/mksysb/<name of resource>.mksysb \
                  -a source=resource name of machine to take mksysb \
                  -a mk_image=yes \
                  -a mksysb_flags='e' \
                  -a exclude_files=osg-default_exclude \
                  name of resource
    
How to make a NIM exclude file.

        nim -o define -t exclude_files \
                  -a server=master \
                  -a location=/export/nim/misc/osg-default.exclude \
                  osg-default_exclude
    
How to define a script resource in NIM.

        nim -o define -t script \
	          -a server=master \
	          -a location=/export/nim/misc/<name of the resource>.sh \
	          name of resource
    
How to define a firstboot script

        nim -o define -t fb_script \
                  -a server=master \
                  -a location=<path to the firstboot script> \
                  name of resource
    
How to remove a NIM resource.

        nim -o remove \
                  -a rm_image=yes \
                  name of the mksysb
    
Note that with rm_image=yes this process also removes the mksysb file on disk. 

Updating installed software
      nimclient -o cust \
                -a lpp_source=lpp source \
                -a installp_bundle=installp bundle


Remark about nimsh:
-------------------

Using the NIM service handler for client communication
NIM makes use of the remote shell server (rshd) when it performs remote execution on clients. The server provides 
remote execution facilities with authentication based on privileged port numbers from trusted hosts.

AIX 5.3 uses NIM Service Handler (NIMSH) to eliminate the need for rsh services during NIM client communication. 
The NIM client daemon (NIMSH) uses reserved ports 3901 and 3902, and it installs as part of the 
bos.sysmgt.nim.client fileset.

NIMSH allows you to query network machines by hostname. NIMSH processes query requests and returns NIM client 
configuration parameters used for defining hosts within a NIM environment. Using NIMSH, you can define 
NIM clients without knowing any system or network-specific information.

While NIMSH eliminates the need for rsh, it does not provide trusted authentication based on key encryption. 
To use cryptographic authentication with NIMSH, you can configure OpenSSL in the NIM environment. 
When you install OpenSSL on NIM clients, SSL socket connections are established during NIMSH 
service authentication. Enabling OpenSSL provides SSL key generation and includes all cipher suites 
supported in SSL version 3.



64.2 Complete NIM Example:
==========================

(This is actually a nice example).

1.Installing the NIM filesets

The required filesets for a NIM master server and client

bos.sysmgt.nim.client
bos.sysmgt.nim.master
bos.sysmgt.nim.spot


These are available on the AIX Product CD 1.

Install the NIM client, master and spot filesets

# installp -qaX -d /dev/cd0 bos.sysmgt.nim.master bos.sysmgt.nim.client bos.sysmgt.nim.spot 

At the end of the install you should see the below

Installation Summary
--------------------
Name Level Part Event Result

bos.sysmgt.nim.client 5.3.0.0 USR APPLY SUCCESS
bos.sysmgt.nim.spot 5.3.0.0 USR APPLY SUCCESS
bos.sysmgt.nim.master 5.3.0.0 USR APPLY SUCCESS
bos.sysmgt.nim.client 5.3.0.0 ROOT APPLY SUCCESS


2.Create a tftpboot filesystem and mount it

# crfs -v jfs2 -g rootvg -a size=381M -m /tftpboot -A yes -t rw
# mount /tftpboot


3.Configure the NIM environment (ensure you have AIX product CD 1 loaded in the CD or DVD drive)

# smitty nim_config_env

Select the defaults as below, apart from the size of the /export/lpp_source and /export/spot filesystems. 
As we are going to be copying additional products into these areas we need a reasonable amount of space

You also need to specify the primary network interface and path to the CD or DVD drive

Initialize the NIM Master:
* Primary Network Interface for the NIM Master [en0] 
Basic Installation Resources:
* Input device for installation images [/dev/cd0] 
* LPP_SOURCE Name [lpp_source1]
* LPP_SOURCE Directory [/export/lpp_source] 
Create new filesystem for LPP_SOURCE? [yes] 
Filesystem SIZE (MB) [6553] 
VOLUME GROUP for new filesystem [rootvg] 
* SPOT Name [spot1]
* SPOT Directory [/export/spot] 
Create new filesystem for SPOT? [yes] 
Filesystem SIZE (MB) [2097] 
VOLUME GROUP for new filesystem [rootvg] 
Create Diskless/Dataless Machine Resources? [no] 
Specify Resource Name to Define:
ROOT (required for diskless and dataless) [root1]
DUMP (required for diskless and dataless) [dump1]
PAGING (required for diskless) [paging1]
HOME (optional) [home1]
SHARED_HOME (optional) [shared_home1]
TMP (optional) [tmp1]
Diskless/Dataless resource directory [/export/dd_resource]
Create new filesystem for resources? [yes] 
Filesystem SIZE (MB) [150] 
VOLUME GROUP for new filesystem [rootvg] 
Define NIM System Bundles? [yes] 
Add Machines from a Definition File? [no] 
Specify Filename []
* Remove all newly added NIM definitions [no] 
and filesystems if any part of this
operation fails?


4.Populating the lpp_source1 resource with additional software

Copy the contents of AIX Volume 2,5, the Expansion Pack and the AIX Toolbox to the lpp_source. For each CD, 
enter the command below:

# nim -o update -a packages=all -a source=/dev/cd0 lpp_source1


5.Updating the SPOT and lpp_source1 resources

If the AIX CDs you are using to create the lpp_source and SPOT resources are base-level AIX CDs, and the clients 
you intend to build are at a higher level than the base level, you will need to update the 
lpp_source and SPOT resources. 

Identify the location of your update filesets and update with the below command

# nim_update_all -l lpp_source1 -s spot1 -d /location/of/filesets -u -B

Once complete, confirm the maintenance level of the spot1 resource with the below command

# lsnim -l spot1

In this example, I have updated the lpp_source1 and spot1 to AIX 5.3 ML 3

spot1:
class = resources
type = spot
plat_defined = chrp
arch = power
bos_license = yes
Rstate = ready for use
prev_state = verification is being performed
location = /export/spot/spot1/usr
version = 5
release = 3
mod = 0
oslevel_r = 5300-01
alloc_count = 0
server = master
Rstate_result = success
mk_netboot = yes
mk_netboot = yes
mk_netboot = yes


6.Defining NIM machines

Before you can start a BOS install task you need to define the machines you are going to install. 

You need details of

a.server hostname
b.platform
c.netboot_kernel
d.subnet mask
e.default gateway of the master
f.master name

To define a NIM client, for example sp-tsm2:

# nim -o define -t standalone -a platform=chrp \
-a netboot_kernel=mp \
-a if1="find_net sp-tsm2.caledonia.speedy.wan 0" \
-a net_definition="ent 255.255.255.0 10.110.72.1 master" sp-tsm2

If you are adding a machine that is already running, you need to ensure the bos.sysmgt.nim.client fileset 
is installed and issue the following command on the client

note: change the name= and master= to match the client and master you are adding

# niminit -a name=pr-testerp -a master=pr-tsm -a pif_name=en0


The output from the following command will show your newly defined machine

# lsnim -c machines

[sp-tsm1] scripts # lsnim -c machines
master machines master
sp-tsm2 machines standalone

To get detailed output of your newly created machine, run the below

[sp-tsm1] scripts # lsnim -l sp-tsm2
sp-tsm2:
class = machines
type = standalone
connect = shell
platform = chrp
netboot_kernel = mp
if1 = speedy_network sp-tsm2.caledonia.speedy.wan 0
cable_type1 = N/A
Cstate = ready for a NIM operation
prev_state = not running
Mstate = currently running
cpuid = 00C13E8A4C00
Cstate_result = success
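
When scripting against this output it is handy to pull out a single attribute such as Cstate. A minimal sketch, assuming the "name = value" layout shown above; nim_attr is a hypothetical helper that reads "lsnim -l" output on standard input.

```shell
#!/bin/sh
# Sketch only: nim_attr is a hypothetical helper.
# Usage: lsnim -l <object> | nim_attr <attribute>
nim_attr() {
    awk -v want="$1" '
        # Rows look like "Cstate = ready for a NIM operation".
        $1 == want && $2 == "=" {
            sub(/^[^=]*= */, "")   # drop everything up to and including "= "
            print
            exit                   # first match only
        }'
}
```

For example, a wrapper could refuse to start a bos_inst unless the value printed for Cstate is "ready for a NIM operation".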


7.Configuring client communications

To configure SSL client communication, as opposed to the traditional and insecure rhosts method, perform the following

a.On the master server and clients install the openssl rpm from the AIX toolbox

# rpm -ivh openssl-0.9.7g-1

b.Next configure the NIM master for SSL

# nimconfig -c 

c.Then on each client configure as below

# mv /etc/niminfo /etc/niminfo.bak
# niminit -aname=pr-testdb -amaster=pr-tsm -a connect=nimsh
# nimclient -C

d.On the NIM master test the nimsh communication

# nim -o lslpp pr-testdb


8.Defining NIM groups

Once you have defined your machines, add them to a mac_group. This will aid administration for future 
installation tasks

To define a group containing the sp-tsm2 machine run the below command
# nim -o define -t mac_group -a add_member=sp-tsm2 speedy_mac_group

For each machine to be added, use the option and argument `-a add_member=<hostname>' where <hostname> is the name 
of the server you are adding
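
Repeating "-a add_member=" by hand is error-prone for longer lists. A minimal sketch of building the command from a list of hostnames; mac_group_cmd is a hypothetical helper that prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: mac_group_cmd is a hypothetical helper that PRINTS the
# "nim -o define -t mac_group" command for a group and its members.
# Usage: mac_group_cmd GROUP HOST [HOST...]
mac_group_cmd() {
    group=$1; shift
    [ $# -gt 0 ] || { echo "at least one member is required" >&2; return 1; }
    members=
    for host in "$@"
    do
        # One "-a add_member=<hostname>" per machine, as described above.
        members="$members -a add_member=$host"
    done
    echo "nim -o define -t mac_group$members $group"
}
```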


9.Defining a bosinst.data file

A bosinst data file is a file containing answers to questions usually asked during a manual BOS install. 
A standard Red Squared bosinst.data file contains the below information and is stored in the /export/bosinst 
directory. (note the highlighted areas, specifically the disk location. We will be mirroring the root disk 
as part of the post task during the BOS install procedure)

control_flow:
CONSOLE = Default
INSTALL_METHOD = overwrite
PROMPT = no
EXISTING_SYSTEM_OVERWRITE = yes
INSTALL_X_IF_ADAPTER = yes
RUN_STARTUP = yes
RM_INST_ROOTS = no
ERROR_EXIT =
CUSTOMIZATION_FILE =
TCB = no
INSTALL_TYPE =
BUNDLES =
RECOVER_DEVICES = no
BOSINST_DEBUG = no
ACCEPT_LICENSES = yes
DESKTOP = NONE
INSTALL_DEVICES_AND_UPDATES = yes
IMPORT_USER_VGS =
ENABLE_64BIT_KERNEL = yes
CREATE_JFS2_FS = yes
ALL_DEVICES_KERNELS = yes
GRAPHICS_BUNDLE = yes
MOZILLA_BUNDLE = no
KERBEROS_5_BUNDLE = no
SERVER_BUNDLE = yes
REMOVE_JAVA_118 = no
HARDWARE_DUMP = yes
ADD_CDE = yes
ADD_GNOME = no
ADD_KDE = no
ERASE_ITERATIONS = 0
ERASE_PATTERNS =
target_disk_data:
LOCATION =
SIZE_MB =
HDISKNAME = hdisk0

locale:
BOSINST_LANG = en_US
CULTURAL_CONVENTION = en_GB
MESSAGES = en_US
KEYBOARD = en_GB

large_dumplv:
DUMPDEVICE=lg_dumplv
SIZEGB=2

dump:
PRIMARY=/dev/lg_dumplv
SECONDARY=/dev/sysdumpnull
FORCECOPY=no
COPYDIR=/dump
ALWAYS_ALLOW=yes

Once you have created the bosinst.data file, you need to define it to the NIM environment with the below command

# nim -o define -t bosinst_data -a server=master \


10.Defining a post script resource

A script resource is used as part of the bosinst task. The resource contains commands to be executed 
on the NIM client after the BOS install has completed. The inst_script file should reside in the "/export/bosinst" 
directory.

The below inst_script contains commands relevant to an Oracle database server

Note: In all instances the root disk should be mirrored

/usr/sbin/chdev -l sys0 -a maxuproc=5000
/usr/sbin/chdev -l sys0 -a autorestart=true
/usr/sbin/vmo -o lru_file_repage=0
/usr/sbin/vmo -o strict_maxclient=0
/usr/sbin/vmo -o maxperm%=45
/usr/sbin/vmo -o maxclient%=45
/usr/sbin/vmo -o minperm%=15
/usr/sbin/tunchange -f nextboot -t vmo -o lru_file_repage=0
/usr/sbin/tunchange -f nextboot -t vmo -o strict_maxclient=0
/usr/sbin/tunchange -f nextboot -t vmo -o maxperm%=45
/usr/sbin/tunchange -f nextboot -t vmo -o maxclient%=45
/usr/sbin/tunchange -f nextboot -t vmo -o minperm%=15
/usr/sbin/chfs -a size=+1024M /usr
/usr/sbin/chfs -a size=+512M /opt
/usr/sbin/chfs -a size=+512M /home
/usr/sbin/chfs -a size=+512M /tmp
/usr/sbin/chfs -a size=+512M /var
/usr/bin/mkgroup id=500 oinstall
/usr/bin/mkuser id=1001 groups=oinstall oracle
/usr/bin/mkgroup id=501 red2ops
/usr/bin/mkuser id=1002 groups=red2ops red2ops
/usr/bin/mkdir /home/root
/usr/bin/chuser home=/home/root root
/usr/sbin/crfs -v jfs2 -g rootvg -a size=128M -m /usr/local -A yes -t rw
/usr/sbin/crfs -v jfs2 -g rootvg -a size=128M -m /usr/red2 -A yes -t rw
/usr/sbin/crfs -v jfs2 -g rootvg -a size=8G -m /oracle_home -A yes -t rw
/usr/sbin/crfs -v jfs2 -g rootvg -a size=3G -m /oracle_base -A yes -t rw
/usr/sbin/crfs -v jfs2 -g rootvg -a size=3G -m /dump -A yes -t rw
/usr/sbin/mount -a
/usr/bin/chown oracle:oinstall /oracle_home
/usr/bin/chown oracle:oinstall /oracle_base
/usr/bin/echo 'sp-tsm1 root' >> /home/root/.rhosts
/usr/bin/rcp sp-tsm1:/home/root/.profile /home/root/.profile
/usr/bin/rcp sp-tsm1:/home/root/.kshrc /home/root/.kshrc
/usr/bin/rcp sp-tsm1:/etc/security/limits.nim /etc/security/limits
/usr/sbin/extendvg rootvg hdisk1
/usr/sbin/mirrorvg rootvg
/usr/sbin/syncvg -v rootvg
/usr/sbin/bosboot -a -d /dev/hdisk1
/usr/sbin/chvg -a 'y' -Q 'n' -x 'n' rootvg
/usr/bin/bootlist -m normal hdisk0 hdisk1
/usr/sbin/chps -s16 hd6
/usr/bin/sysdumpdev -K

Once created, define the script to the NIM server with the below command

# nim -o define -t script -a server=master \
-a location=/export/bosinst/inst_script inst_script


Details of your newly created script resource can be viewed with the below

[sp-tsm1] bosinst # lsnim -l speedy_inst_script
speedy_inst_script:
class = resources
type = script
Rstate = ready for use
prev_state = unavailable for use
location = /export/bosinst/inst_script
alloc_count = 0
server = master


11.Backing up and restoring the NIM database

Now that you have created a number of resources and machines, it would be a good idea to add a cron job 
to take a backup of the NIM database on a weekly basis. By default the backup file will be picked up by the 
Tivoli and mksysb backups and then sent to tape.

Create an executable script called nim_backup_db.sh located in /usr/red2/scripts

#!/bin/sh
#--------------------------------------------------------------------------------
#
# File : nim_backup_db.sh
#
# Author : Steve Burgess
#
# Description : Wrapper script to backup the NIM database
#
# Change History:
#
# Date Version Author Description
# ------- ------- ---------------- -----------------------------
#--------------------------------------------------------------------------------

#-------------------------
# Backup The NIM database
#-------------------------

LOG=/usr/red2/logs/nim_backup.log

# Capture the exit status of m_backup_db itself; piping it straight into tee
# would leave $? holding the exit status of tee instead.
/usr/lpp/bos.sysmgt/nim/methods/m_backup_db /etc/objrepos/nimdb_backup > $LOG 2>&1
rc=$?
cat $LOG

if [ $rc -ne 0 ]
then
echo "`date +%Y%m%d` NIM_BACKUP_FAILURE" | tee -a $LOG
else
echo "`date +%Y%m%d` NIM_BACKUP_SUCCESS" | tee -a $LOG
fi
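
In plain sh the exit status of a pipeline is that of its last command, so a backup command that is piped into tee needs its own status captured separately. A minimal sketch of one way to do that; run_logged is a hypothetical helper, not part of NIM.

```shell
#!/bin/sh
# Sketch only: run_logged is a hypothetical helper.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
# Runs COMMAND with stdout and stderr written to LOGFILE, shows the log,
# and returns COMMAND's own exit status (not that of a pager or tee).
run_logged() {
    log=$1; shift
    "$@" > "$log" 2>&1
    rc=$?
    cat "$log"
    return $rc
}
```

A wrapper script could then test the return value of run_logged directly to decide between the FAILURE and SUCCESS messages.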



Add the script to root's crontab (as below)


# Backup the NIM database once a week
1 00 * * 0 /usr/red2/scripts/nim_backup_db.sh > /dev/null 2>&1


To restore the NIM database following corruption, or to apply it to another server:

# /usr/lpp/bos.sysmgt/nim/methods/m_restore_db -f /etc/objrepos/nimdb_backup



12.Initiating a BOS Installation

You are now ready to initiate a BOS install for one of your defined machines. Run the below command 
to initiate a BOS install for sp-tsm2:

nim -o bos_inst -a source=rte \
-a lpp_source=lpp_source1 \
-a spot=spot1 \
-a filesets="Java14_64 bos.adt bos.iconv X11.adt vac.C vac.aix50 tivoli.tsm.client.api.32bit tivoli.tsm.client.ba openssl-0.9.7d-1.aix5.1.ppc.rpm openssh.base
openssh.license lsof-4.61-3.aix5.1.ppc.rpm zip-2.3-3.aix4.3.ppc.rpm unzip-5.51-1.aix5.1.ppc.rpm" \
-a accept_licenses=yes \
-a script=inst_script \
-a boot_client=no \
-a bosinst_data=bosinst \
sp-tsm2

This will make the previously created resources, inst_script and bosinst available to the server. 

Additional filesets, as defined in the line

-a filesets=<fileset names>

will be installed as part of the installation procedure. For additional filesets, append them to the filesets line
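
A long filesets list is easier to maintain in a file than inside the command. A minimal sketch, assuming a plain text file with one fileset per line; filesets_attr and the file layout are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch only: filesets_attr is a hypothetical helper.
# Usage: filesets_attr FILE
# FILE lists one fileset per line; blank lines and lines starting with "#"
# are ignored. Prints a ready-to-paste -a filesets="..." argument.
filesets_attr() {
    list=$(awk 'NF && $1 !~ /^#/ { printf "%s%s", sep, $1; sep = " " }' "$1")
    printf '%s\n' "-a filesets=\"$list\""
}
```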

Next you need to follow the below procedure to boot your machine from the NIM server

1. Begin with your machine turned off. 
2. If the system requires a System Management Services (SMS) diskette, insert it into the diskette drive 
   of the client and turn on the machine. If you do not insert an SMS diskette at this time and one is 
   required, you will be prompted to insert one later.
3. A graphics image is displayed on your screen. Press the F1 key as icons begin to display from left 
   to right on the bottom of your display. 
4. The System Management Services menu displays on your screen. Select the Utilities option. 
5. From the System Management Services Utilities menu, select the Remote Initial Program Load Setup option. 
6. From the Network Parameters screen, select the IP Parameters option.
7. Set or change the values displayed so they are correct for snhent01. Specify the IP address of: 
   - The client machine you are booting, in the client address field: 10.20.5.253 
   - Your NIM server, in the bootp server address field: 10.20.5.254 
   - Your client's gateway, in the gateway address field: 10.20.5.1 
8. Specify the subnet mask of 255.255.255.0 for the client machine if you are prompted for one in the 
   subnet mask field. All machines in your subnet have the same subnet mask. 
9. After you specify the addresses, press Enter to save the addresses and continue. 
10. The Network Parameters screen is displayed. Select the Ping option. 
11. Select the network adapter to be used as the client's boot device. 
12. Verify that the displayed addresses are the same as the addresses you specified for your boot device. 
    If the addresses are incorrect, press Esc until you return to the main menu. Then, go back to step 5. 
    If the addresses are correct, press Enter to perform the ping test. The ping test may take several 
    seconds to complete. 
13. If the ping test fails, verify that the addresses are correct, and perform network problem determination 
    if necessary. If the ping test completes successfully, press Enter to acknowledge the success message. 
    Then, press Esc until you return to the System Management Services menu. 
14. From the System Management Services menu, choose the Select Boot Devices option.
15. Select the network adapter to be used for the network boot from the list of displayed bootable devices. 
    Be sure to select the correct network type (Ethernet). After making your selection, the machine will 
    boot over the network.
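
Before typing the addresses into SMS it can help to confirm that the client, the NIM server and the gateway really sit in the same subnet. A minimal sketch using only shell arithmetic; ip_to_int and same_subnet are hypothetical helpers, and the addresses in the usage line are the ones from the procedure above.

```shell
#!/bin/sh
# Sketch only: ip_to_int and same_subnet are hypothetical helpers.

# ip_to_int A.B.C.D : print the dotted-quad address as one integer.
ip_to_int() {
    ip=$1
    o1=${ip%%.*}; ip=${ip#*.}
    o2=${ip%%.*}; ip=${ip#*.}
    o3=${ip%%.*}; o4=${ip#*.}
    echo $(( (o1 << 24) + (o2 << 16) + (o3 << 8) + o4 ))
}

# same_subnet ADDR1 ADDR2 MASK : succeed if both addresses, ANDed with the
# mask, give the same network number.
same_subnet() {
    a=$(ip_to_int "$1"); b=$(ip_to_int "$2"); m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}

# Example: same_subnet 10.20.5.253 10.20.5.254 255.255.255.0 && echo "same subnet"
```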

Following successful BOS installation, you will need to confirm the post tasks you defined in your inst_script have completed. Anything that has failed will need to be run manually
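
One post task worth confirming is the rootvg mirror. A minimal sketch, assuming the usual column layout of "lsvg -l rootvg" (LV NAME, TYPE, LPs, PPs, ...); unmirrored_lvs is a hypothetical helper that reads that output on standard input and lists every logical volume with fewer than two physical partitions per logical partition.

```shell
#!/bin/sh
# Sketch only: unmirrored_lvs is a hypothetical helper.
# Usage: lsvg -l rootvg | unmirrored_lvs
unmirrored_lvs() {
    awk '
        # Data rows have numeric LPs and PPs in columns 3 and 4; this skips
        # the "rootvg:" line and the column header.
        $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ {
            if ($4 < 2 * $3) print $1
        }'
}
```

A dedicated dump logical volume is typically left unmirrored, so seeing only that name in the output is not necessarily a failure.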


13.Taking a mksysb of your new server

To take a mksysb of the newly created server onto the NIM server, you will need to create a new filesystem (not in rootvg) to hold the mksysb images. The filesystem should have a mount point of /export/mksysb_clients and be of type jfs2. To create a 20 GB filesystem in tsmvg, run the below command

# crfs -v jfs2 -g tsmvg -a size=20G -m /export/mksysb_clients -A yes -t rw


To take a mksysb of a NIM client, run the below command

nim -o define -t mksysb \
-a server=master \
-a location=/export/mksysb_clients/sp-tsm2 \
-a source=sp-tsm2 \
-a mk_image=yes \
-a mksysb_flags=-e \
sp-tsm2_image 

This will create a mksysb resource, as below

[sp-tsm1] scripts # lsnim -l sp-tsm2_image
sp-tsm2_image:
class = resources
type = mksysb
arch = power
Rstate = ready for use
prev_state = unavailable for use
location = /export/mksysb_clients/sp-tsm2/sp-tsm2.mksysb
version = 5
release = 3
mod = 0
oslevel_r = 5300-03
alloc_count = 0
server = master


14.Restoring a host from a mksysb

The procedure of restoring a host from a mksysb is fairly simple. In this example, we restore sp-tsm2

Enter the below command to initiate the restore from the NIM server

# nim -o bos_inst -a source=mksysb \
-a mksysb=sp-tsm2_image \
-a lpp_source=lpp_source1 \
-a spot=spot1 \
-a accept_licenses=yes \
-a boot_client=no \
sp-tsm2

Once entered, refer to section 12 to boot the server you are recovering over the network

15.Booting the server into diagnostics

Occasionally you may need to boot the server into diagnostic mode to allow you to resolve a hardware issue. To do this, first enter the below


# nim -o diag -a spot=spot1 sp-tsm2

Once entered, refer to section 12 to boot the server into diagnostics over the network

16.Booting a server into maintenance

Occasionally you may need to boot the server into maintenance mode. To do this, first enter the below

# nim -o maint_boot -a spot=spot1 sp-tsm2

Once entered, refer to section 12 to boot the server into maintenance mode over the network

After successfully booting and defining the console, the System Maintenance menu is displayed. The maintenance menu options and their descriptions are described below. 

Access a Root Volume Group
This option allows you to activate the root volume group and start the maintenance shell with a full set of commands.
Copy a System Dump to Removable Media
This option allows you to copy a previous system dump to external media.
Access Advanced Maintenance Function
This option allows you to start a maintenance shell with a limited set of commands.


17.Installing additional software on a client

Occasionally you may need to install additional filesets on a client. You first need to add the software to the lpp_source by simply copying it to the lpp_source directory. You then need to run the below command

# nim -Fo check lpp_source1

Following that you can initiate the install on the client

# nim -o cust -a lpp_source=lpp_source1 -a filesets=bos.adt \
-a installp_flags="a c g X p" sp-tsm2

Note: refer to the installp man page for options on installp_flags

To install a pre-defined or new installp bundle (output from # lsnim -t installp_bundle)

# nim -o cust -a lpp_source=lpp_source1 -a installp_bundle=openssh_server -a installp_flags=" a c g X p" pr-testdb

18.Update software on a client

To update a client with the whole contents of an lpp_source resource, enter the below

# nim -o cust -a lpp_source=lpp_source1 -a fixes=update_all sp-tsm2

19.To add a new lpp resource that contains a new AIX level, then apply that update to a NIM client.

First copy the contents of the ML to a filesystem area, then run

# nim -o define -t lpp_source -a location=/export/lpp_source/aix_maint_ML3 \
-a server=master aix_maint_ML3

To update a server from the new AIX maintenance level

# nim -o cust -a lpp_source=aix_maint_ML3 -a fixes=update_all \
-a installp_flags="a c g X p" sp-tsm2
 

    

65. ACCOUNTING:
===============

General in unix:
----------------

The following is a step-by-step summary of how UNIX system accounting works: 

When the UNIX system is switched into multiuser state, the /usr/lib/acct/startup program is executed. 
The startup program executes several other programs that invoke accounting: 
acctwtmp, turnacct, and remove. 

- acctwtmp adds a "boot" record to /var/adm/wtmp. In this record, the system name is shown 
  as the login name. 

- turnacct, invoked with the on option, begins process accounting. Specifically, turnacct on executes 
  the accton program with the argument /var/adm/pacct. 

- remove "cleans up" the saved pacct and wtmp files left in the sum directory by runacct. 

Raw Accounting Data

The login and init programs record connect sessions by writing records into /var/adm/wtmp. 
Any date changes (made by running date with an argument) are also written to /var/adm/wtmp. 
Reboots and shutdowns (via acctwtmp) are also recorded in /var/adm/wtmp. 
When a process ends, the kernel writes one record per process, in the format defined in acct.h, to the /var/adm/pacct file. 

Two programs track disk usage by login: acctdusg and diskusg. They are invoked by the shell script dodisk. 

Every hour cron executes the ckpacct program to check the size of /var/adm/pacct. 
If the file grows past 500 blocks (default), turnacct switch is executed. (The turnacct switch program 
moves the pacct file and creates a new one.) The advantage of having several smaller pacct files 
becomes apparent when trying to restart runacct if a failure occurs when processing these records. 
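
The hourly size check can be pictured with a small shell sketch. The function name pacct_needs_switch and its arguments are hypothetical, and the real ckpacct script does more than this, so treat it only as an illustration of the 500-block rule described above.

```shell
# Hypothetical helper: decide whether the active pacct file should be rotated.
# $1 = current size of the pacct file in blocks
# $2 = threshold in blocks (defaults to 500, the default mentioned above)
pacct_needs_switch() {
    size_blocks=$1
    limit_blocks=${2:-500}
    if [ "$size_blocks" -gt "$limit_blocks" ]; then
        echo "switch"   # this is where "turnacct switch" would be run
    else
        echo "keep"     # file is still small enough
    fi
}

pacct_needs_switch 620        # prints: switch
pacct_needs_switch 120        # prints: keep
pacct_needs_switch 620 1000   # prints: keep
```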

If the system is shut down using shutdown, the shutacct program is executed automatically. 
The shutacct program writes a reason record into /var/adm/wtmp and turns off process accounting. 

If you provide services on a request basis (such as file restores), you can keep billing records 
by login by using the chargefee program. It allows you to add a record to /var/adm/fee each time a user 
incurs a charge. The next time runacct is executed, this new record is picked up and merged into the total 
accounting records. 

runacct is executed via cron each night. It processes the accounting files /var/adm/pacct?, 
/var/adm/wtmp, /var/adm/fee, and /var/adm/acct/nite/disktacct to produce command summaries 
and usage summaries by login. 

The /usr/lib/acct/prdaily program is executed on a daily basis by runacct to write the daily accounting 
information collected by runacct (in ASCII format) in /var/adm/acct/sum/rprtMMDD. 

The monacct program should be executed on a monthly basis (or at intervals determined by you, 
such as the end of every fiscal period). The monacct program creates a report based on data stored 
in the sum directory that has been updated daily by runacct. After creating the report, monacct 
"cleans up" the sum directory to prepare the directory's files for the new runacct data. 
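
The scheduling described above is normally expressed as crontab entries. The sketch below only prints a sample set of entries so they can be reviewed; the helper name print_acct_crontab and all schedule times are assumptions, and on AIX the same programs are typically found under /usr/sbin/acct rather than /usr/lib/acct.

```shell
# Print illustrative crontab entries for the accounting jobs described above.
# The times are examples only; adjust them to your site.
print_acct_crontab() {
    cat <<'EOF'
# adm crontab
0 * * * * /usr/lib/acct/ckpacct
0 2 * * * /usr/lib/acct/runacct 2> /var/adm/acct/nite/fd2log
0 4 1 * * /usr/lib/acct/monacct
# root crontab
0 1 * * 4 /usr/lib/acct/dodisk
EOF
}

print_acct_crontab
```

The ordering matters more than the exact times: ckpacct runs hourly, runacct nightly, and monacct once per accounting period after the last runacct of that period.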


On AIX:
-------

- Connect time accounting:
Connect time data is collected by the init and login commands. When you log in, the login program
writes a record in the "/etc/utmp" file. This record includes your user name, the date and time of the login,
and the login port. Commands such as who use this file to find out which users are logged into
the various display stations. 
If the /var/adm/wtmp connect-time accounting file exists, the login command adds a copy of this 
login record to it.

When your login program ends (when you log out), the init command records the end of the session
by writing another record in the "/var/adm/wtmp" file.
Both the login and logout records have the form described in the utmp.h file.

- Shutdown:
acctwtmp command:
The "acctwtmp" command also writes special entries in the /var/adm/wtmp file concerning
system shutdowns and startups.

- Process accounting:
accton command:
The system collects data on resource usage for each process as it runs, including
the memory use, elapsed time and processor time, and the user and group ID under which the process runs.
The "accton" command turns on the recording of this data in the "/var/adm/pacct" file.

- Disk usage accounting:
dodisk command:
The dodisk command, run as specified by the cron daemon, periodically writes disk-usage records
to the "/var/adm/acct/nite(x)/dacct" file. To accomplish this, the dodisk command calls other commands.
Depending upon the thoroughness of the accounting search, the diskusg command or the acctdusg command
can be used to collect data. The acctdisk command is used to write a total accounting record.
The total accounting record, in turn, is used by the acctmerg command to prepare the daily
accounting report.

- Printer usage accounting:
enq command:
The collection of printer usage data is a cooperative effort between the enq command and the queuing daemon.
The enq command enqueues the user name, job number, and the name of the file to be printed.
After the file is printed, the qdaemon command writes an ASCII record to a file, usually the
"/var/adm/qacct" file, containing the user name, user ID, and the number of pages printed.
You can sort these records and convert them to total accounting records.
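
As a rough illustration of sorting and totalling such records, the sketch below sums pages per user with awk. It assumes each record is one line of three whitespace-separated fields (user name, user ID, pages); the helper name qacct_pages_by_user and the sample data are made up, so check the layout of your own qacct file before relying on it.

```shell
# Sum printed pages per user from qacct-style ASCII records on stdin.
# Assumed record layout (one per line): <user name> <user id> <pages>
qacct_pages_by_user() {
    awk 'NF >= 3 { pages[$1] += $3 }
         END { for (u in pages) print u, pages[u] }' | sort
}

# Sample records standing in for /var/adm/qacct (made-up data).
printf '%s\n' 'alice 201 12' 'bob 202 3' 'alice 201 5' | qacct_pages_by_user
# prints:
# alice 17
# bob 3
```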


66. Combining cards, Link Aggregation, EtherChannel in AIX:
===========================================================

Note 1:
-------

EtherChannel and IEEE 802.3ad Link Aggregation are network port aggregation technologies that allow 
several Ethernet adapters to be aggregated together to form a single pseudo Ethernet device. 
For example, ent0 and ent1 can be aggregated into an EtherChannel adapter called ent3; interface en3 
would then be configured with an IP address. The system considers these aggregated adapters as one adapter. 
Therefore, IP is configured over them as over any Ethernet adapter. In addition, all adapters 
in the EtherChannel or Link Aggregation are given the same hardware (MAC) address, so they are treated 
by remote systems as if they were one adapter. Both EtherChannel and IEEE 802.3ad Link Aggregation require 
support in the switch so it is aware which switch ports should be treated as one.

The main benefit of EtherChannel and IEEE 802.3ad Link Aggregation is that they have the network bandwidth 
of all of their adapters in a single network presence. If an adapter fails, network traffic is automatically 
sent on the next available adapter without disruption to existing user connections. The adapter is automatically 
returned to service on the EtherChannel or Link Aggregation when it recovers.

There are some differences between EtherChannel and IEEE 802.3ad Link Aggregation. Consider the differences 
given in Table 15 to determine which would be best for your situation.

Table 15. 
Differences between EtherChannel and IEEE 802.3ad Link Aggregation. 

EtherChannel                                   IEEE 802.3ad 
Requires switch configuration                  Little, if any, configuration of switch required to form aggregation. 
                                               Some initial setup of the switch may be required. 
Supports different packet distribution modes   Supports only standard distribution mode 

Beginning with AIX 5.2 with 5200-03, Dynamic Adapter Membership functionality is available. 
This functionality allows you to add or remove adapters from an EtherChannel without having to disrupt 
any user connections. For more details, see Dynamic Adapter Membership.

Supported Adapters
EtherChannel and IEEE 802.3ad Link Aggregation are supported on the following Ethernet adapters:

10/100 Mbps Ethernet PCI Adapter 
Universal 4-Port 10/100 Ethernet Adapter 
10/100 Mbps Ethernet PCI Adapter II 
10/100/1000 Base-T Ethernet PCI Adapter 
Gigabit Ethernet-SX PCI Adapter 
10/100/1000 Base-TX Ethernet PCI-X Adapter 
Gigabit Ethernet-SX PCI-X Adapter 
2-port 10/100/1000 Base-TX Ethernet PCI-X Adapter 
2-port Gigabit Ethernet-SX PCI-X Adapter 
Gigabit Ethernet-SX Adapter 
10 Gigabit Ethernet PCI-X Adapter


Only the basic EtherChannel functions (operating exclusively in standard or round-robin mode without a backup) 
are supported in the following Ethernet adapters:

PCI Ethernet BNC/RJ-45 Adapter 
PCI Ethernet AUI/RJ-45 Adapter

For additional release information about new adapters, see the AIX Release Notes that correspond to your 
level of AIX.

Important:
Mixing adapters of different speeds in the same EtherChannel, even if one of them is operating 
as the backup adapter, is not supported. This does not mean that such configurations will not work. 
The EtherChannel driver makes every reasonable attempt to work even in a mixed-speed scenario.

For information on configuring and using EtherChannel, see EtherChannel. For more information on configuring 
and using IEEE 802.3ad link aggregation, see IEEE 802.3ad Link Aggregation. For information on the different 
AIX and switch configuration combinations and the results they produce, see Interoperability Scenarios.

EtherChannel
The adapters that belong to an EtherChannel must be connected to the same EtherChannel-enabled switch. 
You must manually configure this switch to treat the ports that belong to the EtherChannel 
as an aggregated link. Your switch documentation might refer to this capability as link aggregation 
or trunking.

Traffic is distributed across the adapters in either the standard way (where the adapter over which 
the packets are sent is chosen depending on an algorithm) or on a round-robin basis (where packets 
are sent evenly across all adapters). Incoming traffic is distributed in accordance with the 
switch configuration and is not controlled by the EtherChannel operation mode.

In AIX, you can configure multiple EtherChannels per system, but it is required that all the links 
in one EtherChannel are attached to a single switch. Because the EtherChannel cannot be spread across 
two switches, the entire EtherChannel is lost if the switch is unplugged or fails. To solve this problem, 
a new backup option available in AIX 5.2 and later keeps the service running when the main EtherChannel fails. 
The backup and EtherChannel adapters should be attached to different network switches, which must be 
inter-connected for this setup to work properly. In the event that all of the adapters in the EtherChannel fail, 
the backup adapter will be used to send and receive all traffic. When any link in the EtherChannel is restored, 
the service is moved back to the EtherChannel.

For example, ent0 and ent1 could be configured as the main EtherChannel adapters, and ent2 as the backup adapter, 
creating an EtherChannel called ent3. Ideally, ent0 and ent1 would be connected to the same 
EtherChannel-enabled switch, and ent2 would be connected to a different switch. In this example, all traffic 
sent over en3 (the EtherChannel's interface) would be sent over ent0 or ent1 by default (depending on the 
EtherChannel's packet distribution scheme), whereas ent2 will be idle. If at any time both ent0 and ent1 fail, 
all traffic would be sent over the backup adapter, ent2. When either ent0 or ent1 recovers, the main 
EtherChannel adapters will once again be used for all traffic.
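
The failover rule in this example can be modelled in a few lines of shell. This is only a hypothetical model of the behaviour described above, not AIX driver code; the function name ech_active_path and the "up"/"down" state strings are assumptions.

```shell
# Hypothetical model of the EtherChannel-with-backup rule described above.
# Arguments are the link states ("up" or "down") of ent0, ent1 and ent2.
ech_active_path() {
    ent0=$1 ent1=$2 ent2=$3
    if [ "$ent0" = "up" ] || [ "$ent1" = "up" ]; then
        echo "primary"      # at least one main adapter is working
    elif [ "$ent2" = "up" ]; then
        echo "backup"       # all main adapters have failed
    else
        echo "none"         # nothing left to carry traffic
    fi
}

ech_active_path up up up       # prints: primary
ech_active_path down down up   # prints: backup
ech_active_path down up up     # prints: primary
```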

Network Interface Backup, a mode of operation available for EtherChannel in AIX 4.3.3 and AIX 5.1, 
protects against a single point of Ethernet network failure. No special hardware is required to use 
Network Interface Backup, but the backup adapter should be connected to a separate switch for maximum reliability. 
In Network Interface Backup mode, only one adapter at a time is actively used for network traffic. 
The EtherChannel tests the currently-active adapter and, optionally, the network path to a user-specified node. 
When a failure is detected, the next adapter will be used for all traffic. Network Interface Backup provides 
detection and failover with no disruption to user connections. Network Interface Backup was originally 
implemented as a mode in the EtherChannel SMIT menu. In AIX 5.2 and later, the backup adapter provides 
the equivalent function, so the mode was eliminated from the SMIT menu. To configure network interface backup 
in AIX 5.2 and later, see Configure Network Interface Backup.

Configuring EtherChannel:
-------------------------
Follow these steps to configure an EtherChannel.

Considerations
You can have up to eight primary Ethernet adapters and only one backup Ethernet adapter per EtherChannel. 
You can configure multiple EtherChannels on a single system, but each EtherChannel constitutes an additional 
Ethernet interface. The no command option, ifsize, may need to be increased to include not only the 
Ethernet interfaces for each adapter, but also any EtherChannels that are configured. 
In releases prior to AIX 5.2, the default ifsize is eight. In AIX 5.2 and later, the default size is 256. 
You can use any supported Ethernet adapter in an EtherChannel (see Supported Adapters). However, the Ethernet adapters 
must be connected to a switch that supports EtherChannel. See the documentation that came with your switch 
to determine if it supports EtherChannel (your switch documentation may refer to this capability also as 
link aggregation or trunking). 
All adapters in the EtherChannel should be configured for the same speed (100 Mbps, for example) and should be 
full duplex. 
The adapters used in the EtherChannel cannot be accessed by the system after the EtherChannel is configured. 
To modify any of their attributes, such as media speed, transmit or receive queue sizes, and so forth,
you must do so before including them in the EtherChannel. 
The adapters that you plan to use for your EtherChannel must not have an IP address configured on them 
before you start this procedure. When configuring an EtherChannel with adapters that were previously configured 
with an IP address, make sure that their interfaces are in the detach state. The adapters to be added 
to the EtherChannel cannot have interfaces configured in the up state in the Object Data Manager (ODM), 
which will happen if their IP addresses were configured using SMIT. This may cause problems bringing up 
the EtherChannel when the machine is rebooted because the underlying interface is configured before the 
EtherChannel with the information found in ODM. Therefore, when the EtherChannel is configured, it finds 
that one of its adapters is already being used. To change this, before creating the EtherChannel, 
type smit chinet, select each of the interfaces of the adapters to be included in the EtherChannel, 
and change its state value to "detach". This will ensure that when the machine is rebooted the EtherChannel 
can be configured without errors. 
For more information about ODM, see Object Data Manager (ODM) in AIX 5L Version 5.3 
General Programming Concepts: Writing and Debugging Programs.
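
The detach step can usually also be done from the command line with chdev instead of smit chinet. The sketch below only prints the commands so they can be reviewed before being run as root; the helper name print_detach_cmds is made up, and it assumes the usual naming where adapter entN has interface enN.

```shell
# Print (do not run) the chdev commands that would set each adapter's
# interface to the detach state. Adapter names such as ent0 are examples.
print_detach_cmds() {
    for adapter in "$@"; do
        ifname="en${adapter#ent}"     # ent0 -> en0
        echo "chdev -l $ifname -a state=detach"
    done
}

print_detach_cmds ent0 ent1
# prints:
# chdev -l en0 -a state=detach
# chdev -l en1 -a state=detach
```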

If you will be using 10/100 Ethernet adapters in the EtherChannel, you may need to enable link polling 
on those adapters before you add them to the EtherChannel. Type "smit chgenet" at the command line. 
Change the Enable Link Polling value to yes, and press Enter. 

Note:
In AIX 5.2 with 5200-03 and later, enabling the link polling mechanism is not necessary. The link poller 
will be started automatically.

If you plan to use jumbo frames, you may need to enable this feature in every adapter before creating 
the EtherChannel and in the EtherChannel itself. Type smitty chgenet at the command line. 
Change the Enable Jumbo Frames value to yes and press Enter. Do this for every adapter for which you want 
to enable Jumbo Frames. You will enable jumbo frames in the EtherChannel itself later. 

Note:
In AIX 5.2 and later, enabling the jumbo frames in every underlying adapter is not necessary once it is enabled 
in the EtherChannel itself. The feature will be enabled automatically if you set the Enable Jumbo Frames attribute to yes.

Configure an EtherChannel:
--------------------------
Type "smit etherchannel" at the command line. 
Select Add an EtherChannel / Link Aggregation from the list and press Enter. 
Select the primary Ethernet adapters that you want on your EtherChannel and press Enter. If you are planning to use 
EtherChannel backup, do not select the adapter that you plan to use for the backup at this point. 
The EtherChannel backup option is available in AIX 5.2 and later. 

Note:
The Available Network Adapters displays all Ethernet adapters. If you select an Ethernet adapter that is already 
being used (has an interface defined), you will get an error message. You first need to detach this interface 
if you want to use it.

Enter the information in the fields according to the following guidelines: 

- EtherChannel / Link Aggregation Adapters: You should see all primary adapters that you are using 
in your EtherChannel. You selected these adapters in the previous step. 

- Enable Alternate Address: This field is optional. Setting this to yes will enable you to specify 
a MAC address that you want the EtherChannel to use. If you set this option to no, the EtherChannel 
will use the MAC address of the first adapter. 

- Alternate Address: If you set Enable Alternate Address to yes, specify the MAC address that you want 
to use here. The address you specify must start with 0x and be a 12-digit hexadecimal address 
(for example, 0x001122334455). 

- Enable Gigabit Ethernet Jumbo Frames: This field is optional. In order to use this, your switch 
must support jumbo frames. This will only work with a Standard Ethernet (en) interface, 
not an IEEE 802.3 (et) interface. Set this to yes if you want to enable it. 

- Mode: You can choose from the following modes: 

standard: In this mode the EtherChannel uses an algorithm to choose which adapter it will send 
the packets out on. The algorithm consists of taking a data value, dividing it by the number of adapters 
in the EtherChannel, and using the remainder (using the modulus operator) to identify the outgoing link. 
The Hash Mode value determines which data value is fed into this algorithm (see the Hash Mode attribute 
for an explanation of the different hash modes). For example, if the Hash Mode is default, it will use 
the last byte of the packet's destination IP address. If the address is 10.10.10.11 and there are 2 adapters 
in the EtherChannel, (11 / 2) = 5 with remainder 1, so the second adapter is used (the adapters are numbered starting from 0). 
The adapters are numbered in the order they are listed in the SMIT menu. This is the default operation mode. 

round_robin: In this mode the EtherChannel will rotate through the adapters, giving each adapter one packet 
before repeating. The packets may be sent out in a slightly different order than they were given to the 
EtherChannel, but it will make the best use of its bandwidth. It is an invalid combination to select 
this mode with a Hash Mode other than default. If you choose the round-robin mode, leave the Hash Mode 
value as default. 

netif_backup: This option is available only in AIX 5.1 and AIX 4.3.3. In this mode, the EtherChannel 
will activate only one adapter at a time. The intention is that the adapters are plugged into different 
Ethernet switches, each of which is capable of getting to any other machine on the subnet or network. 
When a problem is detected either with the direct connection (or optionally through the inability 
to ping a machine), the EtherChannel will deactivate the current adapter and activate a backup adapter. 
This mode is the only one that makes use of the Internet Address to Ping, Number of Retries, and 
Retry Timeout fields. 
Network Interface Backup Mode does not exist as an explicit mode in AIX 5.2 and later. 
To enable Network Interface Backup Mode in AIX 5.2 and later, you must configure one adapter in the 
main EtherChannel and a backup adapter. For more information, see Configure Network Interface Backup.

8023ad: This option enables the use of the IEEE 802.3ad Link Aggregation Control Protocol (LACP) 
for automatic link aggregation. For more details about this feature, see IEEE 802.3ad Link Aggregation.

- Hash Mode: You can choose from the following hash modes, which will determine which data value will be used 
by the algorithm to determine the outgoing adapter: 

default: In this hash mode the destination IP address of the packet will be used to determine the outgoing adapter. 
For non-IP traffic (such as ARP), the last byte of the destination MAC address is used to do the calculation. 
This mode will guarantee packets are sent out over the EtherChannel in the order they were received, but it may 
not make full use of the bandwidth. 

src_port: In this hash mode the source UDP or TCP port value of the packet will be used to determine the 
outgoing adapter. If the packet is not UDP or TCP traffic, the last byte of the destination IP address will be used. 
If the packet is not IP traffic, the last byte of the destination MAC address will be used. 

dst_port: In this hash mode the destination UDP or TCP port value of the packet will be used to determine 
the outgoing adapter. If the packet is not UDP or TCP traffic, the last byte of the destination IP address will be used. 
If the packet is not IP traffic, the last byte of the destination MAC address will be used. 

src_dst_port: In this hash mode both the source and destination UDP or TCP port values of the packet will be used 
to determine the outgoing adapter (specifically, the source and destination ports are added and then divided 
by two before being fed into the algorithm). If the packet is not UDP or TCP traffic, the last byte of the 
destination IP address will be used. If the packet is not IP traffic, the last byte of the destination MAC address 
will be used. This mode can give good packet distribution in most situations, both for clients and servers. 

Note:
It is an invalid combination to select a Hash Mode other than default with a Mode of round_robin.
To learn more about packet distribution and load balancing, see Load-balancing options. 

- Backup Adapter: This field is optional. Enter the adapter that you want to use as your EtherChannel backup. 
EtherChannel backup is available in AIX 5.2 and later. 

- Internet Address to Ping: This field is optional and only takes effect if you are running Network Interface 
Backup mode or if you have only one adapter in the EtherChannel and a backup adapter. The EtherChannel will 
ping the IP address or host name that you specify here. If the EtherChannel is unable to ping this address 
for the Number of Retries times in Retry Timeout intervals, the EtherChannel will switch adapters. 

- Number of Retries: Enter the number of ping response failures that are allowed before the EtherChannel 
switches adapters. The default is three. This field is optional and valid only if you have set an 
Internet Address to Ping. 

- Retry Timeout: Enter the number of seconds between the times when the EtherChannel will ping the Internet Address 
to Ping. The default is one second. This field is optional and valid only if you have set an Internet Address to Ping.

Press Enter after changing the desired fields to create the EtherChannel. 

Configure IP over the newly-created EtherChannel device by typing smit chinet at the command line. 
Select your new EtherChannel interface from the list. 
Fill in all the required fields and press Enter.
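
For scripted setups, the SMIT procedure above corresponds roughly to a single mkdev call on the EtherChannel pseudo-device. The sketch below only prints such a command for review. The helper name print_mkdev_ech is made up, and the attribute names (adapter_names, mode, hash_mode, backup_adapter) are the ones normally listed by lsattr -El for an EtherChannel device, so verify them on your AIX level before running the result.

```shell
# Print (do not run) a mkdev command for an EtherChannel.
# $1 = comma-separated primary adapters, $2 = mode, $3 = hash mode,
# $4 = optional backup adapter. All values used below are examples.
print_mkdev_ech() {
    primaries=$1 mode=$2 hash=$3 backup=${4:-}
    # round_robin is only valid together with the default hash mode
    if [ "$mode" = "round_robin" ] && [ "$hash" != "default" ]; then
        echo "error: round_robin requires hash_mode=default" >&2
        return 1
    fi
    cmd="mkdev -c adapter -s pseudo -t ibm_ech -a adapter_names=$primaries"
    cmd="$cmd -a mode=$mode -a hash_mode=$hash"
    if [ -n "$backup" ]; then
        cmd="$cmd -a backup_adapter=$backup"
    fi
    echo "$cmd"
}

print_mkdev_ech ent0,ent1 standard src_dst_port ent2
```

Printing the command first makes it easy to check the adapter list and to catch the invalid round_robin and hash mode combination before any device is created.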

Configure Network Interface Backup
Network Interface Backup protects against a single point of network failure by providing failure detection 
and failover with no disruption to user connections. When operating in this mode, only one adapter is active 
at any given time. If the active adapter fails, another adapter in the EtherChannel will be used for all traffic. 
When operating in Network Interface Backup mode, it is not necessary to connect to EtherChannel-enabled switches.

The Network Interface Backup setup is most effective when the adapters are connected to different network switches, 
as this provides greater redundancy than connecting all adapters to one switch. When connecting to different switches, 
make sure there is a connection between the switches. This provides failover capabilities from one adapter 
to another by ensuring that there is always a route to the currently-active adapter.

In releases prior to AIX 5.2, Network Interface Backup mode was implemented as an explicit mode of operation 
in the EtherChannel SMIT menu. In AIX 5.2 and later, however, the backup adapter functionality provides 
the equivalent behavior, so the mode was eliminated from the SMIT menu.

Additionally, AIX 5.2 and later versions provide priority, meaning that the adapter configured in the primary 
EtherChannel will be used preferentially over the backup adapter. As long as the primary adapter is functional, 
it will be used. This contrasts with the behavior of Network Interface Backup mode in releases prior to AIX 5.2, 
where the backup adapter was used until it also failed, regardless of whether the primary adapter had 
already recovered.

For example, ent0 could be configured as the main adapter, and ent2 as the backup adapter, creating an 
EtherChannel called ent3. Ideally, ent0 and ent2 would be connected to two different switches. In this example, 
all traffic sent over en3 (the EtherChannel's interface) would be sent over ent0 by default, whereas ent2 
will be idle. If at any time ent0 fails, all traffic would be sent over the backup adapter, ent2. 
When ent0 recovers, it will once again be used for all traffic.

While operating in Network Interface Backup Mode, it is also possible to configure the EtherChannel to detect 
link failure and network unreachability. To do this, specify the IP address or host name of a remote host 
where connectivity should always be present. The EtherChannel will periodically ping this host to determine 
whether there is still a network path to it. If a specified number of ping attempts go unanswered, the EtherChannel 
will fail over to the other adapter in the hope that there is a network path to the remote host through the 
other adapter. In this setup, not only should every adapter be connected to a different switch, but each switch 
should also have a different route to the host that is pinged.

This ping feature is only available in Network Interface Backup mode. However, in AIX 5.2 and later, if there is 
a failover due to unanswered pings on the primary adapter, the backup adapter will remain the active channel as long 
as it is working. There is no way of knowing, while operating on the backup adapter, whether it is possible to reach 
the host being pinged from the primary adapter. To avoid failing over back and forth between the primary and 
the backup, it will simply keep operating on the backup (unless the pings go unanswered on the backup adapter 
as well, or if the backup adapter itself fails, in which case it would fail over to the primary adapter). 
However, if the failover occurred because the primary adapter failed (not because the pings went unanswered), 
the EtherChannel will then come back to the primary adapter as soon as it has come back up, as usual.
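
The two failback cases above are easy to confuse, so here is a small model of the AIX 5.2 and later behaviour as described in this section. It is a hypothetical sketch, not driver code; the function name nib_next_active and the argument encoding are assumptions.

```shell
# Hypothetical model of Network Interface Backup failover in AIX 5.2 and later.
# $1 = adapter currently in use: primary | backup
# $2 = cause of the last failover: link | ping | none
# $3 = primary link state: up | down
# $4 = ping result via primary: ok | fail (only meaningful while on primary)
# $5 = backup health: ok | fail (link down or pings unanswered)
nib_next_active() {
    active=$1 cause=$2 pri_link=$3 pri_ping=$4 backup=$5
    if [ "$active" = "primary" ]; then
        if [ "$pri_link" = "down" ] || [ "$pri_ping" = "fail" ]; then
            echo "backup"
        else
            echo "primary"
        fi
    elif [ "$backup" = "fail" ]; then
        echo "primary"      # backup is unusable, fail over to the primary
    elif [ "$cause" = "link" ] && [ "$pri_link" = "up" ]; then
        echo "primary"      # the link failure has cleared, go back
    else
        echo "backup"       # after a ping failover, stay on the backup
    fi
}

nib_next_active primary none up fail ok   # prints: backup
nib_next_active backup ping up ok ok      # prints: backup
nib_next_active backup link up ok ok      # prints: primary
nib_next_active backup ping up ok fail    # prints: primary
```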

To configure Network Interface Backup in AIX 5.2, see Configure Network Interface Backup in AIX 5.2 and later. 
To configure Network Interface Backup in previous versions of AIX, see Appendix D. Configure Network Interface Backup 
in previous AIX versions.

Configure Network Interface Backup in AIX 5.2 and later
With root authority, type smit etherchannel on the command line. 
Select Add an EtherChannel / Link Aggregation from the list and press Enter. 
Select the primary Ethernet adapter and press Enter. This is the adapter that will be used until it fails. 

Note:
The Available Network Adapters displays all Ethernet adapters. If you select an Ethernet adapter that is already 
being used, you will get an error message and will need to detach this interface before you can use it. 
See the ifconfig command for information on how to detach an interface.

Enter the information in the fields according to the following guidelines: 

- EtherChannel / Link Aggregation Adapters: You should see the primary adapter you selected in the previous step. 

- Enable Alternate Address: This field is optional. Setting this to yes will enable you to specify 
a MAC address that you want the EtherChannel to use. If you set this option to no, the EtherChannel 
will use the MAC address of the primary adapter. 

- Alternate Address: If you set Enable Alternate Address to yes, specify the MAC address that you want 
to use here. The address you specify must start with 0x and be a 12-digit hexadecimal address 
(for example, 0x001122334455). 

- Enable Gigabit Ethernet Jumbo Frames: This field is optional. In order to use this, your switch 
must support jumbo frames. This will only work with a Standard Ethernet (en) interface, 
not an IEEE 802.3 (et) interface. Set this to yes if you want to use it. 

- Mode: It is irrelevant which mode of operation you select because there is only one adapter in the 
main EtherChannel. All packets will be sent over that adapter until it fails. There is no netif_backup mode 
because that mode can be emulated using a backup adapter. 

- Hash Mode: It is irrelevant which hash mode you select because there is only one adapter in the 
main EtherChannel. All packets will be sent over that adapter until it fails. 

- Backup Adapter: Enter the adapter that you want to be your backup adapter. After a failover, this adapter 
will be used until the primary adapter recovers. It is recommended to use the preferred adapter as the primary adapter. 

- Internet Address to Ping: This field is optional. The EtherChannel will ping the IP address or host name 
that you specify here. If the EtherChannel is unable to ping this address for Number of Retries times 
in Retry Timeout intervals, the EtherChannel will switch adapters. 

- Number of Retries: Enter the number of ping response failures that are allowed before the EtherChannel 
switches adapters. The default is three. This field is optional and valid only if you have set an 
Internet Address to Ping. 

- Retry Timeout: Enter the number of seconds between the times when the EtherChannel will ping the Internet Address 
to Ping. The default is one second. This field is optional and valid only if you have set an Internet Address to Ping.

Press Enter after changing the desired fields to create the EtherChannel. 

Configure IP over the newly-created interface by typing smit chinet at the command line. 
Select your new EtherChannel interface from the list. 
Fill in all the required fields and press Enter.

For additional tasks that can be performed after the EtherChannel is configured, see Managing EtherChannel 
and IEEE 802.3ad Link Aggregation.
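
When the Alternate Address is supplied from a script rather than typed into SMIT, it can be worth checking its format first. The sketch below tests for the form given in the field description (0x followed by exactly 12 hexadecimal digits); the helper name is_valid_alt_addr is made up.

```shell
# Check that an alternate address is "0x" followed by exactly 12 hex digits.
is_valid_alt_addr() {
    case $1 in
        0x*) hex=${1#0x} ;;          # strip the required prefix
        *)   return 1 ;;
    esac
    [ "${#hex}" -eq 12 ] || return 1
    case $hex in
        *[!0-9A-Fa-f]*) return 1 ;;  # reject any non-hex character
    esac
    return 0
}

is_valid_alt_addr 0x001122334455 && echo "ok"         # prints: ok
is_valid_alt_addr 001122334455   || echo "rejected"   # prints: rejected
```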

Load-balancing options
There are two load balancing methods for outgoing traffic in EtherChannel, as follows: round-robin, which spreads the outgoing traffic evenly across all the adapters in the EtherChannel; and standard, which selects the adapter using an algorithm. The Hash Mode parameter determines which numerical value is fed to the algorithm.

The following table summarizes the valid load balancing option combinations offered.

Table 16. Mode and Hash Mode combinations and the outgoing traffic distributions each will produce.

Mode                 Hash Mode      Outgoing Traffic Distribution
standard or 8023ad   default        The traditional AIX behavior. The adapter selection algorithm uses the
                                    last byte of the destination IP address (for TCP/IP traffic) or MAC address
                                    (for ARP and other non-IP traffic). This mode is typically a good initial
                                    choice for a server with a large number of clients.
standard or 8023ad   src_dst_port   The outgoing adapter path is selected by an algorithm using the combined
                                    source and destination TCP or UDP port values. Since each connection has a
                                    unique TCP or UDP port, the three port-based hash modes provide additional
                                    adapter distribution flexibility when there are several, separate TCP or UDP
                                    connections between an IP address pair.
standard or 8023ad   src_port       The adapter selection algorithm uses the source TCP or UDP port value.
                                    In the netstat -an command output, the port is the TCP/IP address suffix
                                    value in the Local column.
standard or 8023ad   dst_port       The outgoing adapter path is selected by the algorithm using the destination
                                    system port value. In the netstat -an command output, the TCP/IP address
                                    suffix in the Foreign column is the TCP or UDP destination port value.
round_robin          default        Outgoing traffic is spread evenly across all the adapter ports in the
                                    EtherChannel. This mode is the typical choice for two hosts connected
                                    back-to-back (without an intervening switch).

Round-Robin
All outgoing traffic is spread evenly across all of the adapters in the EtherChannel. It provides the highest bandwidth optimization for the AIX server system. While round-robin distribution is the ideal way to utilize all the links equally, consider that it also introduces the potential for out-of-order packets at the receiving system.

In general, round-robin mode is ideal for back-to-back connections running jumbo frames. In this environment, there is no intervening switch, so there is no chance that processing at the switch could alter the packet delivery time, order, or adapter path. On this direct cable network path, packets are received exactly as sent. Jumbo frames (9000 byte MTU) always yield better file transfer performance than traditional 1500 byte MTUs. In this case, however, they add another benefit. These larger packets take longer to send so it is less likely that the receiving host would be continuously interrupted with out-of-order packets.

Round-robin mode can be implemented in other environments but at increased risk of out-of-order packets at the receiving system. This risk is particularly high when there are few, long-lived, streaming TCP connections. When there are many such connections between a host pair, packets from different connections could be intermingled, thereby decreasing the chance of packets for the same connection arriving out-of-order. Check for out-of-order packet statistics in the tcp section of the netstat -s command output. A steadily-increasing value indicates a potential problem in traffic sent from an EtherChannel.

If out-of-order packets are a problem on a system that must use traditional Ethernet MTUs and must be connected through a switch, try the various hash modes offered in standard mode operation. Each mode has a particular strength, but the default and src_dst_port modes are the logical starting points as they are more widely applicable.

Standard or 8023ad
Standard algorithm. The standard algorithm is used for both standard and IEEE 802.3ad-style link aggregations. AIX divides the last byte of the "numerical value" by the number of adapters in the EtherChannel and uses the remainder to identify the outgoing link. If the remainder is zero, the first adapter in the EtherChannel is selected; a remainder of one means the second adapter is selected, and so on (the adapters are selected in the order they are listed in the adapter_names attribute).
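
The remainder calculation described above can be sketched in a few lines of shell. This is only an illustration of the description in this section, not AIX code: the function name select_adapter is hypothetical, the first argument stands for the "numerical value" chosen by the hash mode, and the remaining arguments stand for the adapters in adapter_names order.

```shell
# Sketch of the standard algorithm as described in the text (not AIX source code).
# $1     = numerical value supplied by the hash mode (for example, last byte of the destination IP)
# $2 ... = adapters, in the order they appear in the adapter_names attribute
select_adapter() {
    value=$1
    shift
    [ "$#" -ge 1 ] || { echo "no adapters given" >&2; return 1; }
    # Last byte of the value, divided by the number of adapters; keep the remainder.
    index=$(( (value & 255) % $# ))
    i=0
    for adapter in "$@"; do
        if [ "$i" -eq "$index" ]; then
            echo "$adapter"
            return 0
        fi
        i=$(( i + 1 ))
    done
}

# Destination address ending in .37, two adapters: 37 % 2 = 1, so the second adapter
select_adapter 37 ent0 ent1      # prints ent1
```

With two adapters, destinations whose last byte is even all leave on the first adapter and odd ones on the second, which is why this mode spreads load well only when there are many distinct destinations.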

The Hash Mode selection determines the numerical value used in the calculation. By default, the last byte of the destination IP address or MAC address is used in the calculation, but the source and destination TCP or UDP port values may also be used. These alternatives allow you to fine-tune the distribution of outgoing traffic across the real adapters in the EtherChannel.

In default hash mode, the adapter selection algorithm is applied to the last byte of the destination IP address for IP traffic. For ARP and other non-IP traffic, the same formula is applied on the last byte of the destination MAC address. Unless there is an adapter failure which causes a failover, all traffic between a host pair in default standard mode goes out over the same adapter. The default hash mode may be ideal when the local host establishes connections to many different IP addresses.

If the local host establishes lengthy connections to few IP addresses, however, you will notice that some adapters carry a greater load than others, because all the traffic sent to a specific destination is sent over the same adapter. While this prevents packets from arriving out-of-order, it may not utilize bandwidth in the most effective fashion in all cases. The port-based hash modes still send packets in order, but they allow packets belonging to different UDP or TCP connections, even if they are sent to the same destination, to be sent over different adapters, thus making better use of the bandwidth of all the adapters.

In src_dst_port hash mode, the TCP or UDP source and destination port values of the outgoing packet are added, then divided by two. The resultant whole number (no decimals) is plugged into the standard algorithm. TCP or UDP traffic is sent on the adapter selected by the standard algorithm and selected hash mode value. Non-TCP or UDP traffic will fall back to the default hash mode, meaning the last byte of either the destination IP address or MAC address. The src_dst_port hash mode option considers both the source and the destination TCP or UDP port values. In this mode, all of the packets in one TCP or UDP connection are sent over a single adapter so they are guaranteed to arrive in order, but the traffic is still spread out because connections (even to the same host) may be sent over different adapters. The results of this hash mode are not skewed by the connection establishment direction because it uses both the source and destination TCP or UDP port values.

In src_port hash mode, the source TCP or UDP port value of the outgoing packet is used. In dst_port hash mode, the destination TCP or UDP port value of the outgoing packet is used. Use the src_port or dst_port hash mode options if port values change from one connection to another and if the src_dst_port option is not yielding a desirable distribution.
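
To compare how the four hash modes would distribute a given set of connections, the rules in this section can be written out as a small shell function. This is a sketch under stated assumptions: hash_index is a hypothetical name, the arguments are values you supply by hand (for example from netstat -an output), and the calculation simply follows the description given here (value from the hash mode, last byte, remainder after dividing by the adapter count).

```shell
# Sketch only: reproduces the hash mode rules as described in the text.
# Usage: hash_index MODE SRC_PORT DST_PORT DST_IP_LAST_BYTE NUM_ADAPTERS
# Prints the zero-based index of the adapter that would be selected.
hash_index() {
    mode=$1 src=$2 dst=$3 ip_last=$4 n=$5
    case $mode in
        default)      value=$ip_last ;;                  # last byte of destination IP
        src_port)     value=$src ;;                      # source TCP/UDP port
        dst_port)     value=$dst ;;                      # destination TCP/UDP port
        src_dst_port) value=$(( (src + dst) / 2 )) ;;    # ports added, divided by two, no decimals
        *) echo "unknown hash mode: $mode" >&2; return 1 ;;
    esac
    echo $(( (value & 255) % n ))
}

# Two connections to the same host (address ending in .37, port 22), two adapters:
hash_index default      32768 22 37 2    # prints 1
hash_index default      32770 22 37 2    # prints 1 (same destination, same adapter)
hash_index src_dst_port 32768 22 37 2    # prints 1
hash_index src_dst_port 32770 22 37 2    # prints 0 (different connection, different adapter)
```

In this example the default mode keeps both connections on one adapter, while src_dst_port separates them; each individual connection still stays on a single adapter, so its packets remain in order.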

Managing EtherChannel and IEEE 802.3ad Link Aggregation
This section will tell you how to perform the following tasks:

Listing EtherChannels or Link Aggregations 
Changing the Alternate Address 
Adding, removing, or changing adapters in an EtherChannel or Link Aggregation 
Remove an EtherChannel or Link Aggregation 
Configure or remove a backup adapter on an existing EtherChannel or Link Aggregation
Listing EtherChannels or Link Aggregations
On the command line, type smit etherchannel. 
Select List All EtherChannels / Link Aggregations and press Enter.
Changing the Alternate Address
This enables you to specify a MAC address for your EtherChannel or Link Aggregation.

On AIX 5.2 with 5200-01 and earlier, type ifconfig interface detach, where interface is your EtherChannel's or Link Aggregation's interface. (On AIX 5.2 with 5200-03 and later, you can change the alternate address of the EtherChannel without detaching its interface). 
On the command line, type smit etherchannel. 
Select Change / Show Characteristics of an EtherChannel and press Enter. 
If you have multiple EtherChannels, select the EtherChannel for which you want to create an alternate address. 
Change the value in Enable Alternate EtherChannel Address to yes. 
Enter the alternate address in the Alternate EtherChannel Address field. The address must start with 0x and be a 12-digit hexadecimal address (for example, 0x001122334455). 
Press Enter to complete the process. 
Note:
Changing the EtherChannel's MAC address at runtime may cause a temporary loss of connectivity. This is because the adapters need to be reset so they learn of their new hardware address, and some adapters take a few seconds to be initialized.
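
Since a mistyped alternate address leads to a rejected change (and any runtime change may briefly interrupt connectivity), it can help to check the format before entering it in SMIT. The following is a minimal sketch; the function name is hypothetical, and it checks only the format stated above (0x followed by 12 hexadecimal digits), not whether the address is appropriate for your network.

```shell
# Sketch: check that an alternate address is "0x" followed by exactly 12 hex digits.
is_valid_alt_address() {
    addr=$1
    case $addr in
        0x*) hex=${addr#0x} ;;      # strip the required 0x prefix
        *)   return 1 ;;
    esac
    [ "${#hex}" -eq 12 ] || return 1
    case $hex in
        *[!0-9a-fA-F]*) return 1 ;; # any non-hex character fails the check
    esac
    return 0
}

if is_valid_alt_address 0x001122334455; then
    echo "format OK"
fi
```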
Dynamic Adapter Membership
Prior to AIX 5.2 with 5200-03, in order to add or remove an adapter from an EtherChannel, its interface first had to be detached, temporarily interrupting all user traffic. To overcome this limitation, Dynamic Adapter Membership (DAM) was added in AIX 5.2 with 5200-03. It allows adapters to be added or removed from an EtherChannel without having to disrupt any user connections. A backup adapter can also be added or removed; an EtherChannel can be initially created without a backup adapter, and one can be added at a later date if the need arises.

Not only can adapters be added or removed without disrupting user connections, it is also possible to modify most of the EtherChannel attributes at runtime. For example, you may begin using the "ping" feature of Network Interface Backup while the EtherChannel is in use, or change the remote host being pinged at any point.

You may also turn a regular EtherChannel into an IEEE 802.3ad Link Aggregation (or vice versa), allowing users to experiment with this feature without having to remove and recreate the EtherChannel.

Furthermore, with DAM, you may choose to create a one-adapter EtherChannel. A one-adapter EtherChannel behaves exactly like a regular adapter; however, should this adapter ever fail, it would be possible to replace it at runtime without ever losing connectivity. To accomplish this, you would add a temporary adapter to the EtherChannel, remove the defective adapter from the EtherChannel, replace the defective adapter with a working one using Hot Plug, add the new adapter to the EtherChannel, and then remove the temporary adapter. During this process you would never notice a loss in connectivity. If the adapter had been working as a standalone adapter, however, it would have had to be detached before being removed using Hot Plug, and during that time any traffic going over it would simply have been lost.

Adding, removing, or changing adapters in an EtherChannel or Link Aggregation
There are two ways to add, remove, or change an adapter in an EtherChannel or Link Aggregation. One method requires the EtherChannel or Link Aggregation interface to be detached, while the other does not (using Dynamic Adapter Membership, which is available in AIX 5.2 with 5200-03 and later).

Making changes to an EtherChannel using Dynamic Adapter Membership
Making changes using Dynamic Adapter Membership does not require you to stop all traffic going over the EtherChannel by detaching its interface. Consider the following before proceeding:

Notes:
When adding an adapter at runtime, note that different Ethernet adapters support different capabilities (for example, the ability to do checksum offload, to use private segments, to do large sends, and so forth). If different types of adapters are used in the same EtherChannel, the capabilities reported to the interface layer are those supported by all the adapters (for example, if all but one adapter supports the use of private segments, the EtherChannel will state it does not support private segments; if all adapters do support large send, the channel will state it supports large send). When adding an adapter to an EtherChannel at runtime, be sure that it supports at least the same capabilities as the other adapters already in the EtherChannel. If you attempt to add an adapter that does not support all the capabilities the EtherChannel supports, the addition will fail. Note, however, that if the EtherChannel's interface is detached, you may add any adapter (regardless of which capabilities it supports), and when the interface is reactivated the EtherChannel will recalculate which capabilities it supports based on the new list of adapters. 
If you are not using an alternate address and you plan to delete the adapter whose MAC address was used for the EtherChannel (the MAC address used for the EtherChannel is "owned" by one of the adapters), the EtherChannel will use the MAC address of the next adapter available (in other words, the one that becomes the first adapter after the deletion, or the backup adapter in case all main adapters are deleted). For example, if an EtherChannel has main adapters ent0 and ent1 and backup adapter ent2, it will use by default ent0's MAC address (it is then said that ent0 "owns" the MAC address). If ent0 is deleted, the EtherChannel will then use ent1's MAC address. If ent1 is then deleted, the EtherChannel will use ent2's MAC address. If ent0 were later re-added to the EtherChannel, it will continue to use ent2's MAC address because ent2 is now the owner of the MAC address. If ent2 were then deleted from the EtherChannel, it would start using ent0's MAC address again. 
Deleting the adapter whose MAC address was used for the EtherChannel may cause a temporary loss of connectivity, because all the adapters in the EtherChannel need to be reset so they learn of their new hardware address. Some adapters take a few seconds to be initialized.

If your EtherChannel is using an alternate address (a MAC address you specified), it will keep using this MAC address regardless of which adapters are added or deleted. Furthermore, it means that there will be no temporary loss of connectivity when adding or deleting adapters because none of the adapters "owns" the EtherChannel's MAC address.

Almost all EtherChannel attributes can now be modified at runtime. The only exception is Enable Gigabit Ethernet Jumbo Frames. To modify the Enable Gigabit Ethernet Jumbo Frames attribute, you must first detach the EtherChannel's interface before attempting to modify this value. 
For any attribute that cannot be changed at runtime (currently, only Enable Gigabit Ethernet Jumbo Frames), there is a field called Apply change to DATABASE only. If this attribute is set to yes, it is possible to change, at runtime, the value of an attribute that usually cannot be modified at runtime. With the Apply change to DATABASE only field set to yes the attribute will only be changed in the ODM and will not be reflected in the running EtherChannel until it is reloaded into memory (by detaching its interface, using rmdev -l EtherChannel_device and then mkdev -l EtherChannel_device commands), or until the machine is rebooted. This is a convenient way of making sure that the attribute is modified the next time the machine boots, without having to disrupt the running EtherChannel. 
To make changes to the EtherChannel or Link Aggregation using Dynamic Adapter Membership, follow these steps:

At the command line, type smit etherchannel. 
Select Change / Show Characteristics of an EtherChannel / Link Aggregation. 
Select the EtherChannel or Link Aggregation that you want to modify. 
Fill in the required fields according to the following guidelines: 
In the Add adapter or Remove adapter field, select the Ethernet adapter you want to add or remove. 
In the Add backup adapter or Remove backup adapter fields, select the Ethernet adapter you want to start or stop using as a backup. 
Almost all the EtherChannel attributes may be modified at runtime, although the Enable Gigabit Ethernet Jumbo Frames attribute cannot. 
To turn a regular EtherChannel into an IEEE 802.3ad Link Aggregation, change the Mode attribute to 8023ad. To turn an IEEE 802.3ad Link Aggregation into an EtherChannel, change the Mode attribute to standard or round_robin.
Fill in the necessary data, and press Enter.
Making changes on AIX 5.2 with 5200-01 and earlier
Follow these steps to detach the interface before making changes:

Type ifconfig interface detach, where interface is your EtherChannel's interface. 
On the command line, type smit etherchannel. 
Select Change / Show Characteristics of an EtherChannel / Link Aggregation and press Enter. 
Select the EtherChannel or Link Aggregation that you want to modify. 
Modify the attributes you want to change in your EtherChannel or Link Aggregation and press Enter. 
Fill in the necessary fields and press Enter.
Remove an EtherChannel or Link Aggregation
Type ifconfig interface detach, where interface is your EtherChannel's interface. 
On the command line, type smit etherchannel. 
Select Remove an EtherChannel / Link Aggregation and press Enter. 
Select the EtherChannel that you want to remove and press Enter.
Configure or remove a backup adapter on an existing EtherChannel or Link Aggregation
The following procedure configures or removes a backup adapter on an EtherChannel or Link Aggregation. This option is available only in AIX 5.2 and later.

Type ifconfig interface detach, where interface is your EtherChannel's or Link Aggregation's interface. 
On the command line, type smit etherchannel. 
Select Change / Show Characteristics of an EtherChannel / Link Aggregation. 
Select the EtherChannel or Link Aggregation that you are adding or modifying the backup adapter on. 
Enter the adapter that you want to use as your backup adapter in the Backup Adapter field, or select NONE if you wish to stop using the backup adapter.
Troubleshooting EtherChannel
If you are having trouble with your EtherChannel, consider the following:

Tracing EtherChannel
Use tcpdump and iptrace to troubleshoot the EtherChannel. The trace hook id for the transmission packets is 2FA and for other events is 2FB. You cannot trace receive packets on the EtherChannel as a whole, but you can trace each adapter's receive trace hooks.

Viewing EtherChannel Statistics
Use the entstat command to get the aggregate statistics of all the adapters in the EtherChannel. For example, entstat ent3 will display the aggregate statistics of ent3. Adding the -d flag will also display the statistics of each adapter individually. For example, typing entstat -d ent3 will show you the aggregate statistics of the EtherChannel as well as the statistics of each individual adapter in the EtherChannel.

Note:
In the General Statistics section, the number shown in Adapter Reset Count is the number of failovers. In EtherChannel backup, coming back to the main EtherChannel from the backup adapter is not counted as a failover. Only failing over from the main channel to the backup is counted. 
In the Number of Adapters field, the backup adapter is counted in the number displayed.

Improving Slow Failover
If the failover time when you are using network interface backup mode or EtherChannel backup is slow, verify that your switch is not running the Spanning Tree Protocol (STP). When the switch detects a change in its mapping of switch port to MAC address, it runs the spanning tree algorithm to see if there are any loops in the network. Network Interface Backup and EtherChannel backup may cause a change in the port to MAC address mapping.

Switch ports have a forwarding delay counter that determines how soon after initialization each port should begin forwarding or sending packets. For this reason, when the main channel is re-enabled, there is a delay before the connection is re-established, whereas the failover to the backup adapter is faster. Check the forwarding delay counter on your switch and make it as small as possible so that coming back to the main channel occurs as fast as possible.

For the EtherChannel backup function to work correctly, the forwarding delay counter must not be more than 10 seconds, or coming back to the main EtherChannel might not work correctly. Setting the forwarding delay counter to the lowest value allowed by the switch is recommended.

Adapters not Failing Over
If adapter failures are not triggering failovers and you are running AIX 5.2 with 5200-01 or earlier, check to see if your adapter card needs to have link polling enabled to detect link failure. Some adapters cannot automatically detect their link status. To detect this condition, these adapters must enable a link polling mechanism that starts a timer that periodically verifies the status of the link. Link polling is disabled by default. For EtherChannel to work correctly with these adapters, however, the link polling mechanism must be enabled on each adapter before the EtherChannel is created. If you are running AIX 5.2 with 5200-03 and later, the link polling is started automatically and this cannot be an issue.

Adapters that have a link polling mechanism have an ODM attribute called poll_link, which must be set to yes for the link polling to be enabled. Before creating the EtherChannel, use the following command on every adapter to be included in the channel:

smit chgenet
Change the Enable Link Polling value to yes and press Enter.

Using Jumbo Frames
For the jumbo frames option to work properly in AIX 5.2 and earlier, aside from enabling the use_jumbo_frame attribute on the EtherChannel, you must also enable jumbo frames on each adapter before creating the EtherChannel using the following command:

smitty chgenet
Change the Enable Jumbo Frames value to yes and press Enter. On AIX 5.2 and later, jumbo frames are enabled automatically in every underlying adapter when the EtherChannel's use_jumbo_frame attribute is set to yes.

Remote Dump
Remote dump is not supported over an EtherChannel.

IEEE 802.3ad Link Aggregation
IEEE 802.3ad is a standard way of doing link aggregation. Conceptually, it works the same as EtherChannel in that several Ethernet adapters are aggregated into a single virtual adapter, providing greater bandwidth and protection against failures. For example, ent0 and ent1 can be aggregated into an IEEE 802.3ad Link Aggregation called ent3; interface en3 would then be configured with an IP address. The system considers these aggregated adapters as one adapter. Therefore, IP is configured over them as over any Ethernet adapter.

Like EtherChannel, IEEE 802.3ad requires support in the switch. Unlike EtherChannel, however, the switch does not need to be configured manually to know which ports belong to the same aggregation.

The advantages of using IEEE 802.3ad Link Aggregation instead of EtherChannel are that it creates the link aggregations in the switch automatically, and that it allows you to use switches that support the IEEE 802.3ad standard but do not support EtherChannel.

In IEEE 802.3ad, the Link Aggregation Control Protocol (LACP) automatically tells the switch which ports should be aggregated. When an IEEE 802.3ad aggregation is configured, Link Aggregation Control Protocol Data Units (LACPDUs) are exchanged between the server machine and the switch. LACP will let the switch know that the adapters configured in the aggregation should be considered as one on the switch without further user intervention.

Although the IEEE 802.3ad specification does not allow the user to choose which adapters are aggregated, the AIX implementation does allow the user to select the adapters. According to the specification, the LACP determines, completely on its own, which adapters should be aggregated together (by making link aggregations of all adapters with similar link speeds and duplexity settings). This prevents you from deciding which adapters should be used standalone and which ones should be aggregated together. The AIX implementation gives you control over how the adapters are used, and it never creates link aggregations arbitrarily.

To be able to aggregate adapters (meaning that the switch will allow them to belong to the same aggregation) they must be of the same line speed (for example, all 100 Mbps, or all 1 Gbps) and they must all be full duplex. If you attempt to place adapters of different line speeds or different duplex modes in the same aggregation, the creation of the aggregation on the AIX system will succeed, but the switch may not aggregate the adapters together. If the switch does not successfully aggregate the adapters together, you may notice a decrease in network performance. For information on how to determine whether an aggregation on a switch has succeeded, see Troubleshooting IEEE 802.3ad.

According to the IEEE 802.3ad specification, packets going to the same IP address are all sent over the same adapter. Thus, when operating in 8023ad mode, the packets will always be distributed in the standard fashion, never in a round-robin fashion.

The backup adapter feature is available for IEEE 802.3ad Link Aggregations just as it is for EtherChannel. The backup adapter does not need to be connected to an IEEE 802.3ad-enabled switch, but if it is, the backup adapter will still follow the IEEE 802.3ad LACP.

You can also configure an IEEE 802.3ad Link Aggregation if the switch supports EtherChannel but not IEEE 802.3ad. In that case, you would have to manually configure the ports as an EtherChannel on the switch (just as if a regular EtherChannel had been created). By setting the mode to 8023ad, the aggregation will work with EtherChannel-enabled as well as IEEE 802.3ad-enabled switches. For more information about interoperability, see Interoperability Scenarios.

Note:
The steps to enable the use of IEEE 802.3ad vary from switch to switch. You should consult the documentation for your switch to determine what initial steps, if any, must be performed to enable LACP in the switch.
For information on how to configure an IEEE 802.3ad aggregation, see Configuring IEEE 802.3ad Link Aggregation.

Considerations
Consider the following before configuring an IEEE 802.3ad Link Aggregation:

Although not officially supported, the AIX implementation of IEEE 802.3ad will allow the Link Aggregation to contain adapters of different line speeds; however, you should only aggregate adapters that are set to the same line speed and are set to full duplex. This will help avoid potential problems configuring the Link Aggregation on the switch. Refer to your switch's documentation for more information on what types of aggregations your switch allows. 
If you will be using 10/100 Ethernet adapters in the Link Aggregation on AIX 5.2 with 5200-01 and earlier, you need to enable link polling on those adapters before you add them to the aggregation. Type smitty chgenet at the command line. Change the Enable Link Polling value to yes, and press Enter. Do this for every 10/100 Ethernet adapter that you will be adding to your Link Aggregation. 
Note:
In AIX 5.2 with 5200-03 and later, enabling the link polling mechanism is not necessary. The link poller will be started automatically.

Configuring IEEE 802.3ad Link Aggregation
Follow these steps to configure an IEEE 802.3ad Link Aggregation:

Type smit etherchannel at the command line. 
Select Add an EtherChannel / Link Aggregation from the list and press Enter. 
Select the primary Ethernet adapters that you want on your Link Aggregation and press Enter. If you are planning to use a backup adapter, do not select the adapter that you plan to use for the backup at this point. The backup adapter option is available in AIX 5.2 and later. 
Note:
The Available Network Adapters displays all Ethernet adapters. If you select an Ethernet adapter that is already being used (has an interface defined), you will get an error message. You first need to detach these interfaces if you want to use them.
Enter the information in the fields according to the following guidelines: 
EtherChannel / Link Aggregation Adapters: You should see all primary adapters that you are using in your Link Aggregation. You selected these adapters in the previous step. 
Enable Alternate Address: This field is optional. Setting this to yes will enable you to specify a MAC address that you want the Link Aggregation to use. If you set this option to no, the Link Aggregation will use the MAC address of the first adapter. 
Alternate Address: If you set Enable Alternate Address to yes, specify the MAC address that you want to use here. The address you specify must start with 0x and be a 12-digit hexadecimal address (for example, 0x001122334455). 
Enable Gigabit Ethernet Jumbo Frames: This field is optional. In order to use this, your switch must support jumbo frames. This will only work with a Standard Ethernet (en) interface, not an IEEE 802.3 (et) interface. Set this to yes if you want to enable it. 
Mode: Enter 8023ad. 
Hash Mode: You can choose from the following hash modes, which will determine which data value will be used by the algorithm to determine the outgoing adapter: 
default: In this hash mode the destination IP address of the packet will be used to determine the outgoing adapter. For non-IP traffic (such as ARP), the last byte of the destination MAC address is used to do the calculation. This mode will guarantee packets are sent out over the EtherChannel in the order they were received, but it may not make full use of the bandwidth. 
src_port: In this hash mode the source UDP or TCP port value of the packet will be used to determine the outgoing adapter. If the packet is not UDP or TCP traffic, the last byte of the destination IP address will be used. If the packet is not IP traffic, the last byte of the destination MAC address will be used. 
dst_port: In this hash mode the destination UDP or TCP port value of the packet will be used to determine the outgoing adapter. If the packet is not UDP or TCP traffic, the last byte of the destination IP will be used. If the packet is not IP traffic, the last byte of the destination MAC address will be used. 
src_dst_port: In this hash mode both the source and destination UDP or TCP port values of the packet will be used to determine the outgoing adapter (specifically, the source and destination ports are added and then divided by two before being fed into the algorithm). If the packet is not UDP or TCP traffic, the last byte of the destination IP will be used. If the packet is not IP traffic, the last byte of the destination MAC address will be used. This mode can give good packet distribution in most situations, both for clients and servers.
To learn more about packet distribution and load balancing, see Load-balancing options. 
Backup Adapter: This field is optional. Enter the adapter that you want to use as your backup. The backup adapter option is available in AIX 5.2 and later. 
Internet Address to Ping: This field is optional, and only available if you have only one adapter in the main aggregation and a backup adapter. The Link Aggregation will ping the IP address or host name that you specify here. If the Link Aggregation is unable to ping this address for the Number of Retries times in Retry Timeout intervals, the Link Aggregation will switch adapters. 
Number of Retries: Enter the number of ping response failures that are allowed before the Link Aggregation switches adapters. The default is three. This field is optional and valid only if you have set an Internet Address to Ping. 
Retry Timeout: Enter the number of seconds between the times when the Link Aggregation will ping the Internet Address to Ping. The default is one second. This field is optional and valid only if you have set an Internet Address to Ping.
Press Enter after changing the desired fields to create the Link Aggregation. 
Configure IP over the newly-created Link Aggregation device by typing smit chinet at the command line. 
Select your new Link Aggregation interface from the list. 
Fill in all the required fields and press Enter.
Managing IEEE 802.3ad
For management tasks that can be performed on an IEEE 802.3ad Link Aggregation after configuration, see Managing EtherChannel and IEEE 802.3ad Link Aggregation.

Troubleshooting IEEE 802.3ad
If you are having trouble with your IEEE 802.3ad Link Aggregation, use the following command to verify the mode of operation of the Link Aggregation:

entstat -d device
where device is the Link Aggregation device.

This will also make a best-effort determination of the progress of LACP based on the LACPDUs received from the switch. The following status values are possible:

Inactive: LACP has not been initiated. This is the status when a Link Aggregation has not yet been configured, either because it has not yet been assigned an IP address or because its interface has been detached. 
Negotiating: LACP is in progress, but the switch has not yet aggregated the adapters. If the Link Aggregation remains in this status for longer than one minute, verify that the switch is correctly configured. For instance, you should verify that LACP is enabled on the ports. 
Aggregated: LACP has succeeded and the switch has aggregated the adapters together. 
Failed: LACP has failed. Some possible causes are that the adapters in the aggregation are set to different line speeds or duplex modes or that they are plugged into different switches. Verify the adapters' configuration. 
In addition, some switches allow only contiguous ports to be aggregated and may have a limitation on the number of adapters that can be aggregated. Consult the switch documentation to determine any limitations that the switch may have, then verify the switch configuration.

Note:
The Link Aggregation status is a diagnostic value and does not affect the AIX side of the configuration. This status value was derived using a best-effort attempt. To debug any aggregation problems, it is best to verify the switch's configuration.
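
The four status values and their suggested follow-up actions can be kept at hand as a small lookup. This is only a sketch of the list above: the function name is hypothetical, and the status word must be read from the entstat -d output by hand, because this example does not attempt to parse that output.

```shell
# Sketch: map a Link Aggregation status value to the next step suggested in the text.
lacp_next_step() {
    case $1 in
        Inactive)    echo "Assign an IP address to the interface, or re-attach it if it was detached." ;;
        Negotiating) echo "If this lasts longer than one minute, verify the switch (for example, that LACP is enabled on the ports)." ;;
        Aggregated)  echo "Nothing to do: the switch has aggregated the adapters." ;;
        Failed)      echo "Check that all adapters use the same line speed and duplex mode and are plugged into the same switch." ;;
        *)           echo "Unknown status: $1" >&2; return 1 ;;
    esac
}

lacp_next_step Negotiating
```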
Interoperability Scenarios
The following table shows several interoperability scenarios. Consider these scenarios when configuring your EtherChannel or IEEE 802.3ad Link Aggregation. Additional explanation of each scenario is given after the table.

Table 17. Different AIX and switch configuration combinations and the results each combination will produce.

EtherChannel mode         Switch configuration   Result
8023ad                    IEEE 802.3ad LACP      OK - AIX initiates LACPDUs, which triggers an IEEE 802.3ad
                                                 Link Aggregation on the switch.
standard or round_robin   EtherChannel           OK - Results in traditional EtherChannel behavior.
8023ad                    EtherChannel           OK - Results in traditional EtherChannel behavior. AIX initiates
                                                 LACPDUs, but the switch ignores them.
standard or round_robin   IEEE 802.3ad LACP      Undesirable - Switch cannot aggregate. The result may be poor
                                                 performance as the switch moves the MAC address between switch ports.

8023ad with IEEE 802.3ad LACP: 
This is the most common IEEE 802.3ad configuration. The switch can be set to passive or active LACP.

standard or round_robin with EtherChannel: 
This is the most common EtherChannel configuration.

8023ad with EtherChannel: 
In this case, AIX will send LACPDUs, but they will go unanswered because the switch is operating as an EtherChannel. However, it will work because the switch will still treat those ports as a single link.

Note:
In this case, the entstat -d command will always report the aggregation is in the Negotiating state.
standard or round_robin with IEEE 802.3ad LACP: 
This setup is invalid. If the switch is using LACP to create an aggregation, the aggregation will 
never happen because AIX will never reply to LACPDUs. For this to work correctly, 8023ad should be 
the mode set on AIX.
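
The four scenarios can be expressed as a lookup. The following is a minimal sketch; ec_result and the switch labels "lacp" and "etherchannel" are names made up for this example, not AIX or switch keywords.

```shell
# ec_result AIX_MODE SWITCH_CFG: look up the outcome of a combination.
# AIX_MODE:   8023ad | standard | round_robin
# SWITCH_CFG: lacp | etherchannel   (hypothetical labels for this sketch)
ec_result() {
    case "$1:$2" in
        8023ad:lacp)
            echo "OK" ;;
        standard:etherchannel|round_robin:etherchannel)
            echo "OK" ;;
        8023ad:etherchannel)
            echo "OK (entstat stays in Negotiating)" ;;
        standard:lacp|round_robin:lacp)
            echo "Undesirable" ;;
        *)
            echo "unknown combination"; return 1 ;;
    esac
}

ec_result 8023ad etherchannel
```
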


Note 5:
-------

Internet Protocol over Fibre Channel
Beginning with AIX 5.2 with 5200-03, IP packets can be sent over a physical fibre-channel connection. 
After a system is configured to use IP over Fibre Channel, its network activity will function just as 
if an Ethernet or Token-Ring adapter were being used.

In order to use IP over Fibre Channel, your system must have a Fibre Channel switch and either 
the 2 Gigabit Fibre Channel Adapter for 64-bit PCI Bus or the 2 Gigabit Fibre Channel PCI-X Adapter.

In addition, the following filesets must be installed:

devices.common.ibm.fc 
devices.pci.df1000f7 
devices.pci.df1080f9 
devices.pci.df1000f9

Configuring IP over Fibre Channel
The following procedure will lead you through a configuration of IP over Fibre Channel. 
The Fibre Channel IP device driver must first be enabled. Following the enablement, 
the cfgmgr command will be run to create the Fibre Channel interface. After the interface is created, 
the network attributes (such as its IP address, Network Mask, Nameserver, and Gateway) will be assigned.

Enable the Fibre Channel IP Device Driver
By default, the Fibre Channel IP device is not enabled. To enable this device, follow these steps:

From the command line, type smit dev. 
Select FC Adapter. 
Select FC Network Protocol Device. 
Select Enable a FC Network Device. 
Select the adapter that is going to be enabled. 
Use the cfgmgr command to create the Fibre Channel interface. 
See cfgmgr in AIX 5L Version 5.3 Commands Reference.

Assign network properties to Fibre Channel interface

After the adapter has been enabled, IP needs to be configured over it. Follow these steps to configure IP:

From the command line, type smit tcpip. 
Select Minimum Configuration & Startup. 
Select the interface that you want to configure. In this case, it will be fcx, where x is the 
minor number of the interface. 
Assign all required attributes.
After the IP attributes have been assigned, verify that the changes took place by typing the 
following command at the command line:

ifconfig -a
If your configuration was successful, you will see results similar to the following among the results:

fc1: flags=e000843 <UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
        inet 11.11.11.18 netmask 0xffffff00 broadcast 11.11.11.255
Additionally, you can run the following command:

ifconfig fcx
where x is the minor number of the interface.


67. More on Memory in AIX:
==========================


67.1 Show memory in AIX: 
------------------------

# bootinfo -r
# lsattr -El sys0 -a realmem 
# ps -eo user,pid,pcpu,vsz,time,args    (vsz gives size per process)


To examine virtual memory usage and what is consuming it, you can use a combination of: 
  
# ipcs -bm  (shared memory) 
# lsps -a   (paging) 
# vmstat -v (shows all current values)
# vmo -a    (virtual memory options) 
# vmo -L    (all tunable VMM options and values)
# svmon -G  (basic memory allocations) 
# svmon -U  (virtual memory usage by user) 
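
bootinfo -r and the realmem attribute both report real memory in KB. A small conversion helper makes the figure easier to read; kb_to_mb is a hypothetical name and the input below is a sample value, not output from a real system.

```shell
# kb_to_mb KB: convert a size in KB to whole MB (integer division).
kb_to_mb() {
    echo $(( $1 / 1024 ))
}

# On AIX this could be used as: kb_to_mb "$(bootinfo -r)"
kb_to_mb 4194304   # sample value: 4 GB expressed in KB
```
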




67.2 AIX Memory Tune-ables:
---------------------------

Through Environment variables: See section 9 below.

Otherwise:

- vmtune command in AIX releases before AIX 5L, such as AIX 4.1:
----------------------------------------------------------------

The vmtune command can be used to modify the VMM parameters that control the behavior of the memory-management 
subsystem. Some options are available to alter the defaults for LVM and file systems; the options dealing 
with disk I/O are discussed in the following sections. 

To determine whether the vmtune command is installed and available, run the following command: 

# lslpp -lI bos.adt.samples

The executable program for the vmtune command is found in the /usr/samples/kernel directory. 
The vmtune command can only be executed by the root user. Changes made by this tool remain in place until 
the next reboot of the system. If a permanent change is needed, place an appropriate entry in the /etc/inittab file. 
For example: 

vmtune:2:wait:/usr/samples/kernel/vmtune -P 50

Note: The vmtune command is in the samples directory because it is VMM-implementation dependent. 
The vmtune code that accompanies each release of the operating system is tailored specifically to the VMM 
in that release. Running the vmtune command from one release on a system with a different VMM release might
result in an operating system failure. It is also possible that the functions of the vmtune command may change 
from release to release. Be sure to review the appropriate tuning information before using the vmtune command 
to change system parameters. 

How to use the vmtune command? Use vmtune with a flag representing the parameter you want to change,
for example "maxfree". 

maxfree
Purpose: The maximum size to which the VMM page-frame free list will grow by page stealing. 
Values: Default: configuration-dependent, Range: 16 to 204800 (4KB frames) 
Display: vmtune 
Change: vmtune -F NewValue  


- tuning commands in AIX 5L:
----------------------------

Introduction 
By default, AIX is tuned for a mixed workload, and will grow its VMM file cache up to 80% of physical RAM. 
While this may be great for an NFS server, SMTP relay or web server, it is very poor for running any application 
which does its own cache management. This includes most databases (Oracle, DB2, Sybase, PostgreSQL, 
MySQL using InnoDB tables, TSM) and some other software (e.g. the Squid web cache). 

Common symptoms include high paging (high pgspin and pgspout in topas), high system CPU time, 
the lrud kernel thread using CPU, slow overall system throughput, slow backups and slow process startup. 

For most database systems, the ideal solution is to use raw logical volumes. If this is not acceptable, 
then direct I/O and concurrent I/O should be used. If for some reason this is not possible, then the last solution 
is to tune the AIX file caches to be less aggressive. 

Parameters 
The three main parameters that should be tuned are those controlling the size of the persistent file cache 
(minperm% and maxperm%) used for JFS filesystems, and the client file cache (maxclient%) used by 
NFS, CDRFS and JFS2 filesystems 

- numperm% 
  Defines the current size of the persistent file cache. 
- minperm% 
  Defines the minimum amount of RAM the persistent file cache may occupy. If numperm% is less than or equal 
  to minperm%, file pages will not be stolen when RAM is required. 
- maxperm% 
  Defines the maximum amount of RAM the persistent file cache may occupy before it is used as the sole source 
  of new pages by the page stealing algorithm. By default, numperm% may exceed maxperm% if there is 
  free memory available. The setting strict_maxperm may be set to one to change maxperm% into a hard limit, 
  guaranteeing numperm% will never exceed maxperm%. 
- strict_maxperm 
  As above, if set to 1, changes maxperm% into a hard limit. 
- numclient% 
  Defines the current size of the client file cache. 
- maxclient% 
  Defines the hard maximum size of the client file cache. 
- strict_maxclient 
  Introduced in 5.2 ML4, allows the changing of maxclient% into a soft limit, similar to strict_maxperm. 

Note that maxclient% may never exceed maxperm%. In later versions of vmtune, this is enforced by changing both 
parameters if necessary. 

Note: AIX 5.2 includes a compatibility version of vmtune. It is probably wisest to become familiar with 
the new tools, instead of relying on the backwards compatibility commands. 

The main tool to use is /usr/sbin/vmo, installed as part of the bos.perf.tune fileset. To display current 
cache sizes (numperm% and numclient%) use vmstat -v. 
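
To pick a single cache figure out of the vmstat -v listing, a short awk filter is enough. This is a sketch under an assumption: it expects lines of the form "value name percentage", and the sample text below is illustrative, not captured from a real system. cache_pct is a hypothetical name.

```shell
# cache_pct NAME: read vmstat -v style text on stdin and print the value
# found on the "<NAME> percentage" line.
cache_pct() {
    awk -v name="$1" '$2 == name && $3 == "percentage" { print $1 }'
}

# Sample text in the assumed layout; on AIX, pipe the real output instead:
#   vmstat -v | cache_pct numperm
sample='                 20.0 minperm percentage
                 80.0 maxperm percentage
                  2.2 numperm percentage
                  2.2 numclient percentage
                 80.0 maxclient percentage'

printf '%s\n' "$sample" | cache_pct numperm
```
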

vmo can change both persistent (reboot) values and runtime values, so it does not need to be 
called from the startup scripts. It stores the persistent values in the /etc/tunables/nextboot file. 

Current values and characteristics may be displayed using: 

# vmo -L
NAME                      CUR    DEF    BOOT   MIN    MAX    UNIT           TYPE
     DEPENDENCIES
--------------------------------------------------------------------------------
memory_frames             512K          512K                 4KB pages         S
--------------------------------------------------------------------------------
pinnable_frames           427718        427718               4KB pages         S
--------------------------------------------------------------------------------
maxfree                   128    128    128    16     200K   4KB pages         D
     minfree
     memory_frames
...


Kernel tuning parameters:

AIX 5.2 introduces a new method that is more flexible and centralized for setting most of the AIX kernel 
tuning parameters. It is now possible to make permanent changes without having to edit any rc files. 
This is achieved by placing the reboot values for all tunable parameters in a new stanza file, 
/etc/tunables/nextboot. When the machine is rebooted, the values in that file are automatically applied. 
Another stanza file, /etc/tunables/lastboot is automatically generated with all the values as they were set 
just after the reboot. This provides the capability to return to those values at any time. The log file for 
any changes made or impossible to make during reboot is stored in /etc/tunables/lastboot.log. There are sets 
of SMIT panels and a WebSm plug-in also available to manipulate current and reboot values for all tuning 
parameters as well as the files in the /etc/tunables directory.

There are four new commands introduced in AIX 5.2 to modify the tunables files. The tunsave command is used 
to save values to a stanza file. The tunrestore command is used to apply a file, for example, to change all 
tunables parameter values to those listed in a file. The command tuncheck must be used to validate a file 
created manually and the tundefault command is available to reset tunable parameters to their default values. 
All four commands work on both current and reboot tunables parameters values. See the respective man pages 
for more information.
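
Because the tunables files are stanza files, individual reboot values can be read with ordinary text tools. The sketch below assumes a layout of "command:" headers followed by indented lines of the form name = "value". The sample content is invented for illustration, and stanza_value is a hypothetical helper, not an AIX command.

```shell
# stanza_value STANZA TUNABLE: print the value of TUNABLE inside STANZA,
# reading stanza-file text from stdin.
stanza_value() {
    awk -v st="$1:" -v key="$2" '
        /^[^ \t].*:$/ { in_st = ($1 == st); next }   # stanza header line
        in_st && $1 == key && $2 == "=" {
            gsub(/"/, "", $3); print $3              # strip the quotes
        }'
}

# Invented sample in the assumed layout; on AIX the input would be
# /etc/tunables/nextboot.
nextboot_sample='vmo:
        minperm% = "5"
        maxperm% = "10"

no:
        tcp_sendspace = "262144"'

printf '%s\n' "$nextboot_sample" | stanza_value vmo maxperm%
```

For making changes, tunsave, tunrestore and tuncheck remain the supported route; a filter like this is only suitable for read-only reporting.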

Modifications to vmtune and schedtune
Vmtune and schedtune are being replaced by the newly supported commands called "vmo", "ioo", and "schedo". 
Both vmo and ioo together replace vmtune, while schedo replaces schedtune. All existing parameters are 
covered by the new commands.

The ioo command will handle all the I/O related tuning parameters, while the vmo command will handle 
all the other VMM parameters previously managed by vmtune. All three commands are part of the new fileset 
"bos.perf.tune" which also contains tunsave, tunrestore, tuncheck, and tundefault. 
The bos.adt.samples fileset will still include the vmtune and schedtune commands, which will simply 
be compatibility shell scripts calling vmo, ioo, and schedo as appropriate. The compatibility scripts 
only support changes to parameters which can be changed interactively. That is, parameters that need bosboot 
and then require a reboot of the machine to be effective are no longer supported by the vmtune script. 
To change those parameters, users must now use vmo -r. The options (all from vmtune) and parameters 
in question are as follows:

vmtune option   parameter name                                          new command 
-C 0|1          page coloring                                           vmo -r -o pagecoloring=0|1 
-g n1 -L n2     large page size,                                        vmo -r -o lgpg_size=n1 -o lgpg_regions=n2 
                number of large pages to reserve 
-m n            memory pools                                            vmo -r -o mempools=n 
-v n            number of frames per memory pool                        vmo -r -o framesets=n 
-i n            interval for special data segment identifiers           vmo -r -o spec_dataseg_int=n 
-V n            number of special data segment identifiers to reserve   vmo -r -o num_spec_dataseg=n 
-y 0|1          p690 memory affinity                                    vmo -r -o memory_affinity=0|1 

Enhancements to no and nfso
The no and nfso commands have been enhanced to support making permanent changes to tunable parameters. 
They now interact with the /etc/tunables/nextboot file to achieve this new functionality. 
They both also have a new -h flag which can be used to display help about any parameter. The content of the 
help includes the purpose of the parameter, the possible values (default, range and type), and diagnostic 
and tuning information to decide when to change the parameter value. This information is also listed entirely 
in the respective man pages. Note that all five tuning commands (ioo, nfso, no, vmo, and schedo) use 
the same common syntax. See the respective man pages for more details and also the complete list of 
tuning parameters supported.

-- The vmo command:
-------------------

Purpose
Manages Virtual Memory Manager tunable parameters.

Syntax
vmo [ -p | -r ] { -o Tunable [= Newvalue]}

vmo [ -p | -r ] {-d Tunable }

vmo [ -p | -r ] -D

vmo [ -p | -r ] -a

vmo -?

vmo -h [ Tunable ]

vmo -L [ Tunable ]

vmo -x [ Tunable ]

Note:
Multiple -o, -d, -x and -L are allowed.

Description
Note:
The vmo command can only be executed by root.
Use the vmo command to configure Virtual Memory Manager tuning parameters. This command sets or displays 
current or next boot values for all Virtual Memory Manager tuning parameters. This command can also make 
permanent changes or defer changes until the next reboot. Whether the command sets or displays a parameter 
is determined by the accompanying flag. The -o flag performs both actions. It can either display the 
value of a parameter or set a new value for a parameter.

The Virtual Memory Manager (VMM) maintains a list of free real-memory page frames. These page frames are
available to hold virtual-memory pages needed to satisfy a page fault. When the number of pages on the 
free list falls below that specified by the minfree parameter, the VMM begins to steal pages to add to 
the free list. The VMM continues to steal pages until the free list has at least the number of pages 
specified by the maxfree parameter.

If the number of file pages (permanent pages) in memory is less than the number specified by the 
minperm% parameter, the VMM steals frames from either computational or file pages, regardless 
of repage rates. If the number of file pages is greater than the number specified by the maxperm% parameter, 
the VMM steals frames only from file pages. Between the two, the VMM normally steals only file pages, 
but if the repage rate for file pages is higher than the repage rate for computational pages, 
computational pages are stolen as well.
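
The three cases just described can be written down as a small decision function. This is a sketch of the rule as stated in the text, not of the kernel's actual algorithm; steal_source is a hypothetical name and the arguments are whole percentages.

```shell
# steal_source NUMPERM MINPERM MAXPERM: which pages the VMM steals from,
# following the description above.
steal_source() {
    if [ "$1" -lt "$2" ]; then
        echo "file and computational pages"
    elif [ "$1" -gt "$3" ]; then
        echo "file pages only"
    else
        echo "file pages, plus computational pages if the file repage rate is higher"
    fi
}

steal_source 45 20 80
```
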

You can also modify the thresholds that are used to decide when the system is running out of paging space. 
The npswarn parameter specifies the number of paging-space pages available at which the system begins 
warning processes that paging space is low. The npskill parameter specifies the number of paging-space 
pages available at which the system begins killing processes to release paging space.


maxfree and minfree:

The purpose of the free list is to keep track of real-memory page frames released by terminating processes 
and to supply page frames to requestors immediately, without forcing them to wait for page steals 
and the accompanying I/O to complete.

The minfree limit specifies the free-list size below which page stealing to replenish the free list 
is to be started. The maxfree parameter is the size above which stealing ends. In the case of enabling 
strict file cache limits, like the strict_maxperm or strict_maxclient parameters, 
the minfree value is used to start page stealing.


Examples:

1. Configure large pages:

You must configure your system to use large pages and you must also specify the amount of physical memory 
that you want to allocate to back large pages. The system default is to not have any memory allocated 
to the large page physical memory pool. You can use the vmo command to configure the size of the large page 
physical memory pool. The following example allocates 1 GB (64 regions of 16 MB each) to the large page physical memory pool:

# vmo -r -o lgpg_regions=64 -o lgpg_size=16777216
To use large pages for shared memory, you must enable the SHM_PIN shmget() system call with the following command, 
which persists across system reboots:

# vmo -p -o v_pinshm=1

To see how many large pages are in use on your system, use the vmstat -l command as in the following example:

# vmstat -l

kthr     memory             page              faults        cpu      large-page 
                                                                                
----- ----------- ------------------------ ------------ ----------- ------------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa   alp   flp 
 2  1 52238 124523   0   0   0   0    0   0 142   41  73  0  3 97  0     16     16

From the above example, you can see that there are 16 active large pages, alp, and 16 free large pages, flp.
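
The size of the large page pool is the product of lgpg_regions and lgpg_size, so a candidate setting can be checked with shell arithmetic before running vmo -r. lgpg_pool_mb is a hypothetical helper name used for this sketch.

```shell
# lgpg_pool_mb REGIONS SIZE_BYTES: size of the large page pool in MB.
lgpg_pool_mb() {
    echo $(( $1 * $2 / 1048576 ))
}

lgpg_pool_mb 64 16777216   # the values used in the vmo example
```

With 64 regions of 16777216 bytes this prints 1024, that is, 1 GB.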


2. Tuning Examples:

Show Virtual Memory Tuning parameters: 
vmo -L

Show min and max values controlling file I/O cache: 
vmo -L minperm% -L maxperm%  -L maxclient%

Permanently adjust these values: 
vmo -p -o minperm%=5 -o maxperm%=20  -o maxclient%=20

Show Filesystem Tuning parameters: 
ioo -L

Show Network Tuning parameters: 
no -a

Show NFS Tuning parameters: 
nfso -L

Change/Show Kernel operating parameters: 

smitty chgsys

Enable Asynchronous I/O: 
smitty aio


Another example:
----------------

Suppose we have an Oracle DB instance on an AIX 5.3 machine. What is the best and simplest way
to tune the memory so that it is optimized for Oracle?

Take a look at the cache:

root@zd111l04:/root#vmo -L minperm% -L maxperm%  -L maxclient%
NAME                      CUR    DEF    BOOT   MIN    MAX    UNIT           TYPE
     DEPENDENCIES
--------------------------------------------------------------------------------
maxclient%                80     80     80     1      100    % memory          D
     maxperm%
     minperm%
--------------------------------------------------------------------------------
maxperm%                  80     80     80     1      100    % memory          D
     minperm%
     maxclient%
--------------------------------------------------------------------------------
minperm%                  20     20     20     1      100    % memory          D
     maxperm%
     maxclient%
--------------------------------------------------------------------------------


# vmo -p -o minperm%=5            # Was 20
# vmo -p -o maxclient%=10         # Was 80
# vmo -p -o maxperm%=10           # Was 80
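
Since maxclient% may never exceed maxperm%, it can help to check a set of values before applying them. The sketch below only prints a vmo command and does not run it; vmo_cache_cmd is a hypothetical wrapper name.

```shell
# vmo_cache_cmd MINPERM MAXPERM MAXCLIENT: print (not run) a vmo command
# for the three cache limits, refusing values that break the rule
# maxclient% <= maxperm%.
vmo_cache_cmd() {
    if [ "$3" -gt "$2" ]; then
        echo "error: maxclient% ($3) exceeds maxperm% ($2)" >&2
        return 1
    fi
    echo "vmo -p -o minperm%=$1 -o maxperm%=$2 -o maxclient%=$3"
}

vmo_cache_cmd 5 10 10
```
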


67.3 Websphere and AIX Memory:
------------------------------

67.3.1 Errors you may find in Websphere logs

1. java.lang.OutOfMemory
2. javax.naming.NameNotFoundException
3. javax.servlet.ServletException
4. java.lang.StringIndexOutOfBoundsException
5. java.net.SocketException
6. java.io.IOException
7. java.io.FileNotFoundException
8. java.util.MissingResourceException
9. java.lang.ClassNotFoundException
10.java.lang.StringIndexOutOfBoundsException
11.java.io.InterruptedIOException
12.com.splwg.cis.common.NestedRuntimeException


In the JVM's verbose garbage collection output, the number associated with 
"action" indicates the type of garbage collection that is being done:
action=1 means a preemptive garbage collection cycle.
action=2 means a full allocation failure.
action=3 means that a heap expansion takes place.
action=4 means that all known soft references are cleared.
action=5 means that stealing from the transient heap is done.
action=6 means that free space is very low.
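
When scanning long logs it can be handy to translate the action number on the spot. This is a minimal sketch: gc_action is a hypothetical helper whose strings restate the list above, and it takes the number as an argument instead of parsing a log line.

```shell
# gc_action N: describe the action number from a verbose GC record.
gc_action() {
    case "$1" in
        1) echo "preemptive garbage collection cycle" ;;
        2) echo "full allocation failure" ;;
        3) echo "heap expansion" ;;
        4) echo "all known soft references cleared" ;;
        5) echo "stealing from the transient heap" ;;
        6) echo "free space is very low" ;;
        *) echo "unknown action: $1"; return 1 ;;
    esac
}

gc_action 3
```
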


Note 1 on java.lang.OutOfMemory
-------------------------------

The Java process has two memory areas: the Java heap and the "native heap", 
which together make up the total memory usage of the process. 
The Java heap is controlled via the -Xms and -Xmx setting, and the space 
available to the native heap is that which isn't used by the Java heap. 


The act of reducing the maximum Java heap size has made the "native heap" 
bigger, and this is the area that was memory constrained. 
We know this because, when the OutOfMemoryError was generated, the message 
informed you that the JVM was unable to allocate a new native stack, which is 
allocated on the native heap (a Java thread object is also created and 
allocated on the Java heap). 


It is entirely possible that the amount of "native heap" available to the 
JVM was insufficient to allocate the underlying resources to run the Java 
process under the load that was being driven through it. The native heap is 
now 500MB bigger, and unless there is a memory leak or the load is 
significantly increased, this change should prevent any OutOfMemoryErrors 
based on the native heap. 
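
The trade-off is plain arithmetic: whatever -Xmx does not claim is left for the native heap. The figures below are hypothetical (a 2048 MB process address space, with -Xmx lowered from 1536 MB to 1036 MB) and were chosen only to reproduce the 500 MB gain mentioned above. native_space_mb is a made-up helper name.

```shell
# native_space_mb ADDRESS_SPACE_MB XMX_MB: space left for the native heap.
native_space_mb() {
    echo $(( $1 - $2 ))
}

# Hypothetical figures, see the text above.
before=$(native_space_mb 2048 1536)
after=$(native_space_mb 2048 1036)
echo "native heap grows by $(( after - before )) MB"
```
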

Note 2 on java.lang.OutOfMemory
-------------------------------

Hi,

I'm experimenting with Tomcat with a simple "Hello World" servlet.
When I send 50 concurrent requests, I get a java.lang.OutOfMemory error.
Tomcat works fine up to 40 concurrent requests for the same servlet.
I'm using Tomcat 3.1M1 with Java 1.2 on Solaris 2.7.

We tried adding the -mx switch to the Java invocation in tomcat.sh
(line 102)
    $JAVACMD -mx96m org.apache.tomcat.shell.Startup "$@" &
And it still runs out of memory.

Any suggestion?

Lishin

Hi Lishin

This could be to do with exceeding max file-descriptors - this gave us the
error below (45 connections)

We are running tomcat on Solaris 2.6.  Each new connection uses at least one
socket connection, which is treated as a file-descriptor.  There is a
default limit (user) of 64 file descriptors 

To check this try: 

ulimit -n

To increase this try

ulimit -n <num>

There will be a system limit - for Solaris this is default 1024:

system limit:

ulimit -Hn


I hope this helps - I had a very frustrating time solving this one!

Joe.
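
The check and the fix from the reply above can be combined in a few lines placed before the JVM is started. This is a sketch, not part of the original thread. It raises the soft limit to the hard limit for the current shell and its children, and leaves the limit alone if the hard limit is reported as "unlimited".

```shell
# Show the soft and hard limits on open file descriptors.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft limit: $soft, hard limit: $hard"

# Raise the soft limit to the hard limit for this shell and its children.
# Skipped when the hard limit is "unlimited" to keep the sketch simple.
if [ "$hard" != "unlimited" ]; then
    ulimit -Sn "$hard"
fi
echo "soft limit now: $(ulimit -Sn)"
```
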

Note 3 on java.lang.OutOfMemory
-------------------------------

- LDR_CNTRL 
Purpose: Allows tuning of the kernel loader. 
Values: Default: Not set; Possible Values: PREREAD_SHLIB, LOADPUBLIC, IGNOREUNLOAD, USERREGS, MAXDATA, 
DSA, PRIVSEG_LOADS 
Display: echo $LDR_CNTRL 
Change: LDR_CNTRL={PREREAD_SHLIB | LOADPUBLIC| ...} 
export LDR_CNTRL 
Change takes effect immediately in this shell. Change is effective until logging out of this shell. 
Permanent change is made by adding the following line to the /etc/environment file: 
LDR_CNTRL={PREREAD_SHLIB | LOADPUBLIC| ...} 
Diagnosis: N/A 
Tuning: The LDR_CNTRL environment variable can be used to control one or more aspects of the system loader 
behavior. You can specify multiple options with the LDR_CNTRL variable. When doing this, separate the options 
using an @ character (that is, LDR_CNTRL=PREREAD_SHLIB@LOADPUBLIC). The options are: 

- PREREAD_SHLIB 
  Causes entire libraries to be read as soon as they are accessed. With VMM readahead tuned, a library can be 
  read in from disk and be cached in memory by the time the program starts to access its pages. While this method 
  can use more memory, it can enhance performance of programs that use many shared library pages, provided 
  the access pattern is non-sequential (for example, Catia). 
- LOADPUBLIC 
  Directs the system loader to load all modules requested by an application into the global shared library 
  segment. If a module cannot be loaded publicly into the global shared library segment, then it is loaded 
  privately for the application. 
- IGNOREUNLOAD 
  Causes modules that are marked to be unloaded to be used again (if the module has not been unloaded already). 
  As a side effect of this option, you can end up with two different data instances for the module. 
- USERREGS 
  Tells the system to save all general-purpose user registers across system calls made by an application. 
  This can be helpful in applications doing garbage collection. 
- MAXDATA 
  Sets the maximum heap size for a process, including overriding any MAXDATA value specified in an executable. 
  If you want to use Large Program Support with a data heap size of 0x30000000, then specify 
  LDR_CNTRL=MAXDATA=0x30000000. To turn off Large Program Support, specify LDR_CNTRL=MAXDATA=0. 
- DSA 
  The DSA (Dynamic Segment Allocation) option tells the system loader to run applications using 
  Very Large Program Support. The DSA option is only valid for 32-bit applications. 
- PRIVSEG_LOADS 
  Directs the system loader to put dynamically loaded private modules into the process private segment. 
  This might improve the availability of memory in large memory model applications that perform private 
  dynamic loads and tend to run out of memory in the process heap. If the process private segment lacks 
  sufficient space, the PRIVSEG_LOADS option has no effect. The PRIVSEG_LOADS option is only valid for 
  32-bit applications with a non-zero MAXDATA value. 
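
Building the variable from a list of options avoids mistakes with the @ separator. The following is a minimal sketch. ldr_cntrl_value is a hypothetical helper name, and the option combination is only an illustration, not a tuning recommendation.

```shell
# ldr_cntrl_value OPTION...: join loader options with "@".
# The body runs in a subshell, so the IFS change does not leak out.
ldr_cntrl_value() (
    IFS=@
    echo "$*"
)

LDR_CNTRL=$(ldr_cntrl_value PREREAD_SHLIB LOADPUBLIC MAXDATA=0x30000000)
export LDR_CNTRL
echo "$LDR_CNTRL"
```

As described above, a value exported this way lasts only for the current shell session; a permanent setting belongs in /etc/environment.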


Note 4: Java SDK and Websphere for AIX :
----------------------------------------

As of 06/06/06, the following versions of Websphere on AIX are frequently found:

5.0.2:
------

5.0.2.x  x in 2-16

5.0.2 ca131-20030618 (sr5) 

5.0.2.1
5.0.2.2
5.0.2.3
5.0.2.4
5.0.2.5
5.0.2.6
5.0.2.7
5.0.2.8
5.0.2.9
5.0.2.10
5.0.2.11
5.0.2.12
5.0.2.13
5.0.2.14
5.0.2.15 SDK is not updated 

5.1:
----

5.1 ca141-20031011 (sr1) 

5.1.1 ca1420-20040626 

5.1.1.1
5.1.1.2
5.1.1.3
5.1.1.4
5.1.1.5
5.1.1.6
5.1.1.7
5.1.1.8
5.1.1.9 SDK is not updated 

6.0:
----

6.0 ca142sr1w-20041028

6.0.0.2
6.0.0.3 SDK is not updated 

6.0.1 ca142sr1a-20050209(SR1a) 

6.0.1.1
6.0.1.2 SDK is not updated 

6.0.2 ca142-20050609 

6.0.2.1
6.0.2.3
6.0.2.5
6.0.2.7 SDK is not updated 

6.0.2.9 How critical is this fix pack? 
Recommended. This fix pack must be installed on top of WebSphere Application Server V6.0.2, 6.0.2.1, 6.0.2.3, 
6.0.2.5, or 6.0.2.7.


68. Kernel parameters AIX:
==========================

- Kernel Tunable Parameters
Following are kernel parameters, grouped into the following sections:

Scheduler and Memory Load Control Tunable Parameters 
Virtual Memory Manager Tunable Parameters 
Synchronous I/O Tunable Parameters 
Asynchronous I/O Tunable Parameters 
Disk and Disk Adapter Tunable Parameters 
Interprocess Communication Tunable Parameters

- Scheduler and Memory Load Control Tunable Parameters:

Most of the scheduler and memory load control tunable parameters are fully described in the schedo man page. 
The following are a few other related parameters:

- maxuproc 
Purpose: Specifies the maximum number of processes per user ID. 
Values: Default: 40; Range: 1 to 131072 
Display: lsattr -E -l sys0 -a maxuproc 
Change: chdev -l sys0 -a maxuproc=NewValue 
Change takes effect immediately and is preserved over boot. If value is reduced, then it goes into effect 
only after a system boot. 
Diagnosis: Users cannot fork any additional processes. 
Tuning: This is a safeguard to prevent users from creating too many processes. 

- ncargs 
Purpose: Specifies the maximum allowable size of the ARG/ENV list (in 4KB blocks) when running exec() subroutines. 
Values: Default: 6; Range: 6 to 1024 
Display: lsattr -E -l sys0 -a ncargs 
Change: chdev -l sys0 -a ncargs=NewValue 
Change takes effect immediately and is preserved over boot. 
Diagnosis: Users cannot execute any additional processes because the argument list passed to the exec() 
system call is too long. A low default value might cause some programs to fail with the arg list too long 
error message, in which case you might try increasing the ncargs value with the chdev command above and then 
rerunning the program. 
Tuning: This is a mechanism to prevent the exec() subroutines from failing if the argument list 
is too long. Please note that tuning to a higher ncargs value puts additional constraints on system memory resources. 
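
Because ncargs counts 4 KB blocks, the resulting byte limit is easy to compute. ncargs_bytes is a hypothetical helper used only for this sketch.

```shell
# ncargs_bytes N: ARG/ENV list limit in bytes for an ncargs value of N
# (ncargs counts 4 KB blocks).
ncargs_bytes() {
    echo $(( $1 * 4096 ))
}

ncargs_bytes 6      # the default
ncargs_bytes 1024   # the top of the range
```

The default of 6 therefore allows 24576 bytes, and the maximum of 1024 allows 4194304 bytes (4 MB).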
 

- Virtual Memory Manager Tunable Parameters:

The complete listing of the virtual memory manager tunable parameters is located in the vmo man page.

- Synchronous I/O Tunable Parameters:

Most of the synchronous I/O tunable parameters are fully described in the ioo man page. 
The following are a few other related parameters:

- maxbuf 
Purpose: Number of (4 KB) pages in the block-I/O buffer cache. 
Values: Default: 20; Range: 20 to 1000 
Display: lsattr -E -l sys0 -a maxbuf 
Change: chdev -l sys0 -a maxbuf=NewValue 
Change is effective immediately and is permanent. If the -T flag is used, the change is immediate and lasts until 
the next boot. If the -P flag is used, the change is deferred until the next boot and is permanent. 
Diagnosis: If the sar -b command shows breads or bwrites with %rcache and %wcache being low, you might want to 
tune this parameter. 
Tuning: This parameter normally has little performance effect on systems, where ordinary I/O does not use the 
block-I/O buffer cache. 
Refer to: Tuning Asynchronous Disk I/O 

- maxpout 
Purpose: Specifies the maximum number of pending I/Os to a file. 
Values: Default: 0 (no checking); Range: 0 to n (n should be a multiple of 4, plus 1) 
Display: lsattr -E -l sys0 -a maxpout 
Change: chdev -l sys0 -a maxpout=NewValue 
Change is effective immediately and is permanent. If the -T flag is used, the change is immediate and lasts 
until the next boot. If the -P flag is used, the change is deferred until the next boot and is permanent. 
Diagnosis: If the foreground response time sometimes deteriorates when programs with large amounts 
of sequential disk output are running, sequential output may need to be paced. 
Tuning: Set maxpout to 33 and minpout to 16. If sequential performance deteriorates unacceptably, 
increase one or both. If foreground performance is still unacceptable, decrease both. 

- minpout 
Purpose: Specifies the point at which programs that have reached maxpout can resume writing to the file. 
Values: Default: 0 (no checking); Range: 0 to n (n should be a multiple of 4 and should be at least 4 less than maxpout) 
Display: lsattr -E -l sys0 -a minpout 
Change: chdev -l sys0 -a minpout=NewValue 
Change is effective immediately and is permanent. If the -T flag is used, the change is immediate and lasts until 
the next boot. If the -P flag is used, the change is deferred until the next boot and is permanent. 
Diagnosis: If the foreground response time sometimes deteriorates when programs with large amounts of sequential 
disk output are running, sequential output may need to be paced. 
Tuning: Set maxpout to 33 and minpout to 16. If sequential performance deteriorates unacceptably, 
increase one or both. If foreground performance is still unacceptable, decrease both. 
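
The value rules for the two pacing parameters can be checked before calling chdev. This is a sketch of the stated ranges only; check_pacing is a hypothetical helper name.

```shell
# check_pacing MAXPOUT MINPOUT: validate I/O pacing values against the
# ranges given above. 0/0 (no checking) is accepted as-is.
check_pacing() {
    max=$1 min=$2
    if [ "$max" -eq 0 ] && [ "$min" -eq 0 ]; then
        echo "ok: pacing disabled"; return 0
    fi
    if [ $(( (max - 1) % 4 )) -ne 0 ]; then
        echo "maxpout should be a multiple of 4, plus 1"; return 1
    fi
    if [ $(( min % 4 )) -ne 0 ] || [ $(( max - min )) -lt 4 ]; then
        echo "minpout should be a multiple of 4 and at least 4 below maxpout"; return 1
    fi
    echo "ok: maxpout=$max minpout=$min"
}

check_pacing 33 16
```
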

mount -o nointegrity Purpose: A new mount option (nointegrity) may enhance local file system performance for 
certain write-intensive applications. This optimization basically eliminates writes to the JFS log. 
Note that the enhanced performance is achieved at the expense of metadata integrity. Therefore, use this 
option with extreme caution because a system crash can make a file system mounted with this option unrecoverable. 
Nevertheless, certain classes of applications do not require file data to remain consistent after a system crash, 
and these may benefit from using the nointegrity option. Two examples in which a nointegrity file system may be 
beneficial are compiler temporary files and nonmigration or mksysb installations. 

Paging Space Size Purpose: The amount of disk space required to hold pages of working storage. 
Values: Default: configuration-dependent; Range: 32 MB to n MB for hd6, 16 MB to n MB for non-hd6 
Display: lsps -a 
Change: mkps or chps or smitty pgsp 
Change is effective immediately and is permanent. Paging space is not necessarily put into use immediately, however. 
Diagnosis: Run: lsps -a. If processes have been killed for lack of paging space, monitor the situation with the 
psdanger() subroutine. 
Tuning: If it appears that there is not enough paging space to handle the normal workload, 
add a new paging space on another physical volume or make the existing paging spaces larger. 

syncd Interval Purpose: The time between sync() calls by syncd. 
Values: Default: 60; Range: 1 to any positive integer 
Display: grep syncd /sbin/rc.boot 
Change: vi /sbin/rc.boot 
Change is effective at next boot and is permanent. An alternate method is to use the kill command 
to terminate the syncd daemon and restart it from the command line with the command /usr/sbin/syncd interval. 
Diagnosis: I/O to a file is blocked when syncd is running. 
Tuning: At its default level, this parameter has little performance cost. No change is recommended. Significant 
reductions in the syncd interval in the interests of data integrity (as for HACMP) could have adverse performance 
consequences. 

Asynchronous I/O Tunable Parameters
maxreqs Purpose: Specifies the maximum number of asynchronous I/O requests that can be outstanding at any one time. 
Values: Default: 4096; Range: 1 to AIO_MAX (/usr/include/sys/limits.h) 
Display: lsattr -E -l aio0 -a maxreqs 
Change: chdev -l aio0 -a maxreqs=NewValue 
Change is effective after reboot and is permanent. 
Diagnosis: N/A 
Tuning: This includes requests that are in progress, as well as those that are waiting to be started. 
The maximum number of asynchronous I/O requests cannot be less than the value of AIO_MAX, as defined 
in the /usr/include/sys/limits.h file, but can be greater. It would be appropriate for a system 
with a high volume of asynchronous I/O to have a maximum number of asynchronous I/O requests larger than AIO_MAX. 
Refer to: Tuning Asynchronous Disk I/O 

maxservers Purpose: Specifies the maximum number of AIO kprocs per processor. 
Values: Default: 10 per processor 
Display: lsattr -E -l aio0 -a maxservers 
Change: chdev -l aio0 -a maxservers=NewValue 
Change is effective after reboot and is permanent. 
Diagnosis: N/A 
Tuning: This value limits the number of concurrent asynchronous I/O requests. The value should be about the same as the expected number of concurrent AIO requests. This tunable parameter only affects AIO on JFS file systems (or Virtual Shared Disks (VSD) before AIX 4.3.2). 
Refer to: Tuning Asynchronous Disk I/O 

minservers Purpose: Specifies the number of AIO kprocs that will be created when the AIO kernel extension is loaded. 
Values: Default: 1 
Display: lsattr -E -l aio0 -a minservers 
Change: chdev -l aio0 -a minservers=NewValue 
Change is effective after reboot and is permanent. 
Diagnosis: N/A 
Tuning: Making this a large number is not recommended, because each process takes up some memory. 
Leaving this number small is acceptable in most cases because AIO will create additional kprocs 
up to maxservers as needed. This tunable is only effective for AIO on JFS file systems (or VSDs before AIX 4.3.2). 
Refer to: Tuning Asynchronous Disk I/O 

Disk and Disk Adapter Tunable Parameters
Disk Adapter Outstanding-Requests Limit Purpose: Maximum number of requests that can be outstanding on a SCSI bus. (Applies only to the SCSI-2 Fast/Wide Adapter.) 
Values: Default: 40; Range: 40 to 128 
Display: lsattr -E -l scsin -a num_cmd_elems 
Change: chdev -l scsin -a num_cmd_elems=NewValue 
Change is effective immediately and is permanent. If the -T flag is used, the change is immediate and lasts until the next boot. If the -P flag is used, the change is deferred until the next boot and is permanent. 
Diagnosis: Applications performing large writes to striped raw logical volumes are not obtaining the desired throughput rate. 
Tuning: Value should equal the number of physical drives (including those in disk arrays) on the SCSI bus, times the queue depth of the individual drives. 
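The tuning rule above is a simple product, so it is easy to compute. This is a minimal sketch; the function name suggest_num_cmd_elems is hypothetical, and clamping the result to the documented 40 to 128 range is an assumption about how to handle out-of-range products.

```shell
#!/bin/sh
# suggest_num_cmd_elems DRIVES QUEUE_DEPTH   (hypothetical helper)
# Prints DRIVES * QUEUE_DEPTH, clamped to the 40..128 range listed above.
suggest_num_cmd_elems() (
    drives=$1
    depth=$2
    value=$(( drives * depth ))
    if [ "$value" -lt 40 ]; then
        value=40
    fi
    if [ "$value" -gt 128 ]; then
        value=128
    fi
    echo "$value"
)

# Example: 6 physical drives on the bus, each with a queue depth of 8.
suggest_num_cmd_elems 6 8
```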

Disk Drive Queue Depth Purpose: Maximum number of requests the disk device can hold in its queue. 
Values: Default: IBM disks=3; Non-IBM disks=0; Range: specified by manufacturer 
Display: lsattr -E -l hdiskn 
Change: chdev -l hdiskn -a q_type=simple -a queue_depth=NewValue 
Change is effective immediately and is permanent. If the -T flag is used, the change is immediate and lasts until the next boot. If the -P flag is used, the change is deferred until the next boot and is permanent. 
Diagnosis: N/A 
Tuning: If the non-IBM disk drive is capable of request-queuing, make this change to ensure that the operating system takes advantage of the capability. 
Refer to: Setting SCSI-Adapter and Disk-Device Queue Limits 

Interprocess Communication Tunable Parameters
msgmax Purpose: Specifies maximum message size. 
Values: Dynamic with maximum value of 4 MB 
Display: N/A 
Change: N/A 
Diagnosis: N/A 
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel. 

msgmnb Purpose: Specifies maximum number of bytes on queue. 
Values: Dynamic with maximum value of 4 MB 
Display: N/A 
Change: N/A 
Diagnosis: N/A 
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel. 

msgmni Purpose: Specifies maximum number of message queue IDs. 
Values: Dynamic with maximum value of 131072 
Display: N/A 
Change: N/A 
Diagnosis: N/A 
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel. 

msgmnm Purpose: Specifies maximum number of messages per queue. 
Values: Dynamic with maximum value of 524288 
Display: N/A 
Change: N/A 
Diagnosis: N/A 
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel. 

semaem Purpose: Specifies maximum value for adjustment on exit. 
Values: Dynamic with maximum value of 16384 
Display: N/A 
Change: N/A 
Diagnosis: N/A 
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel. 

semmni Purpose: Specifies maximum number of semaphore IDs. 
Values: Dynamic with maximum value of 131072 
Display: N/A 
Change: N/A 
Diagnosis: N/A 
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel. 

semmsl Purpose: Specifies maximum number of semaphores per ID. 
Values: Dynamic with maximum value of 65535 
Display: N/A 
Change: N/A 
Diagnosis: N/A 
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel. 

semopm Purpose: Specifies maximum number of operations per semop() call. 
Values: Dynamic with maximum value of 1024 
Display: N/A 
Change: N/A 
Diagnosis: N/A 
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel. 

semume Purpose: Specifies maximum number of undo entries per process. 
Values: Dynamic with maximum value of 1024 
Display: N/A 
Change: N/A 
Diagnosis: N/A 
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel. 

semvmx Purpose: Specifies maximum value of a semaphore. 
Values: Dynamic with maximum value of 32767 
Display: N/A 
Change: N/A 
Diagnosis: N/A 
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel. 

shmmax Purpose: Specifies maximum shared memory segment size. 
Values: Dynamic with maximum value of 256 MB for 32-bit processes and 0x80000000u for 64-bit 
Display: N/A 
Change: N/A 
Diagnosis: N/A 
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel. 

shmmin Purpose: Specifies minimum shared-memory-segment size. 
Values: Dynamic with minimum value of 1 
Display: N/A 
Change: N/A 
Diagnosis: N/A 
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel. 

shmmni Purpose: Specifies maximum number of shared memory IDs. 
Values: Dynamic with maximum value of 131072 
Display: N/A 
Change: N/A 
Diagnosis: N/A 
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel. 


69. AIX TUNABLE ENVIRONMENT PARAMETERS:
=======================================

Thread Support Tunable Parameters
Following is a list of thread support parameters that can be tuned:

AIXTHREAD_COND_DEBUG (AIX 4.3.3 and subsequent versions) Purpose: Maintains a list of condition variables for use by the debugger. 
Values: Default: ON 
Range: ON, OFF 
Display: echo $AIXTHREAD_COND_DEBUG (this is turned on internally, so the initial default value will not be seen with the echo command) 
Change: AIXTHREAD_COND_DEBUG={ON|OFF} 
export AIXTHREAD_COND_DEBUG 
Change takes effect immediately in this shell. Change is effective until logging out of this shell. 
Permanent change is made by adding AIXTHREAD_COND_DEBUG={ON|OFF} command to the /etc/environment file. 
Diagnosis: Leaving it on makes debugging threaded applications easier, but may impose some overhead. 
Tuning: If the program contains a large number of active condition variables and frequently creates and destroys condition variables, this may create higher overhead for maintaining the list of condition variables. Setting the variable to OFF will disable the list. 
Refer to Thread Debug Options. 
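Because the default is applied internally, a plain echo prints an empty line when the variable has never been set. The sketch below makes that case explicit; the function name show_tunable is hypothetical, and the same helper works for the other environment variables in this section.

```shell
#!/bin/sh
# show_tunable NAME   (hypothetical helper)
# Prints NAME=value when the variable is set in this shell, otherwise
# a note that the internal default applies.
show_tunable() (
    name=$1
    eval "is_set=\${$name+yes}"
    if [ "$is_set" = "yes" ]; then
        eval "value=\$$name"
        echo "$name=$value"
    else
        echo "$name is not set in this shell (internal default applies)"
    fi
)

unset AIXTHREAD_COND_DEBUG
show_tunable AIXTHREAD_COND_DEBUG

AIXTHREAD_COND_DEBUG=OFF
export AIXTHREAD_COND_DEBUG
show_tunable AIXTHREAD_COND_DEBUG
```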

AIXTHREAD_ENRUSG Purpose: Enable or disable pthread resource collection. 
Values: Default: OFF 
Range: ON, OFF 
Display: echo $AIXTHREAD_ENRUSG (this is turned on internally, so the initial default value will not be seen with the echo command) 
Change: AIXTHREAD_ENRUSG={ON|OFF} 
export AIXTHREAD_ENRUSG 
Change takes effect immediately in this shell. Change is effective until logging out of this shell. Permanent change is made by adding AIXTHREAD_ENRUSG={ON|OFF} command to the /etc/environment file. 
Diagnosis: Turning it on allows for resource collection of all pthreads in a process, but will impose some overhead. 
Tuning:  
Refer to Thread Environment Variables. 

AIXTHREAD_GUARDPAGES (AIX 4.3 and later) Purpose: Controls the number of guard pages to add to the end of the pthread stack. 
Values: Default: 0 
Range: A positive integer 
Display: echo $AIXTHREAD_GUARDPAGES (this is turned on internally, so the initial default value will not be seen with the echo command) 
Change: AIXTHREAD_GUARDPAGES=n 
export AIXTHREAD_GUARDPAGES 
Change takes effect immediately in this shell. Change is effective until logging out of this shell. 
Permanent change is made by adding AIXTHREAD_GUARDPAGES=n command to the /etc/environment file. 
Diagnosis: N/A 
Tuning: N/A 
Refer to Thread Environment Variables. 

AIXTHREAD_MINKTHREADS (AIX 4.3 and later) Purpose: Controls the minimum number of kernel threads that should be used. 
Values: Default: 8 
Range: A positive integer value 
Display: echo $AIXTHREAD_MINKTHREADS (this is turned on internally, so the initial default value will not be seen with the echo command) 
Change: AIXTHREAD_MINKTHREADS=n 
export AIXTHREAD_MINKTHREADS 
Change takes effect immediately in this shell. Change is effective until logging out of this shell. Permanent change is made by adding AIXTHREAD_MINKTHREADS=n command to the /etc/environment file. 
Diagnosis: N/A 
Tuning: The library scheduler will not reclaim kernel threads below this figure. A kernel thread may be reclaimed at virtually any point. Generally, a kernel thread is targeted as a result of a pthread terminating. 
Refer to: Variables for Process-Wide Contention Scope 

AIXTHREAD_MNRATIO (AIX 4.3 and later) Purpose: Controls the scaling factor of the library. This ratio is used when creating and terminating pthreads. 
Values: Default: 8:1 
Range: Two positive values (p:k), where k is the number of kernel threads that should be employed to handle p runnable pthreads 
Display: echo $AIXTHREAD_MNRATIO (this is turned on internally, so the initial default value will not be seen with the echo command) 
Change: AIXTHREAD_MNRATIO=p:k 
export AIXTHREAD_MNRATIO 
Change takes effect immediately in this shell. Change is effective until logging out of this shell. Permanent change is made by adding AIXTHREAD_MNRATIO=p:k command to the /etc/environment file. 
Diagnosis: N/A 
Tuning: May be useful for applications with a very large number of threads. However, always test a ratio of 1:1 because it may provide for better performance. 
Refer to: Variables for Process-Wide Contention Scope 

AIXTHREAD_MUTEX_DEBUG (AIX 4.3.3 and later) Purpose: Maintains a list of active mutexes for use by the debugger. 
Values: Default: OFF 
Range: ON, OFF 
Display: echo $AIXTHREAD_MUTEX_DEBUG (this is turned on internally, so the initial default value will not be seen with the echo command) 
Change: AIXTHREAD_MUTEX_DEBUG={ON|OFF} 
export AIXTHREAD_MUTEX_DEBUG 
Change takes effect immediately in this shell. Change is effective until logging out of this shell. Permanent change is made by adding AIXTHREAD_MUTEX_DEBUG={ON|OFF} command to the /etc/environment file. 
Diagnosis: Setting the variable to ON makes debugging threaded applications easier, but may impose some overhead. 
Tuning: If the program contains a large number of active mutexes and frequently creates and destroys mutexes, this may create higher overhead for maintaining the list of mutexes. Leaving the variable off disables the list. 
Refer to: Thread Debug Options 

AIXTHREAD_RWLOCK_DEBUG (AIX 4.3.3 and later) Purpose: Maintains a list of read-write locks for use by the debugger. 
Values: Default: ON 
Range: ON, OFF 
Display: echo $AIXTHREAD_RWLOCK_DEBUG (this is turned on internally, so the initial default value will not be seen with the echo command) 
Change: AIXTHREAD_RWLOCK_DEBUG={ON|OFF} 
export AIXTHREAD_RWLOCK_DEBUG 
Change takes effect immediately in this shell. Change is effective until logging out of this shell. Permanent change is made by adding AIXTHREAD_RWLOCK_DEBUG={ON|OFF} command to the /etc/environment file. 
Diagnosis: Leaving it on makes debugging threaded applications easier, but may impose some overhead. 
Tuning: If the program contains a large number of active read-write locks and frequently creates and destroys read-write locks, this may create higher overhead for maintaining the list of read-write locks. Setting the variable to OFF will disable the list. 
Refer to: Thread Debug Options 

AIXTHREAD_SCOPE (AIX 4.3.1 and later) Purpose: Controls contention scope. P signifies process-based 
contention scope (M:N). S signifies system-based contention scope (1:1). 
Values: Default: P 
Possible Values: P or S 
Display: echo $AIXTHREAD_SCOPE (this is turned on internally, so the initial default value will not be seen 
with the echo command) 
Change: AIXTHREAD_SCOPE={P|S} 
export AIXTHREAD_SCOPE 
Change takes effect immediately in this shell. Change is effective until logging out of this shell. 
Permanent change is made by adding AIXTHREAD_SCOPE={P|S} command to the /etc/environment file. 

Diagnosis: If fewer threads are being dispatched than expected, then system scope should be tried. 
Tuning: Tests on AIX 4.3.2 have shown that certain applications can perform much better with system based 
contention scope (S). The use of this environment variable impacts only those threads created with the 
default attribute. The default attribute is employed when the attr parameter to pthread_create is NULL. 
Refer to: Thread Environment Variables 
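When only one application should run with system contention scope, the variable can be set for that single command instead of being exported for the whole shell. A small sketch follows; the sh -c child stands in for the real threaded application, which is hypothetical here.

```shell
#!/bin/sh
# Set AIXTHREAD_SCOPE for one command only. The parent shell keeps
# its own (unset) state, so other programs are not affected.
unset AIXTHREAD_SCOPE

# 'sh -c ...' is a stand-in for the threaded application.
child_view=$(AIXTHREAD_SCOPE=S sh -c 'echo "$AIXTHREAD_SCOPE"')

echo "child sees:  $child_view"
echo "parent sees: ${AIXTHREAD_SCOPE:-<not set>}"
```

This form is convenient for testing P against S on the same application without touching /etc/environment.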


AIXTHREAD_SLPRATIO (AIX 4.3 and later) Purpose: Controls the number of kernel threads that should be held in reserve for sleeping threads. 
Values: Default: 1:12 
Range: Two positive values (k:p), where k is the number of kernel threads that should be held in reserve for p sleeping pthreads 
Display: echo $AIXTHREAD_SLPRATIO (this is turned on internally, so the initial default value will not be seen with the echo command) 
Change: AIXTHREAD_SLPRATIO=k:p 
export AIXTHREAD_SLPRATIO 
Change takes effect immediately in this shell. Change is effective until logging out of this shell. Permanent change is made by adding AIXTHREAD_SLPRATIO=k:p command to the /etc/environment file. 
Diagnosis: N/A 
Tuning: In general, fewer kernel threads are required to support sleeping pthreads, because they are generally woken one at a time. This conserves kernel resources. 
Refer to: Variables for Process-Wide Contention Scope 

AIXTHREAD_STK=n (AIX 4.3.3 ML 09 and later) Purpose: The decimal number of bytes that should be allocated for each pthread. This value may be overridden by pthread_attr_setstacksize. 
Values: Default: 98,304 bytes for 32bit applications, 196,608 bytes for 64bit applications. 
Range: Decimal integer values from 0 to 268,435,455 which will be rounded up to the nearest page (currently 4,096). 
Display: echo $AIXTHREAD_STK (this is turned on internally, so the initial default value will not be seen with the echo command) 
Change: AIXTHREAD_STK=size 
export AIXTHREAD_STK 
Change takes effect immediately in this shell. Change is effective until logging out of this shell. Permanent change is made by adding AIXTHREAD_STK=size to the /etc/environment file. 
Diagnosis: If analysis of a failing program indicates stack overflow, the default stack size can be increased. 
Tuning: If trying to reach the 32,000 thread limit on a 32 bit application, it may be necessary to decrease the default stack size. 
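Since the requested size is rounded up to the nearest page, it can help to see the effective stack size before setting the variable. A minimal sketch, assuming the 4,096-byte page size quoted above; the function name round_up_to_page is hypothetical.

```shell
#!/bin/sh
# Page size taken from the Range line above (currently 4,096 bytes).
PAGE_SIZE=4096

# round_up_to_page BYTES   (hypothetical helper)
# Prints BYTES rounded up to the next multiple of PAGE_SIZE.
round_up_to_page() (
    bytes=$1
    echo $(( (bytes + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE ))
)

# A request of 100000 bytes occupies 25 pages.
round_up_to_page 100000
```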

MALLOCBUCKETS (Version 4.3.3.25 and later) Purpose: Enables buckets-based extension in the default memory allocator which may enhance performance of applications that issue large numbers of small allocation requests. 
Values: MALLOCTYPE=buckets 
MALLOCBUCKETS=[[ number_of_buckets:n | bucket_sizing_factor:n | blocks_per_bucket:n | bucket_statistics:[stdout|stderr|pathname]],...] 

The following table displays default values of MALLOCBUCKETS. 

MALLOCBUCKETS Default Values

MALLOCBUCKETS Options                Default Value
number_of_buckets (1)                16
bucket_sizing_factor (32-bit) (2)    32
bucket_sizing_factor (64-bit) (3)    64
blocks_per_bucket                    1024 (4)

Notes:
1. The minimum value allowed is 1. The maximum value allowed is 128.
2. For 32-bit implementations, the value specified for bucket_sizing_factor must be a multiple of 8.
3. For 64-bit implementations, the value specified for bucket_sizing_factor must be a multiple of 16.
4. The bucket_statistics option is disabled by default.
Display: echo $MALLOCBUCKETS; echo $MALLOCTYPE 
Change: Use the shell specific method of exporting the environment variables. 
Diagnosis: If malloc performance is slow and many small malloc requests are issued, this feature may enhance performance. 
Tuning: To enable malloc buckets, the MALLOCTYPE environment variable has to be set to the value "buckets". 
 

The MALLOCBUCKETS environment variable may be used to change the default configuration of the malloc buckets, although the default values should be sufficient for most applications. 
 

The number_of_buckets:n option can be used to specify the number of buckets available per heap, where n is the number of buckets. The value specified for n will apply to all available heaps. 
 

The bucket_sizing_factor:n option can be used to specify the bucket sizing factor, where n is the bucket sizing factor in bytes. 
 

The blocks_per_bucket:n option can be used to specify the number of blocks initially contained in each bucket, where n is the number of blocks. This value is applied to all of the buckets. The value of n is also used to determine how many blocks to add when a bucket is automatically enlarged because all of its blocks have been allocated. 
 

The bucket_statistics option will cause the malloc subsystem to output a statistical summary for malloc buckets upon typical termination of each process that calls the malloc subsystem while malloc buckets is enabled. This summary will show buckets configuration information and the number of allocation requests processed for each bucket. If multiple heaps have been enabled by way of malloc multiheap, the number of allocation requests shown for each bucket will be the sum of all allocation requests processed for that bucket for all heaps. 
 

The buckets statistical summary will be written to one of the following output destinations, as specified with the bucket_statistics option: 
stdout - Standard output 
stderr - Standard error 
pathname - A user-specified pathname 
 

If a user-specified pathname is provided, statistical output will be appended to the existing contents of the file (if any). Avoid using standard output as the output destination for a process whose output is piped as input into another process. 
Refer to: Malloc Buckets 
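The option syntax and the limits from the notes can be combined into a small builder that refuses invalid values. This is a sketch only: the function name build_mallocbuckets and its argument order are hypothetical, and it covers the three numeric options but not bucket_statistics.

```shell
#!/bin/sh
# build_mallocbuckets BITS NUMBER_OF_BUCKETS SIZING_FACTOR BLOCKS_PER_BUCKET
# (hypothetical helper) Prints a MALLOCBUCKETS value, or fails when a
# value breaks the limits from the notes above.
build_mallocbuckets() (
    bits=$1
    buckets=$2
    factor=$3
    blocks=$4
    case $bits in
        32) multiple=8 ;;
        64) multiple=16 ;;
        *)  echo "bits must be 32 or 64" >&2; exit 1 ;;
    esac
    if [ "$buckets" -lt 1 ] || [ "$buckets" -gt 128 ]; then
        echo "number_of_buckets must be between 1 and 128" >&2
        exit 1
    fi
    if [ "$factor" -le 0 ] || [ $(( factor % multiple )) -ne 0 ]; then
        echo "bucket_sizing_factor must be a multiple of $multiple" >&2
        exit 1
    fi
    echo "number_of_buckets:$buckets,bucket_sizing_factor:$factor,blocks_per_bucket:$blocks"
)

# Example for a 64-bit application: 128 buckets, 64-byte sizing factor.
if value=$(build_mallocbuckets 64 128 64 1024); then
    MALLOCTYPE=buckets
    MALLOCBUCKETS=$value
    export MALLOCTYPE MALLOCBUCKETS
    echo "MALLOCBUCKETS=$MALLOCBUCKETS"
fi
```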

MALLOCMULTIHEAP (AIX 4.3.1 and later) Purpose: Controls the number of heaps within the process private segment. 
Values: Default: 16 for 4.3.1 and 4.3.2, 32 for 4.3.3 and later 
Range: A positive number between 1 and 32 
Display: echo $MALLOCMULTIHEAP (this is turned on internally, so the initial default value will not be seen with the echo command) 
Change: MALLOCMULTIHEAP=[[heaps:n | considersize],...] 
export MALLOCMULTIHEAP 
Change takes effect immediately in this shell. Change is effective until logging out of this shell. Permanent change is made by adding MALLOCMULTIHEAP=[[heaps:n | considersize],...] command to the /etc/environment file. 
Diagnosis: Look for lock contention on the malloc lock (located in segment F) or fewer than expected runnable threads. 
Tuning: Smaller number of heaps can help reduce size of the process. Certain multithreaded user processes which use the malloc subsystem heavily may obtain better performance by exporting the environment variable MALLOCMULTIHEAP=1 before starting the application. 
 

The potential performance enhancement is particularly likely for multithreaded C++ programs, because these may make use of the malloc subsystem whenever a constructor or destructor is called. 
 

Any available performance enhancement will be most evident when the multithreaded user process is running on an SMP system, and particularly when system scope threads are used (M:N ratio of 1:1). However, in some cases, enhancement may also be evident under other conditions, and on uniprocessors. 
 

If the considersize option is specified, an alternate heap selection algorithm is used that tries to select an available heap that has enough free space to handle the request. This may minimize the working set size of the process by reducing the number of sbrk() calls. However, there is a bit more processing time required for this algorithm. 
Refer to: Thread Environment Variables 

SPINLOOPTIME Purpose: Controls the number of times to retry a busy lock before yielding to another processor (only for libpthreads). 
Values: Default: 1 on uniprocessors, 40 on multiprocessors 
Range: A positive integer 
Display: echo $SPINLOOPTIME (this is turned on internally, so the initial default value will not be seen with the echo command) 
Change: SPINLOOPTIME=n 
export SPINLOOPTIME 
Change takes effect immediately in this shell. Change is effective until logging out of this shell. Permanent change is made by adding SPINLOOPTIME=n command to the /etc/environment file. 
Diagnosis: If threads are going to sleep often (lot of idle time), then the SPINLOOPTIME may not be high enough. 
Tuning: Increasing the value from default of 40 on multiprocessor systems might be of benefit if there is pthread mutex contention. 
Refer to: Thread Environment Variables 

YIELDLOOPTIME Purpose: Controls the number of times to yield the processor before blocking on a busy lock (only for libpthreads). The processor is yielded to another kernel thread, assuming there is another runnable kernel thread with sufficient priority. 
Values: Default: 0 
Range: A positive value 
Display: echo $YIELDLOOPTIME (this is turned on internally, so the initial default value will not be seen with the echo command) 
Change: YIELDLOOPTIME=n 
export YIELDLOOPTIME 
Change takes effect immediately in this shell. Change is effective until logging out of this shell. Permanent change is made by adding YIELDLOOPTIME=n command to the /etc/environment file. 
Diagnosis: If threads are going to sleep often (lot of idle time), then the YIELDLOOPTIME may not be high enough. 
Tuning: Increasing the value from default value of 0 may benefit if you do not want the threads to go to sleep when waiting for locks. 
Refer to: Thread Environment Variables 

Miscellaneous Tunable Parameters
Following is a list of miscellaneous parameters that can be tuned:

EXTSHM (AIX 4.2.1 and later) Purpose: Turns on the extended shared memory facility. 
Values: Default: Not set 
Possible Value: ON 
Display: echo $EXTSHM 
Change: EXTSHM=ON 
export EXTSHM 
Change takes effect immediately in this shell. Change is effective until logging out of this shell. Permanent change is made by adding EXTSHM=ON command to the /etc/environment file. 
Diagnosis: N/A 
Tuning: Setting value to ON will allow a process to allocate shared memory segments as small as 1 byte (though this will be rounded up to the nearest page); this effectively removes the limitation of 11 user shared memory segments. Maximum size of all segments together can still only be 2.75 GB worth of memory for 32-bit processes. 64-bit processes do not need to set this variable since a very large number of segments is available. Some restrictions apply for processes that set this variable, and these restrictions are the same as with processes that use mmap buffers. 
Refer to: Extended Shared Memory (EXTSHM) 

LDR_CNTRL Purpose: Allows tuning of the kernel loader. 
Values: Default: Not set 
Possible Values: PREREAD_SHLIB, LOADPUBLIC, IGNOREUNLOAD, USERREGS, MAXDATA, DSA, PRIVSEG_LOADS 
Display: echo $LDR_CNTRL 
Change: LDR_CNTRL={PREREAD_SHLIB | LOADPUBLIC | ...} 
export LDR_CNTRL 
Change takes effect immediately in this shell. Change is effective until logging out of this shell. Permanent change is made by adding the following line to the /etc/environment file: LDR_CNTRL={PREREAD_SHLIB | LOADPUBLIC | ...} 
Diagnosis: N/A 
Tuning: The LDR_CNTRL environment variable can be used to control one or more aspects of the system loader behavior. You can specify multiple options with the LDR_CNTRL variable. When doing this, separate the options using an @ character (that is, LDR_CNTRL=PREREAD_SHLIB@LOADPUBLIC). 
Specifying the PREREAD_SHLIB option will cause entire libraries to be read as soon as they are accessed. With VMM readahead tuned, a library can be read in from disk and be cached in memory by the time the program starts to access its pages. While this method can use more memory, it can enhance performance of programs that use many shared library pages, provided the access pattern is non-sequential (for example, Catia). 
Specifying the LOADPUBLIC option directs the system loader to load all modules requested by an application into the global shared library segment. If a module cannot be loaded publicly into the global shared library segment then it is loaded privately for the application. 
Specifying the IGNOREUNLOAD option will cause modules that are marked to be unloaded to be used again (if the module has not been unloaded already). As a side effect of this option, you can end up with two different data instances for the module. 
Specifying the USERREGS option will tell the system to save all general-purpose user registers across system calls made by an application. This can be helpful in applications doing garbage collection. 
Specifying the MAXDATA option sets the maximum heap size for a process, including overriding any MAXDATA value specified in an executable. If you want to use Large Program Support with a data heap size of 0x30000000, then specify LDR_CNTRL=MAXDATA=0x30000000. To turn off Large Program Support, specify LDR_CNTRL=MAXDATA=0. 
Specifying the DSA (Dynamic Segment Allocation) option tells the system loader to run applications using Very Large Program Support. The DSA option is only valid for 32-bit applications. 
Specifying the PRIVSEG_LOADS option directs the system loader to put dynamically loaded private modules into the process private segment. This might improve the availability of memory in large memory model applications that perform private dynamic loads and tend to run out of memory in the process heap. If the process private segment lacks sufficient space, the PRIVSEG_LOADS option has no effect. The PRIVSEG_LOADS option is only valid for 32-bit applications with a non-zero MAXDATA value. 
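Joining several loader options with the @ separator is easy to get wrong by hand. The sketch below builds the value from a list of options; the function name join_ldr_cntrl is hypothetical, and it does not check whether the option names themselves are valid.

```shell
#!/bin/sh
# join_ldr_cntrl OPTION...   (hypothetical helper)
# Joins its arguments with '@', the separator described above.
# "$*" joins the arguments using the first character of IFS.
join_ldr_cntrl() (
    IFS=@
    echo "$*"
)

LDR_CNTRL=$(join_ldr_cntrl PREREAD_SHLIB LOADPUBLIC)
echo "LDR_CNTRL=$LDR_CNTRL"
```

Export the variable only in the shell that starts the target program, so that other processes are not affected.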

NODISCLAIM Purpose: Controls how calls to free() are being handled. When PSALLOC is set to early, all free() calls result in a disclaim() system call. When NODISCLAIM is set to True, this does not occur. 
Values: Default: Not set 
Possible Value: True 
Display: echo $NODISCLAIM 
Change: NODISCLAIM=true 
export NODISCLAIM 
Change takes effect immediately in this shell. Change is effective until logging out of this shell. Permanent change is made by adding NODISCLAIM=true command to the /etc/environment file. 
Diagnosis: If number of disclaim() system calls is very high, you may want to set this variable. 
Tuning: Setting this variable will eliminate calls to disclaim() from free() if PSALLOC is set to early. 
Refer to: Early Page Space Allocation 

NSORDER Purpose: Overrides the configured name resolution search order. 
Values: Default: bind, nis, local 
Possible Values: bind, local, nis, bind4, bind6, local4, local6, nis4, or nis6 
Display: echo $NSORDER (this is turned on internally, so the initial default value will not be seen with the echo command) 
Change: NSORDER=value,value,... 
export NSORDER 
Change takes effect immediately in this shell. Change is effective until logging out of this shell. Permanent change is made by adding NSORDER=value command to the /etc/environment file. 
Diagnosis: N/A 
Tuning: NSORDER overrides the /etc/netsvc.conf file. 
Refer to: Tuning Name Resolution 
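A mistyped resolver name in NSORDER is easy to miss, so the value can be checked against the list of possible values before it is exported. A minimal sketch; the function name valid_nsorder is hypothetical, and it assumes a comma-separated list without spaces.

```shell
#!/bin/sh
# valid_nsorder LIST   (hypothetical helper)
# Returns 0 when every comma-separated entry is one of the possible
# values listed above.
valid_nsorder() (
    [ -n "$1" ] || exit 1
    set -f      # no filename expansion while splitting the list
    IFS=,
    for token in $1; do
        case $token in
            bind|local|nis|bind4|bind6|local4|local6|nis4|nis6) ;;
            *) exit 1 ;;
        esac
    done
    exit 0
)

# Example: consult local files first, then DNS.
if valid_nsorder "local,bind"; then
    NSORDER=local,bind
    export NSORDER
    echo "NSORDER=$NSORDER"
fi
```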

PSALLOC Purpose: Sets the PSALLOC environment variable to determine the paging-space allocation policy. 
Values: Default: Not set 
Possible Value: early 
Display: echo $PSALLOC 
Change: PSALLOC=early 
export PSALLOC 
Change takes effect immediately in this shell. Change is effective until logging out of this shell. 
Diagnosis: N/A 
Tuning: To ensure that a process is not killed due to low paging conditions, this process can preallocate paging space by using the Early Page Space Allocation policy. However, this may result in wasted paging space. You may also want to set the NODISCLAIM environment variable. 
Refer to: Allocation and Reclamation of Paging Space Slots and Early Page Space Allocation 
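Because early allocation can waste paging space, it is often preferable to apply PSALLOC and NODISCLAIM to a single program rather than to the whole login session. The wrapper below is a sketch; the function name run_with_early_paging is hypothetical, and the sh -c child stands in for the real application.

```shell
#!/bin/sh
# run_with_early_paging COMMAND [ARGS...]   (hypothetical helper)
# Runs one command with early paging-space allocation and without
# disclaim() calls from free(); the calling shell is left unchanged.
run_with_early_paging() {
    env PSALLOC=early NODISCLAIM=true "$@"
}

# 'sh -c ...' is a stand-in for the application to protect.
run_with_early_paging sh -c 'echo "PSALLOC=$PSALLOC NODISCLAIM=$NODISCLAIM"'
```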

RT_GRQ (AIX 4.3.3.1 and later) Purpose: Causes thread to be put on a global run queue rather than on a per-CPU run queue. 
Values: Default: Not set; Range: ON, OFF 
Display: echo $RT_GRQ 
Change: RT_GRQ={ON|OFF} 
export RT_GRQ 
Change takes effect immediately. Change is effective until next boot. Permanent change is made by adding RT_GRQ={ON|OFF} command to the /etc/environment file. 
Diagnosis: N/A 
Tuning: May be tuned on multiprocessor systems. Setting it to ON causes the thread to be put on a global run queue. In that case, the global run queue is searched to see which thread has the best priority. This can allow the thread to be dispatched sooner and can improve performance for threads that run with SCHED_OTHER and are interrupt driven. 
Refer to: Scheduler Run Queue 

RT_MPC (AIX 4.3.3 and later) Purpose: When running the kernel in real-time mode (see bosdebug command), an MPC can be sent to a different CPU to interrupt it if a better priority thread is runnable so that this thread can be dispatched immediately. 
Values: Default: Not set; Range: ON 
Display: echo $RT_MPC 
Change: RT_MPC=ON 
export RT_MPC 
Change takes effect immediately. Change is effective until next boot. Permanent change is made by adding RT_MPC=ON command to the /etc/environment file. 
Diagnosis: N/A 


Note on LDR_CNTRL:
------------------


Setting the maximum number of AIX data segments that a process can use (LDR_CNTRL)
In AIX, Version 4.3.3 and later, the number of segments that a process can use for data is controlled 
by the LDR_CNTRL environment variable. It is defined in the parent process of the process that 
is to be affected. For example, the following defines one additional data segment: 

export LDR_CNTRL=MAXDATA=0x10000000
start_process
unset LDR_CNTRL

It is a good idea to unset the LDR_CNTRL environment variable, so that it does not unintentionally 
affect other processes. 
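
As a hedged alternative to the export/unset pair: in a POSIX shell, an assignment written in front of a command is placed in the environment of that one command only, so the parent shell never carries the setting. In this sketch "sh -c" is only a stand-in for start_process, and MAXDATA=0x20000000 is an arbitrary illustration value.

```shell
# Set LDR_CNTRL for one child process only; the parent shell is not changed.
# "sh -c ..." stands in for the real program (start_process above).
child_value=$(LDR_CNTRL=MAXDATA=0x20000000 sh -c 'printf "%s" "$LDR_CNTRL"')

echo "child saw:  $child_value"
echo "parent has: ${LDR_CNTRL:-<unset>}"
```

With this form there is nothing to unset afterwards, because the variable was never defined in the calling shell.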

Unlike other environment variables for the IBM SecureWay Directory server process (slapd), 
the LDR_CNTRL environment variable cannot be set as a front-end variable in the slapd32.conf file.
It must be set as an environment variable. 

The following table shows the LDR_CNTRL setting and memory increase for various numbers of data segments: 

LDR_CNTRL Setting               Number of Additional Segments  Process Memory Limit Increase
Unset                           0 (default)                    256 MB
LDR_CNTRL=MAXDATA=0x10000000    1                              512 MB
LDR_CNTRL=MAXDATA=0x20000000    2                              768 MB
LDR_CNTRL=MAXDATA=0x30000000    3                              1 GB
LDR_CNTRL=MAXDATA=0x40000000    4                              1.25 GB
LDR_CNTRL=MAXDATA=0x50000000    5                              1.5 GB
LDR_CNTRL=MAXDATA=0x60000000    6                              1.75 GB
LDR_CNTRL=MAXDATA=0x70000000    7                              2 GB
LDR_CNTRL=MAXDATA=0x80000000    8                              2.25 GB
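
The arithmetic behind these settings can be sketched in shell. It assumes one data segment is 256 MB (0x10000000 bytes), which matches the one-segment example MAXDATA=0x10000000 given earlier. The function names maxdata_for and limit_mb_for are made up for this illustration.

```shell
# One AIX data segment is taken to be 256 MB = 0x10000000 bytes.
# maxdata_for N prints the LDR_CNTRL value for N additional segments (1-8).
maxdata_for() {
    printf 'MAXDATA=0x%X\n' $(( $1 * 0x10000000 ))
}

# limit_mb_for N prints the resulting process memory limit in MB:
# the default 256 MB plus 256 MB per additional segment.
limit_mb_for() {
    echo $(( ($1 + 1) * 256 ))
}

for n in 1 2 8; do
    echo "$n segment(s): LDR_CNTRL=$(maxdata_for "$n"), limit $(limit_mb_for "$n") MB"
done
```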





70. Charactersets and Codepages:
================================


70.1 LANG variable on UNIX systems:
-----------------------------------

Most UNIX systems use the LANG variable to specify the desired locale. Different UNIX operating systems, however, 
require different locale names to specify the same language. Be sure to use a value for LANG that is supported 
by the UNIX operating system that you are using.

To obtain the locale names for your UNIX system, enter the following: 

# locale -a

As specified by open systems standards, other environment variables override LANG for some or all 
locale categories. These variables include the following: 

LC_COLLATE 
LC_CTYPE 
LC_MONETARY 
LC_NUMERIC 
LC_TIME 
LC_MESSAGES 
LC_ALL


To verify that you have a language package installed for your UNIX or Linux system, enter the following:

# locale 

If you had loaded a language package (for example bos.loc.iso.en_us), the output of the locale command would be:

LANG=en_US
LC_COLLATE="en_US"
LC_CTYPE="en_US"
LC_MONETARY="en_US"
LC_NUMERIC="en_US"
LC_TIME="en_US"
LC_MESSAGES="en_US"
LC_ALL=

If no language packages have been installed, the output would be:

LANG=en_US
LC_COLLATE="C"
LC_CTYPE="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_MESSAGES="C"
LC_ALL=


Changing the LANG variable for the Unix shell session:
# export LANG=en_US

The LANG environment variable provides the ability to specify the user's requirements for native languages,
local customs and character set, as an ASCII string in the form

LANG=language[_territory[.codeset]]

A user who speaks German as it is spoken in Austria and has a terminal which operates in ISO 8859/1 codeset,
would want the setting of the LANG variable to be

# export LANG=De_A.88591

With this setting it should be possible for that user to find any relevant catalogs should they exist.
Should the LANG variable not be set, the value of LC_MESSAGES as returned by setlocale() is used.
If this is NULL, the default path as defined in <nl_types.h> is used.
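
The language[_territory[.codeset]] form can be taken apart with plain shell parameter expansion. This is a minimal sketch: split_locale is a hypothetical helper name, and "@modifier" suffixes, which some systems append, are not handled.

```shell
# Split a locale name of the form language[_territory[.codeset]].
# Prints three lines: language, territory, codeset (empty if absent).
split_locale() {
    name=$1
    codeset=""
    territory=""
    case $name in
        *.*) codeset=${name#*.}; name=${name%%.*} ;;
    esac
    case $name in
        *_*) territory=${name#*_}; name=${name%%_*} ;;
    esac
    printf '%s\n%s\n%s\n' "$name" "$territory" "$codeset"
}

split_locale "en_US.UTF-8"
```

The codeset is stripped first, so a name without a territory but with a codeset is still split correctly.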


70.2 UTF-8 on Unix/Linux:
-------------------------

The proper way to activate UTF-8 is the POSIX locale mechanism. A locale is a configuration setting that 
contains information about culture-specific conventions of software behaviour, including the character encoding, 
the date/time notation, alphabetic sorting rules, the measurement system and common office paper size, etc. 
The names of locales usually consist of ISO 639-1 language and ISO 3166-1 country codes, sometimes with 
additional encoding names or other qualifiers. 

You can get a list of all locales installed on your system (usually in /usr/lib/locale/) with the command 
locale -a. Set the environment variable LANG to the name of your preferred locale. When a C program executes 
the setlocale(LC_CTYPE, "") function, the library will test the environment variables 
LC_ALL, LC_CTYPE, and LANG in that order, and the first one of these that has a value will determine which 
locale data is loaded for the LC_CTYPE category (which controls the multibyte conversion functions). 
The locale data is split up into separate categories. For example, LC_CTYPE defines the character encoding 
and LC_COLLATE defines the string sorting order. The LANG environment variable is used to set the default locale 
for all categories, but the LC_* variables can be used to override individual categories. Do not worry too much 
about the country identifiers in the locales. Locales such as en_GB (English in Great Britain) and en_AU 
(English in Australia) differ usually only in the LC_MONETARY category (name of currency, rules for printing 
monetary amounts), which practically no Linux application ever uses. LC_CTYPE=en_GB and LC_CTYPE=en_AU have exactly 
the same effect. 
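
The lookup order described above (LC_ALL, then LC_CTYPE, then LANG) can be mirrored in a small shell function. It is only a sketch of the precedence rule, not of what the C library does internally. The name effective_ctype and the fallback to "C" when nothing is set are assumptions.

```shell
# Print the locale name that setlocale(LC_CTYPE, "") would select,
# checking LC_ALL, then LC_CTYPE, then LANG. An empty value counts as unset.
# Falling back to "C" when none of them is set is an assumption of this sketch.
effective_ctype() {
    echo "${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
}

echo "LC_CTYPE category resolves to: $(effective_ctype)"
```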

You can query the name of the character encoding in your current locale with the command locale charmap. 
This should say UTF-8 if you successfully picked a UTF-8 locale in the LC_CTYPE category. The command locale -m 
provides a list with the names of all installed character encodings. 

If you use exclusively C library multibyte functions to do all the conversion between the external character 
encoding and the wchar_t encoding that you use internally, then the C library will take care of using the right 
encoding according to LC_CTYPE for you and your program does not even have to know explicitly what the current 
multibyte encoding is. 

Users have to select a UTF-8 locale, for example with 

# export LANG=en_GB.UTF-8 
# export LANG=en_US.UTF-8

in order to activate the UTF-8 support in applications. 
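
A script can check whether the selection worked by looking at the output of locale charmap. The helper below is a sketch. The name is_utf8_charmap is hypothetical, and accepting spellings such as "utf8" is a precaution of this sketch, not documented behaviour.

```shell
# Return success if the given charmap name denotes UTF-8.
# Case and hyphens are normalised first, so "utf8" is accepted as well.
is_utf8_charmap() {
    case $(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | sed 's/-//g') in
        UTF8) return 0 ;;
        *)    return 1 ;;
    esac
}

if is_utf8_charmap "$(locale charmap 2>/dev/null)"; then
    echo "UTF-8 locale active"
else
    echo "current locale is not UTF-8"
fi
```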

Note:

For some apps you must have the LANG and LC_ALL environment variables set to the appropriate locale 
in your current session before you start that app.

X11.loc.NN_NN for the UTF-8 locale 


70.3 Listing of locale env. vars:
---------------------------------

LANG
This variable determines the locale category for native language, local customs and coded character set 
in the absence of the LC_ALL and other LC_* (LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, 
LC_TIME) environment variables. This can be used by applications to determine the language to use for 
error messages and instructions, collating sequences, date formats, and so forth. 

LC_ALL
This variable determines the values for all locale categories. The value of the LC_ALL environment variable 
has precedence over any of the other environment variables starting with LC_ (LC_COLLATE, LC_CTYPE, LC_MESSAGES, 
LC_MONETARY, LC_NUMERIC, LC_TIME) and the LANG environment variable. 

LC_COLLATE
This variable determines the locale category for character collation. It determines collation information 
for regular expressions and sorting, including equivalence classes and multi-character collating elements, 
in various utilities and the strcoll() and strxfrm() functions. Additional semantics of this variable, if any, 
are implementation-dependent. 

LC_CTYPE
This variable determines the locale category for character handling functions, such as tolower(), toupper() 
and isalpha(). This environment variable determines the interpretation of sequences of bytes of text data 
as characters (for example, single- as opposed to multi-byte characters), the classification of characters 
(for example, alpha, digit, graph) and the behaviour of character classes. Additional semantics of 
this variable, if any, are implementation-dependent. 

LC_MESSAGES
This variable determines the locale category for processing affirmative and negative responses and the language 
and cultural conventions in which messages should be written. It also affects the behaviour of the 
catopen() function in determining the message catalogue. Additional semantics of this variable, if any, 
are implementation-dependent. The language and cultural conventions of diagnostic and informative messages 
whose format is unspecified by this specification set should be affected by the setting of LC_MESSAGES. 

LC_MONETARY
This variable determines the locale category for monetary-related numeric formatting information. 
Additional semantics of this variable, if any, are implementation-dependent. 

LC_NUMERIC
This variable determines the locale category for numeric formatting (for example, thousands separator 
and radix character) information in various utilities as well as the formatted I/O operations in printf() 
and scanf() and the string conversion functions in strtod(). Additional semantics of this variable, if any, 
are implementation-dependent. 

LC_TIME
This variable determines the locale category for date and time formatting information. It affects the behaviour 
of the time functions in strftime(). Additional semantics of this variable, if any, are implementation-dependent. 

NLSPATH
This variable contains a sequence of templates that the catopen() function uses when attempting to locate 
message catalogues. Each template consists of an optional prefix, one or more substitution fields, a filename 
and an optional suffix. For example: 
NLSPATH="/system/nlslib/%N.cat"


71. ar, ld commands:
====================

Note 1:
-------

ar Command

Purpose

Maintains the indexed libraries used by the linkage editor.

Syntax

ar [ -c ] [ -l ] [ -g | -o ] [ -s ] [ -v ] [ -C ] [ -T ] [ -z ] { -h | -p | -t |
-x } [ -X {32|64|32_64}] ArchiveFile [ File ... ]

ar [ -c ] [ -l ] [ -g | -o ] [ -s ] [ -v ] [ -C ] [ -T ] [ -z ] { -m | -r [ -u ]
} [ { -a | -b | -i } PositionName ] [ -X {32|64|32_64}] ArchiveFile File ...

ar [ -c ] [ -l ] [ -g | -o ] [ -s ] [ -v ] [ -C ] [ -T ] [ -z ] { -d | -q } [ -X
{32|64|32_64}] ArchiveFile File ...

ar [ -c ] [ -l ] [ -v ] [ -C ] [ -T ] [ -z ] { -g | -o | -s | -w } [ -X
{32|64|32_64}] ArchiveFile

Description

The ar command maintains the indexed libraries used by the linkage editor. The
ar command combines one or more named files into a single archive file written
in ar archive format. When the ar command creates a library, it creates headers
in a transportable format; when it creates or updates a library, it rebuilds the
symbol table. See the ar file format entry for information on the format and
structure of indexed archives and symbol tables.

There are two file formats that the ar command recognizes. The Big Archive
Format, ar_big, is the default file format and supports both 32-bit and 64-bit
object files. The Small Archive Format can be used to create archives that are
recognized on versions older than AIX 4.3, see the -g flag. If a 64-bit object
is added to a small format archive, ar first converts it to the big format,
unless -g is specified. By default, ar only handles 32-bit object files; any
64-bit object files in an archive are silently ignored. To change this behavior,
use the -X flag or set the OBJECT_MODE environment variable.

Flags

In an ar command, you can specify any number of optional flags from the set
cClosTv. You must specify one flag from the set of flags dhmopqrstwx. If you
select the -m or -r flag, you may also specify a positioning flag (-a, -b, or
-i); for the -a, -b, or -i flags, you must also specify the name of a file
within ArchiveFile (PositionName), immediately following the flag list and
separated from it by a blank.

-a PositionName Positions the named files after the existing file identified by
the PositionName parameter.

-b PositionName Positions the named files before the existing file identified by
the PositionName parameter.

-c Suppresses the normal message that is produced when the library is created.

-C Prevents extracted files from replacing like-named files in the file system.

-d Deletes the named files from the library.

-g Orders the members of the archive to ensure maximum loader efficiency with a
minimum amount of unused space. In almost all cases, the -g flag physically
positions the archive members in the order in which they are logically linked.
The resulting archive is always written in the small format, so this flag can be
used to convert a big-format archive to a small-format archive. Archives that
contain 64-bit XCOFF objects cannot be created in or converted to the small
format.

-h Sets the modification times in the member headers of the named files to the
current date and time. If you do not specify any file names, the ar command sets
the time stamps of all member headers. This flag cannot be used with the -z
flag.

-i PositionName Positions the named files before the existing file identified by
the PositionName parameter (same as the -b).

-m Moves the named files to some other position in the library. By default, it
moves the named files to the end of the library. Use a positioning flag (abi) to
specify some other position.

-o Orders the members of the archive to ensure maximum loader efficiency with a
minimum amount of unused space. In almost all cases, the -o flag physically
positions the archive members in the order in which they are logically linked.
The resulting archive is always written in the big archive format, so this flag
can be used to convert a small-format archive to a big-format archive.

-p Writes to standard output the contents of the files named in the File
parameter, or of all files in the ArchiveFile parameter if you do not specify
any files.

-q Adds the named files to the end of the library. In addition, if you name the
same file twice, it may be put in the library twice.

-r Replaces a named file if it already appears in the library. Because the named
files occupy the same position in the library as the files they replace, a
positioning flag does not have any additional effect. When used with the -u flag
(update), the -r flag replaces only files modified since they were last added to
the library file.

If a named file does not already appear in the library, the ar command adds it.
In this case, positioning flags do affect placement. If you do not specify a
position, new files are placed at the end of the library. If you name the same
file twice, it may be put in the library twice.

-s Forces the regeneration of the library symbol table whether or not the ar
command modifies the library contents. Use this flag to restore the library
symbol table after using the strip command on the library.

-t Writes to the standard output a table of contents for the library. If you
specify file names, only those files appear. If you do not specify any files,
the -t flag lists all files in the library.

-T Allows file name truncation if the archive member name is longer than the
file system supports. This option has no effect because the file system supports
names equal in length to the maximum archive member name of 255 characters.

-u Copies only files that have been changed since they were last copied (see the
-r flag discussed previously).

-v Writes to standard output a verbose file-by-file description of the making of
the new library. When used with the -t flag, it gives a long listing similar to
that of the ls -l command. When used with the -x flag, it precedes each file
with a name. When used with the -h flag, it lists the member name and the
updated modification times.

-w Displays the archive symbol table. Each symbol is listed with the name of the
file in which the symbol is defined.

-x Extracts the named files by copying them into the current directory. These
copies have the same name as the original files, which remain in the library. If
you do not specify any files, the -x flag copies all files out of the library.
This process does not alter the library.

-X mode Specifies the type of object file ar should examine. The mode must be
one of the following:

32
  Processes only 32-bit object files
64
  Processes only 64-bit object files
32_64
  Processes both 32-bit and 64-bit object files

The default is to process 32-bit object files (ignore 64-bit objects). The mode
can also be set with the OBJECT_MODE environment variable. For example,
OBJECT_MODE=64 causes ar to process any 64-bit objects and ignore 32-bit
objects. The -X flag overrides the OBJECT_MODE variable.
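
A short session sketch using only the -r, -c, -t and -x flags described above. The file names are hypothetical. Plain text files stand in for object files, so no compiler is needed, and mktemp -d is assumed to be available. On AIX, archives holding 64-bit objects would also need the -X flag or OBJECT_MODE.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two small members; real libraries would hold .o files instead.
printf 'first\n'  > a.txt
printf 'second\n' > b.txt

# -r adds (or replaces) members, -c suppresses the creation message.
ar -rc demo.a a.txt b.txt

# -t lists the table of contents of the archive.
members=$(ar -t demo.a)
echo "$members"

# -x copies a member back out; the archive itself is not altered.
rm a.txt b.txt
ar -x demo.a a.txt
```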



72. REMARKS ON PRINTING IN AIX:
===============================


Example of a printqueue check session:
--------------------------------------

$ enq -q -PAMSPM00040
Queue   Dev   Status    Job Files              User         PP %   Blks  Cp Rnk
------- ----- --------- --- ------------------ ---------- ---- -- ----- --- ---
AMSPM00 hp@UT READY

$ lp -dAMSPM00040 testje.txt
Job number is: 351

admprod@noordkaper:/appl/alliance/beheer $ enq -q -PAMSPM00040
Queue   Dev   Status    Job Files              User         PP %   Blks  Cp Rnk
------- ----- --------- --- ------------------ ---------- ---- -- ----- --- ---
AMSPM00 hp@UT READY

$ rm testje.txt

$ qchk -#351
Queue   Dev   Status    Job Files              User         PP %   Blks  Cp Rnk
------- ----- --------- --- ------------------ ---------- ---- -- ----- --- ---
prtdumm null  READY
qstatus: (WARNING): 0781-350 Job 351 not found -- perhaps it's done?



Further information:
--------------------


The following defines terms commonly used when discussing UNIX printing.

* Print Job
A print job is a unit of work to be run on a printer. A print job can consist of
printing one or more files depending on how the print job is requested. The
system assigns a unique job number to each job it runs.

* Queue
The queue is where you direct a print job. It is a stanza in the /etc/qconfig
file whose name is the name of the queue and points to the associated
queue device.

* Queue Device
The queue device is the stanza in the /etc/qconfig file that normally follows
the local queue stanza. It specifies the /dev file (printer device) that should
be used.

* qdaemon
The qdaemon is a process that runs in the background and controls the
queues. It is generally started during IPL.

* Print Spooler
The spooler is not specifically a print job spooler. Instead, it provides a
generic spooling function that can be used for queuing various types of
jobs including print jobs queued to a printer.
The spooler does not normally know what type of job it is queuing.

The main spooler command is the enq command. Although you can invoke
this command directly to queue a print job, three front-end commands are
defined for submitting a print job: The lp, lpr, and qprt commands. A print
request issued by one of these commands is first passed to the enq
command, which then places the information about the file in the queue for
the qdaemon to process.

* Real Printer
A real printer is the printer hardware attached to a serial or parallel port at
a unique hardware device address. The printer device driver in the kernel
communicates with the printer hardware and provides an interface
between the printer hardware and a virtual printer, but it is not aware of
the concept of virtual printers. Real printers sometimes run out of paper.

* Local and Remote Printers
When you attach a printer to a node or host, the printer is referred to as a
local printer. A remote print system allows nodes that are not directly
linked to a printer to have printer access.
To use remote printing facilities, the individual nodes must be connected
to a network using the Transmission Control Protocol/Internet Protocol
(TCP/IP) and must support the required TCP/IP applications.

* Printer Backend
The printer backend is a collection of programs called by the spooler's
qdaemon command to manage a print job that is queued for printing. The
printer backend performs the following functions:

- Receives from the qdaemon command a list of one or more files to be
printed
- Uses printer and formatting attribute values from the database
overridden by flags entered on the command line
- Initializes the printer before printing a file
- Runs filters as necessary to convert the print data stream to a format
supported by the printer
- Provides filters for simple formatting of ASCII documents
- Provides support for printing national language characters
- Passes the filtered print data stream to the printer device driver
- Generates header and trailer pages
- Generates multiple copies
- Reports paper out, intervention required, and printer error conditions
- Reports problems detected by the filters
- Cleans up after a print job is canceled
- Provides a print environment that a system administrator can
customize to address specific printing needs

AIX supports two print subsystems: the AIX print subsystem and the System5 (BSD-like) print subsystem.

- Devices and Drivers:

Local printing to serial and parallel attached printers for both printsubsystems
is done through standard AIX device drivers.
You can add printdevices with smitty, WSM, or commandline.

In order to show the present devices, use the lsdev command:

# lsdev -Cc printer
lp0 Available 00-00-0P-00 Lexmark...
lp1 Available 00-00-S2-00 IBM...
lp2 Available 00-00-S1-00 Hewlett-Packard...

Individual device files can be listed with the ls command, for example

# ls -al /dev/lp0
crw-rw-rw- 1 root system 25,0 Oct 19 13:62 /dev/lp0

* The Print Configuration File

The file that holds the configuration for the printers that exist on the system is
the /etc/qconfig file. It is the most important file in the spooler domain for
these reasons:

- It contains the definition of every queue known to the spooler.
- A system administrator can read this file and discern the function of each
queue.
- Although it is not recommended, this file can be edited to modify spooler
queues without halting the spooler.
The /etc/qconfig file describes all of the queues defined in the AIX operating
system. A queue is a named, ordered list of requests for a specific device. A
device is something (either hardware or software) that can handle those
requests one at a time. The queue provides serial access to the device.

The following is an example of the partial contents of the /etc/qconfig file.

..
..
lpforu:
        device = lp0
lp0:
        file = /dev/lp0
        header = never
        trailer = never
        access = both
        backend = /usr/lib/lpd/piobe
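
The queue-to-device relation in such a file can be listed with a few lines of awk. This is a sketch against a copy of the sample above, not against a live /etc/qconfig. The name list_queue_devices is hypothetical, and it assumes attributes are written as "name = value" with spaces around the equals sign.

```shell
# List "queue device" pairs from a qconfig-style file.
# A stanza header is a line ending in ":"; a queue stanza carries a
# "device = ..." attribute naming its queue device stanza.
list_queue_devices() {
    awk '
        /^\*/      { next }                          # skip comment lines
        /:[ \t]*$/ { sub(/:[ \t]*$/, ""); stanza = $1; next }
        $1 == "device" && $2 == "=" { print stanza, $3 }
    ' "$1"
}

# Sample data copied from the example stanza above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
lpforu:
        device = lp0
lp0:
        file = /dev/lp0
        header = never
        trailer = never
        access = both
        backend = /usr/lib/lpd/piobe
EOF

list_queue_devices "$sample"
```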



Quick checks:
-------------

Submit print jobs 	Status print jobs 	Cancel print jobs
-----------------       -----------------       -----------------
enq 			enq -A 			enq -x
qprt 			qchk 			qcan
lp 			lpstat 			lprm
lpr 			lpq


- The lpstat command displays information about the current status of the
line printer.
The lpstat command syntax is as follows:
lpstat [ -aList ] [ -cList ] [ -d ] [ -oList ] [ -pList ] [ -r ] [ -s ]
[ -t ] [ -uList ] [ -vList ] [ -W ]
An example of the lpstat command without any flags is as follows:
# lpstat
Queue Dev Status Job Files User PP% Blks Cp Rnk
------ ---- ------- --- ---------------- ------------ --- ---- -- ---
lpforu lp0 READY

- The qchk command displays the current status information regarding
specified print jobs, print queues, or users.
The qchk command syntax is as follows:
qchk [ -A ] [ -L | -W ] [ -P Printer ] [ -# JobNumber ] [ -q ] [ -u
UserName ] [ -w Delay ]
An example of the qchk command without any flags is as follows:
# qchk
Queue Dev Status Job Files User PP% Blks Cp Rnk
------ ---- ------- --- ---------------- ------------ --- ---- -- ---
lpforu lp0 READY

- The lpq command reports the status of the specified job or all jobs
associated with the specified UserName and JobNumber variables.
The lpq command syntax is as follows:
lpq [ + [ Number ] ] [ -l | -W ] [-P Printer ] [JobNumber] [UserName]
The following is an example of the lpq command without any flags.
# lpq
Queue Dev Status Job Files User PP% Blks Cp Rnk
------ ---- ------- --- ---------------- ------------ --- ---- -- ---
lpforu lp0 READY

- The lpr command uses a spooling daemon to print the named File
parameter when facilities become available.
The lpr command syntax is as follows:
lpr [ -f ] [ -g ] [ -h ] [ -j ] [ -l ] [ -m ] [ -n ] [ -p ] [ -r ] [ -s
] [ -P Printer ] [ -# NumberCopies ] [ -C Class ] [ -J Job ] [ -T Title
] [ -i [ NumberColumns ] ] [ -w Width ] [ File ... ]
The following is an example of using the lpr command to print the file
/etc/passwd.
# lpr /etc/passwd
# lpstat
Queue Dev Status Job Files User PP % Blks Cp Rnk
------ ---- -------- --- ---------------- -------- ---- -- ---- -- ---
lpforu lp0 RUNNING 3 /etc/passwd root 1 100 1 1 1



Example: 
--------

>>>> Stopping the Print Queue

In the following scenario, you have a job printing on a print queue, but you
need to stop the queue so that you can put more paper in the printer.

# lpstat -vlpforu

Queue Dev Status Job Files User PP % Blks Cp Rnk
------ ---- -------- --- ---------------- -------- ---- -- ---- -- ---
lpforu lp0 RUNNING 3 /etc/passwd root 1 100 1 1 1

Disable the print queue using the enq command as shown in the following
example. 

# enq -D -P 'lpforu:lp0'

Checking the printer queue using the qchk command as shown in the
following example. 

# qchk -Plpforu
Queue Dev Status Job Files User PP % Blks Cp Rnk
------ ---- -------- --- ---------------- -------- ---- -- ---- -- ---
lpforu lp0 DOWN 3 /etc/passwd root 1 100 1 1 1


>>>> Starting the Print Queue

You have replaced the paper, and you now want to restart the print queue so
that it will finish your print job. Here is how you would do this.

# lpstat -vlpforu
Queue Dev Status Job Files User PP % Blks Cp Rnk
------ ---- -------- --- ---------------- -------- ---- -- ---- -- ---
lpforu lp0 DOWN 3 /etc/passwd root 1 100 1 1 1

# enq -U -P 'lpforu:lp0'

# qchk -P lpforu
Queue Dev Status Job Files User PP % Blks Cp Rnk
------ ---- -------- --- ---------------- -------- ---- -- ---- -- ---
lpforu lp0 RUNNING 3 /etc/passwd root 1 100 1 1
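
When many queues are defined, the status listing can be filtered for queues that are DOWN. The sketch below works on canned text in the layout shown above. The name down_queues is hypothetical, and it assumes two header lines followed by Queue, Dev and Status as the first three fields. Continuation lines for further jobs are not handled.

```shell
# Print "queue:device" for every entry whose Status column is DOWN.
# Assumes two header lines, then Queue, Dev and Status as the first
# three whitespace-separated fields of each line.
down_queues() {
    awk 'NR > 2 && $3 == "DOWN" { print $1 ":" $2 }'
}

# Canned sample copied from the session above; on a live system the
# input would come from a status command such as qchk.
down_queues <<'EOF'
Queue Dev Status Job Files User PP % Blks Cp Rnk
------ ---- -------- --- ---------------- -------- ---- -- ---- -- ---
lpforu lp0 DOWN 3 /etc/passwd root 1 100 1 1 1
EOF
```

The output uses the same queue:device form that enq -U -P expects in the example above.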


- Adding a local print queue:

# smitty, or smitty printer, or smitty mkpq
or use
# mkque 
# mkquedev


There is an n:1 relation between queues and a device: 
multiple queues can be associated to one device.


- Displaying queue configuration information:

# enq -q -P<QUEUENAME>
for example

# enq -q -PAMSPM00028

# smitty lsallq
# lsallq -c



- Deleting a queue:

# smitty rmpq
or
# rmvirprt
# rmquedev
# rmque

- Enabling and disabling a queue:

This is the same as saying starting and stopping a queue.

# smitty qstop
# smitty qstart

Or use the "qadm" command to bring printers, queues, and the spooling system up or down.

Example:

To bring down the PCL-mv200 queue, enter one of the following commands:

# qadm -D PCL-mv200
# disable PCL-mv200


- Printing job management:

System5: lp
BSD    : lpr
AIX    : qprt

1. To submit a printjob, use either lp, lpr, or qprt. All jobs will go to the system default queue
unless the PRINTER or LPDEST variables are set. You can also specify on the command line which 
queue to use.
Use -d with lp or use -P with qprt and lpr.
All the printcommands lp, lpr, and qprt, actually call the "enq" command, which places
the print request in a queue.

To print multiple copies, use the "qprt -N #" or "lp -n #" command.
For lpr use the -# flag followed by the number of copies, like "lpr -# 2".

Examples:

# qprt -P funjet /tmp/testfile
# lpr -P funjet /tmp/testfile
# lp -d funjet /tmp/testfile


- Checking status of jobs:

# smitty qstatus
# smitty qchk

System5: lpstat
BSD    : lpq
AIX    : qchk

- Cancelling a printjob:

System5: cancel
BSD    : lprm
AIX    : qcan

For example to cancel Job Number 127 on whatever queue the job is on, run

# qcan -x 127 
# cancel 127

To cancel all jobs queued on printer lp0, enter

# qcan -X -Plp0
# cancel lp0


- Daemons:

System5 print service daemon: lpsched
AIX print spooler daemon    : qdaemon

Only one subsystem can be active at a time.

To switch between subsystems, you can use smitty or the switch.prt script.

# switch.prt -s System5
# switch.prt -s AIX


- System files associated with printing:

/etc/qconfig		describes the queues and devices available for use by printing commands
/var/spool		contains files and dirs used by printing programs and daemons
/var/spool/lpd/qdir	contains info about files queued to print
/var/spool/qdaemon	contains copies of the files spooled to print
/var/spool/lpd/stat	where the info on status of jobs is stored
/var/spool/lpd/pio	holds virtual printer definitions



73. Apache:
===========

The Apache webserver can be found on almost any flavour of Unix. We describe some Apache features
on Red Hat Linux and SuSE Linux.


73.1 Apache on Redhat:
----------------------

The Apache HTTP Server is a robust, commercial-grade open source Web server developed by the Apache 
Software Foundation (http://www.apache.org/). Red Hat Linux 8.0 includes the Apache HTTP Server version 2.0 
as well as a number of server modules designed to enhance its functionality. 

The default configuration file installed with the Apache HTTP Server works without alteration 
for most situations. This chapter, however, outlines how to customize the Apache HTTP Server 
configuration file (/etc/httpd/conf/httpd.conf) for situations where the default configuration does not 
suit your needs. 

- Apache HTTP Server 2.0
Red Hat Linux 8.0 ships with version 2.0 of the Apache HTTP Server. There are important differences 
between version 2.0 and version 1.3 - which shipped with earlier releases of Red Hat Linux. 
This section reviews some of the new features of Apache HTTP Server 2.0 and outlines important changes. 
If you need to migrate a version 1.3 configuration file to the new format, refer to the Section called 
Migrating Apache HTTP Server 1.3 Configuration Files. 

- Features of Apache HTTP Server 2.0
The arrival of Apache HTTP Server 2.0 brings with it a number of new features. Among them are the following: 

. New Apache API - The Apache HTTP Server has a new, more powerful set of Application Programming Interfaces 
  (APIs) for modules. 
  Caution 
  Modules built for Apache HTTP Server 1.3 will not work without being ported to the new API. 
  If you are unsure whether or not a particular module has been ported, consult with the package maintainer 
  before upgrading. 
. Filtering - Modules for Apache HTTP Server 2.0 have the ability to act as content filters. 
  See the Section called Modules and Apache HTTP Server 2.0 for more on how filtering works. 
. IPv6 Support - Apache HTTP Server 2.0 supports next generation IP addressing. 
. Simplified Directives - A number of confusing directives have been removed while others have been simplified. 
  See the Section called Configuration Directives in httpd.conf for more information about specific directives. 
. Multilingual Error Responses - When using Server Side Include (SSI) documents, customizable error 
  response pages can be delivered in multiple languages. 
. Multiprotocol Support - Apache HTTP Server 2.0 has the ability to serve multiple protocols. 

- Packaging Changes in Apache HTTP Server 2.0
Under Red Hat Linux 8.0 the Apache HTTP Server package has been renamed. Also, some related packages 
have been renamed, deprecated, or incorporated into other packages. 
Below is a list of the packaging changes: 

. The apache, apache-devel and apache-manual packages have been renamed as httpd, httpd-devel and httpd-manual 
  respectively. 

. The mod_dav package has been incorporated into the httpd package. 

. The mod_put and mod_roaming packages have been removed, since their functionality is a subset of that 
  provided by mod_dav. 

. The mod_auth_any and mod_bandwidth packages have been removed. 

. The version number for the mod_ssl package is now synchronized with the httpd package. This means that the 
  mod_ssl package for Apache HTTP Server 2.0 has a lower version number than mod_ssl package for 
  Apache HTTP Server 1.3. 

- File System Changes in Apache HTTP Server 2.0
The following changes to the file system layout occur when upgrading to Apache HTTP Server 2.0: 

. A new configuration directory, "/etc/httpd/conf.d/", has been added. - This new directory is used to store 
  configuration files for individually packaged modules, such as mod_ssl, mod_perl, and php. The server is 
  instructed to load configuration files from this location by the directive Include conf.d/*.conf within 
  the Apache HTTP Server configuration file, /etc/httpd/conf/httpd.conf. 

Warning 
It is vital that this line be inserted when migrating an existing configuration. 
 
. The ab and logresolve programs have been moved. - These utility programs have been moved from the 
  /usr/sbin/ directory and into the /usr/bin/ directory. This will cause scripts with absolute paths for 
  these binaries to fail. 

. The dbmmanage command has been replaced. - The dbmmanage command has been replaced by htdbm. 

. The logrotate configuration file has been renamed. - The logrotate configuration file has been renamed 
  from /etc/logrotate.d/apache to /etc/logrotate.d/httpd. 
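
A quick way to confirm that a migrated configuration still loads the files in conf.d/ is to search 
httpd.conf for the Include directive mentioned above. The following is only a minimal sketch: the function 
name has_confd_include and the HTTPD_CONF variable are illustrative and are not part of any Red Hat tool. 

```shell
#!/bin/sh
# has_confd_include FILE
# Succeeds if FILE contains an active (uncommented) "Include conf.d/*.conf" line.
has_confd_include() {
    grep -Eiq '^[[:space:]]*Include[[:space:]]+conf\.d/\*\.conf' "$1"
}

# HTTPD_CONF is a hypothetical override; the default is the path named in the text.
conf=${HTTPD_CONF:-/etc/httpd/conf/httpd.conf}
if [ -r "$conf" ]; then
    if has_confd_include "$conf"; then
        echo "OK: $conf includes conf.d/*.conf"
    else
        echo "MISSING: add 'Include conf.d/*.conf' to $conf"
    fi
fi
```

A commented-out line (#Include ...) is deliberately not counted as present. 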

- After Installation
After you have installed the httpd package, the Apache HTTP Server's documentation is available by 
installing the httpd-manual package and pointing a Web browser to http://localhost/manual/ or you can 
browse the Apache documentation available on the Web at http://httpd.apache.org/docs-2.0/. 

The Apache HTTP Server's documentation contains a full list and complete descriptions of all 
configuration options. For your convenience, this chapter provides short descriptions of the configuration 
directives used by Apache HTTP Server 2.0. 

The version of the Apache HTTP Server included with Red Hat Linux includes the ability to set up secure Web servers 
using the strong SSL encryption provided by the mod_ssl and openssl packages. As you look through the 
configuration files, be aware that they include both a non-secure and a secure Web server. 
The secure Web server runs as a virtual host, which is configured in the /etc/httpd/conf.d/ssl.conf file. 


- Starting and Stopping httpd (Apache)

The httpd RPM installs the /etc/rc.d/init.d/httpd Bourne shell script, which is accessed using the 
/sbin/service command. 

 To start your server, as root type: 
 # /sbin/service httpd start
 
 Note 
  If you are running the Apache HTTP Server as a secure server, you will be prompted to type your password. 
 
 To stop your server, type the command: 
 # /sbin/service httpd stop
 

The restart command is a shorthand way of stopping and then starting your server. You will be prompted 
for your password if you are running the Apache HTTP Server as a secure server. 
The restart command looks like the following: 

 # /sbin/service httpd restart
 
If you just finished editing something in your httpd.conf file, you do not need to explicitly stop and 
start your server. Instead, you can use the reload command. 

 Note 
  If you are running the Apache HTTP Server as a secure server, you will not need to type your password when 
  using the reload option as the password will remain cached across reloads. 
 
The reload command looks like the following example: 

 # /sbin/service httpd reload
 
By default, the httpd process will not start automatically when your machine boots. You will need to configure 
the httpd service to start up at boot time using an initscript utility, such as /sbin/chkconfig, /sbin/ntsysv, 
or the Services Configuration Tool program. 

Please refer to the chapter titled Controlling Access to Services in Official Red Hat Linux Customization Guide 
for more information regarding these tools. 

Note 
If you are running the Apache HTTP Server as a secure server, you will be prompted for the secure server's 
password after the machine boots, unless you generated a specific type of server key file. 

- Configuration Directives in httpd.conf

The Apache HTTP Server configuration file is /etc/httpd/conf/httpd.conf. The httpd.conf file is well-commented 
and mostly self-explanatory. Its default configuration will work for most situations; however, you should 
become familiar with some of the more important configuration options. 

If you need to configure the Apache HTTP Server, edit httpd.conf and then either reload, restart, 
or stop and start the httpd process. How to reload, stop and start the Apache HTTP Server is covered in the 
Section called Starting and Stopping httpd. 
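
Because a syntax error in httpd.conf can leave the server unable to come back up, it can help to check the 
file before reloading. The sketch below uses the syntax test of httpd (the -t option) and reloads only if 
the test passes. The function name safe_reload and the HTTPD_BIN and SERVICE_BIN override variables are 
assumptions made for this example. 

```shell
#!/bin/sh
# safe_reload: run the httpd syntax check, and reload the service only if it passes.
# HTTPD_BIN and SERVICE_BIN are hypothetical overrides for the two binaries.
safe_reload() {
    httpd_bin=${HTTPD_BIN:-/usr/sbin/httpd}
    service_bin=${SERVICE_BIN:-/sbin/service}
    if "$httpd_bin" -t; then
        "$service_bin" httpd reload
    else
        echo "syntax check failed; not reloading" >&2
        return 1
    fi
}

# Usage (as root):
#   safe_reload
```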

- Default Modules
The Apache HTTP Server is distributed with a number of modules. By default the following modules are installed 
and enabled with the httpd package on Red Hat Linux: 

mod_access
mod_auth
mod_auth_anon
mod_auth_dbm
mod_auth_digest
mod_include
mod_log_config
mod_env
mod_mime_magic
mod_cern_meta
mod_expires
mod_headers
mod_usertrack
mod_unique_id
mod_setenvif
mod_mime
mod_dav
mod_status
mod_autoindex
mod_asis
mod_info
mod_cgi
mod_dav_fs
mod_vhost_alias
mod_negotiation
mod_dir
mod_imap
mod_actions
mod_speling
mod_userdir
mod_alias
mod_rewrite
 

Additionally, the following modules are available by installing additional packages: 

mod_auth_mysql
mod_auth_pgsql
mod_perl
mod_python
mod_ssl
php
squirrelmail
 
- Using Virtual Hosts
You can use the Apache HTTP Server's virtual hosts capability to run different servers for different IP addresses, 
different host names, or different ports on the same server. If you are interested in using virtual hosts, 
complete information is provided in the Apache documentation on your machine or on the Web at 
http://httpd.apache.org/docs-2.0/vhosts/. 

Note 
You cannot use name-based virtual hosts with your secure Web server, because the SSL handshake 
occurs before the HTTP request which identifies the appropriate name-based virtual host. If you want to use 
name-based virtual hosts, they will only work with your non-secure Web server. 

Virtual hosts are configured within the httpd.conf file, as described in the Section called Configuration 
Directives in httpd.conf. Please review that section before you start to change the virtual hosts configuration 
on your machine. 

The Secure Web Server Virtual Host
The default configuration of your Web server runs a non-secure and a secure server. Both servers use the same 
IP address and host name, but they listen on different ports, and the secure server is configured as a virtual host. 
This configuration enables you to serve both secure and non-secure documents from the same machine. Setting up the secure 
HTTP transmission is very resource intensive, so generally you will be able to serve far fewer pages per second 
with a secure server. You need to consider this when you decide what information to include on the secure server 
and the non-secure server. 

The configuration directives for your secure server are contained within virtual host tags in the 
/etc/httpd/conf.d/ssl.conf file. If you need to change anything about the configuration of your secure server, 
you will need to change the configuration directives inside the virtual host tags. 

By default, both the secure and the non-secure Web servers share the same DocumentRoot. To change the DocumentRoot 
so that it is no longer shared by both the secure server and the non-secure server, change one of the DocumentRoot 
directives. The DocumentRoot either inside or outside of the virtual host tags in httpd.conf defines the 
DocumentRoot for the non-secure Web server. The DocumentRoot within the virtual host tags in 
conf.d/ssl.conf defines the document root for the secure server. 

The secure Apache HTTP Server listens on port 443, while your non-secure Web server listens on port 80. 
To stop the non-secure Web server from accepting connections, comment out any line in httpd.conf which 
reads Listen 80. 
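
Commenting out the Listen 80 lines can also be done with sed. This is a sketch under the assumption that 
the directive appears as a plain "Listen 80" line; forms such as "Listen 192.168.1.1:80" are not touched. 
The function name disable_listen_80 is made up for this example. It writes the modified file to standard 
output, so the original is never changed in place. 

```shell
#!/bin/sh
# disable_listen_80 FILE
# Print FILE with every plain "Listen 80" line commented out.
disable_listen_80() {
    sed 's/^\([[:space:]]*\)\(Listen[[:space:]][[:space:]]*80[[:space:]]*\)$/\1#\2/' "$1"
}

# Usage:
#   disable_listen_80 /etc/httpd/conf/httpd.conf > /tmp/httpd.conf.new
#   diff /etc/httpd/conf/httpd.conf /tmp/httpd.conf.new
```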

Setting Up Virtual Hosts
To create a virtual host, you will need to alter the virtual host lines provided as an example in httpd.conf, 
or create your own virtual host section. 

The virtual host example lines read as follows: 

#<VirtualHost  *>
#    ServerAdmin webmaster@dummy-host.example.com
#    DocumentRoot /www/docs/dummy-host.example.com
#    ServerName dummy-host.example.com
#    ErrorLog logs/dummy-host.example.com-error_log
#    CustomLog logs/dummy-host.example.com-access_log common
#</VirtualHost>
 

Uncomment all of the lines, and add the correct information for the virtual host. 
In the first line, change * to your server's IP address. Change the ServerName to a valid DNS name to use 
for the virtual host. 

You will also need to uncomment one of the NameVirtualHost lines below: 

NameVirtualHost *

Next, change * to the IP address, and port if necessary, for the virtual host. When finished it will 
look similar to the following example: 

NameVirtualHost 192.168.1.1:80 
 
If you set up a virtual host and want it to listen on a non-default port, you will need to set up a virtual host 
for that port and add a Listen directive corresponding to that port. 

Then add the port number to the first line of the virtual host configuration as in the following example: 

<VirtualHost ip_address_of_your_server:12331>

This line would create a virtual host that listens on port 12331. 
You must restart httpd to start a new virtual host. See the Section called Starting and Stopping httpd for 
instructions on how to start and stop httpd. 
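
The steps above (Listen, NameVirtualHost and the VirtualHost section) can be put together in a small 
generator script. This is only a sketch: the function name make_vhost, the example address 192.168.1.1, 
port 12331, host name www.example.com and document root are all placeholders, and NameVirtualHost is only 
needed for name-based virtual hosts. 

```shell
#!/bin/sh
# make_vhost IP PORT SERVERNAME DOCROOT
# Print a virtual host stanza for a non-default port, ready to paste into httpd.conf.
make_vhost() {
    ip=$1
    port=$2
    name=$3
    docroot=$4
    cat <<EOF
Listen $port
NameVirtualHost $ip:$port

<VirtualHost $ip:$port>
    ServerName $name
    DocumentRoot $docroot
    ErrorLog logs/$name-error_log
    CustomLog logs/$name-access_log common
</VirtualHost>
EOF
}

# Placeholder values; replace them with your own.
make_vhost 192.168.1.1 12331 www.example.com /www/docs/www.example.com
```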

 
73.2 Apache on SuSE:
--------------------

- Using Apache
To display static web pages with Apache, simply place your files in the correct directory. In SUSE LINUX, 
the correct directory is /srv/www/htdocs. A few small example pages may already be installed there. 
Use these pages to check if Apache was installed correctly and is currently active. Subsequently, you can 
simply overwrite or uninstall these pages. Custom CGI scripts are installed in /srv/www/cgi-bin. 

During operation, Apache writes log messages to the file /var/log/httpd/access_log or /var/log/apache2/access_log. 
These messages show which resources were requested and delivered at what time and with which method 
(GET, POST, etc.). Error messages are logged to /var/log/apache2. 
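
As an illustration of what the access log contains, the following sketch counts the requests per HTTP 
method. It assumes the log uses the common log format, where the sixth whitespace-separated field is the 
quoted method; the function name count_methods is made up for this example. 

```shell
#!/bin/sh
# count_methods LOGFILE
# Print "METHOD COUNT" for every request method found in a common-format access log.
count_methods() {
    awk '{ m = $6; gsub(/"/, "", m); n[m]++ }
         END { for (m in n) print m, n[m] }' "$1" | sort
}

# Usage:
#   count_methods /var/log/apache2/access_log
```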

- Active Contents
Apache provides several possibilities for the delivery of active contents. Active contents are HTML pages 
that are generated on the basis of variable input data from the client, such as search engines that respond 
to the input of one or several search strings (possibly interlinked with logical operators like AND or OR) 
by returning a list of pages containing these search strings.

Apache offers three ways of generating active contents:

Server Side Includes (SSI) 
These are directives that are embedded in an HTML page by means of special comments. Apache interprets 
the content of the comments and delivers the result as part of the HTML page. 

Common Gateway Interface (CGI) 
These are programs that are located in certain directories. Apache forwards the parameters transmitted by the 
client to these programs and returns the output of the programs. This kind of programming is quite easy, 
especially since existing command-line programs can be designed in such a way that they accept input 
from Apache and return their output to Apache.

Module 
Apache offers interfaces for executing any modules within the scope of request processing. Apache gives these 
programs access to important information, such as the request or the HTTP headers. Programs can take part 
in the generation of active contents as well as in other functions (such as authentication). The programming 
of such modules requires some expertise. The advantages of this approach are high performance and possibilities 
that exceed those of SSI and CGI.

While CGI scripts are executed directly by Apache (under the user ID of their owner), modules are controlled 
by a persistent interpreter that is embedded in Apache. In this way, separate processes do not need to be 
started and terminated for every request (this would result in a considerable overhead for the process management, 
memory management, etc.). Rather, the script is handled by the interpreter running under the ID of the web server.

However, this approach has a catch. Compared to modules, CGI scripts are relatively tolerant of careless 
programming. With CGI scripts, errors, such as a failure to release resources and memory, do not have a 
lasting effect, because the programs are terminated after the request has been processed. This results in the 
clearance of memory that was not released by the program due to a programming error. With modules, the 
effects of programming errors accumulate, as the interpreter is persistent. If the server is not restarted 
and the interpreter runs for several months, the failure to release resources, such as database connections, 
can be quite disturbing.

Server Side Includes: SSI
Server-side includes are directives that are embedded in special comments and executed by Apache. 
The result is embedded in the output. For example, the current date can be printed with 
<!--#echo var="DATE_LOCAL" -->. The # at the end of the opening comment mark "<!--" shows Apache that this 
is an SSI directive and not a simple comment.

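SSIs can be activated in several ways. The easiest approach is to search all executable files for SSIs, 
which is what the XBitHack directive of mod_include enables. 

Another approach is to specify certain file types to search for SSI, for example by mapping an extension 
such as .shtml to the server-parsed handler with AddHandler.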

Common Gateway Interface: CGI
CGI is the abbreviation for Common Gateway Interface. With CGI, the server does not simply deliver 
a static HTML page, but executes a program that generates the page. This enables the generation of pages 
representing the result of a calculation, such as the result of the search in a database. By means of 
arguments passed to the executed program, the program can return an individual response page for every request.

The main advantage of CGI is that this technology is quite simple. The program merely must exist in a 
specific directory to be executed by the web server just like a command-line program. The server sends 
the program output on the standard output channel (stdout) to the client.

GET and POST
Input parameters can be passed to the server with GET or POST. Depending on which method is used, the server 
passes the parameters to the script in various ways. 

> With POST, the server passes the parameters to the program 
  on the standard input channel (stdin). The program would receive its input in the same way when 
  started from a console.

> With GET, the server uses the environment variable QUERY_STRING to pass the parameters to the program. 
  An environment variable is a variable made available globally by the system (such as the variable PATH, 
  which contains a list of paths the system searches for executable commands when the user enters a command).
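
The difference between the two methods can be seen in a very small CGI program. The sketch below is written 
in plain shell to keep it self-contained. It relies on the standard CGI variables REQUEST_METHOD, 
QUERY_STRING and CONTENT_LENGTH; the function name read_params is made up for this example, and no URL 
decoding is attempted. 

```shell
#!/bin/sh
# Minimal CGI sketch: return the raw request parameters.
# GET  -> parameters arrive in the QUERY_STRING environment variable.
# POST -> parameters arrive on standard input; CONTENT_LENGTH gives the byte count.
read_params() {
    case "${REQUEST_METHOD:-GET}" in
        POST)
            head -c "${CONTENT_LENGTH:-0}"
            ;;
        *)
            printf '%s' "${QUERY_STRING:-}"
            ;;
    esac
}

params=$(read_params)

# A CGI response starts with headers, followed by an empty line, then the body.
printf 'Content-Type: text/plain\r\n\r\n'
printf 'method=%s\n' "${REQUEST_METHOD:-GET}"
printf 'params=%s\n' "$params"
```

Outside of a web server the script can be tried from a console, for example with 
QUERY_STRING='q=apache' ./script.sh, which mirrors the point made above about standard input and the 
environment. 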

Languages for CGI
Theoretically, CGI programs can be written in any programming language. Usually, scripting languages 
(interpreted languages), such as Perl or PHP, are used for this purpose. If speed is critical, 
C or C++ may be more suitable.

In the simplest case, Apache looks for these programs in a specific directory (cgi-bin). This directory 
can be set in the configuration file.

If necessary, additional directories can be specified. In this case, Apache searches these directories 
for executable programs. However, this represents a security risk, as any user will be able to 
let Apache execute programs (some of which may be malicious). If executable programs are restricted 
to cgi-bin, the administrator can easily see who places which scripts and programs in this directory 
and check them for any malicious intent.

Generating Active Contents with Modules
A variety of modules is available for use with Apache. The term "module" is used in two different senses. 

> First, there are modules that can be integrated in Apache for the purpose of handling specific functions, 
  such as modules for embedding programming languages. These modules are introduced below.

> Second, in connection with programming languages, modules refer to an independent group of functions, 
  classes, and variables. These modules are integrated in a program to provide a certain functionality, 
  such as the CGI modules available for all scripting languages. These modules facilitate the programming 
  of CGI applications by providing various functions, such as methods for reading the request parameters 
  and for the HTML output.

mod_perl
Perl is a popular, proven scripting language. There are numerous modules and libraries for Perl, including 
a library for expanding the Apache configuration file. The home page for Perl is http://www.perl.com/. 
A range of libraries for Perl is available in the Comprehensive Perl Archive Network (CPAN) at http://www.cpan.org/.


Setting up mod_perl
To set up mod_perl in SUSE LINUX, simply install the respective package (see Section 15.6. "Installation"). 
Following the installation, the Apache configuration file will include the necessary entries 
(see /etc/apache2/mod_perl-startup.pl). Information about mod_perl is available at http://perl.apache.org/.

mod_perl versus CGI
In the simplest case, run a previous CGI script as a mod_perl script by requesting it with a different URL. 
The configuration file contains aliases that point to the same directory and execute any scripts it contains 
either via CGI or via mod_perl. All these entries already exist in the configuration file. The alias entry for 
CGI is as follows:

ScriptAlias /cgi-bin/ "/srv/www/cgi-bin/"
The entries for mod_perl are as follows:

<IfModule mod_perl.c> 
# Provide two aliases to the same cgi-bin directory, 
# to see the effects of the 2 different mod_perl modes. 
# for Apache::Registry Mode 
ScriptAlias /perl/          "/srv/www/cgi-bin/" 
# for Apache::PerlRun Mode 
ScriptAlias /cgi-perl/      "/srv/www/cgi-bin/" 
</IfModule> 

The following entries are also needed for mod_perl. These entries already exist in the configuration file.

#
# If mod_perl is activated, load configuration information
#
<IfModule mod_perl.c>
Perlrequire /usr/include/apache/modules/perl/startup.perl
PerlModule Apache::Registry

#
# set Apache::Registry Mode for /perl Alias
#
<Location /perl>
SetHandler  perl-script
PerlHandler Apache::Registry
Options ExecCGI
PerlSendHeader On
</Location>

#
# set Apache::PerlRun Mode for /cgi-perl Alias
#
<Location /cgi-perl>
SetHandler  perl-script
PerlHandler Apache::PerlRun
Options ExecCGI
PerlSendHeader On
</Location>

</IfModule>

These entries create aliases for the Apache::Registry and Apache::PerlRun modes. The difference between these 
two modes is as follows:

Apache::Registry 
All scripts are compiled and kept in a cache. Every script is applied as the content of a subroutine. 
Although this is good for performance, there is a disadvantage: the scripts must be programmed extremely 
carefully, as the variables and subroutines persist between the requests. This means that you must reset 
the variables to enable their use for the next request. If, for example, the credit card number of a customer 
is stored in a variable in an online banking script, this number could appear again when the next customer 
uses the application and requests the same script.

Apache::PerlRun 
The scripts are recompiled for every request. Variables and subroutines disappear from the namespace between 
the requests (the namespace is the entirety of all variable names and routine names that are defined at a 
given time during the existence of a script). Therefore, Apache::PerlRun does not necessitate painstaking 
programming, as all variables are reinitialized when the script is started and no values are kept from previous 
requests. For this reason, Apache::PerlRun is slower than Apache::Registry but still a lot faster than CGI 
(in spite of some similarities to CGI), because no separate process is started for the interpreter.

mod_php4
PHP is a programming language that was especially developed for use with web servers. In contrast to other languages 
whose commands are stored in separate files (scripts), the PHP commands are embedded in an HTML page 
(similar to SSI). The PHP interpreter processes the PHP commands and embeds the processing result in the HTML page.

The home page for PHP is http://www.php.net/. For PHP to work, install mod_php4-core and, in addition, 
apache2-mod_php4 for Apache 2. 

mod_python
Python is an object-oriented programming language with a very clear and legible syntax. An unusual but convenient 
feature is that the program structure depends on the indentation. Blocks are not defined with braces (as in C and 
Perl) or other demarcation elements (such as begin and end), but by their level of indentation. The package to 
install is apache2-mod_python.

More information about this language is available at http://www.python.org/. For more information about mod_python, 
visit the URL http://www.modpython.org/.

mod_ruby
Ruby is a relatively new, object-oriented high-level programming language that resembles certain aspects of Perl 
and Python and is ideal for scripts. Like Python, it has a clean, transparent syntax. On the other hand, Ruby 
has adopted abbreviations, such as $. for the number of the last line read in the input file - a feature that 
is welcomed by some programmers and abhorred by others. The basic concept of Ruby closely resembles Smalltalk.



74. Distributed shell:
======================

Note 1:
-------

DSH - distributed shell
dsh is a program which runs a single command on multiple computers at the same time. It was designed 
as a cluster tool for beowulf-style supercomputers.
 
The link address is: http://dsh.sf.net/ 

Note 2:
-------

dsh(1)    2003 Sep 17    Debian-Beowulf/Dancer    Dancer Tools reference
NAME 

dsh - Distributed shell, or dancer's shell 
SYNOPSIS 

dsh [-m machinename | -a | -g groupname ] [-r remoteshellname ] [-c | -w | -i | -F forklimit ] -- commandline 
DESCRIPTION 

dsh executes a command remotely on several different machines at the same time. It is a utility that effectively does a 
"for a in $(seq 1 10); do rsh $a command; done" in Bourne shell. 

OPTIONS 

The options available are as follows. 
--verbose | -v 
Give verbose output of the execution process. 

--quiet | -q 
Makes output quieter. 

--machine | -m [machinename[,machinename]*] 
Adds machinename to the list of machines on which the command is executed. The syntax of machinename allows 
username@machinename, in which case the remote shell is invoked with the option to log in as username. 
From version 0.21.4, it is possible to specify in the format of "username@machinename,username@machinename,
username@machinename" so that multiple hosts can be specified with comma-delimited values. 

--all | -a 
Add all machines found in /etc/dsh/machines.list to the list of machines on which the specified command is executed. 

--group groupname | -g groupname 
Add all machines found in /etc/dsh/group/groupname to the list of machines on which the specified command is executed. 
If groupname is of the form @netgroup, then the machines in the given netgroup are used as the list of machines 
to execute on. 

--file machinefile | -f machinefile 
Add all machines found in the specified file to the list of machines on which the specified command is executed. 
The file should list one machine specification per line (with the same syntax as the machinename argument). 
Lines starting with "#" are ignored. 
From version 0.21.4, specifying the same machine several times using any of the machine specification options 
will result in multiple invocations merged into one. 

--remoteshell shellname | -r shellname 
Execute shellname as the remote shell. Usually any of "rsh", "remsh" or "ssh" is available. 

--remoteshellopt rshoption | -o rshoption 
Add one option rshoption to the list of options passed on to the remote shell. 

--help | -h 
Output help message and exits. 

--wait-shell | -w 
Executes on each machine and waits for the execution to finish before moving on to the next machine. 

--concurrent-shell | -c 
Executes shell concurrently. 

--show-machine-names | -M 
Prepends machine names on the standard output. Useful to be used in conjunction with the --concurrent-shell option 
so that the output is slightly more parsable. 

--duplicate-input | -i 
Duplicates the input of the dsh process to the individual processes that are remotely invoked. Needs to have --concurrent-shell set. 
Due to limitations in the current implementation, it is only useful for running a shell. Terminate the shell session 
with ctrl-D. 

--bufsize | -b [ buffer-size in bytes ] 
Sets the buffer size used in replicating input for --duplicate-input option. 

--version | -V 
Outputs version information and exits. 

--num-topology | -N 
Changes the current topology from 1. 1 is the default behavior of spawning the shell from one node to every node. 
Changing the number to a value greater than 2 would result in dsh being spawned on other machines as well. 

--forklimit | -F fork limit 
Similar to -c with a limit on the number of simultaneous connections. dsh will wait before creating new connection 
if the limit is reached. Useful when the number of nodes to be accessed is going somewhere above 200, 
and using -N option is not possible. 

EXIT STATUS 

The first non-zero exit code of child processes is returned, or zero if none returned non-zero exit code. 
1 if error is found in command-line specifications. 2 if signal is received from child processes. 


EXAMPLES 

dsh -a w 
Shows list of users logged in on all workstations. 


dsh -r ssh -a -- w 
Shows list of users logged in on all workstations, and use ssh command to connect. 
(It should be of note that when using ssh, ssh-agent is handy.) 

dsh -r ssh -m node1 -m node2 -c -- 'echo $HOSTNAME $(cat /proc/loadavg)' 
Shows the load average of machines node1 and node2. 


FILES 

/etc/dsh/machines.list | $(HOME)/.dsh/machines.list 
List of machine names to be used when the -a command-line option is specified. 


/etc/dsh/group/ groupname | $(HOME)/.dsh/group/ groupname 
List of machine names to be used when the -g groupname command-line option is specified. 


/etc/dsh/dsh.conf | $(HOME)/.dsh/dsh.conf 
Configuration file containing the day-to-day default. 
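
To make the machine file format and the sequential behaviour (as with -w) concrete, the loop from the 
DESCRIPTION can be written out as a small shell function. This is only a sketch of the idea and not how 
dsh itself is implemented; the function name run_on_all and the REMOTE_SHELL variable are made up for 
this example. 

```shell
#!/bin/sh
# run_on_all MACHINEFILE COMMAND [ARGS...]
# Run COMMAND on every machine listed in MACHINEFILE, one machine after the other.
# Empty lines and lines starting with "#" are skipped, as in a dsh machine file.
run_on_all() {
    machinefile=$1
    shift
    rsh_cmd=${REMOTE_SHELL:-ssh}    # hypothetical override, e.g. rsh or remsh
    while IFS= read -r host; do
        case "$host" in
            ''|'#'*) continue ;;
        esac
        # </dev/null keeps the remote shell from swallowing the rest of the machine file
        "$rsh_cmd" "$host" "$@" </dev/null
    done < "$machinefile"
}

# Usage:
#   run_on_all /etc/dsh/machines.list uptime
```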


Note 3:
-------

PSSP's distributed shell commands "dsh" and "dshbak" are now standard in AIX 5.2. They run commands in parallel 
on multiple hosts, and format the output. The dsh commands greatly simplify managing server farms. 

The set of nodes to which commands are sent can be set on the command line or by the contents of a file named 
by the DSH_LIST environment variable. 

Here are a couple of simple examples of how these commands can be used. (Assume DSH_LIST has been set to the name of the 
file containing the list of servers. In this case, just three servers: dodgers, surveyor and pioneer.) 

Check the clock setting on all servers: 

# dsh date
dodgers: Fri Jun  4 14:46:06 PDT 2004
surveyor: Fri Jun 4 14:16:18 PDT 2004
pioneer: Fri Jun  4 14:32:28 PDT 2004

Identify servers running fix IX37151 

# dsh "instfix -ik IX37659"
dodgers:    There was no data for IX37659 in the fix database
surveyor:    All filesets for IY37659 were found
pioneer:     All filesets for IY37659 were found

Check the hardware error logs on all servers starting 6/4/04 

# dsh "errpt -s 0604000004" 

Or check the OS level on each server: 


# dsh "lslpp -L bos.rte | grep bos.rte"

You can also use "dshbak" to group common output from the dsh command. This makes it easier to identify 
differences when you have a lot of servers. For example, we can consolidate the output of the above instfix command 
as follows. 


# dsh "instfix -ik IX37659"  | dshbak
HOST: dodgers
---------------------
There was no data for IX37659 in the fix database.

HOST: surveyor, pioneer
----------------------------------
All filesets for IY37659 were found
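
The kind of grouping shown above can be approximated with awk, which may help to understand what the 
consolidated output means. This is a rough sketch and not the dshbak implementation: it assumes every 
input line has the form "host: text", it only handles one line of output per host, and the function 
name group_by_output is made up for this example. 

```shell
#!/bin/sh
# group_by_output
# Read "host: text" lines on standard input and list together all hosts
# that produced the same text, in order of first appearance.
group_by_output() {
    awk '
    {
        host = $0; sub(/:.*/, "", host)              # part before the first colon
        line = $0; sub(/^[^:]*:[ \t]*/, "", line)    # part after it
        if (line in hosts) {
            hosts[line] = hosts[line] ", " host
        } else {
            order[++n] = line
            hosts[line] = host
        }
    }
    END {
        for (i = 1; i <= n; i++)
            printf "HOST: %s\n%s\n\n", hosts[order[i]], order[i]
    }'
}

# Usage:
#   dsh "instfix -ik IX37659" | group_by_output
```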

Both commands are located in the /opt/csm/bin directory. They require a little customization. 
Check the AIX documentation for more information. 




=================
CLUSTER SECTIONS:
=================


========================================
75. General Parallel File System (GPFS):
========================================


Only AIX and Linux (pSeries) related.

General Parallel File System (GPFS) is a high performance "shared-disk file system" that can provide data access 
from nodes in a cluster environment. Parallel and serial applications can readily access shared files 
using standard UNIX file system interfaces, and the same file can be accessed concurrently from multiple nodes. 
GPFS is designed to provide high availability through logging and replication, and can be configured for failover 
from both disk and server malfunctions.

GPFS often operates within the context of an HACMP cluster, but you can build GPFS-only "clusters" as well.


75.1 Creating a 2 node GPFS Cluster:
====================================

Suppose we have two nodes named node2 and node3. Our goal is to create a single GPFS filesystem,
named "/my_gpfs", consisting of 2 disks used for data and metadata. These disks are housed in two
DS4300 storage subsystems. A tiebreaker disk, in a separate DS4100, will be used to maintain node quorum
during single-node failures. Additionally, a "filesystem descriptor" disk for /my_gpfs is located
at the same site.

Servers: 2 Nodes= 2 x lpar; per lpar 1 cpu, 2GB RAM, 2 x FC adapter, 2 x Ethernet adapter
Storage: 2 x DS4300 for GPFS and data, 1 x DS4100 for tiebreaker disk 

Suppose further that the nodes use the following IP addresses:
Node2: 10.1.1.32
Node3: 10.1.1.33

The Ethernet adapters of each server are either aggregated or configured in NIB (backup standby mode).


  Note : What are Tiebreaker disks?

  GPFS can use two types of quorum mechanisms in order to determine service availability:
  - Disk quorum
  - Node quorum

  In case availability of either of these resources is less than or equal to 50%, GPFS file system services are
  automatically stopped.

  When node quorum is not met, GPFS stops its cluster-wide services and access to all filesystems
  within the cluster is no longer possible. If 50% or more of the disks holding the "filesystem descriptors"
  of a GPFS file system fail, disk quorum for that particular file system is no longer met and the
  filesystem will be unmounted.

  To eliminate the need for a tiebreaker node, a new node quorum mechanism for two-node clusters was
  introduced in GPFS 2.3. It is called a tiebreaker disk. 
  If one of the two nodes goes down, we still have "enough" node quorum to keep the GPFS system running.
  Basically, a tiebreaker disk replaces a "tiebreaker node".
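
  The 50% rule can be written down as simple arithmetic: quorum holds only when more than half of the
  members are available. The sketch below applies this to nodes and deliberately ignores tiebreaker disks,
  which exist precisely to relax the rule for a two-node cluster. The function name has_quorum is made up
  for this example.

```shell
#!/bin/sh
# has_quorum UP TOTAL
# Succeeds if more than 50% of TOTAL members are up (plain majority, no tiebreaker).
has_quorum() {
    q_up=$1
    q_total=$2
    [ $((q_up * 2)) -gt "$q_total" ]
}

# Without a tiebreaker, a two-node cluster loses quorum as soon as one node fails.
for n in 0 1 2; do
    if has_quorum "$n" 2; then
        echo "$n of 2 nodes up: quorum met"
    else
        echo "$n of 2 nodes up: quorum lost"
    fi
done
```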


-- Preparations:
-- -------------

1. The systems have AIX >= 5.3ML2 installed, and gpfs.base.xxxx installed
2. Make sure name resolution is ok, either by DNS or by /etc/hosts
3. Sync the system clocks, for example by NTP
4. Make sure rcp, ssh and scp are working (via .rhosts etc., or via ssh)
5. A distributed shell (DSH) is installed on each node.
6. During cluster setup some configuration files may be created and used with GPFS commands.
   These files reside in subdirectories of /var/mmfs.

example:

root@starboss:/var/mmfs/etc#cat mmfs.cfg
#
#   WARNING:   This is a machine generated file.  Do not edit!
#   Use the mmchconfig command to change configuration parameters.
#
clusterName cluster_name.starboss
clusterId 729741152660153204
clusterType lc
autoload no
useDiskLease yes
maxFeatureLevelAllowed 912
tiebreakerDisks gpfs3nsd;gpfs4nsd
[zd110l13]
takeOverSdrServ yes



--  Creating the GPFS cluster:
-- ---------------------------

The first step is to create a GPFS cluster named TbrCl using the command:

# mmcrcluster -n /var/mmfs/conf/nodefile -p node2 -s node3 -C TbrCl -A

A file called "nodefile" contains the cluster node information, describing the function of each node:

  # Node2 can be a file system manager and is relevant for GPFS quorum
  node2:manager-quorum
  # Node3 can be a file system manager and is relevant for GPFS quorum
  node3:manager-quorum
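
A misspelled designation in the node file is easy to miss, so a small check before running mmcrcluster
can help. This is only a sketch: check_nodefile is a hypothetical helper, and the accepted designations
(manager, client, quorum, nonquorum, separated by '-') are taken from the mmcrcluster description.

```shell
# Check node descriptors of the form NodeName[:designation[-designation]] read from stdin.
# check_nodefile is a hypothetical helper; it prints bad lines and returns non-zero on error.
check_nodefile() {
    awk '
        /^[ \t]*#/ || /^[ \t]*$/ { next }          # skip comments and blank lines
        {
            line = $0
            gsub(/[ \t]/, "", line)                # ignore indentation and trailing blanks
            n = split(line, f, ":")
            bad = (n > 2 || f[1] == "")
            if (n == 2) {
                m = split(f[2], role, "-")
                for (i = 1; i <= m; i++)
                    if (role[i] !~ /^(manager|client|quorum|nonquorum)$/) bad = 1
            }
            if (bad) { print "bad node descriptor: " $0; rc = 1 }
        }
        END { exit rc + 0 }
    '
}

printf '%s\n' 'node2:manager-quorum' 'node3:manager-quorum' | check_nodefile && echo "node file looks valid"
```

A typical call would be "check_nodefile < /var/mmfs/conf/nodefile".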

Each node can fulfill the function of a file system manager and is relevant for maintaining node quorum.
The cluster designates a primary cluster configuration server (node2, option -p) and a backup (node3,
option -s) in case the primary fails. Cluster services will be started automatically during node boot (-A).
After successfully creating the cluster, you can verify your setup:

# mmlscluster 

  GPFS cluster information
  ========================

  GPFS cluster name:		TbrCl.node2
  GPFS cluster id:		720858653441148399
  GPFS UID domain:		TbrCl.node2
  Remote shell command:		/usr/bin/rsh
  Remote file copy command:	/usr/bin/rcp

  GPFS cluster configuration servers:
  -----------------------------------
  Primary server:		node2
  Secondary server: 		node3

  Node number Node name IP address    Full node name    Remarks
  -------------------------------------------------------------
  1           node2     10.1.1.32     node2              quorum node
  2           node3     10.1.1.33     node3              quorum node



The GPFS daemon has to be started on all nodes:

# mmstartup -a

With GPFS you can administer the whole cluster from any cluster node. After starting GPFS services you
should examine the state of the cluster:

# mmgetstate -aL

  Node number Node name Quorum    Nodes up  Total nodes GPFS state
  -------------------------------------------------------------
  1           node2     2         2         2           active    
  2           node3     2         2         2           active


At this point, the cluster software is running, but you haven't done anything yet on the filesystems.
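
The state check can be automated by filtering the mmgetstate output. The sketch below assumes the column
layout of the sample above (GPFS state in the sixth column); inactive_nodes is a made-up helper name.

```shell
# Print the names of nodes whose GPFS state is not "active".
# Expects "mmgetstate -aL"-style text on stdin, with columns as in the sample above.
# inactive_nodes is a hypothetical helper name.
inactive_nodes() {
    awk '$1 ~ /^[0-9]+$/ && $6 != "active" { print $2 }'   # data rows start with the node number
}

# Sample input with one node down (illustrative data, not real command output).
printf '%s\n' \
    '  Node number Node name Quorum    Nodes up  Total nodes GPFS state' \
    '  1           node2     2         1         2           active' \
    '  2           node3     2         1         2           down' | inactive_nodes
```

On a live cluster this would be used as "mmgetstate -aL | inactive_nodes"; empty output means all nodes
report "active".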



-- Configuring GPFS disks
-- ----------------------

Before starting with the configuration of GPFS disks, you have to make sure that each cluster node has
access to each SAN attached disk when running in a shared disk environment. With AIX 5L, you can use
the lspv command to verify your disks (hdisk) are properly configured:

# lspv

hdisk2   none     none
hdisk3   none     none
hdisk4   none     none
hdisk5   none     none

If you look for LUN related information (e.g. volume names) issue the following command against a
dedicated hdisk:

# lsattr -El hdisk2

..
.... (in the output, you will also see SAN stuff)
..


It is very important to keep a well balanced disk configuration when using GPFS because this makes sure
you get optimal performance by distributing I/O requests evenly among storage subsystems and attached
data disks. Keep in mind that all GPFS disks belonging to a particular file system should be of the same size.


GPFS uses a mechanism called Network Shared Disk (NSD) to provide file system access to cluster nodes
which do not have direct physical access to file system disks. A diskless node accesses an NSD via the
cluster network and I/O operations are handled as if they run against a directly attached disk from
an operating system's perspective. A special device driver handles data shipping using the cluster network.
NSDs can also be used in a purely SAN based GPFS configuration where each node can directly access
any disk. In case a node loses direct disk access, it automatically switches to NSD mode, sending I/O
requests via the network to other nodes with direct disk attachment. This mechanism increases file system
availability, and should normally be used.

When using NSD, a primary and a backup server are assigned to each NSD. In case a node loses its
direct disk attachment, it contacts the primary NSD server, or the backup server in case the primary
is not available.

To set up NSDs you need to create "descriptor files" that describe the function of each
disk. In our example, we will use the following file:

# cat /var/mmfs/conf/diskfile          

  #Description of disk attributes
  #<disk name>:<primary NSD server>:<2ndary NSD server>:<disk usage>:<failure group>:<NSD name>

  #Data and metadata disk for /my_gpfs, site A, DS4300_1
  hdisk2:node2:node3:dataAndMetadata:1:

  #Data and metadata disk for /my_gpfs, site B, DS4300_2
  hdisk3:node3:node2:dataAndMetadata:2:

  #File system descriptor disk for /my_gpfs, site C, DS4100
  hdisk4:::descOnly:3:

  #Tiebreaker disk, site C, DS4100
  hdisk5:::descOnly:-1:

Here, our cluster uses 4 disks with GPFS. Filesystem "/my_gpfs" uses hdisk2 and hdisk3 for data and metadata.
Therefore these disks will use the NSD mechanism to provide file system data access in case direct disk access
fails on one of the cluster nodes.
Node2 is the primary NSD server for hdisk2 with node3 being its backup. The same is true for hdisk3, but then
the other way around.
Each of these disks belongs to a different "failure group" (1=site A, 2=site B) which basically enables
replication of file system data and metadata between the two sites.
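
Because mmcrnsd acts on every line of the descriptor file, a quick syntax check beforehand can be useful.
The following is a sketch under assumptions: check_diskfile is a hypothetical helper, the accepted disk
usage values (dataAndMetadata, dataOnly, metadataOnly, descOnly) and the failure group range (-1 to 4000)
reflect my reading of the GPFS documentation and should be verified for your GPFS level.

```shell
# Check disk descriptors read from stdin:
#   DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup:DesiredName
# check_diskfile is a hypothetical helper; the usage values and the -1..4000
# failure group range are assumptions to be checked against your GPFS release.
check_diskfile() {
    awk '
        /^[ \t]*#/ || /^[ \t]*$/ { next }          # skip comments and blank lines
        {
            line = $0
            gsub(/[ \t]/, "", line)
            n = split(line, f, ":")
            bad = (f[1] == "" || n > 6)
            if (n >= 4 && f[4] != "" && f[4] !~ /^(dataAndMetadata|dataOnly|metadataOnly|descOnly)$/) bad = 1
            if (n >= 5 && f[5] != "" && (f[5] !~ /^-?[0-9]+$/ || f[5] + 0 < -1 || f[5] + 0 > 4000)) bad = 1
            if (bad) { print "bad disk descriptor: " $0; rc = 1 }
        }
        END { exit rc + 0 }
    '
}

printf '%s\n' 'hdisk2:node2:node3:dataAndMetadata:1:' 'hdisk5:::descOnly:-1:' | check_diskfile && echo "disk file looks valid"
```

A typical call would be "check_diskfile < /var/mmfs/conf/diskfile".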

After successfully creating the "disk descriptor file", the following command is used to define the NSDs:


# mmcrnsd -F /var/mmfs/conf/diskfile -v yes


GPFS assigns a Physical Volume ID (PVID) to each of the disks. This information is written to sector 2
on the AIX 5L hdisk. Since GPFS uses its own PVIDs, do not confuse them with AIX 5L PVIDs.

After a successful creation of the NSDs, you can verify your setup using the mmlsnsd command:


# mmlsnsd -aL

File system    Disk name     NSD Volume ID     Primary node         Backup node
-------------------------------------------------------------------------------
(free disk)    gpfs1nsd      099CAF2043A04625  node2                node3
(free disk)    gpfs2nsd      099CAF2043A04627  node3                node2
(free disk)    gpfs3nsd      099CAF2043A04628  (directly attached)
(free disk)    gpfs4nsd      099CAF2043A04629  (directly attached)

During NSD creation, the diskfile was rewritten. Each hdisk stanza is commented out, and an
equivalent NSD stanza is inserted.

  #<disk name>:<primary NSD server>:<2ndary NSD server>:<disk usage>:<failure group>:<NSD name>

  #Data and metadata disk for /my_gpfs, site A, DS4300_1
  #hdisk2:node2:node3:dataAndMetadata:1:
  gpfs1nsd:::dataAndMetadata:1

  #Data and metadata disk for /my_gpfs, site B, DS4300_2
  #hdisk3:node3:node2:dataAndMetadata:2:
  gpfs2nsd:::dataAndMetadata:2

  #File system descriptor disk for /my_gpfs, site C, DS4100
  #hdisk4:::descOnly:3:
  gpfs3nsd:::descOnly:3

  #Tiebreaker disk, site C, DS4100
  #hdisk5:::descOnly:-1:
  gpfs4nsd:::descOnly:-1


After issuing the mmcrnsd command, we have made the disks available and ready to create GPFS filesystems.
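
To see which NSDs are still unassigned and can be used for a new filesystem, the mmlsnsd output can be
filtered on the "(free disk)" marker. This is a sketch that assumes the output layout shown above;
free_nsds is a made-up helper name.

```shell
# Print the names of NSDs that do not yet belong to a filesystem.
# Expects "mmlsnsd -aL"-style text on stdin; free_nsds is a hypothetical helper name.
free_nsds() {
    awk '/^[ \t]*\(free disk\)/ { print $3 }'      # $1="(free", $2="disk)", $3=NSD name
}

# Sample rows copied from the listing above.
printf '%s\n' \
    '(free disk)    gpfs1nsd      099CAF2043A04625  node2                node3' \
    '(free disk)    gpfs3nsd      099CAF2043A04628  (directly attached)' | free_nsds
```

On a live cluster: "mmlsnsd -aL | free_nsds".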

-- Activating tiebreaker mode
-- --------------------------

When using a two node cluster with tiebreaker disks, the cluster configuration must be switched
to tiebreaker mode. Of course, you need to know which disks are being used as tiebreaker disks.
Up to 3 disks are allowed. In our example, gpfs4nsd (that is hdisk5) is the only tiebreaker disk.
With the following command sequence, tiebreaker mode is turned on:

# mmshutdown -a
# mmchconfig tiebreakerDisks="gpfs4nsd"
# mmstartup -a

A 2 node cluster running in tiebreaker mode can easily be identified by running the following command:

# mmgetstate -aL


  Node number Node name   Quorum    Nodes up  Total nodes GPFS state
  ---------------------------------------------------------------
  1           node2       1*        2         2           active    
  2           node3       1*        2         2           active


If the quorum information is displayed as "1*", this is a 2 node tiebreaker disk cluster.
Another nice command to check the status of the cluster is "mmlsconfig".

# mmlsconfig

  Configuration data for cluster TbrCl.node2:
  -------------------------------------------
  ClusterName TbrCl.node2
  ClusterId 8262362723390
  ClusterType lc
  Multinode yes
  autoload yes
  useDiskLease yes
  MaxFeatureLevelAllowed 809
  tiebreakerDisks gpfs4nsd
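
The "1*" marker in the quorum column of mmgetstate can also be tested in a script. The sketch below
assumes the column layout of the sample output above (quorum value in the third column);
tiebreaker_mode is a hypothetical helper name.

```shell
# Succeeds when the quorum column of "mmgetstate -aL"-style text on stdin
# carries a trailing "*", which the text above associates with tiebreaker mode.
# tiebreaker_mode is a hypothetical helper name.
tiebreaker_mode() {
    awk '$1 ~ /^[0-9]+$/ && $3 ~ /\*$/ { found = 1 }
         END { exit (found ? 0 : 1) }'
}

printf '%s\n' '  1           node2       1*        2         2           active' \
    | tiebreaker_mode && echo "cluster runs in tiebreaker mode"
```

On a live cluster: "mmgetstate -aL | tiebreaker_mode".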



-- Creating a GPFS Filesystem
-- --------------------------

GPFS generally maintains at least 3 filesystem descriptors per filesystem, which together form the disk quorum.
Ideally, the descriptors are distributed over several disks. But you might have
only 2 disks, resulting in 2 copies on one disk, and 1 copy on the other disk.
That would be an unbalanced situation. GPFS always verifies that more than 50% of the
filesystem descriptors are available, and if not, it will unmount the filesystem.

Before we can create the /my_gpfs filesystem we need to prepare a file named "fdisk_mygpfs"
describing all disks belonging to the filesystem.
In our example, we use only 2 disks for the filesystem data, but we would like to have a balanced situation
with at least 3 descriptor areas. For this, we use the descOnly disk gpfs3nsd, which was defined
as "hdisk4:::descOnly:3:" in the NSD diskfile shown before.

Our "fdisk_mygpfs" looks like this:

  #<disk name>:<primary NSD server>:<2ndary NSD server>:<disk usage>:<failure group>:<NSD name>

  #Data and metadata disk for /my_gpfs, site A, DS4300_1
  gpfs1nsd:::dataAndMetadata:1

  #Data and metadata disk for /my_gpfs, site B, DS4300_2
  gpfs2nsd:::dataAndMetadata:2

  #File system descriptor disk for /my_gpfs, site C, DS4100
  gpfs3nsd:::descOnly:3


The next step is to create the file system:

# mmcrfs /my_gpfs /dev/my_gpfs -F /var/mmfs/conf/fdisk_mygpfs -A yes -m2 -M2 -r2 -R2 -v yes


The mountpoint is /my_gpfs and a device called /dev/my_gpfs is created. The option -F is used to specify
a configuration file describing the filesystem's NSDs. We want this filesystem to be mounted automatically
during startup (-A yes). When designing our cluster, we decided to use data and metadata replication (-r2, -m2)
to provide high availability; the maximum replica counts (-R2, -M2) are set to the same value.
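
The replication options have to be consistent: a default replica count cannot be larger than the
corresponding maximum. A minimal sanity check could look like this; check_replication is a hypothetical
helper, and the assumption that only the values 1 and 2 are valid applies to the GPFS levels discussed
here and may differ in later releases.

```shell
# Succeeds when the default replica counts do not exceed their maximums
# (-m <= -M and -r <= -R) and every value is 1 or 2.
# check_replication is a hypothetical helper; arguments are: m M r R.
check_replication() {
    m=$1; M=$2; r=$3; R=$4
    for v in "$m" "$M" "$r" "$R"; do
        case $v in
            1|2) ;;
            *) return 1 ;;
        esac
    done
    [ "$m" -le "$M" ] && [ "$r" -le "$R" ]
}

check_replication 2 2 2 2 && echo "replication settings are consistent"
```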

If you intend to create several filesystems within your cluster, repeat all the steps as shown above.



-- Mounting a GPFS Filesystem
-- --------------------------

Filesystem "/my_gpfs" will be mounted on each of the cluster nodes using the command:

# dsh -a mount -t mmfs

The command dsh is the Distributed Shell, which should be available on your AIX 5.3 systems.
Your GPFS filesystem is also registered in /etc/filesystems. Standard AIX commands can also be used against
the GPFS filesystems, for example:

# dsh -w node2,node3 df -k /my_gpfs

Filesystem /my_gpfs is now available to both nodes with all three file system descriptors being well
balanced across failure groups and disks.

# mmlsdisk my_gpfs

disk            driver     sector   failure   holds    holds 
name            type       size     group     metadata data  status    availability  disk id  remarks
-----------------------------------------------------------------------------------------------------
gpfs1nsd        nsd        512      1         yes      yes   ready     up             1       desc
gpfs2nsd        nsd        512      2         yes      yes   ready     up             2       desc
gpfs3nsd        nsd        512      3         no       no    ready     up             3       desc





Notes:
------

Note 1: SDD driver

Subsystem Device Driver, SDD, is a pseudo driver designed to support multipath configuration environments
in the IBM TotalStorage Enterprise Storage Server, the IBM TotalStorage DS family, and the IBM System Storage
SAN Volume Controller.
You can see this driver installed, for example, in HACMP and GPFS systems.

At this time, SDD version 1.6.1.0 is not supported by VIOS. Of course, this might change later.

Note 2: pv listing:

In a GPFS cluster, lspv might show output like the following example:

root@zd110l13:/root# lspv
hdisk0          00cb61fe0b562af0                    rootvg          active
hdisk1          00cb61fe0fb40619                    rootvg          active
hdisk2          00cb61fe33429fa6                    vge0corddap01   active
hdisk3          00cb61fe3342a096                    vge0corddap01   active
hdisk4          00cb61fe3342a175                    gpfs3nsd
hdisk5          00cb61fe33536125                    gpfs4nsd

root@zd110l13:/root# mmlsnsd -aL

 File system   Disk name    NSD volume ID      Primary node             Backup node
---------------------------------------------------------------------------------------------
 gpfsfs0       gpfs3nsd     0A208FB64650A409   zd110l13                 zd110l14.nl.eu.abnamro.com
 gpfsfs0       gpfs4nsd     0A208FB64650A40D   zd110l13                 zd110l14.nl.eu.abnamro.com


Note 3: Other examples:

Other examples of registration of a GPFS filesystem in /etc/filesystems:

..
..
/data/documentum/dmadmin:
        dev             = /dev/gpfsfs0
        vfs             = mmfs
        nodename        = -
        mount           = mmfs
        type            = mmfs
        account         = false
        options         = rw,mtime,atime,dev=gpfsfs0
..
..


root@zd110l13:/etc# mmlsdisk /dev/gpfsfs0

disk         driver   sector failure holds    holds                            storage
name         type       size   group metadata data  status        availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
gpfs3nsd     nsd         512       1 yes      yes   ready         up           system
gpfs4nsd     nsd         512       2 yes      yes   ready         up           system






75.2 GPFS commands:
===================


75.2.1. The mmcrcluster Command:
--------------------------------

Name
mmcrcluster - Creates a GPFS cluster from a set of nodes.

Synopsis
mmcrcluster -n NodeFile -p PrimaryServer [-s SecondaryServer] [-r RemoteShellCommand] 
               [-R RemoteFileCopyCommand] [-C ClusterName] [-U DomainName] [-A] [-c ConfigFile]

Description
Use the mmcrcluster command to create a GPFS cluster.

Upon successful completion of the mmcrcluster command, the /var/mmfs/gen/mmsdrfs and the /var/mmfs/gen/mmfsNodeData 
files are created on each of the nodes in the cluster. Do not delete these files under any circumstances. 
For further information, see the General Parallel File System: Concepts, Planning, and Installation Guide.

You must follow these rules when creating your GPFS cluster:

While a node may mount file systems from multiple clusters, the node itself may only be added to a single cluster 
using the mmcrcluster or mmaddnode command. 
The nodes must be available for the command to be successful. If any of the nodes listed are not available 
when the command is issued, a message listing those nodes is displayed. You must correct the problem on each node 
and issue the mmaddnode command to add those nodes. 
You must designate at least one node as a quorum node. You are strongly advised to designate the cluster 
configuration servers as quorum nodes. How many quorum nodes altogether you will have depends on whether 
you intend to use the node quorum with tiebreaker algorithm or the regular node based quorum algorithm. 
For more details, see the General Parallel File System: Concepts, Planning, and Installation Guide and 
search for designating quorum nodes.

Parameters
-A 
Specifies that GPFS daemons are to be automatically started when nodes come up. The default is not to start 
daemons automatically. 
-C ClusterName 
Specifies a name for the cluster. If the user-provided name contains dots, it is assumed to be a fully 
qualified domain name. Otherwise, to make the cluster name unique, the domain of the primary configuration 
server will be appended to the user-provided name. 
If the -C flag is omitted, the cluster name defaults to the name of the primary GPFS cluster configuration server.

-c ConfigFile 
Specifies a file containing GPFS configuration parameters with values different than the documented defaults. 
A sample file can be found in /usr/lpp/mmfs/samples/mmfs.cfg.sample. See the mmchconfig command for a detailed 
description of the different configuration parameters. 
The -c ConfigFile parameter should only be used by experienced administrators. Use this file to only set up 
parameters that appear in the mmfs.cfg.sample file. Changes to any other values may be ignored by GPFS. 
When in doubt, use the mmchconfig command instead.

-n NodeFile 
NodeFile consists of a list of node descriptors, one per line, to be included in the GPFS cluster. 
Node descriptors are defined as: 

NodeName:NodeDesignations

where:

NodeName is the hostname or IP address to be used by GPFS for node to node communication. 
The hostname or IP address must refer to the communications adapter over which the GPFS daemons communicate. 
Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command 
to that original address. You may specify a node using any of these forms:

Format           Example
Short hostname   k145n01
Long hostname    k145n01.kgn.ibm.com
IP address       9.119.19.102

NodeDesignations is an optional, '-' separated list of node roles. 
manager | client   - Indicates whether a node is part of the pool of nodes from which configuration and 
                     file system managers are selected. The default is client. 
quorum | nonquorum - Indicates whether a node is to be counted as a quorum node. The default is nonquorum.

You must provide a descriptor for each node to be added to the GPFS cluster.

-p PrimaryServer 
Specifies the primary GPFS cluster configuration server node used to store the GPFS configuration data. 
This node must be a member of the GPFS cluster. 
-R RemoteFileCopy 
Specifies the fully-qualified path name for the remote file copy program to be used by GPFS. The default value is 
/usr/bin/rcp. 
The remote copy command must adhere to the same syntax format as the rcp command, but may implement an 
alternate authentication mechanism.

-r RemoteShellCommand 
Specifies the fully-qualified path name for the remote shell program to be used by GPFS. The default value is 
/usr/bin/rsh. 
The remote shell command must adhere to the same syntax format as the rsh command, but may implement an 
alternate authentication mechanism.

-s SecondaryServer 
Specifies the secondary GPFS cluster configuration server node used to store the GPFS cluster data. 
This node must be a member of the GPFS cluster. 
It is suggested that you specify a secondary GPFS cluster configuration server to prevent the loss of 
configuration data in the event your primary GPFS cluster configuration server goes down. When the GPFS daemon 
starts up, at least one of the two GPFS cluster configuration servers must be accessible.

If your primary GPFS cluster configuration server fails and you have not designated a secondary server, 
the GPFS cluster configuration files are inaccessible, and any GPFS administrative commands that are issued fail. 
File system mounts or daemon startups also fail if no GPFS cluster configuration server is available.

-U DomainName 
Specifies the UID domain name for the cluster. 
A detailed description of the GPFS user ID remapping convention is contained in UID Mapping for GPFS In a 
Multi-Cluster Environment at www.ibm.com/servers/eserver/clusters/library/wp_aix_lit.html.

Exit status

0 
Successful completion. 
1 
A failure has occurred. 

Security
You must have root authority to run the mmcrcluster command.

You may issue the mmcrcluster command from any node in the GPFS cluster.

A properly configured .rhosts file must exist in the root user's home directory on each node in the GPFS cluster. 
If you have designated the use of a different remote communication program on either the mmcrcluster or the 
mmchcluster command, you must ensure:

Proper authorization is granted to all nodes in the GPFS cluster. 
The nodes in the GPFS cluster can communicate without the use of a password, and without any extraneous messages.


Example 1:
----------

To create a GPFS cluster made of all of the nodes listed in the file /u/admin/nodelist, using node k164n05 
as the primary server, and node k164n04 as the secondary server, issue:

# mmcrcluster  -n /u/admin/nodelist -p k164n05 -s k164n04

where /u/admin/nodelist has these contents:

k164n04.kgn.ibm.com:quorum
k164n05.kgn.ibm.com:quorum
k164n06.kgn.ibm.com

The output of the command is similar to:

Mon Aug  9 22:14:34 EDT 2004: 6027-1664 mmcrcluster: Processing node
                              k164n04.kgn.ibm.com
Mon Aug  9 22:14:38 EDT 2004: 6027-1664 mmcrcluster: Processing node 
                              k164n05.kgn.ibm.com
Mon Aug  9 22:14:42 EDT 2004: 6027-1664 mmcrcluster: Processing node 
                              k164n06.kgn.ibm.com
mmcrcluster: Command successfully completed
mmcrcluster: 6027-1371 Propagating the changes to all affected
                       nodes. This is an asynchronous process.

To confirm the creation, enter: 

# mmlscluster

The system displays information similar to:

GPFS cluster information
========================
  GPFS cluster name:         k164n05.kgn.ibm.com
  GPFS cluster id:           680681562214606028
  GPFS UID domain:           k164n05.kgn.ibm.com
  Remote shell command:      /usr/bin/rsh
  Remote file copy command:  /usr/bin/rcp

GPFS cluster configuration servers:
-------------------------------------
  Primary server:    k164n05.kgn.ibm.com
  Secondary server:  k164n04.kgn.ibm.com

 Node number  Node name  IP address      Full node name       Remarks

--------------------------------------------------------------------------
       1      k164n04    198.117.68.68   k164n04.kgn.ibm.com  quorum node
       2      k164n05    198.117.68.69   k164n05.kgn.ibm.com  quorum node
       3      k164n06    198.117.68.70   k164n06.kgn.ibm.com  


Example 2:
----------

# mmcrcluster  -n /home/root/nodelist -p zcnodeb -s n5nodea -r /usr/bin/rsh 
  -R /usr/bin/rcp -C MDLPR -A

Where the -C option determines the clustername.

You can start the cluster (GPFS daemon) by using

# mmstartup -a

Check if all nodes are registered in the cluster

# mmlscluster




75.2.2 Other GPFS commands:
---------------------------

The most common GPFS commands are illustrated here by examples.


-- List cluster info: mmlscluster
-- ------------------------------

# mmlscluster

The system displays information similar to:

GPFS cluster information
========================
  GPFS cluster name:         k164n05.kgn.ibm.com
  GPFS cluster id:           680681562214606028
  GPFS UID domain:           k164n05.kgn.ibm.com
  Remote shell command:      /usr/bin/rsh
  Remote file copy command:  /usr/bin/rcp

GPFS cluster configuration servers:
-------------------------------------
  Primary server:    k164n05.kgn.ibm.com
  Secondary server:  k164n04.kgn.ibm.com

 Node number  Node name  IP address      Full node name       Remarks

--------------------------------------------------------------------------
       1      k164n04    198.117.68.68   k164n04.kgn.ibm.com  quorum node
       2      k164n05    198.117.68.69   k164n05.kgn.ibm.com  quorum node
       3      k164n06    198.117.68.70   k164n06.kgn.ibm.com  



-- Retrieving the Cluster status:
-- ------------------------------

# mmgetstate -aL

  Node number Node name Quorum    Nodes up  Total nodes GPFS state
  -------------------------------------------------------------
  1           node2     2         2         2           active    
  2           node3     2         2         2           active


-- Retrieving config data of the Cluster:
-- --------------------------------------

# mmlsconfig

  Configuration data for cluster TbrCl.node2:
  -------------------------------------------
  ClusterName TbrCl.node2
  ClusterId 8262362723390
  ClusterType lc
  Multinode yes
  autoload yes
  useDiskLease yes
  MaxFeatureLevelAllowed 809
  tiebreakerDisks gpfs4nsd


root@zd110l13:/root#mmlsconfig
Configuration data for cluster cluster_name.zd110l13:
-----------------------------------------------------
clusterName cluster_name.zd110l13
clusterId 729741152660153204
clusterType lc
autoload no
useDiskLease yes
maxFeatureLevelAllowed 912
tiebreakerDisks gpfs3nsd;gpfs4nsd
[zd110l13]
takeOverSdrServ yes

File systems in cluster cluster_name.zd110l13:
----------------------------------------------
/dev/gpfsfs0


root@zd110l13:/var/adm/ras#df -k | grep /dev/gpfsfs0
/dev/gpfsfs0   2097152000 2009668608    5%   101193     5% /data/documentum/dmadmin



-- Change the status of a disk, and listing status: mmchdisk and mmlsdisk
-- ----------------------------------------------------------------------

You can even simulate the loss of an NSD disk from a cluster, for example:

# mmchdisk my_gpfs stop -d "gpfs1nsd"
# mmlsdisk my_gpfs -L

disk            driver     sector   failure   holds    holds 
name            type       size     group     metadata data  status    availability  disk id  remarks
-----------------------------------------------------------------------------------------------------
gpfs1nsd        nsd        512      1         yes      yes   ready     down           1       desc
gpfs2nsd        nsd        512      2         yes      yes   ready     up             2       desc
gpfs3nsd        nsd        512      3         no       no    ready     up             3       desc

We have used the example of the 2 node cluster of section 74.1 here. Since the quorum is still met,
even with one disk "down", the service is still working.
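
Whether the descriptor quorum still holds can be estimated from the mmlsdisk listing by counting the
disks marked "desc" and checking that more than half of them are "up". This is a sketch that assumes the
column layout of the sample above (availability in the eighth column, "desc" as the last field);
desc_quorum_ok is a hypothetical helper name.

```shell
# Succeeds when more than half of the disks marked "desc" are "up".
# Expects "mmlsdisk <fs> -L"-style text on stdin, laid out like the sample above.
# desc_quorum_ok is a hypothetical helper name.
desc_quorum_ok() {
    awk '$2 == "nsd" && $NF == "desc" { total++; if ($8 == "up") up++ }
         END { exit ((total > 0 && 2 * up > total) ? 0 : 1) }'
}

# Sample rows copied from the listing above: one of three descriptor disks is down.
printf '%s\n' \
    'gpfs1nsd        nsd        512      1         yes      yes   ready     down           1       desc' \
    'gpfs2nsd        nsd        512      2         yes      yes   ready     up             2       desc' \
    'gpfs3nsd        nsd        512      3         no       no    ready     up             3       desc' \
    | desc_quorum_ok && echo "descriptor quorum met"
```

On a live cluster: "mmlsdisk my_gpfs -L | desc_quorum_ok".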


-- Change GPFS cluster configuration data: mmchcluster
-- ----------------------------------------------------

The mmchcluster command serves different purposes:

- Change the primary or secondary GPFS cluster data server.
- Synchronize the primary GPFS cluster data server.
- Change the remote shell and remote file copy programs to be used by the nodes in the cluster.

To change the primary GPFS server for the cluster, enter:

# mmchcluster -p k145n03

 
-- Change the attributes of a GPFS file system: mmchfs
-- ---------------------------------------------------

Use the mmchfs command to change the attributes of a GPFS file system.

With the mmchfs command you can, for example, change the maximum number of inodes of a GPFS filesystem:


# mmchfs gpfsfs0 -F 856064:856064

Now list the properties of the gpfsfs0 filesystem:


# mmdf /dev/gpfsfs0
disk                disk size  failure holds    holds              free KB             free KB
name                    in KB    group metadata data        in full blocks        in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system
gpfs3nsd              7340032        1 yes      yes         5867008 ( 80%)        434992 ( 6%)
gpfs1nsd            314572800        1 yes      yes       268067328 ( 85%)      17170032 ( 5%)
gpfs2nsd            115343360        1 no       no                0 (  0%)             0 ( 0%)
                -------------                         -------------------- -------------------
(pool total)        437256192                             273934336 ( 63%)      17605024 ( 4%)

                =============                         ==================== ===================
(total)             437256192                             273934336 ( 63%)      17605024 ( 4%)

Inode Information
-----------------
Number of used inodes:          177011
Number of free inodes:          679053
Number of allocated inodes:     856064
Maximum number of inodes:       856064
                               2048006
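
To decide whether the inode limit needs to be raised, the inode section of the mmdf output can be turned
into a usage percentage. The sketch below assumes the "Number of used inodes" and "Maximum number of
inodes" labels shown above; inode_usage_pct is a made-up helper name.

```shell
# Print used inodes as a whole percentage of the maximum number of inodes.
# Expects "mmdf"-style text on stdin; inode_usage_pct is a hypothetical helper name.
inode_usage_pct() {
    awk -F: '
        /Number of used inodes/    { used = $2 + 0 }
        /Maximum number of inodes/ { max  = $2 + 0 }
        END { if (max > 0) printf "%d\n", used * 100 / max; else exit 1 }
    '
}

# Values copied from the mmdf output above.
printf '%s\n' 'Number of used inodes:          177011' \
              'Maximum number of inodes:       856064' | inode_usage_pct    # prints 20
```

On a live cluster: "mmdf /dev/gpfsfs0 | inode_usage_pct".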


# mmdf /dev/gpfsfs0
# mmchfs gpfsfs0 -F 2457612:2457612
                    1228806


mmchfs command syntax:

mmchfs Device [-A {yes | no | automount}] [-E {yes | no}] [-D {nfs4 | posix}] 
              [-F MaxNumInodes[:NumInodesToPreallocate]] [-k {posix | nfs4 | all}] 
              [-K {no | whenpossible | always}] [-m DefaultMetadataReplicas] 
              [-o MountOptions] [-Q {yes | no}]
              [-r DefaultDataReplicas] [-S {yes | no} ] [-T Mountpoint]
              [-V] [-z {yes | no}]



To change the default replicas for metadata to 2 and the default replicas for data to 2 for new files 
created in the fs0 file system, enter:

# mmchfs fs0 -m 2 -r 2

To confirm the change, enter:

# mmlsfs fs0 -m -r

The system displays information similar to:

flag value          description
---- -------------- -----------------------------------
 -m  2              Default number of metadata replicas
 -r  2              Default number of data replicas





More examples:


-- Add a node to the cluster
-- -------------------------

Use the mmaddnode command to add nodes to an existing GPFS cluster. On each new node a mount point directory
and a character mode device are created for each GPFS filesystem.

Example:
To add the nodes "k164n06" and "k164n07" as quorum nodes, designating "k164n06" to be available as
manager node, use the following command:

# mmaddnode -N k164n06:quorum-manager,k164n07:quorum


-- Mounting and unmounting GPFS file systems
-- -----------------------------------------

Use the mmmount and mmumount commands to mount or unmount GPFS filesystems on one or more nodes in the cluster.

Examples:

- To mount all GPFS filesystems on all of the nodes in the cluster:

# mmmount all -a

- To mount filesystem "fs2" read-only on the local node, use

# mmmount fs2 -o ro

- To mount fs1 on all NSD server nodes, use

# mmmount fs1 -N nsdnodes  

- To unmount fs1 on all nodes of the cluster, use

# mmumount fs1 -a



-- Creates cluster-wide names for Network Shared Disks (NSDs) used by GPFS
-- -----------------------------------------------------------------------

mmcrnsd -F DescFile [-v {yes |no}]

The mmcrnsd command is used to create cluster-wide names for NSDs used by GPFS.

This is the first GPFS step in preparing a disk for use by a GPFS file system. A disk descriptor file supplied 
to this command is rewritten with the new NSD names and that rewritten disk descriptor file can then be supplied 
as input to the mmcrfs command.

The name created by the mmcrnsd command is necessary since disks connected at multiple nodes may have differing 
disk device names in /dev on each node. The name uniquely identifies the disk. This command must be run 
for all disks that are to be used in GPFS file systems. The mmcrnsd command is also used to assign a 
primary and backup NSD server that can be used for I/O operations on behalf of nodes that do not have 
direct access to the disk.

To identify that the disk has been processed by the mmcrnsd command, a unique NSD volume ID is written on 
sector 2 of the disk. All of the NSD commands (mmcrnsd, mmlsnsd, and mmdelnsd) use this unique 
NSD volume ID to identify and process NSDs.

After the NSDs are created, the GPFS cluster data is updated and they are available for use by GPFS.

Examples:

To create your NSDs from the descriptor file nsdesc containing: 

 sdav1:k145n05:k145n06:dataOnly:4
 sdav2:k145n04::dataAndMetadata:5:ABC

enter:

# mmcrnsd -F nsdesc 



-- GPFS and inittab
-- ----------------

Usually, the following entry should be in place in /etc/inittab:

mmfs:2:once:/usr/lpp/mmfs/bin/mmautoload >/dev/console 2>&1
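
Whether this entry is present can be checked with a simple grep. This is only a sketch:
has_mmfs_inittab is a hypothetical helper, and the pattern assumes the entry uses the "mmfs" identifier
and calls mmautoload as shown above.

```shell
# Succeeds when the given inittab-style file has an "mmfs" entry that calls mmautoload.
# has_mmfs_inittab is a hypothetical helper name.
has_mmfs_inittab() {
    grep -q '^mmfs:.*mmautoload' "$1"
}
```

A typical call would be "has_mmfs_inittab /etc/inittab || echo 'mmfs entry missing'".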




75.3 Installing GPFS:
=====================

Installing GPFS V2.3 or V3.1


Installing GPFS on AIX 5L nodes
It is suggested you read Planning for GPFS and the GPFS FAQs at 
publib.boulder.ibm.com/infocenter/clresctr/topic/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html.

Do not attempt to install GPFS if you do not have the prerequisites listed in Hardware requirements 
and Software requirements.

Ensure that the PATH environment variable on each node includes /usr/lpp/mmfs/bin.

The installation process includes:

-Files to ease the installation process 
-Verifying the level of prerequisite software 
-Installation procedures

>> Files to ease the installation process

Creating a file that contains all of the nodes in your GPFS cluster prior to the installation of GPFS
will be useful during the installation process. Using either host names or IP addresses when constructing 
the file will allow you to use this information when creating your cluster through the mmcrcluster command.

For example, create the file /tmp/gpfs.allnodes, listing the nodes one per line: 

k145n01.dpd.ibm.com 
k145n02.dpd.ibm.com 
k145n03.dpd.ibm.com 
k145n04.dpd.ibm.com 
k145n05.dpd.ibm.com 
k145n06.dpd.ibm.com 
k145n07.dpd.ibm.com 
k145n08.dpd.ibm.com 
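
When node names follow a numbered pattern, as in this example, the list can be generated instead of typed. A small
sketch; gen_nodes is a hypothetical helper and the name pattern is simply the one from the example above.

```shell
# gen_nodes COUNT: print k145n01 .. k145n<COUNT> as fully qualified names, one per line.
gen_nodes() {
    i=1
    while [ "$i" -le "$1" ]; do
        printf 'k145n%02d.dpd.ibm.com\n' "$i"
        i=$((i + 1))
    done
}

gen_nodes 8     # when the list looks right: gen_nodes 8 > /tmp/gpfs.allnodes
```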


>> Verifying the level of prerequisite software

It is necessary to verify you have the correct levels of the prerequisite software installed. If the correct level 
of prerequisite software is not installed, see the appropriate installation manual before proceeding with your 
GPFS installation: 

1. AIX 5L Version 5 Release 2 with the latest level of service available 

   # WCOLL=/tmp/gpfs.allnodes dsh "oslevel"

   Output similar to this should be displayed: 
   5.2.0.10

2. AIX 5L Version 5 Release 3 with the latest level of service available 

   # WCOLL=/tmp/gpfs.allnodes dsh "oslevel"

   Output similar to this should be displayed: 
   5.3.0.0
   If you are utilizing NFS V4, at a minimum your output should include: 
   5.3.0.10
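
Comparing oslevel strings by eye is error-prone (5.3.0.10 is higher than 5.3.0.9). The sketch below compares two
dotted levels field by field using only POSIX shell; version_ge is an invented helper name, and the levels in the
demo loop are the ones quoted above.

```shell
# version_ge HAVE NEED: succeed if dotted version HAVE is >= NEED.
# Missing trailing fields count as 0 (5.3 is treated as 5.3.0.0).
version_ge() {
    have=$1
    need=$2
    while [ -n "$have" ] || [ -n "$need" ]; do
        h=${have%%.*}
        n=${need%%.*}
        # Remove the field just read (and its dot) from each string.
        case $have in *.*) have=${have#*.} ;; *) have= ;; esac
        case $need in *.*) need=${need#*.} ;; *) need= ;; esac
        h=${h:-0}
        n=${n:-0}
        if [ "$h" -gt "$n" ]; then return 0; fi
        if [ "$h" -lt "$n" ]; then return 1; fi
    done
    return 0
}

for level in 5.3.0.0 5.3.0.10; do
    if version_ge "$level" 5.3.0.10; then
        echo "$level: meets the NFS V4 minimum"
    else
        echo "$level: below 5.3.0.10"
    fi
done
```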


>>Installation procedures

The installation procedures are generalized for all levels of GPFS. Ensure you substitute the correct 
numeric value for the modification (m) and fix (f) levels, where applicable. The modification and fix 
level are dependent upon the level of PTF support.

Follow these steps to install the GPFS software using the installp command:

1. Electronic license agreement 
2. Creating the GPFS directory 
3. Creating the GPFS installation table of contents file 
4. Installing the GPFS man pages 
5. Installing GPFS on your network 
6. Existing GPFS files 
7. Verifying the GPFS installation


--1. Electronic license agreement

The GPFS software license agreement is shipped and viewable electronically. The electronic license agreement 
must be accepted before software installation can continue.

For additional software package installations, the installation cannot occur unless the appropriate 
license agreements are accepted. When using the installp command, use the -Y flag to accept licenses 
and the -E flag to view license agreement files on the media.

--2. Creating the GPFS directory

To create the GPFS directory:

On any node create a temporary subdirectory where GPFS installation images will be extracted. For example: 

# mkdir  /tmp/gpfslpp

Copy the installation images from the CD-ROM to the new directory, by issuing: 

# bffcreate -qvX -t /tmp/gpfslpp -d /dev/cd0 all

This will place the following GPFS images in the image directory:

gpfs.base 
gpfs.docs 
gpfs.msg.en_US


--3. Creating the GPFS installation table of contents file

Make the new image directory the current directory: 

# cd /tmp/gpfslpp

Use the inutoc command to create a .toc file. The .toc file is used by the installp command. 

# inutoc .

--4. Installing the GPFS man pages

In order to use the GPFS man pages you must install the gpfs.docs image. The GPFS manual pages will be 
located at /usr/share/man/.

Installation consideration:
The gpfs.docs image need not be installed on all nodes if man pages are not desired or local file system space 
on the node is minimal.

--5. Installing GPFS on your network

Install GPFS according to these directions, where localNode is the name of the node on which you are running:

If you are installing on a shared file system network, ensure the directory where the GPFS images can be found 
is NFS exported to all of the nodes planned for your GPFS cluster (/tmp/gpfs.allnodes). 

Ensure an acceptable directory or mountpoint is available on each target node, such as /tmp/gpfslpp. 
If there is not, create one: 

# WCOLL=/tmp/gpfs.allnodes dsh "mkdir /tmp/gpfslpp"

If you are installing on a shared file system network, to place the GPFS images on each node in your network, 
issue: 

# WCOLL=/tmp/gpfs.allnodes dsh "mount localNode:/tmp/gpfslpp /tmp/gpfslpp"

Otherwise, issue: 

# WCOLL=/tmp/gpfs.allnodes dsh "rcp localNode:/tmp/gpfslpp/gpfs* /tmp/gpfslpp"
# WCOLL=/tmp/gpfs.allnodes dsh "rcp localNode:/tmp/gpfslpp/.toc /tmp/gpfslpp"

Install GPFS on each node: 

# WCOLL=/tmp/gpfs.allnodes dsh "installp -agXYd /tmp/gpfslpp gpfs" 
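
The non-shared (rcp) variant of this step can be collected into one script. The sketch below is a dry run by
default: it only prints the dsh commands shown above. The helper names run and install_gpfs and the DRYRUN
variable are assumptions of this note, not part of GPFS or dsh.

```shell
NODES=/tmp/gpfs.allnodes   # node list file from the earlier step
IMAGES=/tmp/gpfslpp        # image directory used in this section

# run: hypothetical helper. Prints the command; executes it only if DRYRUN=0.
run() {
    echo "+ $*"
    if [ "${DRYRUN:-1}" = 0 ]; then
        "$@"
    fi
}

# install_gpfs LOCALNODE: copy the images from LOCALNODE and install on all nodes.
# Stops at the first command that fails.
install_gpfs() {
    run env WCOLL="$NODES" dsh "mkdir $IMAGES"                || return 1
    run env WCOLL="$NODES" dsh "rcp $1:$IMAGES/gpfs* $IMAGES" || return 1
    run env WCOLL="$NODES" dsh "rcp $1:$IMAGES/.toc $IMAGES"  || return 1
    run env WCOLL="$NODES" dsh "installp -agXYd $IMAGES gpfs" || return 1
}

install_gpfs localNode     # dry run: prints the four commands, runs nothing
```

Review the printed commands first, then rerun with DRYRUN=0 and the real node name in place of localNode.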

--6. Existing GPFS files

If you have previously installed GPFS on your system, during the install process you may see 
messages similar to:

Some configuration files could not be automatically merged into the
system during the installation.  The previous versions of these files
have been saved in a configuration directory as listed below.  Compare
the saved files and the newly installed files to determine if you need
to recover configuration data.  Consult product documentation to
determine how to merge the data.

Configuration files which were saved in /lpp/save.config:
  /var/mmfs/etc/gpfsready
  /var/mmfs/etc/gpfsrecover.src
  /var/mmfs/etc/mmfsdown.scr
  /var/mmfs/etc/mmfsup.scr

If you have made changes to any of these files, you will have to reconcile the differences with the 
new versions of the files in directory /var/mmfs/etc. This does not apply to file /var/mmfs/etc/mmfs.cfg 
which is automatically maintained by GPFS.

--7. Verifying the GPFS installation

Use the lslpp command to verify the installation of GPFS file sets on each node:

# lslpp -l gpfs\*

Output similar to the following should be returned:

  Fileset                      Level  State      Description
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  gpfs.base                  2.3.0.0  COMMITTED  GPFS File Manager
  gpfs.docs.data             2.3.0.0  COMMITTED  GPFS Server Manpages
  gpfs.msg.en_US             2.3.0.0  COMMITTED  GPFS Server Messages - U.S. English

Path: /etc/objrepos
  gpfs.base                  2.3.0.0  COMMITTED  GPFS File Manager


Example:

root@zd110l14:/root#lslpp -L "*gpfs*"
  Fileset                      Level  State  Type  Description (Uninstaller)
  ----------------------------------------------------------------------------
  gpfs.base                 3.1.0.11    C     F    GPFS File Manager
  gpfs.docs.data             3.1.0.4    C     F    GPFS Server Manpages and
                                                   Documentation
  gpfs.msg.en_US            3.1.0.10    C     F    GPFS Server Messages - U.S.
                                                   English


State codes:
 A -- Applied.
 B -- Broken.
 C -- Committed.
 E -- EFIX Locked.
 O -- Obsolete.  (partially migrated to newer version)
 ? -- Inconsistent State...Run lppchk -v.

Type codes:
 F -- Installp Fileset
 P -- Product
 C -- Component
 T -- Feature
 R -- RPM Package
 E -- Interim Fix


root@zd110l14:/root#lslpp -l gpfs\*
  Fileset                      Level  State      Description
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  gpfs.base                 3.1.0.11  COMMITTED  GPFS File Manager
  gpfs.msg.en_US            3.1.0.10  COMMITTED  GPFS Server Messages - U.S.
                                                 English

Path: /etc/objrepos
  gpfs.base                 3.1.0.11  COMMITTED  GPFS File Manager

Path: /usr/share/lib/objrepos
  gpfs.docs.data             3.1.0.4  COMMITTED  GPFS Server Manpages and
                                                 Documentation
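
To check many nodes, it helps to print only the filesets that are not yet committed. A sketch that filters
"lslpp -l" output; uncommitted_gpfs is a made-up name, and the APPLIED line in the sample is invented to show a hit.

```shell
# uncommitted_gpfs: read "lslpp -l" output on stdin and print
# "fileset state" for every gpfs.* fileset whose state is not COMMITTED.
uncommitted_gpfs() {
    awk '$1 ~ /^gpfs\./ && $3 != "COMMITTED" { print $1, $3 }'
}

# Sample input in the format shown above (APPLIED state is invented for the demo).
uncommitted_gpfs <<'EOF'
Path: /usr/lib/objrepos
  gpfs.base                 3.1.0.11  COMMITTED  GPFS File Manager
  gpfs.msg.en_US            3.1.0.10  APPLIED    GPFS Server Messages - U.S.
                                                 English
EOF
```

On a real node: lslpp -l gpfs\* | uncommitted_gpfs
No output means every listed GPFS fileset is committed.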



75.4 GPFS error messages:
=========================


Note 1:
-------

The MMFS log
GPFS writes both operational messages and error data to the MMFS log file. The MMFS log can be found 
in the /var/adm/ras directory on each node. The MMFS log file is named mmfs.log.date.nodeName, where date 
is the time stamp when the instance of GPFS started on the node and nodeName is the name of the node. 
The latest mmfs log file can be found by using the symbolic file name /var/adm/ras/mmfs.log.latest. 
The MMFS log from the previous instance of GPFS can be found by using the symbolic file name 
/var/adm/ras/mmfs.log.previous. All other files have a timestamp and node name appended to the file name.

Example:

root@zd110l13:/var/adm/ras#cat mmfs.log.latest
Sun May 20 22:10:37 DFT 2007 runmmfs starting
Removing old /var/adm/ras/mmfs.log.* files:
Loading kernel extension from /usr/lpp/mmfs/bin . . .
GPFS: 6027-500 /usr/lpp/mmfs/bin/aix64/mmfs64 loaded and configured.
Sun May 20 22:10:39 2007: GPFS: 6027-310 mmfsd64 initializing. {Version: 3.1.0.11   Built: Apr  6 2007 09:38:56} ...
Sun May 20 22:10:44 2007: GPFS: 6027-1710 Connecting to 10.32.143.184 zd110l14.nl.eu.abnamro.com
Sun May 20 22:10:44 2007: GPFS: 6027-1711 Connected to 10.32.143.184 zd110l14.nl.eu.abnamro.com
Sun May 20 22:10:44 2007: GPFS: 6027-300 mmfsd ready
Sun May 20 22:10:44 DFT 2007: mmcommon mmfsup invoked
Sun May 20 22:10:44 DFT 2007: mounting /dev/gpfsfs0
Sun May 20 22:10:44 2007: Command: mount gpfsfs0 323816
Sun May 20 22:10:46 2007: Command: err 0: mount gpfsfs0 323816
Sun May 20 22:10:46 DFT 2007: finished mounting /dev/gpfsfs0



At GPFS startup, MMFS log files that have not been accessed during the last ten days are deleted. 
If you want to keep old log files, copy them elsewhere.

This example shows normal operational messages that appear in the MMFS log file:

Tue Aug 31 16:02:43 edt 2004 runmmfs starting
Removing old /var/adm/ras/mmfs.log.* files:
mv: 0653-401 Cannot rename /var/adm/ras/mmfs.log.previous to /var/adm/ras/mmfs.log.previous.save:
             A file or directory in the path name does not exist.
Loading kernel extension from /usr/lpp/mmfs/bin . . .
/usr/lpp/mmfs/bin/vcmdummy64 loaded and configured
/usr/lpp/mmfs/bin/aix64/mmfs64 loaded and configured.
Tue Aug 31 16:02:44 2004: GPFS: 6027-310 mmfsd64 initializing. {Version: 3.7.0.0 
    Built: Aug 30 2004 17:10:20} ...
Tue Aug 31 16:02:54 2004: GPFS: 6027-1710 Connecting to 198.16.0.9 k154gn09
Tue Aug 31 16:02:55 2004: GPFS: 6027-1711 Connected to 198.16.0.9 k154gn09
Tue Aug 31 16:02:55 2004: GPFS: 6027-1709 Accepted and connected to 198.16.0.2 k154gn02
Tue Aug 31 16:02:55 2004: GPFS: 6027-1709 Accepted and connected to 198.16.0.18 k155gn02
Tue Aug 31 16:02:55 2004: GPFS: 6027-1709 Accepted and connected to 198.16.0.49 kolt1g_r1b32
Tue Aug 31 16:02:55 2004: GPFS: 6027-1709 Accepted and connected to 198.16.0.17 k155gn01
Tue Aug 31 16:02:55 2004: GPFS: 6027-1710 Connecting to 198.16.0.10 k154gn10
Tue Aug 31 16:02:55 2004: GPFS: 6027-1709 Accepted and connected to 198.16.0.35
Tue Aug 31 16:02:55 2004: GPFS: 6027-1709 Accepted and connected to 198.16.0.5
Tue Aug 31 16:02:57 2004: GPFS: 6027-1709 Accepted and connected to 198.16.0.23
Tue Aug 31 16:02:57 2004: GPFS: 6027-1709 Accepted and connected to 198.16.0.6
Tue Aug 31 16:02:57 2004: GPFS: 6027-1709 Accepted and connected to 198.16.0.21
Tue Aug 31 16:03:00 edt 2004 /var/mmfs/etc/gpfsready invoked
Tue Aug 31 16:03:00 2004: GPFS: 6027-300 mmfsd ready
Tue Aug 31 16:03:00 2004: GPFS: 6027-1709 Accepted and connected to 198.16.0.10 k154gn10
Tue Aug 31 16:03:00 edt 2004: mounting /dev/fs3
Tue Aug 31 16:03:00 2004: Command: mount fs3 594128 

The time needed to start GPFS depends on the size and complexity of your system configuration. 
If you still cannot access the file system after a reasonable amount of time for your configuration, 
look in the log file for error messages.

The GPFS log is a repository of error conditions that have been detected on each node, as well as 
operational events such as file system mounts. The GPFS log is the first place to look when attempting 
to debug abnormal events. Since GPFS is a cluster file system, events that occur on one node may affect 
system behavior on other nodes, and all GPFS logs may have relevant data.
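
Since every GPFS message in the log carries a 6027-nnn number, a log can be summarized quickly. The sketch below
counts message numbers and tests for 6027-300 (mmfsd ready). Both function names are made up, and the sample lines
are shortened copies of the first log example above.

```shell
# gpfs_msg_counts: read an MMFS log on stdin, print "message-number count" per line.
gpfs_msg_counts() {
    sed -n 's/.*GPFS: \(6027-[0-9][0-9]*\).*/\1/p' | sort | uniq -c | awk '{ print $2, $1 }'
}

# gpfs_is_ready: succeed if the log on stdin contains message 6027-300 (mmfsd ready).
gpfs_is_ready() {
    grep -q 'GPFS: 6027-300 '
}

sample='Sun May 20 22:10:39 2007: GPFS: 6027-310 mmfsd64 initializing.
Sun May 20 22:10:44 2007: GPFS: 6027-1710 Connecting to 10.32.143.184
Sun May 20 22:10:44 2007: GPFS: 6027-1711 Connected to 10.32.143.184
Sun May 20 22:10:44 2007: GPFS: 6027-300 mmfsd ready'

printf '%s\n' "$sample" | gpfs_msg_counts
printf '%s\n' "$sample" | gpfs_is_ready && echo "mmfsd reported ready"
```

On a node: gpfs_msg_counts < /var/adm/ras/mmfs.log.latest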



Note 2:
-------

GPFS for AIX 5L V2.2 in an HACMP Cluster
Problem Determination Guide

The operating system error log facility
GPFS records file system or disk failures using the error logging facility provided by the 
operating system: syslog facility on Linux and errpt facility on AIX. For the remainder of this book, 
the error logging facility will be referred to as 'the error log'.

These failures can be viewed by issuing this command: 

errpt -a

The error log contains information about several classes of events or errors. These classes are:

MMFS_ABNORMAL_SHUTDOWN 
MMFS_DISKFAIL 
MMFS_ENVIRON 
MMFS_FSSTRUCT 
MMFS_GENERIC 
MMFS_LONGDISKIO 
MMFS_PHOENIX 
MMFS_QUOTA 
MMFS_SYSTEM_UNMOUNT 
MMFS_SYSTEM_WARNING

MMFS_ABNORMAL_SHUTDOWN
The MMFS_ABNORMAL_SHUTDOWN error log entry means that GPFS has determined that it must shut down all operations 
on this node because of a problem. This is most likely caused by some interaction with the Group Services component. 
Group services failures may result in abnormal shutdown, as well as possible loss of quorum. 
Insufficient memory on the node to handle critical recovery situations can also cause this error. 
In general there will be other error log entries from GPFS or some other component associated with this error log entry.

MMFS_DISKFAIL
The MMFS_DISKFAIL error log entry indicates that GPFS has detected the failure of a disk and forced the disk 
to the stopped state. The section "Unable to access disks" of the guide describes the actions taken in response 
to this error. This is ordinarily not a GPFS error but a failure in the disk subsystem or the path to the disk subsystem. 
See the book AIX 5L System Management Guide: Operating System and Devices and search on logical volume. 
Follow the problem determination and repair actions specified.

MMFS_ENVIRON
MMFS_ENVIRON error log entry records are associated with other records of the MMFS_GENERIC or MMFS_SYSTEM_UNMOUNT types. 
They indicate that the root cause of the error is external to GPFS and usually in the network that supports GPFS. 
Check the network and its physical connections. The data portion of this record supplies the return code provided 
by the communications code.

MMFS_FSSTRUCT
The MMFS_FSSTRUCT error log entry indicates that GPFS has detected a problem with the on-disk structure of 
the file system. The severity of these errors depends on the exact nature of the inconsistent data structure. 
If it is limited to a single file, EIO errors will be reported to the application and operation will continue. 
If the inconsistency affects vital metadata structures, operation will cease on this file system. 
These errors are often associated with an MMFS_SYSTEM_UNMOUNT error log entry and will probably occur on all nodes. 
If the error occurs on all nodes, some critical piece of the file system is inconsistent. This may occur as a 
result of a GPFS error or an error in the disk system. Issuing the mmfsck command may repair the error:

1. Issue the mmfsck -n command to collect data. 
2. Issue the mmfsck -y command off-line to repair the file system.

If the file system is not repaired after issuing the mmfsck command, contact the IBM Support Center.

MMFS_GENERIC
The MMFS_GENERIC error log entry means that GPFS self diagnostics have detected an internal error, or that 
additional information is being provided with an MMFS_SYSTEM_UNMOUNT report. If the record is associated with an 
MMFS_SYSTEM_UNMOUNT report, the event code fields in the records will be the same. The error code and return code 
fields may describe the error. See Messages for a listing of codes generated by GPFS.

If the error is generated by the self diagnostic routines, service personnel should interpret the return and error 
code fields since the use of these fields varies by the specific error. Errors caused by the self checking logic 
will result in the shutdown of GPFS on this node.

MMFS_GENERIC errors may result from an inability to reach a critical disk resource. These errors may look different 
depending on the specific disk resource that has become unavailable, like logs and allocation maps. 
This type of error will usually be associated with other error indications. Other errors generated by disk subsystems, 
high availability components, and communications components at the same time as, or immediately preceding, 
the GPFS error should be pursued first because they may be the cause of these errors. MMFS_GENERIC error indications 
without an associated error of those types represent a GPFS problem that requires the IBM Support Center. 
See Information to collect before contacting the IBM Support Center.

MMFS_LONGDISKIO
The MMFS_LONGDISKIO error log entry indicates that GPFS is experiencing very long response time for disk requests. 
This is a warning message and may indicate that your disk system is overloaded or that a failing disk is requiring 
many I/O retries. Follow your operating system's instructions for monitoring the performance of your I/O subsystem 
on this node. The data portion of this error record specifies the disk involved. 
There may be related error log entries from the disk subsystems that will pinpoint the actual cause of the problem. 
See the book AIX 5L Performance Management Guide.

MMFS_PHOENIX
MMFS_PHOENIX error log entries reflect a failure in GPFS interaction with Group Services. Go to the book 
Reliable Scalable Cluster Technology: Administration Guide. Search for diagnosing group services problems. 
Follow the problem determination and repair action specified. These errors are usually not GPFS problems, 
although they will disrupt GPFS operation.

MMFS_QUOTA
The MMFS_QUOTA error log entry is used when GPFS detects a problem in the handling of quota information. 
This entry is created when the quota manager has a problem reading or writing the quota file. If the quota manager 
cannot read all entries in the quota file when mounting a file system with quotas enabled, the quota manager 
shuts down, but file system manager initialization continues. Client mounts will not succeed and will return 
an appropriate error message.

In order for GPFS quota accounting to work properly, the system administrator should ensure that the user and group 
information is consistent throughout the nodeset, for example by keeping the /etc/passwd and /etc/group files 
identical across the nodeset. Otherwise, unpredictable and erroneous quota accounting will occur.

It may be necessary to run an off-line quota check (mmcheckquota) to repair or recreate the quota file. 
If the quota file is corrupted, mmcheckquota will not restore it. The file must be restored from the backup copy. 
If there is no backup copy, an empty file may be set as the new quota file. This is equivalent to recreating 
the quota file. To set an empty file or use the backup file, issue the mmcheckquota command with the 
appropriate operand:

-u UserQuotaFilename for the user quota file 
-g GroupQuotaFilename for the group quota file

Reissue the mmcheckquota command to check the file system inode and space usage.

MMFS_SYSTEM_UNMOUNT
The MMFS_SYSTEM_UNMOUNT error log entry means that GPFS has discovered a condition which may result in 
data corruption if operation with this file system continues from this node. GPFS has marked the file system 
as disconnected and applications accessing files within the file system will receive ESTALE errors. 
This may be the result of:

The loss of a path to all disks containing a critical data structure. 
An internal processing error within the file system.

See File system forced unmount. Follow the problem determination and repair actions specified.

MMFS_SYSTEM_WARNING
The MMFS_SYSTEM_WARNING error log entry means that GPFS has detected a system level value approaching its 
maximum limit. This may occur as a result of the number of inodes (files) reaching its limit. Issue the mmchfs 
command to increase the number of inodes for the file system so that at least 5% of them are free.
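
The 5% rule is simple integer arithmetic: free% = (max - used) * 100 / max. A sketch follows; inode_free_pct is an
invented helper and the inode counts in the demo are made-up numbers, not taken from a real file system.

```shell
# inode_free_pct USED MAX: print the percentage of free inodes, rounded down.
inode_free_pct() {
    echo $(( ($2 - $1) * 100 / $2 ))
}

used=96000      # hypothetical number of used inodes
max=100000      # hypothetical maximum number of inodes
pct=$(inode_free_pct "$used" "$max")
if [ "$pct" -lt 5 ]; then
    echo "only ${pct}% of inodes free: raise the limit with mmchfs"
else
    echo "${pct}% of inodes free"
fi
```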

Error log entry example
This is an example of an error log entry which indicates loss of the Group Services subsystem:

LABEL:          MMFS_ABNORMAL_SHUTD
IDENTIFIER:     1FB9260D

Date/Time:       Thu May 16 14:39:07 EDT 
Sequence Number: 759
Machine Id:      000196364C00
Node Id:         k145n01
Class:           S
Type:            PERM
Resource Name:   mmfs            

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes
SOFTWARE PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
COMPONENT ID
5765B9500 
PROGRAM
mmfsd64 
DETECTING MODULE
/fs/mmfs/ts/phoenix/PhoenixInt.C
MAINTENANCE LEVEL
2.2.0.0 
LINE
        4409
RETURN CODE
         668
REASON CODE
0000 0000 
EVENT CODE
           0
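
To see at a glance which of the MMFS classes listed above are present in the error log, the LABEL lines of the
detailed report can be counted. A sketch; mmfs_label_counts is a made-up name, and the sample input contains only
the LABEL lines that the filter looks at.

```shell
# mmfs_label_counts: read "errpt -a" output on stdin, print "label count"
# for every entry whose label starts with MMFS_.
mmfs_label_counts() {
    awk '$1 == "LABEL:" && $2 ~ /^MMFS_/ { n[$2]++ }
         END { for (l in n) print l, n[l] }' | sort
}

# Shortened sample: three entries, two of them GPFS related.
mmfs_label_counts <<'EOF'
LABEL:          MMFS_ABNORMAL_SHUTD
IDENTIFIER:     1FB9260D
LABEL:          MMFS_ABNORMAL_SHUTD
LABEL:          SOME_OTHER_LABEL
EOF
```

On AIX: errpt -a | mmfs_label_counts
Note that errpt shortens long labels, as in MMFS_ABNORMAL_SHUTD in the example above.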



Note 3:
-------

IY35279: MMFSD64 CORE DUMPS IN CLEANOLDSHAREDMEMORY__FV() 

A fix is available 

APAR status
Closed as program error.

Error description: 
When starting gpfs, mmfsd64 on the 64-bit kernel may segfault
with a stack trace similar to:

cxiMapShSeg__Fv() at 0x1003579d4
CleanOldSharedMemory__Fv() at 0x1000025dc
mainBody__FiPPc(??, ??) at 0x100334c20
main(??, ??) at 0x10000257c

Problem conclusion
Make sure to update the current cpu's ppda rather than another
cpu's ppda.

APAR information
APAR number:             IY35279
Reported component name: AIX 5L POWER
Reported component ID:   5765E6100
Reported release:        510
Status:                  CLOSED PER
PE:                      NoPE
HIPER:                   NoHIPER
Submitted date:          2002-10-02
Closed date:             2002-10-02
Last modified date:      2002-11-07

Note 4:
-------

IY56448: WHEN CLLSIF OUTPUT IS NOT CORRECT, MMCOMMON DOES NOT HANDLE 

A fix is available 

APAR status
Closed as program error.

Error description 
From the GPFS log:
sort: 0653-655 Cannot open /var/mmfs/tmp/cllsifOutput.mmcommon.282794

mmcommon: 6027-1271 Unexpected error from getNodeGODMdata: sort
/var/mmfs/tmp/cllsifOutput.mmcommon.282794. Return code: 2

Could not run command /usr/lpp/mmfs/bin/mmcommon getNodeDataForDaemon hacmp 2>/var/mmfs/tmp/mmcommon..6Qeya
GPFS: 6027-311 mmfsd64 is shutting down.
Reason for shutdown: Could not initialize cluster config

Local fix
Correct the cluster information so that the cllsif output is correct.

Problem summary
WHEN CLLSIF OUTPUT IS NOT CORRECT, MMCOMMON DOES NOT HANDLE

Problem conclusion
Add checks for invalid data from HACMP, RPD, or SDR when
getNodeData is called.

APAR information
APAR number:             IY56448
Reported component name: GPFS FOR AIX
Reported component ID:   5765F6400
Reported release:        220
Status:                  CLOSED PER
PE:                      NoPE
HIPER:                   NoHIPER
Submitted date:          2004-05-03
Closed date:             2004-05-03
Last modified date:      2004-06-24


 
Note 5: Troubleshooting: Some possible GPFS problems
----------------------------------------------------

http://book.opensourceproject.org.cn/enterprise/cluster/ibmcluster/opensource/7819/ddu0070.html


8.5 Troubleshooting: Some possible GPFS problems
Troubleshooting of a GPFS file system can be complex due to its distributed nature. In this section, 
we describe the most common problems you may find when running GPFS and possible solutions. 
For further information on troubleshooting, refer to IBM General Parallel File System for Linux: 
Problem Determination Guide, GA22-7842.

8.5.1 Authorization problems

ssh and scp (or rsh and rcp) are used by GPFS administration commands to perform operations on other nodes. 
In order for these commands to be run, the sshd daemon must be running and configured to accept the connections 
from the other root users on the other nodes.

The first thing to check is the connection authorization from one node to other nodes and for extraneous messages 
in the command output. You can find information on OpenSSH customization in Appendix B, "Common facilities" 
on page 275. Check that all nodes can connect to all others without any password prompt.
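
The all-to-all check can be scripted. The sketch below runs a no-op command on every node from a list and flags
both failures and extraneous output (login banners and the like). The RSH variable and the function name are
assumptions of this note; BatchMode=yes is a standard OpenSSH option that makes ssh fail instead of asking for
a password.

```shell
# RSH is this sketch's own variable: the remote shell command to test.
RSH=${RSH:-"ssh -o BatchMode=yes"}

# check_remote_access: read node names on stdin, run "true" on each node.
# Returns non-zero if any node could not be reached without a prompt.
check_remote_access() {
    rc=0
    while read -r node; do
        [ -n "$node" ] || continue
        # stdin is redirected so the remote shell cannot swallow the node list
        if msg=$($RSH "$node" true </dev/null 2>&1); then
            if [ -z "$msg" ]; then
                echo "$node: ok"
            else
                echo "$node: ok, but extraneous output"
            fi
        else
            echo "$node: FAILED"
            rc=1
        fi
    done
    return $rc
}

# Typical use, repeated on every node of the cluster:
#   check_remote_access < /tmp/gpfs.allnodes
```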

You can also check if your GPFS cluster has been configured correctly to use the specified remote shell 
and remote copy commands by issuing the mmlscluster command, as in Example 8-17. Verify the contents 
of the remote shell command and remote file copy command fields.

Example 8-17: mmlscluster command 
 

[root@storage001 root]# mmlscluster

GPFS cluster information
========================
  Cluster id:  gpfs1035415317
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Primary network:           myrinet
  Secondary network:         ether

GPFS cluster data repository servers:
-------------------------------------
  Primary server:    storage001-myri0.cluster.com
  Secondary server:  (none)

Nodes in nodeset 1:
-------------------
   1  storage001-myri0 10.2.1.141     storage001-myri0.cluster.com  10.0.3.141
   2  node001-myri0  10.2.1.1         node001-myri0.cluster.com    10.0.3.1
   3  node002-myri0  10.2.1.2         node002-myri0.cluster.com    10.0.3.2
   4  node003-myri0  10.2.1.3         node003-myri0.cluster.com    10.0.3.3
   5  node004-myri0  10.2.1.4         node004-myri0.cluster.com    10.0.3.4
[root@storage001 root]#

 

8.5.2 Connectivity problems
Another reason why SSH may fail is that connectivity to a node has been lost. Error messages from mmdsh may indicate 
such a condition. For example:

mmdsh: node001 rsh process had return code 1.

There are many things that could cause this problem: cable failures, network card problems, switch failures, 
and so on. You can start by checking if the affected node is powered on. If the node is up, check the node connectivity 
and verify the sshd daemon is running on the remote node. If not, restart the daemon by issuing:

# service sshd start

Sometimes you may see an mmdsh error message due to the lack of an mmfsd process on some of the nodes, 
as in Example 8-18. Make sure that mmfsd is running on all nodes, using lssrc -a, as in Example 8-19.

Example 8-18: mmcrfs command 
 

[root@storage001 root]# mmcrfs /gpfs gpfs0 -F DescFile -v yes -r 1 -R 2
GPFS: 6027-624 No disks
GPFS: 6027-441 Unable to open disk 'gpfs2nsd'.
No such device
GPFS: 6027-538 Error accessing disks.
mmdsh: node001 rsh process had return code 19.
mmcommon: Unexpected error from runRemoteCommand_Cluster: mmdsh. Return code: 1
mmcrfs: tscrfs failed. Cannot create gpfs0
[root@storage001 root]#

 
 

Example 8-19: Verifying mmfsd is running 
 

# lssrc -a
Subsystem        Group        PID     Status
 cthats          cthats       843     active
 cthags          cthags       943     active
 ctrmc           rsct         1011    active
 ctcas           rsct         1018    active
 IBM.HostRM      rsct_rm      1069    active
 IBM.FSRM        rsct_rm      1077    active
 IBM.CSMAgentRM  rsct_rm      1109    active
 IBM.ERRM        rsct_rm      1110    active
 IBM.AuditRM     rsct_rm      1148    active
 mmfs            aixmm        1452    active
 IBM.SensorRM    rsct_rm              inoperative
 IBM.ConfigRM    rsct_rm              inoperative

 
 

8.5.3 NSD disk problems
In this section, we describe the two most common problems related to NSD and disks. These are not the only 
problems you might face, but they are the most common.

The disk has disappeared from the system
Sometimes you may face a disk failure and the disk appears to have disappeared from the system. 
This can happen if somebody simply removes an in-use hot-swap disk from the server or in the case of 
a particularly nasty disk failure.

In this situation, GPFS loses connectivity to the disk and, depending on how the file system was created, 
you may or may not lose access to the file system.

You can verify whether the disk is reachable by the operating system using mmlsnsd -m, as shown in Example 8-20. 
In this situation, the GPFS disk gpfs1nsd is unreachable. This could mean that the disk has been turned off, 
has been removed from its bay, or has failed for some other reason.

Example 8-20: mmlsnsd command 
 

[root@storage001 root]# mmlsnsd -m

 NSD name     PVID               Device       Node name    Remarks
-----------------------------------------------------------------------
 gpfs1nsd     0A0000013BF15AFD   -            node-a       (error) primary node
 gpfs2nsd     0A0000023BF15B0A   /dev/sdb1    node-b       primary node
 gpfs3nsd     0A0000033BF15B26   /dev/sdb1    node-c       primary node
 gpfs4nsd     0A0000013BF2F4EA   /dev/sda9    node-a       primary node
 gpfs5nsd     0A0000023BF2F4FF   /dev/sda3    node-b       primary node
 gpfs6nsd     0A0000033BF2F6E1   /dev/sda6    node-c       primary node
[root@storage001 root]#
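
With many NSDs the failing rows are easy to miss. A sketch that prints only the rows flagged "(error)"; nsd_errors
is a made-up name, and the column positions are assumed to match the mmlsnsd -m output shown in Example 8-20.

```shell
# nsd_errors: read "mmlsnsd -m" output on stdin, print "NSD-name node-name"
# for every row whose remarks contain "(error)".
nsd_errors() {
    awk '/\(error\)/ { print $1, $4 }'
}

# Two rows copied from Example 8-20.
nsd_errors <<'EOF'
 gpfs1nsd     0A0000013BF15AFD   -            node-a       (error) primary node
 gpfs2nsd     0A0000023BF15B0A   /dev/sdb1    node-b       primary node
EOF
```

On a node: mmlsnsd -m | nsd_errors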


 
 

To correct this problem, you must first verify whether the disk is correctly attached and that it is not dead. 
After that, you can verify whether the driver for the disk is operational, and reload the driver using the 
rmmod and insmod commands. If the disk had only been removed from its bay or turned off, reloading the driver 
will activate the disks again, and then you can enable them again following the steps in 
"The disk is down and will not come up" on page 241. If the disk had any kind of hardware problem that will 
require replacing the disk, refer to 8.1.3, "Replacing a failing disk in an existing GPFS file system" on page 230.

The disk is down and will not come up
Occasionally, disk problems will occur on a node and, even after the node has been rebooted, the disk connected 
to it does not come up again. In this situation, you will have to manually set the disk up again and then run 
some recovery commands in order to restore access to your file system.

For our example, we see that the gpfs0 file system has lost two of its three disks: gpfs1nsd and gpfs3nsd. 
In this situation, we have to recover the two disks, run a file system check, and then re-stripe the file system.

Because the file system check and re-stripe require access to the file system, which is down, you must first 
re-activate the disks. Once the file system is up again, recovery may be undertaken. In Example 8-21, we verify 
which disks are down using the mmlsdisk command, re-activate the disks by using the mmchdisk command, and then 
verify the disks again with mmlsdisk.

Example 8-21: Reactivating disks 
 

[root@storage001 root]# mmlsdisk gpfs0
disk         driver   sector failure holds    holds
name         type       size   group metadata data  status        availability
------------ -------- ------ ------- -------- ----- ------------- ------------
gpfs1nsd     nsd         512       1 yes      yes   ready         down
gpfs2nsd     nsd         512       2 yes      yes   ready         up
gpfs3nsd     nsd         512       3 yes      yes   ready         down

[root@storage001 root]# mmchdisk gpfs0 start -d "gpfs1nsd;gpfs3nsd"
Scanning file system metadata, phase 1 ...
Scan completed successfully.
Scanning file system metadata, phase 2 ...
Scan completed successfully.
Scanning file system metadata, phase 3 ...
Scan completed successfully.
Scanning user file metadata ...
  77 % complete on Tue Nov 27 00:13:38 2001
 100 % complete on Tue Nov 27 00:13:39 2001
Scan completed successfully.

[root@storage001 root]# mmlsdisk gpfs0
disk         driver   sector failure holds    holds
name         type       size   group metadata data  status        availability
------------ -------- ------ ------- -------- ----- ------------- ------------
gpfs1nsd     nsd         512       1 yes      yes   ready         up
gpfs2nsd     nsd         512       2 yes      yes   ready         up
gpfs3nsd     nsd         512       3 yes      yes   ready         up
[root@storage001 root]#
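
The semicolon-separated disk list for mmchdisk can be built from the mmlsdisk output instead of typed by hand.
A sketch; down_disks is a made-up name, and it assumes the availability column is the last one, as in Example 8-21.

```shell
# down_disks: read "mmlsdisk <fs>" output on stdin and print the names of all
# disks whose availability is "down", joined with ";" as mmchdisk -d expects.
down_disks() {
    awk '$NF == "down" { list = list (list == "" ? "" : ";") $1 }
         END { print list }'
}

# The first listing from Example 8-21.
down_disks <<'EOF'
disk         driver   sector failure holds    holds
name         type       size   group metadata data  status        availability
------------ -------- ------ ------- -------- ----- ------------- ------------
gpfs1nsd     nsd         512       1 yes      yes   ready         down
gpfs2nsd     nsd         512       2 yes      yes   ready         up
gpfs3nsd     nsd         512       3 yes      yes   ready         down
EOF
```

Possible use, after checking the list by eye:
  mmchdisk gpfs0 start -d "$(mmlsdisk gpfs0 | down_disks)"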


 
 

Now that we have the three disks up, it is time to verify the file system consistency. Additionally, because some 
operations could have occurred on the file system while some of the disks were down, we must re-balance it. 
We show the output of the mmfsck and mmrestripefs commands in Example 8-22. The mmfsck command has some important 
options you may need to use, like -n, to check the file system without changing it, and -y, to automatically 
correct problems found in the file system.

Example 8-22: mmfsck and mmrestripefs commands 
 

[root@storage001 root]# mmfsck gpfs0
Checking "gpfs0"
Checking inodes
Checking inode map file
Checking directories and files
Checking log files
Checking extended attributes file
Checking file reference counts
Checking file system replication status

       33792 inodes
          14   allocated
           0   repairable
           0   repaired
           0   damaged
           0   deallocated
           0   orphaned
           0   attached

      384036 subblocks
        4045   allocated
           0   unreferenced
           0   deletable
           0   deallocated

         231 addresses
           0   suspended

File system is clean.

[root@storage001 root]# mmrestripefs gpfs0 -r
Scanning file system metadata, phase 1 ...
Scan completed successfully.
Scanning file system metadata, phase 2 ...
Scan completed successfully.
Scanning file system metadata, phase 3 ...
Scan completed successfully.
Scanning user file metadata ...
  72 % complete on Tue Nov 27 00:19:24 2001
 100 % complete on Tue Nov 27 00:19:25 2001
Scan completed successfully.

[root@storage001 root]# mmlsdisk gpfs0 -e
All disks up and ready

[root@storage001 root]# mmlsdisk gpfs0
disk         driver   sector failure holds    holds
name         type       size   group metadata data  status        availability
------------ -------- ------ ------- -------- ----- ------------- ------------
gpfs1nsd     nsd         512       1 yes      yes   ready         up
gpfs2nsd     nsd         512       2 yes      yes   ready         up
gpfs3nsd     nsd         512       3 yes      yes   ready         up
[root@storage001 root]#
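
When the check in Example 8-22 is run from a script, the report can be inspected automatically. The following is a minimal sketch that only looks at the text of the report, assuming the summary layout shown above. The function names fsck_is_clean and fsck_count are hypothetical, and the sample is an abbreviated copy of the example output.

```shell
#!/bin/sh
# fsck_is_clean: succeed if mmfsck-style output on stdin contains the
# "File system is clean." line shown in Example 8-22.
fsck_is_clean() {
    grep -q '^File system is clean\.$'
}

# fsck_count NAME: print the first counter whose label is NAME
# (for example "damaged" or "orphaned") from mmfsck-style output on stdin.
fsck_count() {
    awk -v name="$1" '$2 == name { print $1; exit }'
}

# Abbreviated sample taken from Example 8-22.
report='       33792 inodes
          14   allocated
           0   repairable
           0   repaired
           0   damaged
           0   deallocated
           0   orphaned
           0   attached

File system is clean.'

if printf '%s\n' "$report" | fsck_is_clean; then
    echo "clean, damaged inodes: $(printf '%s\n' "$report" | fsck_count damaged)"
else
    echo "problems found, review the mmfsck report"
fi
```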





==========
76. HACMP:
==========


76.1: Overview Cluster solutions and terminology on AIX:
========================================================


-- CSM: (Management of Cluster)
-- ----------------------------

What is Cluster Systems Management (CSM)?
Cluster Systems Management (CSM) software provides a distributed system management solution that allows 
a system administrator to set up and maintain a cluster of nodes that run the AIX or Linux operating system. 
CSM simplifies cluster administration tasks by providing management from a single point-of-control. 
CSM can be used to manage homogeneous clusters of servers that run Linux, homogeneous servers that run AIX, 
or mixed clusters which include both AIX and Linux.

You can use the following hardware for your CSM management server, install server, and nodes:

IBM System x: System x, IBM xSeries, IBM BladeCenter*, and IBM eServer 325, 326, and 326m hardware
IBM System p: System p, IBM pSeries, IBM BladeCenter*, System p5, IBM eServer OpenPower
*The BladeCenter JS models use the POWER architecture common to all System p servers.

The management server is the machine that is designated to operate, monitor, and maintain the rest of the cluster. 
Install servers are the machines that are used to install the nodes. By default, the management server 
is the install server. Managed nodes are instances of the operating system that you can manage in the cluster. 
Managed devices are the non-node devices for which CSM supports power control and remote console access. 
For hardware and software support information, see Planning for CSM software.

Communicating with CSM:
CSM offers you several options for issuing commands to the cluster:

-Command line interface 
-Distributed Command Execution Manager (DCEM) 
-IBM Web-based System Manager 
-SMIT


-- GPFS:
-- -----

Introducing General Parallel File System

GPFS is a high-performance cluster file system for AIX 5L, Linux and mixed clusters that provides users 
with shared access to files spanning multiple disk drives. By dividing individual files into blocks 
and reading/writing these blocks in parallel across multiple disks, GPFS provides very high bandwidth; 
in fact, GPFS has won awards and set world records for performance. In addition, GPFS's multiple data paths 
can also eliminate single points of failure, making GPFS extremely reliable. GPFS currently powers many of 
the world's largest scientific supercomputers and is increasingly used in commercial applications requiring 
high-speed access to large volumes of data such as digital media, engineering design, business intelligence, 
financial analysis and geographic information systems. GPFS is based on a shared disk model, providing lower 
overhead access to disks not directly attached to the application nodes, and using a distributed protocol 
to provide data coherence for access from any node. 

IBM's General Parallel File System (GPFS) provides file system services to parallel and serial applications. 
GPFS allows parallel applications simultaneous access to the same files, or different files, from any node 
which has the GPFS file system mounted while managing a high level of control over all file system operations. 
GPFS is particularly appropriate in an environment where the aggregate peak need for data bandwidth exceeds 
the capability of a distributed file system server.

GPFS allows users shared file access within a single GPFS cluster and across multiple GPFS clusters. 
A GPFS cluster consists of: 

- AIX 5L nodes, Linux nodes, or a combination thereof (see GPFS cluster configurations). A node may be: 
   - An individual operating system image on a single computer within a cluster. 
   - A system partition containing an operating system. Some System p5 and pSeries machines allow multiple 
     system partitions, each of which is considered to be a node within the GPFS cluster.

- Network shared disks (NSDs) created and maintained by the NSD component of GPFS. 
  All disks utilized by GPFS must first be given a globally accessible NSD name. 
  The GPFS NSD component provides a method for cluster-wide disk naming and access. 

  On Linux machines running GPFS, you may give an NSD name to: 
   - Physical disks 
   - Logical partitions of a disk 
   - Representations of physical disks (such as LUNs)

  On AIX machines running GPFS, you may give an NSD name to: 
   - Physical disks 
   - Virtual shared disks 
   - Representations of physical disks (such as LUNs)

- A shared network for GPFS communications allowing a single network view of the configuration. 
  A single network, a LAN or a switch, is used for GPFS communication, including the NSD communication.


-- PSSP: (predecessor to Cluster Systems Management (CSM))
-- -------------------------------------------------------

Parallel System Support Programs (PSSP)

The PSSP 3.5 software is a comprehensive suite of applications to manage a system as a full-function 
parallel processing system. It provides administrative tasks that help increase productivity by enabling 
administrators to view, monitor, and operate the system from the control workstation, a single point of control. 
The PSSP software is discussed in terms of functional entities called components of PSSP. Most functions 
are base components of PSSP while others are optional; they come with the PSSP software, but you can choose 
whether to install and use them.

With PSSP 3.5, AIX 5L 5.1 or 5.2 must be on the control workstation. Note that your control workstation 
must be at the highest AIX level in the system. If you have any HMC-controlled servers in your system, 
AIX 5L 5.1 or 5.2 must be on each HMC-controlled server node. Other nodes can have AIX 5L 5.1 and PSSP 3.4, 
or AIX 4.3.3 with PSSP 3.4 or PSSP 3.2. However, you can only run with the 64-bit AIX kernel and switch 
between 64-bit and 32-bit AIX kernel mode on nodes with PSSP 3.5.

Parallel System Support Programs (PSSP) for AIX
PSSP is the systems management predecessor to Cluster Systems Management (CSM) and does not support 
IBM System p servers or AIX 5L V5.3. New cluster deployments should use CSM and existing PSSP customers 
with software maintenance will be transitioned to CSM at no charge. 


-- Tivoli Workload Scheduler LoadLeveler
-- -------------------------------------

Used for dynamic workload scheduling, Tivoli Workload Scheduler LoadLeveler is a distributed network-wide 
job management facility designed to dynamically schedule work so as to maximize resource utilization 
and minimize job completion time. Jobs are scheduled based on job priority, job requirements, 
resource availability and user-defined rules to match processing needs with resources. LoadLeveler provides 
consolidated accounting and reporting and supports IBM servers including IBM System p and System x environments. 


-- Engineering Scientific Subroutine Library (ESSL) and Parallel ESSL 
-- ------------------------------------------------------------------

ESSL is a collection of state-of-the-art mathematical subroutines specifically tuned to IBM hardware 
and offering significant performance improvement to any math-intensive scientific or engineering applications. 
Parallel ESSL extends the function of ESSL to support parallel applications that use the Message Passing 
Interface included in IBM Parallel Environment. ESSL and Parallel ESSL support C, C++ and Fortran applications. 


-- Parallel Environment (PE)
-- -------------------------

Parallel Environment for AIX 5L is a comprehensive development and execution environment for parallel 
applications (distributed-memory, message-passing applications running across multiple nodes). 
It is designed to help organizations develop, test, debug, tune and run high-performance parallel 
applications in C, C++ and Fortran on IBM System p and System x clusters. Parallel Environment runs 
on AIX 5L V5.2 and V5.3.  

-- HACMP:
-- ------

HACMP is designed to provide high availability for critical business applications and data through 
system redundancy and failover. HACMP constantly monitors the status of servers, networks and applications 
to detect failures or performance degradation and can respond by automatically restarting a troubled 
application on designated backup hardware, taking care of all network or storage connections in the process. 
With HACMP, clients can scale up to 32 nodes and mix and match system sizes and performance levels as well 
as network adapters and disk subsystems to satisfy specific application, network and disk performance needs. 

HACMP/XD extends HACMP's high availability capabilities across geographic sites with remote data 
mirroring (replication) and failover using this mirrored data; this combination can maintain application 
and data availability even if an entire site is disabled by a disaster. HACMP/XD provides IP-based data 
mirroring and also supports hardware-based mirroring products such as 
IBM Enterprise Storage Systems Metro-Mirror (formerly PPRC). 

-- RSCT:
-- -----

Reliable Scalable Cluster Technology. Since HACMP 5.1, HACMP relies on RSCT. So, in modern HACMP, RSCT is
a necessary component or subsystem. For example, HACMP uses the heartbeat facility of RSCT.
RSCT is a standard component in AIX 5L.

Reliable Scalable Cluster Technology, or RSCT, is a set of software components that together provide a 
comprehensive clustering environment for AIX and Linux. RSCT is the infrastructure used by a variety 
of IBM products to provide clusters with improved system availability, scalability, and ease of use. 
RSCT includes the following components: 

- Resource Monitoring and Control (RMC) subsystem. This is the scalable, reliable backbone of RSCT. 
  It runs on a single machine or on each node (operating system image) of a cluster and provides a common 
  abstraction for the resources of the individual system or the cluster of nodes. You can use RMC for 
  single system monitoring or for monitoring nodes in a cluster. In a cluster, however, RMC provides global 
  access to subsystems and resources throughout the cluster, thus providing a single monitoring and management 
  infrastructure for clusters. 
- RSCT core resource managers. A resource manager is a software layer between a resource 
  (a hardware or software entity that provides services to some other component) and RMC. A resource manager 
  maps programmatic abstractions in RMC into the actual calls and commands of a resource. 
- RSCT cluster security services, which provide the security infrastructure that enables RSCT components 
  to authenticate the identity of other parties. 
- Topology Services subsystem, which, on some cluster configurations, provides node and network failure detection. 
- Group Services subsystem, which, on some cluster configurations, provides cross-node/process coordination.


RSCT is the "glue" that holds the nodes together in a cluster. It is a group of low-level components 
that allow clustering technologies, such as High-Availability Cluster Multiprocessing (HACMP) and 
General Parallel File System (GPFS), to be built easily. 

RSCT technology was originally developed by IBM for RS/6000 SP systems (Scalable POWERparallel). 
As time passed, it became apparent that these capabilities could be used on a growing number of general 
computing applications, so they were moved into components closer to the operating system (OS), such as 
Resource Monitoring and Control (RMC), Group Services, and Topology Services. 

The components were originally packaged as part of the RS/6000 SP Parallel System Support Program (PSSP) 
and called RSCT. RSCT is now packaged as part of AIX 5L Version 5.1 and later. 

RSCT is also included in Cluster Systems Management (CSM) for Linux. Now, Linux nodes (with appropriate 
hardware and software levels) running CSM 1.3 for Linux can be part of the management domain cluster 1600, 
and RSCT (with RMC) is the common interface for clustering. For more information about this heterogeneous 
cluster, see An Introduction to CSM 1.3 for AIX 5L, SG24-6859. 

RSCT includes these components: 

-Resource Monitoring and Control (RMC) 
-Resource managers (RM) 
-Cluster Security Services (CtSec) 
-Group Services 
-Topology Services

Group Services and Topology Services

Group Services and Topology Services, although included in RSCT, are not used in the management 
domain structure of CSM. These two components are used in peer domain clusters for applications, 
such as High-Availability Cluster Multiprocessing (HACMP) and General Parallel File System (GPFS), 
providing node and process coordination and node and network failure detection. Therefore, for these 
applications, a .rhosts file may be needed (for example, for HACMP configuration synchronization). 

These services are often referred to as hats and hags: 
high availability Group Services daemon (hagsd) 
and high availability Topology Services daemon (hatsd). 

- What are management domains and peer domains?
In order to understand how the various RSCT components are used in a cluster, you should be aware 
that nodes of a cluster can be configured for either manageability or high availability.

>> You configure a set of nodes for manageability using the Cluster Systems Management (CSM) product as 
described in IBM Cluster Systems Management: Administration Guide. The set of nodes configured for manageability 
is called a management domain of your cluster.

>> You configure a set of nodes for high availability using RSCT's Configuration resource manager. 
The set of nodes configured for high availability is called an RSCT peer domain of your cluster. 
For more information, refer to Creating and administering an RSCT peer domain.



-- HPSS:	 
-- -----

High Performance Storage System
What is High Performance Storage System? HPSS is software that manages petabytes of data on disk and robotic tape 
libraries. HPSS provides highly flexible and scalable hierarchical storage management that keeps recently 
used data on disk and less recently used data on tape. HPSS uses cluster, LAN and/or SAN technology to aggregate 
the capacity and performance of many computers, disks, and tape drives into a single virtual file system 
of exceptional size and versatility. This approach enables HPSS to easily meet otherwise unachievable demands 
of total storage capacity, file sizes, data rates, and number of objects stored. HPSS provides a variety of user 
and filesystem interfaces ranging from the ubiquitous vfs, ftp, samba and nfs to higher performance pftp, 
client API, local file mover and third party SAN (SAN3P). HPSS also provides hierarchical storage management 
(HSM) services for IBM General Parallel File System (GPFS). 


-- C-SPOC:
-- -------

The Cluster Single Point of Control (C-SPOC) utility lets system administrators perform administrative tasks 
on all cluster nodes from any node in the cluster.


-- HA Network Server:
-- ------------------

The High Availability Network Server (HA Network Server) is a complete solution that quickly and automatically 
configures certain network services in a high availability environment. HA Network Server solution is designed 
to enhance the HACMP product by offering a set of scripts that set up highly available network services 
such as Domain Name System (DNS), Dynamic Host Configuration Protocol (DHCP), Network File System (NFS), 
and printing services. This is possible by using the framework offered in HACMP to monitor and act upon 
potential problems with network services in order to extend high availability beyond just hardware recovery. 
Making these services highly available means there is no down time in services that are critical to running 
a business. This solution is now available by download.

HA Network Server components
The HA Network Server solution is comprised of three network service plug-ins providing for DNS, DHCP, 
and print services (HACMP already contains integrated support for high availability NFS (HANFS)). 
Each of these plug-ins is available on this Web site as a downloadable tar file. These example scripts start 
and stop the network service processes, verify that configuration files are present and stored in a 
shared filesystem, and assist the HACMP monitoring functions that check on the health of the network service process. 
These scripts are provided as examples that may be customized for your environment.

A setup program is also provided with each of these plug-ins to assist with the setup after downloading the plug-in. 
Since several prerequisites must be completed by the user before setup begins, please read the README file that is 
included within the plug-in tar file. After download and tar file expansion, the README will be located in 
/usr/es/sbin/cluster/plug-ins/<network_service>, where <network_service> will be dns, dhcp, or printserver 
depending on which plug-in was downloaded.


  

76.2 Overview architecture:
===========================

HACMP is a "High Availability" solution: an IBM cluster technology based on RSCT, with additional daemons
and concepts of its own, such as the "Resource Group".

In an HACMP Cluster, most relevant hardware adapters in a system are duplicated. For example, multiple
network adapters and multiple FC cards are typical in a Cluster node, to avoid Single Points Of Failure (SPOFs).
 
Two main implementations are possible (we limit ourselves here to a 2-node Cluster):

- One node runs and owns an application (associated with a Resource Group), and in case of a
  failure, another node can take "ownership" of the Resource Group and start running the application.
  This is implemented partly with the aid of start and stop scripts belonging to the application.

- If you have a suitable application, it is also possible for both nodes to run the same application at the same time,
  so that parallel processing takes place.

So, many HACMP implementations act like an "active - passive" cluster, in which one node runs the application and the
other node takes the role of "failover" node. This is not to say that the failover node cannot actively run other 
applications as well.
But do not forget that, when the right type of application is used, real parallel processing
can be implemented.



         ------------------------------------------ public network
             | |                             | |
             | |                             | |
        ------------                    -------------
        |cluster   |                    |cluster    |
        |system    |Ethernet            |system     |
        |pSeries   |--------------------|pSeries    |
        |          |          heartbeat |           |
        |          |Or                  |           |
        |          |Serial Link         |           |
        |          |--------------------|           |
        |FC  FC    |                    |  FC    FC |
        ------------                    -------------
          |  |                             |    |
          |  |   ---------------------------    |
          |  |   |                              |
          |  ----|-------------------------     |
          |      |                        |     |
          --------  Resource Group:       -------- Resource Group:
          |hdisk1|  -Application_01       |hdisk1| -Application_02
          --------  -Volume Group(s)      -------- -Volume Group(s)
          --------  -File System(s)       -------- -File System(s)
          |hdisk2|                        |hdisk2|
          --------                        --------
          --------                        --------
          |hdisk3|                        |hdisk3|
          --------                        --------

A "Resource group" is a group of associated "resources", known under one name. 
It can consist of an Application, Volume Group(s), File System(s) and other resources.
You can define a Resource Group from smitty: smitty hacmp

Resource Groups can be available from a single node or, in the case of concurrent applications,
available simultaneously from multiple nodes.

The components in a Resource Group move together from one node to another node,
in the case of a node failure.

Fallover and Fallback:

- Fallover: Represents the movement of a resource group from one node to the backup node
  in response to a failure on that node.
- Fallback: Represents the movement of a resource group from the backup node to the previous
  node, when it becomes available.
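
The two definitions above can be illustrated with a small model. This is only a conceptual sketch, not how HACMP itself decides: it assumes a resource group with a node priority list and a policy in which the highest-priority node that is up always owns the group. The function name pick_owner and the node names nodeA and nodeB are hypothetical.

```shell
#!/bin/sh
# pick_owner PRIORITY_LIST UP_LIST
# Print the first node of PRIORITY_LIST (highest priority first) that also
# appears in UP_LIST. Return 1 and print nothing if no candidate is up.
pick_owner() {
    for node in $1; do
        for up in $2; do
            if [ "$node" = "$up" ]; then
                echo "$node"
                return 0
            fi
        done
    done
    return 1
}

# nodeA is the preferred owner, nodeB is the backup node.
echo "normal:   $(pick_owner 'nodeA nodeB' 'nodeA nodeB')"
echo "fallover: $(pick_owner 'nodeA nodeB' 'nodeB')"        # nodeA has failed
echo "fallback: $(pick_owner 'nodeA nodeB' 'nodeB nodeA')"  # nodeA is back
```

Note that in a real cluster, whether a fallback happens at all depends on the policy configured for the resource group; the model above always falls back.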


Key tasks in setting up an HACMP Cluster are:
- define the right Resource Group(s) and failover (fallover and fallback) policy
- create the right start and stop scripts for the application(s)
- setup the right IP parameters, like IP addresses and takeover methodology, per node

To illustrate the above, it is probably helpful to take a look at this (very simple) thread from the Internet:

  thread:

  Q:

  Hi All, 
  We have 2 servers running HACMP 4.3.1 in 
  non-concurrent rotating mode with IP Take Over 
  Facility Enabled. We have only one resourse group 
  running on Server A. In case of Failure, Services 
  Transfer to Server B(backup Server with same 
  configuration). 

  Now I have question is it possible to create another 
  resource group B active on Server B when Resource 
  Group A is Active On Server A. i.e both resource group 
  keep active on Different Server and both servers act 
  as a backup for each other. 

  Any practical implementation? 

  A:

  The short answer is "yes". We have that scenario on our servers running 
  Peoplesoft. One system is "primary" for HR and one system is primary for 
  Financials. However, each system functions as a backup for the other 
  application in case of a failure. 

  Sorry - I'm not an HA expert as we had a contractor actually come in and do 
  the work for us - but it is possible, as you asked. 




76.3 Application Servers:
=========================

To put the application under HACMP control, you create an application server resource that associates 
a user-defined name with the names of specially written scripts to start and stop the application. 
By defining an application server, HACMP can start another instance of the application on the takeover node 
when a fallover occurs. This protects your application so that it does not become a single point of failure. 
An application server can also be monitored with the application monitoring feature and the Application 
Availability Analysis tool. 

After you define the application server, you can add it to a resource group. A resource group is a set of 
resources that you define so that the HACMP software can treat them as a single unit.

HACMP can monitor applications that are defined to application servers, in one of two ways: 

-Process monitoring detects the termination of a process, using RSCT Resource Monitoring and Control (RMC) capability. 
-Custom monitoring monitors the health of an application based on a monitor method that you define. 
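
The start and stop scripts named in an application server definition are ordinary shell scripts. The following is a minimal sketch of such a pair, combined into one file for brevity (HACMP is normally given two separate script names). Everything specific here is an assumption for illustration: the variables APP_CMD and PIDFILE, the function names, and the use of "sleep 300" as a stand-in for the real application.

```shell
#!/bin/sh
# Hypothetical start/stop logic for an application server.
# APP_CMD and PIDFILE are placeholders; "sleep 300" stands in for the
# real application.
APP_CMD=${APP_CMD:-"sleep 300"}
PIDFILE=${PIDFILE:-/tmp/app_01.pid}

app_start() {
    # Start the application in the background and remember its PID.
    $APP_CMD >/dev/null 2>&1 &
    echo $! > "$PIDFILE"
}

app_stop() {
    # Stop the application if a PID was recorded; succeed if already stopped.
    [ -f "$PIDFILE" ] || return 0
    kill "$(cat "$PIDFILE")" 2>/dev/null || true
    rm -f "$PIDFILE"
    return 0
}

app_running() {
    # Succeed only if a PID was recorded and that process still exists.
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

# Dispatch on the first argument, so one file can serve as both scripts.
case "${1:-}" in
    start) app_start ;;
    stop)  app_stop ;;
esac
```

A check like app_running is also the kind of test a custom monitor method could perform. Both scripts should return 0 on success, and the stop script should succeed even when the application is already down.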


76.4 Daemons:
=============

Cluster Services:
 
Notice that if you list the daemons in the AIX System Resource Controller (SRC), you will see ES appended 
to their names. The actual executables do not have the ES appended; the process table shows the executable 
by path (/usr/es/sbin/cluster...). 

The following lists the required and optional HACMP/ES daemons: 

-- Cluster Manager daemon (clstrmgr):

This daemon monitors the status of the nodes and their interfaces, and invokes the appropriate scripts 
in response to node or network events. It also centralizes the storage of and publishes updated information 
about HACMP-defined resource groups. The Cluster Manager on each node coordinates information gathered from 
the HACMP global ODM, and other Cluster Managers in the cluster to maintain updated information about the content, 
location, and status of all HACMP resource groups. This information is updated and synchronized among all nodes 
whenever an event occurs that affects resource group configuration, status, or location.
All cluster nodes must run the clstrmgr daemon.

-- Cluster SMUX Peer daemon (clsmuxpd):

This daemon maintains status information about cluster objects. This daemon works in conjunction with 
the Simple Network Management Protocol (snmpd) daemon. All cluster nodes must run the clsmuxpd daemon.
Note: The clsmuxpd daemon cannot be started unless the snmpd daemon is running.

-- Cluster Information Program daemon (clinfo):

This daemon provides status information about the cluster to cluster nodes and clients and invokes 
the /usr/es/sbin/cluster/etc/clinfo.rc script in response to a cluster event. The clinfo daemon is optional 
on cluster nodes and clients.

-- Cluster Lock Manager daemon (cllockd):
This daemon provides advisory locking services. The cllockd daemon is required on cluster nodes only if 
those nodes are part of a concurrent access configuration.

-- Cluster Topology Services daemon (topsvcsd):
This daemon monitors the status of network adapters in the cluster. 
All cluster nodes must run the topsvcsd daemon.

-- Cluster Event Management daemon (emsvcsd):
This daemon matches information about the state of system resources with information about resource conditions 
of interest to client programs (applications, subsystems, and other programs). The emsvcsd daemon runs on each node 
of a domain.

-- Event Management AIX Operating System Resource Monitor (emaixos):
This daemon acts as a resource monitor for the event management subsystem and provides information about 
the operating system characteristics and utilization. The emaixos daemon is started automatically by Event Management.

-- Cluster Group Services daemon (grpsvcsd):
This daemon manages all of the distributed protocols required for cluster operation. 
All cluster nodes must run the grpsvcsd daemon.

-- Cluster Globalized Server daemon (grpglsmd):
This daemon operates as a grpsvcs client; its function is to make switch adapter membership global across 
all cluster nodes. All cluster nodes must run the grpglsmd daemon. 

-- Group Services Concurrent Logical Volume Manager (gsclvmd):
When extended concurrent Volume Groups are used, this process manages concurrent Volumes.

-- High availability Group Services daemon (hagsd)

-- High availability Topology Services daemon (hatsd)


The AIX System Resource Controller (SRC) controls the HACMP/ES daemons (except for cllockd, which is a 
kernel extension). It provides a consistent interface for starting, stopping, and monitoring processes 
by grouping sets of related programs into subsystems and groups. In addition, it provides facilities for 
logging of abnormal terminations of subsystems or groups and for tracing of one or more subsystems. 

 

The HACMP/ES daemons are collected into the following SRC subsystems and groups: 

Daemon                          Subsystem    Group
/usr/es/sbin/cluster/clstrmgr   clstrmgrES   cluster
/usr/es/sbin/cluster/clinfo     clinfoES     cluster
/usr/es/sbin/cluster/clsmuxpd   clsmuxpdES   cluster
/usr/es/sbin/cluster/cllockd    cllockdES    lock
/usr/sbin/rsct/bin/emsvcs       emsvcs       emsvcs
/usr/sbin/rsct/bin/topsvcs      topsvcs      topsvcs
/usr/sbin/rsct/bin/hagsglsmd    grpglsm      grpsvcs
/usr/sbin/rsct/bin/emaixos      emaixos      emsvcs
/usr/es/sbin/cluster/clcomd     clcomdES     clcomd

When using the SRC commands, you can control the clstrmgr, clinfo, and clsmuxpd daemons by specifying 
the SRC cluster group. 
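
The status of a whole SRC group is listed with lssrc -g, for example: lssrc -g cluster. The following is a minimal sketch that picks out the subsystems that are not active from such a listing. It assumes the usual lssrc layout (a header line, subsystem name in the first column, status in the last column). The function name inactive_subsystems is hypothetical, and the sample listing, including its PIDs, is made up for illustration.

```shell
#!/bin/sh
# inactive_subsystems: read "lssrc -g GROUP" style output on stdin and
# print the name of every subsystem whose status is not "active".
# The header line is skipped. The status is taken from the last column,
# because inoperative subsystems have no PID column.
inactive_subsystems() {
    awk 'NR > 1 && NF > 0 && $NF != "active" { print $1 }'
}

# Made-up sample in the layout printed by lssrc.
listing='Subsystem         Group            PID          Status
 clstrmgrES       cluster          229486       active
 clinfoES         cluster          258192       active
 clsmuxpdES       cluster                       inoperative'

printf '%s\n' "$listing" | inactive_subsystems
```

On a cluster node the function could be fed directly from the command: lssrc -g cluster | inactive_subsystems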

The required and optional HACMP and RSCT daemons are:

- clcomdES      Cluster communication daemon
- clstrmgrES    Cluster manager
- clinfoES      Cluster information daemon
- rmcd          RSCT Resource Monitoring and Control daemon
- hatsd         RSCT Topology Services subsystem (includes hats_nim*, which send and receive heartbeats)
- hagsd         RSCT Group Services subsystem
- grpglsmd      main function is to make switch adapter membership global across all cluster nodes.

Starting with HACMP 5.3, the cluster manager process is always running. It can be in one of two states,
as displayed by the command

# lssrc -ls clstrmgrES

ST_INIT (start event has executed)
ST_NOTCONFIGURED (start event has not executed)

# lssrc -ls clstrmgrES

Current state: ST_STABLE
sccsid = "@(#)36   1.135.1.62   src/43haes/usr/sbin/cluster/hacmprd/main.C, hacmp.pe, 52haes_r540, 
                                                                            r540s001a 6/29/06 08:59:13"
i_local_nodeid 0, i_local_siteid -1, my_handle 1
ml_idx[1]=0     ml_idx[2]=1
There are 0 events on the Ibcast queue
There are 0 events on the RM Ibcast queue
CLversion: 9
local node vrmf is 5400
cluster fix level is "0"
The following timer(s) are currently active:
Current DNP values
DNP Values for NodeId - 1  NodeName - n5101l01
    PgSpFree = 1849678  PvPctBusy = 2  PctTotalTimeIdle = 99.147538
DNP Values for NodeId - 2  NodeName - zd101l01
    PgSpFree = 2095773  PvPctBusy = 0  PctTotalTimeIdle = 98.956015
root@n5101l01:/root#
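
In scripts, usually only the "Current state:" line of this output is of interest. The following is a minimal sketch that extracts it, assuming the output layout shown above. The function name cluster_state is hypothetical, and the sample lines are copied from the listing above.

```shell
#!/bin/sh
# cluster_state: read "lssrc -ls clstrmgrES" style output on stdin and
# print the value of the first "Current state:" line.
cluster_state() {
    awk -F': ' '/^Current state:/ { print $2; exit }'
}

# Sample lines copied from the output above.
sample='Current state: ST_STABLE
CLversion: 9
local node vrmf is 5400'

state=$(printf '%s\n' "$sample" | cluster_state)
echo "cluster manager state: $state"
```

On a cluster node: lssrc -ls clstrmgrES | cluster_state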




76.5 Understanding Cluster Service Startup:
===========================================
 
You start cluster services on a node by executing the HACMP/ES /usr/es/sbin/cluster/etc/rc.cluster script. 
Or use the Start Cluster Services SMIT screen, described in this section. 

Using smitty:
-------------

To start the HACMP cluster (the HACMP Cluster Manager) on the cluster nodes, there are two methods.

1. The first method is the most convenient; however, it can only be used if rsh is enabled. It allows the 
Cluster Manager to be started on both nodes with a single command:

% smitty hacmp

Cluster System Management
-> HACMP Cluster Services
-> Start Cluster Services

 
2. Alternatively, it is possible to use a slightly different SMIT path to start the Cluster Manager 
on the local node. Of course, this requires logging into each node independently to activate both Cluster Managers. 

% smitty hacmp

Cluster Services

-> Start Cluster Services

Take the defaults and press <Enter>.


Using scripts:
--------------

The rc.cluster script initializes the environment required for HACMP/ES 
by setting environment variables and then calls the /usr/es/sbin/cluster/utilities/clstart script 
to start the HACMP/ES daemons. The clstart script is the HACMP/ES script that starts all the cluster services. 
The clstart script calls the SRC startsrc command to start the specified subsystem or group. 
The following figure illustrates the major commands and scripts called at cluster startup: 

rc.cluster -> clstart -> startsrc

The HACMP/ES daemons are started in the following order: 

-RSCT daemons (Group Services, Topology Services, then Event Management) 
-Cluster Manager 
-Cluster SMUX daemon 
-Cluster Information Program daemon (optional) 
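
To confirm which of the daemons listed above actually came up, the SRC status can be filtered. This is a minimal sketch, not part of HACMP: active_subsystems is a hypothetical helper, and it assumes lssrc-style input (one header line, then one line per subsystem with the status in the last column). The group name "cluster" in the usage comment is also an assumption.

```shell
# Hypothetical helper: print the names of subsystems whose status is "active".
# Input: lssrc-style text on stdin (header line first, status in last column).
active_subsystems() {
    awk 'NR > 1 && $NF == "active" { print $1 }'
}

# Typical use on a cluster node (group name is an assumption):
#   lssrc -g cluster | active_subsystems
```

Reading from stdin keeps the helper independent of the exact lssrc flags used on a given node.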

Using the C-SPOC utility, you can start cluster services on any node (or on all nodes) in a cluster 
by executing the C-SPOC /usr/es/sbin/cluster/sbin/cl_rc.cluster command on a single cluster node. 
The C-SPOC cl_rc.cluster command calls the rc.cluster command to start cluster services on the nodes specified 
from the one node. The nodes are started in sequential order, not in parallel. The output of the command 
run on the remote node is returned to the originating node. Because the command is executed remotely, 
there can be a delay before the command output is returned. 

The following example shows the major commands and scripts executed on all cluster nodes when cluster 
services are started in clusters using the C-SPOC utility. 


        NODE A           NODE B  
        cl_rc.cluster
             |        \rsh
             |         \
           rc.cluster    rc.cluster 
             |             | 
             |             |
           clstart        clstart
             |             |
             |             |
           startsrc       startsrc


-- Automatically Restarting Cluster Services 
You can optionally have cluster services start whenever the system is rebooted. If you specify the -R flag 
to the rc.cluster command, or specify "restart or both" in the Start Cluster Services SMIT screen, 
the rc.cluster script adds the following line to the /etc/inittab file. 

hacmp:2:wait:/usr/es/sbin/cluster/etc/rc.cluster -boot> /dev/console 2>&1 
# Bring up Cluster 

At system boot, this entry causes AIX to execute the /usr/es/sbin/cluster/etc/rc.cluster script to start HACMP/ES. 
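
Before rebooting a node after maintenance, it can help to check whether such an entry is present. The following is a sketch only: starts_cluster_at_boot is a hypothetical helper, and it assumes that inittab comment lines start with a colon.

```shell
# Hypothetical helper: succeed if the given inittab file contains an active
# entry that runs the rc.cluster script.
starts_cluster_at_boot() {
    inittab_file=${1:-/etc/inittab}
    # Skip comment lines (assumed to start with ":"), then look for rc.cluster.
    grep -v '^:' "$inittab_file" 2>/dev/null | grep -q '/cluster/etc/rc\.cluster'
}

# Typical use:
#   if starts_cluster_at_boot ; then echo "cluster services start at boot" ; fi
```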

WARNING: If cluster services are set to restart automatically at boot time, you may face 
problems with node integration after a power failure and restoration. Automatic restart also means you 
cannot test a node after maintenance work before it rejoins the cluster. 

-- Starting Cluster Services with IP Address Takeover Enabled 
If IP address takeover is enabled, the /usr/es/sbin/cluster/etc/rc.cluster script calls the /etc/rc.net script 
to configure and start the TCP/IP interfaces and to set the required network options. 

-- Editing the rc.cluster File to Turn Deadman Switch Off 
In HACMP/ES, the Deadman Switch (DMS) is controlled by RSCT Topology Services. If, in a rare case, you want 
to turn the DMS off, you must edit the rc.cluster file as follows: 

There is a -D flag in clstart, located in /usr/es/sbin/cluster/utilities 
In the /usr/es/sbin/cluster/etc/rc.cluster file, find a call to "clstart" at about line #486. 
Edit this call to include the -D flag. 


76.6 Understanding Stopping Cluster Services:
=============================================
 
You stop cluster services on a node by executing the HACMP/ES /usr/es/sbin/cluster/utilities/clstop script. 
Use the HACMP for AIX Stop Cluster Services SMIT screen, described in the section Stopping Cluster Services 
to build and execute this command. The clstop script stops an HACMP/ES daemon or daemons. The clstop script 
stops all the cluster services or individual cluster services by calling the SRC command stopsrc. 

The following figure illustrates the major commands and scripts called at cluster shutdown: 

clstop -> stopsrc

Using the C-SPOC utility, you can stop cluster services on a single node or on all nodes in a cluster 
by executing the C-SPOC /usr/es/sbin/cluster/sbin/cl_clstop command on a single node. The C-SPOC cl_clstop 
command performs some cluster-wide verification and then calls the clstop command to stop cluster services 
on the specified nodes. The nodes are stopped in sequential order, not in parallel. The output of the command 
run on the remote node is returned to the originating node. Because the command is executed remotely, 
there can be a delay before the command output is returned. 

        NODE A           NODE B  
        cl_clstop
             |       \rsh
             |        \
           clstop       clstop
             |             | 
             |             |
           stopsrc      stopsrc



Starting and stopping using smitty:

To start cluster services, use

smit cl_admin -> Manage HACMP Services -> Start Cluster Services

To stop cluster services, use

smit cl_admin -> Manage HACMP Services -> Stop Cluster Services


76.7 Resource Groups:
=====================

If you consider the question of how the failover node takes control of a Resource Group, there are
the following options:

- Cascading resource groups:
  It defines a list of all the nodes that can control the Resource Group, and each node has a takeover
  priority. In case of a failure of the active node, the highest priority node acquires the Resource Group.
  If that node is unavailable, the next-highest node takes over, and so on.
  There are some variations for the case where a lower-priority node has taken over an RG, but a
  higher-priority node becomes available again. It's possible to define a Cascading method with fallback
  to the higher node, or to define it without fallback (CWOF).

- Rotating resource groups:
  
- Concurrent Access resource groups
- Custom Access resource groups
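
The cascading takeover rule described above (highest-priority available node wins) can be sketched in a few lines of shell. This is an illustration of the selection logic only, not how the Cluster Manager implements it. pick_takeover_node and the node names are hypothetical.

```shell
# Hypothetical sketch of the cascading rule: walk the node list in priority
# order (highest first) and print the first node that is available.
# $1 = space-separated priority list, $2 = space-separated available nodes.
pick_takeover_node() {
    for candidate in $1; do
        for up in $2; do
            if [ "$candidate" = "$up" ]; then
                echo "$candidate"
                return 0
            fi
        done
    done
    return 1   # no node in the priority list is available
}

# Example: nodeA (the active node) has failed and nodeB is down as well,
# so the next node in the priority list, nodeC, is selected.
pick_takeover_node "nodeA nodeB nodeC" "nodeC nodeD"
```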




76.8 Cluster logfiles:
======================

HACMP for AIX scripts, daemons, and utilities write messages to the log files shown below.

Log file name		Description 

/var/adm/cluster.log 	Contains time-stamped, formatted messages generated by HACMP for AIX scripts and daemons. 
			In this log file, there is one line written for the start of each event, and one line written 
			for the completion. 
/tmp/hacmp.out 		Contains time-stamped, formatted messages generated by the HACMP for AIX scripts. 
			In verbose mode, this log file contains a line-by-line record of each command executed 
			in the scripts, including the values of the arguments passed to the commands. By default, 
			the HACMP for AIX software writes verbose information to this log file; however, you can 
			change this default. Verbose mode is recommended. 
system error log 	Contains time-stamped, formatted messages from all AIX subsystems, including the HACMP 
			for AIX scripts and daemons. 

/usr/sbin/cluster/history/cluster.mmdd
			Contains time-stamped, formatted messages generated by the HACMP for AIX scripts. 
			The system creates a new cluster history log file every day that has a cluster event 
			occurring. It identifies each day's file by the file name extension, where mm indicates 
			the month and dd indicates the day. 
/tmp/cm.log 		Contains time-stamped, formatted messages generated by HACMP for AIX clstrmgr activity. 
			Information in this file is used by IBM Support personnel when the clstrmgr is in debug mode. 
			Note that this file is overwritten every time cluster services are started; 
			so, you should be careful to make a copy of it before restarting cluster services on a 
			failed node. 
/tmp/cspoc.log 		Contains time-stamped, formatted messages generated by HACMP for AIX C-SPOC commands. 
			Because the C-SPOC utility lets you start or stop the cluster from a single cluster node, 
			the /tmp/cspoc.log is stored on the node that initiates a C-SPOC command. 
/tmp/dms_logs.out 	Stores log messages every time HACMP for AIX triggers the deadman switch. 
/tmp/emuhacmp.out 	Contains time-stamped, formatted messages generated by the HACMP for AIX Event Emulator. 
			The messages are collected from output files on each node of the cluster, and cataloged 
			together into the /tmp/emuhacmp.out log file. In verbose mode (recommended), this log file 
			contains a line-by-line record of every event emulated. Customized scripts within the event 
			are displayed, but commands within those scripts are not executed. 

/var/hacmp/clverify/clverify.log
			Contains the messages generated when cluster verification has run.
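
Because /tmp/cm.log is overwritten each time cluster services start, a small helper can keep a copy before a restart. This is a sketch under assumptions: save_log_copy is a hypothetical function, and the destination directory in the usage comment is only an example.

```shell
# Hypothetical helper: copy a log file into a directory under a time-stamped
# name and print the path of the copy.
# $1 = log file to preserve, $2 = existing destination directory.
save_log_copy() {
    src=$1
    dest_dir=$2
    stamp=$(date +%Y%m%d%H%M%S)
    dest="$dest_dir/$(basename "$src").$stamp"
    cp -p "$src" "$dest" && echo "$dest"
}

# Typical use before restarting cluster services on a failed node:
#   save_log_copy /tmp/cm.log /var/tmp
```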





76.9 Oracle 10g, Oracle RAC 10g, and HACMP:
===========================================

Note 1:
-------

thread:

Q:

Hi Guys, I need some technical guidance regarding HACMP and Oracle Clusterware. I am designing an 
Oracle maximum Availability architecture for a client on 4 Nodes of IBM 570 PSeries servers on 
Oracle 10G RAC. The configuration includes IBM HACMP and Oracle Clusterware. Now I need to know if I can 
fully rely on Oracle Clusterware as my Clusterware or I can configure both IBM HACMP and Oracle Clusterware 
for some services. Can these two clusterware coexist? 

A:

1) HACMP and Oracle Clusterware can co-exist
2) HACMP is optional
3) Oracle Clusterware is required for RAC whether or not you use HACMP.

A:

Yes they can co-exist. But my question is why complicate things. You cannot have a RAC cluster without 
the Oracle Clusterware. Meaning if you install HACMP you will have to install Oracle Clusterware also 
on top of this. Why complicate the stack... keep it simple. We have been using Oracle Clusterware on AIX 
without HACMP without any issues so far.


thread:

Q:

Any suggestions on how to provide a cold failover solution on two P5 
Series boxes with an Oracle database? With RAC being pricey, I don't 
think our business will be open to purchasing RAC licenses. Our UNIX 
Admin is adamant about using HACMP. Without RAC in place, how does HACMP 
interact with Oracle? From what I understand, since both nodes will be 
sharing the same disk storage, it should be as simple as starting the 
database on the second node with customized scripts in the event of a 
failure--is this true? HACMP apparently does some sort of export from the 
primary node to the secondary node in the event of a failure, then runs 
customized scripts to start applications, etc....Seems too simplistic to 
me--am I missing something? 

I've also heard that if RAC is used for a cold failover solution, then the 
price is discounted. 

I'm struggling with providing solutions to the business, knowing that new 
hardware and a network upgrade are going to incur a cost. 

Any thoughts, suggestions, etc would be much appreciated. 

A:





76.10 Other notes on HACMP:
===========================


Filesets and compatibility list HACMP versions - AIX versions:

Note 1:
-------

HACMP Version Compatibility Matrix 

http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD101347

Document Author:  
Shawn Bodily

Document ID: 
TD101347 

Doc. Organization: 
Advanced Technical Support 
 
Document Revised: 
03/06/2007 

Product(s) covered: 
HACMP 
 

Abstract: This document provides a HACMP Version Compatibility Matrix. 


HACMP Version  Supported?     AIX Level(s)            MISC
1.2            NO             3.2.5
2.1            NO             3.2.5
3.1.0          NO             3.2.5
3.1.1          NO             3.2.5
4.1.0          NO             4.1.X
4.1.1          NO             4.1.X
4.2            NO             4.1.4, 4.2.X
4.2.1          NO             4.1.5, 4.2.X
4.2.2          NO             4.1.5, 4.2.1, 4.3.X
4.3            NO             4.3.2, 4.3.3
4.3.1          NO             4.3.2, 4.3.3
4.4            NO             4.3.3
4.4.1          NO             4.3.3, 5.1
4.5            NO             5.1, 5.2
5.1            NO-09/01/2006  5.1, 5.2, 5.3
5.2            Y-9/30/2007    5.1, 5.2, 5.3
5.3            Y-9/30/2008    5.2 (ML4), 5.3 (ML2)    AIX 5.2: RSCT 2.3.6 or higher; AIX 5.3: RSCT 2.4.2 or higher
5.4            Yes            5.2 (TL8), 5.3 (TL4)    AIX 5.2: RSCT 2.3.9 or higher; AIX 5.3: RSCT 2.4.5 or higher
 
 
Cross Reference Chart 

                AIX 4.3.3  AIX 5.1  AIX 5.1(64-bit)  AIX 5.2  AIX 5.3
HACMP 4.5       No         Yes      No               Yes      No
HACMP/ES 4.5    No         Yes      Yes              Yes      No
HACMP/ES 5.1    No         Yes      Yes              Yes      Yes
HACMP/ES 5.2    No         Yes      Yes              Yes      Yes
HACMP/ES 5.3    No         No       No               Yes      Yes
HACMP/ES 5.4    No         No       No               Yes      Yes
 

Note 2:
-------

HACMP 5.1 requires:
- AIX 5L v5.1 ML5 with RSCT v2.2.1.30 or higher
- AIX 5L v5.2 ML2 with RSCT v2.3.1.0 or higher
- c-spoc vpath support requires SDD 1.3.1.3 or higher

HACMP 5.2:
AIX 
Each cluster node must have one of the following installed: 
AIX 5L v5.1 plus the most recent maintenance level (minimum ML 5) 
AIX 5L v5.2 plus the most recent maintenance level (minimum ML 2) 

HACMP 5.3 is supported on AIX 5.2 and 5.3
- AIX 5.2 ML06 or later with RSCT 2.3.6 or later
- AIX 5.3 ML02 or later with RSCT 2.4.2 or later


Note 3: HACMP FAQ:
------------------


I have installed HACMP, now what? 
 
Why does HACMP require so many subnets for IP address takeover? 
 
Does HACMP have any limits? 
 
How can I avoid the nameserver as a single point-of-failure? 
 
What is a config_too_long event? 
 
Do all cluster nodes need to be at the same version of HACMP and AIX 5L operating system? 
 
Why do I need a non-IP heartbeat network? 
 
Can I put different types of processors, communications adapters, or disk subsystems in the same cluster? 
 
What kinds of applications are best suited for a high availability environment? 
 
Can I use Etherchannel with HACMP? 
 
Can I use an existing Enhanced Concurrent Mode volume group for disk heartbeat? Or do I need to define a new one? 
 
 

Question: I have installed HACMP, now what?

Answer: Before HACMP can manage and keep your application highly available, you need to tell HACMP about 
your cluster and the application. There are 4 steps:

Step 1) Define the nodes that will keep your application highly available

The local node (the one where you are configuring HACMP) is assumed to be one of the cluster nodes 
and you must give HACMP the name of the other nodes that make up the cluster. Just enter a hostname or IP address 
for each node. 

Step 2) Define the application you want to keep highly available 
There are 3 things you need to tell HACMP about the application: 
name-provide a name 
start script-specify a script for HACMP to use to start the application 
stop script-specify a script for HACMP to use to stop the application 

Step 3) Verify and synchronize the cluster 
HACMP will discover all the networks and disks connected to the nodes. A verification step will ensure 
that the cluster configuration will be able to keep the application highly available. When successful the 
configuration will be copied to the rest of the nodes in the cluster. 

Step 4) Manage the application 
When you start HACMP it will begin managing the application and keeping it highly available. You can also use 
the maintenance facilities provided by HACMP to move the application between nodes for maintenance purposes. 

To see just how easy it is to configure HACMP, look for Using the SMIT Assistant in Chapter 11 of the 
Installation Guide. View the online documentation for HACMP. HACMP for Linux does not include the advanced 
discovery and verification features available on AIX 5L. When configuring HACMP for Linux you must manually 
define the cluster, networks and network interfaces. Any changes to the configuration require HACMP for Linux 
to be restarted on all nodes. 


Question: Why does HACMP require so many subnets for IP address takeover?

Answer: HACMP (using RSCT) determines adapter state by sending heartbeats across a specific network interface;
as long as heartbeat messages can be sent through an interface, the interface is considered alive. 
Prior to AIX 5L V5, AIX did not allow more than one interface to own a subnet route but in AIX 5L V5.1 multiple 
interfaces can have a route to the same subnet. This is sometimes referred to as multipath routing or 
route striping and when this situation exists, AIX 5L will multiplex outgoing packets destined for a particular 
subnet across all interfaces with a route to that subnet. This interferes with RSCT's ability to reliably 
send heartbeats to a specific interface. Therefore the subnetting rules for boot, service and persistent labels 
are such that there will never be a duplicate subnet route created by the placement of these addresses.

HACMP V5 includes a new feature whereby you may be able to avoid some of the subnet requirements 
by configuring HACMP to use a different set of IP alias addresses for heartbeat. With this feature you provide 
a base or starting address and HACMP calculates a set of addresses in proper subnets. When cluster services 
are active, HACMP adds these addresses as IP alias addresses to the interfaces and then uses these alias 
addresses exclusively for heartbeat traffic. You can then assign your "regular" boot, service and persistent 
labels in any subnet, but be careful: although this feature avoids multipath routing for heartbeat, 
multipath routing may adversely affect your application. Heartbeat via IP Aliasing is discussed in Chapter 2 
of the Concepts and Facilities Guide and Chapter 3 of the Administration and Troubleshooting Guide. 
View the online documentation for HACMP.


Question: Does HACMP have any limits?

Answer: The functional limits for HACMP (e.g. number of nodes and networks) can be found in Chapter 1 
of the Planning and Installation Guide. View the online documentation for HACMP.


Question: How can I avoid the nameserver as a single point-of-failure?

Answer: 1) Make the nodes look at /etc/hosts first before the nameserver by creating a 
/etc/netsvc.conf file with the following entry:

hosts=local,bind 

where local tells it to look at /etc/hosts first and then the nameserver

2) Remove /etc/resolv.conf (or modify name to save it for later use) so it looks for name resolution 
in /etc/hosts first.

For information on updating the /etc/hosts file and nameserver configuration, see the Installation Guide. 
View the online documentation for HACMP. 
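
A start script or health check can verify that the resolver order from step 1 is in place. This is a minimal sketch: hosts_resolve_local_first is a hypothetical helper, and it assumes the first hosts= line in the file is the one that applies.

```shell
# Hypothetical check: succeed if the hosts= entry in a netsvc.conf-style file
# lists "local" (that is, /etc/hosts) as the first resolution method.
hosts_resolve_local_first() {
    conf_file=${1:-/etc/netsvc.conf}
    # Strip the "hosts=" key, then keep only the first comma-separated method.
    first=$(sed -n 's/^[[:space:]]*hosts[[:space:]]*=[[:space:]]*//p' "$conf_file" 2>/dev/null |
            head -n 1 | cut -d, -f1 | tr -d '[:space:]')
    [ "$first" = "local" ]
}

# Typical use:
#   hosts_resolve_local_first || echo "WARNING: nameserver is consulted first"
```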


Question: What is a config_too_long event?

Answer: The config_too_long event is an informational event run by HACMP whenever a cluster event runs 
for longer than a preset time. This can occur when:

- an AIX 5L command (e.g. fsck) is taking a long time to complete, or has hung 
- there was an unrecoverable error encountered; in this case there will be an "EVENT FAILED" indication 
  in hacmp.out 

If the config_too_long event is run, you should check the hacmp.out file to determine the cause and if manual 
intervention is required. For more information on recovery after an event failure, refer to Recover from HACMP 
Script Failure in Chapter 18 of the Administration and Troubleshooting Guide. 
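
When checking hacmp.out after a config_too_long event, the "EVENT FAILED" indication can be located with grep. The helper below is a sketch: failed_events is a hypothetical name, and nothing is assumed about the log format beyond the presence of that string.

```shell
# Hypothetical helper: print the lines of an hacmp.out-style log that carry
# the "EVENT FAILED" indication, each prefixed with its line number.
# Returns non-zero if no such line is found.
failed_events() {
    log_file=${1:-/tmp/hacmp.out}
    grep -n 'EVENT FAILED' "$log_file"
}

# Typical use:
#   failed_events /tmp/hacmp.out || echo "no EVENT FAILED indication found"
```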


Question: Do all cluster nodes need to be at the same version of HACMP and AIX 5L operating system?

Answer: No, though there are some restrictions when running mixed mode clusters.

Mixed levels of AIX 5L on cluster nodes do not cause problems for HACMP as long as the level of AIX 5L 
is adequate to support the level of HACMP being run on that node. All cluster operations are supported 
in such an environment. The HACMP install and update packaging will enforce the minimum level of AIX 5L 
required on each system.

Similarly for Linux on POWER, different levels of the operating system should not cause problems as long as 
the minimum supported version is installed. Mixing different platforms (AIX 5L, RedHat and SUSE) within the 
same cluster is not supported.

As a matter of practicality, it is recommended that all nodes be at the same levels of operating system 
and HACMP whenever possible. Keeping the operating system, HACMP and the application at the same level 
on all nodes will make the administration of the cluster easier and less error prone, and will go a long way 
towards reducing the frustration of the administrators. The Planning Guide has advice for effectively managing 
different installation and migration scenarios.


Question: Why do I need a non-IP heartbeat network?

Answer: The purpose of the non-IP heartbeat link is often misunderstood. The requirement comes from the following: 
HACMP heartbeats on IP networks are sent as UDP datagrams. This means that if a node or network is congested, 
the heartbeats can be discarded. If there were only IP networks, and if this congestion went on long enough, 
the node would be seen as having failed and HACMP would initiate a takeover. Since the node is still alive, 
HACMP takeover can cause both nodes to have the same IP address, and can cause the nodes to both try to own 
and access the shared disks. This situation is sometimes referred to as "split brain" or "partitioned cluster". 
Data corruption is all but inevitable in this circumstance.

HACMP therefore strongly recommends that there be at least one non-IP network connecting a node to at least one 
other node. For clusters with more than two nodes, the most reliable configuration includes two non-IP networks 
on each node. The distance limitations on non-IP links (particularly RS-232) have often made this requirement 
difficult to meet. For such clusters, HACMP disk heartbeating should be strongly considered. Disk heartbeating 
enables the easy creation of multiple non-IP networks without requiring additional hardware or software.


Question: Can I put different types of processors, communications adapters, or disk subsystems in the same cluster?

Answer: In general, yes, as long as the individual components are supported by HACMP. Note that there are some 
combinations which may not be reasonable or desirable. For example, putting two Ethernet adapters that run at 
different speeds on the same network will generally force all adapters on the network to run at the speed of 
the slower one. Likewise, having a low powered processor back up a high-powered processor may result in 
unacceptable performance should HACMP have to run the application on the lower powered one. (But see the 
questions on dynamic LPAR and CUoD for a way of dealing with this). As long as AIX 5L and the hardware support 
the interconnections, HACMP will support them as well.


Question: What kinds of applications are best suited for a high availability environment?

Answer: HACMP detects failures in the cluster then moves or restarts resources in order to keep the application 
highly available. For an application to work well in a high availability environment, the application itself 
must be capable of being managed (start, stop, restart) programmatically (no user intervention required) and must 
have no "hard coded" dependencies on specific resources. For example, if the application relies on the hostname 
of the server (and cannot dynamically accept a change in hostname), then it is practically impossible to 
restart the application on a backup server after a failure.

Question: Can I use Etherchannel with HACMP?

Answer: See Using Etherchannel with HACMP.


Question: Can I use an existing Enhanced Concurrent Mode volume group for disk heartbeat? 
Or do I need to define a new one?

Answer: To achieve the highest levels of availability under the widest range of failure scenarios, the best practice 
would be to configure one disk heartbeat connection per physical disk enclosure (or LUN).

The heartbeat operation itself involves reading and writing messages from a non-data area of the shared disk. 
Although the space used for heartbeat messages does not decrease the space available for the application 
(it is in the reserved area of the disk) there is some overhead when the disk seeks back and forth between 
the reserved area and the application data area.

If you configure the disk heartbeat path using the same disk and vg as is used by the application, the best practice 
is to select a disk which does not have frequently accessed or performance critical application data: 
although the disk heartbeat overhead is small (2-4 seeks/sec), it could potentially impact application performance or,
conversely, excess application access could cause the disk hb connection to appear to go up and down.

Ultimately the decision of which disk and volume group to use for heartbeat depends on what makes sense for 
your shared disk environment and management procedures. For example, using a separate vg just for heartbeat 
isolates the heartbeat from the application data, but adds another volume group that has to be maintained 
(during upgrades, changes, etc) and consumes another LUN.

If you decide on a separate vg for heartbeat, it does not need to be included in an HACMP resource group. 
However, the CSPOC utilities use a resource group node list as the set of nodes on which to perform operations: 
including the vg in a resource group with just the (sub)set of nodes connected to the disk will let you take 
advantage of the CSPOC functions. You can also define and use a disk which is not part of any volume group, 
though such a setup would have to be manually configured and maintained.

   

Note 5:
-------

thread:

Q:

Hi, 

I'm trying to varyon a concurrent vg, but I receive this 
error: 

root@dgij:/ > varyonvg -nc hb_vg 
srcsrqt failed errno : SRC_NSVR 
Subsystem [gsclvmd] is not active 
tellclvmd: request failed rc = -9036 [SRC_NSVR] 
0516-1334 varyonvg: The command /usr/sbin/tellclvmd 
returned an error. 

I tried to start the gsclvmd subsystem, but I receive 
this error in errpt: 

A:

You need to install the bos.clvm.rte fileset from the HACMP CD in order to make HACMP start the gsclvmd service 




Note 6: Monitoring an HACMP Cluster:
------------------------------------

HACMP provides the following tools for monitoring a cluster:
- HAView 
- Cluster Monitoring with Tivoli
- the "/usr/es/sbin/cluster/clstat" command
- WebSMIT clstat 


# /usr/es/sbin/cluster/clstat -a -o

clstat - HACMP Cluster Status Monitor
-------------------------------------

Cluster: manet_monet (1089262563)
Fri Jul 9 14:53:04 2004
State: UP Nodes: 2
SubState: STABLE

Node: manet State: UP
Interface: manete_boot (0) Address: xxx.xxx.xxx.xxx
State: DOWN
Interface: manete_stby (0) Address: xxx.xxx.xxx.xxx
State: UP
Interface: manet_tty0_01 (1) Address: 0.0.0.0
State: UP
Interface: manete_rep_svc (0) Address: xxx.xxx.xxx.xxx
State: UP
Resource Group: cas1 State: On line

Node: monet State: UP
Interface: monete_boot (0) Address: xxx.xxx.xxx.xxx
State: UP
Interface: monete_stby (0) Address: xxx.xxx.xxx.xxx
State: UP
Interface: monet_tty0_01 (1) Address: 0.0.0.0
State: UP


Other example:

clstat monitors the status of an IBM HACMP cluster.

To monitor the status of HACMP in a terminal (ASCII mode):

root@n5101l01:/root#clstat -a -o

                clstat - HACMP Cluster Status Monitor
                -------------------------------------

Cluster: scenter_pr     (1192098110)
Thu Oct 25 10:21:31 2007
                State: UP               Nodes: 2
                SubState: STABLE

        Node: n5101l01          State: UP
           Interface: n5101l01-boot (2)         Address: 10.17.4.11
                                                State:   UP
           Interface: n5101l01_hb01 (0)         Address: 0.0.0.0
                                                State:   UP
           Interface: n5101l01_hb02 (1)         Address: 0.0.0.0
                                                State:   UP
           Interface: sonriso (2)               Address: 10.17.3.100
                                                State:   UP
           Resource Group: scenter_pr                   State:  On line

        Node: zd101l01          State: UP
           Interface: zd101l01-boot (2)         Address: 10.17.4.10
                                                State:   UP
           Interface: zd101l01_hb01 (0)         Address: 0.0.0.0
                                                State:   UP
           Interface: zd101l01_hb02 (1)         Address: 0.0.0.0
                                                State:   UP
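
For scripted monitoring, the clstat output shown above can be scanned for interfaces that are not UP. This is a sketch that assumes the layout of the two samples in this note (an "Interface:" line followed by a "State:" line); interfaces_not_up is a hypothetical helper.

```shell
# Hypothetical helper: read clstat-style text on stdin and print
# "<interface> <state>" for every interface whose state is not UP.
interfaces_not_up() {
    awk '
        $1 == "Interface:" { ifname = $2; next }
        # Only a State: line that follows an Interface: line is considered;
        # the cluster-level State: line is skipped because ifname is empty.
        $1 == "State:" && ifname != "" {
            if ($2 != "UP") print ifname, $2
            ifname = ""
        }
    '
}

# Typical use (flags as in the examples above):
#   /usr/es/sbin/cluster/clstat -a -o | interfaces_not_up
```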






Note 6: Starting and stopping GPFS:
-----------------------------------

Before starting GPFS, ensure you have: 

- Verified the installation of all prerequisite software. 
- Verified there is no conflicting software installed. 
- Properly configured and tuned your system for use by GPFS. This should be done prior to configuring GPFS. 
- Completed all of the GPFS configuration considerations including running the mmconfig command.
  For details see the General Parallel File System for AIX 5L in an HACMP Cluster: Concepts, Planning, 
  and Installation Guide. 

If you are using the Data Management API (DMAPI) for GPFS to manage the data in your file system, you may 
customize the shell script gpfsready to synchronize the initialization of the GPFS daemon and the data management 
application. The script is invoked by the GPFS daemon as file systems are starting to be mounted, and can be used 
to verify the data management application is ready to handle mount events from the file system. For further 
information regarding the script, see the General Parallel File System: Data Management API Guide and search 
for initializing the Data Management application.

Start the daemons on all of the nodes in the nodeset by issuing the mmstartup command:

# mmstartup -C set1

Check the messages recorded in /var/adm/ras/mmfs.log.latest on one node for verification:

mmfsd initializing ...
GPFS: 6027-300 mmfsd ready

This indicates successful start-up of a quorum of nodes.
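
The log check can be scripted by looking for the ready message quoted above. This is a sketch only: gpfs_daemon_ready is a hypothetical helper, and it assumes the message text in the log matches the sample exactly.

```shell
# Hypothetical helper: succeed if the given GPFS log contains the
# "6027-300 mmfsd ready" message.
gpfs_daemon_ready() {
    log_file=${1:-/var/adm/ras/mmfs.log.latest}
    grep -q '6027-300 mmfsd ready' "$log_file" 2>/dev/null
}

# Typical use after mmstartup:
#   gpfs_daemon_ready || echo "mmfsd has not reported ready yet"
```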

If GPFS does not start, see the General Parallel File System for AIX 5L in an HACMP Cluster: 
Problem Determination Guide and search for "the GPFS daemon will not come up".

See the mmstartup Command for complete usage information.

If it becomes necessary to stop GPFS, you can do so from the command line by issuing the mmshutdown command:

# mmshutdown -C set1

The system displays information similar to:

Wed Aug 16 17:27:01 EDT 2000: 6027-1341 mmshutdown: Starting force unmount of GPFS file systems
k145n08:  forced unmount of /fs2
k145n08:  forced unmount of /fs1
k145n05:  forced unmount of /fs2
k145n05:  forced unmount of /fs1
Wed Aug 16 17:27:06 EDT 2000: 6027-1344 mmshutdown: Shutting down GPFS daemons
k145n08:  Shutting down!
k145n08:  0513-044 The mmfs Subsystem was requested to stop.
k145n05:  Shutting down!
k145n05:  0513-044 The mmfs Subsystem was requested to stop.
Wed Aug 16 17:27:10 EDT 2000: 6027-1345 mmshutdown: Finished

See the mmshutdown Command for complete usage information.



Note 7: Other remarks:
----------------------

7.1.
----

The main HACMP / RSCT services are:

- clcomdES	Cluster communication daemon
- clstrmgrES	Cluster manager
- clinfoES	Cluster information daemon

- rmcd		RSCT resource Monitoring and Control daemon 
- hatsd		RSCT Topology Services
- hagsd		RSCT group services

The following lines are added to inittab when you initially install HACMP. 

- This entry starts the clcomdES and clstrmgrES subsystems if they are not already running:

hacmp:2:once:/usr/es/sbin/cluster/etc/rc.init > /dev/console 2>&1

- If HACMP is configured for IP address takeover:

harc:2:wait:/usr/es/sbin/cluster/etc/harc.net #HACMP network startup



To start the cluster communication daemon:

# startsrc -s clcomdES


7.2.
----

To install HACMP:

From installation media

# smitty install_all


7.3.
----

The most commonly used shared storage in HACMP is:

> Fiber Attach Storage Server (FAStT)
> Enterprise Storage Servers (ESS/Shark)
> Serial Storage Architecture (SSA)

Devices supported:

- Traditional SCSI disks and enclosures
- SSA disks and enclosures
- FAStT / DS4xxx storage servers
- 2105 Enterprise Storage Servers and DS8xxx and DS6xxx
- Some 3rd party storage devices


The "cluster.es" and "cluster.cspoc" images, which contain the HACMP runtime executables, 
are required and must be installed on all servers.


7.4.
----

Dynamic Node Priority:

# lssrc -ls clstrmgrES
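
The DNP section of this output (see the "Current DNP values" sample earlier in these notes) can be reduced to one line per node. This is a sketch that assumes the two-line layout of that sample; dnp_idle_by_node is a hypothetical helper.

```shell
# Hypothetical helper: read "lssrc -ls clstrmgrES"-style text on stdin and
# print "<node name> <PctTotalTimeIdle>" for each node in the DNP section.
dnp_idle_by_node() {
    awk '
        $1 == "DNP" && $2 == "Values" { node = $NF; next }
        node != "" && /PctTotalTimeIdle/ { print node, $NF; node = "" }
    '
}

# Typical use:
#   lssrc -ls clstrmgrES | dnp_idle_by_node
```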


7.5.
----

Shared Logical Volume:

Shared logical volume access can be made available in any of the following data accessing modes:

- Non-concurrent access mode
- Concurrent access mode
- Enhanced concurrent access mode

In a non-concurrent access configuration, only one cluster node can access the shared data at a time.
If the resource group containing the shared disk space moves to another node, the new node will activate
the disks and check the current state of the volume groups, logical volumes, and filesystems.

In a concurrent access configuration, data on the disks is available to all nodes concurrently.
Concurrent access mode is not supported for filesystems; 
instead you must use raw logical volumes or physical disks.

7.6.
----

>> Is my shared volume group online?

The following sequence will determine if the sharedvg volume group is currently online (often useful 
in application start scripts): 

if lsvg -o | grep -q -w sharedvg ; then
    echo sharedvg is online
else
    echo sharedvg is offline
fi


Note the use of the -w option on the grep invocation - this ensures that if you have a sharedvg and a sharedvg2 
volume group then the grep only finds the sharedvg line (if it exists). 
If you need to do something if the volume group is offline and don't need to do anything if it is online 
then use this: 

if lsvg -o | grep -q -w sharedvg ; then
    :	# null command if the volume group is online
else
    echo sharedvg is offline
fi


Some people don't like the null command in the above example. They may prefer the following alternative: 

lsvg -o | grep -q -w sharedvg
if [ $? -ne 0 ] ; then
    echo sharedvg is offline
fi

Although we're not particularly keen on the null command in the first approach, we really don't like the use 
of $? in if tests since it is far too easy for the command generating the $? value to become separated from the 
if test (a classic example of how this happens is if you add an echo command immediately before the if command 
when you're debugging the script). If we find ourselves needing to test the exit status of a command in an if test 
then we either use the command itself as the if test (as in the first approach) or we do the following: 

lsvg -o | grep -q -w sharedvg
rval=$?
if [ $rval -ne 0 ] ; then
    echo sharedvg is offline
fi

In our opinion (yours may vary), this makes it much more obvious that the exit status of the grep command is 
important and must be preserved. 
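
Another way to keep the test and the command together is to wrap them in a small function. 
This is only a sketch: vg_is_online is a made-up helper name, and it relies on the same 
"lsvg -o | grep -q -w" test used above.

```shell
# Sketch only: vg_is_online is a hypothetical helper name.
# The function's exit status is the exit status of grep, so it can be
# used directly as an if test and never gets separated from its command.
vg_is_online() {
    lsvg -o | grep -q -w "$1"
}

# Usage in an application start script:
#   if vg_is_online sharedvg ; then
#       echo sharedvg is online
#   else
#       echo sharedvg is offline
#   fi
```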



>> Starting a non-root process from within an application start script


A common requirement in an application start script is the need to start a program and/or shell script 
which is to be run by a non-root userid. This snippet does the trick: 

su - dbadmin -c "/usr/local/db/startmeup.sh"

This will run the startmeup.sh script in a process owned by the dbadmin user. Note that it is possible 
to pass parameters to the script/program as well: 


su - dbadmin -c "/usr/local/db/startmeup.sh PRODDB"

This runs the startmeup.sh script with a parameter indicating which database is to be started. 
A bit of formalism never hurts when it comes time later to do script maintenance. For example, use shell variables 
to specify the username and the command to be invoked: 

DBUSER=dbadmin
DBNAME=PRODDB
STARTCMD="/usr/local/db/startmeup.sh $DBNAME"
su - $DBUSER -c "$STARTCMD"

This makes it easy to change the username, database name or start command (this is particularly important 
if any of these appear more than once within the application start script). 
The double quotes around $STARTCMD in the su command are necessary as the command to be executed must be passed 
as a single parameter to the su command's -c option. 


>> Note 1: Killing processes owned by a user

A common requirement in application stop scripts is the need to terminate all processes owned by a 
particular user. The following snippet terminates all processes owned by the dbadmin user (this could be part 
of an application stop script that corresponds to the previous snippet that started the DB as dbadmin). 

DBUSER=dbadmin
kill ` ps -u $DBUSER -o pid= `

Since a simple kill is rarely enough and a kill -9 is a rather rude way to start a conversation, the following 
sequence might be useful: 

DBUSER=dbadmin
kill ` ps -u $DBUSER -o pid= `
sleep 10
kill -9 ` ps -u $DBUSER -o pid= `

To see how this works, just enter the ps command. It produces output along these lines: 
12276
12348

Note that the equal sign in the pid= part is important as it eliminates the normal PID title which would appear 
at the top of the column of output. I.e. without the equal sign, you'd get this: 

  PID
12276
12348

Passing the literal title "PID" to the kill command is a bad idea: kill would complain about it on every run, 
and scripts which routinely produce error messages make it much more difficult to know if things are working correctly. 
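
The "ask, wait, then force" sequence can also be put in a function that takes the PID list as arguments. 
This is a sketch under assumptions: terminate_pids is a made-up name, and the grace period is whatever 
you pass in. It skips the kill entirely when the PID list is empty and only sends kill -9 to processes 
that are still there after the grace period.

```shell
# Sketch only: terminate_pids is a hypothetical helper name.
# Usage: terminate_pids GRACE_SECONDS PID...
terminate_pids() {
    grace=$1
    shift
    # nothing to do if no PIDs were passed (avoids a kill usage error)
    [ $# -gt 0 ] || return 0
    # polite request first (default signal is TERM)
    kill "$@" 2>/dev/null
    sleep "$grace"
    for pid in "$@" ; do
        # kill -0 sends no signal, it only tests whether the process still exists
        if kill -0 "$pid" 2>/dev/null ; then
            kill -9 "$pid" 2>/dev/null
        fi
    done
    return 0
}

# Usage in an application stop script:
#   DBUSER=dbadmin
#   terminate_pids 10 ` ps -u $DBUSER -o pid= `
```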


>> A more complete example of an application stop script

A common requirement in application stop scripts is the need to terminate all processes owned by a particular user. 
For example, a script along the following lines could be used to first gently and then forcibly terminate 
the database processes started in the previous example: 

#!/bin/ksh

DBUSER=dbadmin
STOPCMD="/usr/local/db/stopdb.sh"

# ask nicely
su - $DBUSER -c "$STOPCMD"

# wait twenty seconds and then get rude
sleep 20
kill ` ps -u $DBUSER -o pid= `

# wait ten more seconds and then get violent
sleep 10
kill -9 ` ps -u $DBUSER -o pid= `

# terminate any processes using our two shared filesystems
fuser -k /dev/sharedlv1
fuser -k /dev/sharedlv2

# make sure that our exit status is 0
exit 0


Good HACMP site: http://www.coredumps.de/doc/ibm/cluster/HAES/haes/gtoc.html



Note 8: Internal logic error in Group Services d
================================================


>> thread:


Q:

Hello,

I get below grpsvcs errors on my cluster nodes (HACMP 5.4 - cluster is UP and STABLE):

463A893D 0425054607 P O grpsvcs Internal logic error in Group Services d


A:

ok, I think I have found it - it is a bug in rsct 2.4.6 and can be fixed by installing the fix for APAR IY91960

http://www-1.ibm.com/support/docview.wss?uid=isg1IY91960

it is:
rsct.basic.rte.2.4.6.3.bff
rsct.core.hostrm.2.4.6.1.bff
rsct.core.rmc.2.4.6.3.bff
rsct.core.sec.2.4.6.3.bff
rsct.core.sensorrm.2.4.6.1.bff
rsct.core.utils.2.4.6.3.bff
rsct.opt.saf.amf.2.4.6.1.bff
rsct.opt.storagerm.2.4.6.3.bff


>> thread:


APAR IY26257

APAR status
Closed as program error.

Error description 

detection of bouncing nodes is too slow
Local fix 

Problem summary 
When the RSCT Topology Services daemon exits in one node, it
takes a finite time for the node to be detected as down by
the other nodes on each of the networks being monitored.
This happens because the other nodes need to go through a
process of missing incoming heartbeats from the given node,
and can only declare the node down after enough heartbeats
are missed. If a new instance of the daemon is started
then it is possible for the old instance to be still
thought as alive by other nodes by the time the new
instance starts.

The current behavior may result in other nodes never
detecting the given node as down. This occurs especially
when different networks use different Topology Services
heartbeating tunable values -- which is often the case in
HACMP.

In HACMP/ES, if the cluster is stopped in one node and is
then restarted quickly, it is possible for the cluster to
become "hung", with this node being unable to join the
others in the cluster. The following AIX error log entry
may appear:

  LABEL:          GS_DOM_NOT_FORM_WA
  IDENTIFIER:     AA8DB7B3
  Type:            INFO
  Resource Name:   grpsvcs
  Description: Group Services daemon has not been
               established.

Other nodes may present the entry below:

  LABEL:          GS_ERROR_ER
  IDENTIFIER:     463A893D

The problem is aggravated by the presence of HAGEO
networks, which have very large timeout values.
Problem conclusion 
A number of protocol changes were introduced into the RSCT
Topology Services daemon. With the changes

   - nodes where the Topology Services daemon exits are
     going to be detected as down faster than before the
     fix.

   - nodes where the Topology Services daemon exits and
     is restarted quickly are going to be detected as
     down soon after the new instance starts.

With the fix, error log entries like GS_DOM_NOT_FORM_WA
should no longer occur when restarting the HACMP cluster
on a node. In addition, because the demise of the previous
instance of the Topology Services daemon is detected
sooner, the new instance is allowed to join its adapter
memberships faster.
Temporary fix 
Comments 
APAR information 
APAR number IY26257 
Reported component name RSCT 
Reported component ID 5765D5101 
Reported release 121 
Status CLOSED PER 
PE NoPE 
HIPER NoHIPER 
Submitted date 2001-12-11 
Closed date 2001-12-11 
Last modified date 2003-10-03 




Note 9:
-------

Typical 2 node /etc/hosts file:

#
# HACMP - Do not modify!
#
10.17.4.11      n5101l01-boot.nl.eu.abnamro.com n5101l01-boot
10.17.4.10      zd101l01-boot.nl.eu.abnamro.com zd101l01-boot
10.17.3.59      n5101l01.nl.eu.abnamro.com n5101l01
10.17.3.51      zd101l01.nl.eu.abnamro.com zd101l01
10.17.3.100     sonriso.nl.eu.abnamro.com sonriso
#
# End of HACMP
#





======================================================================
79. Notes on Installation and Migration AIX, HP-UX, Linux:
======================================================================


79.1 Migrations AIX 5.1,AIX 5.2,AIX 5.3:
---------------------------------------

-- New and Complete Overwrite 
This method installs AIX 5.3 on a new machine or completely overwrites any BOS version that exists on your system. 
For instructions on installing AIX on a new machine or to completely overwrite the BOS on an existing machine, 
refer to Installing new and complete BOS overwrite or preservation.

-- Preservation 
This method replaces an earlier version of the BOS but retains the root volume group, the user-created logical volumes, 
and the /home file system. The system file systems /usr, /var, /tmp, and / (root) are overwritten. 
Product (application) files and configuration data stored in these file systems will be lost. 
Information stored in other non-system file systems will be preserved. 
For instructions on preserving the user-defined structure of an existing BOS, refer to Installing new and 
complete BOS overwrite or preservation.

-- Migration 
This method upgrades from AIX 4.2, 4.3, 5.1, or 5.2 versions of the BOS to AIX 5.3 (see the release notes 
for restrictions). The migration installation method is used to upgrade from an existing version or release 
of AIX to a later version or release of AIX. A migration installation preserves most file systems, 
including the root volume group, logical volumes, and system configuration files. It overwrites the /tmp file system. 


Installation Steps               New and Complete Overwrite   Preservation   Migration
Create rootvg                    Yes                          No             No
Create file system /,/usr,/var   Yes                          Yes            No
Create file system /home         Yes                          No             No
Save Configuration               No                           No             Yes
Restore BOS                      Yes                          Yes            Yes
Install Additional Filesets      Yes                          Yes            Yes
Restore Configuration            No                           No             Yes



Note 1:
-------

thread

Found that the prngd subsystem (used with ssh, random number generator) on AIX 5.1 is incompatible with the 
AIX 5.2 upgrade. BEFORE migration this subsystem should be disabled either in /etc/rc.local or erased completely: 
rmssys -s prngd

It has to be remade after migration (see customization). 
If prngd is not disabled, the final boot after 5.2 installation coredumps with 0C9 and the machine never recovers. 
In this case: 

Boot into maintenance mode (needs first 5.2 CD and SMS console) 
Limited function shell (or getrootfs) 
vi /etc/rc.local to disable prngd 

- Firmware/Microcode upgrade
It is wise to update the firmware/microcode of your system before upgrading the system. Check the IBM support 
site, or go directly to the ftp site. 
- Base system
Straightforward like installing from scratch. When asked, select "Migration" instead of "Overwrite" installation. 


Note 2:
-------

thread:

Problem creating users on AIX 5.2
Reply from tcarlson on 6/28/2007 6:14:00 PM  

It appears to be working now. Thanks for the replies. 
Not exactly sure what fixed it, but I ran usrck, grpck, and pwdck and it started working again. 


Note 3:
-------

AIX 5.2 Installation Tips (Doc Number=1612) 
  Fix Readme 

April 27, 2005 

--------------------------------------------------------------------------------
This document contains the latest tips for successful installation of AIX 5.2, and will be updated as new tips become available. 
APARs and PTFs mentioned in this document, when available, can be obtained from the following web site. 

http://www.ibm.com/servers/eserver/support/pseries/aixfixes.html 
http://www14.software.ibm.com/webapp/set2/sas/f/genunix3/aixfixes.html

The AIX installation CD-ROMs and the level of AIX pre-installed on new systems may not contain the latest fixes available at the time you install the system, and may contain errors. Some of these fixes may be critical to the proper operation of your system. We recommend that you update to the latest service level, which can be obtained from http://www.ibm.com/servers/eserver/support/pseries/aixfixes.html. 
The compare_report command, which is documented in the AIX Commands Reference, can be used to determine which available updates are newer than those installed on your system. 


--------------------------------------------------------------------------------

Reads from Frozen JFS2 Filesystem Hang 
System Firmware for POWER 4 Systems 
Critical Updates for 5200-04 
CD-ROM Installation of JS20 Appears to Hang 
oslevel -r Does not Indicate 5200-04 After Update 
Documentation Library Service Broken Links 
lppchk Error on /usr/lib/perl with HACMP Installed 
License Agreement Failures 
Possible Data Error on DR Memory Remove 
Possible File Corruption Running defragfs 
ksh: clean_up_was: not found 
Installation of devices.artic960 5.2 
ARTIC960Hx SDLC and Bisync Support 
Inventory Scout 
HMC 3.2.0 Upgrade 
TSM 5.1 Not Supported on AIX 5.2 
AIXlink/X.25 Version 2 on AIX 5.2 


--------------------------------------------------------------------------------

Reads from Frozen JFS2 Filesystem Hang
After application of the 5.2.0.60 level kernels (bos.mp, bos.mp64, bos.up), which are included on the 5/2005 Update CD and in the 5200-06 Recommended Maintenance package, reads from a frozen JFS2 filesystem will no longer be possible. All reads from a frozen filesystem will be blocked until the filesystem is thawed. Because of this, a filesystem level backup, such as a backup using the backup command or through TSM, will appear to hang until the filesystem is thawed. This restriction will be lifted in APAR IY70225. 
Backups using FlashCopy or similar logical volume or device level backups are still possible on a frozen filesystem. 



--------------------------------------------------------------------------------

System Firmware for POWER 4 Systems
The AIX 5200-04 (or later) Recommended Maintenance package exposes a problem in older levels of POWER 4 system firmware that can manifest itself as either a system hang or hang in some diagnostic commands. The commands that expose the problem include, but may not be limited to snap, lscfg, and lsresource. This problem can be resolved by installing the latest system firmware, available at the following web site. 

http://techsupport.services.ibm.com/server/mdownload/ 
http://www14.software.ibm.com/webapp/set2/firmware/gjsn


--------------------------------------------------------------------------------

Critical Updates for 5200-04
When installing the 5200-04 Recommended Maintenance package, it is also recommended that you install the following APARs. 
IY64978  Possible system hang while concurrently renaming and unlinking under JFS. This APAR is currently available.  
IY63366  Loader may fail to find symbol even though the symbol is present in the symbol table. This can cause applications that use dynamically loaded modules to fail. Prior to APAR availability, an emergency fix is available at: 
ftp://service.software.ibm.com/aix/efixes/iy63366/  


Systems running bos.rte.lvm 5.2.0.41 or later should install APAR IY64691. APAR IY64691 fixes a problem with the chvg -B command that can cause data corruption on Big volume groups which were converted from normal volume groups. Prior to APAR availability, obtain the emergency fix for APAR IY64691 from: 
ftp://service.software.ibm.com/aix/efixes/iy64691/ 

Systems running bos.rte.lvm 5.2.0.50 should install APAR IY65001. APAR IY65001 fixes a possible corruption issue with mirrored logical volumes. This APAR also contains the fix for APAR IY64691. Prior to APAR availability, obtain the emergency fix for APAR IY65001 from: 
ftp://service.software.ibm.com/aix/efixes/iy65001/ 

Systems running bos.rte.aio 5.2.0.50 should install APAR IY64737. APAR IY64737 fixes a problem where applications that use Asynchronous I/O (AIO) can cause a system hang. Prior to APAR availability, obtain the emergency fix for APAR IY64737 from: 
ftp://service.software.ibm.com/aix/efixes/iy64737/ 



--------------------------------------------------------------------------------

CD-ROM Installation of JS20 Appears to Hang
Following installation of packages from CD-ROM volume 1, the installation appears to hang for 45 to 60 minutes before prompting for volume 2. During this time, the installation process is verifying the installed packages. The problem will be fixed in a later level of the AIX installation media. 


--------------------------------------------------------------------------------

oslevel -r Does not Indicate 5200-04 After Update
After updating from the 8/2004 Update CD or the 5200-04 Recommended Maintenance package, the 'oslevel -r' command may not indicate 5200-04. This occurs when updating from a level lower than 5200-03 because the bos.pmapi.tools update does not install. Performing a second update from the media will install the bos.pmapi.tools update and correct the problem. 


--------------------------------------------------------------------------------

Documentation Library Service Broken Links
A search of the Documentation Library Service will return hits, but the links result in "404 Not Found" errors. This problem occurs with the sysmgt.websm filesets at the 5.2.0.30 level. This can be resolved using the following command, where $DOC_LANG is any language directory installed under /usr/HTTPServer/htdocs (example: de_DE, en_US, etc.). 
ln -sf /usr/share/man/info /usr/HTTPServer/htdocs/$DOC_LANG/doc_link 


--------------------------------------------------------------------------------

lppchk Error on /usr/lib/perl with HACMP Installed
After installing the cluster.es.server.rte fileset at the 5.2.0.0 level, the lppchk command may return an error due to a missing /usr/lib/perl link. The error can be resolved by doing a force overwrite install of perl.rte from the base AIX media, or by running the following command as root. 
ln -s /usr/opt/perl5/lib /usr/lib/perl 


--------------------------------------------------------------------------------

License Agreement Failures
Installation of some device packages using SMIT may fail due to license agreement failures. This is caused by a missing SMIT "ACCEPT new license agreements" option. To resolve this issue, first install APAR IY52152, which is included in bos.sysmgt.smit 5.2.0.30 or later. 


--------------------------------------------------------------------------------

Possible Data Error on DR Memory Remove
For systems running Dynamic Logical Partitioning (DLPAR), it is imperative that the fix for APAR IY50852 be installed. To determine if APAR IY50852 is installed on your system, use the command: 
instfix -ik IY50852 
When doing a dynamic reconfiguration memory remove operation with DMA operations ongoing, it is possible that data actively being DMAed to pages within the memory being removed may be misdirected to memory that is no longer active in the partition. This could result in a program reading wrong data. 



--------------------------------------------------------------------------------

Possible File Corruption Running defragfs
For systems using JFS2 filesystems, it is imperative that the fix for APAR IY50791 be installed. To determine if APAR IY50791 is installed on your system, use the command: 
instfix -ik IY50791 
When data is synced (written to disk) soon after running the defragfs command on a JFS2 filesystem, incomplete data can be written. Sync operations can be performed with the sync command, but are also performed when unmounting a filesystem or during a system shutdown or reboot. 
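
The two instfix checks above can be run in one pass. This is a sketch, not part of the IBM tips: 
check_apars is a made-up helper name, and it assumes that "instfix -ik" returns exit status 0 only when 
all filesets for the APAR are installed. Verify that on your own level of AIX before relying on it.

```shell
# Sketch only: check_apars is a hypothetical helper name.
# Assumption: "instfix -ik APAR" exits 0 only if the APAR is fully installed.
check_apars() {
    missing=0
    for apar in "$@" ; do
        if instfix -ik "$apar" >/dev/null 2>&1 ; then
            echo "$apar installed"
        else
            echo "$apar MISSING"
            missing=1
        fi
    done
    # non-zero exit status if at least one APAR is missing
    return $missing
}

# Usage:
#   check_apars IY50852 IY50791
```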



--------------------------------------------------------------------------------

ksh: clean_up_was: not found
When attempting to run the clean_up_was shell script as part of installing the Websphere Application Server 5.0.1 trial on the AIX 5.2 Bonus Pack (LCD4-1141-02), the following error may be encountered. 
	ksh: clean_up_was: not found.
This problem is caused by control characters (carriage returns, octal 015) at the end of each line within the script. To correct the script, use the following procedure: 
	mv clean_up_was clean_up_was.orig
	tr -d '\015' <clean_up_was.orig >clean_up_was
	chmod 755 clean_up_was


--------------------------------------------------------------------------------

Installation of devices.artic960 5.2
To successfully upgrade to devices.artic960 5.2 from a previous version, it is necessary to install APAR IY48642. 


--------------------------------------------------------------------------------

ARTIC960Hx SDLC and Bisync Support
The devices.pci.14108c00 fileset provides support for SDLC and bisynchronous protocols on the IBM ARTIC960Hx 4-Port Selectable PCI Adapter, (FC 2947). When combined with the installation of the devices.artic960 5.2.0.0 fileset, Enhanced Error Handling (EEH) support is provided. APAR IY44132 provides 64-bit support. 


--------------------------------------------------------------------------------

Inventory Scout
Inventory Scout introduces a new microcode management graphical user interface (GUI). This feature is available on your AIX system by installing an additional fileset, invscout.websm, onto the system, or if a Hardware Management Console (HMC) is attached, using the microcode update function. This GUI is a Web-based System Manager plug-in that surveys the microcode levels of the system, and on POWER4 systems, downloads and installs microcode. Inventory Scout continues to work with the applet found at https://techsupport.services.ibm.com/server/aix.invscoutMDS to survey only. 
This release of Inventory Scout significantly changes the method used to determine the microcode levels of systems, adapters, and devices to compare to the latest available levels. Previously, data was collected and sent to IBM to determine the current state of the system. 

The new microcode management feature does the following: 

- Downloads a catalog of currently available levels to the system being examined 
- Conducts a microcode survey on the system and compares to the latest available microcode 
- On the POWER4 systems, allows you to download and flash to the latest microcode available 

This new microcode survey procedure might cause some problems with some customer techniques used today for surveying systems and might require changes to those procedures. 

This microcode management feature relies on system features that were not present in previous generations of systems. Support for microcode on these systems is limited to survey only. For more information about microcode updates, see http://techsupport.services.ibm.com/server/mdownload. To enable this new Inventory Scout functionality, you will need the following filesets at the specified levels or higher: 

	invscout.com            2.1.0.1
	invscout.ldb            2.1.0.2
	invscout.rte            2.1.0.1
	invscout.websm          2.1.0.1
To obtain the required filesets, order APAR IY44381. Go to the following URL: 

http://www.ibm.com/servers/eserver/support/pseries/aixfixes.html 
If you are using this microcode management feature tool through the HMC, your HMC must be at Release 3, Version 2.2. This can be obtained by ordering APAR IY45844. 

The HMC code can be obtained from http://techsupport.services.ibm.com/server/hmc/. 

Known Problems: 

The following devices supported in POWER4 systems have limitations in the ability to update microcode with this microcode management feature. 
SCSI Enclosure Services (ses) Microcode for 7311-D20, 7038-6M2 & 7028-6C4/6E4 
7040-61D SCSI I/O Drawer 
PCI 4-Channel Ultra3 SCSI RAID Adapter 
CD-ROM and DVD-ROM Drives 
RAID Devices 
SSA devices and adapters 

For more information about these devices, see the Readme files at http://techsupport.services.ibm.com/server/mdownload. 


When updating system firmware from an HMC, the connection between the HMC and the system might get out of sync. This situation can be recovered by going to your server management panel on the HMC and selecting Rebuild Managed System. 

Some adapters and devices do not support concurrent operation with microcode flashing. Such devices must be taken off-line to update microcode. This situation creates a problem when updating microcode for these communications adapters, such as Ethernet adapters used to communicate with the Internet to obtain the microcode updates or communicate with an HMC. In this case, if the adapters are on-line and the update is attempted, the final step of flashing the device is not completed. You can complete the update procedure by taking the device off-line, and going into diagnostic service aids to download microcode to that device. 

Due to the changes in how the survey works, you can no longer concatenate survey results prior to sending them to IBM. 

There is a known system firmware upgrade problem with pSeries 690 or pSeries 670 Servers that have six 7040-61D I/O Drawers and three Integrated Battery Features (IBFs) (battery backup) OR seven or more 7040-61D I/O Drawers, regardless of the number of IBFs. Systems with this configuration should not use the new GUI for microcode management to update the system firmware. For additional information, reference the 7040-681 and/or 7040-671 Readme files which can be found at http://techsupport.services.ibm.com/server/mdownload. 



--------------------------------------------------------------------------------

HMC 3.2.0 Upgrade
The following enhancements delivered for the p670 and p690 systems in May 2003 will not be available unless the Hardware Management Console (HMC) is upgraded to the 3.2.0 or later level. New HMCs delivered after May 2003 will be at the required level. You can obtain the latest HMC update from the Download section of the pSeries Support web pages at http://www.ibm.com/servers/eserver/support/pseries. 
32 partitions 
Distributed RMC 
CUoD - email home 
CUoD memory - permanent 
CUoD memory - try & buy 
CUoD processor - try & buy 
Customer firmware management 
Fast activation of a partition 
Flat Panel display support 
Full HMC command line 
Microcode download from HMC 


--------------------------------------------------------------------------------

TSM 5.1 Not Supported on AIX 5.2
Tivoli Storage Manager (TSM) 5.1 is not compatible with AIX 5.2 and will cause a system crash if installed. TSM 5.1 is shipped in error on the AIX 5L for POWER V5.2 Bonus Pack (LCD4-1141-00) dated 10/2002, and should not be installed. 
Once TSM 5.1 is installed on AIX 5.2, the system will crash on every reboot. To recover from this situation, the system will have to be booted in maintenance mode from the AIX 5.2 installation media or a system backup (mksysb) to uninstall the tivoli.tsm.* filesets. Alternatively, the following line can be commented out in /etc/inittab by inserting a ':' (colon) at the beginning of the line. 


adsmsmext:2:wait:/etc/rc.adsmhsm > /dev/console 


--------------------------------------------------------------------------------

AIXlink/X.25 Version 2 on AIX 5.2
To avoid a system crash when using TCP/IP over X.25 functionality, it is necessary to install APAR IY45606. 

See: http://www-1.ibm.com/support/docview.wss?uid=isg1SSRVAIX52TIPS081512_450 
 

Note 4:
-------

thread:

Q:

Attention msg during mksysb 

Hi,
I am running AIX 5.2 ML03 and I am receiving the following Attention msg during the
mksysb

****ATTENTION****
The boot image you created might fail to boot because the size exceeds the
system limit. For information about fixes or workarounds,
see /usr/lpp/bos.sysmgt/README.
****ATTENTION****
..
Creating list of files to back up..
Backing up 569000 files.................................


What am I missing in it? any help or hints or tips will be of great value to
me. Thanks

A:

This solution DOES NOT WORK on models 7028, 7029, 7038, 7039, and 7040
systems, see option 4 regarding these models.
If APAR IY40824 (AIX 5.1) or IY40975 (AIX 5.2) was installed prior to making
the backup, then you may boot from the backup and go to the open firmware
prompt. To get to the open firmware prompt, when the system beeps twice after
powering it on, press F8 on the keyboard (or the 8 key on an ASCII terminal).
You can also get to the open firmware prompt from SMS. The open firmware
prompt is also referred to as the "OK" prompt. On some systems there will be
a menu option located on the initial SMS menu. On others, it will be located
under the Multiboot menu. From the open firmware prompt execute the following:

>setenv real-base 1000000
>reset-all

Notes:
a) To use this option, the backup must have this APAR in it and therefore
must be created after installing the APAR.
b) The above commands will have to be executed each time you boot from the
large boot image backup media.
 

Note 5:
-------

Configuring mksysb image on system backup tapes
Use the mksysb command to ensure that the boot image, BOS Installation/Maintenance image, 
and the table of contents image are created with a tape block_size value of 512.

Bootable mksysb tapes comprise the following images: 
Boot image 
BOS Installation/Maintenance image 
Table of contents image 
System backup image 
The system backup image is the actual backup of the files in the rootvg in all JFS-mounted file systems.
The boot image, BOS Installation/Maintenance image, and the table of contents image must be created with a 
tape block_size value of 512. The mksysb command ensures that the block size is 512 when these images are created. 
There are no restrictions on the block size used for the fourth (system backup image) on the tape. 
The block size of the system, before it was temporarily set to 512, is used for the fourth image on the tape.

The value of the block size must be saved in the /tapeblksz file in the second image on the tape. 
The second and fourth images are stored in backup/restore format. Again, mksysb ensures the correctness 
of the tapes created by using the mksysb command.

If there are problems with the bosinst.data file, the image.data file, or the tapeblksz file, these files 
can be restored from the second image on the tape and checked. These files, as well as commands necessary 
for execution in the RAM file system (when running in maintenance mode after booting from the tape), 
are stored in the second image.


Note 6:
-------

thread

Before you migrate 5.1 -> 5.2, do as an absolute minimum the following:

- errpt, and resolve all serious issues. If you can't, then STOP.
- check that there is enough free space in rootvg, /, /tmp, /usr, /var
- lppchk -v   If dependencies are not OK, then correct or STOP.
- check firmware. Is the current firmware ok for AIX52? Use "prtconf" or "lsmcode".

  Example:

  To display the system firmware level and service processor (if present), type: 
  # lsmcode -c

  The system displays a message similar to the following: 
  System Firmware level is TCP99256

  If the Firmware version is not current enough, then upgrade or STOP.

Or use
  # lscfg -vp | grep -p Platform

- Always create a mksysb tape.

Note: it's quite likely that your apps still need a number of AIX fixes (APARs) 
      before they can run on AIX52.
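
Part of the checklist above could be scripted along these lines. This is a sketch under assumptions: 
premigration_check is a made-up name, "serious issues" is narrowed here to permanent hardware errors 
(errpt -d H -T PERM), and the free space check is left out because the space needed depends on the target level. 
It only reports; it changes nothing.

```shell
# Sketch only: premigration_check is a hypothetical helper name.
premigration_check() {
    problems=0

    # count permanent hardware errors; errpt prints a title line first, so drop it
    hw_errors=$(errpt -d H -T PERM 2>/dev/null | sed 1d | wc -l)
    if [ $hw_errors -gt 0 ] ; then
        echo "STOP:" $hw_errors "permanent hardware error(s) in errpt"
        problems=1
    fi

    # lppchk -v returns non-zero when fileset dependencies are not OK
    lppchk -v >/dev/null 2>&1 || {
        echo "STOP: lppchk -v reports problems"
        problems=1
    }

    # print the firmware level so it can be compared with the level required
    lsmcode -c

    return $problems
}

# Usage:
#   premigration_check || echo "do not migrate yet"
```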


POWER5 Firmware releases:

Release End of Currency 
SF240   To be announced 
SF235   March 2007 
SF230   March 2007 
SF225   Out of Currency 
SF222   Out of Currency 
SF220   Out of Currency 
SF210   Out of Currency 

Firmware lower than SF230 is not usable for upgrades/migration.


Note 7:
-------

thread

Migration AIX 4.3.3 to 5.2, Rebooting don't stop
Reply from Simone on 2/9/2005 10:17:00 AM  

Hi, the reboot problem during the migration to 5.2 is SOLVED. 

As described in the previous messages, we have 4 network cards, and two of them are unused 
(no cable attached, no TCP/IP address defined and no network mask allocated). 

During the reboot, the file "/etc/rc.net" is executed (boot stage two). It calls "/usr/lib/methods/cfgif", 
which configures the network (ethernet adapter, server name, default gateway, static routes). 
Because of the two unconfigured cards, the execution of "/usr/lib/methods/cfgif" makes the server do a 
"SYSTEM DUMP DEV" and reboot again. 

To solve this issue, there are two options: 
- detach the unconfigured cards (chdev -l en0 -a state=detach) 
- configure the cards 

Please note that no information about this issue was found in the IBM documentation. 

SO, BEFORE AN AIX UPGRADE FROM 4.3.3 TO 5.2, BE SURE THAT ALL cards are correctly configured. 

Thanks 


Note 8:
-------


AIX can use the /etc/rc.local but you need an entry in /etc/inittab as follows:

To add local daemons to the system
startup sequence in a BSD rc.local style use the following command to
create an /etc/inittab entry:

# mkitab -i rcnfs "rclocal:2:wait:/etc/rc.local >/dev/console 2>&1"
# touch /etc/rc.local
# chmod 700 /etc/rc.local

Then put the command lines to start the daemons in /etc/rc.local.




==========================
80. Some IP subnetting:
==========================

Traditional IP Classes are:

The first byte in the 4-byte address corresponds to:

Class A: 1-126      0xxxxxxx.yyyyyyyy.yyyyyyyy.yyyyyyyy
Class B: 128-191    10xxxxxx.xxxxxxxx.yyyyyyyy.yyyyyyyy
Class C: 192-223    110xxxxx.xxxxxxxx.xxxxxxxx.yyyyyyyy
Class D: 224-239    1110----.--------.--------.--------  (multicast, not used for host addresses)

Here "x" marks network bits and "y" marks host bits.
Notice the first bits in the first byte of each class address.


Note : 127.aaaaaaaa.aaaaaaaa.aaaaaaaa is reserved for the loopback (local host) address, used for debugging/testing purposes.



In an IP address aaaaaaaa.aaaaaaaa.aaaaaaaa.aaaaaaaa, each group of 8 bits "aaaaaaaa" corresponds to

a x 2^7 + a x 2^6 + a x 2^5 + a x 2^4 + a x 2^3 + a x 2^2 + a x 2^1 + a x 2^0

where each "a" is a bit, that is, a=0 or a=1. With all bits set to 1, the byte has its maximum value:

1x128 + 1x64 + 1x32 + 1x16 + 1x8 + 1x4 + 1x2 + 1x1 = 255

So, for example, in class A: 0xxxxxxx means that the first byte has a maximum value of
0x128 + 1x64 + 1x32 + 1x16 + 1x8 + 1x4 + 1x2 + 1x1 = 127, but 127 is reserved, so Class A runs from 1 - 126

Similarly for Class B: the first byte can be at minimum 10000000 and at maximum 10111111, and that's 128-191
Remember, by design, the first two bits in B MUST BE "10", and the other 6 bits can vary.
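
The bit arithmetic above can be tried out with a few lines of shell. This is a sketch only:
octet_to_bits and ip_class are hypothetical helper names, and the class ranges are the
traditional ones listed above.

```shell
#!/bin/sh
# Sketch: show the bits of one byte and derive the traditional class
# from the first byte of an address.
# octet_to_bits and ip_class are hypothetical helper names.

octet_to_bits() {
    n=$1
    bits=""
    for w in 128 64 32 16 8 4 2 1; do     # weights 2^7 ... 2^0
        if [ "$n" -ge "$w" ]; then
            bits="${bits}1"
            n=$((n - w))
        else
            bits="${bits}0"
        fi
    done
    echo "$bits"
}

ip_class() {
    first=${1%%.*}                         # first byte of a dotted address
    case $first in
        ''|*[!0-9]*) echo "invalid"; return 1 ;;
    esac
    if   [ "$first" -ge 1 ]   && [ "$first" -le 126 ]; then echo A
    elif [ "$first" -eq 127 ]; then echo loopback
    elif [ "$first" -ge 128 ] && [ "$first" -le 191 ]; then echo B
    elif [ "$first" -ge 192 ] && [ "$first" -le 223 ]; then echo C
    elif [ "$first" -ge 224 ] && [ "$first" -le 239 ]; then echo D
    else echo other
    fi
}

for addr in 10.10.120.1 162.17.16.2 192.168.1.1; do
    echo "$addr: first byte $(octet_to_bits "${addr%%.*}"), class $(ip_class "$addr")"
done
```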


Subnetting:

Class C subnetting:

                      No of     No of      No of       No of
 Subnet mask          subnets   hosts      subnetbits  hostbits
----------------------------------------------------------------
*255.255.255.128           NA      NA           1          7     * not valid with most routers
 255.255.255.192            2      62           2          6
 255.255.255.224            6      30           3          5
 255.255.255.240           14      14           4          4
 255.255.255.248           30       6           5          3
 255.255.255.252           62       2           6          2


Class B subnetting:

                      No of     No of      No of       No of
 Subnet mask          subnets   hosts      subnetbits  hostbits
----------------------------------------------------------------
 255.255.128.0             NA      NA           1         15
 255.255.192.0              2   16382           2         14
 255.255.224.0              6    8190           3         13
 255.255.240.0             14    4094           4         12
 255.255.248.0             30    2046           5         11
 255.255.252.0             62    1022           6         10
 255.255.254.0            126     510           7          9
 255.255.255.0            254     254           8          8
 255.255.255.128          510     126           9          7
 255.255.255.192         1022      62          10          6
 255.255.255.224         2046      30          11          5
 255.255.255.240         4094      14          12          4
 255.255.255.248         8190       6          13          3
 255.255.255.252        16382       2          14          2
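
The subnet and host counts in both tables follow the classical rule "2^n - 2": with n subnet
bits there are 2^n - 2 subnets, and with n host bits there are 2^n - 2 hosts. The sketch below
recomputes them; subnet_counts is a hypothetical helper name. A result of 0 subnets corresponds
to the "NA" rows. Note that this is the traditional convention used by these tables; newer
routers usually also allow the all-zeros and all-ones subnets.

```shell
#!/bin/sh
# Sketch: recompute the "No of subnets" and "No of hosts" columns
# with the classical 2^n - 2 rule used in the tables above.
# subnet_counts is a hypothetical helper name.

subnet_counts() {
    subnetbits=$1
    hostbits=$2
    subnets=$(( (1 << $subnetbits) - 2 ))   # all-zeros and all-ones subnet excluded
    hosts=$(( (1 << $hostbits) - 2 ))       # network and broadcast address excluded
    echo "$subnets $hosts"
}

# Class C: the last byte has 8 bits to split between subnet and host part
for s in 2 3 4 5 6; do
    h=$((8 - s))
    echo "subnetbits=$s hostbits=$h -> subnets/hosts: $(subnet_counts "$s" "$h")"
done
```

For class B the same function applies with 16 bits to split, for example
subnet_counts 10 6 for the mask 255.255.255.192.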





========================================
81. Notes on TSM:
========================================


81.1 Notes on TSM client dsmc:
==============================

Installing the TSM 5.1 client under IBM AIX 4.3.3 

These instructions will guide you through the installation and configuration of the Tivoli 
Storage Manager client, so you can back up your data using DoIT's Bucky Backup service. 
You should be familiar with the AIX operating system distribution and have root or root-equivalent 
access to the machine you are working with. These instructions and the AIX client are specific to 
the pSeries & RS/6000 architecture.

Data that has been backed up or archived from a TSM v5.1 client cannot be restored or retrieved to any 
previous level client. The data must be restored or retrieved by a v5.1.0 or higher level client. 
Once you migrate to 5.1 you cannot go back to an older client (but you can certainly restore older data). 
This is non-negotiable. You have been warned.

This product installs into /usr/tivoli/tsm/client. It requires 40 to 50 megabytes of space. 
During the installation SMIT will extend the filesystem if you do not have enough space. 
If you do not have space in rootvg, you can symlink /usr/tivoli/tsm/client into a directory 
where you do have enough space.

You must have registered a node and have received confirmation of your node name. Make sure you know 
the password that you specified when applying for the node.

You must have xlC.rte installed in order to install the client. If you wish to use the graphical client 
under AIX you must have AIXwindows X11R6, Motif 1.2 or Motif 2.0, and the CDE installed.

Acquire the software from Tivoli. You can use wget or lynx to retrieve the files from their web site 
(or use the "Save Target As..." feature of your browser):

ftp://service.boulder.ibm.com/storage/tivoli-storage-management/maintenance/client/v5r1/

Start SMIT to install the software:

smitty install

Select "Install and Update Software", then "Install and Update from LATEST Available Software". 
When it prompts you for the "INPUT device / directory for software" specify the directory in which 
you saved the installation files. Proceed to install the software ("_all_latest")

Change to the new directories created for the client:

cd /usr/tivoli/tsm/client/ba/bin


Create and edit the dsm.sys, dsm.opt, and inclexcl files for your system. Sample files are linked. 
At a minimum, you will have to edit dsm.sys and insert your node name.

Start dsmc by using the ./dsmc command. Enter the command "query schedule" and you will be prompted 
for your node's password. Enter your password and press enter. Once it successfully displays the node's 
backup schedule, enter the command "quit" to exit it. This saves your node's password, so that backups 
and other operations can happen automatically.

To start the TSM client on reboot, edit /etc/inittab and insert the line (all one line):

tsm:2:once:/usr/tivoli/tsm/client/ba/bin/dsmc schedule -servername=bucky3 > /dev/null 2>&1 < /dev/null

Issue the following command on the command line, as root, to manually start dsmc:

nohup /usr/tivoli/tsm/client/ba/bin/dsmc schedule -servername=bucky3 >/dev/null & 

Verify that the client has started and is working by checking the log files in /usr/tivoli/tsm/client/ba/bin.

You can perform a manual backup to test your settings using the command:

/usr/tivoli/tsm/client/ba/bin/dsmc incremental

Remember that if you change the settings in dsm.sys, dsm.opt, or inclexcl you need to restart the software.

Upgrading the TSM client from 4.2 to 5.1

To upgrade the TSM client from 4.2.1 to 5.1 use the following procedure:

Obtain a copy of the software (use the links at the top of this page). 

Kill the running copy of dsmc (a "ps -ef | grep dsmc" will show you what is running. Kill the parent process). 

Back up dsm.opt, dsm.sys, and inclexcl from your old configuration (probably in /usr/tivoli/tsm/client/ba/bin). 
The upgrade will preserve them, but it pays to have a backup copy. 

Upgrade the TSM client packages using "smitty install". Select "Install and Update Software", 
then "Update Installed Software to Latest Level (Update All)". Specify the directory in which 
the software was downloaded. 

Edit your dsm.sys file and ensure that the TCPServeraddress flag is set to buckybackup2.doit.wisc.edu 
OR buckybackup3.doit.wisc.edu (this just ensures future compatibility with changes to the service). 
This setting could be either server, depending on when you registered your node. 

Restart dsmc using the command:

nohup /usr/tivoli/tsm/client/ba/bin/dsmc schedule -servername=bucky2 >/dev/null & 

Watch your logs to ensure that a backup happened. You can also invoke a manual backup using 
"dsmc incremental" from the command line. 

So how to install:

Run the script tsm-instl 
Modify /usr/tivoli/tsm/client/ba/bin/dsm.opt 
Modify /usr/tivoli/tsm/client/ba/bin/dsm.sys 
Modify /var/tsm/inclexcl.dsm 
Register the target machine name with TSM 

The config files are thus "dsm.opt" and "dsm.sys"


zd77l06:/usr/tivoli/tsm/client/ba/bin>cat dsm.opt

SErvername      ZTSM01
dateformat      4
compressalways  no
followsymbolic  yes
numberformat    5
subdir          yes
timeformat      1


zd77l06:/usr/tivoli/tsm/client/ba/bin>cat dsm.sys

SErvername            ZTSM01
   COMMmethod         TCPip
   TCPPort            1500
   TCPServeraddress   cca-tsm01.ao.nl.abnamro.com
   HTTPPort           1581
   PASSWORDACCESS     GENERATE
   schedmode          PROMPTED
   nodename           zd77l06
   compression        yes
   SCHEDLogretention  7
   ERRORLogretention  7
   ERRORLogname       /beheer/log/tsm/dsmerror.log
   SCHEDLogname       /beheer/log/tsm/dsmsched.log

On HP-UX these files could be located under /opt instead, for example /opt/tivoli/tsm/client/ba/bin/dsm.opt



If you need to exclude a filesystem in the backup run, you can edit dsm.sys and put in an exclude statement
like in the following example:

SErvername            ZTSM01
Exclude "/data/documentum/dmadmin/*"
   COMMmethod         TCPip
   TCPPort            1500
   TCPServeraddress   cca-tsm01.ao.nl.abnamro.com
   HTTPPort           1581
   PASSWORDACCESS     GENERATE
   schedmode          PROMPTED
   nodename           zd110l14
   compression        yes
   SCHEDLogretention  7
   ERRORLogretention  7
   ERRORLogname       /beheer/log/tsm/dsmerror.log
   SCHEDLogname       /beheer/log/tsm/dsmsched.log





81.2 Examples of the dsmc command:
==================================

To view schedules that are defined for your client node, enter: 
# dsmc query schedule

To view information about the current session (node, server), enter:
# dsmc q ses

To change the password:
# /usr/bin/dsmc set password 

To make a test incremental backup of /tmp now, to see if backups work:
# /usr/bin/dsmc inc /tmp
# /usr/bin/dsmc inc /home/se1223

To see what the mksysb backup status is:
# dsmc q archive /mkcd/mksysb.img



81.3 Restore with dsmc:
=======================

-- Example 1:
To restore a file /a/b to /c/d :

# dsmc restore /a/b /c/d

-- Example 2:
Restore the most recent backup version of the /home/monnett/h1.doc file, even if the backup is inactive.

# dsmc restore /home/monnett/h1.doc -latest

-- Example 3:
Display a list of active and inactive backup versions of files from which you can select versions to restore.

# dsmc restore "/user/project/*" -pick -inactive

-- Example 4:
Restore the files in the /home file system and all of its subdirectories.

# dsmc restore /home/ -subdir=yes

-- Example 5:
Restore all files in the /home/mydir directory to their state as of 1:00 PM on August 17, 2002.

# dsmc restore -pitd=8/17/2002 -pitt=13:00:00 /home/mydir/

-- Example 6:
Restore all files from the /home/projecta directory that end with .bak to the /home/projectn/ directory.

# dsmc restore "/home/projecta/*.bak" /home/projectn/

# dsmc restore "/data/documentum/dmadmin/backup1008/*"
# dsmc restore "/data/documentum/dmadmin/backup1608/*"

Small script for scheduling a restore job from cron:

#!/usr/bin/ksh

echo "Starting restore at: " >> /beheer/adhoc_restore.log
date >> /beheer/adhoc_restore.log

cd /data/documentum/dmadmin
# Quote the file specification so that dsmc, not the shell, expands the wildcard
dsmc restore "/data/documentum/dmadmin/backup_3011/*"

echo "End of restore at: " >> /beheer/adhoc_restore.log
date >> /beheer/adhoc_restore.log







-- Example 7:

- Use of FROMDate=date                                                      
Specify a beginning date for filtering backup versions. Do not  
restore files that were backed up before this date.             
                                                                        
You can use this option with the TODATE option to create a time 
window for backup versions. You can list files that were backed 
up between two dates.                                           
                                                                        
For example, to restore all the files that you backed up from   
the /home/case directory from April 7, 1995 through April 14,   
1995, enter:                                                    
                                                                        
# dsmc restore "/home/case/" -fromdate=04/07/1995 -todate=04/14/1995

As another example, to restore backup versions of files that were created during 
the last week of March 1995 for the files in the /home/jones directory, enter: 

# dsmc restore -fromdate=03/26/1995 -todate=03/31/1995 /home/jones
                                                                        
The date must be in the format you select with the DATEFORMAT   
option. For example, the date for date format 1 is mm/dd/yyyy,  
where mm is month; dd is day; and yyyy is year. If you include  
DATEFORMAT with the command, it must precede FROMDATE and       
TODATE.                                                         
                                          
- Use of TODate=date                                                        
Specify an end date for filtering backup versions. ADSM does not
restore backup versions that were backed up after this date.    
                                                                        
You can use this option with the FROMDATE option to create a    
time window for backups. You can restore backup versions that   
were backed up between two dates                                
                                                                        
For example, to restore files from the /home/case directory that
were backed up between April 7, 1995 and April 14, 1995, enter  
the following:                                                  
                                                                        
# dsmc restore "/home/case/" -fromdate=04/07/1995 -todate=04/14/1995
                                                                        
The date must be in the format you select with the DATEFORMAT   
option. For example, the date for date format 1 is mm/dd/yyyy,  
where mm is month; dd is day; and yyyy is year. If you include  
DATEFORMAT with the command, it must precede FROMDATE and       
TODATE.                                       
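
Because the date format matters, it can help to validate the dates before calling dsmc.
The sketch below only builds and prints the command line; it does not run dsmc. It assumes
date format 1 (mm/dd/yyyy); valid_date and restore_window_cmd are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: build a "dsmc restore" command line for a FROMDATE/TODATE window.
# Assumes DATEFORMAT 1 (mm/dd/yyyy). Nothing is executed, the command is
# only printed. valid_date and restore_window_cmd are hypothetical names.

valid_date() {
    # rough shape check only: two digits, two digits, four digits
    case $1 in
        [0-1][0-9]/[0-3][0-9]/[0-9][0-9][0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

restore_window_cmd() {
    filespec=$1
    from=$2
    to=$3
    if ! valid_date "$from" || ! valid_date "$to"; then
        echo "dates must be given as mm/dd/yyyy" >&2
        return 1
    fi
    echo "dsmc restore \"$filespec\" -fromdate=$from -todate=$to"
}

restore_window_cmd "/home/case/" 04/07/1995 04/14/1995
```

A date such as 4/14/1995 (without the leading zero) is rejected by this check, which avoids
mixing formats in one command.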


To start the client's Graphical User Interface, enter dsm. The TSM GUI appears.



Example (translated from Dutch):
--------------------------------

Restoring files from the backup.
Every night, backups are made of all data. If, for whatever reason, you have lost a file, 
you can restore it from the backup yourself, provided the following conditions are met: 

- the file has been on the server for at least 1 night 
- you lost it less than 30 days ago 

Files that you put on the system and delete again on the same day cannot be retrieved with a restore! 

To restore a file, proceed as follows: 
Log in on the machine and type at the command prompt: 

$ dsmc restore -ina -subdir=yes "file-selection"

where "file-selection" is the name of the file you have lost. If you do not remember the exact 
name, type: 

dsmc restore -ina -subdir=yes "*"

including the double quotes. 

-- An example
Example of a restore 

The example file: 

-rw-rw-r--    1 faq  faq      269 Mar 18 16:12 testfile

Deleting the file (by accident): 

$ rm testfile

Checking that the file no longer exists: 

$ ls -l testfile
ls: testfile: No such file or directory

Starting a restore: 

$ dsmc restore -pick "test*"

TSM Scrollable PICK Window - Restore

    #    Backup Date/Time        File Size A/I  File
---------------------------------------------------------------
 x  1. | 19-03-2002 03:13:22        269  B  A   /www/faq/testfile
       |
       |
       |
       |
       0---------10--------20--------30--------40--------50--------60--
<U>=Up  <D>=Down  <T>=Top  <B>=Bottom  <R#>=Right  <L#>=Left
<G#>=Goto Line #  <#>=Toggle Entry  <+>=Select All  <->=Deselect All
<#:#+>=Select A Range <#:#->=Deselect A Range  <O>=Ok  <C>=Cancel
pick>

Select the file to restore; in this case we choose testfile 

pick> 1

TSM Scrollable PICK Window - Restore

     #    Backup Date/Time        File Size A/I  File
        ---------------------------------------------------------------
x    1. | 19-03-2002 03:13:22        269  B  A   /www/faq/testfile
        |
        |
        |
        |
        0---------10--------20--------30--------40--------50--------60--
<U>=Up  <D>=Down  <T>=Top  <B>=Bottom  <R#>=Right  <L#>=Left
<G#>=Goto Line #  <#>=Toggle Entry  <+>=Select All  <->=Deselect All
<#:#+>=Select A Range <#:#->=Deselect A Range  <O>=Ok  <C>=Cancel
pick>

Confirm with OK 

pick> o

 ** Interrupted **
ANS1114I Waiting for mount of offline media.

Remark: the waiting time depends on the load on the TSM server 

Restoring             269 /www/faq/testfile [Done]

Restore processing finished.

Total number of objects restored:         1
Total number of objects failed:           0
Total number of bytes transferred:      296  B
Data transfer time:                    0.00 sec
Network data transfer rate:        19,270.83 KB/sec
Aggregate data transfer rate:          0.00 KB/sec
Elapsed processing time:           00:04:05

Checking that the file exists again: 


$ ls -l testfile
-rw-rw-r--    1 faq      faq           269 Mar 18 16:12 testfile

The (test) file can be used again. 


Example (translated from German):
---------------------------------


- Backing up files:
With the command:

C:>dsmc incremental -subdir=yes "d:\*.*"

you can, for example, back up the contents of the entire hard disk (or partition) d:\ of your PC. 
Subdirectories of the same disk can be backed up in the same way. Here is another example, on Unix:

# dsmc incremental -subdir=no /usr/users/holo

This command backs up all files in the directory /usr/users/holo .

- Displaying files saved by backup:
To find out which files have been stored on the TSM server by backup, use the following command:

dsmc query backup -subdir=yes "*"

which lists all of your backed-up files. The option -inactive additionally lists all stored 
earlier versions of your files. If, for example, on Unix you have saved several versions of your letter 
konferenz97.tex in the directory /u/holo/briefe/1997 by making changes to it, you get a list of all 
versions with: 

# dsmc query backup -inactive \
       /u/holo/briefe/1997/konferenz97.tex

The \ above is the Unix line continuation.

- Restoring files:
It has happened: you have deleted a file that you need later. With the restore command 
you can recover files that were saved by backup.

The command:

# dsmc restore -subdir=yes "/u/holo/briefe/*"

restores all active files in the subdirectory /u/holo/briefe. If you want to overwrite existing 
files while doing so, you can specify the option

-replace=yes

It is also possible to specify a time interval in which the backup must have been made, in order to go back 
to older versions (which, for example, are still free of computer viruses):

dsmc restore -replace=yes -todate=1997-12-22 d:\*.doc




81.4 Oracle and TSM: TDPO:
==========================

RMAN and Tivoli Data Protection for Oracle
The Oracle Recovery Manager provides consistent and secure backup, restore, and recovery performance 
for Oracle databases. While the Oracle RMAN initiates a backup or restore, TDP for Oracle acts as the 
interface to the Tivoli Storage Manager server (Version 4.1.0 or later). The Tivoli Storage Manager server 
then applies administrator-defined storage management policies to the data. TDP for Oracle Version 2.2.1 
implements Oracle defined Media Management API 2.0, which interfaces with RMAN for backup and restore 
operations and translates Oracle commands into Tivoli Storage Manager API calls to the Tivoli Storage Manager server. 

With the use of RMAN, TDP for Oracle allows you to perform the following functions: 

Full backup function for the following while online or offline: 
- databases 
- tablespaces 
- datafiles 
- archive log files 
- control files 

Restore functions:
- Full database restore while offline 
- Tablespace and datafile restore while online or offline 

TDPO.OPT File 
This feature provides a centralized place to define all the options needed by RMAN for TDP for Oracle backup 
and restore operations. This eliminates the need to specify environment variables for each session, 
thereby reducing the potential for human error. This also simplifies the establishment of multiple sessions. 

The Data Protection for Oracle options file, tdpo.opt, contains options that determine the behavior and performance 
of Data Protection for Oracle. The only environment variable Data Protection for Oracle Version 5.2 recognizes 
within an RMAN script is the fully qualified path name to the tdpo.opt file. Therefore, some RMAN scripts may need 
to be edited to use the TDPO_OPTFILE=<fully qualified path and file name of options file> variable in place of other 
environment variables. For example:

allocate channel t1 type 'sbt_tape' parms 
       'ENV=(TDPO_OPTFILE=/home/rman/scripts/tdpo.opt)'

See Scripts for further information.

If a fully qualified path name is not provided, Data Protection for Oracle uses the tdpo.opt file located 
in the Data Protection for Oracle default installation directory. If this file does not exist, 
Data Protection for Oracle proceeds with default values.
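
To keep the TDPO_OPTFILE setting in one place, the RMAN commands can be generated by a small
shell function. This is only a sketch: rman_backup_script is a hypothetical helper, the channel
name t1 and the plain "backup database" are illustrative, and the function only prints the
RMAN text (piping it into rman is left to the reader).

```shell
#!/bin/sh
# Sketch: print an RMAN run block that points the sbt_tape channel at a
# given tdpo.opt file. Nothing is executed here.
# rman_backup_script is a hypothetical helper name.

rman_backup_script() {
    optfile=$1
    case $optfile in
        /*) ;;      # a fully qualified path, as the text above requires
        *)  echo "need a fully qualified path to tdpo.opt" >&2
            return 1 ;;
    esac
    cat <<EOF
run {
  allocate channel t1 type 'sbt_tape' parms
    'ENV=(TDPO_OPTFILE=$optfile)';
  backup database;
  release channel t1;
}
EOF
}

rman_backup_script /home/rman/scripts/tdpo.opt
```

Rejecting relative paths up front avoids the silent fallback to the default tdpo.opt
described above.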



Note 1:
-------

Installing Data Protection for Oracle 5.1
This chapter provides information on the required client environment for Data Protection for Oracle 
and instructions on installing Data Protection for Oracle.

Make sure these conditions exist before installing Data Protection for Oracle:

- Tivoli Storage Manager Server Version 5.1.0 (or later) is configured. 
- Tivoli Storage Manager API Version 5.1.5 (or later) is installed. This version of the 
  Tivoli Storage Manager API is included in the Data Protection for Oracle product media.

Attention: A root user must install the Tivoli Storage Manager API before installing Data Protection 
for Oracle on the workstation where the target database resides.

After Data Protection for Oracle is installed, you must perform the following configuration tasks:

-Define Data Protection for Oracle options in the tdpo.opt file. 
-Register the Data Protection for Oracle node to a Tivoli Storage Manager Server. 
-Define Tivoli Storage Manager options in the dsm.opt and dsm.sys files. 
-Define Tivoli Storage Manager policy requirements. 
-Initialize the password with a Tivoli Storage Manager Server.

See Configuring Data Protection for Oracle for detailed task instructions.

Note 2:
-------

New Password File Generation

Data Protection for Oracle uses a new password utility for 
password generation and maintenance.  The new TDPO configuration
utility, 'tdpoconf' replaces the previous executable 'aobpswd'.
To generate or update a password, invoke 'tdpoconf' as follows:

	tdpoconf password [-tdpo_optfile=/mydir/myfile.opt]

Successful execution of 'tdpoconf password' should generate a password file
with the prefix 'TDPO.' followed by your nodename.

For more information, please see the 'Using the Utilities' section in the
Data Protection for Oracle User's Guide.


Note 3:
-------

thread

Q:

Any good step-by-step docs out there? I just need to get this thing set up and working quickly. 
Don't have time (unless it is my only choice of course) to filter through several manuals to pick out 
the key info... Help if you can - I surely would appreciate it 

A:

What needs to happen is this -- 


1. Get the manuals downloaded.

2. Download the TDP and perform default installations.

3. Create a node name for the TDP - i.e. TDP_Oracle_<hostname> - and register it using a simple password, because 
   you are going to need it.

4. Add a stanza in dsm.sys for the TDP. This should be the second or third stanza, since the first stanza is 
   for the client.

5. In the TDP installation directory, modify the tdpo.opt file - this is the configuration file for the TDP. 
   This file is self-explanatory.

6. Execute tdpoconf showenv - you should get a response back from TSM.

7. Execute tdpoconf password - to create the tdpo password. The password file will be created and stored 
   in the TDP home directory. It will have the host name as part of the file name.

8. Once you have gotten this far, create the dsm.opt file in Oracle's home directory and make sure it contains 
   only one line, the servername line of the newly created stanza. The file needs to be owned by oracle.

9. If you are using tracing, the tdpo.opt file will identify the location.

10. Configure RMAN.

11. Test and verify.


Note 4:
-------

thread

Q:

I see this question has been asked several times in the list, but I fail
to see any answers on ADSM.ORG.

I'm getting the 
"ANS0263E Either the dsm.sys file was not found, or the Inclexcl file
specified in the dsm.sys was not found" 
error when trying to set the password after installing the 64 bit TDPO
on Solaris 8.

(The 32 bit version installs fine)

Anyone have the fix for this handy?

A:

Dale,
Did you check the basics, as oracle or as your tdpo user:

   # env | grep DSM

Make sure the DSMI variables point to the right locations, then verify
that those files are readable by your user.

If it still fails after verifying this, you might want to let us know what version of
oracle, tdpo and TSM client you have on this node.

A:

We had an issue with this and discovered that it was looking in the api
directory for the dsm.sys and not the ba/bin directory so we just put a link
in api to bin and it worked.

A:

You may want to break the link to prevent TDP from using the INCLEXCL file that's 
normally referenced in a dsm.sys file. If you don't, you'll generate errors. If the files are 
linked and the INCLEXCL line is commented out, your normal backups won't have an INCLEXCL file, 
and hence you'll back up everything on your client server 
during your regular client backup.
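
The checks suggested in this thread can be collected in one function, to be run as the oracle
(or tdpo) user. This is a sketch: check_dsmi_env is a hypothetical helper name. It assumes the
usual meaning of the API variables: DSMI_DIR is the API directory that should hold dsm.sys,
DSMI_CONFIG is the dsm.opt file, and DSMI_LOG is the log directory.

```shell
#!/bin/sh
# Sketch: verify that the DSMI_* variables are set and point to
# readable locations, and that dsm.sys can be found in DSMI_DIR.
# check_dsmi_env is a hypothetical helper name.

check_dsmi_env() {
    rc=0
    for var in DSMI_DIR DSMI_CONFIG DSMI_LOG; do
        eval "val=\${$var:-}"                 # value of the variable named in $var
        if [ -z "$val" ]; then
            echo "$var is not set"
            rc=1
        elif [ ! -r "$val" ]; then
            echo "$var=$val is not readable by $(id -un)"
            rc=1
        else
            echo "$var=$val ok"
        fi
    done
    if [ -n "${DSMI_DIR:-}" ] && [ ! -r "$DSMI_DIR/dsm.sys" ]; then
        echo "no readable dsm.sys in $DSMI_DIR"
        rc=1
    fi
    return $rc
}

check_dsmi_env || echo "fix the settings listed above, then retry"
```

If dsm.sys only exists in the ba/bin directory, the last check fails, which is exactly the
situation that the symbolic link mentioned above works around.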

Note 5:
-------

http://www-1.ibm.com/support/docview.wss?rs=0&uid=swg24012732

TSM for Databases v5.3.3, Data Protection for Oracle Downloads by Platform 

Abstract 
Data Protection for Oracle v5.3.3 refresh.

These packages contain no license file. The customer must already have a paid version of the package 
to obtain the license file.

These packages contain fixes for APARs IC48436, IC48248, IC48056, IC46968, IC45462, IC43896, IC41501, IC38717, 
IC38681, IC38430, IC38061, IC37459, IC36686, IC36389 

Prerequisites 
A paid version of the Data Protection for Oracle package is required. 

Installation Instructions 
See the README.TDPO in the download directory.
 
Note 6:
-------

IC44171: ON AIX, FILESET TIVOLI.TSM.CLIENT.API.32BIT 5.3.0.0 IS A PRE-REQ FOR INSTALLING 
TIVOLI.TSM.CLIENT.API.64BIT 5.3.0.0 

APAR status
Closed as documentation error.

Error description 
When installing tivoli.tsm.client.api.64bit 5.3.0.0 on AIX,
tivoli.tsm.client.api.32bit 5.3.0.0 is required as a prerequisite
for the installation. The installation will fail if
tivoli.tsm.client.api.32bit 5.3.0.0 is not available for install.
tivoli.tsm.client.api.32bit 5.3.0.0 is needed because of
language enhancements in 5.3.

Local fix 

Problem summary 
****************************************************************
* USERS AFFECTED: AIX CLIENTS                                  *
****************************************************************
* PROBLEM DESCRIPTION: API 32bit PREREQ for API 64bit not  in  *
* README.                                                      *
****************************************************************
* RECOMMENDATION: apply next available fix.                    *
****************************************************************
Problem conclusion 
Add info to README files and database.
 

Filesets in the TSM client and TDPO: 

tivoli.tivguid                                                         Y
tivoli.tsm.books.en_US.client.htm                                      Y
tivoli.tsm.books.en_US.client.pdf                                      Y
tivoli.tsm.client.api.32bit                                            Y
tivoli.tsm.client.api.64bit                                            Y
tivoli.tsm.client.ba.32bit.base                                        Y
tivoli.tsm.client.ba.32bit.common                                      Y
tivoli.tsm.client.ba.32bit.image                                       Y
tivoli.tsm.client.ba.32bit.nas                                         Y
tivoli.tsm.client.ba.32bit.web                                         Y
tivoli.tsm.client.oracle.aix.64bit                                     Y
tivoli.tsm.client.oracle.books.htm                                     Y
tivoli.tsm.client.oracle.books.pdf                                     Y
tivoli.tsm.client.oracle.tools.aix.64bit   
 
 

81.5 Other stuff with TSM client, stopping and starting:
========================================================


The TSM scheduler dsmcad:
-------------------------

-- Check if it's running:

ps -ef | grep dsm
root 245896      1   0   Jan 08      -  0:24 /usr/tivoli/tsm/client/ba/bin/dsmcad

-- Example init script (IRIX-style) to start and stop the process:

#! /bin/sh
#       Copyright (c) 1989, Silicon Graphics, Inc.
#ident  "$Revision: 1.1 $"

if $IS_ON verbose ; then        # For a verbose startup and shutdown
    ECHO=echo
else                            # For a quiet startup and shutdown
    ECHO=:
fi

state=$1
case $state in

'start')
        set `who -r`
        # $8 comes from the "who -r" output; quoted so that an empty value does not break the test
        if [ "$8" != "0" ]
        then
                exit
        fi

        $ECHO "Starting dsm schedule:"
        DSM_LOG="/usr/tivoli/tsm/client/ba"
        DSM_CONFIG="/usr/tivoli/tsm/client/ba/bin/dsm.opt"
        DSM_DIR="/usr/tivoli/tsm/client/ba/bin"

        export DSM_LOG DSM_CONFIG DSM_DIR

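        if [ -f /usr/tivoli/tsm/client/ba/bin/dsmcad ]; then
                /usr/tivoli/tsm/client/ba/bin/dsmcad > /dev/null 2>&1 &
                # Note: after "&", $? only tells whether the job could be launched,
                # not whether dsmcad itself keeps running.
                if [ $? -eq 0 ]; then
                        $ECHO " done"
                else
                        $ECHO " failed"
                        exit 2
                fi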
        else
                echo " failed, no dsm installed"
                exit 3
        fi
        ;;

'stop')
        $ECHO "Stopping dsm schedule:"
        killall dsmcad
        ;;
esac


With such a script installed as /etc/init.d/dsmcad, dsmcad can be started and stopped through the script. For example:

- To start dsmcad manually run:

/etc/init.d/dsmcad start

- To stop dsmcad run:

/etc/init.d/dsmcad stop

- To restart dsmcad (for example to refresh daemon after dsm.sys or dsm.opt modification)

/etc/init.d/dsmcad restart

- To check if dsmcad is running run:

/etc/init.d/dsmcad status
-or-
ps -ef | grep dsmcad 
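
The example script shown earlier only handles 'start' and 'stop'. A 'status' branch could be
built on the same "ps -ef | grep" check. The following is a sketch under that assumption:
list_processes, dsmcad_running and dsmcad_status are hypothetical names, and the extra case
branches in the comment have not been tested against the real script.

```shell
#!/bin/sh
# Sketch of a status check that could back a 'status' branch of the
# init script. list_processes, dsmcad_running and dsmcad_status are
# hypothetical names.

list_processes() {
    # thin wrapper, so the check can be tried without a live daemon
    ps -ef
}

dsmcad_running() {
    # "[d]smcad" matches "dsmcad" but not the pattern text of grep itself
    list_processes | grep "[d]smcad" > /dev/null
}

dsmcad_status() {
    if dsmcad_running; then
        echo "dsmcad is running"
    else
        echo "dsmcad is not running"
        return 1
    fi
}

# Possible extra branches for the case statement (assumption):
#   'status')  dsmcad_status ;;
#   'restart') "$0" stop; sleep 2; "$0" start ;;

dsmcad_status || true
```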

Or use:

(root) /sbin/init.d/tsmclient stop
(root) /sbin/init.d/tsmclient start



It could also be implemented in a proprietary way, as in the following example (the Dutch message means "dsmcad and scheduler stopped"):

root@zd111l08:/etc#./rc.dsm stop
dsmcad en scheduler gestopt

root@zd111l08:/etc#./rc.dsm start

root@zd111l08:/etc#IBM Tivoli Storage Manager
Command Line Backup/Archive Client Interface - Version 5, Release 2, Level 3.0
(c) Copyright by IBM Corporation and other(s) 1990, 2004. All Rights Reserved.

Querying server for next scheduled event.
Node Name: ZD111L08
Session established with server ZTSM05: AIX-RS/6000
  Server Version 5, Release 2, Level 4.5
  Server date/time: 06.08.2007 08:19:13  Last access: 30.07.2007 14:54:54

Next operation scheduled:
------------------------------------------------------------
Schedule Name:         DAG01U_ALG_CCAZ
Action:                Incremental
Objects:
Options:               -su=yes
Server Window Start:   01:00:00 on 07.08.2007
------------------------------------------------------------
Waiting to be contacted by the server.




81.6 TSCM: Tivoli Security Compliance Manager :
----------------------------------------------



-- client.pref:

The client.pref configuration file contains configuration parameters for the Tivoli Security Compliance Manager 
client and is located in the client installation directory.

The default installation directories are:

UNIX 
/opt/IBM/SCM/client 


-- stopping and starting the client:


Stopping the client on UNIX systems
Note:
You must log in as the root user to complete this task.
To stop the client component on a UNIX system:

Open a command shell. 
Go to the directory where the client component is installed. 
The default installation directory is /opt/IBM/SCM/client.

Enter the following command: 
./jacclient stop

To start it, use
./jacclient start


-- Show status of the client:

./jacclient status
HCVIN0033I The Tivoli Security Compliance Manager client is currently running.


-- Start of the client on boot:

On AIX, the client is started from inittab:

#cat /etc/inittab | grep jac
ibmscmcli:2:once:/opt/IBM/SCM/client/jacclient start >/dev/console 2>&1 # Start Security Compliance Manager Client
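
An inittab record has four colon-separated fields: identifier, run levels, action, and command. As a small
illustration, the following Python sketch splits the record shown above into those fields. The function name
is an assumption made for this example.

```python
def parse_inittab_line(line):
    """Split one /etc/inittab record into its four colon-separated fields."""
    # Split at most three times: the command field may itself contain colons.
    ident, runlevels, action, command = line.split(":", 3)
    return {"id": ident, "runlevels": runlevels, "action": action, "command": command}


entry = parse_inittab_line(
    "ibmscmcli:2:once:/opt/IBM/SCM/client/jacclient start >/dev/console 2>&1 "
    "# Start Security Compliance Manager Client")
print(entry["id"], entry["runlevels"], entry["action"])
```

Here the record says: at run level 2, start the client once and do not respawn it if it exits.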


81.7 ANS1005E TCP/IP read error on socket xx errno=73:
------------------------------------------------------

Technote (FAQ): ANS1005E TCP/IP read error on socket xx errno=73
  
Problem 
ANS1005E TCP/IP read error on socket = 6, errno = 73, reason: 'A connection with a remote socket was reset 
by that socket.'.  
  
Cause 
The same ANS1005E message with errno 10054 is well-documented, but very little documentation exists for errno 73.
  
Solution 
Errno 73 in the message above indicates that the connection was reset by the peer, which usually means 
that the session was cancelled or terminated on the TSM Server. In all likelihood these sessions were terminated 
on the server because they sat in idle wait longer than the IDLETIMEOUT value set on the TSM Server. 
In the case described, the sessions reconnected successfully and no further errors were seen. 

Sessions sitting in idle wait are not uncommon and are frequently seen when backing up large amounts of data. 
With multi-threaded clients, some sessions query the server to identify which files are eligible to be 
backed up (producer sessions), while the other sessions perform the actual transfer of data (consumer sessions). 
It usually takes longer to back up files across the network than to generate a list of eligible files. 
Once the producer sessions have finished building their lists of eligible files, they sit idle while the 
consumer sessions back up these files to the TSM Server. After some time, the TSM Server terminates 
the producer sessions because they have been idle longer than the IDLETIMEOUT value specified on the server. 


This issue is often seen in firewalled environments, and has also been seen with network DNS problems and/or 
network configuration problems. One of the most common causes is a network device (router, switch, hub, etc.) 
between the client and the server whose port is set to Auto-Negotiate while the NIC in the client is also set 
to Auto-Negotiate (the default in most operating systems). According to the technote, each side then looks to 
the other to set the connection speed, and the speed they settle on is suitable but not always optimal. 
The resulting delays and interruptions add up when a large amount of data is sent at high speed, and can cause 
the connection to be broken. The best workaround is to hard code both the NIC and the network port to the same 
specific setting. This is usually 100 Mb/s full duplex for a standard CAT-5 copper connection, although older 
equipment may require reconfiguration of 10/100 NICs to allow that speed.

The other possible workaround for this issue is to estimate the file transfer time and increase the IDLETIMEOUT 
to a level higher than that time.
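
As a rough illustration of that estimate, the following Python sketch derives a transfer time from a data
volume and an assumed sustained throughput, then adds a safety margin. The function name, the 1.5 margin and
the sample figures are assumptions made for this example; IDLETIMEOUT on the TSM Server is given in minutes.

```python
import math


def suggested_idletimeout_minutes(data_gb, throughput_mb_per_s, margin=1.5):
    """Return an IDLETIMEOUT value (whole minutes) above the estimated transfer time.

    data_gb             -- amount of data to back up, in GB (1 GB = 1024 MB here)
    throughput_mb_per_s -- assumed sustained network throughput in MB/s
    margin              -- safety factor applied to the estimate (assumed: 1.5)
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    transfer_seconds = (data_gb * 1024.0) / throughput_mb_per_s
    # Round up so the timeout is never shorter than the padded estimate.
    return math.ceil(transfer_seconds * margin / 60.0)


# Example: 200 GB at a sustained 10 MB/s
print(suggested_idletimeout_minutes(200, 10))
```

Measure the real throughput of a previous backup before relying on such a figure; the estimate is only as good
as the throughput that goes into it.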
 



=========
82. LDAP:
=========


82.1: Introduction:
===================

The Lightweight Directory Access Protocol, better known as LDAP, is based on the X.500 standard, but is significantly 
simpler and more readily adapted to meet custom needs. Unlike X.500, LDAP supports TCP/IP, which is necessary 
for Internet access. The core LDAP specifications are all defined in RFCs.

LDAP needs an "information store" somewhere on one or more servers that holds objects
and all their related properties, such as user accounts and the properties belonging to those accounts.

Strictly speaking, though, LDAP isn't a database at all, but a protocol used to access information stored 
in an information directory (also known as an LDAP directory). 
So, the protocol does not make any assumptions on the actual type or sort of database which is involved,
but it does specify how to describe objects, classes, properties and how to retrieve and store
this information.

The so-called "schema" specifies all object classes and properties.

LDAP directory servers store their data hierarchically. If you've seen the top-down representations of 
DNS trees or UNIX file directories, an LDAP directory structure will be familiar ground. As with DNS host names, 
an LDAP directory record's Distinguished Name (DN for short) is read from the individual entry, 
backwards through the tree, up to the top level.
A DN is simply a way to identify an LDAP entry (or record): it fully and uniquely describes the object's
position in the "tree", similar to the full path of a file in a subdirectory on a filesystem.

-- What's in a name? The DN of an LDAP entry:
 
All entries stored in an LDAP directory have a unique "Distinguished Name," or DN. The DN for each LDAP entry 
is composed of two parts: the Relative Distinguished Name (RDN) and the location within the LDAP directory 
where the record resides. 

Some people like to refer to "container objects", which hold other objects, and "leaf objects", which are 
endpoints in the tree. Containers are mostly referred to as "Organizational Units" or OU's.
OU's are comparable to the domain components (dc's) of a fully qualified domain name.

Compared to a filesystem, an OU looks a lot like a directory.


Some attributes of an object are required, while other attributes are optional. An objectclass definition 
sets which attributes are required and which are not for each entry.

An object is represented (or can be found) by listing all ou's or dc's until you have reached the "endpoint":

-- Example 1:

OU=com.Ou=shell.OU=research.harry   or better according to convention:  harry.OU=research.OU=shell.OU=com

There are quite a few implementations that describe objects. For example, in Novell NDS, a user's Distinguished Name
might be like the following example:

-- Example 2:

CN=jdoe.OU=hrs.O=ADMN

or abbreviated to

jdoe.hrs.admn

Note: In Novell NDS, the toplevel OU is called an Organization, or just "O".
As another example: CN=lpIII.OU=development.OU=engineering.O=VerySmallCompany

-- Example 3:

An object could also be described as in the following example.

cn=Oatmeal Deluxe,ou=recipes,dc=foobar,dc=com 

Which means: In com, then in foobar, then in recipes, we can find the object "Oatmeal Deluxe". 
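
To make the structure of a DN concrete, the following Python sketch splits the DN from Example 3 into its RDN
and its parent DN, and lists the components from the top of the tree down. It is a minimal illustration:
the function names are made up, and it only handles backslash-escaped commas, not quoted values or multi-valued RDNs.

```python
def split_dn(dn):
    """Split a DN into (rdn, parent_dn) at the first unescaped comma."""
    i = 0
    while i < len(dn):
        if dn[i] == "\\":          # skip the escaped character that follows
            i += 2
            continue
        if dn[i] == ",":
            return dn[:i].strip(), dn[i + 1:].strip()
        i += 1
    return dn.strip(), ""          # no comma: the DN is a single RDN


def path_from_root(dn):
    """Return the RDNs ordered from the top of the tree down to the entry."""
    parts = []
    rest = dn
    while rest:
        rdn, rest = split_dn(rest)
        parts.append(rdn)
    return list(reversed(parts))


dn = "cn=Oatmeal Deluxe,ou=recipes,dc=foobar,dc=com"
print(split_dn(dn))
print(path_from_root(dn))
```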


An LDAP Server is capable of propagating its directories to other LDAP servers throughout the world, 
providing global access to information. Currently, however, LDAP is more commonly used within 
individual organizations, like universities, government departments, and private companies. 

LDAP is a client-server system. The server can use a variety of databases to store a directory, 
each optimized for quick and copious read operations. When an LDAP client application connects to an LDAP server 
it can either query a directory or upload information to it. In the event of a query, the server either answers 
the query or, if it can not answer locally, it can refer the query upstream to a higher level LDAP server 
which does have the answer. If the client application is attempting to upload information to an LDAP directory, 
the server verifies that the user has permission to make the change and then adds or updates the information. 


Note:
By default, LDAP servers listen on TCP port 389 (port 636 is the usual default for LDAP over SSL).



82.2 API's dealing with LDAP:
=============================

-- Programming:
-- ------------

Almost all programming languages have libraries or modules to access an LDAP Directory Server and
to query, add, delete, or modify objects.

-- Utilities:
-- ----------

Also, on many platforms, command-line utilities exist which can manipulate objects.
In many implementations, these utilities take LDIF files as input.

The LDAP Data Interchange Format (LDIF) is a text-based format used to describe directory entries
and to modify, add, and delete them. In the latter capacity, it provides input 
to command-line utilities.

The two LDIF fragments immediately following both refer to a directory entry for a printer. 
The string in the first line of each is the entry's name, called a distinguished name. 
The difference between them is that the first describes the entry, that is, it lists the information 
that the entry contains. The second, when used as input to a command-line utility, 
adds information about the speed of the printer.

Description

dn: cn=LaserPrinter1, ou=Devices, dc=acme,dc=com
objectclass: top
objectclass: printer
objectclass: epsonPrinter
cn: LaserPrinter1
resolution: 600
description: In room 407


Modification

dn: cn=LaserPrinter1, ou=Devices, dc=acme, dc=com
changetype: modify
add: pagesPerMinute
pagesPerMinute: 6
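
A change record like the one above can also be generated from a script. The following Python sketch builds the
same kind of "modify/add" record as text. The function name is made up for this example, and it assumes plain
ASCII values that need no base64 encoding. It also writes the "-" line that formally terminates each
modification in LDIF.

```python
def ldif_modify_add(dn, attribute, values):
    """Build an LDIF change record that adds values to one attribute of an entry."""
    lines = ["dn: " + dn, "changetype: modify", "add: " + attribute]
    for value in values:
        lines.append("%s: %s" % (attribute, value))
    lines.append("-")              # terminates this modification
    return "\n".join(lines) + "\n"


record = ldif_modify_add("cn=LaserPrinter1,ou=Devices,dc=acme,dc=com",
                         "pagesPerMinute", ["6"])
print(record, end="")
```

The resulting text can be saved to a file and given to a utility such as ldapmodify.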


Consider the following examples in a few programming languages:


Java example:
-------------

Listing 1 shows a simple JNDI program that will print out the cn attributes of all the Person type objects 
on your console. 


Listing 1. SimpleLDAPClient.java

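import java.util.Hashtable;

import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

public class SimpleLDAPClient {
    public static void main(String[] args) {
        // Connection settings for the JNDI LDAP provider
        Hashtable<String, String> env = new Hashtable<String, String>();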

        env.put(Context.INITIAL_CONTEXT_FACTORY,"com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://localhost:10389/ou=system");
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, "uid=admin,ou=system");
        env.put(Context.SECURITY_CREDENTIALS, "secret");
        DirContext ctx = null;
        NamingEnumeration results = null;
        try {
            ctx = new InitialDirContext(env);
            SearchControls controls = new SearchControls();
            controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
            results = ctx.search("", "(objectclass=person)", controls);
            while (results.hasMore()) {
                SearchResult searchResult = (SearchResult) results.next();
                Attributes attributes = searchResult.getAttributes();
                Attribute attr = attributes.get("cn");
                String cn = (String) attr.get();
                System.out.println(" Person Common Name = " + cn);
            }
        } catch (NamingException e) {
            throw new RuntimeException(e);
        } finally {
            if (results != null) {
                try {
                    results.close();
                } catch (Exception e) {
                }
            }
            if (ctx != null) {
                try {
                    ctx.close();
                } catch (Exception e) {
                }
            }
        }
    }
}

                     
VB.Net example:
---------------


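 ' Retrieves a list of all LDAP users and returns it as a Hashtable (uid -> cn).
 ' Requires: Imports System.Collections and Imports System.DirectoryServices.
 ' The original function declaration is missing from this fragment; the name
 ' GetLdapUsers below is a placeholder. _ldapServerName is assumed to be a
 ' class-level field.
 Public Function GetLdapUsers(ByVal ldapServerName As String) As Hashtable

 _ldapServerName = ldapServerName

 Dim sServerName As String = "mail"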

 Dim oRoot As DirectoryEntry = New DirectoryEntry("LDAP://" & ldapServerName & _
       "/ou=People,dc=mydomainname,dc=com")
 
 Dim oSearcher As DirectorySearcher = New DirectorySearcher(oRoot)
 Dim oResults As SearchResultCollection
 Dim oResult As SearchResult
 Dim RetArray As New Hashtable()

 Try

  oSearcher.PropertiesToLoad.Add("uid")
  oSearcher.PropertiesToLoad.Add("givenname")
  oSearcher.PropertiesToLoad.Add("cn")
  oResults = oSearcher.FindAll     

  For Each oResult In oResults

   If Not oResult.GetDirectoryEntry().Properties("cn").Value = "" Then
    RetArray.Add( oResult.GetDirectoryEntry().Properties("uid").Value, _
      oResult.GetDirectoryEntry().Properties("cn").Value)
   End If

  Next

 Catch e As Exception

  'MsgBox("Error is " & e.Message)
  Return RetArray

 End Try

 Return RetArray
  
 End Function


Some fragments in C++:
----------------------

Establishing an LDAP Connection:

LDAPConnection lc("localhost");
try {
    lc.bind("cn=user,dc=example,dc=org", "secret");
} catch (LDAPException e) {
    // Report the failed bind
    std::cerr << "Bind failed: " << e << std::endl;
}


Create user in VB:
------------------

' From the book "Active Directory, Third Edition" 
' ISBN: 0-596-10173-2

Dim objParent As New DirectoryEntry("LDAP://ou=sales,dc=mycorp,dc=com", _
                                    "administrator@mycorp.com", _
                                    "MyPassword", _
                                    AuthenticationTypes.Secure)
Dim objChild As DirectoryEntry = objParent.Children.Add("cn=jdoe", "user")
objChild.Properties("sAMAccountName").Add("jdoe")
objChild.CommitChanges()

objChild.NativeObject.AccountDisabled = False
objChild.CommitChanges()

Console.WriteLine("Added user")




82.3 Implementing LDAP on AIX:
==============================

IBM Directory Server needs to be configured to support user authentication through LDAP with both the 
AIX specific schema and the RFC 2307 schema on AIX.

On AIX, too, there is a client-server relationship: any LDAP client can be authenticated by the
LDAP Server. 

When users log in, the LDAP client sends a query to the LDAP server to get the user and group information 
from the centralized database. DB2 is the database used for storing the user and group information. 
The LDAP database stores and retrieves information based on a hierarchical structure of entries, 
each with its own distinguished name, type, and attributes. The attributes (properties) define 
acceptable values for the entry. An LDAP database can store and maintain entries for many users. 

An LDAP security load module was introduced in AIX Version 4.3. This load module provides 
user authentication and centralized user and group management functions through the IBM SecureWay Directory. 
A user defined on an LDAP server can be configured to log in to an LDAP client even if that user 
is not defined locally. The AIX LDAP load module is fully integrated with the AIX operating system.


82.3.1 Configuration of IBM Directory Server:
---------------------------------------------

http://www.ibm.com/developerworks/aix/library/au-ldapconfg/index.html?ca=drs-


IBM Directory Server on AIX can be configured with either: 

- The ldapcfg command line tool 
- The graphical version of the ldapcfg tool, called ldapxcfg 
- The mksecldap command 

The following file sets are required to configure IBM Directory Server: 

- "ldap.server" file sets 
- DB2, the back-end database software that is required by the IBM Directory Server 

AIX provides the mksecldap command to set up the IBM Directory servers and clients to exploit the servers.

The mksecldap command performs the following tasks for the new server setup: 

. Creates the ldapdb2 default DB2 instance. 
. Creates the ldapdb2 default DB2 database. 
. Creates the AIX tree DN (suffix) under which AIX users and groups are stored. 
. Exports users and groups from security database files of the local host into the LDAP database. 
. Sets LDAP server administrator DN and password. 
. Optionally sets server to use Secure Sockets Layer (SSL) communication. 
. Installs the /usr/ccs/lib/libsecldapaudit, an AIX audit plug-in for the LDAP server. 
. Starts the LDAP server after all the above is done. 
. Adds the LDAP server entry (slapd) to /etc/inittab for automatic restart after reboot. 

Example of how to setup the LDAP Server:

# mksecldap -s -a cn=admin -p passwd -S rfc2307aix


82.3.2 Configuration of an AIX LDAP Client:
-------------------------------------------

The "ldap.client" file set contains the IBM Directory client libraries, header files, and utilities. 
You can use the mksecldap command to configure the AIX client against the IBM Directory Server, 
as in the following example:

# mksecldap -c -h <LDAP Server name> -a cn=admin -p adminpwd -S rfc2307aix

You must have the IBM Directory Server administrator DN and password to configure the AIX client. 
Once the AIX client is configured against the IBM Directory Server, the secldapclntd daemon starts running. 
To authenticate users through LDAP on the AIX client system, change the SYSTEM attribute in the 
"/etc/security/user" file to "LDAP OR compat" or "compat OR LDAP".

The "/usr/lib/security/methods.cfg" file contains the load module definition. The mksecldap command adds 
the following stanza to enable the LDAP load module during the client setup.

LDAP:
	program = /usr/lib/security/LDAP
	program_64 = /usr/lib/security/LDAP64
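
Files such as methods.cfg and /etc/security/user use the AIX stanza format: a name followed by a colon, then
indented "attribute = value" lines. As a small illustration, the following Python sketch reads that format into
a dictionary, so a script can check whether the LDAP stanza is present. The function name is made up, and the
parser is a simplification that treats lines starting with "*" as comments.

```python
def parse_stanzas(text):
    """Parse AIX-style stanza text into {stanza_name: {attribute: value}}."""
    stanzas = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):      # blank line or comment
            continue
        if line.endswith(":") and "=" not in line:
            # A new stanza header such as "LDAP:"
            current = stanzas.setdefault(line[:-1], {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return stanzas


sample = """
LDAP:
        program = /usr/lib/security/LDAP
        program_64 = /usr/lib/security/LDAP64
"""
config = parse_stanzas(sample)
print("LDAP" in config, config["LDAP"]["program"])
```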
 


The "/etc/security/ldap/ldap.cfg" file on the client machine has configuration information for the 
secldapclntd client daemon. This configuration file contains information about the IBM Directory Server name, 
binddn, and password information. The file is automatically updated by the mksecldap command during AIX client setup. 

The auth_type attribute in the /etc/security/ldap/ldap.cfg file specifies where the user needs to be authenticated. 
If the auth_type attribute is UNIX_AUTH, then the user is authenticated at the client system. If it is ldap_auth, 
then the user is authenticated on IBM Directory Server. 


82.3.3 LDAP utilities:
----------------------

Using LDAP Tools on Linux, Solaris, AIX, or HP-UX


ldapadd      - Adds new entries to an LDAP directory.
 
ldapdelete   - Deletes entries from an LDAP directory server. The ldapdelete tool opens a connection 
               to an LDAP server, binds, and deletes one or more entries.
 
ldapmodify   - Opens a connection to an LDAP server, binds, and modifies or adds entries.
 
ldapmodrdn   - Modifies the relative distinguished name (RDN) of entries in an LDAP directory server. 
               Opens a connection to an LDAP server, binds, and modifies the RDN of entries. 
 
ldapsearch   - Searches entries in an LDAP directory server. Opens a connection to an LDAP server, binds, 
               and performs a search using the specified filter. The filter should conform to the string 
               representation of LDAP search filters (RFC 4515, formerly RFC 2254).
 
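
When a search filter for ldapsearch is assembled from user input, special characters in the value must be
escaped. The following Python sketch shows one way to do this, using the escape sequences of the string
representation of search filters. The function names are made up for this example, and only equality
and AND filters are covered.

```python
# Characters with a special meaning in a filter, and their escaped form
_FILTER_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}


def escape_filter_value(value):
    """Escape a value so it can be placed safely inside a search filter."""
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


def equality_filter(attribute, value):
    """Build an (attribute=value) filter with the value escaped."""
    return "(%s=%s)" % (attribute, escape_filter_value(value))


def and_filter(*filters):
    """Combine already-built filters with a logical AND."""
    return "(&" + "".join(filters) + ")"


search = and_filter(equality_filter("objectclass", "person"),
                    equality_filter("cn", "J. Doe (HR)*"))
print(search)
```

The resulting string can be passed as the filter argument to ldapsearch or to a library search call.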

ldapcfg utility:
----------------

Using the ldapcfg utility:

The ldapcfg utility is a command-line tool that you can use to configure IBM Tivoli Directory Server. 
You can use ldapcfg instead of the Configuration Tool for the following tasks:

- Setting the administrator DN and password. See Setting the administrator DN and password for instructions. 
- Configuring a database. See Configuring the database for instructions. 
- Changing the password of the DB2 administrator in the server configuration file.  
- Enabling the change log. See Enabling the change log for instructions. 
- Adding a suffix. 


1. Setting the administrator DN and password

To define the administrator DN and password, type the following at a command prompt: 

# ldapcfg -u "adminDN" -p password

where 

adminDN is the administrator DN you want. 
password is the password for the administrator DN. 

For example:

# ldapcfg -u "cn=root" -p secret

Notes:
- Double byte character set (DBCS) characters in the password are not supported.
- Do not use single quotation marks (') to define DNs with spaces in them. They are not interpreted correctly.

To accept the default administrator DN of cn=root and define a password, type the following command 
at a command prompt: 

# ldapcfg -p password

where password is the password for the administrator DN.

For example:

# ldapcfg -p secret



2. Configuring the database

When you configure the database, you must always specify a user ID and password on the command line. 
The instance name is, by default, the same as the user ID. The user ID must already exist and must meet 
certain requirements. If you want a different instance name you can specify it using the -t option. 
This name must also be an existing user ID that meets certain requirements. 
See Before you configure: creating the DB2 database owner and database instance owner for information about 
these requirements on both Windows and UNIX platforms.

Attention:
- Before configuring the database, be sure that the environment variable DB2COMM is not set. 
- Be sure to read this section before you use the ldapcfg command. Some options (such as -f and -s) have changed. 
  Unpredictable results will occur if you use them incorrectly or as they were used in previous releases. 
- The server must be stopped before you configure the database. 

To configure a database, the following options are available: 

-l location 
    Specifies the location of the DB2 database. For UNIX systems, this is a directory name such as /home/ldapdb. 
    For Windows systems, this is a drive letter such as C: 

-a id 
    Specifies the DB2 administrator ID. 

-c 
    Creates a database in UTF-8 format. (The default, if you do not specify this option, is to create a database 
    that is in the local code page.) 

-i 
    Destroys any instance currently configured with IBM Tivoli Directory Server. All databases associated with the 
    instance are also destroyed. 

-w password 
    Specifies the DB2 administrator password. 
    Note: The ldapcfg -w password command no longer changes the system password of the database owner. It only 
    updates the ibmslapd.conf file. See "3. Changing the DB2 administrator password" for information about 
    using the -w option alone.

-d database 
    Specifies the DB2 database name. 

-t dbinstance 
    Specifies the database instance. If you do not specify an instance, the instance name is the same as the 
    DB2 administrator ID. 

-o 
    Overwrites the database if one already exists. By default, the database being overwritten is not deleted. 

-r 
    Destroys any database currently configured with IBM Tivoli Directory Server. 

-f 
    Specifies the full path of a file to redirect output into. If used in conjunction with the -q option, 
    only errors will be sent to the file. 

-q 
    Runs in quiet mode. All output is suppressed except for errors. 

-n 
    Runs in no prompt mode. All output is generated except for messages requiring user interaction. 

To configure a database on /home/ldapdb2 with a DB2 administrator name of db2admin, a password of mypassword, 
and a database name of dbName when there is not an existing database configured (that is, the first time), the command is: 

# ldapcfg -l /home/ldapdb2 -a db2admin -w mypassword -d dbName

To configure a database on /home/ldapdb2 with a DB2 administrator name of db2admin, a password of mypassword, 
a database name of dbName, and an instance name of dbInstance when there is not an existing database configured 
(that is, the first time), the command is: 

# ldapcfg -l /home/ldapdb2 -a db2admin -w mypassword -d dbName -t dbInstance

To configure a database on /home/ldapdb2 when a database is already configured and you want to overwrite it, 
the command is: 

# ldapcfg -l /home/ldapdb2 -a db2admin -w mypassword -d dbName -o
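
When these commands are issued from a script, it helps to build the argument list in one place. The following
Python sketch assembles the ldapcfg arguments for the three cases above, using only the options described in
this section. The function name and parameter names are made up for this example; the sketch only builds and
prints the command, it does not run it.

```python
def ldapcfg_db_args(location, admin_id, password, database,
                    instance=None, overwrite=False):
    """Return the ldapcfg argument list for configuring a database."""
    args = ["ldapcfg", "-l", location, "-a", admin_id, "-w", password, "-d", database]
    if instance:
        args += ["-t", instance]       # instance name differs from the admin ID
    if overwrite:
        args.append("-o")              # overwrite an already configured database
    return args


first_time = ldapcfg_db_args("/home/ldapdb2", "db2admin", "mypassword", "dbName")
with_instance = ldapcfg_db_args("/home/ldapdb2", "db2admin", "mypassword", "dbName",
                                instance="dbInstance")
print(" ".join(first_time))
print(" ".join(with_instance))
```

Keeping the arguments as a list avoids shell quoting problems if the list is later handed to a process
launcher. Bear in mind that a password given on the command line can be visible to other users in the
process list.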


3. Changing the DB2 administrator password

If you change the password for the DB2 administrator through the operating system, you must also change it 
using ldapcfg with the -w option. This changes the password in the server configuration file. Similarly, 
if you change the password for the DB2 administrator with the ldapcfg command, you must also change it through 
the operating system.

To change the DB2 administrator password to newpassword, type the following command:


# ldapcfg -w newpassword

Note:
Double byte character set (DBCS) characters in the password are not supported.


userid='sidnsl2'



Notes:
------


Note 1:
-------

http://www-03.ibm.com/systems/p/os/aix/whitepapers/ldap_client.html

Summary of the above paper:

AIX first implemented an LDAP security load module in version 4.3. The implementation worked well in a 
uniform AIX environment. However, users have found it hard to configure AIX systems to work with third party 
LDAP servers. This shortcoming is primarily the result of the proprietary schema used by AIX.

Since AIX 5L version 5.2, AIX supports the schema defined in RFC 2307, which is widely used among IBM peers 
and which is becoming the industry standard for network entities. The schema defines attributes and object classes 
for such entities as users, groups, networks, services, hosts, protocols, rpc, etc. 
The RFC 2307 schema is often referred to as the nisSchema. Both of these terms are used interchangeably 
in this paper.

Client support for the nisSchema in AIX is part of Configurable Schema Support Mechanism (CSSM), 
which is a bigger effort to support arbitrary schema. With CSSM, AIX systems can be configured to support 
LDAP directory servers using any schema. At present, CSSM is implemented for users and groups only.

Configuring AIX to do naming lookup through LDAP for network entities, including users and groups, 
is also implemented in AIX 5L v5.2. However, this paper deals only with issues related to user authentication and 
user/group management through LDAP. Naming lookup services for other network entities is addressed in a separate paper. 

This paper addresses only client configuration. Section 2 introduces the major components and their 
functionality in an AIX LDAP client system. Section 3 gives step-by-step instruction on configuring 
an AIX client system. In Section 4, detailed behaviors and new features of the AIX LDAP client, 
including CSSM are presented and discussed. System management in respect of the LDAP load module and 
detailed steps to enable LDAP user authentication are given in Section 5.


Note 2:
-------

http://www.redbooks.ibm.com/abstracts/sg247165.html

Summary of the above Redbook:

This IBM Redbook is a technical planning reference for IT organizations that are adding AIX 5L clients 
to an existing LDAP authentication and user management environment. It presents integration scenarios 
for the AIX 5L LDAP client with IBM Tivoli Directory Server, the Sun ONE Directory Server, 
and Microsoft Active Directory.


Note 3:
-------

thread

Q:

All-
>
> Having a problem installing a DB2 client on a machine running AIX
> version 5.0. Client appeared to install one time successfully, then
> was uninstalled and a reinstall was attempted. For some reason, it
> does not complete the reinstall. See the status report from the GUI
> installer at the end of this note. Errors are towards the bottom.
> Everything installed in /usr/opt for DB2 but the sqllib folder that is
> supposed to be created in the home directory of the instance owner is
> not installed (in our case the instance owner is db2inst1). Have
> tried installing DB2 with the user db2inst1 already existing and not.
> Same error seems to appear. The key errors from the output below
> appear to be:
>
> ERROR:Could not switch current DB2INSTANCE to "db2inst1". The return
> code is
> "-2029059916".
> ERROR:DBI1122E Instance db2inst1 cannot be updated.

A:

Most likely, when you uninstalled, you removed the ~db2inst1/sqllib via
rm -rf, rather than via db2idrop. There are crumbs still sticking
around in your system.

Install the product, don't bother with the instance. Run
/usr/opt/db2_08_01/instance/db2ilist (as root). If it shows db2inst1
in the list, this is your problem. The solution is to recreate the
~db2inst1/sqllib directory (just use mkdir), then try db2idrop. Once
the instance is properly dropped, you can use db2isetup (also in the
..../instance directory) to recreate the instance.

Hope this helps,

A:

Works!! Thanks for your help!


Note 4:
-------

Technote:

http://www-1.ibm.com/support/docview.wss?rs=71&context=SSEPGG&q1=loopback+extshm&uid=swg21009742&loc=en_US&cs=utf-8&lang=en

DB2 issues SQL1224N and WebSphere Application Server (WAS) admin server fails with StaleConnectionException 
when attempting more than 10 local concurrent DB2 connections from a single process.

Problem 
On AIX 4.3.3 or later, DB2 will issue SQL1224N and WebSphere administration server will fail with 
StaleConnectionException when attempting more than 10 local concurrent DB2 connections from a single process. 
JDK 1.1.8 allows a maximum number of 10 local concurrent DB2 connections. JDK 1.2.2 allows a maximum of 4 
local connections. JDK 1.3.0 allows a maximum of 2 local connections.
 
Solution 
Symptoms 
DB2 errors: 

In db2diag.log, DIA9999E "An internal error occurred" with an error return code of 18 and sqlcode -1224 
appears when running DB2 with a WebSphere application:
2000-10-26-14.46.36.060751 Instance:db2ninst Node:000
PID:35928(java) Appid:
oper_system_services sqlocshr Probe:200 
DIA9999E An internal error occurred. Report the following error code : " 18".

Data Title:SQLCA PID:35928 Node:000
sqlcaid : SQLCA sqlcabc: 136 sqlcode: -1224 sqlerrml: 0
sqlerrmc:
sqlerrp : sqlearcn
sqlerrd : (1) 0x00000000 (2) 0x00000000 (3) 0x00000000
(4) 0x00000000 (5) 0x000000FF (6) 0x00000000
sqlwarn : (1) (2) (3) (4) (5) (6)
(7) (8) (9) (10) (11)
sqlstate:


The javacore.txt log file shows that an exception is thrown due to SQL1224N when the application attempts 
to connect to the database: 

COM.ibm.db2.jdbc.DB2Exception: [IBM][CLI Driver] SQL1224N A database agent could not be started to service a request, 
or was terminated as a result of a database system shutdown or a force command. SQLSTATE=55032

at java.lang.Throwable.<init>(Throwable.java:96)
at java.lang.Exception.<init>(Exception.java:44)
at java.sql.SQLException.<init>(SQLException.java:45)
at COM.ibm.db2.jdbc.DB2Exception.<init>(DB2Exception.java:93)
at COM.ibm.db2.jdbc.app.SQLExceptionGenerator.throw_SQLException(SQLExceptionGenerator.java:164)
at COM.ibm.db2.jdbc.app.SQLExceptionGenerator.check_return_code(SQLExceptionGenerator.java:402)
at COM.ibm.db2.jdbc.app.DB2Connection.connect(DB2Connection.java(Compiled Code))
at COM.ibm.db2.jdbc.app.DB2Connection.<init>(DB2Connection.java(Compiled Code))
at COM.ibm.db2.jdbc.app.DB2Driver.connect(DB2Driver.java(Compiled Code))
at java.sql.DriverManager.getConnection(DriverManager.java(Compiled Code))
at java.sql.DriverManager.getConnection(DriverManager.java:183)
at newtest.connectDM(newtest.java:35)
at newtest.run(newtest.java:109)
at java.lang.Thread.run(Thread.java:498)
 

Possible cause 

The error return code 18 indicates that there are too many files open and therefore no available 
segment registers. The WebSphere application has reached AIX's limit of 10 shared memory segments per process, 
and so DIA9999E is generated. 

SQL1224N and StaleConnectionException occur because DB2 cannot obtain a new shared memory segment.
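
The two values that matter in the db2diag.log entry above are the internal error code and the sqlcode.
A minimal Python sketch for pulling them out of an entry follows. The function name and the regular
expressions are illustrative assumptions based only on the sample entry shown here; this is not a
general db2diag.log parser.

```python
import re

def extract_error_codes(entry):
    """Return (internal_error_code, sqlcode) from a db2diag.log entry.

    A field that cannot be found is returned as None.
    """
    # Matches e.g.: Report the following error code : " 18".
    code = re.search(r'error code\s*:\s*"\s*(-?\d+)\s*"', entry)
    # Matches e.g.: sqlcode: -1224
    sqlcode = re.search(r'sqlcode:\s*(-?\d+)', entry)
    return (int(code.group(1)) if code else None,
            int(sqlcode.group(1)) if sqlcode else None)

# Two lines copied from the sample entry above.
SAMPLE = (
    'DIA9999E An internal error occurred. Report the following error code : " 18".\n'
    'sqlcaid : SQLCA sqlcabc: 136 sqlcode: -1224 sqlerrml: 0\n'
)

print(extract_error_codes(SAMPLE))  # (18, -1224)
```

An entry that yields (18, -1224) matches the situation described in this note.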

Action
DB2 UDB Version 7.2 (DB2 UDB Version 7.1 FixPak 3) or later
Support for EXTSHM was added in V7.2 (V7.1 FixPak 3). By default, AIX does not permit 32-bit applications 
to attach to more than 11 shared memory segments per process, of which a maximum of 10 can be used for 
local DB2 connections. To use EXTSHM with DB2, do the following:

In DB2 client sessions:
export EXTSHM=ON
When starting the DB2 UDB Server:
export EXTSHM=ON
db2set DB2ENVLIST=EXTSHM
db2start
On DB2 UDB EEE, also add the following lines to sqllib/db2profile:
EXTSHM=ON
export EXTSHM

The above information has been documented in the DB2 UDB Release Notes for Version 7.2 / Version V7.1 FixPak 3, page 366. 
You can get it from: ftp://ftp.software.ibm.com/ps/products/db2/info/vr7/pdf/letter/db2ire71.pdf
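
If a DB2 client is started from a script instead of an interactive shell, EXTSHM must also be present
in the environment of the child process. A small Python sketch, assuming a hypothetical helper
named with_extshm:

```python
import os

def with_extshm(env=None):
    """Return a copy of the environment with EXTSHM=ON set.

    The input mapping (or os.environ when env is None) is not modified.
    """
    new_env = dict(os.environ if env is None else env)
    new_env["EXTSHM"] = "ON"
    return new_env

child_env = with_extshm({"PATH": "/usr/bin"})
print(child_env["EXTSHM"])  # ON
```

The resulting dictionary can be passed as the env argument of subprocess.run() when launching the client program.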


Note 5:
-------

http://publib.boulder.ibm.com/infocenter/wpdoc/v510/index.jsp?topic=/com.ibm.wp.ent.doc_5.1/wps/tbl_adm.html

When modifying user information via WebSphere Portal, if you receive the error "Backend storage system failed. 
Please try again later." or the user attributes are not updated in LDAP, it might mean that the default 
tuning parameters for use with DB2 and IBM Tivoli Directory Server need to be adjusted.

Solution: The default DB2 parameters are:

APP_CTL_HEAP_SZ 128
APPLHEAP_SZ 128

The parameters above are too small for IBM Tivoli Directory Server and WebSphere Portal on AIX with 2000 user entries.

DB2 UDB needs sufficient heap space when updating or inserting data. WebSphere Portal sends heavy transactions 
to the LDAP server in any phase, especially when changing user attributes, which causes several updates and inserts. 
To prevent this problem, increase both parameters for the LDAP database as follows:

su - ldapdb2
db2 -c update db cfg for ldap using APP_CTL_HEAP_SZ 1024
db2 -c update db cfg for ldap using APPLHEAP_SZ 1024  




82.4 Implementing LDAP on HP-UX:
================================





82.5 Implementing LDAP on RedHat:
=================================


82.5.1 OpenLDAP Daemons and Utilities:
--------------------------------------

The suite of OpenLDAP libraries and tools is spread out over the following packages: 


openldap         - Contains the libraries necessary to run the openldap server and client applications. 

openldap-clients - Contains command-line tools for viewing and modifying directories on an LDAP server. 

openldap-servers - Contains the servers and other utilities necessary to configure and run an LDAP server. 


There are two servers contained in the openldap-servers package: the Standalone LDAP Daemon (/usr/sbin/slapd) 
and the Standalone LDAP Update Replication Daemon (/usr/sbin/slurpd). 

The slapd daemon is the standalone LDAP server while the slurpd daemon is used to synchronize changes from 
one LDAP server to other LDAP servers on the network. The slurpd daemon is only necessary when dealing 
with multiple LDAP servers. 

To perform administrative tasks, the openldap-servers package installs the following utilities into the 
/usr/sbin/ directory: 


slapadd    - Adds entries from an LDIF file to an LDAP directory. For example, 
            /usr/sbin/slapadd -l ldif-input will read in the LDIF file, ldif-input, containing the new entries. 

slapcat    - Pulls entries out of an LDAP directory in the default format - Berkeley DB - and saves them 
             in an LDIF file. For example, the command /usr/sbin/slapcat -l ldif-output will output an LDIF file 
             called ldif-output containing the entries from the LDAP directory. 

slapindex  - Re-indexes the slapd directory based on the current content. 

slappasswd - Generates an encrypted user password value for use with ldapmodify or the rootpw value in the 
             slapd configuration file, /etc/openldap/slapd.conf. Execute /usr/sbin/slappasswd to create the password. 


 Warning 
  Be sure to stop slapd by issuing "/sbin/service ldap stop" before using slapadd, slapcat or slapindex. 
  Otherwise, the consistency of the LDAP directory is at risk. 
 

The openldap-clients package installs tools used to add, modify, and delete entries in an LDAP directory 
into /usr/bin/. These tools include the following: 


ldapmodify  - Modifies entries in an LDAP directory, accepting input via a file or standard input. 

ldapadd     - Adds entries to your directory by accepting input via a file or standard input; 
              ldapadd is actually a hard link to ldapmodify -a. 

ldapsearch  - Searches for entries in the LDAP directory using a shell prompt. 

ldapdelete  - Deletes entries from an LDAP directory by accepting input via user input at the terminal or via a file. 


With the exception of ldapsearch, each of these utilities is more easily used by referencing a file containing 
the changes to be made rather than typing a command for each entry you wish to change in an LDAP directory. 
The format of such a file is outlined in each application's man page. 

NSS, PAM, and LDAP
In addition to the OpenLDAP packages, Red Hat Linux includes a package called nss_ldap which enhances LDAP's ability 
to integrate into both Linux and other UNIX environments. 

The nss_ldap package provides the following modules: 

/lib/libnss_ldap-<glibc-version>.so

/lib/security/pam_ldap.so

The libnss_ldap-<glibc-version>.so module allows applications to look up user, group, hosts, and other 
information using an LDAP directory via glibc's Nameservice Switch (NSS) interface. NSS allows applications 
to authenticate using LDAP in conjunction with Network Information Service (NIS) name service and 
flat authentication files. 

The pam_ldap module allows PAM-aware applications to authenticate users using information stored in an 
LDAP directory. PAM-aware applications include console login, POP and IMAP mail servers, and Samba. By deploying 
an LDAP server on your network, all of these login situations can authenticate against one user ID and 
password combination, greatly simplifying administration. 

PHP4, the Apache HTTP Server, and LDAP
Red Hat Linux includes a package containing an LDAP module for the PHP server-side scripting language. 

The php-ldap package adds LDAP support to the PHP4 HTML-embedded scripting language via the 
/usr/lib/php4/ldap.so module. This module allows PHP4 scripts to access information stored in an LDAP directory. 
 



82.5.2 OpenLDAP Configuration Files:
------------------------------------

OpenLDAP configuration files are installed into the /etc/openldap/ directory. The following is a brief 
list highlighting the most important directories and files: 


/etc/openldap/schema/ directory - This subdirectory contains the schema used by the slapd daemon. 
                                  
/etc/openldap/ldap.conf         - This is the configuration file for all client applications which use 
                                  the OpenLDAP libraries. These include, but are not limited to, Sendmail, Pine, 
                                  Balsa, Evolution, and Gnome Meeting. 

/etc/openldap/slapd.conf        - This is the configuration file for the slapd daemon.  


 Note 
  If the nss_ldap package is installed, it will create a file named /etc/ldap.conf. This file is used by the 
  PAM and NSS modules supplied by the nss_ldap package. See the Section called Configuring Your System to 
  Authenticate Using OpenLDAP for more information about this configuration file. 
 

-- slapd.conf
In order to use the slapd LDAP server, you will need to modify its configuration file, 
/etc/openldap/slapd.conf. You must edit this file to make it specific to your domain and server. 

The suffix line names the domain for which the LDAP server will provide information. The suffix line should be 
changed from: 

suffix          "dc=your-domain,dc=com"
 
so that it reflects your domain name. For example: 

suffix          "dc=example,dc=com"
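
The suffix is simply the DNS domain name with each label written as a dc= component. A minimal Python
sketch of that mapping follows; domain_to_suffix is a hypothetical helper name, and the example
does not escape special characters in labels.

```python
def domain_to_suffix(domain):
    """Turn a DNS domain such as 'example.com' into 'dc=example,dc=com'."""
    labels = [label for label in domain.strip(".").split(".") if label]
    if not labels:
        raise ValueError("empty domain name")
    return ",".join("dc=" + label for label in labels)

suffix = domain_to_suffix("example.com")
print('suffix          "%s"' % suffix)  # suffix          "dc=example,dc=com"
```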
 

The rootdn entry is the Distinguished Name (DN) for a user who is unrestricted by access controls or 
administrative limit parameters set for operations on the LDAP directory. The rootdn user can be thought of as 
the root user for the LDAP directory. In the configuration file, change the rootdn line from its default value 
to something like the example below: 

rootdn          "cn=root,dc=example,dc=com"
 
Change the rootpw line to something like the example below: 

rootpw          {SSHA}vv2y+i6V6esazrIv70xSSnNAJE18bb2u
 
In the rootpw example, you are using an encrypted root password, which is a much better idea than leaving a 
plain text root password in the slapd.conf file. To make this encrypted string, type the following command: 

# slappasswd
 
You will be prompted to type and then re-type a password. The program prints the resulting encrypted password 
to the terminal. 
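
For reference, an {SSHA} value is the base64 encoding of SHA-1(password + salt) followed by the salt.
The Python sketch below builds and checks such a value. It assumes a 4-byte random salt, which is
consistent with the 24 decoded bytes of the example rootpw above. The function names are hypothetical,
and the sketch is meant to illustrate the format, not to replace slappasswd.

```python
import base64
import hashlib
import hmac
import os

PREFIX = "{SSHA}"

def make_ssha(password, salt=None):
    """Return an {SSHA} string: base64(sha1(password + salt) + salt)."""
    if salt is None:
        salt = os.urandom(4)  # assumption: 4-byte salt
    digest = hashlib.sha1(password.encode("utf-8") + salt).digest()
    return PREFIX + base64.b64encode(digest + salt).decode("ascii")

def check_ssha(password, hashed):
    """Return True if password matches the given {SSHA} string."""
    if not hashed.startswith(PREFIX):
        return False
    raw = base64.b64decode(hashed[len(PREFIX):])
    digest, salt = raw[:20], raw[20:]  # a SHA-1 digest is 20 bytes long
    candidate = hashlib.sha1(password.encode("utf-8") + salt).digest()
    return hmac.compare_digest(candidate, digest)

hashed = make_ssha("secret")
print(check_ssha("secret", hashed))  # True
print(check_ssha("wrong", hashed))   # False
```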

 Warning 
  LDAP passwords, including the rootpw directive specified in /etc/openldap/slapd.conf, are sent over the network 
  in plain text unless you enable TLS encryption. 
 
For added security, the rootpw directive should only be used if the initial configuration and population 
of the LDAP directory occurs over a network. After the task is completed, it is best to comment out the rootpw 
directive by preceding it with a pound sign (#). 


 Tip 
  If you are using the slapadd command-line tool locally to populate the LDAP directory, using the rootpw directive 
  is not necessary. 
 

The /etc/openldap/schema/ Directory

The /etc/openldap/schema/ directory holds LDAP definitions, previously located in the slapd.at.conf and 
slapd.oc.conf files. All attribute syntax definitions and objectclass definitions are now located 
in the different schema files. The various schema files are referenced in /etc/openldap/slapd.conf 
using include lines, as shown in this example: 

include		/etc/openldap/schema/core.schema
include		/etc/openldap/schema/cosine.schema
include		/etc/openldap/schema/inetorgperson.schema
include		/etc/openldap/schema/nis.schema
include		/etc/openldap/schema/rfc822-MailMember.schema
include		/etc/openldap/schema/autofs.schema
include		/etc/openldap/schema/kerberosobject.schema
 
 Caution 
  You should not modify any of the schema items defined in the schema files installed by OpenLDAP. 
 
You can extend the schema used by OpenLDAP to support additional attribute types and object classes using 
the default schema files as a guide. To do this, create a local.schema file in the /etc/openldap/schema directory. 
Reference this new schema within slapd.conf by adding the following line below your default include schema lines: 

include          /etc/openldap/schema/local.schema
 
Next, go about defining your new attribute types and object classes within the local.schema file. 
Many organizations use existing attribute types and object classes from the schema files installed by default 
and modify them for use in the local.schema file. This can help you to learn the schema syntax while meeting 
the immediate needs of your organization. 

Extending schema to match certain specialized requirements is quite involved and beyond the scope of this chapter. 
Visit http://www.openldap.org/doc/admin/schema.html for information on writing new schema files. 
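
Adding the include line for local.schema below the existing include lines can also be scripted.
The following Python sketch works on the text of slapd.conf; add_schema_include is a hypothetical
helper, and the sample configuration is an abbreviated illustration, not a complete slapd.conf.

```python
def add_schema_include(conf_text, schema_path):
    """Insert 'include <schema_path>' after the last include line.

    The text is returned unchanged if the include is already present.
    """
    lines = conf_text.splitlines()
    if any(line.split() == ["include", schema_path] for line in lines):
        return conf_text
    # Index of the last existing include line, or -1 if there is none.
    last = max((i for i, line in enumerate(lines)
                if line.split()[:1] == ["include"]), default=-1)
    lines.insert(last + 1, "include\t\t" + schema_path)
    return "\n".join(lines) + "\n"

SAMPLE_CONF = (
    "include\t\t/etc/openldap/schema/core.schema\n"
    "include\t\t/etc/openldap/schema/nis.schema\n"
    "\n"
    'suffix\t\t"dc=example,dc=com"\n'
)

updated = add_schema_include(SAMPLE_CONF, "/etc/openldap/schema/local.schema")
print(updated, end="")
```

Work on a copy of the file first, and restart slapd only after checking the result.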


82.5.3 Setting up OpenLDAP on RedHat:
-------------------------------------

The basic steps for creating an LDAP server are as follows: 

1. Install the openldap, openldap-servers, and openldap-clients RPMs. 

2. Edit the /etc/openldap/slapd.conf file to reference your LDAP domain and server. 
   Refer to the Section called slapd.conf for more information on how to edit this file. 

3. Start slapd with the command:

/sbin/service ldap start
 

After you have configured LDAP correctly, you can use chkconfig, ntsysv, or Services Configuration Tool 
to configure LDAP to start at boot time. For more information about configuring services, 
refer to the chapter titled Controlling Access to Services in the Official Red Hat Linux Customization Guide. 

4. Add entries to your LDAP directory with ldapadd. 

5. Use ldapsearch to see if slapd is accessing the information correctly. 

6. At this point, your LDAP directory should be functioning properly and you can configure any LDAP-enabled 
   applications to use the LDAP directory. 




=========================
83. Introduction SAMBA:
=========================


83.1 Introduction:
==================


-- File and Print services:

Traditionally, unix machines have their own "usual" protocols and utilities on top of tcp/ip 
for file and print services, like scp, ftp, http, rcp, lp, ipc mechanisms etc.

File and print services on Windows traditionally use "Server Message Blocks", otherwise known
as the SMB protocol. 

The SMB protocol can be installed on unix as well, making the machine "look" like a Windows Server
to Windows clients that want to use a Server for file and print services. 
To make this a reality, you can install "Samba" on your unix machine.

-- Authentication:

Machines from both the Windows and unix worlds have means to "authenticate" a user locally,
or let the user be authenticated by a remote entity.

For example, on a unix machine, a user "can logon locally", using the local password file (in reality,
this could be more complex), or be authenticated "remotely" by "NIS" (Network Information System),
or be authenticated by a ldap Server etc.

Also, on a Windows machine, a user might logon locally, or be authenticated remotely by a 
PDC or BDC (Domain Controllers in a NT4 network), or be authenticated by Active Directory (Win2K, Win2K3).

With samba, you can integrate a unix machine in the Windows-type of authentication, that is, let a unix
machine function as a Windows Domain Controller (NT4), or integrate it in Active Directory (2000, 2003).

In the next sections, we take a look at how samba can be used on HP-UX, Solaris, RedHat, and AIX.


 


=========================
84. AIX and SNA:
=========================


Note 1:
-------

SNA defines a set of rules that systems use to communicate. These rules define the layout of the data 
that flows between the systems and the action the systems take when they receive the data. 
SNA does not specify how a system implements the rules. A fundamental objective of SNA is to allow 
systems that have very different internal hardware and software designs to communicate. 
The only requirement is that the externals meet the rules of the architecture. 

Logical Unit (LU) is an SNA term used to describe a logical collection of services that can be accessed 
from a network. In this environment, you can think of a CICS region as an LU. SNA defines many different 
types of LUs, including devices like terminals and printers. The type of LU that is used for 
CICS intersystem communication is LU type 6.2. 

Each LU is identified by a name of up to eight characters, referred to as the LU name. An IBM mainframe-based 
CICS system uses the APPLID defined in the CICS system initialization table as its LU name 
(also referred to as a NETNAME). The LU name for a CICS OS/2 system is specified in the 
Communications Manager/2 Local LU definition, and the LU name for a CICS/400 system is defined 
in the APPL parameter of the ADDCICSSIT command. 

An SNA network also has a name of up to eight characters, called the network name. The network name 
is sometimes referred to as the network ID or the netid. An LU can be uniquely identified by combining 
its LU name with the network name of the network that owns it. The LU's name is then referred to as the 
network-qualified LU name or the fully-qualified LU name. For example, if an LU named CICSA belongs to 
a network named NETWORK1, its network-qualified LU name is NETWORK1.CICSA. 
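
The naming rule above can be sketched in Python as follows. Only the length limit of eight characters
stated in this note is checked; any further restrictions on the characters allowed in SNA names are
not covered, and the function names are hypothetical.

```python
MAX_NAME_LENGTH = 8  # both the LU name and the network name: up to eight characters

def qualify_lu_name(network_name, lu_name):
    """Return the network-qualified LU name, e.g. NETWORK1.CICSA."""
    for label, value in (("network name", network_name), ("LU name", lu_name)):
        if not 1 <= len(value) <= MAX_NAME_LENGTH:
            raise ValueError("%s must be 1 to %d characters: %r"
                             % (label, MAX_NAME_LENGTH, value))
    return network_name + "." + lu_name

def split_qualified_name(qualified_name):
    """Split NETWORK1.CICSA into ('NETWORK1', 'CICSA')."""
    network_name, sep, lu_name = qualified_name.partition(".")
    if not sep:
        raise ValueError("not a network-qualified LU name: %r" % qualified_name)
    return network_name, lu_name

print(qualify_lu_name("NETWORK1", "CICSA"))    # NETWORK1.CICSA
print(split_qualified_name("NETWORK1.CICSA"))  # ('NETWORK1', 'CICSA')
```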

For an LU to communicate with another LU, it must establish at least one session between them. 
The request to activate a session is referred to as a BIND request. It is used to pass details 
of the capabilities of the initiating LU to the receiving system, and also to determine a route 
through the network. The receiving LU then sends a description of its capabilities to the 
initiating LU in the BIND response. Once the session is established, it can be used for a number 
of intersystem requests and remains active for as long as the two LUs and the network between them are available. 

When you configure your network, you can set up different characteristics for the sessions established 
between a pair of LUs, such as in the route they take through the network. Session characteristics 
are referred to as modegroups. All the sessions associated with a modegroup have the same characteristics. 
A modegroup is identified by a modename of up to eight characters. 

When defining a CICS region, you must also identify the SNA synchronization level required. 
CICS supports all three synchronization levels defined by SNA: 


Synchronization level 0 (NONE)-- SNA provides no synchronization support. The application must code its own. 
Synchronization level 1 (CONFIRM)-- SNA provides the ability to send simple acknowledgment requests. 
Synchronization level 2 (SYNCPOINT)-- SNA provides the ability for two or more systems to treat the updates 
made by an application on these systems as one logical unit of work (LUW). 

There are many ways to connect CICS systems in a network. If the data is successfully transferred in the 
correct format, these CICS systems are unaware of the network makeup. SNA configuration is performed at two levels: 

-The logical level, described in the preceding paragraphs, incorporates the characteristics of the systems 
that wish to communicate. 
-The physical level incorporates the linking of actual machines, or nodes, in the network. Each node has 
physical links, or connections, to other nodes so that every node is connected to at least one other node. 
Data must sometimes travel along a number of links to get from one system to another. Also, these links 
can be of different types. For example, IBM Token Ring, Synchronous Data Link Control (SDLC), Ethernet, 
and X.25 are all physical links. These types of links are collectively referred to as 
data link control (DLC) protocols. 


Each node has a Physical Unit (PU). This is a combination of hardware and software that controls the links 
to other nodes. Several PU types with different capabilities and responsibilities exist, such as: 

-PU type 5--The best-known example is an IBM mainframe processor running VTAM. VTAM provides the support 
 for the Systems Services Control Point (SSCP) function defined in SNA. 
-PU type 4--This is a communications controller, such as an Advanced Communications Function for the 
 Network Control Program (ACF/NCP), that resides in the center of a network, routing and controlling the 
 data flow between machines. 
-PU type 2--This is a small machine, such as an advanced program-to-program communications (APPC) workstation. 
 It can communicate directly only with a PU type 4 or PU type 5 and relies on these PUs to route the data to the 
 correct system. 
-PU type 2.1--This is a more advanced PU type 2 that can also communicate with other PU type 2.1 nodes directly. 
 This node can support an independent LU. An independent LU can establish a session with another LU 
 without using VTAM. Communications Server for AIX is a PU type 2.1 node. 


PU type 2.1 nodes may have support for Advanced Peer-to-Peer Networking (APPN). This support enables a node 
to search for an LU in the network, rather than requiring a remote LU's location to be preconfigured locally. 
There are two types of APPN nodes: end nodes and network nodes. An end node can receive a search request 
for an LU and respond, indicating whether the LU is local to the node or not. A network node can issue search 
requests, as well as respond to them, and maintains a dynamic database that contains the results of 
the search requests. Support for APPN can greatly reduce the maintenance work in an SNA network, especially 
if the network is large or dynamic. Communications Server for AIX supports APPN.


Note 2:
-------

IBM(R) Communications Server for AIX(R) provides an essential foundation for enterprise networking.

It helps provide a security-rich, scalable, and high-performance communications solution for the AIX operating system.

-Reaps the benefits of IBM's years of experience with SNA, TCP/IP, and network computing
-Enables customers and Business Partners to choose applications based on their business needs, 
 not their network infrastructure
-Provides an excellent offering for multi-protocol networking environments with Enterprise Extender, 
 enhanced TN3270E Server, Telnet Redirector, and Remote API client/server support
-Offers use of comprehensive Secure Sockets Layer (SSL) data encryption, and SSL client and server 
 authentication with the TN3270E Server, the Telnet Redirector and the Remote API Client/Server using 
 HTTPS connections for access to SNA networks
-Offers the ideal choice for customers who need more secure, robust Telnet and Remote API networking environments
-Includes full implementation of APPN (network node and end node), HPR, and DLUR, along with integrated 
 gateway capabilities, positioning itself as a participant in a host (hierarchical) or peer-to-peer distributed 
 network environment
-Operating systems supported: AIX


IBM Communications Server exists for:



Note 3:
-------

Introduction to SNA 
Summary: In the early 1970s, IBM discovered that large customers were reluctant to trust unreliable 
communications networks to properly automate important transactions. In response, IBM developed 
Systems Network Architecture (SNA). "Anything that can go wrong will go wrong," and SNA may be unique 
in trying to identify literally everything that could possibly go wrong in order to specify the proper response. 
Certain types of expected errors (such as a phone line or modem failure) are handled automatically. 
Other errors (software problems, configuration tables, etc.) are isolated, logged, and reported 
to the central technical staff for analysis and response. This SNA design worked well as long as communications 
equipment was formally installed by a professional staff. It became less useful in environments where any PC 
simply plugs in and joins the LAN. Two forms of SNA developed: Subareas (SNA Classic) managed by mainframes, 
and APPN (New SNA) based on networks of minicomputers. 

In the original design of SNA, a network is built out of expensive, dedicated switching minicomputers 
managed by a central mainframe. The dedicated minicomputers run a special system called NCP. No user programs 
run on these machines. Each NCP manages communications on behalf of all the terminals, workstations, and PCs 
connected to it. In a banking network, the NCP might manage all the terminals and machines in branch offices 
in a particular metropolitan area. Traffic is routed between the NCP machines and eventually into the central mainframe. 

The mainframe runs an IBM product called VTAM, which controls the network. Although individual messages 
will flow from one NCP to another over a phone line, VTAM maintains a table of all the machines and 
phone links in the network. It selects the routes and the alternate paths that messages can take between 
different NCP nodes. 

A subarea is the collection of terminals, workstations, and phone lines managed by an NCP. Generally, 
the NCP is responsible for managing ordinary traffic flow within the subarea, and VTAM manages the connections 
and links between subareas. Any subarea network must have a mainframe. 

The rapid growth in minicomputers, workstations, and personal computers forced IBM to develop a second kind of SNA. 
Customers were building networks using AS/400 minicomputers that had no mainframe or VTAM to provide control. 
The new SNA is called APPN (Advanced Peer to Peer Networking). APPN and subarea SNA have entirely different 
strategies for routing and network management. Their only common characteristic is support for applications 
or devices using the APPC (LU 6.2) protocol. Although IBM continues the fiction that SNA is one architecture, 
a more accurate picture holds that it is two compatible architectures that can exchange data. 

It is difficult to understand something unless you have an alternative with which to compare it. Anyone reading 
this document has found it from the PC Lube and Tune server on the Internet. This suggests the obvious 
comparison: SNA is not TCP/IP. This applies at every level in the design of the two network architectures. 
Whenever the IBM designers went right, the TCP/IP designers went left. As a result, instead of the two 
network protocols being incompatible, they turn out to be complementary. An organization running both 
SNA and TCP/IP can probably solve any type of communications problem. 

An IP network routes individual packets of data. The network delivers each packet based on an address number 
that identifies the destination machine. The network has no view of a "session". When PC Lube and Tune sends 
this document through the network to your computer, different pieces can end up routed through different cities. 
TCP is responsible for reassembling the pieces after they have been received. 
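
The idea of independently routed pieces that are put back in order at the receiver can be illustrated
with a toy Python sketch. This is a deliberate simplification: real TCP also handles acknowledgements,
retransmission, and flow control, and the function names here are hypothetical.

```python
import random

def split_into_packets(data, size):
    """Cut data into (offset, chunk) pairs of at most size bytes each."""
    return [(offset, data[offset:offset + size])
            for offset in range(0, len(data), size)]

def reassemble(packets):
    """Rebuild the original data by ordering the chunks on their offsets."""
    return b"".join(chunk for _, chunk in sorted(packets))

message = b"SNA is not TCP/IP, but the two are complementary."
packets = split_into_packets(message, 8)

# Simulate pieces arriving in a different order than they were sent.
random.Random(0).shuffle(packets)

print(reassemble(packets) == message)  # True
```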

In the SNA network, a client and server cannot exchange messages unless they first establish a session. 
In a Subarea network, the VTAM program on the mainframe gets involved in creating every session. 
Furthermore, there are control blocks describing the session in the NCP to which the client talks 
and the NCP to which the server talks. Intermediate NCPs have no control blocks for the session. 
In APPN SNA, there are control blocks for the session in all of the intermediate nodes through which 
the message passes. 

Every design has advantages and limitations. The IP design (without fixed sessions) works well in experimental 
networks built out of spare parts and lab computers. It also works well for its sponsor (the Department of Defense) 
when network components are being blown up by enemy fire. In exchange, errors in the IP network often go unreported 
and uncorrected, because the intermediate equipment reroutes subsequent messages through a different path. 
The SNA design works well to build reliable commercial networks out of dedicated, centrally managed devices. 
SNA, however, requires a technically trained central staff ready and able to respond to problems as they are 
reported by the network equipment. 

The mainframe-managed subarea network was originally designed so that every terminal, printer, or application 
program was configured by name on the mainframe before it could use the network. This worked when 3270 terminals 
were installed by professional staff and were cabled back to centrally managed control units. Today, when 
ordinary users buy a PC and connect through a LAN, this central configuration has become unwieldy. 
One solution is to create a "pool" of dummy device names managed by a gateway computer. PC's then power up 
and borrow an unused name from the pool. Recent releases allow VTAM to define a "prototype" PC and 
dynamically add new names to the configuration when devices matching the prototype appear on the network. 

A more formal solution, however, is provided by the APPN architecture designed originally for minicomputers. 
APPN has two kinds of nodes. An End Node (EN) contains client and server programs. Data flows in or out of 
an End Node, but does not go through it. A Network Node (NN) also contains clients and servers, 
but it also provides routing and network management. When an End Node starts up, it connects to one 
Network Node that will provide its access to the rest of the network. It transmits to that NN a list 
of the LUNAMEs that the End Node contains. The NN ends up with a table of its own LUNAMEs and those of all 
the EN's that it manages. 

When an EN client wants to connect to a server somewhere in the network, it sends a BIND message with 
the LUNAME of the server to the NN. The NN checks its own table, and if the name is not matched, broadcasts 
a query that ultimately passes through every NN in the network. When some NN recognizes the LUNAME, 
it sends back a response that establishes both a session and a route through the NN's between the client 
and the server program. 
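
The lookup described above can be modelled with a toy Python sketch. A breadth-first search over the
network nodes stands in for the broadcast query; this is an assumption made for illustration and not
the actual APPN search protocol. All class, function, and node names are hypothetical.

```python
from collections import deque

class NetworkNode:
    """A network node (NN) with a table of LUNAMEs registered by its end nodes."""

    def __init__(self, name):
        self.name = name
        self.lunames = {}     # LUNAME -> name of the end node that registered it
        self.neighbours = []  # directly linked network nodes

    def register_end_node(self, end_node, lunames):
        """Record the LUNAMEs that an end node reported when it started up."""
        for luname in lunames:
            self.lunames[luname] = end_node

    def locate(self, luname):
        """Return the list of NN names from this node to the NN that knows luname.

        Returns None if no reachable NN has the LUNAME in its table.
        """
        visited = {self.name}
        queue = deque([(self, [self.name])])
        while queue:
            node, path = queue.popleft()
            if luname in node.lunames:
                return path
            for neighbour in node.neighbours:
                if neighbour.name not in visited:
                    visited.add(neighbour.name)
                    queue.append((neighbour, path + [neighbour.name]))
        return None

def link(a, b):
    """Create a two-way link between two network nodes."""
    a.neighbours.append(b)
    b.neighbours.append(a)

nn1, nn2, nn3 = NetworkNode("NN1"), NetworkNode("NN2"), NetworkNode("NN3")
link(nn1, nn2)
link(nn2, nn3)
nn1.register_end_node("EN-A", ["CLIENT1"])
nn3.register_end_node("EN-B", ["PRINTER"])

print(nn1.locate("PRINTER"))  # ['NN1', 'NN2', 'NN3']
```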

Most of APPN is the set of queries and replies that manage names, routes, and sessions. Like the rest of SNA, 
it is a fairly complicated and exhaustively documented body of code. 

Obviously workstations cannot maintain a dynamic table that spans massive networks or long distances. 
The solution to this problem is to break the APPN network into smaller local units each with a Network ID (NETID). 
In common use, a NETID identifies a cluster of workstations that are close to each other 
(in a building, on a campus, or in the same city). The dynamic exchange of LUNAMEs does not occur between 
clusters with different NETIDs. Instead, traffic to a remote network is routed based on the NETID, 
and traffic within the local cluster is routed based on the LUNAME. The combination of NETID and LUNAME 
uniquely identifies any server in the system, but the same LUNAME may appear in different NETID groups 
associated with different local machines. After all, one has little difficulty distinguishing "CHICAGO.PRINTER" 
from "NEWYORK.PRINTER" even though the LUNAME "PRINTER" is found in each city. 
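
The two-level routing decision (NETID first, LUNAME only within the local cluster) can be sketched in
Python as follows. The tables, link names, and the next_hop function are hypothetical illustrations.

```python
def next_hop(local_netid, qualified_name, local_lunames, netid_links):
    """Decide where traffic for NETID.LUNAME goes from a node in local_netid.

    local_lunames maps LUNAMEs in the local cluster to their end node.
    netid_links maps remote NETIDs to the link used to reach them.
    """
    netid, sep, luname = qualified_name.partition(".")
    if not sep or not netid or not luname:
        raise ValueError("expected NETID.LUNAME, got %r" % qualified_name)
    if netid == local_netid:
        # Local cluster: route on the LUNAME.
        if luname not in local_lunames:
            raise LookupError("unknown local LUNAME %r" % luname)
        return local_lunames[luname]
    # Remote cluster: route on the NETID only.
    if netid not in netid_links:
        raise LookupError("no route to NETID %r" % netid)
    return netid_links[netid]

chicago_lunames = {"PRINTER": "EN-7"}
chicago_links = {"NEWYORK": "link-to-newyork"}

print(next_hop("CHICAGO", "CHICAGO.PRINTER", chicago_lunames, chicago_links))  # EN-7
print(next_hop("CHICAGO", "NEWYORK.PRINTER", chicago_lunames, chicago_links))  # link-to-newyork
```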

TCP/IP is a rather simple protocol. The source code for programs is widely available. SNA is astonishingly complex, 
and only IBM has the complete set of programs. It is built into the AS/400. Other important workstation products include: 

NS/DOS for DOS and Windows 
Communications Manager for OS/2 
SNA Services for AIX 
SNA Server for Windows NT [from Microsoft] 

The native programming interface for modern SNA networks is the Common Programming Interface for Communications 
(CPIC). This provides a common set of subroutines, services, and return codes for programs written in COBOL, 
C, or REXX. It is documented in the IBM paper publication SC26-4399, but it is also widely available in 
softcopy on CD-ROM. 

Under the IBM Communications Blueprint, SNA becomes one of several interchangeable "transport" options. 
It is a peer of TCP/IP. The Blueprint is being rolled out in products carrying the "Anynet" title. 
This allows CPIC programs to run over TCP/IP, or programs written to use the Unix "socket" interface can run 
over SNA networks. Choice of network then depends more on management characteristics. 

The traditional SNA network has been installed and managed by a central technical staff in a large corporation. 
If the network goes down, a company like Aetna Insurance is temporarily out of business. TCP/IP is designed to be 
casual about errors and to simply discard undeliverable messages.




Note 4:
-------

Using IBM Communications Server for AIX with CICS

--------------------------------------------------------------------------------

Starting AIX SNA
To start Communications Server for AIX, enter smitty sna and select these options: 

  -> Manage SNA Resources
      -> Start SNA Resources
          -> Start Node              

This command starts SNA, the node, and the main SNA process. It also starts the links that listen 
for other machines calling to activate links if the activation parameter on the configuration of the DLC, 
port, and link station is set to start the links at startup time. 

If you have defined a link that calls another machine, you can start this link by using the following menu path: 

  -> Manage SNA Resources
      -> Start SNA Resources
          -> Start Link Station              

You can start a session by using the following menu path: 

  -> Manage SNA Resources
      -> Start SNA Resources
          -> Start an SNA Session              

To start a session, you must supply either a local LU name or a local LU alias and either a partner LU alias 
or a fully-qualified partner LU name. You must also supply a modename. In the example below, 
OPENCICS is the LU alias and CICSESA is the partner LU alias. CICSISC0 is a modegroup 
that is valid for the connection. 


Figure 53. Starting an SNA Session


+--------------------------------------------------------------------------------+
|                                                Start an SNA Session            |
|                                                                                |
|Type or select values in entry fields.                                          |
|Press Enter AFTER making all desired changes.                                   |
|                                                                                |
|                                                        [Entry Fields]          |
|  Enter one of:                                                                 |
|        Local LU alias                               [OPENCICS]               + |
|        Local LU name                                []                       + |
|                                                                                |
|  Enter one of:                                                                 |
|        Partner LU alias                             [CICSESA]                + |
|        Fully-qualified Partner LU name              []                       + |
|                                                                                |
|* Mode name                                          [CICSISC0]               + |
|  Session polarity                                    POL_EITHER              + |
|  CNOS permitted?                                     YES                     + |
|                                                                                |
|                                                                                |
|F1=Help             F2=Refresh          F3=Cancel           F4=List             |
|F5=Reset            F6=Command          F7=Edit             F8=Image            |
|F9=Shell            F10=Exit            Enter=Do                                |
+--------------------------------------------------------------------------------+

If the command returns an error indicating that no sessions can be activated between LUs, one of the 
following problems exists: 

-The link station that is used by the connection is not active. 
-The maximum number of sessions has been started already. 
-The specified modename, although defined locally, is not known on the remote system. 
-The specified local or remote system name is not known on the remote machine. 
-The remote system is not accepting connection requests (for example, if it is a mainframe-based CICS system, 
 the connection possibly is not installed and in service). 

In each case, check that the local configuration matches the values defined on the remote system. 




Note 4:
-------

Versions of SNA Services for AIX and Communications Server 
Technote (troubleshooting) 

Problem (abstract):
Versions of IBM's SNA Services for AIX and Communications Server.

Resolving the problem:
The following table provides information about the different versions of Communications Server for AIX and 
the levels of AIX on which they will run.
The VRMF (Version.Release.Modification.Fixlevel) values and the external name of different AIX SNA levels 
have changed over time. 
You can check the VRMF level by issuing    : lslpp -h 'sna*' 
You can check the product number by issuing: lslpp -i 'sna*' 

The listed AIX levels are the minimum levels required for CS/AIX to function.

The only currently supported version is 6.3 on AIX 5.2 and higher.


External Name                                V.R.M.F.  AIX Levels                Product #
-------------------------------------------  --------  ------------------------  ---------
Communications Server for AIX, V6.3          6.3.1.0   5.2 ML5, 5.3 ML2, 6.1     5765-E51
                                             6.3.0.x   5.2 ML5, 5.3 ML2
Communications Server for AIX, V6.1          6.1.0.5   4.3.3, 5.1, 5.2, 5.3
  (EOS 09/30/2006; 12/31/2003 on AIX 4)      6.1.0.1   4.3.3, 5.1, 5.2
                                             6.1.0.0   4.3.3, 5.1
Communications Server for AIX, V6            6.0.x.x   4.1.5, 4.2.1, 4.3.2
  (EOS 06/30/2002)
Communications Server for AIX, V5            5.0.x.x   4.1.5, 4.2.1, 4.3         5765-D20
  (EOS 06/30/2001)
Communications Server for AIX, V4.2          3.1.2.x   4.1, 4.2, 4.3             5765-652
  (EOS 11/30/1998)
Communications Server for AIX, V4.1          3.1.1.x   4.1, 4.2
  (EOS 11/30/1998)
Communications Server for AIX, V3            3.1.0.x   4.1                       5765-582
  (EOS 03/31/1997)
AIX SNA Server/6000 V2.2 (EOS 04/26/1996)    2.2.x.x   4.1                       5765-247
AIX SNA Server/6000 V2.1 (EOS 12/31/1997)    1.3.x.x   3.2.5
AIX SNA Services/6000 V1 (EOS 12/31/1995)    1.2.x.x   3.1, 3.2                  5601-287


EOS = End Of Service: No defect work will be performed after this date.  
 
 
 


84. The dd and od commands:
===========================

Note 1:
-------

http://www.codecoffee.com/tipsforlinux/articles/036.html

>> How and when to use the dd command?  
 

In this article, Sam Chessman explains the use of the dd command with a lot of useful examples. This article is not aimed at absolute beginners. 
Once you are familiar with the basics of Linux, you would be in a better position to use the dd command. 

The dd command is one of the original Unix utilities and should be in everyone's tool box. It can strip headers, extract parts of 
binary files and write into the middle of floppy disks; it is used by the Linux kernel Makefiles to make boot images. 
It can be used to copy and convert magnetic tape formats, convert between ASCII and EBCDIC, swap bytes, and force to upper and lowercase. 


For blocked I/O, the dd command has no competition in the standard tool set. One could write a custom utility to do specific I/O or 
formatting but, as dd is already available almost everywhere, it makes sense to use it. 

Like most well-behaved commands, dd reads from its standard input and writes to its standard output, unless a command line specification 
has been given. This allows dd to be used in pipes, and remotely with the rsh remote shell command. 

Unlike most commands, dd uses a keyword=value format for its parameters. This was reputedly modeled after IBM System/360 JCL, 
which had an elaborate DD 'Dataset Definition' specification for I/O devices. A complete listing of all keywords is available from GNU dd with 

$ dd --help

Some people believe dd means "Destroy Disk" or "Delete Data" because if it is misused, a partition or output file can be trashed very quickly. 
Since dd is the tool used to write disk headers, boot records, and similar system data areas, misuse of dd has probably trashed 
many hard disks and file systems. 

In essence, dd copies and optionally converts data. It uses an input buffer, conversion buffer if conversion is specified, and an output buffer. 
Reads are issued to the input file or device for the size of the input buffer, optional conversions are applied, and writes are issued 
for the size of the output buffer. This allows I/O requests to be tailored to the requirements of a task. Output to standard error reports 
the number of full and short blocks read and written. 
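
The full/short block report can be observed safely on an ordinary file. The following is a minimal sketch, not part of the original article; the temporary file names are made up for the illustration. A 2500-byte file copied with bs=1000 should be reported as two full blocks plus one short block, in both directions.

```shell
# Sketch: observe dd's "full+short" block report on an ordinary file.
export LC_ALL=C                      # keep dd's messages in plain English
tmpdir=$(mktemp -d)

# Build a 2500-byte input file (5 blocks of 500 bytes).
dd if=/dev/zero of="$tmpdir/input.bin" bs=500 count=5 2>/dev/null

# Copy it with bs=1000 and capture the report dd writes to standard error.
# 2500 bytes = 2 full 1000-byte blocks + 1 short block of 500 bytes.
dd_report=$(dd if="$tmpdir/input.bin" of="$tmpdir/copy.bin" bs=1000 2>&1)
printf '%s\n' "$dd_report"

rm -rf "$tmpdir"
```

The first two lines of the report are the "records in" and "records out" counts; the number before the plus sign counts full blocks, the number after it counts short ones.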


Example 1


A typical task for dd is copying a floppy disk. As the common geometry of a 3.5" floppy is 18 sectors per track, two heads and 80 cylinders, 
an optimized dd command to read a floppy is: 

Example 1-a : Copying from a 3.5" floppy

dd bs=2x80x18b if=/dev/fd0 of=/tmp/floppy.image 
1+0 records in
1+0 records out 

The 18b specifies 18 sectors of 512 bytes, the 2x multiplies the sector size by the number of heads, and the 80x is for the cylinders--
a total of 1474560 bytes. This issues a single 1474560-byte read request to /dev/fd0 and a single 1474560 write request to 
/tmp/floppy.image, whereas a corresponding cp command 

cp /dev/fd0 /tmp/floppy.image


issues 360 reads and writes of 4096 bytes. While this may seem insignificant on a 1.44MB file, when larger amounts of data are involved, 
reducing the number of system calls and improving performance can be significant. 
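
The arithmetic behind these figures can be checked with plain shell arithmetic. This is only a worked calculation using the geometry quoted above; the variable names are chosen for the illustration.

```shell
# Floppy geometry as given in the text.
sector_bytes=512
sectors_per_track=18
heads=2
cylinders=80

# Size of the whole diskette, i.e. what bs=2x80x18b evaluates to.
image_bytes=$((heads * cylinders * sectors_per_track * sector_bytes))

# Number of requests needed when the same data is moved 4096 bytes at a time.
small_requests=$((image_bytes / 4096))

echo "image size: $image_bytes bytes"
echo "4096-byte requests: $small_requests"
```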


This example also shows the factor capability in the GNU dd number specification. This has been around since before the Programmer's Workbench and, 
while not documented in the GNU dd man page, is present in the source and works just fine, thank you. 


To finish copying a floppy, the original needs to be ejected, a new diskette inserted, and another dd command issued to write to the diskette: 

Example 1-b : Copying to a 3.5" floppy
dd bs=2x80x18b < /tmp/floppy.image > /dev/fd0 
1+0 records in 
1+0 records out 

This form uses standard input and standard output through shell redirection; in this respect dd behaves like most other utilities. 


Example 2


The original need for dd came with the 1/2" tapes used to exchange data with other systems and to boot and install Unix on the PDP-11. 
Those days are gone, but the 9-track format lives. To access the venerable 9-track, 1/2" tape, dd is superior. With modern SCSI tape devices, 
blocking and unblocking are no longer a necessity, as the hardware reads and writes 512-byte data blocks. 

However, the 9-track 1/2" tape format allows for variable length blocking and can be impossible to read with the cp command. The dd command allows 
for the exact specification of input and output block sizes, and can even read variable length block sizes, by specifying an input buffer size larger 
than any of the blocks on the tape. Short blocks are read, and dd happily copies those to the output file without complaint, simply reporting on the 
number of complete and short blocks encountered. 


Then there are the EBCDIC datasets transferred from such systems as MVS, which are almost always 80-character blank-padded Hollerith Card Images! 
No problem for dd, which will convert these to newline-terminated variable record length ASCII. Making the format is just as easy and dd again 
is the right tool for the job. 

Example 2 : Converting EBCDIC 80-character fixed-length record to ASCII variable-length newline-terminated record 
dd bs=10240 cbs=80 conv=ascii,unblock if=/dev/st0 of=ascii.out
40+0 records in
38+1 records out 

The fixed record length is specified by the cbs=80 parameter, and the input and output block sizes are set with bs=10240. 
The EBCDIC-to-ASCII conversion and fixed-to-variable record length conversion are enabled with the conv=ascii,unblock parameter. 


Notice the output record count is smaller than the input record count. This is due to the padding spaces eliminated from the output file and 
replaced with newline characters. 
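
Without a tape drive, the same conversion can be tried as a round trip on ordinary files. The sketch below assumes a dd that supports the conv=ebcdic, conv=ascii, block and unblock conversions (GNU dd does); the file names and the two sample records are made up for the illustration.

```shell
# Sketch: ASCII lines -> 80-byte EBCDIC card images -> ASCII lines again.
export LC_ALL=C
tmpdir=$(mktemp -d)

# Two short newline-terminated records.
printf 'HELLO\nWORLD\n' > "$tmpdir/ascii.in"

# "Making the format": pad each line to cbs=80 bytes and translate to EBCDIC.
dd if="$tmpdir/ascii.in" of="$tmpdir/cards.ebcdic" cbs=80 conv=ebcdic,block 2>/dev/null
card_bytes=$(wc -c < "$tmpdir/cards.ebcdic")

# Reading it back: translate to ASCII, strip the padding, add newlines.
dd if="$tmpdir/cards.ebcdic" of="$tmpdir/ascii.out" cbs=80 conv=ascii,unblock 2>/dev/null

if cmp -s "$tmpdir/ascii.in" "$tmpdir/ascii.out"; then
  round_trip=identical
else
  round_trip=different
fi
echo "card image size: $card_bytes bytes, round trip: $round_trip"

rm -rf "$tmpdir"
```

Two records of 80 bytes give a 160-byte card image, which shows how much of a fixed-length dataset can be padding.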


Example 3


Sometimes data arrives from sources in unusual formats. For example, every time I read a tape made on an SGI machine, the bytes are swapped. 
The dd command takes this in stride, swapping the bytes as required. The ability to use dd in a pipe with rsh means that the tape device 
on any *nix system is accessible, given the proper rlogin setup. 

Example 3 : Byte Swapping with Remote Access of Magnetic Tape
rsh sgi.with.tape dd bs=256b if=/dev/rmt0 conv=swab | tar xvf -


The dd runs on the SGI and swaps the bytes before writing to the tar command running on the local host. 
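
The effect of conv=swab is easy to see on a short string: every pair of adjacent bytes is exchanged. A minimal local sketch, with no tape or remote host involved:

```shell
# Sketch: conv=swab exchanges each pair of adjacent input bytes.
swapped=$(printf 'abcdef' | dd conv=swab 2>/dev/null)
echo "abcdef becomes $swapped"
```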


Example 4

Murphy's Law was postulated long before digital computers, but it seems it was specifically targeted for them. 
When you need to read a floppy or tape, it is the only copy in the universe and you have a deadline past due, that is when you will have a bad spot 
on the magnetic media, and your data will be unreadable. To the rescue comes dd, which can read all the good data around the bad spot and continue 
after the error is encountered. Sometimes this is all that is needed to recover the important data. 

Example 4 : Error Handling
dd bs=265b conv=noerror if=/dev/st0 of=/tmp/bad.tape.image 


Example 5


The Linux kernel Makefiles use dd to build the boot image. In the Alpha Makefile /usr/src/linux/arch/alpha/boot/Makefile, 
the srmboot target issues the command: 

Example 5 : Kernel Image Makefile
dd if=bootimage of=$(BOOTDEV) bs=512 seek=1 skip=1 

This skips the first 512 bytes of the input bootimage file (skip=1) and writes starting at the second sector of the $(BOOTDEV) device (seek=1). 
A typical use of dd is to skip executable headers and begin writing in the middle of a device, skipping volume and partition data. 
As this can cause your disk to lose file system data, please test and use these applications with care.
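
The skip/seek combination can be rehearsed on ordinary files before it is pointed at a real device. In this sketch the file names, the fill characters, and the make_block helper are all made up for the illustration. conv=notrunc is added here only because the stand-in "device" is a regular file, which dd would otherwise truncate after the written data.

```shell
# Sketch: rehearse "bs=512 seek=1 skip=1" on files instead of a boot device.
tmpdir=$(mktemp -d)

# make_block CHAR: emit one 512-byte block filled with CHAR (hypothetical helper).
make_block() { dd if=/dev/zero bs=512 count=1 2>/dev/null | tr '\000' "$1"; }

# Input image: block 0 is a header (H), block 1 is the payload (P).
{ make_block H; make_block P; } > "$tmpdir/bootimage"

# Stand-in for the device: three blocks of existing data (X).
{ make_block X; make_block X; make_block X; } > "$tmpdir/device.img"

# Skip the input header, and leave the first output block untouched.
dd if="$tmpdir/bootimage" of="$tmpdir/device.img" bs=512 skip=1 seek=1 conv=notrunc 2>/dev/null

# Summarise the result by reading the first byte of each output block.
layout=""
for n in 0 1 2; do
  byte=$(dd if="$tmpdir/device.img" bs=1 skip=$((n * 512)) count=1 2>/dev/null)
  layout="$layout$byte"
done
echo "device blocks now start with: $layout"

rm -rf "$tmpdir"
```

Only the middle block changes: the header block of the input is never copied, and the first block of the output is never overwritten.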

 

 


85. Openssl, certificates, AIX:
===============================

Note 1:
-------

SSL is short for Secure Sockets Layer, a protocol developed by Netscape for transmitting private documents via the Internet. 
SSL uses a cryptographic system that uses two keys to encrypt data - a public key known to everyone and a private or secret key known only 
to the recipient of the message. Both Netscape Navigator and Internet Explorer support SSL, and many Web sites use the protocol 
to obtain confidential user information, such as credit card numbers. By convention, URLs that require an SSL connection 
start with https: instead of http:. 
Another protocol for transmitting data securely over the World Wide Web is Secure HTTP (S-HTTP). Whereas SSL creates a secure connection 
between a client and a server, over which any amount of data can be sent securely, S-HTTP is designed to transmit individual messages securely. 
SSL and S-HTTP, therefore, can be seen as complementary rather than competing technologies. Both protocols have been approved 
by the Internet Engineering Task Force (IETF) as a standard. 

Note 2:
-------

SSL (Secure Sockets Layer), whose successor is named TLS (Transport Layer Security), is a protocol that allows two programs to communicate 
with each other in a secure way. Like TCP/IP, SSL allows programs to create "sockets," endpoints for communication, and make 
connections between those sockets. But SSL, which is built on top of TCP, adds the additional capability of encryption. 
The HTTPS protocol spoken by web browsers when communicating with secure sites is simply the usual World Wide Web HTTP protocol, 
"spoken" over SSL instead of directly over TCP. 
In addition to providing privacy, SSL encryption also allows us to verify the identity of the party we are talking to. 
This can be very important if we don't trust the Internet. While it is unlikely in practice that the root DNS servers 
of the Internet will be subverted, a "man in the middle" attack elsewhere on the network could substitute the address of one 
Internet site for another. SSL prevents this scenario by providing a mathematically sound way to verify the other program's identity. 
When you log on to your bank's website, you want to be very, very sure you are talking to your bank! 

-- How SSL Works
SSL provides both privacy and security using a technique called "public/private key encryption" (often called "asymmetric encryption" 
or simply "public key encryption"). 
A "public key" is a string of letters and numbers that can be used to encrypt a message so that only the owner of the public key can read it. 
This is possible because every public key has a corresponding private key that is kept secret by the owner of the public key. 

-- The SSL Handshake: Identity and Privacy
Let's suppose Jane wants to log into www.examplebank.com. When Jane's web browser makes an HTTPS connection to www.examplebank.com, 
her browser sends the bank's server a string of randomly generated data, which we'll call the "greeting." 
The web server responds with two things: its own public key encoded in an SSL certificate, which we'll examine more closely later, 
and the "greeting" encrypted with its private key. 

Jane's web browser then decrypts the greeting with the bank's public key. If the decrypted greeting matches the original greeting 
sent by the browser, then Jane's browser can be sure it is really talking to the owner of the private key - 
because only the holder of the private key can encrypt a message in such a way that the corresponding public key will decrypt it. 
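
The greeting check can be imitated with the openssl command line tool. This is a simplified sketch of the idea only, not of the real SSL/TLS message flow: "encrypting with the private key" is modelled as a digital signature, and all file names as well as the second ("Bob") key pair are made up for the illustration. It assumes openssl is installed.

```shell
# Sketch: only the holder of the private key can produce a reply that the
# matching public key accepts.
tmpdir=$(mktemp -d)

# The bank's key pair, plus an unrelated key pair owned by an impostor.
openssl genrsa -out "$tmpdir/bank.key" 2048 2>/dev/null
openssl rsa -in "$tmpdir/bank.key" -pubout -out "$tmpdir/bank.pub" 2>/dev/null
openssl genrsa -out "$tmpdir/bob.key" 2048 2>/dev/null

# The browser's randomly generated greeting.
openssl rand -hex 16 > "$tmpdir/greeting"

# Each party "answers" the greeting with its own private key.
openssl dgst -sha256 -sign "$tmpdir/bank.key" -out "$tmpdir/bank.sig" "$tmpdir/greeting"
openssl dgst -sha256 -sign "$tmpdir/bob.key"  -out "$tmpdir/bob.sig"  "$tmpdir/greeting"

# check_reply SIGFILE: verify a reply against the bank's public key.
check_reply() {
  if openssl dgst -sha256 -verify "$tmpdir/bank.pub" -signature "$1" \
       "$tmpdir/greeting" >/dev/null 2>&1; then
    echo accepted
  else
    echo rejected
  fi
}

bank_reply=$(check_reply "$tmpdir/bank.sig")
bob_reply=$(check_reply "$tmpdir/bob.sig")
echo "bank reply: $bank_reply, impostor reply: $bob_reply"

rm -rf "$tmpdir"
```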

Now, let's suppose Bob is monitoring this traffic on the Internet. He has the bank's public key, and Jane's greeting. 
But he doesn't have the bank's private key. So he can't encrypt the greeting and send it back. That means Jane can't be fooled by Bob. 

-- The Identity Problem
But what if Bob inserts himself into the picture even before Jane's browser connects to the bank? What if Jane's browser is actually 
talking to Bob's server from the very beginning? Then Bob can substitute his own public and private keys, encrypt the greeting successfully, 
and convince Jane's browser that his computer is the bank's. Not good! 
That's why the complete SSL handshake includes more than just the bank's public key. The public key is part of an "SSL certificate" issued 
by a certificate authority that Jane's browser already trusts. 

How does this work? When web browser software is installed on a computer, it already contains the public keys of several certificate authorities, 
such as GoDaddy, VeriSign and Thawte. Companies that want their secure sites to be "trusted" by web browsers must purchase 
an SSL certificate from one of these authorities. 

But what is the certificate, exactly? The SSL certificate consists essentially of the bank's public key and a statement 
identifying the bank, encrypted with the certificate authority's private key. 

When the bank's web server sends its certificate to Jane's browser, Jane's browser decrypts it with the public key of the 
certificate authority. If the certificate is fake, the decryption results in garbage. If the certificate is valid, out pops 
the bank's public key, along with the identifying statement. And if that statement doesn't include, among other information, 
the same hostname that Jane connected to, Jane receives an appropriate warning message and decides not to continue the connection. 

Now, let's return to Bob. Can he substitute himself convincingly for the bank? No, he can't, because he doesn't have the certificate authority's 
private key. That means he can't sign a certificate claiming that he is the bank. 

Now that Jane's browser is thoroughly convinced that the bank is what it appears to be, the conversation can continue. 


-- certlist 

Purpose
certlist lists the contents of one or more certificates.

Syntax
certlist [-c | -f] [-a attr [attr....] ] tag [username]

Description
The certlist command lists the contents of one or more certificates. Using the -c option causes the output to be formatted 
as colon-separated data with the attribute names associated with each field on the previous line as follows: 

# name: attribute1: attribute2: ... 
User: value1: value2: ... 

The -f option causes the output to be formatted in stanza file format with the username attribute 
given as the stanza name. Each attribute=value pair is listed on a separate line: 

user: 
     attribute1=value 
     attribute2=value 
     attribute3=value 

When neither of these command line options is selected, the attributes are output as attribute=value pairs.

Flags
-c Displays the output in colon-separated records. 
-f Displays the output in stanzas. 
-a attr Selects one or more attributes to be displayed. 



===================================================
86. Kernel parameters HP-UX, Solaris, Linux for MQ:
===================================================


-------
Note:
-------

Article:

HPUX HP-UX

Kernel parameters HP-UX for MQ:

Kernel configuration
WebSphere® MQ uses semaphores and shared memory. It is possible, therefore, that the default kernel configuration is not adequate.

Before installation, review the machine's configuration and increase the values if necessary. The minimum recommended values 
of the tunable kernel parameters are given in Figure 1. These values might need to be increased if you obtain 
any First Failure Support Technology™ (FFST™) records.

Note: 
- On platforms earlier than HP-UX 11i v1.6 (11.22), if you intended to run a high number of concurrent connections 
  to WebSphere MQ, you were required to configure the number of kernel timers (CALLOUTS) by altering the NCALLOUT kernel parameter. 
  On HP-UX 11i v1.6 (11.22) platforms or later, the NCALLOUT parameter is obsolete as the kernel automatically adjusts the data structures. 
- Semaphore and swap usage does not vary significantly with message rate or message persistence. 
- WebSphere MQ queue managers are generally independent of each other. Therefore system tunable kernel parameters, 
  for example shmmni, semmni, semmns, and semmnu, need to allow for the number of queue managers in the system. 
- See the HP-UX documentation for information about changing these values. 

Figure 1. Minimum recommended tunable kernel parameters values

   shmmax           536870912
   shmseg           1024
   shmmni           1024
   semaem           16384
   semvmx           32767
   semmns           16384
   semmni           1024 (semmni < semmns)
   semmnu           16384
   semume           256
   max_thread_proc  66
   maxfiles         10000
   maxfiles_lim     10000
   nfile            10000 

Note: For HP-UX 11.23 (11i V2) and later operating systems, the tunable kernel parameters: shmem, sema, semmap, and maxusers, are obsolete. 
This applies to the Itanium and PA-RISC platforms.
You must restart the system once you have made any changes to the tunable kernel parameters. 
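
A comparison against the Figure 1 minimums can be scripted. The sketch below is only an illustration: it reads "name value" pairs on standard input, so how the current values are obtained on a given HP-UX release is deliberately left out. The function name and the sample values are made up, and only a subset of Figure 1 is listed.

```shell
# Minimum values taken from Figure 1 (subset).
required_minimums='shmmax 536870912
shmseg 1024
shmmni 1024
semmns 16384
semmni 1024
semmnu 16384
maxfiles 10000
nfile 10000'

# report_low_tunables: read "name current_value" pairs on stdin and print
# every parameter whose current value is below the Figure 1 minimum.
report_low_tunables() {
  while read -r name current; do
    minimum=$(printf '%s\n' "$required_minimums" |
              awk -v n="$name" '$1 == n { print $2 }')
    [ -n "$minimum" ] || continue          # not a parameter listed above
    if [ "$current" -lt "$minimum" ]; then
      echo "$name: $current is below the recommended minimum $minimum"
    fi
  done
  return 0
}

# Hypothetical current values, for demonstration only.
sample_values='shmmax 1073741824
semmns 4096
semmni 1024
nfile 8192'

low_tunables=$(printf '%s\n' "$sample_values" | report_low_tunables)
printf '%s\n' "$low_tunables"
```

With the sample values, only semmns and nfile are reported; shmmax and semmni already meet their minimums.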


System resource limits
You can set global limits for the size of process data segments and the size of process stack segments for the whole system 
by altering the tunable kernel parameters.
The tunable kernel parameters are: 

  Parameter       What it controls                                         Recommended minimum value
  maxdsiz         Maximum size of the data segment for 32-bit processes    1073741824
  maxdsiz_64bit   Maximum size of the data segment for 64-bit processes    1073741824
  maxssiz         Maximum size of the stack segment for 32-bit processes   8388608
  maxssiz_64bit   Maximum size of the stack segment for 64-bit processes   8388608

If other software on the same machine recommends higher values, then the operation of WebSphere MQ will not 
be adversely affected if those higher values are used.
For the full documentation for these parameters see the HP-UX product documentation.

To apply the settings to an HP-UX 11i system which has the System Administration Manager (SAM) utility, 
you can use SAM to carry out the following steps:
  1. Select and alter the parameters 
  2. Process the new kernel 
  3. Apply the changes and restart the system 
It is possible that other releases of HP-UX provide different facilities to set the tunable kernel parameters. If so, then please 
consult your HP-UX product documentation for the relevant information.

The ulimit shell command
On a per-shell basis the available limits can be tuned down from the values stored for the System resource limits parameters above. 
Use the ulimit shell command to tune the values of the parameters with a combination of the following switches:

  Switch  Meaning 
  -H      The hard limit 
  -S      The soft limit 
  -d      The data segment size 
  -s      The stack segment size 

Verifying that the kernel settings are applied
To verify that the resource limits have not been lowered by a ulimit command and that the queue manager will experience 
the correct limits, go to the shell from which the queue manager will be started and enter:

ulimit -Ha
ulimit -Sa

Amongst the console output you should see:

data(kbytes)   1048576
stack(kbytes)  8192

If lower numbers are returned, then a ulimit command has been issued in the current shell to lower the limits. 
You should consult with your system administrator to resolve the issue.
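
The same check can be scripted for the shell that will start the queue manager. This is a minimal sketch with made-up function names; the thresholds are the data and stack values quoted above, and a limit reported as "unlimited" is treated as sufficient.

```shell
# limit_ok CURRENT MINIMUM: succeed if CURRENT is "unlimited" or >= MINIMUM.
limit_ok() {
  [ "$1" = "unlimited" ] && return 0
  [ "$1" -ge "$2" ]
}

# check_limit LABEL CURRENT MINIMUM: print one line of verdict.
check_limit() {
  if limit_ok "$2" "$3"; then
    echo "$1: $2 (ok)"
  else
    echo "$1: $2 (below $3)"
  fi
}

# Soft limits of the current shell, in kbytes.
check_limit "data(kbytes)"  "$(ulimit -S -d)" 1048576
check_limit "stack(kbytes)" "$(ulimit -S -s)" 8192
```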




-------
Note:
-------

Article:

Kernel parameters Solaris and MQ:


Resource limit configuration
Configure Solaris systems with the resource limits required by WebSphere® MQ.

WebSphere MQ uses semaphores, shared memory, and file descriptors, and it is probable that the default resource limits 
are not adequate.

The configuration required by WebSphere MQ depends on the version of Solaris you are using.

>> If you are using Solaris 10:

You must change the default resource limits for each zone WebSphere MQ will be installed in. 
To set new default limits for all users in the mqm group, set up a project for the mqm group in each zone.

To find out if you already have a project for the mqm group, log in as root and enter the following command:

projects -l

If you do not already have a group.mqm project defined, enter the following command:

projadd -c "WebSphere MQ default settings" \
        -K "process.max-file-descriptor=(basic,10000,deny)" \
        -K "project.max-shm-memory=(priv,4GB,deny)" \
        -K "project.max-shm-ids=(priv,1024,deny)" \
        -K "project.max-sem-ids=(priv,1024,deny)" group.mqm

If a project called group.mqm is listed, review the attributes for that project. 
The attributes must include the following minimum values:


process.max-file-descriptor=(basic,10000,deny)
project.max-sem-ids=(priv,1024,deny)
project.max-shm-ids=(priv,1024,deny)
project.max-shm-memory=(priv,4294967296,deny)

If you need to change any of these values, enter the following command:

projmod -s -K "process.max-file-descriptor=(basic,10000,deny)" \
           -K "project.max-shm-memory=(priv,4GB,deny)" \
           -K "project.max-shm-ids=(priv,1024,deny)" \
           -K "project.max-sem-ids=(priv,1024,deny)" group.mqm

Note that you can omit any attributes from this command that are already correct.
For example, to change only the number of file descriptors, enter the following command: 

projmod -s -K "process.max-file-descriptor=(basic,10000,deny)" group.mqm

(To set the limits only for starting the queue manager under the mqm user, 
log in as mqm and enter the command projects. The first listed project is likely to be default, 
so you can use default instead of group.mqm with the projmod command.) 

You can find out what the file descriptor limits for the current project are, by compiling and running the following program:
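#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>

/* Open the same file over and over until open() fails. The number of
   descriptors obtained shows the effective file descriptor limit. */
int main () {
  int fd;
  int count = 0;
  for (;;) {
    /* O_CREAT lets the first open succeed even if ./tryfd does not exist yet */
    fd = open ("./tryfd", O_RDONLY | O_CREAT, 0644);
    printf ("fd is %d\n", fd);
    if (fd == -1)  break;
    count++;
  }
  printf ("opened %d descriptors before open() failed\n", count);
  return 0;
}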

To ensure that the attributes for the project group.mqm are used by a user session when running WebSphere MQ, 
make sure that the primary group of that user ID is mqm. In the above examples, the group.mqm project ID will be used. 
For further information on how projects are associated with user sessions, see Sun's System Administration Guide: 
Solaris Containers-Resource Management and Solaris Zones for your release of Solaris.

>>> If you are using Solaris 8 or Solaris 9:

Review the system's current resource limit configuration.

As the root user, load the relevant kernel modules into the running system by typing the following commands:
modload -p sys/msgsys
modload -p sys/shmsys
modload -p sys/semsys 

Then display your current settings by typing the following command:

sysdef

Check that the following parameters are set to the minimum values required by WebSphere MQ, or higher. 
The minimum values required by WebSphere MQ are documented in the tables below.

Table 1. Minimum values for semaphores required by WebSphere MQ

  Parameter  Minimum value 
  SEMMNI     1024 
  SEMAEM     16384 
  SEMVMX     32767 
  SEMMNS     16384 
  SEMMSL     100 
  SEMOPM     100 
  SEMMNU     16384 
  SEMUME     256 

Table 2. Minimum values for shared memory required by WebSphere MQ

  Parameter                 Minimum value 
  SHMMAX                    4294967295 
  SHMMNI                    1024 
  SHMSEG (Solaris 8 only)   1024 

Table 3. Minimum values for file descriptors required by WebSphere MQ

  Parameter    Minimum value 
  rlim_fd_cur  10000 
  rlim_fd_max  10000 

To change any parameters that are lower than the minimum value required by WebSphere MQ, edit your 
/etc/system file to include the relevant lines from the following list:

set shmsys:shminfo_shmmax=4294967295
set shmsys:shminfo_shmmni=1024
set semsys:seminfo_semmni=1024
set semsys:seminfo_semaem=16384
set semsys:seminfo_semvmx=32767
set semsys:seminfo_semmns=16384
set semsys:seminfo_semmsl=100
set semsys:seminfo_semopm=100
set semsys:seminfo_semmnu=16384
set semsys:seminfo_semume=256
set shmsys:shminfo_shmseg=1024
set rlim_fd_cur=10000
set rlim_fd_max=10000

Note: 
- These values are suitable for running WebSphere MQ; other products on the system might require higher values. 
- Do not change the value of shmmin from the system default value. 
- Semaphore and swap usage does not vary significantly with message rate or persistence. 
- WebSphere MQ queue managers are generally independent of each other. Therefore system kernel parameters, 
  for example shmmni, semmni, semmns, and semmnu, need to allow for the number of queue managers in the system. 
- After saving the /etc/system file, you must reboot your system.


-------
Note:
-------

Article:

Kernel parameters Linux and MQ:


Kernel configuration
WebSphere® MQ makes use of System V IPC resources, in particular shared memory and semaphores. The default configuration 
of these resources, supplied with your installation, is probably adequate for WebSphere MQ but if you have 
a large number of queues or connected applications, you might need to increase this configuration.

You can determine the amount of System V IPC resources available by looking at the contents of the following files: 

  /proc/sys/kernel/shmmax - The maximum size of a shared memory segment.
  /proc/sys/kernel/shmmni - The maximum number of shared memory segments.
  /proc/sys/kernel/shmall - The maximum amount of shared memory that can be allocated.
  /proc/sys/kernel/sem    - The maximum number and size of semaphore sets  that can be allocated.

For example, to view the maximum size of a shared memory segment that can be created enter: 

  cat /proc/sys/kernel/shmmax

To change the maximum size of a shared memory segment to 256 MB enter: 

  echo 268435456 > /proc/sys/kernel/shmmax

To view the maximum number of semaphores and semaphore sets which can be created enter: 

cat /proc/sys/kernel/sem

This returns 4 numbers indicating:
 SEMMSL - The maximum number of semaphores in a semaphore set
 SEMMNS - The maximum number of semaphores in the system
 SEMOPM - The maximum number of operations in a single semop call
 SEMMNI - The maximum number of semaphore sets   

For WebSphere MQ: 
the SEMMSL value must be 128 or greater 
the SEMOPM value must be 5 or greater 
the SEMMNS value must be 16384 or greater 
the SEMMNI value must be 1024 or greater 
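
The four minimums can be checked in one go. The sketch below is an illustration with a made-up function name; it expects the four numbers in the order in which /proc/sys/kernel/sem prints them (SEMMSL, SEMMNS, SEMOPM, SEMMNI) and only runs the live check where that file is readable.

```shell
# check_sem "SEMMSL SEMMNS SEMOPM SEMMNI": print each value that is below
# the WebSphere MQ minimum; succeed only if all four are high enough.
check_sem() {
  set -- $1                      # split the four numbers into $1..$4
  status=0
  [ "$1" -ge 128 ]   || { echo "SEMMSL $1 < 128";   status=1; }
  [ "$2" -ge 16384 ] || { echo "SEMMNS $2 < 16384"; status=1; }
  [ "$3" -ge 5 ]     || { echo "SEMOPM $3 < 5";     status=1; }
  [ "$4" -ge 1024 ]  || { echo "SEMMNI $4 < 1024";  status=1; }
  return $status
}

# Check the running kernel, if this is a Linux system.
if [ -r /proc/sys/kernel/sem ]; then
  if check_sem "$(cat /proc/sys/kernel/sem)"; then
    echo "semaphore limits meet the WebSphere MQ minimums"
  else
    echo "semaphore limits need tuning"
  fi
fi
```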

To increase the maximum number of semaphores available to WebSphere MQ, update the SEMMNS and SEMMNI values.
To apply these values every time the machine is restarted, add the commands 
to a startup script in /etc/rc.d/...
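
When raising the semaphore values, it is usually safer to keep any field that is already higher than the minimum. A minimal sketch, with made-up function names and a hypothetical sample of current values, that computes the four numbers to write back, in the order used by /proc/sys/kernel/sem (SEMMSL, SEMMNS, SEMOPM, SEMMNI):

```shell
# max_of A B: print the larger of two integers.
max_of() { if [ "$1" -ge "$2" ]; then echo "$1"; else echo "$2"; fi; }

# sem_recommend "SEMMSL SEMMNS SEMOPM SEMMNI": raise each field to the
# WebSphere MQ minimum, leaving higher values as they are.
sem_recommend() {
  set -- $1
  echo "$(max_of "$1" 128) $(max_of "$2" 16384) $(max_of "$3" 5) $(max_of "$4" 1024)"
}

current="250 32000 32 128"       # hypothetical current values
recommended=$(sem_recommend "$current")
echo "current:     $current"
echo "recommended: $recommended"
```

The recommended line can then be written to /proc/sys/kernel/sem as root with echo, in the same way as shown for shmmax above.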


------
Note:
------

MQ Processes: List 1:
=====================


MQSERIES PROCESSES BY PLATFORM 

PLATFORM = AIX 
ProcName        Process Function 
amqhasmx        logger 
amqharmx        log formatter,used only if the queue manager has linear logging selected 
amqzllp0        checkpoint processor 
amqzlaa0        queue manager agent(s) 
amqzxma0        processing controller 
runmqsc         MQ Command interface 
amqpcsea        PCF command processor 
amqcrsta        Any remotely started channel over TCP/IP - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 
amqcrs6a        Any remotely started channel over LU62/SNA - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 
runmqchl        Any locally started channel over any protocol - Could be SENDER,SERVER,CLUSSDR,REQUESTER 
runmqlsr        listener process 
runmqchi        channel initiator 
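
The table above can be turned into a small lookup that labels MQ processes in a process listing. This is a sketch only: the function names are made up, the roles are abbreviated from the AIX list, and the sample input (including the path) is hypothetical.

```shell
# mq_role NAME: print the function of a known MQ process, fail otherwise.
mq_role() {
  case "$1" in
    amqhasmx) echo "logger" ;;
    amqharmx) echo "log formatter (linear logging only)" ;;
    amqzllp0) echo "checkpoint processor" ;;
    amqzlaa0) echo "queue manager agent" ;;
    amqzxma0) echo "processing controller" ;;
    runmqsc)  echo "MQ command interface" ;;
    amqpcsea) echo "PCF command processor" ;;
    amqcrsta) echo "remotely started channel (TCP/IP)" ;;
    amqcrs6a) echo "remotely started channel (LU62/SNA)" ;;
    runmqchl) echo "locally started channel" ;;
    runmqlsr) echo "listener" ;;
    runmqchi) echo "channel initiator" ;;
    *) return 1 ;;
  esac
}

# annotate_mq_processes: read command names (one per line, with or without
# a directory prefix) and print the known MQ ones with their function.
annotate_mq_processes() {
  while read -r cmd; do
    name=${cmd##*/}
    role=$(mq_role "$name") && printf '%-10s %s\n' "$name" "$role"
  done
  return 0
}

# Hypothetical sample input; non-MQ processes are silently skipped.
annotated=$(printf '%s\n' /usr/mqm/bin/amqzxma0 sshd runmqlsr | annotate_mq_processes)
printf '%s\n' "$annotated"
```

On a live system the input could come from a listing such as ps -e -o comm= piped into annotate_mq_processes.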



PLATFORM = AS/400 
ProcName        Process Function 
AMQHIXK4        Storage Manager (Housekeeper) 
AMQMCPRA        Data Store (Object Cache) 
AMQCLMAA        Listener 
AMQALMP4        Check Point Process 
AMQRMCLA        Sender channel 
AMQPCSVA        PCF command processor 
AMQRIMNA        Channel initiator (trigger monitor to start channel) 
AMQIQES4        Quiesce (forces user logoffs - for upgrades) 
AMQIQEJ4        Quiesce (without user logoffs - for daily use if desired) 
AMQCRSTA        Any remotely started channel over TCP/IP - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 
AMQCRS6A        Any remotely started channel over LU62/SNA - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 



PLATFORM = HP/UX 
ProcName        Process Function 
amqhasmx        logger 
amqharmx        log formatter, used only if the queue manager has linear logging selected 
amqzllp0        checkpoint processor 
amqzlaa0        queue manager agents 
amqzxma0        processing controller 
runmqsc         MQ Command interface 
amqpcsea        PCF command processor 
amqcrsta        Any remotely started channel over TCP/IP - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 
amqcrs6a        Any remotely started channel over LU62/SNA - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 
runmqchl        Any locally started channel over any protocol - Could be SENDER,SERVER,CLUSSDR,REQUESTER 
runmqlsr        listener process 
runmqchi        channel initiator 



PLATFORM = OS2 
ProcName        Process Function 
AMQHASM2.EXE    The logger 
AMQHARM2.EXE    Log formatter (LINEAR logs only) 
AMQZLLP0.EXE    Checkpoint process 
AMQZLAA0.EXE    LQM agents 
AMQZXMA0.EXE    Execution controller 
AMQXSSV2.EXE    Shared memory servers 
RUNMQSC.EXE     MQSeries Command processor 
AMQPCSEA.EXE    PCF command processor 
AMQCRSTA.EXE    Any remotely started channel over TCP/IP - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 
AMQCRS6A.EXE    Any remotely started channel over LU62/SNA - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 
RUNMQCHL.EXE    Any locally started channel over any protocol - Could be SENDER,SERVER,CLUSSDR,REQUESTER 
RUNMQLSR        LISTENER PROCESS 
RUNMQCHI        CHANNEL INITIATOR 



PLATFORM = SOLARIS 
ProcName        Process Function 
amqhasmx        logger 
amqharmx        log formatter, used only if the queue manager has linear logging selected 
amqzllp0        checkpoint processor 
amqzlaa0        queue manager agents 
amqzxma0        processing controller 
amqcrsta        Any remotely started channel over TCP/IP - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 
amqcrs6a        Any remotely started channel over LU62/SNA - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 
runmqchl        Any locally started channel over any protocol - Could be SENDER,SERVER,CLUSSDR,REQUESTER 
runmqlsr        listener process 
runmqchi        channel initiator 
runmqsc         MQ Command interface 
amqpcsea        PCF command processor 



PLATFORM = Windows/NT 
ProcName        Process Function 
AMQHASMN.EXE   The logger 
AMQHARMN.EXE   Log formatter (LINEAR logs only) 
AMQZLLP0.EXE   Checkpoint process 
AMQZLAA0.EXE   LQM agents 
AMQZTRCN.EXE   Trace 
AMQZXMA0.EXE   Execution controller 
AMQXSSVN.EXE   Shared memory servers 
AMQCRSTA.EXE   Any remotely started channel over TCP/IP - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 
AMQCRS6A.EXE   Any remotely started channel over LU62/SNA - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 
RUNMQCHL.EXE   Any locally started channel over any protocol - Could be SENDER,SERVER,CLUSSDR,REQUESTER 
RUNMQLSR       LISTENER PROCESS 
RUNMQCHI       CHANNEL INITIATOR 
RUNMQSC.EXE    MQSeries Command processor 
AMQPCSEA.EXE   PCF command processor 
AMQSCM.EXE     Service Control Manager 



Process Names     Process Function 
amqpcsea        Command server 
amqhasmx        Logger 
amqharmx        Log formatter (linear logs only) 
amqzllp0        Checkpoint processor 
amqzlaa0        Queue manager agents 
amqzfuma        OAM process 
amqzxma0        Processing controller 
amqrrmfa        Repository process (for clusters) 
amqzdmaa        Deferred message processor 
 

OS/390 and z/OS are very simple: 

qmgrMSTR - the main address space (manager, API calls etc) 
qmgrCHIN - communications (listener, channels) 

qmgr = name of the queue manager
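
As a sketch, the Unix process names from the tables above can be turned into a small lookup that annotates a process list.
The function names mq_role and annotate_mq are made up for this example; the descriptions are taken from the lists above.

```shell
#!/bin/sh
# mq_role NAME: print the function of a Unix MQ process name (hypothetical helper).
mq_role() {
    case "$1" in
        amqhasmx) echo "logger" ;;
        amqharmx) echo "log formatter (linear logging only)" ;;
        amqzllp0) echo "checkpoint processor" ;;
        amqzlaa0) echo "queue manager agent" ;;
        amqzxma0) echo "processing controller" ;;
        amqpcsea) echo "PCF command processor" ;;
        amqcrsta) echo "remotely started channel (TCP/IP)" ;;
        amqcrs6a) echo "remotely started channel (LU62/SNA)" ;;
        runmqchl) echo "locally started channel" ;;
        runmqlsr) echo "listener" ;;
        runmqchi) echo "channel initiator" ;;
        amqzfuma) echo "OAM process" ;;
        amqrrmfa) echo "repository process (clusters)" ;;
        amqzdmaa) echo "deferred message processor" ;;
        *)        echo "unknown" ;;
    esac
}

# annotate_mq: read one command name per line on stdin and print
# "name role" for every name that is a known MQ process.
annotate_mq() {
    while read -r name; do
        name=${name##*/}                 # drop any leading directory
        role=$(mq_role "$name")
        [ "$role" = "unknown" ] || printf '%-10s %s\n' "$name" "$role"
    done
}

# Possible use on a live system (ps option support varies per platform):
#   ps -e -o comm= | sort -u | annotate_mq
```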



MQ Processes: List 2:
=====================


Windows/NT 
AMQHASMN.EXE - The logger 
AMQHARMN.EXE - Log formatter (LINEAR logs only) 
AMQZLLP0.EXE - Checkpoint process 
AMQZLAA0.EXE - LQM agents 
AMQZTRCN.EXE - Trace 
AMQZXMA0.EXE - Execution controller 
AMQXSSVN.EXE - Shared memory servers 
AMQCRSTA.EXE - Any remotely started channel over TCP/IP 
             - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 
AMQCRS6A.EXE - Any remotely started channel over LU62/SNA 
             - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 
RUNMQCHL.EXE - Any locally started channel over any protocol 
             - Could be SENDER,SERVER,CLUSSDR,REQUESTER 
RUNMQLSR     - LISTENER PROCESS 
RUNMQCHI     - CHANNEL INITIATOR 
RUNMQSC.EXE  - MQSeries Command processor 
AMQPCSEA.EXE - PCF command processor 
AMQSCM.EXE   - Service Control Manager 

SOLARIS 
amqhasmx - logger 
amqharmx - log formatter, used only if the queue manager has linear logging selected 
amqzllp0 - checkpoint processor 
amqzlaa0 - queue manager agents 
amqzxma0 - processing controller 
amqcrsta - Any remotely started channel over TCP/IP 
         - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 
amqcrs6a - Any remotely started channel over LU62/SNA 
         - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 
runmqchl - Any locally started channel over any protocol 
         - Could be SENDER,SERVER,CLUSSDR,REQUESTER 
runmqlsr - listener process 
runmqchi - channel initiator 
runmqsc  - MQ Command interface 
amqpcsea - PCF command processor 



AS/400 
AMQHIXK4 - Storage Manager (Housekeeper) 
AMQMCPRA - Data Store (Object Cache) 
AMQCLMAA - Listener 
AMQALMP4 - Check Point Process 
AMQRMCLA - Sender channel 
AMQCRSTA - Any remotely started channel over TCP/IP 
         - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 
AMQCRS6A - Any remotely started channel over LU62/SNA 
         - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 
AMQPCSVA - PCF command processor 
AMQRIMNA - Channel initiator (trigger monitor to start channel) 
AMQIQES4 - Quiesce (forces user logoffs - for upgrades) 
AMQIQEJ4 - Quiesce (without user logoffs - for daily use if desired) 



AIX 
amqhasmx - logger 
amqharmx - log formatter, used only if the queue manager has linear logging selected 
amqzllp0 - checkpoint processor 
amqzlaa0 - queue manager agent(s) 
amqzxma0 - processing controller 
amqcrsta - Any remotely started channel over TCP/IP 
         - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 
amqcrs6a - Any remotely started channel over LU62/SNA 
         - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 
runmqchl - Any locally started channel over any protocol 
         - Could be SENDER,SERVER,CLUSSDR,REQUESTER 
runmqlsr - listener process 
runmqchi - channel initiator 
runmqsc  - MQ Command interface 
amqpcsea - PCF command processor 

HP/UX 
amqhasmx - logger 
amqharmx - log formatter, used only if the queue manager has linear logging selected 
amqzllp0 - checkpoint processor 
amqzlaa0 - queue manager agents 
amqzxma0 - processing controller 
amqcrsta - Any remotely started channel over TCP/IP 
         - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 
amqcrs6a - Any remotely started channel over LU62/SNA 
         - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 
runmqchl - Any locally started channel over any protocol 
         - Could be SENDER,SERVER,CLUSSDR,REQUESTER 
runmqlsr - listener process 
runmqchi - channel initiator 
runmqsc  - MQ Command interface 
amqpcsea - PCF command processor 

OS2 
AMQHASM2.EXE - The logger 
AMQHARM2.EXE - Log formatter (LINEAR logs only) 
AMQZLLP0.EXE - Checkpoint process 
AMQZLAA0.EXE - LQM agents 
AMQZXMA0.EXE - Execution controller 
AMQXSSV2.EXE - Shared memory servers 
AMQCRSTA.EXE - Any remotely started channel over TCP/IP 
             - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 
AMQCRS6A.EXE - Any remotely started channel over LU62/SNA 
             - Could be RECEIVER,REQUESTER,CLUSRCVR,SVRCONN,SENDER,SERVER 
RUNMQCHL.EXE - Any locally started channel over any protocol 
             - Could be SENDER,SERVER,CLUSSDR,REQUESTER 
RUNMQLSR     - LISTENER PROCESS 
RUNMQCHI     - CHANNEL INITIATOR 
RUNMQSC.EXE  - MQSeries Command processor 
AMQPCSEA.EXE - PCF command processor 


MQSERIES PROCESSES BY PLATFORM

PLATFORM = AIX
ProcName        Process Function
amqhasmx        logger
amqharmx        log formatter, used only if the queue manager has linear logging selected
amqzllp0        checkpoint processor
amqzlaa0        queue manager agent(s)
amqzxma0        processing controller
amqcrsta        TCPIP Receiver channel & Client Connection
amqcrs6a        LU62 Receiver channel & Client Connection
runmqchl        Sender Channel
runmqsc         MQ Command interface
amqpcsea        PCF command processor


PLATFORM = AS/400
ProcName        Process Function

AMQHIXK4        Storage Manager (Housekeeper)
AMQMCPRA        Data Store (Object Cache)
AMQCLMAA        Listener
AMQALMP4        Check Point Process
AMQRMCLA        Sender channel
AMQCRSTA        TCP/IP Receiver channel & Client Connection
AMQCRS6A        LU62 Receiver channel & Client Connection
AMQPCSVA        PCF command processor
AMQRIMNA        Channel initiator (trigger monitor to start channel)
AMQIQES4        Quiesce (forces user logoffs - for upgrades)
AMQIQEJ4        Quiesce (without user logoffs - for daily use if desired)


PLATFORM = HP/UX
ProcName        Process Function
amqhasmx        logger
amqharmx        log formatter, used only if the queue manager has linear logging selected
amqzllp0        checkpoint processor
amqzlaa0        queue manager agents
amqzxma0        processing controller
amqcrsta        TCPIP Receiver channel & Client Connection
amqcrs6a        LU62 Receiver channel & Client Connection
runmqchl        Sender Channel
runmqsc         MQ Command interface
amqpcsea        PCF command processor

PLATFORM = OS2
ProcName        Process Function
AMQHASM2.EXE    The logger
AMQHARM2.EXE    Log formatter (LINEAR logs only)
AMQZLLP0.EXE    Checkpoint process
AMQZLAA0.EXE    LQM agents
AMQZXMA0.EXE    Execution controller
AMQXSSV2.EXE    Shared memory servers
AMQCRSTA.EXE    TCPIP Receiver channel & Client Connection
AMQCRS6A.EXE    LU62 Receiver channel & Client Connection
RUNMQCHL.EXE    Sender Channel
RUNMQSC.EXE     MQSeries Command processor
AMQPCSEA.EXE    PCF command processor


PLATFORM = SOLARIS
ProcName        Process Function
amqhasmx        logger
amqharmx        log formatter, used only if the queue manager has linear logging selected
amqzllp0        checkpoint processor
amqzlaa0        queue manager agents
amqzxma0        processing controller
amqcrsta        TCPIP Receiver channel & Client Connection






MQ Processes: List 3:
=====================

Description of WebSphere MQ tasks
When a queue manager is running, you see some or all of the following batch jobs running under the QMQM user profile 
in the WebSphere MQ subsystem. The jobs are described briefly in Table 1, to help you decide how to prioritize them.

Table 1. WebSphere MQ tasks

Job name   Function 
AMQALMPX   The checkpoint processor that periodically takes journal checkpoints 
AMQCLMAA   Non-threaded TCP/IP listener 
AMQCRSTA   TCP/IP-invoked channel responder 
AMQCRS6B   LU62 receiver channel and client connection (see note) 
AMQFCXBA   Broker worker job 
AMQPCSEA   PCF command processor that handles PCF and remote administration requests 
AMQRMPPA   Channel process pooling job 
AMQRRMFA   Repository manager for clusters 
AMQZDMAA   Deferred message handler 
AMQZFUMA   Object authority manager (OAM) 
AMQZLAA0   Queue manager agents that perform the bulk of the work for applications that connect 
           to the queue manager using MQCNO_STANDARD_BINDING 
AMQZLAS0   Queue manager agent 
AMQZXMA0   The execution controller that is the first job started by the queue manager. It deals with MQCONN requests, 
           and starts agent processes to process WebSphere MQ API calls 
AMQZMGR0   Process controller. This job is used to start up and manage listeners and services. 
AMQZMUC0   Utility manager. This job executes critical queue manager utilities, for example the journal chain manager. 
AMQZMUR0   Utility manager. This job executes critical queue manager utilities, for example the journal chain manager. 
RUNMQBRK   Broker control job 
RUNMQCHI   The channel initiator 
RUNMQCHL   Sender channel job that is started for each sender channel 
RUNMQDLQ   Dead letter queue handler 
RUNMQLSR   Threaded TCP/IP listener 
RUNMQTRM   Trigger monitor 
Note:
The LU62 receiver job runs in the communications subsystem and takes its run-time properties from the routing and communications 
entries that are used to start the job. See WebSphere MQ Intercommunication for more details. 

You can view all jobs connected to a queue manager, except listeners (which do not connect), 
using option 22 on the Work with Queue Manager (WRKMQM) panel. You can view listeners using the WRKMQMLSR command

 


=========================
87. Connect Direct:
=========================



--------------
Note 1: Intro:
--------------


Do you mean Network Data Manager (NDM) ??? 

File transmission protocol software. 
Normal Disconnect Mode. No connection is established to another station on the physical medium. 

Network Data Mover (NDM) is a legacy file transfer product developed by Systems Center. NDM is now owned by Sterling Commerce 
and has been renamed Connect:Direct. 

Just a quick Overview on Connect:Direct 

Today's organizations need more than FTP. Every organization depends on the reliable movement of files. 
From batch integration, to the movement of large images or catalogs, to the synchronization of remote locations or 
disaster-recovery sites, your business needs file transfer. For years, many organizations have turned to unmanaged, 
unsecured FTP to meet this need. But increasingly rigorous security policies and shorter processing windows 
have companies looking for alternatives to traditional FTP. 

File transfer you can count on. Connect:Direct is the point-to-point file transfer software optimized for high-volume, 
secure, assured delivery of files within and among enterprises. Proven in the demanding financial services 
and telecommunications industries, where it moves terabytes of business information daily, Connect:Direct can deliver your files with: 

Predictability - Assured delivery is provided via automated scheduling, checkpoint restart and automatic recovery/retry 

Security - Ensures that your customer information stays private, and that your file transfers are auditable 
for regulatory compliance via a proprietary protocol, authorization and encryption 

Performance - Handles your most demanding loads, from high volumes of small files to multi-gigabyte files 

Connect:Direct provides a common command structure and syntax across most operating systems in commercial use today. 

For more information, contact Sterling Commerce. 


Typical processes:

cduser@arcturus:/appl/cd4sn/ndm/cfg/ndm.nawaga $ ps -ef | grep cd4
cd4sngat  471088       1   0   Apr 27      -  1:44 /appl/cd4sn/ndm/bin/cdpmgr -i /appl/cd4sn/ndm/cfg/ndm.nawaga/initparm.cfg
cd4sngat  598270  958472   0 13:47:56  pts/0  0:00 tail -f beheer.log
cd4sngat  774210 1040616   0 13:09:34      -  0:00 /appl/cd4sn/ndm/SwiftNet/Version3/program/SwiftCmdServer -f /appl/cd4sn/ndm/SwiftNet/Version3 -s 1
cd4sngat  868398  704540   0 14:10:06  pts/2  0:00 grep cd4
cd4sngat  913624       1   0   Apr 27      -  0:00 /appl/cd4sn/ndm/bin/cdstatm -i /appl/cd4sn/ndm/cfg/ndm.nawaga/initparm.cfg
cd4sngat  958472  651372   0 10:11:11  pts/0  0:00 -ksh
cd4sngat 1040616       1   0 13:09:34      -  0:00 /appl/cd4sn/ndm/SwiftNet/Version3/program/SwiftConnectd -f /appl/cd4sn/ndm/SwiftNet/Version3 -s 1
cd4sngat 1093700  696520   0 13:37:15  pts/1  0:00 tail -f beheer.log
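
A quick sketch for checking that the two Connect:Direct daemons visible in the listing above (cdpmgr and cdstatm) are present.
The function name cd_status is made up for this example, and it only looks at "ps -ef" text; it does not talk to Connect:Direct itself.

```shell
#!/bin/sh
# cd_status: read "ps -ef" output on stdin and report whether the
# cdpmgr and cdstatm processes appear in it (hypothetical helper).
# Returns 0 only if both were seen.
cd_status() {
    awk '
        /\/bin\/cdpmgr /  { pmgr = 1 }
        /\/bin\/cdstatm / { statm = 1 }
        END {
            print "cdpmgr:  " (pmgr  ? "running" : "not found")
            print "cdstatm: " (statm ? "running" : "not found")
            exit !(pmgr && statm)
        }'
}

# Possible use on a live system:
#   ps -ef | cd_status
```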


=========================
88. HPUX Service Guard commands:
=========================


Service Guard commands:
=======================

=> Viewing cluster and package status:

# cmviewcl -v

This shows the detailed status of the cluster, nodes, packages and services. 
For a simple cluster status, run cmviewcl without options.

# cmgetconf -C config_name

This will show the particular config_name.

=> start the cluster:

# cmruncl -v              # start entire cluster 
# cmruncl -v -n nodename  # if only one node is available

=> start cluster on one node:

# cmrunnode -v
# cmrunnode -v othernode  # start a single node 

This command starts the specified node so that it joins an already running cluster.

=> Running a package

# cmrunpkg [ -n <node name> ] <package name>

This will run the package on the current node or on the node specified. 
Logs will be written in /etc/cmcluster/<SID>/<control_script>.log.

=>  halt the cluster:

# cmhaltcl -v
# cmhaltcl -f

With -f, this forces the packages to halt and then halts ServiceGuard operations on all nodes 
that are currently running in the cluster.

=> stop cluster on one node:

# cmhaltnode -v
# cmhaltnode -v othernode

This command will halt ServiceGuard operations on the specified node. If any packages are running
on that node, the node will not be halted.

# cmhaltnode -f <node name>

Forces the node to halt even if there are packages or group members running on it.

=> Halting a package:

# cmhaltpkg <package name>

This will halt the package. Logs will be written in /etc/cmcluster/<package name>/<control_script>.log.

=> enable or disable switching attributes for a package

# cmmodpkg -e/-d <package name>

=> Enabling a package to run on a particular node

After a package has failed on one node, the package is disabled on that node. This means the package 
will not be able to run on that node. The following command will enable the package to run on the specified node.

# cmmodpkg -e -n <node name> <package name>

=> Disabling a package from running on a particular node

# cmmodpkg -d -n <node name> <package name>

This command will disable the package from running on the specified node.

daemons that control MC/Serviceguard:
=====================================

The main Cluster Management Daemon is a process called cmcld.

It is up to Serviceguard to activate a volume group on a node that needs access to the data. In order to disable volume
group activation at boot time, we need to modify the startup script /etc/lvmrc. The first
part of this process is as follows:
AUTO_VG_ACTIVATE=1
Changed to …
AUTO_VG_ACTIVATE=0
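
A minimal sketch of making that change with a backup copy, assuming the line in the file reads exactly AUTO_VG_ACTIVATE=1.
The function name disable_auto_vg is made up for this example; sed -i is avoided because not every sed supports it.

```shell
#!/bin/sh
# disable_auto_vg FILE: set AUTO_VG_ACTIVATE=0 in FILE (hypothetical helper).
# The original is kept as FILE.orig; FILE is rewritten in place so its
# ownership and permissions stay as they were.
disable_auto_vg() {
    cp -p "$1" "$1.orig" &&
    sed 's/^AUTO_VG_ACTIVATE=1/AUTO_VG_ACTIVATE=0/' "$1.orig" > "$1"
}

# Possible use (as root, on every node of the cluster):
#   disable_auto_vg /etc/lvmrc
#   grep AUTO_VG_ACTIVATE /etc/lvmrc
```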

Make sure every node has the /etc/cmcluster/cmclnodelist file in place

There are the OS-level MC/ServiceGuard components and the application packages. Eight daemons are associated with MC/ServiceGuard.

/usr/lbin/cmclconfd     ---ServiceGuard Configuration Daemon (gathers cluster info, i.e. network and volume group info; started from /etc/inetd.conf)
/usr/lbin/cmcld         ---ServiceGuard Cluster Daemon (determines cluster membership. Package Mgr, Cluster Mgr, and Network Mgr run as parts of cmcld.)
/usr/lbin/cmlogd        ---ServiceGuard Syslog Log Daemon (used by cmcld to write syslog messages.)
/usr/lbin/cmlvmd        ---Cluster Logical Volume Manager Daemon (keeps track of volume group info.)
/usr/lbin/cmomd         ---Cluster Object Manager Daemon - logs to /var/opt/cmom/cmomd.log (provides info about the cluster to clients; started from /etc/inetd.conf.)
/usr/lbin/cmsnmpd       ---Cluster SNMP subagent (optionally running) (produces MIB for snmp)
/usr/lbin/cmsrvassistd  ---ServiceGuard Service Assistant Daemon (forks and execs scripts for the cluster.)
/usr/lbin/cmtaped       ---ServiceGuard Shared Tape Daemon (keeps track of shared tape devices.)

Each of these daemons logs to the /var/adm/syslog/syslog.log file


Information about the starting and halting of each package is found in the package’s 
control script log. This log provides the history of the operation of the package control script. 
It is found at /etc/cmcluster/<pkgname>/<pkgname>.cntl.log or /etc/cmcluster/package_name/control_script.log.

You can also find messages in /var/adm/syslog/syslog.log that indicate what has occurred and whether or not 
the package has halted or started.

Configuration Files:
====================

/etc/cmcluster/cmclnodelist                       – Contains the list of nodes in the cluster 
/etc/cmcluster/cluster_config.ascii               - cluster configuration file. 
/etc/cmcluster/package_name/package_config.ascii  - package configuration file
/etc/cmcluster/package_name/package.cntl          - package control script 
/etc/cmcluster/<package name>/<control_script>.log - package control script log


# cd /etc/cmcluster
# cmquerycl -v -C cluster.ascii -n node1 -n node2


Being “cluster aware” sets a flag in the VGRA that allows the volume group to be activated in
“exclusive” mode (vgchange -a e)

root@hpeos001[cmcluster] # cmcheckconf -v -C cluster.ascii
Checking


Compile and distribute binary cluster configuration file

root@hpeos001[cmcluster] # cmapplyconf -v -C cluster.ascii
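
The query / check / apply sequence above can be sketched as a helper that only prints the commands in order, so they can be reviewed before being run by hand.
The function name sg_config_plan is made up for this example; only the flags already shown in this note are used.

```shell
#!/bin/sh
# sg_config_plan FILE NODE...   (hypothetical helper)
# Prints the three configuration commands in the order used above.
# Nothing is executed; edit FILE between the cmquerycl and cmcheckconf steps.
sg_config_plan() {
    conf=$1; shift
    nodes=""
    for n in "$@"; do nodes="$nodes -n $n"; done
    echo "cmquerycl -v -C $conf$nodes"     # generate the template
    echo "cmcheckconf -v -C $conf"         # verify the edited file
    echo "cmapplyconf -v -C $conf"         # compile and distribute
}

sg_config_plan cluster.ascii node1 node2
```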


Some scenarios in cluster:
==========================

=> Fail over without halting clustering on either node:


1. cmviewcl -v 

This displays the status of the packages and nodes defined in the cluster. Verify the status of nodes and pkgs 
before taking any action.

2. cmhaltpkg -n <nodename> -v <pkgname> 

The command can be issued from either node; if the node name is not specified, the command is executed 
on whichever node it is issued from.

3. Wait to see the results of the command; tail -f /etc/cmcluster/<pkgname>.cntl.log to determine
success or failure of the halt command. If successful, move on to step 4.

4. cmmodpkg -e -n <nodename> -v <pkgname> 

Enables the pkg to run, and enables pkg switching. This can be issued on either node. 
It will automatically start the pkg on its adoptive node if nodename is not specified.

5. cmrunpkg -n <nodename> -v <pkgname> 

Starts the specified pkg on the specified node. Can be run from either node.
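
The five steps above can be sketched as a small wrapper. It defaults to a dry run that only prints the commands.
The names run, move_pkg and DRY_RUN are made up for this example; the command order simply follows the steps in this note.

```shell
#!/bin/sh
# run CMD...: print the command when DRY_RUN is 1 (the default),
# otherwise execute it (hypothetical helper).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# move_pkg PKG FROM_NODE TO_NODE   (hypothetical helper)
# Stops at the first command that fails.
move_pkg() {
    run cmviewcl -v &&                      # step 1: check status first
    run cmhaltpkg -n "$2" -v "$1" &&        # step 2: halt on the current node
    # step 3 (check the package control log) is left to the operator
    run cmmodpkg -e -n "$3" -v "$1" &&      # step 4: enable on the target node
    run cmrunpkg -n "$3" -v "$1"            # step 5: start on the target node
}

move_pkg pkg1 nodeA nodeB
```

Set DRY_RUN=0 only after the printed commands look right for your cluster.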


=> Fail over of one node to another, halting clustering on one node:

1. cmhaltnode -f -v <nodename> 

Halts clustering on the node specified, and fails over any running pkgs to the other node.

2. Check /etc/cmcluster/<pkgname>.cntl.log on each machine to verify that the pkg did
shut down on one node and then started on the other node.


=> Run both packages on a single node (the other server is shut down completely)

cmruncl -n <node_name>



=========================
89. OTHER STUFF SECTION:
=========================


Here we comment on a variety of programs, daemons, or commands found in Solaris, HP, Linux or AIX.


lrud:
=====

lrud (least recently used daemon) is the kernel process that manages page replacement in AIX.

Solaris uses a completely different page stealing algorithm from AIX, so 
you cannot compare the two. 

AIX uses lrud, an LRU-based page stealer; the Solaris page scanner uses a 
two-handed clock algorithm. 

Again, AIX will build as large a filesystem cache as possible by 
default. When free memory runs short, lrud scans for pages to free up 
and frees them according to the LRU algorithm. The pages it frees up 
depend on the number of file pages cached (numperm) and the maxperm / minperm 
settings. 

If numperm is above maxperm, it marks only file (persistent) pages as 
candidates, to get the size of the fs cache down. If numperm is below 
minperm, it does not discriminate over which pages it marks as candidates 
to free up. 

NOTE: by default it only does this once free memory drops to minfree. 

To strictly cap the number of file pages cached you would set 
strict_maxperm, but you usually do not have to do this unless you are 
working with a very large amount of memory (64 GB and up), so I 
would leave it well alone if you only have a couple of GB. 


gil:
====

GIL is a kernel process, which does TCP/IP timing. It handles
transmission errors, ACKs, etc. Normally it shouldn't consume too much
CPU, but it can take quite a lot of CPU when the system is using the
network a lot (like with NFS filesystems which are heavily used).

The kproc gil runs the TCP/IP timer driven operations. Every 200ms, and
every 500ms the GIL thread is kicked to go run protocol timers. With TCP
up (which is ALWAYS the case), TCP timers are called which end up
looking at every connection on the system (to do retransmission, delayed
acks, etc). In version 4 this work is all done on a multi-threaded kproc
to promote concurrency and SMP scalability.

GIL is one of the kprocs (kernel processes) in AIX 4.3.3, 5.1 and 5.2.
Since the advent of topas in AIX 4.3.3 and changes made to the ps
command in AIX 5.1, system administrators have become aware of this
class of processes, which are not new to AIX. These kprocs have no
user interfaces and have been largely undocumented in base
documentation. Once a kproc is started, typically it stays in the
process table until the next reboot. The system resources used by any
one kproc are accounted as kernel resources, so no separate account is
kept of resources used by an individual kproc.

Most of these kprocs are NOT described in base AIX documentation and
the descriptions below may be the most complete that can be found.

The term GIL is an acronym for "Generalized Interrupt Level" and was
created by the Open Software Foundation (OSF). This is the networking
daemon responsible for processing all the network interrupts, including
incoming packets, tcp timers, etc.

Exactly how these kprocs function and much of their expected behavior
is considered IBM proprietary information.




picld:
------

The Platform Information and Control Library (PICL) provides a mechanism to publish platform-specific 
information for clients to access in a platform-independent way. picld maintains and controls access 
to the PICL information from clients and plug-in modules. 
The daemon is started in both single-user and multi-user boot mode.

Upon startup, the PICL daemon loads and initializes the plug-in modules. These modules use the 
libpicltree(3PICLTREE) interface to create nodes and properties in the PICL tree to publish 
platform configuration information. After the plug-in modules are initialized, the daemon opens 
the PICL daemon door to service client requests to access information in the PICL tree.

arraymon:
---------

arraymon is the disk array daemon process sometimes found in Solaris. It performs these major functions:

- Monitoring of the error information maintained by the disk array controllers.

- Reporting of events that require operator attention in a manner selected by the user via 
  the rmparams file and the rmscript file.

- Launching of the parityck utility at the designated time, if the parity check option is enabled.

arraymon maintains logs of the messages currently outstanding on the system console 
and in the file /etc/raid/rmlog.log. In addition, all error information is written to the 
system error log (/var/adm/messages).

sar:
----

System activity data can be accessed at the special request of a user (see sar(1)) and automatically, 
on a routine basis, as described here. The operating system contains several counters that are 
incremented as various system actions occur. These include counters for CPU utilization, 
buffer usage, disk and tape I/O activity, TTY device activity, switching and system-call activity, 
file-access, queue activity, inter-process communications, and paging.
For more general system statistics, use iostat(1M), sar(1), or vmstat(1M).

Note 1:
-------

I'm paring down processes and port listeners on a Solaris 8 server to have only the very minimal services/ports open. 
I have followed their guidelines/blueprints for Solaris 6 hardening.

I need to find out what is listening on the ports below and how to disable services for them.

Specifically, listeners on ports 5987, 898, and 32768. (See netstat output below)

Also what are: 

root 181   1 0 15:08:10 ? 0:00 /usr/sadm/lib/smc/bin/smcboot 
root 182 181 0 15:08:10 ? 0:00 /usr/sadm/lib/smc/bin/smcboot 
root  56   1 0 15:08:04 ? 0:00 /usr/lib/sysevent/syseventd 
root  58   1 0 15:08:05 ? 0:00 /usr/lib/sysevent/syseventconfd 
root  67   1 0 15:08:05 ? 0:01 /usr/lib/picl/picld 
root 202   1 0 15:08:12 ? 0:00 /usr/lib/efcode/sparcv9/efdaemon

And can they be disabled? How?

This host will run only a standalone firewall and sendmail. 
On Solaris 2.6 these listeners and procs do not exist.


Regarding the "smcboot" process the answer is simple. This is the boot process for the 
Solaris Management Console (SMC), which is a GUI (well - more a framework with several existing modules) 
to manage your system.

If you're not interested in managing your host using SMC, then you can safely disable this 
(remove or disable /etc/rc2.d/S90wbem). This smc process is also responsible for listening on ports 898 and 5987.

Port 32768 is not used for a fixed service. You should check your system to identify 
which process is using this port. This can be done by using the pfiles command, e.g. 
"cd /proc; /usr/proc/bin/pfiles * > /tmp/pfiles.out" and then look in /tmp/pfiles.out for the port number.

The picld process is a new abstraction layer for programs that want to access platform-specific information. 
Instead of using some platform-specific program, applications can use the picl library to access 
information in a generic way.

Disabling the picld daemon will affect applications which are using libpicltree. 
You can use the "ldd" command to identify such applications and decide whether you're using them or not. 
Example applications are "prtpicl" or "locator" (see the manpages).

The "syseventd" is responsible for delivering system events, and disabling this service will affect your 
ability to create new devices on the fly (e.g. for dynamic reconfiguration). The "efdaemon" is another example 
of such a process which is needed for dynamic reconfiguration.

Whether you can disable syseventd and/or efdaemon depends heavily on your required services. 
After creating your devices (boot -r) you can safely turn off these daemons, but you'll run into trouble 
when trying dynamic reconfiguration... Without knowing your requirements we can't tell whether it's ok 
to disable those services or not.
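
The pfiles suggestion in the answer above can be sketched as a small filter over the saved output.
The function name port_owner is made up for this example, and the layout it expects (an unindented "PID: command" header per process, and socket lines ending in "port: N") is an assumption about pfiles output that should be checked on your release.

```shell
#!/bin/sh
# port_owner PORT: read pfiles output on stdin and print the header line
# of every process that has a socket bound to PORT (hypothetical helper).
# Note: on Solaris use nawk or /usr/xpg4/bin/awk; the old /usr/bin/awk
# does not accept -v.
port_owner() {
    awk -v port="$1" '
        /^[0-9]+:/ { proc = $0 }      # unindented "PID: command" header
        /sockname:/ && $(NF-1) == "port:" && $NF == port { print proc }
    '
}

# Possible use, after collecting the output as described above:
#   port_owner 32768 < /tmp/pfiles.out
```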


bpbkar:
=======

bpbkar is part of the Veritas Netbackup client, usually installed at
/usr/openv/netbackup .


<defunct> process:
==================

Note 1:

In general, defunct processes are caused by a parent process not reaping its children. Find out which process 
is the parent process of all those zombies (ps -ef shows the PPID). It's that process that has a bug. 
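
As a sketch, the parents of the zombies can be counted from a process listing.
The function name zombie_parents is made up for this example, and it assumes four input columns (pid, ppid, state, command) with a state that starts with Z for defunct processes.

```shell
#!/bin/sh
# zombie_parents: read "pid ppid state command" lines on stdin and print
# "ppid count" for every parent that has defunct children (hypothetical helper).
zombie_parents() {
    awk '$3 ~ /^Z/ { n[$2]++ }
         END { for (p in n) print p, n[p] }' | sort -n
}

# Possible use on Solaris or Linux (the "s" state column is not portable
# to every ps implementation):
#   ps -e -o pid= -o ppid= -o s= -o comm= | zombie_parents
```

The parent PID with the highest count is the first process to look at.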

In Solaris 2.3 (and presumably earlier) there is a bug in the pseudo tty modules that makes them hang in close. 
This causes processes to hang forever while exiting. 

Fix: Apply patch 101415-02 (for 2.3). 

In all Solaris 2 releases prior to 2.5 (also fixed in the latest 2.4 kernel jumbo patch), 
init (process 1) calls sync() every five minutes which can hang init for some considerable time. 
This can cause a lot of zombies accumulating with process 1 as parent, but occurs only in rare circumstances. 

Note 2:

My app has a parent that forks a child. Sometimes, one of them dies and leaves a defunct process, 
along with shared memory segments. I try to get rid of the shared memory and kill the defunct task, 
but to no avail. I then have to reboot the system to clean up the shared memory and to get rid 
of the defunct process. How can I kill a defunct process and get rid of the associated shared memory ?

A defunct task is already dead. You cannot kill a "zombie".
The problem is obviously that the app does not expect a child to die and does not make the 
necessary wait calls to relieve the child of its return code.
Did you stop the app and see what happens?

Use ipcrm to release the shared memory. 

But a zombie also indicates a programming problem with the application. 
So it is time to redesign the application. 

Note 3:

A zombie process is a process which has died and whose parent process is still running 
and has not wait()ed for it. In other words, if a process becomes a zombie, it means 
that the parent process has not called wait() or waitpid() to obtain the child process's 
termination status. Once the parent retrieves a child's termination status, that child process 
no longer appears in the process table.

You cannot kill a zombie process, because it's already dead. It is taking up space in the process table, 
and that's about it.

If any process terminates before its children do, init inherits those children. 
When they die, init calls one of the wait() functions to retrieve the child's termination status, 
and the child disappears from the process table.

A zombie process is not, in and of itself, harmful, unless there are many of them taking up space 
in the process table. But it's generally bad programming practice to leave zombies lying around, in the same way
that it's generally a Bad Thing to never bother to free memory you've malloc()ed. 
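
The reaping step can also be seen from a shell, where the "wait" builtin plays the role of wait()/waitpid(). A minimal sketch follows; the function name is hypothetical and the child is just a throwaway "sh -c" process.

```shell
# reap_child_status: start a child that exits with the given status,
# reap it with the shell's wait builtin, and print the status wait returned.
reap_child_status() {
    sh -c "exit $1" &            # child runs in the background
    child=$!                     # PID of the most recent background job
    status=0
    wait "$child" || status=$?   # block until the child ends and collect its status
    echo "$status"
}
```

Calling "reap_child_status 7" starts a child that exits with status 7 and prints the status that wait handed back to the parent.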

Note 4:

Unlike Windows, Unix maintains an explicit parent-child relationship between processes. 
When a child process dies, the parent receives a notification (SIGCHLD). It is then the duty of the parent process 
to explicitly take notice of the child's demise by using the wait() system call. The return value of wait() 
is the process ID of the child, which lets the parent track exactly which of its children are still alive. 
As long as the parent hasn't called wait(), the system needs to keep the dead child in the global process list, 
because that's the only place where the process ID and exit status are stored. The purpose of the "zombie" is really 
just for the system to remember them, so that it can report them to the parent process on request. 
If the parent "forgets" to collect on its children, then the zombie will stay undead forever. 
Well, almost forever. If the parent itself dies, then "init" (the system process with the ID 1) will take over 
fostership of its children and catch up on the neglected parental duties.



S_IFCHR and S_IFDOOR:
=====================

Suppose you use the pfiles command on a PID

# /usr/proc/bin/pfiles 194
194:    /usr/sbin/nscd
  Current rlimit: 256 file descriptors
   0: S_IFCHR mode:0666 dev:85,1 ino:3291 uid:0 gid:3 rdev:13,2
      O_RDWR
   1: S_IFCHR mode:0666 dev:85,1 ino:3291 uid:0 gid:3 rdev:13,2
      O_RDWR
   2: S_IFCHR mode:0666 dev:85,1 ino:3291 uid:0 gid:3 rdev:13,2
      O_RDWR
   3: S_IFDOOR mode:0777 dev:275,0 ino:0 uid:0 gid:0 size:0
      O_RDWR FD_CLOEXEC  door to nscd[194]


# /usr/proc/bin/pfiles 254
254:    /usr/dt/bin/dtlogin -daemon
  Current rlimit: 2014 file descriptors
   0: S_IFDIR mode:0755 dev:32,24 ino:2 uid:0 gid:0 size:512
      O_RDONLY|O_LARGEFILE
   1: S_IFDIR mode:0755 dev:32,24 ino:2 uid:0 gid:0 size:512
      O_RDONLY|O_LARGEFILE
   2: S_IFREG mode:0644 dev:32,24 ino:143623 uid:0 gid:0 size:41
      O_WRONLY|O_APPEND|O_LARGEFILE
   3: S_IFCHR mode:0666 dev:32,24 ino:207727 uid:0 gid:3 rdev:13,12
      O_RDWR
   4: S_IFSOCK mode:0666 dev:174,0 ino:4686 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK
   5: S_IFREG mode:0644 dev:32,24 ino:143624 uid:0 gid:0 size:4
      O_WRONLY|O_LARGEFILE
      advisory write lock set by process 245
   7: S_IFSOCK mode:0666 dev:174,0 ino:3717 uid:0 gid:0 size:0
      O_RDWR
   8: S_IFDOOR mode:0444 dev:179,0 ino:65516 uid:0 gid:0 size:0
      O_RDONLY|O_LARGEFILE FD_CLOEXEC  door to nscd[171]

This listing shows the files open by the dtlogin process. Notice how easy it is to decipher the file types 
in this output. We have: 

S_IFDIR   directory files
S_IFREG   regular files
S_IFCHR   character mode device
S_IFSOCK  sockets
S_IFDOOR  a "door" file

Flags That Specify Access Type
The following OFlag parameter flag values specify type of access:

O_RDONLY The file is opened for reading only. 
O_WRONLY The file is opened for writing only. 
O_RDWR The file is opened for both reading and writing. 


Limits on the number of files that a process can open can be changed system-wide in the /etc/system file. 

If you support a process that opens a lot of sockets, then you can monitor the number of open files 
and socket connections by using a command such as this: 

# /usr/proc/bin/pfiles <procID> | grep mode | wc -l 
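
If a per-type breakdown is more useful than a single total, the same pfiles output can be grouped by its S_IF* type. This is only a sketch: count_fd_types is a hypothetical helper name, and it assumes the pfiles layout shown earlier in this section (a "<fd>:" field followed by the type).

```shell
# count_fd_types: read pfiles output on stdin and print "<type> <count>"
# for each S_IF* file type, sorted by type name.
count_fd_types() {
    awk '$1 ~ /^[0-9]+:$/ && $2 ~ /^S_IF/ { n[$2]++ }
         END { for (t in n) print t, n[t] }' | sort
}
```

Usage would be along the lines of: /usr/proc/bin/pfiles <procID> | count_fd_types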

The third limit determines how many file references can be held in memory at any time (in the inode cache). 
If you're running the sar utility, then a sar -v command will show you (in one column of its output (inod-sz)) 
the number of references in memory and the maximum possible. On most systems, these two numbers will be oddly 
stable throughout the day. The system maintains the references even after a process has stopped running 
-- just in case it might need them again. These references will be dropped and the space reused as needed. 
The sar output might look like this: 

00:00:00  proc-sz    ov  inod-sz      ov  file-sz    ov  lock-sz
11:20:00  400/20440   0  41414/46666   0  1400/1400   0  0/0

The 4th field reports the number of files currently referenced in the inode cache and 
the maximum that can be stored. 
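
To turn such a "used/max" pair into a fill level, a small helper can do the arithmetic. The sketch below is illustrative only; cache_usage is a hypothetical name and it expects just the field itself (for example the inod-sz value) on standard input.

```shell
# cache_usage: read "used/max" (for example the inod-sz field) on stdin
# and print the fill level as a percentage with one decimal place.
cache_usage() {
    awk -F/ '$2 > 0 { printf "%.1f\n", 100 * $1 / $2 }'
}
```

For the sample line above, "echo 41414/46666 | cache_usage" reports that the inode cache is a little under 89% full.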



EXP shell script:
=================

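#!/usr/bin/ksh
# Full Oracle export of the ECM database; keeps one previous compressed dump.
NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1
export NLS_LANG
ORACLE_SID=ECM
export ORACLE_SID
# Stop if the dump directory is missing, so nothing is moved or written elsewhere
cd /u03/dumps/ECM || exit 1
# Keep the previous dump as ECMformer.dmp.Z
mv ECM.dmp.Z ECMformer.dmp.Z
exp system/arcturus81 file=ECM.dmp full=y statistics=none
# Copy the new dump to a second location, then compress the original
cp ECM.dmp /u01/dumps/ECM
compress -v ECM.dmp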

xntpd:
======

The xntpd daemon sets and maintains a Unix system time-of-day in compliance with Internet standard time servers. 
The xntpd daemon is a complete implementation of the Network Time Protocol (NTP) version 3 standard, 
as defined by RFC 1305, and also retains compatibility with version 1 and 2 servers as defined by 
RFC 1059 and RFC 1119, respectively. The xntpd daemon does all computations in fixed point arithmetic 
and does not require floating point code. 

The xntpd daemon reads from a configuration file (/etc/ntp.conf is the default) at startup time. 
You can override the configuration file name from the command line. You can also specify a working, 
although limited, configuration entirely on the command line, eliminating the need for a configuration file. 
Use this method when configuring the xntpd daemon as a broadcast or multicast client, that determines 
all peers by listening to broadcasts at runtime. You can display the xntpd daemon internal variables with the 
ntpq command (Network Time Protocol (NTP) query program). You can alter configuration options 
with the xntpdc command. 

Note for AIX: checking the status of the xntpd subsystem:

# lssrc -s xntpd


utmpd:
======

Solaris:

NAME
     utmpd - utmpx monitoring daemon

SYNOPSIS
     utmpd [-debug]

DESCRIPTION
     The  utmpd daemon  monitors  the  /var/adm/utmpx  file.  See
     utmpx(4) (and utmp(4) for historical information).

     utmpd receives requests from  pututxline(3C)  by  way  of  a
     named  pipe.  It  maintains  a  table  of processes and uses
     poll(2) on /proc files to detect process  termination.  When
     utmpd  detects that a process has terminated, it checks that
     the process has removed its utmpx entry from /var/adm/utmpx.
     If  the  process'   utmpx entry has not been removed,  utmpd
     removes   the   entry.   By   periodically   scanning    the
     /var/adm/utmpx  file, utmpd also monitors processes that are
     not in its table.

OPTIONS
     -debug
           Run  in debug mode, leaving the process  connected  to
           the controlling terminal.  Write debugging information
           to standard output.

HP-UX 11i:

NAME
      utmpd - user accounting database daemon

SYNOPSIS
      /usr/sbin/utmpd

DESCRIPTION
      utmpd, user accounting database daemon, manages the user accounting
      database which is the database of currently logged-in users.  This was
      previously maintained by /etc/utmp and /etc/utmpx files on HP-UX.

      Upon startup, utmpd writes its pid to the file
      /etc/useracct/utmpd_pid.  Applications can add, update, or query
      entries into the database using the getuts() APIs.  See the getuts(3C)
      manual page for more information.

      utmpd(1M) takes care of synchronizing the legacy /etc/utmpx file and
      its own in-memory database.  The synchronization is bi-directional
      from the utmpd's database to the /etc/utmpx and from the /etc/utmpx
      file to utmpd's database.  However, this synchronization does not
      happen in real time.  There is a time lag which could span from a few
      seconds on a lightly loaded system to a few minutes on a heavily
      loaded system.


pwconv:
=======

NAME
     pwconv - installs and updates /etc/shadow  with  information
     from /etc/passwd

DESCRIPTION
     The pwconv command  creates  and  updates  /etc/shadow  with
     information from /etc/passwd.

     pwconv relies on a special value  of  'x'  in  the  password
     field  of /etc/passwd.  This value of 'x' indicates that the
     password for the user is already in /etc/shadow  and  should
     not be modified.

     If the /etc/shadow file does not exist,  this  command  will
     create  /etc/shadow  with information from /etc/passwd.  The
     command populates /etc/shadow with the  user's  login  name,
     password, and password aging information.  If password aging
     information does not exist in /etc/passwd for a given  user,
     none  will  be  added  to  /etc/shadow.   However,  the last
     changed information will always be updated.

     If the /etc/shadow file does exist, the following tasks will
     be performed:

          Entries that are in the /etc/passwd file and not in the
          /etc/shadow file will be added to the /etc/shadow file.

          Entries that are in the /etc/shadow file and not in the
          /etc/passwd file will be removed from /etc/shadow.

          Password attributes (for example,  password  and  aging
          information) that exist in an /etc/passwd entry will be
          moved to the corresponding entry in /etc/shadow.

     The pwconv command can only be used by the super-user.
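
Since pwconv keys on the 'x' marker in the password field, a quick way to see which accounts it would still touch is to list the entries whose second field is something else. This is a sketch under that assumption; unshadowed_accounts is a hypothetical helper name.

```shell
# unshadowed_accounts: read passwd-format lines on stdin and print the login
# names whose password field is not the "x" marker used for shadowed entries.
unshadowed_accounts() {
    awk -F: '$2 != "x" { print $1 }'
}
```

Usage would be along the lines of: unshadowed_accounts < /etc/passwd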



TCL:
====

Tcl

Description

The Tcl (Tool Command Language) provides a powerful platform for creating integration applications 
that tie together diverse applications, protocols, devices, and frameworks. 
When paired with the Tk toolkit, Tcl provides a fast and powerful way to create cross-platform GUI applications. 
Tcl can also be used for a variety of web-related tasks and for creating powerful command languages for applications. 



ESCON:
======

The Enterprise System Connection Architecture, ESCON, was developed by IBM as a channel connection architecture 
with the intent of improving connectivity by incorporating fibre optics into a network. ESCON uses fibre optics 
to replace existing bus and tag cables in a new or existing data centre. Designed to connect a wide range of 
peripherals to IBM mainframe computers, the architecture supports data communications at a speed of 200Mbps.

Basically, it's a fiber optic switch, connecting Control Units or other nodes.


FICON:
======

FICON (for Fiber Connectivity) is a high-speed input/output (I/O) interface for mainframe computer connections 
to storage devices or other nodes. As part of IBM's S/390 or z servers, FICON channels increase I/O capacity 
through the combination of a new architecture and faster physical link rates to make them up to eight times 
as efficient as ESCON (Enterprise System Connection), IBM's previous fiber optic channel standard.  


nscd:
=====

     nscd is a process that provides a cache for the most  common
     name  service requests. It starts up during multi-user boot.
     The default configuration-file /etc/nscd.conf determines the
     behavior of the cache daemon. See nscd.conf(4).

     nscd provides caching for the passwd(4), group(4), hosts(4),
     ipnodes(4),  exec_attr(4),  prof_attr(4),  and  user_attr(4)
     databases  through  standard  libc   interfaces,   such   as
     gethostbyname(3NSL),               getipnodebyname(3SOCKET),
     gethostbyaddr(3NSL), and others. Each cache has  a  separate
     time-to-live  for  its  data;  modifying  the local database
     (/etc/hosts, /etc/resolv.conf, and  so  forth)  causes  that
     cache  to become invalidated upon the next call to nscd. The
     shadow file is specifically not cached.  getspnam(3C)  calls
     remain uncached as a result.

     nscd also  acts  as  its  own  administration  tool.  If  an
     instance  of nscd is already running, commands are passed to
     the running version transparently.

     In order to preserve NIS+ security, the startup  script  for
     nscd (/etc/init.d/nscd) checks the permissions on the passwd
     table if NIS+ is being used. If this table cannot be read by
     unauthenticated  users,  then  nscd  will make sure that any
     encrypted password information returned from the NIS+ server
     is supplied only to the owner of that password.


A sample /etc/nscd.conf file, which minimizes the functionality of nscd, is as follows: 

logfile                 /var/adm/nscd.log
enable-cache            passwd          no
enable-cache            group           no
positive-time-to-live   hosts           3600
negative-time-to-live   hosts           5
suggested-size          hosts           211
keep-hot-count          hosts           20
old-data-ok             hosts           no
check-files             hosts           yes
enable-cache            exec_attr       no
enable-cache            prof_attr       no
enable-cache            user_attr       no


If your system has any instability with respect to host names and/or IP addresses, it is possible 
to substitute the following line for all the above lines containing hosts. 
This may slow down host name lookups, but it should fix the name translation problem. 

enable-cache            hosts           no


EBCDIC and unix:
================

thread 1:
---------

Take a look at the dd command with the option
dd conv=ebcdic

See man dd for more details.
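
A minimal sketch of the conversion, for illustration only. The function names are hypothetical. No cbs= operand is given here, which GNU dd treats as plain character translation; other dd implementations may behave differently without cbs=, so treat that as an assumption.

```shell
# to_ebcdic_hex: convert ASCII text on stdin to EBCDIC with dd and
# print the resulting bytes as one hex string.
to_ebcdic_hex() {
    dd conv=ebcdic 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

# ebcdic_roundtrip: ASCII -> EBCDIC -> ASCII, which should return the input.
ebcdic_roundtrip() {
    dd conv=ebcdic 2>/dev/null | dd conv=ascii 2>/dev/null
}
```

For example, "printf 'HELLO' | to_ebcdic_hex" shows the EBCDIC codes c8 c5 d3 d3 d6 for the five letters.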


thread 2:
---------

1. nvdmetoa command:

How to convert EBCDIC files to ASCII:
On your AIX system, the tool nvdmetoa might be present.

Examples:
 
nvdmetoa <AS400.dat  >AIXver3.dat 

Converts an EBCDIC file taken from an AS400 to an ASCII file for the pSeries or RS/6000.

nvdmetoa 132 <AS400.txt  >AIXver3.txt 

Converts an EBCDIC file with a record length of 132 characters to an ASCII file with 132 bytes per line 
PLUS 1 byte for the linefeed character. 


thread 3:
---------

od command:

The od command translates a file into other formats, for example hexadecimal format.
To translate a file into several formats at once, enter: 

# od -t cx a.out > a.xcd

This command writes the contents of the a.out file, in hexadecimal format (x) and character format (c), 
into the a.xcd file. 

thread 4:
---------

I'm using the DD command in UNIX to convert ASCII to EBCDIC so that I can print 
via "lp" to a AS/400 attached printer.  I'm using the AS/400 as a print server.  
The command below works fine except that the carriage return/line feed disappear.  
The file prints without the carriage return line feed.

Here is the unix command:

cat $file | dd ibs=80 cbs=132 conv=ebcdic | lp -d AS400PRNT -s
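
A likely explanation, offered as an assumption because the thread records no answer: when cbs= is given, conv=ebcdic also applies block conversion, so each newline-terminated line becomes one fixed-length record padded with spaces and the newline itself is dropped. The sketch below makes this visible with a tiny input. block_demo is a hypothetical name, and the second dd only translates the bytes back to ASCII for display.

```shell
# block_demo: convert two short lines with cbs=4, then translate the result
# back to ASCII so the record layout becomes visible.
block_demo() {
    printf 'AB\nCD\n' | dd cbs=4 conv=ebcdic 2>/dev/null | dd conv=ascii 2>/dev/null
}
```

Each input line should come out as one 4-byte, space-padded record with no line terminator. With cbs=132, as in the command above, every line would become a 132-byte record in the same way.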




utmp, wtmp, failedlogin File Format:
====================================

AIX:
----

Purpose
Describes formats for user and accounting information.

Description
The utmp file, the wtmp file, and the failedlogin file contain records with user and accounting information.

When a user logs in, the login program writes entries in two files:

The /etc/utmp file, which contains a record of users logged into the system. 
The /var/adm/wtmp file (if it exists), which contains connect-time accounting records.

On an invalid login attempt, due to an incorrect login name or password, the login program makes an entry in:

The /etc/security/failedlogin file, which contains a record of unsuccessful login attempts.

The records in these files follow the utmp format, defined in the utmp.h header file.

To convert the binary records in the wtmp file to ASCII records in a file called dummy.file, enter: 

/usr/sbin/acct/fwtmp < /var/adm/wtmp > /etc/dummy.file

The content of the binary wtmp file is redirected to a dummy ASCII file.

failedlogin:

Use the who command to read the contents of the /etc/security/failedlogin file:

# who /etc/security/failedlogin

# who /etc/security/failedlogin > /tmp/failed_login.txt

To clear the file use:

# cp /dev/null /etc/security/failedlogin
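
Before clearing the file it can be useful to summarise the who output per login name. The helper below is a sketch; failed_by_user is a hypothetical name, and it assumes the login name is the first field of each line of who output.

```shell
# failed_by_user: read who-style lines on stdin and print "<count> <user>",
# highest count first.
failed_by_user() {
    awk 'NF { n[$1]++ } END { for (u in n) print n[u], u }' | sort -k1,1nr -k2,2
}
```

Usage would be along the lines of: who /etc/security/failedlogin | failed_by_user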



The /etc/default/login file:
============================

If you uncomment, or put, the line

CONSOLE=/dev/console

into the "/etc/default/login" file, root can only logon from the console,
and not from other terminals.

Of course, on a normal terminal, you can still log on with your user account and then su to root.
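
To check whether the restriction is active, it is enough to look for an uncommented CONSOLE= line. A small sketch, with console_restriction as a hypothetical helper name:

```shell
# console_restriction: print the active CONSOLE= value from a login defaults
# file, or "not set" if the line is absent or commented out.
console_restriction() {
    value=$(sed -n 's/^CONSOLE=//p' "$1" | tail -n 1)
    if [ -n "$value" ]; then
        echo "$value"
    else
        echo "not set"
    fi
}
```

Usage would be along the lines of: console_restriction /etc/default/login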




Notes on the libc library in AIX 5L:
====================================

What is it?
-----------

Most Unixes have a couple of important shared libraries. One of them is the libc.a library on AIX.

libc = C library
glibc = GNU C library (on Linux and open systems)

It is an XCOFF shared library under AIX and hence a critical part of the running
system. 

The standard C library, `libc.a', is automatically linked into your programs by the `gcc' control program, 
and C/C++ compilers also use it to create statically linked programs.
It provides many of the functions that are normally associated with C programs.

For each function or variable that the library provides, the definition of that symbol will include 
information on which header files to include in your source to obtain prototypes and type definitions 
relevant to the use of that symbol. 

Note that many of the functions in `libm.a' (the math library) are declared in `math.h' but are not present 
in libc.a. Some are, which may get confusing, but the rule of thumb is this: the C library contains 
those functions that ANSI dictates must exist, so that you don't need -lm if you only use ANSI functions. 
In contrast, `libm.a' contains more functions and supports additional functionality such as the matherr 
call-back and compliance with several alternative standards of behavior in case of FP errors. 


Version:
--------

On AIX, you can determine the version of the libc fileset on your machine as follows:

# lslpp -l bos.rte.libc


It's gone, now what?
--------------------

Note: You might want to look at the "recsh" recovery shell command first.

Other ways to recover:

You can recover from this without rebooting or reinstalling, if you
have another copy of libc.a available that is also named "libc.a".  If
you moved libc.a to a different directory, you're in luck -- do the
following:

export LIBPATH=/other/directory


And your future commands will work.  But if you renamed libc.a, this
won't do it.  If you have an NFS mounted directory somewhere, you can
put libc.a on that host, and point LIBPATH to that directory as
shown above.

Or..

If you have a good copy of libc.a from somewhere:

Copy the libc.a fix into place, e.g.,

a. # cp -f your_dir/locale_format/lib/libc.a /usr/ccs/lib/
b. # chown bin:bin /usr/ccs/lib/libc.a
c. # chmod 555 /usr/ccs/lib/libc.a
d. # ln -sf /usr/ccs/lib/libc.a /usr/lib/libc.a
e. # unset LIBPATH
f. # slibclean

Make sure that the new libraries will be picked up at
the next reboot.

Now Reboot.


IBM's version on how to recover:
--------------------------------


Restore Access to an Unlinked or Deleted System Library
When the existing libc.a library is not available, most operating system commands are not recognized. 
The most likely causes for this type of problem are the following: 

The link in /usr/lib no longer exists. 
The file in /usr/ccs/lib has been deleted.

The following procedure describes how to restore access to the libc.a library. This procedure requires 
system downtime. If possible, schedule your downtime when it least impacts your workload to protect 
yourself from a possible loss of data or functionality.

The information in this how-to was tested using AIX 5.3. If you are using a different version or level of AIX, 
the results you obtain might vary significantly. 

Restore a Deleted Symbolic Link

Use the following procedure to restore a symbolic link from the /usr/lib/libc.a library to 
the /usr/ccs/lib/libc.a path:

With root authority, set the LIBPATH environment variable to point to the /usr/ccs/lib directory by typing 
the following commands: 
# LIBPATH=/usr/ccs/lib:/usr/lib
# export LIBPATH

At this point, you should be able to execute system commands. 
To restore the links from the /usr/lib/libc.a library and the /lib directory to the /usr/lib directory, 
type the following commands: 
ln -s /usr/ccs/lib/libc.a /usr/lib/libc.a
ln -s /usr/lib /lib

At this point, commands should run as before. If you still do not have access to a shell, 
skip the rest of this procedure and continue with the next section, Restore a Deleted System Library File. 
Type the following command to unset the LIBPATH environment variable. 

unset LIBPATH



Symbol resolution failed for /usr/lib/libc_r.a:
===============================================


Note 1:
-------

libc_r.a is a standard re-entrant C library, which allows synchronization of the tasks at exit. 



Note 2:
-------

thread:

Q:

Hi there

I've just tried to install Informix 9.3 64-bit on AIX 52. It failed with the
error shown below. Any suggestions as to what could be wrong? I tried to
find information on the web as to what versions of Informix (if any) are
supported on AIX52, but could not find anything.

I would be grateful for any advice in this matter.


Disk Initializing Demo IBM Informix Dynamic Server
exec(): 0509-036 Cannot load program
/u01/app/informix-9.3-64/server/bin/oninit
because of the following errors:
0509-130 Symbol resolution failed for /usr/lib/libc_r.a[aio_64.o] because:
0509-136 Symbol kaio_rdwr64 (number 0) is not exported from
dependent module /unix.
0509-136 Symbol listio64 (number 1) is not exported from
dependent module /unix.
0509-136 Symbol acancel64 (number 2) is not exported from
dependent module /unix.
0509-136 Symbol iosuspend64 (number 3) is not exported from
dependent module /unix.
0509-136 Symbol aio_nwait (number 4) is not exported from
dependent module /unix.
0509-150 Dependent module libc_r.a(aio_64.o) could not be loaded.
0509-026 System error: Cannot run a file that does not have a valid format.
0509-192 Examine .loader section symbols with the
'dump -Tv' command.
Bundle Install program has finished

A:

Did you enable AIX aio? If not then run the following smit command.

$ smit aio

Choose "Change / Show Characteristics of Asynchronous I/O"

Set the state to be configured at system restart to Available.
Set state of fast path to Enable.

Also check that you enabled 64-bit version of AIX run time.


Note 3:
-------

Q:

Suppose you get the error: Symbol resolution failed for /usr/lib/libc_r.a

Examples:

Error:  Exec(): 0509-036 Cannot load program
Article ID: 20180 
Software:  ArcGIS - ArcInfo 8.0.1, 8.0.2, 8.1 ArcView GIS 3.1, 3.2, 3.2a 
Platforms:  AIX 4.3.2.0, 4.3.3.0 

Error Message
Executing some ArcInfo Workstation commands, or running ArcView GIS cause the following errors to occur: 

Exec(): 0509-036 Cannot load program ... because of the 
0509-130 Symbol resolution failed for /usr/lib/libc_r.a(aio.o) because: 
0509-136 Symbol kaio_rdwr (number 0) is not exported from dependant module /unix 
0509-136 Symbol listio (number 1) is not exported from dependant 
module /unix 
0509-136 Symbol acancel (number 2) is not exported from dependant module /unix 
0509-136 Symbol iosuspend (number 3) is not exported from dependant module /unix 
0509-136 Symbol aio_nwait (number 4) is not exported from dependant module /unix 
0509-192 Examine .loader section symbols with the 'dump -Tv' command.


root@n5110l13:/appl/emcdctm/dba/log#cat dmw_et.log
Could not load program ./documentum:
Symbol resolution failed for /usr/lib/libc_r.a(aio.o) because:
        Symbol kaio_rdwr (number 0) is not exported from dependent
          module /unix.
        Symbol listio (number 1) is not exported from dependent
          module /unix.
        Symbol acancel (number 2) is not exported from dependent
          module /unix.
        Symbol iosuspend (number 3) is not exported from dependent
          module /unix.
        Symbol aio_nwait (number 4) is not exported from dependent
          module /unix.
        Symbol aio_nwait64 (number 5) is not exported from dependent
          module /unix.
        Symbol aio_nwait_timeout (number 6) is not exported from dependent
          module /unix.
        Symbol aio_nwait_timeout64 (number 7) is not exported from dependent
          module /unix.
System error: Error 0


A:

Cause
The AIX asynchronous I/O module has not been loaded.

Solution or Workaround
Load asynchronous I/O. You must do this as a ROOT user:

Use SMITTY and navigate to Devices > Async I/O > Change/Show. 
Make the defined option available. 
Reboot the machine. 

or

Enable AIO by running the following commands: 
/usr/sbin/chdev -l aio0 -a autoconfig=available 
/usr/sbin/mkdev -l aio0 




KDB kernel debugger and kdb command:
====================================

AIX Only

This document describes the KDB kernel debugger and kdb command. The KDB kernel debugger and the kdb command 
are the primary tools a developer uses for debugging device drivers, kernel extensions, and the kernel itself. 
Although they appear similar to the user, the KDB kernel debugger and the kdb command are two separate tools:

-- KDB kernel debugger: 
The KDB kernel debugger is integrated into the kernel and allows full control of the system while a 
debugging session is in progress. The KDB kernel debugger allows for traditional debugging tasks such as 
setting breakpoints and single-stepping through code. 

-- kdb command: 
This command is implemented as an ordinary user-space program and is typically used for post-mortem analysis 
of a previously-crashed system by using a system dump file. The kdb command includes subcommands specific to the 
manipulation of system dumps. 

Both the KDB kernel debugger and kdb command allow the developer to display various structures normally found 
in the kernel's memory space. Both do the following:

-Provide numerous subcommands to decode various data structures found throughout the kernel. 
-Print the data structures in a user-friendly format. 
-Perform debugging at the machine instruction level. Although this is less convenient than source level debugging, 
 it allows the KDB kernel debugger and the kdb command to be used in the field where access to source code 
 might not be possible. 
-Process the debugging information found in XCOFF objects. This allows the use of symbolic names for functions 
 and global variables.


slibclean:
==========

AIX:

Note 1:

Removes any currently unused modules in kernel and library memory.

Syntax

# slibclean


Description
The slibclean command unloads all object files with load and use counts of 0. It can also be used to 
remove object files that are no longer used from both the shared library region and the shared library 
and kernel text regions.

Files
/usr/sbin/slibclean Contains the slibclean command. 






thread_getregs, thread_waitlock, sigprocmask:
=============================================

Note 1:
-------

thread:

Q:

thread_waitlock

Hello all 
Can someone please provide me with a link to where the above function is 
documented ?? I know its part of libc_r.a and is used for thread 
synchronization ... I need to get some details on the function as to 
what exactly it does since a program I'm trying to debug is getting a 
ENOTSUP error while calling this function ... 

Would really appreciate the help. 

A:

thread_waitlock()
Reply from Richard Joltes on 8/25/2003 5:48:00 PM  

This doesn't seem to be documented anywhere, but it appears this 
function is _not_ in libc(_r). I found this elsewhere: 

"...kernel symbols are defined by import lists found in /usr/lib. 
You'll need threads.exp and syscalls.exp. Look at the -bI option 
in the ld documentation." 

You'll find references to this function if you look through these 
two files, so maybe that's your best option. Threads.exp even 
specifically says "the system calls listed below are not imported 
by libc.a." 


Note 2:
-------

thread:

APAR: IY17298 COMPID: 5765C3403 REL: 430 
ABSTRACT: ASSERT IN THREAD_SETSTATE_FAST 

PROBLEM DESCRIPTION: 
Program termination due to an assert in thread_setstate_fast. 

PROBLEM SUMMARY: 
Assert in thread_setstate_fast 

PROBLEM CONCLUSION: 
Increase lock scope. 


Note 3:
-------

thread:

Paul Pluzhnikov wrote: 
> "pankajtakawale" <pankaj.takaw...@gmail.com> writes: 

> > Here is the snippet of truss output 
> ... 
> > sbrk(0x00000060)                              Err#12 ENOMEM 


> > Do i need to increase swap space or thread stack size? 


> Increasing swap might help, but I would not expect it. 
> You are running out of *heap* space. Check your limits, e.g. 'ulimit 
> -a' in *sh or 'limit' in *csh. 

Yes, the process was running out of heap space. In my local environment I 
decreased the soft limit of the data segment and ran the app. truss showed 'sbrk 
failed with ENOMEM'. Now I'm planning to run the app in a very heavy 
configuration such that even an 'unlimited data segment' will be 
insufficient. On the same configuration I will then make the app use the large 
address space model by setting the environment variable LDR_CNTRL (the app should 
run in the large address space model), and will update the thread with the results. 
Thanks for your valuable help Paul. 


Note 4:
-------

+   On Linux, the interface exports a bunch of "#define __NR_foo 42" style 
+   definitions, so there is no implementation. 
+ 
+   On AIX, syscall numbers are not fixed ahead of time; in principle 
+   each process can have its own assignment of numbers to actual 
+   syscalls.  As a result we have a bunch of global variables to store 
+   the number for each syscall, which are assigned to at system 
+   startup, and a bunch of #defines which map "__NR_foo" names to 
+   these global variables.  Initially, when we don't know what a 
+   syscall's number is, it is set to __NR_AIX5_UNKNOWN. 
+ 
+   Therefore, on AIX, this module provides a home for those variables. 
+ 
+   It also provides VG_(aix5_register_syscall) to assign numbers to 
+   those variables. 
+*/ 

e.g.

+Int VG_(aix5_NR__sigqueue) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR__sigsuspend) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR__sigaction) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_sigprocmask) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_siglocalmask) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_count_event_waiters) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_thread_waitact) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_thread_waitlock_local) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_thread_waitlock) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_thread_wait) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_thread_unlock) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_thread_twakeup_unlock) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_thread_twakeup_event) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_thread_twakeup) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_thread_tsleep_event) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_thread_tsleep_chkpnt) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_thread_tsleep) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_thread_post_many) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_thread_post) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_ue_proc_unregister) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_ue_proc_register) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_kthread_ctl) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR__thread_setsched) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_threads_runnable) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_thread_getregs) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_thread_terminate_unlock) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_thread_terminate_ack) = __NR_AIX5_UNKNOWN; 
+Int VG_(aix5_NR_thread_setstate_fast) = __NR_AIX5_UNKNOWN; 
etc....



skulker command:
================

AIX:

The skulker command is a command file for periodically purging obsolete or unneeded files from file systems. 
Candidate files include files in the /tmp directory, files older than a specified age, a.out files, core files, 
or ed.hup files. 

The skulker command is normally invoked daily, often as part of an accounting procedure run by the cron command 
during off-peak periods. Modify the skulker command to suit local needs following the patterns shown in the 
distributed version. System users should be made aware of the criteria for automatic file removal. 

The find command and the xargs command form a powerful combination for use in the skulker command. 
Most file selection criteria can be expressed conveniently with find expressions. The resulting file list 
can be segmented and inserted into rm commands using the xargs command to reduce the overhead that would 
result if each file were deleted with a separate command. 
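
As a minimal sketch of the find/xargs pattern described above (this is not the distributed skulker script): the function name purge_old, its arguments and the --delete switch are assumptions made for illustration. By default it only lists candidate files, so the selection criteria can be reviewed before anything is removed.

```shell
#!/bin/sh
# purge_old DIR DAYS [--delete]   (hypothetical helper, not part of AIX)
# Without --delete it only lists regular files under DIR that were last
# modified more than DAYS days ago; with --delete it removes them.
purge_old() {
    po_dir=$1
    po_days=$2
    if [ "${3:-}" = "--delete" ]; then
        # xargs batches many file names into few rm invocations.
        # Caveat: plain xargs splits on whitespace, so names containing
        # blanks or newlines are not handled safely by this sketch.
        find "$po_dir" -type f -mtime +"$po_days" -print | xargs rm -f
    else
        find "$po_dir" -type f -mtime +"$po_days" -print
    fi
}

# Possible use:
#   purge_old /tmp 7            # review the list first
#   purge_old /tmp 7 --delete   # then remove
```

Listing first and deleting second follows the same idea as testing new criteria manually before installing them in skulker.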

Note 
Because the skulker command is run by a root user and its whole purpose is to remove files, 
it has the potential for unexpected results. Before installing a new skulker command, test any additions 
to its file removal criteria by running the additions manually using the xargs -p command. After you have 
verified that the new skulker command removes only the files you want removed, you can install it. 
 
To enable the skulker command, you should use the crontab -e command to remove the comment statement 
by deleting the # (pound sign) character from the beginning of the /usr/sbin/skulker line in the 
/var/spool/cron/crontabs/root file. 



L1 cache, L2 cache, L3 cache and differences:
=============================================


Note 1:
-------

Q:

Question from cmos "What is L2 cache? 

What is L3 cache? 

What is the major difference between the two." 

A:

Answer from LinksMaster "CPU Cache (the example CPU is a little old but the concepts are still the same) 

* The initial level of storage on a processor is the registers. The registers are where the actual processing 
input and output take place.

* L1 cache - Then the level 1 cache comes next. It is logically the closest high speed memory 
to the CPU core / registers. It usually runs at full speed (meaning the same as the CPU core clock speed). 
L1 often comes in sizes of 8kB, 16kB, 32kB, 64kB or 128kB. It is very high speed even though the amount 
is relatively small.

* L2 cache - The next level of cache is L2, or level 2. Nowadays L2 is larger than L1 and it often comes in 
256kB, 512kB and 1,024kB amounts. L2 often runs at 1/4, 1/2 or full speed in relation to the CPU core clock speed.

* L3 cache - Level 3 cache is something of a luxury item. Often only high end workstations and servers 
need L3 cache. L3 has been both "on-die", meaning part of the CPU or "external" meaning mounted near 
the CPU on the motherboard. It comes in many sizes and speeds.


Note 2:
-------

L2 cache, short for Level 2 cache, is cache memory that is external to the microprocessor. In general, L2 cache memory, 
also called the secondary cache, resides on a separate chip from the microprocessor chip, 
although more and more microprocessors are including L2 caches in their architectures. 

As more and more processors begin to include L2 cache into their architectures, Level 3 cache is now the name 
for the extra cache built into motherboards between the microprocessor and the main memory. 

Quite simply, what was once L2 cache on motherboards now becomes L3 cache when used with microprocessors 
containing built-in L2 caches. 





xcom:
=====

Used for file transfer between systems, with many nice features such as printing a report of the transfer,
queuing of transfers, EBCDIC - ASCII conversion, scheduling etc.

The command "xcom62" is used for SNA networks.
The command "xcomtcp" is used for the tcpip networks.

The xcom daemon is "xcomd" in /opt/xcom/bin

Use:
xcomd -c	to kill/stop
xcomd		to start  the daemon.

logging and config of xcom events can be found in:
/usr/spool/xcom 

- xcom.glb has all conf. settings and is a superset of xcom.cnf
- After reconfig, stop and start the daemon
- atoe files are for ascii / ebcdic conversions
- xcomqm for maintaining queues
- xcom.trusted must be defined for each trusted transfer
- xcomtool is for GUI

sending of files is done with "xcomtcp -c[1234] other options"
where -c1 means sending a file

Example commands:
xcomtcp -c1 -f /tmp/xcom.cnf LOCAL_FILE=/tmp/xcomtest.txt REMOTE_FILE=Q:\
REMOTE_SYSTEM=NLPA020515.patest.nl.eu.abnamro.com QUEUE=NO PROTOCOL=TCPIP PORT=8044

xcomtcp -c1 -f /tmp/xcom.cnf LOCAL_FILE=/tmp/xcomtest.txt REMOTE_FILE=c:\test.txt
REMOTE_SYSTEM=NLPR020796.branches.nl.eu.abnamro.com QUEUE=NO PROTOCOL=TCPIP PORT=8044


xcomtcp -c1 -f /tmp/xcom.cnf LOCAL_FILE=/tmp/xcomtest.txt REMOTE_FILE=Q:\XCOM\DATA\IN\t.txt
REMOTE_SYSTEM=NLPA020515.patest.nl.eu.abnamro.com QUEUE=NO PROTOCOL=TCPIP PORT=8044

where "/tmp/xcom.cnf" contains parameters (e.g. userid, password etc..) not specified at the prompt.




vmcid: IBM  minor code: E02
===========================


Note 1:
------- 

It seems that port 32888 is not assigned on this system (ZD111L08):

/etc/services
#                                32775-32895            # Unassigned

Maybe this is the issue..



The Infamous JVMXM008 error:
============================

Problem:

JVMXM008: Error occured while initialising System ClassException in thread "main" 
Could not create the Java virtual machine.

This error may arise from various causes. The most common cause is incorrect permissions on
a mount point.

Here, some cases are presented which may resemble your situation:


Case 1:
=======


Incorrect underlying mount point permissions prevent WebSphere Application Server from starting
  
 Technote (troubleshooting) 
  
Problem(Abstract) 
When the filesystem on which Application Server is installed has incorrect mount point permissions, 
the server may not start when it is started as a non-root user. 

Attempting to execute the $WAS_HOME/java/bin/java as this non-root user causes the following error.

/usr/WebSphere/DeploymentManager/java/bin> java -version
JVM not found: libjvm.a - libjvm.a
/usr/WebSphere/DeploymentManager/bin> startManager.sh
JVMXM008: Error occurred while initialising System ClassException in thread "main" Could not create JVM.  
  

Cause 
The JVM is unable to initialize because the user does not have the proper permissions to access the necessary 
files, due to incorrect permissions on the underlying mount point.
 
Resolving the problem 
Unmount the file system and change the permissions of the mount point to 777 (rwxrwxrwx). 
Mount the file system and run Application Server as a non-root user.  
  
 
Cross Reference information 
Segment: Application Servers; Product: Runtimes for Java Technology; Component: Java SDK


Case 2:
=======

WebSphere Application Server V6.1's Java on AIX will not start as a non-root user and will report error 
"JVMXM008" or "JVM not found: libjvm.a - libjvm.a"
  
 Technote (troubleshooting) 
  
Problem(Abstract) 
When attempting to run WebSphere Application Server V6.1 as a non-root user on AIX, 
it may fail to initialize. It produces an error message, "JVMXM008: Error occured while initialising System ClassException 
in thread "main" Could not create the Java virtual machine." or a different error message, 
"JVM not found: libjvm.a - libjvm.a". Running the same process as the root user does not produce any problems.  
  
Symptom 
This article describes an issue clients may encounter after successfully installing WebSphere Application Server V6.1 
on AIX as either the root user or a non-root user. 

Attempts to use any WebSphere tools, such as the profile creation utility, fail immediately. 
All of those failures occur because WebSphere Application Server's Java SDK does not initialize properly 
when run as the non-root user. Testing Java from the command line reveals specific errors, as shown in the examples below.

Examples
In these examples illustrating the Java initialization errors, assume that WebSphere Application Server 
has been successfully installed as a non-root user to the /usr/WAS directory.

Example One: Java fails to initialize when running it from a location outside its directory

Running these commands...

cd /
/usr/WAS/java/jre/bin/java -version

...produces this error:

JVMXM008: Error occured while initialising System ClassException in thread "main" 
Could not create the Java virtual machine.


Example Two: Java fails to initialize when running it from inside its "bin" directory

Running these commands...

cd /usr/WAS/java/jre/bin
./java -version

...produces this error:

JVM not found: libjvm.a  - libjvm.a 


The error shown in Example Two occurs even though the libjvm.a file is present in the proper location 
and has correct permissions.  
  
 
Cause 
If the mount point for the filesystem to which WebSphere is installed is not set up correctly, this can cause 
problems for Java initialization.  
  
 
Resolving the problem 
In situations where Java initializes properly when run as the root user but fails when run as a non-root user, 
check the following points: 
The WebSphere Application Server V6.1 installation should be successful. Check the installation's "log.txt" file 
to ensure that the installation ended with a "success" message.

A non-root user must have read access to all files installed with WebSphere Application Server V6.1, including all 
of the files in the product's "java" subdirectory. This can be accomplished by granting ownership of all 
the product's files to that non-root user.

A non-root user should also have access to the AIX system libraries, such as the libraries in /usr/lib and /usr/ccs/lib .

If each of those points has been checked and appears to be in good order, then there is one more aspect of the WebSphere 
configuration to review. Follow these steps to check and correct an issue which could lead to problems 
when initializing Java as a non-root user:

Locate the mount point of the filesystem on which WebSphere Application Server V6.1 is installed.

Unmount that filesystem.

Check the permissions of the blank directory which acts as the mount point for the filesystem. 
The permissions should be at least "755". The ownership and permissions of that directory must be configured 
in a manner which allows the non-root user read access to that directory.

Check that the directory which acts as the mount point is empty. If the directory contains anything, 
remove that content so that the directory becomes empty.

Remount the WebSphere Application Server V6.1 filesystem. Java should initialize properly 
as a non-root user once the issues with the mount point permissions are resolved.  
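
The permission check on the mount point directory can be scripted. A minimal sketch, assuming it is run while the filesystem is unmounted so that the underlying directory is what gets inspected; the function name world_rx is hypothetical, and it only looks at the read and execute bits for "others" as reported by ls -ld.

```shell
#!/bin/sh
# world_rx DIR  (hypothetical helper)
# Succeeds if the "others" permission triplet of DIR grants both read
# and search (execute) access, as mode 755 would.
world_rx() {
    wr_mode=$(ls -ld "$1" | awk '{ print $1 }')    # e.g. drwxr-x---
    wr_other=$(printf '%s\n' "$wr_mode" | cut -c8-10)
    case $wr_other in
        r?x|r?t) return 0 ;;   # t = sticky bit set together with execute
        *)       return 1 ;;
    esac
}

# Possible use, with the filesystem unmounted (the path is illustrative and
# assumes the install directory is itself the mount point):
#   world_rx /usr/WAS || echo "mount point not readable by others"
```

A directory with mode 750 (drwxr-x---) fails this check, and a directory with mode 755 passes it.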
 
 
Case 3:
=======
   

JVMXM008 workaround 
Brian D. Carlstrom wrote:


At Stanford I have my own automated regression infrastructure that runs processor simulations 
remotely via ssh. One problem that has existed for months is this error from jbuild.linkImage:

JVMXM008: Error occured while initialising System ClassException in thread "main" 
Could not create the Java virtual machine.

I noticed that Sergiy Kyrylkov had posted in his blog about a similar issue when running 
JikesRVM regressions from cron:

http://sergiy.kyrylkov.name/blog/Jikes%20RVM/

I found that if I forced the ssh remote bash to be a login shell, I did not get the error. 
Eventually I narrowed it down to the fact that /etc/profile was sourcing /etc/profile.d/lang.sh 
which was setting the LANG environment variable. I found that if my remote command set LANG explicitly, 
I can build jikesrvm over ssh without the JVMXM008 error.
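
A minimal sketch of that workaround: LANG is set explicitly for the command instead of relying on /etc/profile being sourced by a login shell. The helper name with_lang, and the locale value, host name and build command in the usage comment, are assumptions and are not taken from the original post.

```shell
#!/bin/sh
# with_lang LOCALE COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND with LANG set explicitly instead of relying on
# /etc/profile having been sourced by a login shell.
with_lang() {
    wl_locale=$1
    shift
    env LANG="$wl_locale" "$@"
}

# Possible use over ssh (host name and build command are placeholders):
#   ssh buildhost 'LANG=en_US.UTF-8 ./run-build.sh'
```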

I just wanted to post the workaround to the list in case it helps someone else. I don't know if it is needed 
for later SDKs, but my version of JikesRVM seems to only work with 1.4.1, nothing earlier, nothing later. 

For the record, this is on Fedora Core 4 on G5 machines running IBM's SDK 1.4.1-SR2 with a jikesrvm 
source tree last updated at "2005-12-09 03:00:00". We are frozen until after some paper deadlines this month...

I hope I'll have time soon to try this workaround on UNM machines.


Case 4:
=======

Q:
Hi

When I try to install PT 8.46.02 on an AIX box, I get the error message below.
"JVMXM008: Error occurred while initializing System Class Exception in thread "main" Could not create 
the Java virtual machine".

Any help would be really appreciated.

A:
Set JAVA_HOME and the class path before installing.


Case 5:
=======

Thursday, 27 November 2008

JVMXM008: Error occured while initialising System ClassException in thread "main" Could not create 
the Java virtual machine.


Trying to get IBM WebSphere running as a non-root user (wasadmin:wasgroup), we got the following error at startup:
JVMXM008: Error occured while initialising System ClassException in thread "main" Could not create the Java virtual machine.
This problem also occurs when just running java -version somewhere under /usr/ibm 
(/usr/ibm/WebSphere or /usr/ibm/WebSphere/Appserver etc.).

Everywhere else in the filesystem there was no problem, and java printed its version string.
The cause was that /usr/ibm/WebSphere/Appserver was located on a mounted filesystem:
/dev/volg /usr/ibm
Although the permissions on /usr/ibm looked good while the filesystem was mounted, the JVM did not work.
drwxr-xr-x 13 root system 4096 Jul 29 17:56 /usr/ibm
The root cause shows up when the /usr/ibm filesystem is unmounted:
drwxr-x--- 13 root system 4096 Jan 01 17:56 /usr/ibm
So changing the permissions of the /usr/ibm directory while the filesystem was unmounted solved the problem. 
Posted by awaldman at 4:35 PM in IBM TIM TDI 




AIX:  0403-031 The fork function failed:
========================================


Run "svmon -P" to see the top memory consumers.
You might try the following:

1. Increase the paging space:

chps -s number_of_PP's hd6 

Or 

2. Increase maxuproc, which might help if it is too low:

maxuproc:    Specifies the maximum number of processes per user ID. 
Values:      Default: 40; Range: 1 to 131072 
Display:     lsattr -E -l sys0 -a maxuproc 
Change:      chdev -l sys0 -a maxuproc=NewValue 


3. As a structural solution, you should actually add RAM or decrease
the number of processes.


4. If you need to kill all your processes, use the command:

# kill -9 -1        # after that, you are able to enter commands again.

or try

# killall
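
Because maxuproc limits the number of processes per user ID, it can help to see which users own the most processes before raising it. A minimal sketch: the function name count_by_user is hypothetical, and it simply tallies user names read from standard input, one per line. The ps invocation in the comment assumes that ps supports the POSIX "-o user=" format.

```shell
#!/bin/sh
# count_by_user  (hypothetical helper)
# Reads one user name per line on stdin and prints "COUNT USER",
# highest count first.
count_by_user() {
    awk '{ n[$1]++ } END { for (u in n) print n[u], u }' | sort -rn
}

# Typical input: the owner of every process, without a header line.
#   ps -e -o user= | count_by_user
```

A user whose count is close to the maxuproc value shown by lsattr is a likely source of the fork failures.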



AIX create Ramdisk:
===================

Example:

# mkramdisk 40000 
# ls -l /dev | grep ram 
# mkfs -V jfs /dev/ramdiskx 
# mkdir /ramdiskx 
# mount -V jfs -o nointegrity /dev/ramdiskx /ramdiskx 
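
The size argument of mkramdisk is given in 512-byte blocks, so 40000 corresponds to roughly 20 MB. "ramdiskx" stands for the device name that mkramdisk reports. The conversion is plain arithmetic, sketched below; the function name is hypothetical.

```shell
#!/bin/sh
# blocks_to_mib BLOCKS  (hypothetical helper)
# Converts a count of 512-byte blocks to whole MiB (rounded down).
blocks_to_mib() {
    echo $(( $1 * 512 / 1024 / 1024 ))
}

# Example: 40000 blocks * 512 bytes = 20480000 bytes
#   blocks_to_mib 40000
```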



AIX 5.3 ping error:
===================

thread:

Q:

Hi all, 

If a normal user tries to ping my workstation, it gives the following error: "0821-067 ping: 
The socket creation call failed.: The file access permissions do not allow the specified action" 

And if I log in to my workstation as a non-root user and ping some other system, it gives the same error,
whereas this does not happen for the root user. 

Any suggestions what can be the problem? 

A:

Hi, 

It looks like a problem with the permissions of the ping program file. On my AIX system I have the following for /usr/sbin/ping:

# ls -l /usr/sbin/ping 
-r-sr-xr-x 1 root system 31598 Dec 17 2002 /usr/sbin/ping 

A:

Technote IBM

Cannot ping as Non-root user 
 Technote (FAQ) 
  
Problem 

When trying to ping as a user that is not root, the following error message was 
displayed:

0821-067 Ping: the socket creation call failed.
the file access permissions do not allow the specified
actions.

  
Solution 

--------------------------------------------------------------------------------

Environment
AIX Version 5.x Change the setuid bit permissions for /usr/sbin/ping. Enter: 
chmod 4555 /usr/sbin/ping
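
Mode 4555 corresponds to the -r-sr-xr-x listing shown earlier: the leading 4 is the set-user-ID bit, which lets ping run with root's privileges to open its socket. Whether the bit is present can be checked before changing anything. A minimal sketch with a hypothetical function name:

```shell
#!/bin/sh
# has_setuid FILE  (hypothetical helper)
# Succeeds if FILE exists and has its set-user-ID bit set.
has_setuid() {
    [ -u "$1" ]
}

# Example check before changing anything:
#   has_setuid /usr/sbin/ping || echo "setuid bit missing on ping"
```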




Root Password Recovery on Solaris : 
===================================


Go to the OK prompt by pressing Stop+A. Put the first Solaris CD in the CD-ROM drive.
At the OK prompt, give the command:

ok boot cdrom -s

Now mount the boot device onto /a 
(to check the boot device you can use the df command):

#mount /dev/dsk/c0t0d0s0 /a

Now open the shadow file and remove the password entry of the root line, i.e. 
the second field, for example in: root:$1$NYDu1c8p$Mdm2n6IPb9k14pP2s2FXZ.:13063:0:99999:7::: 

# vi /a/etc/shadow
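
The manual edit in vi can also be expressed as a filter. A minimal sketch, assuming the colon-separated shadow format shown above; the function name is hypothetical, and in practice it would be run against a copy of the file that is checked before it replaces the original. On Solaris, nawk may be needed in place of awk for the -v option.

```shell
#!/bin/sh
# clear_password_field USER  (hypothetical helper)
# Reads shadow-format lines on stdin and writes them to stdout with
# the second (password) field of USER's entry emptied.
clear_password_field() {
    awk -F: -v OFS=: -v user="$1" '$1 == user { $2 = "" } { print }'
}

# Possible use (output path is illustrative):
#   clear_password_field root < /a/etc/shadow > /tmp/shadow.new
```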

Now unmount the /a mount point 

#umount /a

Reboot the server in single-user mode:

ok boot -s

Give a new password for root: 

        #passwd
          New Password:
          Verify Password:

This will reset the password for root and you will be able to login to the box using this password. 


  

AIX: 0403-006 Execute permission denied:
========================================

Note 1:
-------

thread

Q:

Hello all. I want to execute a shell script. According to "ls -l", its permissions are: 
-rwxr-x--x 

But when I use "./proname" to execute it, the result is: 
0403-006 Execute permission denied 


Why? The permissions include execute for everyone!

A:

If the permissions seem OK, then check this:
Make sure there are no "empty" lines containing a "~" below the last statement, like

~


because "~" cannot be executed.


AIX: The certlist command:
==========================

You can use the certlist command without any parameters or flags, which will show
you all installed certificates for your account on your system.

The man page of certlist:

certlist Command
Purpose
certlist lists the contents of one or more certificates.

Syntax
certlist [-c | -f] [-a attr [attr....] ] tag [username]

Description
The certlist command lists the contents of one or more certificates. Using the -c option causes the output 
to be formatted as colon-separated data with the attribute names associated with each 
field on the previous line as follows: 

# name: attribute1: attribute2: ... 
User: value1: value2: ... 

The -f option causes the output to be formatted in stanza file format with the username attribute 
given as the stanza name. Each attribute=value pair is listed on a separate line: 

user: 
     attribute1=value 
     attribute2=value 
     attribute3=value 

When neither of these command line options are selected, the attributes are output as attribute=value pairs.

The -a option selects a list of one or more certificate attributes to output. In addition to the attributes 
supported by the load module, several pseudo-attributes shall also be provided for each certificate.

Those attributes are:

auth_user           User's authentication certificate. 
distinguished_name  User's subject distinguished name in the certificate. 
alternate_name      User's subject alternate name in the certificate. 
validafter          The date the user's certificate becomes valid. 
validuntil          The date the user's certificate becomes invalid. 
tag                 The name that uniquely identifies this certificate. 
issuer              The distinguished name of the certificate issuer. 
label               The label that identifies this certificate in the private keystore. 
keystore            The location of the private keystore for the private key of the certificate. 
serialnumber        The serial number of the certificate. 
verified            true indicates that the user proved that he is in possession of the private key. 

Flags
-c Displays the output in colon-separated records. 
-f Displays the output in stanzas. 
-a attr Selects one or more attributes to be displayed. 

The tag parameter selects which of the user's certificates is to be output. The reserved value ALL indicates 
that all certificates for the user are to be listed.

The username parameter specifies the name of the AIX user to be queried. If invoked without the username parameter, 
the certlist command uses the name of the current user.

Exit Status
0 If successful. 
EINVAL If the command is ill-formed or the arguments are invalid. 
ENOENT If a) the user doesn't exist, b) the tag does not exist c) the file does not exist. 
EACCES If the attribute cannot be listed, for example, if the invoker does not have read_access to the user data-base. 
EPERM If the user identification and authentication fails. 
errno If system error. 

Security
This command can be executed by any user in order to list the attributes of a certificate. 
Certificates listed may be owned by another user.

Audit
This command records the following event information:

CERT_List <username>

Examples
$ certlist -f -a verified keystore label signcert bob
bob:
      verified=false
      keystore=file:/var/pki/security/keys/bob
      label=signcert

$ certlist -c -a validafter validuntil issuer signcert bob
#name:validafter:validuntil:issuer
bob:1018091201:1018091301:c=US,o=xyz
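
The colon-separated format pairs each value with the attribute name in the same position of the preceding header line. A minimal sketch of that pairing, with a hypothetical function name, assuming the compact header form shown in the example above (no blanks after the colons). It is not safe for values that themselves contain colons, such as a keystore value of the form file:/..., so it is only an illustration of the record layout.

```shell
#!/bin/sh
# colon_to_pairs  (hypothetical helper)
# Reads certlist -c style output on stdin: a "#name:attr1:attr2" header
# line followed by "user:value1:value2" records, and prints one
# attribute=value pair per line.
colon_to_pairs() {
    awk -F: '
        /^#/ { sub(/^# */, ""); n = split($0, names, ":"); next }
        { for (i = 1; i <= NF && i <= n; i++) print names[i] "=" $i }
    '
}
```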

$ certlist -f ALL bob
bob:
      auth_cert=logincert
      distinguished_name=c=US,o=xyz,cn=bob
      alternate_name=bob@xyz.com
      validafter=0921154701
      validuntil=0921154801
      issuer=c=US,o=xyz
      tag=logincert
      verified=true
      label=loginkey
      keystore=file:/var/pki/security/keys/bob
      serialnumber=03
bob:
      auth_cert=logincert
      distinguished_name=c=US,o=xyz,cn=bob
      alternate_name=bob@ibm.com
      validafter=1018091201
      validuntil=1018091301
      issuer=c=US,o=xyz
      tag=signcert
      verified=false
      label=signkey
      keystore=file:/var/pki/security/keys/bob
      serialnumber=02

Files
/usr/lib/security/pki/acct.cfg

/usr/lib/security/pki/policy.cfg



 
SAM on HP-UX:
=============


The easiest way to administer HP-UX is to use Sam.
As root, simply type "sam"... easy, huh?
If you're in text-mode, you'll get a curses-based window, and if you're in CDE / VUE, you will get 
a new window on your workspace... Simply navigate your way through - 
you can do a lot of your administration via sam.

Some example screens in textmode sam

# sam
..
..
Starting the terminal version of sam...

To move around in sam:

- use the "Tab" key to move between screen elements
- use the arrow keys to move within screen elements
- use "Ctrl-F" for context-sensitive help anywhere in sam

On screens with a menubar at the top like this:

        ------------------------------------------------------
       |File View Options Actions                         Help|
       | ---- ---- ------- ------------------------------- ---|

- use "Tab" to move from the list to the menubar
- use the arrow keys to move around
- use "Return" to pull down a menu or select a menu item
- use "Tab" to move from the menubar to the list without selecting a menu item
- use the spacebar to select an item in the list

On any screen, press "CTRL-K" for more information on how to use the keyboard.


+ ===             System Administration Manager (gavnh300) (1)                 +
|File View Options Actions                                                Help |
|                       Press CTRL-K for keyboard help.                        |
|SAM Areas                                                                     |
|------------------------------------------------------------------------------|
|  Source   Area                                                               |
|+---------------------------------------------------------------------------+ |
|| SAM      Accounts for Users and Groups ->                                 ^ |
|| SAM      Auditing and Security         ->                                   |
|| SAM      Backup and Recovery           ->                                   |
|| SAM      Clusters                      ->                                   |
|| SAM      Disks and File Systems        ->                                   |
|| SAM      Display                       ->                                   |
|| SAM      Kernel Configuration          ->                                   |
|| SAM      Networking and Communications ->                                   |
|| SAM      Performance Monitors          ->                                   |
|| SAM      Peripheral Devices            ->                                   |
|| SAM      Printers and Plotters         ->                                   |
|| SAM      Process Management            ->                                   |
|| Other    Resource Management           ->                                   |
|| SAM      Routine Tasks                 ->                                 v |
|+---------------------------------------------------------------------------+ |
|                                                                              |
+------------------------------------------------------------------------------+


Choose "Accounts for Users and Groups" and the following screen shows:

+ ===             System Administration Manager (gavnh300) (1)                 +
|File View Options Actions                                                Help |
|                       Press CTRL-K for keyboard help.                        |
|SAM Areas:Accounts for Users and Groups                                       |
|------------------------------------------------------------------------------|
|  Source   Area                                                               |
|+---------------------------------------------------------------------------+ |
|| ..(go up)                                                                 ^ |
|| SAM      Groups                                                             |
|| SAM      Users                                                              |
||                                                                             |
||                                                                             |
||                                                                             |
||                                                                             |
||                                                                             |
||                                                                             |
||                                                                             |
||                                                                             |
||                                                                             |
||                                                                             |
||                                                                           v |
|+---------------------------------------------------------------------------+ |
|    Working...                                                                |
+------------------------------------------------------------------------------+


+ ===             Accounts for Users and Groups (gavnh300) (1)                 +
|File List View Options Actions                                           Help |
|                       | Add...                                |              |
|Template In Use: None  | User Templates ->                     |              |
|Filtering:  Displaying | Task Customization...                 |              |
|-----------------------| ===================================== |--------------|
|Users                  | Modify...                             |f 314 selected|
|-----------------------| Remove...                             |--------------|
|  Login      User ID   | Modify Secondary Group Membership...  |ce      Of    |
|  Name         (UID)   | Modify User's Password                |e       Lo    |
|+----------------------| Reset User's Password                 |------------+ |
|| ru1160        6243   | ------------------------------------- |LDEV     R. ^ |
|| sa2064        6975   | Deactivate...                         |RATOR    I  | |
|| sa2194        8172   | ------------------------------------- |LMAN     S  | |
|| sc3060        5318   | Modify Security Policies...           |MAN      KE | |
|| sc4634        8140   | Set Authorized Login Times...         |DONLY    JH | |
|| sc6228        7027   +---------------------------------------+LMAN     P.   |
|| se1223        8170   SEL                       sysman      SYSMAN      VA | |
|| sh0403        7735   SHARIF                    oper        OPERATOR    T. | |
|| si1608        7479   Sinha                     support     APPLMAN     R  | |
|| si1624        7391   Sinha                     support     APPLMAN     A  v |
| <------------------------------------------------------------------------->+ |
|                                                                              |
+------------------------------------------------------------------------------+



AIX 5L vmstat output issues:
============================

Suppose you see output like this:

[pl003][tdbaeduc][/dbms/tdbaeduc/educroca/admin/dump/bdump] vmstat -v
              1572864 memory pages
              1506463 lruable pages
                36494 free pages
                    7 memory pools
               336124 pinned pages
                 80.0 maxpin percentage
                 20.0 minperm percentage
                 80.0 maxperm percentage
                 43.4 numperm percentage
               654671 file pages
                  0.0 compressed percentage
                    0 compressed pages
                 45.8 numclient percentage
                 80.0 maxclient percentage
               690983 client pages
                    0 remote pageouts scheduled
                    0 pending disk I/Os blocked with no pbuf
         -->  8868259 paging space I/Os blocked with no psbuf
         -->     2740 filesystem I/Os blocked with no fsbuf
         -->    13175 client filesystem I/Os blocked with no fsbuf
         -->  319766 external pager filesystem I/Os blocked with no fsbuf


What is the meaning, and interpretation, of output lines like "pending disk I/Os blocked with no pbuf"?

Note 1:
-------

http://www.circle4.com/jaqui/eserver/eserver-AugSep06-AIXPerformance.pdf
..
..

The last five lines of the vmstat -v report are useful when you're looking for I/O problems. The first line is 
for disk I/Os that were blocked because there were no pbufs. Pbufs are pinned memory buffers used 
to hold I/O requests at the logical volume manager layer. Prior to AIX v5.3, this was a systemwide parameter. 
It's now tuneable on a volume-group basis using the lvmo command. The ioo parameter that controls the default 
number of pbufs to add when a disk is added to a volume group is pv_min_pbuf, and it defaults to 512. 
This specifies the minimum number of pbufs per PV that the LVM uses, and it's a global value that applies to all 
VGs on the system. If you see the pbuf blocked I/Os field above increasing over time, you may want to use the 
lvmo -a command to find out which volume groups are having problems with pbufs and then slowly increase 
pbufs for that volume group using the lvmo command. A reasonable value could be 1024.

Paging space I/Os blocked with no psbuf refers to the number of paging space I/O requests blocked 
because no psbuf was available. These are pinned memory buffers used to hold I/O requests at the 
virtual memory manager layer. If you see these increasing, then you need to either find out why the system 
is paging or increase the size of the page datasets. Filesystem I/Os blocked with no fsbufs refers to the 
number of filesystem I/O requests blocked because no fsbuf was available. Fsbufs are pinned memory buffers 
used to hold I/O requests in the filesystem layer. If this is constantly increasing, then it may be necessary 
to use ioo to increase numfsbufs so that more bufstructs are available. The default numfsbufs value
is determined by the system and seems to normally default to 196. I regularly increase this to either 1,024 or 2,048.

Client filesystem I/Os blocked with no fsbuf refers to the number of client filesystem I/O requests blocked 
because no fsbuf was available. Fsbufs are pinned memory buffers used to hold I/O requests in the 
filesystem layer. This includes NFS, VxFS (Veritas) and GPFS filesystems. Finally, ext pager 
filesystem I/Os blocked with no fsbuf refers to the number of external pager client filesystem I/O requests 
blocked because no fsbuf was available. JFS2 is an external pager client filesystem. If I see this growing, 
I typically set j2_nBufferPerPagerDevice=1024
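
The mapping described in Note 1 (which counter points at which buffer type) can be sketched in a few
lines of Python. This is only an illustration: the function name parse_blocked and the hint texts are
made up for this example, and the sample input is the vmstat -v excerpt quoted in Note 2 of this section.

```python
import re

# Phrase found in a "blocked" line of `vmstat -v` -> what the notes in this
# section suggest looking at. The specific phrases are listed before the
# generic "filesystem" so that the first match wins.
HINTS = [
    ("no pbuf", "LVM pbufs: lvmo per volume group (pv_min_pbuf via ioo)"),
    ("no psbuf", "paging: find out why the system pages"),
    ("client filesystem", "client filesystems such as NFS"),
    ("pager filesystem", "JFS2: j2_nBufferPerPagerDevice (ioo)"),
    ("filesystem", "JFS: numfsbufs (ioo)"),
]

LINE_RE = re.compile(r"^\s*(\d+)\s+(.*blocked with no \w+)\s*$")


def parse_blocked(text):
    """Return a list of (label, count, hint) for every 'blocked' line."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # not one of the "blocked with no ..." lines
        label = match.group(2)
        hint = next((h for key, h in HINTS if key in label), "unknown")
        rows.append((label, int(match.group(1)), hint))
    return rows


# Sample taken from the forum question in Note 2 of this section.
SAMPLE = """\
    2238141 pending disk I/Os blocked with no pbuf
  13963233 paging space I/Os blocked with no psbuf
  2740 filesystem I/Os blocked with no fsbuf
  1423313 client filesystem I/Os blocked with no fsbuf
 1128548 external pager filesystem I/Os blocked with no fsbuf
"""

for label, count, hint in parse_blocked(SAMPLE):
    print(f"{count:>10}  {label}  ->  {hint}")
```

On a live system the text would come from "vmstat -v | tail -5" instead of the SAMPLE string.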


Note 2:
-------

thread:

Q:

we have I/O issue on the AIX box for our Oracle DB
the disks having the Database files are always 100% busy
and the wa column in vmstat hits above 50
and the vmstat -v show the I/O's being blocked
    2238141 pending disk I/Os blocked with no pbuf
  13963233 paging space I/Os blocked with no psbuf
  2740 filesystem I/Os blocked with no fsbuf
  1423313 client filesystem I/Os blocked with no fsbuf
 1128548 external pager filesystem I/Os blocked with no fsbuf
  
What do these indicate: a shortage of real memory, or do some kernel parameters need to be adjusted?

A:

I'd up the number of fsbufs per filesystem.

What are your current settings?

ioo -L|egrep 'numfsbufs|j2_nBufferPerPagerDevice'

numfsbufs is for jfs filesystems
j2_nBuffer... is for jfs2 filesystems

if I'm not mistaken.

Note, if you change these values, you have to umount/mount the filesystems to take effect. 
I.e. you have to bring Oracle down.

HTH,

p5wizard
 
 
p5wizard, thanks. I don't have access to it; I will get the SA to get me the output.
Are these figures cumulative since the last reboot of the box?
What is a good setting for this?
 
dbinsight (TechnicalUser) 10 Mar 06 14:06  
ioo -L |egrep 'numfsbufs|j2_nBufferPerPagerDevice'       
numfsbufs                 512    196    512    1      2G-1                     M
j2_nBufferPerPagerDevice  512    512    512    0      2G-1

The above are our settings, Are these the default settings?

  
p5wizard (IS/IT--Management) 13 Mar 06 3:35  
answers to your questions:

yes, cumulative (so depends on how long the system's been running to interpret the values).

no, already been increased for jfs
yes, defaults for jfs2

1st value = current situation
2nd value = system default
3rd value = value for nextboot
 
# ioo -L|head -3; ioo -L|egrep 'numfs|j2_nBuff' 
NAME                      CUR    DEF    BOOT   MIN    MAX    UNIT           TYPE
     DEPENDENCIES
--------------------------------------------------------------------------------
j2_nBufferPerPagerDevice  512    512    512    0      256K                     M
numfsbufs                 400    196    400    1      2G-1                     M

But as I said before, doesn't help to increase 'em unless you unmount/mount the filesystems. 
As your SA has upped the 'NEXTBOOT' values, I guess (s)he knows about that. 

Run "topas 2" for a few iterations, and post that screenful please.
Also "vmo -L|egrep 'perm%|client%'" output please.

You have a very high value:
  13963233 paging space I/Os blocked with no psbuf
On my large DB servers this is close to zero.

Run "lsps -a" and post that output also please.

I googled for "aix psbufs" and found an Oracle AIX performance Technical Brief, here's an excerpt:

# vmstat -v | tail -5 (we only need last 5 lines)
0 pending disk I/Os blocked with no pbuf
  o for pbufs, increase pv_min_pbuf using ioo,
0 paging space I/Os blocked with no psbuf
  o for psbufs, stop paging or add more paging space,
8755 filesystem I/Os blocked with no fsbuf ?? JFS
  o for fsbufs, increase numfsbufs using ioo,
  o default is 196, recommended starting value is 568,
0 client filesystem I/Os blocked with no fsbuf (NFS/Veritas)
  o for client filesystem fsbufs, increase:
     nfso's nfs_v3_pdts and nfs_v3_vm_bufs
2365 external pager filesystem I/Os blocked with no fsbuf (JFS2)
  o for external pager fsbufs, increase:
     j2_nBufferPerPagerDevice, default is 512, recommended value is 2048,
     j2_dynamicBufferPreallocation using ioo.
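
The CUR / DEF / BOOT columns of "ioo -L" explained in the thread above can be read mechanically.
The following Python sketch is an illustration only: the helper names are hypothetical, and it assumes
the first four columns are NAME, CUR, DEF and BOOT with plain integer values, as in the output quoted above
(other tunables may show non-numeric values, which this sketch does not handle).

```python
def parse_ioo_line(line):
    """Split one data line of `ioo -L` into its first four columns."""
    fields = line.split()
    return {
        "name": fields[0],
        "cur": int(fields[1]),   # value in effect now
        "def": int(fields[2]),   # system default
        "boot": int(fields[3]),  # value that will be used at next boot
    }


def describe(entry):
    """Say whether a tunable was changed and whether a reboot will change it."""
    notes = []
    if entry["cur"] != entry["def"]:
        notes.append("changed from default")
    if entry["boot"] != entry["cur"]:
        notes.append("will change at next boot")
    return notes or ["at default"]


# The two lines posted by dbinsight in the thread above.
SAMPLE_LINES = [
    "numfsbufs                 512    196    512    1      2G-1                     M",
    "j2_nBufferPerPagerDevice  512    512    512    0      2G-1",
]

for raw in SAMPLE_LINES:
    entry = parse_ioo_line(raw)
    print(entry["name"], entry["cur"], ", ".join(describe(entry)))
```

Remember from the thread that a changed value only takes effect for a filesystem after it is unmounted and mounted again.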


Note 3:
-------

thread:

4.2) File System Buffers.  By default, the number of file system buffers is set to 196.  For high I/O systems,
this is typically too small.  To see if you are blocking I/O due to not having enough 
file system buffers, run: vmstat -v.  

For JFS file systems, look at the "filesystem I/Os blocked with no fsbuf" line.  
For JFS2 file systems, look at the "external pager filesystem I/Os blocked with no fsbuf" line.  

If these values are more than a couple thousand, you may need to increase the respective parameters.  
For JFS file systems, you will need to change the numfsbufs parameter.  For JFS2 file systems, 
change the  j2_nBufferPerPagerDevice parameter.  Changing this parameter does not require a reboot, 
but will only take effect when the file system is mounted, so you will have to unmount/mount the file system.

4.2) JFS Log Devices.  Heavily used filesystems should ALWAYS have their own JFS log on a 
separate physical disk.  All writes to a JFS (or JFS2) file system are written to the JFS log.  
By default, there is only one JFS log created for any volume group containing JFS file systems.  
This means that ALL writes to ALL the file systems in the volume group go to ONE PHYSICAL DISK!!  
(This is, unless, your underlying disk structure is striped or another form of RAID for performance.)  
Creating separate JFS logs on different physical disks is very important to getting the most out 
of the AIX I/O subsystem.
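
Because the vmstat -v counters are cumulative since boot (see Note 2), an absolute value says little by itself;
the growth between two samples is more telling. A minimal Python sketch of that comparison follows. The two
snapshots are made-up numbers for illustration, and the function names are hypothetical.

```python
import re

COUNTER_RE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def counters(text):
    """Turn 'NUMBER label' lines into a {label: number} dictionary."""
    result = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            result[match.group(2)] = int(match.group(1))
    return result


def growth(before, after):
    """Return how much each counter present in both snapshots has grown."""
    return {label: after[label] - before[label] for label in after if label in before}


# Hypothetical snapshots, e.g. taken an hour apart with `vmstat -v | tail -5`.
BEFORE = """\
  2740 filesystem I/Os blocked with no fsbuf
  1128548 external pager filesystem I/Os blocked with no fsbuf
"""
AFTER = """\
  2740 filesystem I/Os blocked with no fsbuf
  1130048 external pager filesystem I/Os blocked with no fsbuf
"""

for label, delta in growth(counters(BEFORE), counters(AFTER)).items():
    print(f"{delta:>8}  {label}")
```

A counter that is large but no longer growing points at a past problem; one that keeps growing is the one to tune.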
 
 


/usr/ccs/bin/shlap64:
=====================

The /usr/ccs/bin/shlap64 process is the Shared Library Support Daemon.

The muxatmd, snmpmibd and aixmibd are Simple Network Management
Protocol (SNMP) daemons for AIX. All can be turned off by commenting
out the entries that start them in /etc/rc.tcpip. shlap64 is part of
the 64-bit environment and needs to be running if you are using a
64-bit kernel.

The IBM.* programs are part of the Reliable Scalable Cluster
Technology which IBM added to AIX v5 from their SP cluster systems.
These programs provide additional system monitoring (and alerting if
configured to do so). You should probably leave them running.



ntpdate:
========

The ntpdate command sets the local date and time by polling the NTP servers specified to determine the correct time. 
It obtains a number of samples from each server specified and applies the standard NTP clock filter and 
selection algorithms to select the best of the samples.

To set the local date and time by polling the NTP servers at address 9.3.149.107, enter:

/usr/sbin/ntpdate 9.3.149.107

Output similar to the following appears:

28 Feb 12:09:13 ntpdate [18450]: step time server 9.3.149.107
offset 38.417792 sec
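
If the ntpdate output has to be checked from a script (for example to warn on a large offset), the line shown
above can be parsed as in this Python sketch. The function name parse_ntpdate is hypothetical; the pattern
assumes the "step time server ... offset ... sec" or "adjust time server ... offset ... sec" wording.

```python
import re

NTPDATE_RE = re.compile(
    r"(?P<action>step|adjust) time server (?P<server>\S+) "
    r"offset (?P<offset>-?\d+(?:\.\d+)?) sec"
)


def parse_ntpdate(output):
    """Return (action, server, offset_seconds), or None if nothing matches."""
    # The message may be wrapped over two lines, so collapse all whitespace.
    text = " ".join(output.split())
    match = NTPDATE_RE.search(text)
    if match is None:
        return None
    return match.group("action"), match.group("server"), float(match.group("offset"))


SAMPLE = """28 Feb 12:09:13 ntpdate [18450]: step time server 9.3.149.107
offset 38.417792 sec"""

print(parse_ntpdate(SAMPLE))
```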



/etc/ncs/glbd:
==============

glbd Daemon

Purpose
Manages the global location broker database.

Syntax
/etc/ncs/glbd [ -create { -first [-family FamilyName] | -from HostName } ] [  -change_family FamilyName ] 
[ -listen FamilyList] [ -version ]

Description
The glbd daemon manages the global location broker (GLB) database. The GLB database, part of the 
Network Computing System (NCS), helps clients locate servers on a network or internet. 
The GLB database stores the locations (specifically, the network addresses and port numbers) of servers 
on which processes are running. The glbd daemon maintains this database and provides access to it.

There are two versions of the GLB daemon, glbd and nrglbd.


RBAC:
=====

Role Based Access Control

AIX has provided a limited RBAC implementation since AIX 4.2.1.

Most environments require that different users manage different system administration duties. 
It is necessary to maintain separation of these duties so that no single system management user 
can accidentally or maliciously bypass system security. While traditional UNIX system administration 
cannot achieve these goals, Role Based Access Control can.

RBAC allows the creation of roles for system administration and the delegation of administrative tasks 
across a set of trusted system users. In AIX, RBAC provides a mechanism through which the administrative 
functions typically reserved for the root user can be assigned to regular system users.

Beginning with AIX 6.1, a new implementation of RBAC provides for a very fine granular mechanism 
to segment system administration tasks. Since these two RBAC implementations differ greatly in functionality, 
the following terms are used:

-Legacy RBAC Mode 
 The historic behavior of AIX roles that was introduced in AIX 4.2.1 
-Enhanced RBAC Mode 
 The new implementation introduced with AIX 6.1 

Both modes of operation are supported. However, Enhanced RBAC Mode is the default on a newly installed AIX 6.1 system. 



llbd:
=====

llbd Daemon


Purpose
Manages the information in the local location broker database. 


Syntax
llbd [-family FamilyName] [ -version] 


Description
The llbd daemon is part of the Network Computing System (NCS). It manages the local location broker (LLB) database, 
which stores information about NCS-based server programs running on the local host. 

A host must run the llbd daemon to support the location broker forwarding function or to allow remote access 
(for example, by the lb_admin tool) to the LLB database. In general, any host that runs an NCS-based server 
program should run an llbd daemon, and llbd should be running before any such servers are started. 
Additionally, any network or internet supporting NCS activity should have at least one host running a 
global location broker daemon (glbd). 

The llbd daemon is started in one of two ways: 

Through the System Resource Controller (the recommended method), by entering on the command line: 

startsrc -s llbd

By a person with root user authority entering on the command line: 

/etc/ncs/llbd &

TCP/IP must be configured and running on your system before you start the llbd daemon. 
(You should start the llbd daemon before starting the glbd or nrglbd daemon.) 



tripwire:
=========

Tripwire data integrity assurance software monitors the reliability of critical system files and directories 
by identifying changes made to them. Tripwire configuration options include the ability to receive alerts 
via email if particular files are altered and automated integrity checking via a cron job. Using Tripwire for 
intrusion detection and damage assessment helps you keep track of system changes. Because Tripwire can 
positively identify files that have been added, modified, or deleted, it can speed recovery from a break-in 
by keeping the number of files which must be restored to a minimum. 

Tripwire compares files and directories against a database of file locations, dates modified, and other data. 
The database contains baselines, which are snapshots of specified files and directories at a specific 
point in time. The contents of the baseline database should be generated before the system is at risk 
of intrusion. After creating the baseline database, Tripwire then compares the current system to the baseline 
and reports any modifications, additions, or deletions. 
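
The baseline-and-compare idea can be illustrated with a short Python sketch. This is NOT how Tripwire itself is
implemented; it only shows the principle of recording a snapshot and later reporting added, modified and deleted
files. The function names and the use of SHA-256 are assumptions made for this example.

```python
import hashlib
import os
import tempfile


def snapshot(root):
    """Return {relative_path: sha256 hex digest} for every file under root."""
    result = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            result[os.path.relpath(path, root)] = digest
    return result


def compare(baseline, current):
    """Return sorted lists of added, modified and deleted paths."""
    added = sorted(set(current) - set(baseline))
    deleted = sorted(set(baseline) - set(current))
    modified = sorted(
        path for path in baseline
        if path in current and baseline[path] != current[path]
    )
    return added, modified, deleted


# Demonstration in a throw-away directory.
with tempfile.TemporaryDirectory() as root:
    for name, text in (("passwd", "original"), ("hosts", "original")):
        with open(os.path.join(root, name), "w") as handle:
            handle.write(text)
    baseline = snapshot(root)          # taken while the system is trusted

    with open(os.path.join(root, "passwd"), "w") as handle:
        handle.write("tampered")       # modification
    os.remove(os.path.join(root, "hosts"))  # deletion
    with open(os.path.join(root, "newfile"), "w") as handle:
        handle.write("dropped in")     # addition

    added, modified, deleted = compare(baseline, snapshot(root))

print("added:", added, "modified:", modified, "deleted:", deleted)
```

In practice the baseline must be stored where an intruder cannot rewrite it, which is why it should be generated before the system is at risk.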

While Tripwire is a valuable tool for auditing the security state of Red Hat Linux systems, Tripwire is not 
supported by Red Hat, Inc. Refer to the Tripwire project's website (http://www.tripwire.org/) for more 
information about Tripwire. 


SA-Agent uctsp0:
================

CONTROL-SA is a client/server solution that enables you to manage security systems distributed across multiple
platforms. CONTROL-SA synchronizes accounts and passwords across those systems.
 
On AIX, you can find the binaries and files in /usr/lpp/uctsp0.
On HP, you can find the binaries and files in /usr/local/uctsp0.

To stop the agent:
# su - uctsp0 -c stop-ctsa

To start the agent:
# su - uctsp0 -c start-ctsa


EMC Documentum:
===============

General:
--------

The following components are associated with the Content Server:

- A database containing relationships that relate to stored content (on filesystems).
  This database thus contains metadata (and not content).

- file storage of the actual content being managed.
  The file store and database constitute the Documentum Repository abstraction, called Docbase.

- A set of key processes that implement the Documentum content management solution
  such as the Document Broker.

- A set of housekeeping utilities, including a Web-based Admin tool.


Client Connect:
---------------

Access to Documentum Docbases is controlled through the Documentum 
client file dmcl.ini.

You need to understand the architecture of Content Server and
docbroker and how DMCL connects. Content Server and docbroker are 2
separate processes which are started (usually but not necessarily) on
the same machine. Since they are separate processes they listen on
different ports. Docbroker usually listens on a well-known port 1489.
Content Server will listen on the port you configure in the services
file. When you issue a DMCL connect (which is what DAB does - it is a
documentum client using DMCL) the DMCL first locates a docbroker. It
asks the docbroker for a list of docbases it knows about. The
docbroker returns the names of the docbases and provides details of
the Content Server(s) that service the docbase. This includes the host
and port details. The DMCL then issues requests directly to the
Content Server bypassing the docbroker. Thus you need both the
docbroker and the Content Server port open.
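
Since a client needs to reach both the docbroker port and the Content Server port, a quick reachability check
from the client machine can save time when a connect fails. The sketch below uses only the Python standard library.
The function names are hypothetical, and any host name or Content Server port you pass in is site-specific;
only the docbroker port 1489 comes from the text above.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_endpoints(endpoints):
    """endpoints: {label: (host, port)} -> {label: True/False}."""
    return {label: port_open(host, port) for label, (host, port) in endpoints.items()}
```

Usage would look like check_endpoints({"docbroker": ("dctmhost", 1489), "content server": ("dctmhost", 10000)}),
where "dctmhost" and 10000 are placeholders for your own host and the port configured in the services file.
A successful TCP connect only proves that the port is reachable, not that the Documentum process behind it is healthy.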




BPS http Listener (Installation and Configuration)
--------------------------------------------------
What is BPS 

Business Process Services (BPS) provides a gateway through which non-Documentum users can access a Docbase. 
It allows HTTP, SMTP or JMS messages to be stored directly in the Docbase. When an HTTP, SMTP or JMS message 
is sent to the BPS http listener Servlet URL, email address or JMS queue, the listener intercepts the message 
and converts it to a protocol-neutral format. The message is then passed to the BPS message handler. 
The message handler opens a connection to the Docbase and stores the message as a virtual document. 
Attachments are stored as child nodes of the virtual document. 

How http Listener Works 

The http message listener is implemented as a Servlet. It gets installed as a part of BPS web application. 
The URL to access the http listener Servlet is 

http://<servername>:<portnumber>/bps/http/http 

or 

https://<servername>:<portnumber>/bps/http/http 

As the URLs show, the http listener can use both the http and the https protocol. Keep in mind, however, 
that the application server uses separate ports for http and https. 
If you use the http port number (say 8080) to construct the https URL, it will not work. 
This is a common error when configuring the BPS http listener. The following sections 
step through the installation, configuration and testing of the BPS http listener. 


Configuring the BPS Handlers 

The BPS handler configuration is kept in the default.xml file. It is located at drive:\Documentum\config\bps 
1) Navigate to the default.xml file and open it for editing with an ASCII editor such as Notepad. 
2) Configure the <connections> element of this file. In the <connections> element we have to provide 
   the connection details for the Docbase, such as Docbase name, user name and password. A sample <connections> 
   element entry should look like this: 

<connections> 
 <docbase-connection name="connection"> 
  <docbase-name>zebra</docbase-name> 
  <user-name>dmadmin</user-name> 
  <password>mypassword</password> 
 </docbase-connection> 
</connections> 

3) Configuring <handlers> element. 

This element specifies the message handlers available to BPS message listeners. Many out-of-the-box handlers 
are provided with BPS, but all of them are disabled by being enclosed in XML comment tags (<!-- ... -->). 

Either enable some of them or point to your own custom handler class, like this: 

<handlers> 
.. 
.. 
<handler name="LinkToFolderExample"> 
<service-name> 
com.documentum.bps.handlers.LinkToFolderService 
</service-name> 
<params> 
<param name="folderName" value="/bpsinbound/"/> 
</params> 
</handler> 
</handlers> 

4) Configuring <listeners> element 

Listeners element turns on the SSL capabilities of Local and remote listeners. Set the <allow-non-ssl> 
flag true or false as per your requirements. For our http listener test, we would use non-ssl connection, 
so make sure the value for the element is "true". 

 
5) Save the default.xml. 

A complete default.xml file for our test setup will look something like this. Replace the Docbase, user, 
password and connection names with your own. You can have multiple connections defined for 
multiple docbases. 

<?xml version="1.0" encoding="UTF-8"?> 

<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 

<processors> 
<local name="default"/> 
</processors> 
<connections> 
<docbase-connection name="myconnection1"> 
<docbase-name>mydocbase1</docbase-name> 
<user-name>myname1</user-name> 
<password>mypassword1</password> 
</docbase-connection> 
<docbase-connection name="myconnection2"> 
<docbase-name>mydocbase2</docbase-name> 
<user-name>myname2</user-name> 
<password>mypassword2</password> 
</docbase-connection> 
</connections> 
<handlers> 
<handler name="ErrorHandlerService"> 
<service-name>com.documentum.bps.handlers.ErrorHandlerService</service-name> 
</handler> 
<handler name="Redirector"> 
<service-name>com.documentum.bps.handlers.SubjectMessageFilter</service-name> 
</handler> 
<handler name="LinkToFolderExample"> 
<service-name>com.documentum.bps.handlers.LinkToFolderService</service-name> 
<params> 
<param name="folderName" value="/bpsinbound"/> 
</params> 
</handler> 
</handlers> 
<listeners> 
<http-listener> 
<local-listener> 
<allow-non-ssl>true</allow-non-ssl> 
</local-listener> 
<remote-listener> 
<allow-non-ssl>true</allow-non-ssl> 
</remote-listener> 
</http-listener> 
</listeners> 
</config> 
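
Because a single mistyped tag or quote makes default.xml unusable, it can help to check the file for
well-formedness before restarting BPS. The Python sketch below is an illustration using the standard library
XML parser, not a BPS tool; the function names are hypothetical and the embedded sample is a shortened
version of the file shown above.

```python
import xml.etree.ElementTree as ET

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<config>
 <connections>
  <docbase-connection name="myconnection1">
   <docbase-name>mydocbase1</docbase-name>
   <user-name>myname1</user-name>
   <password>mypassword1</password>
  </docbase-connection>
 </connections>
 <handlers>
  <handler name="LinkToFolderExample">
   <service-name>com.documentum.bps.handlers.LinkToFolderService</service-name>
  </handler>
 </handlers>
 <listeners>
  <http-listener>
   <local-listener><allow-non-ssl>true</allow-non-ssl></local-listener>
   <remote-listener><allow-non-ssl>true</allow-non-ssl></remote-listener>
  </http-listener>
 </listeners>
</config>
"""

# A closing tag that does not match its opening tag.
BROKEN = b"<connections><docbase-name>zebra</docbasename></connections>"


def is_well_formed(xml_bytes):
    """Return True if the XML parses, False on a syntax error."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


def summarize(xml_bytes):
    """List connection names, handler names and the allow-non-ssl flags."""
    root = ET.fromstring(xml_bytes)
    return {
        "connections": [c.get("name") for c in root.findall("connections/docbase-connection")],
        "handlers": [h.get("name") for h in root.findall("handlers/handler")],
        "allow_non_ssl": {
            listener.tag: listener.findtext("allow-non-ssl")
            for listener in root.findall("listeners/http-listener/*")
        },
    }


print(is_well_formed(BROKEN))
print(summarize(SAMPLE))
```

To check a real file, read it in binary mode and pass the bytes to the same functions.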

Creating a Test html Page 

We need a test html page to test the http listener. Create an html page from the code provided below. 
This simple page submits a form to the http listener, passing the BPS http listener parameters as form fields. 

<HTML> 
<h1>BPS http listener and LinkToFolder handler test</h1> 
<form method="post" enctype="multipart/form-data" ACTION="http://localhost:8080/bps/http/http"> 
<input type="hidden" name="DctmBpsHandler" value="LinkToFolderExample"> 
<input type="hidden" name="DctmBpsId" value="4b08ac1980001d29"> 
Connection name: <input type="text" name="DctmBpsConnection" size="20"><br/> 
File1 to upload: <input type="file" name="file to upload1" id="file1" size="20"> 
<br/> 
File2 to upload: <input type="file" name="file to upload2" id="file2" size="20"> 
<br/> 
<br/> 
<input type="submit" value="Submit"> 
</form> 
</HTML> 

Create a file called test.html out of this code and save it in the bps root folder. 

Testing the Application 

Start the application server where BPS is deployed and then invoke the html page by typing the following URL 
in your browser 

http://<servername>:<portnumber>/bps/test.html 


A page should appear in your browser. If not, check that your application server is running and 
that it has been installed properly. 

Fill in the connection name (for example myconnection1), select a file to upload and then hit submit. 
This submits the html form to the BPS http listener, which passes the message 
to the LinkToFolder message handler, and the file is stored in the bpsinbound folder. Once the message handler 
has succeeded, a success page is presented.

Locating the Saved Message in the Docbase 

We have configured the LinkToFolder handler to save the message to the bpsinbound folder. If you browse 
to the bpsinbound folder, you will find that a new virtual document has been created by the LinkToFolder handler. 


Expanding the root virtual document will show the attached file. 

Summary- BPS http Installation and Configuration 

The BPS http listener can be installed by selecting the proper option in the BPS installer. To run the 
http listener, you need an application server such as Tomcat. The listener is implemented as a Servlet. 
Before using the listener and the message handlers, the BPS default.xml file needs to be configured. 
Follow the instructions provided in this white paper to configure the default.xml file. Once it is configured, 
the http listener is ready for testing. Use the test.html file provided in this white paper to test 
the http listener.


Example start and stop of Documentum processes:
-----------------------------------------------

TAKE NOTICE:

First of all, on any Documentum server, find the account of the software owner.
Since there are several accounts, depending on the site, you must check this
before starting or stopping a service.
You can always check for the correct owner by looking at the owner of the
"/appl/emcdctm" directory.

Example: on ZD111L13 you check

root@zd111l13:/appl#ls -al
total 16
drwxr-xr-x   4 root     staff           256 Jul 13 15:43 .
drwxr-xr-x  24 root     system         4096 Aug 21 15:09 ..
drwxr-xr-x  13 emcdmeu  emcdgeu        4096 Aug  9 15:04 emcdctm
drwxr-xr-x   3 root     staff           256 Jun 29 15:35 oracle

Now switch user to the owner. In the example it would be "su - emcdmeu"

If you logon as the software owner (e.g."su - emcdmeu"), you have several environment variables
available, like $DOCUMENTUM which points to "/appl/emcdctm".
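
The owner lookup that "ls -al" shows above can also be done from a script, for example before deciding which
account to "su" to. This Python sketch uses only the standard library (Unix only); the function names are
hypothetical, and /appl/emcdctm is the path used on the servers described here.

```python
import grp
import os
import pwd


def user_name(uid):
    """Translate a numeric uid to a name, falling back to the number."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def group_name(gid):
    """Translate a numeric gid to a name, falling back to the number."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def owner_of(path):
    """Return (owner, group) of a file or directory."""
    info = os.stat(path)
    return user_name(info.st_uid), group_name(info.st_gid)


# On a Documentum server you would call owner_of("/appl/emcdctm");
# "/" is used here only so that the example runs anywhere.
print(owner_of("/"))
```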


1. Docbroker processes:
-----------------------

Start
$DOCUMENTUM/dba/dm_launch_Docbroker

Stop
$DOCUMENTUM/dba/dm_stop_Docbroker

the startup calls 
./dmdocbroker -port 1489 $@  >> $tlogfile 2>&1 & 
(/product/5.3/bin/dmdocbroker)

Logs:
tail -f $DOCUMENTUM/dba/log/docbroker.<host name>.1489.log
* for example
tail -f $DOCUMENTUM/dba/log/docbroker.ZD110L12.nl.eu.abnamro.com.1489.log


2. Content Server:
------------------ 

Content servers have Docbrokers and a "Java Method Server"
There is also a service for each repository that has been installed


Start
$DOCUMENTUM/dba/dm_launch_Docbroker
$DOCUMENTUM/dba/dm_start_dmwpreu1
$DM_HOME/tomcat/bin/startup.sh

$DM_HOME/tomcat/bin/shutdown.sh
$DOCUMENTUM/dba/dm_shutdown_dmwpreu1
$DOCUMENTUM/dba/dm_stop_Docbroker

Stop
$DM_HOME/tomcat/bin/shutdown.sh
$DOCUMENTUM/dba/dm_shutdown_dmw_eu
$DOCUMENTUM/dba/dm_stop_Docbroker

Or if there are 2 filestores, like in ETNL:

Start
$DOCUMENTUM/dba/dm_launch_Docbroker
$DOCUMENTUM/dba/dm_start_dmw_et
$DOCUMENTUM/dba/dm_start_dmw_et3
$DM_HOME/tomcat/bin/startup.sh

Stop
$DM_HOME/tomcat/bin/shutdown.sh
$DOCUMENTUM/dba/dm_shutdown_dmw_et
$DOCUMENTUM/dba/dm_shutdown_dmw_et3
$DOCUMENTUM/dba/dm_stop_Docbroker

Logs
*Repository
tail -f $DOCUMENTUM/dba/log/dmw_et.log
*JMS
tail -f $DM_HOME/tomcat/logs/catalina.out


Or:

1) kill all processes that are being run by emcdm user. 
2) Run the following commands as user emcdm:

$DOCUMENTUM/dba/dm_launch_Docbroker
$DOCUMENTUM/dba/dm_start_dmw_et
$DM_HOME/tomcat/bin/startup.sh
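
The start and stop sequences listed in this section all follow one pattern: start the Docbroker, then each
repository, then Tomcat (the Java Method Server); stop Tomcat first, then each repository, then the Docbroker.
The Python sketch below only builds the command lists from a list of repository names so the order is explicit.
The function names are hypothetical, and the script names follow the dm_start_<repository> and
dm_shutdown_<repository> pattern seen above.

```python
def start_commands(repositories):
    """Docbroker first, then every repository, then the Java Method Server."""
    return (
        ["$DOCUMENTUM/dba/dm_launch_Docbroker"]
        + [f"$DOCUMENTUM/dba/dm_start_{name}" for name in repositories]
        + ["$DM_HOME/tomcat/bin/startup.sh"]
    )


def stop_commands(repositories):
    """Java Method Server first, then every repository, then the Docbroker."""
    return (
        ["$DM_HOME/tomcat/bin/shutdown.sh"]
        + [f"$DOCUMENTUM/dba/dm_shutdown_{name}" for name in repositories]
        + ["$DOCUMENTUM/dba/dm_stop_Docbroker"]
    )


# The ETNL example from this section, with two repositories.
for command in start_commands(["dmw_et", "dmw_et3"]):
    print(command)
for command in stop_commands(["dmw_et", "dmw_et3"]):
    print(command)
```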



3. BPS:
-------


Start
#	As user {NL} emcdm, or {EU} wasemceu
cd $DOCUMENTUM/dfc/bps/inbound/bin
./start_jms_listener.sh

Better is:

nohup ./start_jms_listener.sh &

Stop
#	As user {NL} emcdm, or {EU} wasemceu
ps -ef | grep bps
kill -9 <process id>
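
Note that "ps -ef | grep bps" also lists the grep command itself, so be careful which process id you pass to kill.
The Python sketch below shows one way to pick out the right process ids from "ps -ef" output. The sample
process list is made up for illustration (the java command line in particular is hypothetical), and the
function name find_pids is not part of any BPS tooling.

```python
def find_pids(ps_output, pattern):
    """Return the PIDs of lines containing pattern, ignoring grep itself."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # The PID is the second column; this also skips the header line.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        if pattern in line and "grep" not in line:
            pids.append(int(fields[1]))
    return pids


# Hypothetical `ps -ef` output.
SAMPLE = """\
     UID     PID    PPID   C    STIME    TTY  TIME CMD
   emcdm  123456       1   0   Aug 06      -  2:17 java -Dname=bps_jms_listener
   emcdm  234567  200000   0 10:15:02  pts/1  0:00 grep bps
    root  266378       1   0   Aug 06      - 272:17 /usr/bin/xmwlm -L
"""

print(find_pids(SAMPLE, "bps"))
```

The "grep" filter is deliberately crude: it would also skip a real process that happens to have "grep" in its command line.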


4. Index Server:
----------------

 Indexer - server IX
 Index servers have 3 services: Docbroker, Index Server, 
 and Index Agent {per repository}

Start
$DOCUMENTUM/dba/dm_launch_Docbroker
$DOCUMENTUM/fulltext/IndexServer/bin/startup.sh
$DOCUMENTUM_SHARED/IndexAgents/IndexAgent1/startupIndexAgent.sh

Stop
$DOCUMENTUM_SHARED/IndexAgents/IndexAgent1/shutdownIndexAgent.sh
$DOCUMENTUM/fulltext/IndexServer/bin/shutdown.sh
$DOCUMENTUM/dba/dm_stop_Docbroker
 
Logs
tail -f $DOCUMENTUM/dfc/logs/IndexAgent1.log


5. Websphere:
-------------

example 1: Syntax if rc.appserver exists:

su - wasemceu

/etc/rc.appserver start ETM1DAE
/etc/rc.appserver start ETM1DEU
/etc/rc.appserver stop ETM1DEU

/etc/rc.appserver stop ETM1DAN
/etc/rc.appserver stop ETM1DNL

/etc/rc.appserver start ETM1DAN
/etc/rc.appserver start ETM1DNL

example 2: Syntax if just using websphere scripts:

START:

/appl/was51/bin/startNode.sh
/appl/was51/bin/startServer.sh server1
/appl/was51/bin/startServer.sh STM1DNL
/appl/was51/bin/startServer.sh STM1DAN

tail -f /beheer/log/was51/server1/SystemOut.log
tail -f /beheer/log/was51/STM1DNL/SystemOut.log
tail -f /beheer/log/was51/STM1DNL/SystemErr.log
tail -f /beheer/log/was51/STM1DAN/SystemOut.log

STOP:

/appl/was51/bin/stopServer.sh STM1DAN
/appl/was51/bin/stopServer.sh STM1DNL

/appl/was51/bin/stopServer.sh server1
/appl/was51/bin/stopNode.sh



Backup options EMC Documentum:
------------------------------

1 cold backup:
--------------

If you want to backup a docbase the Documentum recommended way is:

1) Stop the Content Server 
2) Stop the database
3) Backup the database using standard database or OS tools as appropriate for your database
4) Backup the Content Store(s) using OS tools.

This is referred to as a full, cold backup. There are options for hot and/or incremental backups, but these are 
more complicated (and possibly expensive). The full, cold backup is the simplest option available.


2. Online Hot backup:
---------------------

2.1 CYA hot backup software 


2.2 Platform Dynamics:  Recovery management for EMC Documentum


2.3 EMC NetWorker Module for Documentum 




Catalina:
=========
 
Catalina is the Java Servlet container that is part of the Apache Tomcat server. It lets Java Servlets handle HTTP requests. 
Catalina has been the name of Tomcat's servlet container (Java package org.apache.catalina) since version 4.0,
when the servlet container was redesigned as Catalina.



XMWLM:
======

Note 1:
-------

xmwlm Command

Purpose

       Provides recording of system performance or WLM metrics.

Syntax

       xmwlm [ -d recording_dir ] [ -n recording_name ] [ -t trace_level ] [ -L ]

Description

The xmwlm agent provides recording capability for a limited set of local system performance metrics. These include
common CPU, memory, network, disk, and partition metrics typically displayed by the topas command. Daily recordings
are stored in the /etc/perf/daily directory. The topasout command is used to output these recordings in raw ASCII or
spreadsheet format. The xmwlm agent can also be used to provide recording data from Workload Management (WLM). This is
the default format used when xmwlm is run without any flags. Daily recordings are stored in the /etc/perf/wlm
directory. The wlmmon command can be used to process WLM-related recordings. The xmwlm agent can be started from the
command line, from a user script, or can be placed near the end of the /etc/inittab file. All recordings cover
24-hour periods and are only retained for two days.

#ps -ef | grep -i xmwlm
root 266378      1   0   Aug 06      - 272:17 /usr/bin/xmwlm -L



Note 2:
-------


IY78009: XMWLM HIGH RESOURCE CONSUMPTION, TOPASOUT COUNTERS 

 A fix is available 


APAR status
Closed as program error.

Error description 
xmwlm daemon may consume well over 1% of CPU resources
some disk counter values may be inaccurate in topasout output
Local fix 
Problem summary 
xmwlm daemon may consume well over 1% of CPU resources
some disk counter values may be inaccurate in topasout output
Problem conclusion 
Reduce SPMI instrumentations internal polling frequency for
filesystem metrics.  Update topasout for certain counter data
types.
Temporary fix 
Comments 
APAR information 
APAR number IY78009 
Reported component name AIX 5.3 
Reported component ID 5765G0300 
Reported release 530 
Status CLOSED PER 
PE NoPE 
HIPER NoHIPER 
Submitted date 2005-10-21 
Closed date 2005-10-21 
Last modified date 2005-11-17 

APAR is sysrouted FROM one or more of the following:

APAR is sysrouted TO one or more of the following:

Publications Referenced


Fix information 
Fixed component name AIX 5.3 
Fixed component ID 5765G0300 
 
 
Note 3:
-------

IY95912: XMWLM LOOPING IN SIGNAL HANDLERS INFINITELY 


 A specific fix for this item is not yet available electronically 
This record will be updated with a link to the fix if the APAR is new.
For APARs older than 365 days, contact your support center.
 


APAR status
Closed as program error.

Error description 
High cpu consumption by xmwlm
Local fix 
Problem summary 
High cpu consumption by xmwlm
Problem conclusion 
Stop xmwlm from looping infinitely in the signal handler and
prevent xmwlm from crashing when it has to record more than
4096 metrics by recording at most 4096 metrics.
Temporary fix 
Comments 
APAR information 
APAR number IY95912 
Reported component name AIX 5.3 
Reported component ID 5765G0300 
Reported release 530 
Status CLOSED PER 
PE NoPE 
HIPER NoHIPER 
Submitted date 2007-03-11 
Closed date 2007-03-11 
Last modified date 2007-03-15 

APAR is sysrouted FROM one or more of the following:

APAR is sysrouted TO one or more of the following:
IY96091
 
 

Second superuser:
=================


For safety reasons, you might want to have a second root user on your system.


Note 1:
-------

-- Creating a second root user

Follow these steps to create a second root user: 

Create a user. 
Manually edit the user ID field and group ID field in the /etc/passwd file. 
Change the user ID to ID 0. 
For a typical user ID, for example, change the entry from: 
   russ:!:206:1::/u/russ:/bin/ksh 

to 
   russ:!:0:0::/u/russ:/bin/ksh 

This creates a user (in this case, russ) with identical permissions to root. 
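
Since any account with user ID 0 has root permissions, it is worth being able to list them all. The Python sketch
below scans text in /etc/passwd format for entries whose third field is 0. The function name is hypothetical and
the sample entries are illustrative (the russ entry is the one from the example above).

```python
def uid0_accounts(passwd_text):
    """Return the names of all accounts whose user ID field is 0."""
    names = []
    for line in passwd_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # name:password:uid:gid:gecos:home:shell
        if len(fields) >= 7 and fields[2] == "0":
            names.append(fields[0])
    return names


SAMPLE = """\
root:!:0:0::/:/bin/ksh
daemon:!:1:1::/etc:
russ:!:0:0::/u/russ:/bin/ksh
guest:!:100:100::/home/guest:
"""

print(uid0_accounts(SAMPLE))
```

To audit a live system, read /etc/passwd and pass its contents to the same function.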



-- Creating special users with root authority

Special users that have root authority but can only execute one command may also be created. For instance, 
to create a user that can only reboot the system, create a regular user called shutdown and modify the /etc/passwd 
file to change the user and group ID to 0. For example, in AIX 3.2: 

   shutdown:!:0:0::/u/shutdown:/bin/ksh 

Change the initial program from /bin/ksh to /etc/shutdown -Fr: 

   shutdown:!:0:0::/u/shutdown:/etc/shutdown -Fr 

For AIX 4, the /etc/passwd entry for the user called shutdown should be: 

   shutdown:!:0:0::/u/shutdown:/usr/sbin/shutdown -Fr 

The shutdown command on AIX Version 4.1 is located in /usr/sbin. 
Now when user shutdown logs in, the system will shut down and reboot. 


Base AIX error codes:
=====================


Appendix A. Base Operating System Error Codes for Services That Require Path-Name Resolution
The following errors apply to any service that requires path name resolution:

EACCES	 Search permission is denied on a component of the path prefix. 
EFAULT	 The Path parameter points outside of the allocated address space of the process. 
EIO	 An I/O error occurred during the operation. 
ELOOP	 Too many symbolic links were encountered in translating the Path parameter. 
ENAMETOOLONG A component of a path name exceeded 255 characters and the process has the DisallowTruncation attribute (see the ulimit subroutine) or an entire path name exceeded 1023 characters. 
ENOENT	 A component of the path prefix does not exist. 
ENOENT	 A symbolic link was named, but the file to which it refers does not exist. 
ENOENT	 The path name is null. 
ENOTDIR	 A component of the path prefix is not a directory. 
ESTALE	 The root or current directory of the process is located in a virtual file system that is unmounted. 
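
Two of these path-name resolution errors are easy to reproduce from a script, which helps when interpreting them
in application logs. The Python sketch below is an illustration only: the function name resolve_error is
hypothetical, and it demonstrates just ENOENT and ENOTDIR in a temporary directory.

```python
import errno
import os
import tempfile


def resolve_error(path):
    """Return the symbolic errno name raised by stat(path), or None."""
    try:
        os.stat(path)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


with tempfile.TemporaryDirectory() as base:
    plain_file = os.path.join(base, "plain")
    open(plain_file, "w").close()

    # A component of the path prefix does not exist.
    missing = resolve_error(os.path.join(base, "no_such_dir", "file"))
    # A component of the path prefix is a regular file, not a directory.
    not_dir = resolve_error(os.path.join(plain_file, "child"))
    # An existing path resolves without error.
    existing = resolve_error(plain_file)

print(missing, not_dir, existing)
```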


clsprod@starboss:/usr/include $ cat errlog.h
/* IBM_PROLOG_BEGIN_TAG                                                   */
/* This is an automatically generated prolog.                             */
/*                                                                        */
/* bos53D src/bos/usr/ccs/lib/liberrlog/errlog.h 1.7                      */
/*                                                                        */
/* Licensed Materials - Property of IBM                                   */
/*                                                                        */
/* Restricted Materials of IBM                                            */
/*                                                                        */
/* (C) COPYRIGHT International Business Machines Corp. 2000,2005          */
/* All Rights Reserved                                                    */
/*                                                                        */
/* US Government Users Restricted Rights - Use, duplication or            */
/* disclosure restricted by GSA ADP Schedule Contract with IBM Corp.      */
/*                                                                        */
/* IBM_PROLOG_END_TAG                                                     */
#ifndef H_errlog
#define H_errlog
/* @(#)74        1.7  src/bos/usr/ccs/lib/liberrlog/errlog.h, cmderrlg, bos53D, d2005_09B1 2/24/05 15:34:58 */

/*
 * COMPONENT_NAME: CMDERRLG   system error logging and reporting facility
 *
 * External definitions and declarations for liberrlog.a
 *
 */


#include <sys/types.h>
#include <sys/err_rec.h>

typedef void *errlog_handle_t;

/*
 *  These magic numbers will indicate which version of errlog
 *  entry is being returned.
 *  All users of errlog_entry_t should use only LE_MAGIC.
 */
#define LE_MAGIC_41 0x0C3DF420
/* LE_MAGIC434_INTERUM is an interum 43T magic, before le_errdiag was added. */
#define LE_MAGIC434_INTERUM 0x0C3DF434
#define LE_MAGIC434 0x0C4DF434
#define LE_MAGIC52F 0x0C4DF52F
#define LE_MAGIC53D 0x0C4DF53D
#define LE_MAGIC   LE_MAGIC53D          /* current errlog_open magic # */
/* VALID_LE_MAGIC gives valid magic numbers for an error log record. */
#define VALID_LE_MAGIC(m) (((m) == LE_MAGIC_41) || \
                ((m) == LE_MAGIC434_INTERUM) || ((m) == LE_MAGIC434))
/* VALID_LENTRY_MAGIC gives valid magic numbers for errlog_open(). */
#define VALID_LENTRY_MAGIC(m) (((m) == LE_MAGIC) || ((m) == LE_MAGIC434) ||\
                               ((m) == LE_MAGIC52F))

/*
 * Optional duplicate information.
 */
struct errdup {
    unsigned int        ed_dupcount;
    time32_t            ed_time1;
    time32_t            ed_time2;
};

/* Lengths of the various fields in the structure. */
#define LE_LABEL_MAX            20
#define LE_MACHINE_ID_MAX       32
#define LE_NODE_ID_MAX          32
#define LE_CLASS_MAX            2
#define LE_TYPE_MAX             5
#define LE_RESOURCE_MAX         16
#define LE_RCLASS_MAX           16
#define LE_RTYPE_MAX            16
#define LE_VPD_MAX              512
#define LE_IN_MAX               256
#define LE_CONN_MAX             20
#define LE_DETAIL_MAX           ERR_REC_MAX
#define LE_SYMPTOM_MAX          312
#define LE_ERRDUP_MAX           sizeof(struct errdup)

/* The data structure that contains an errlog entry */
typedef struct errlog_entry {
    unsigned int        el_magic;
    unsigned int        el_sequence;
    char                el_label[LE_LABEL_MAX];
    unsigned int        el_timestamp;
    unsigned int        el_crcid;
    unsigned int        el_errdiag;
    char                el_machineid[LE_MACHINE_ID_MAX];
    char                el_nodeid[LE_NODE_ID_MAX];
    char                el_class[LE_CLASS_MAX];
    char                el_type[LE_TYPE_MAX];
    char                el_resource[LE_RESOURCE_MAX];
    char                el_rclass[LE_RCLASS_MAX];
    char                el_rtype[LE_RTYPE_MAX];
    char                el_vpd_ibm[LE_VPD_MAX];
    char                el_vpd_user[LE_VPD_MAX];
    char                el_in[LE_IN_MAX];
    char                el_connwhere[LE_CONN_MAX];
    unsigned short      el_flags;
    unsigned short      el_detail_length;
    char                el_detail_data[LE_DETAIL_MAX];
    unsigned int        el_symptom_length;
    char                el_symptom_data[LE_SYMPTOM_MAX];
    struct errdup       el_errdup;
} errlog_entry_t;


/* Values for the el_flags element. */
#define LE_FLAG_ERR64           0x01
#define LE_FLAG_ERRDUP          0x100

/*
 *  This structure is used to pass search criteria to errlog_find_first.
 *
 *  To use it an operation is put in em_op.  If it is a leaf operation,
 *  the field in errlog_entry_t to apply the op to is put in em_field and
 *  the value to compare against is put in em_strvalue or em_intvalue.
 *  Boolean values are put in em_intvalue.
 *
 *  To connect operations, a unary or binary operator is put in em_op.
 *  The operation(s) to apply the operator to are put in em_left and,
 *  if it's a binary operator, em_right.
 */

typedef struct errlog_match {
    unsigned int                em_op;
    union {
        struct errlog_match     *emu_left;
        unsigned int            emu_field;
    } emu1;
    union {
        struct errlog_match     *emu_right;
        unsigned int            emu_intvalue;
        unsigned char           *emu_strvalue;
    } emu2;
} errlog_match_t;

#define em_left         emu1.emu_left
#define em_field        emu1.emu_field
#define em_right        emu2.emu_right
#define em_intvalue     emu2.emu_intvalue
#define em_strvalue     emu2.emu_strvalue

/* Operators to use in the match structures for the find functions */
#define LE_OP_EQUAL             0x01
#define LE_OP_NE                0x02
#define LE_OP_SUBSTR            0x03
#define LE_OP_LT                0x04
#define LE_OP_LE                0x05
#define LE_OP_GT                0x06
#define LE_OP_GE                0x07
#define LE_OP_LEAF              0x100
#define LE_OP_NOT               0x101
#define LE_OP_AND               0x201
#define LE_OP_OR                0x202
#define LE_OP_XOR               0x203

/* Flags to combine with the field id to indicate the data type of the field */
#define LE_TYPE                 0xff00
#define LE_TYPE_INT             0x0100
#define LE_TYPE_STRING          0x0200
#define LE_TYPE_BOOLEAN         0x0300

/* Flags to indicate which field to match in the find functions. */
#define LE_MATCH_FIELD          0xff
#define LE_MATCH_SEQUENCE       (0x01|LE_TYPE_INT)
#define LE_MATCH_LABEL          (0x02|LE_TYPE_STRING)
#define LE_MATCH_TIMESTAMP      (0x03|LE_TYPE_INT)
#define LE_MATCH_CRCID          (0x04|LE_TYPE_INT)
#define LE_MATCH_MACHINEID      (0x05|LE_TYPE_STRING)
#define LE_MATCH_NODEID         (0x06|LE_TYPE_STRING)
#define LE_MATCH_CLASS          (0x07|LE_TYPE_STRING)
#define LE_MATCH_TYPE           (0x08|LE_TYPE_STRING)
#define LE_MATCH_RESOURCE       (0x09|LE_TYPE_STRING)
#define LE_MATCH_RCLASS         (0x0a|LE_TYPE_STRING)
#define LE_MATCH_RTYPE          (0x0b|LE_TYPE_STRING)
#define LE_MATCH_VPD_IBM        (0x0c|LE_TYPE_STRING)
#define LE_MATCH_VPD_USER       (0x0d|LE_TYPE_STRING)
#define LE_MATCH_IN             (0x0e|LE_TYPE_STRING)
#define LE_MATCH_CONNWHERE      (0x0f|LE_TYPE_STRING)
#define LE_MATCH_FLAG_ERR64     (0x10|LE_TYPE_BOOLEAN)
#define LE_MATCH_FLAG_ERRDUP    (0x11|LE_TYPE_BOOLEAN)
#define LE_MATCH_DETAIL_DATA    (0x12|LE_TYPE_STRING)
#define LE_MATCH_SYMPTOM_DATA   (0x13|LE_TYPE_STRING)
#define LE_MATCH_ERRDIAG        (0x14|LE_TYPE_INT)

/*
 *  Define the directions find can walk through the errlog file.
 */

#define LE_FORWARD              0x01
#define LE_REVERSE              0x02

/*
 * Define the errors that the functions can return.
 */

#define LE_ERR_INVARG   0x01            /* Invalid input argument */
#define LE_ERR_NOFILE   0x02            /* The errlog file can't be opened */
#define LE_ERR_INVFILE  0x03            /* The errlog file isn't valid */
#define LE_ERR_NOMEM    0x04            /* We're out of memory */
#define LE_ERR_NOWRITE  0x05            /* Can't write entry back */
#define LE_ERR_IO       0x06            /* IO error in the errlog file */
#define LE_ERR_DONE     0x07            /* The find function reached the end */

/*
 * These are the functions that comprise the API
 */
extern int errlog_open(char             *path,
                       int              mode,
                       unsigned int     magic,
                       errlog_handle_t  *handle);

extern int errlog_close(errlog_handle_t handle);

extern int errlog_find_first(errlog_handle_t    handle,
                             errlog_match_t     *filter,
                             errlog_entry_t     *result);

extern int errlog_find_next(errlog_handle_t     handle,
                            errlog_entry_t      *result);

extern int errlog_find_sequence(errlog_handle_t handle,
                                int             sequence,
                                errlog_entry_t  *result);

extern int errlog_set_direction(errlog_handle_t handle,
                                int             direction);

extern int errlog_write(errlog_handle_t         handle,
                        errlog_entry_t          *data);

#endif
clsprod@starboss:/usr/include $
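
The errlog_match_t comment in the header describes how a search filter is assembled from leaf and operator nodes. A minimal sketch of such a tree follows. It repeats a subset of the definitions from the listing so that it compiles without liberrlog; the label value "DISK_ERR1", the sequence threshold 100 and the function names are example choices of this note, not anything the header prescribes.

```c
#include <assert.h>
#include <stddef.h>

/* Subset of the errlog.h listing, repeated so the sketch compiles without
 * liberrlog. On AIX, replace this part with #include <errlog.h>. */
typedef struct errlog_match {
    unsigned int                em_op;
    union {
        struct errlog_match     *emu_left;
        unsigned int            emu_field;
    } emu1;
    union {
        struct errlog_match     *emu_right;
        unsigned int            emu_intvalue;
        unsigned char           *emu_strvalue;
    } emu2;
} errlog_match_t;

#define em_left         emu1.emu_left
#define em_field        emu1.emu_field
#define em_right        emu2.emu_right
#define em_intvalue     emu2.emu_intvalue
#define em_strvalue     emu2.emu_strvalue

#define LE_OP_EQUAL         0x01
#define LE_OP_GT            0x06
#define LE_OP_AND           0x201
#define LE_TYPE             0xff00
#define LE_TYPE_INT         0x0100
#define LE_TYPE_STRING      0x0200
#define LE_MATCH_FIELD      0xff
#define LE_MATCH_SEQUENCE   (0x01|LE_TYPE_INT)
#define LE_MATCH_LABEL      (0x02|LE_TYPE_STRING)

/* Data type and field number packed into an LE_MATCH_* id. */
unsigned int field_type(unsigned int field)   { return field & LE_TYPE; }
unsigned int field_number(unsigned int field) { return field & LE_MATCH_FIELD; }

/* Example filter: (label == "DISK_ERR1") AND (sequence > 100).
 * The nodes are static so the returned tree outlives the call. */
errlog_match_t *example_filter(void)
{
    static errlog_match_t label, seq, root;

    /* Leaf on a string field: the value goes in em_strvalue. */
    assert(field_type(LE_MATCH_LABEL) == LE_TYPE_STRING);
    label.em_op       = LE_OP_EQUAL;
    label.em_field    = LE_MATCH_LABEL;
    label.em_strvalue = (unsigned char *)"DISK_ERR1";

    /* Leaf on an integer field: the value goes in em_intvalue. */
    assert(field_type(LE_MATCH_SEQUENCE) == LE_TYPE_INT);
    seq.em_op       = LE_OP_GT;
    seq.em_field    = LE_MATCH_SEQUENCE;
    seq.em_intvalue = 100;

    /* Binary operator node joining the two leaves. */
    root.em_op    = LE_OP_AND;
    root.em_left  = &label;
    root.em_right = &seq;
    return &root;
}
```

On AIX the returned pointer would be the filter argument of errlog_find_first, after errlog_open has supplied the handle. That part is left out here because it needs liberrlog.a.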



clsprod@starboss:/usr/include/sys $ cat errno.h
/* IBM_PROLOG_BEGIN_TAG                                                   */
/* This is an automatically generated prolog.                             */
/*                                                                        */
/* bos530 src/bos/kernel/sys/errno.h 1.27.1.23                            */
/*                                                                        */
/* Licensed Materials - Property of IBM                                   */
/*                                                                        */
/* (C) COPYRIGHT International Business Machines Corp. 1985,1995          */
/* All Rights Reserved                                                    */
/*                                                                        */
/* US Government Users Restricted Rights - Use, duplication or            */
/* disclosure restricted by GSA ADP Schedule Contract with IBM Corp.      */
/*                                                                        */
/* IBM_PROLOG_END_TAG                                                     */
/* @(#)49       1.27.1.23  src/bos/kernel/sys/errno.h, incstd, bos530 1/25/01 16:31:11 */
/*
 * COMPONENT_NAME: (INCSTD) Standard Include Files
 *
 * FUNCTIONS:
 *
 * ORIGINS: 27,71
 *
 * (C) COPYRIGHT International Business Machines Corp. 1985, 1996
 * All Rights Reserved
 * Licensed Materials - Property of IBM
 *
 * US Government Users Restricted Rights - Use, duplication or
 * disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
 */
/*
 * (c) Copyright 1990, 1991, 1992 OPEN SOFTWARE FOUNDATION, INC.
 * ALL RIGHTS RESERVED
 */

#ifndef _H_ERRNO
#define _H_ERRNO
#include <standards.h>

/*
 *      Error codes
 *
 *      The ANSI, POSIX, and XOPEN standards require that certain values be
 *      in errno.h.  The standards allow additional macro definitions,
 *      beginning with an E and an uppercase letter.
 *
 */

#ifdef _ANSI_C_SOURCE

#ifndef _KERNEL

#if defined(_THREAD_SAFE) || defined(_THREAD_SAFE_ERRNO)
/*
 * Per thread errno is provided by the threads provider. Both the extern int
 * and the per thread value must be maintained by the threads library.
 */
extern  int     *_Errno( void );
#define errno   (*_Errno())

#else

extern int errno;

#endif  /* _THREAD_SAFE || _THREAD_SAFE_ERRNO */

#endif  /* _KERNEL */

#ifdef _ALL_SOURCE

extern  char    *sys_errlist[];
extern  int     sys_nerr;

#endif /* _ALL_SOURCE */

#define EPERM   1       /* Operation not permitted              */
#define ENOENT  2       /* No such file or directory            */
#define ESRCH   3       /* No such process                      */
#define EINTR   4       /* interrupted system call              */
#define EIO     5       /* I/O error                            */
#define ENXIO   6       /* No such device or address            */
#define E2BIG   7       /* Arg list too long                    */
#define ENOEXEC 8       /* Exec format error                    */
#define EBADF   9       /* Bad file descriptor                  */
#define ECHILD  10      /* No child processes                   */
#define EAGAIN  11      /* Resource temporarily unavailable     */
#define ENOMEM  12      /* Not enough space                     */
#define EACCES  13      /* Permission denied                    */
#define EFAULT  14      /* Bad address                          */
#define ENOTBLK 15      /* Block device required                */
#define EBUSY   16      /* Resource busy                        */
#define EEXIST  17      /* File exists                          */
#define EXDEV   18      /* Improper link                        */
#define ENODEV  19      /* No such device                       */
#define ENOTDIR 20      /* Not a directory                      */
#define EISDIR  21      /* Is a directory                       */
#define EINVAL  22      /* Invalid argument                     */
#define ENFILE  23      /* Too many open files in system        */
#define EMFILE  24      /* Too many open files                  */
#define ENOTTY  25      /* Inappropriate I/O control operation  */
#define ETXTBSY 26      /* Text file busy                       */
#define EFBIG   27      /* File too large                       */
#define ENOSPC  28      /* No space left on device              */
#define ESPIPE  29      /* Invalid seek                         */
#define EROFS   30      /* Read only file system                */
#define EMLINK  31      /* Too many links                       */
#define EPIPE   32      /* Broken pipe                          */
#define EDOM    33      /* Domain error within math function    */
#define ERANGE  34      /* Result too large                     */
#define ENOMSG  35      /* No message of desired type           */
#define EIDRM   36      /* Identifier removed                   */
#define ECHRNG  37      /* Channel number out of range          */
#define EL2NSYNC 38     /* Level 2 not synchronized             */
#define EL3HLT  39      /* Level 3 halted                       */
#define EL3RST  40      /* Level 3 reset                        */
#define ELNRNG  41      /* Link number out of range             */
#define EUNATCH 42      /* Protocol driver not attached         */
#define ENOCSI  43      /* No CSI structure available           */
#define EL2HLT  44      /* Level 2 halted                       */
#define EDEADLK 45      /* Resource deadlock avoided            */

#define ENOTREADY       46      /* Device not ready             */
#define EWRPROTECT      47      /* Write-protected media        */
#define EFORMAT         48      /* Unformatted media            */

#define ENOLCK          49      /* No locks available           */

#define ENOCONNECT      50      /* no connection                */
#define ESTALE          52      /* no filesystem                */
#define EDIST           53      /* old, currently unused AIX errno*/

/* non-blocking and interrupt i/o */
/*
 * AIX returns EAGAIN where 4.3BSD used EWOULDBLOCK;
 * but, the standards insist on unique errno values for each errno.
 * A unique value is reserved for users that want to code case
 * statements for systems that return either EAGAIN or EWOULDBLOCK.
 */
#if _XOPEN_SOURCE_EXTENDED==1
#define EWOULDBLOCK     EAGAIN   /* Operation would block       */
#else /* _XOPEN_SOURCE_EXTENDED */
#define EWOULDBLOCK     54
#endif /* _XOPEN_SOURCE_EXTENDED */

#define EINPROGRESS     55      /* Operation now in progress */
#define EALREADY        56      /* Operation already in progress */

/* ipc/network software */

        /* argument errors */
#define ENOTSOCK        57      /* Socket operation on non-socket */
#define EDESTADDRREQ    58      /* Destination address required */
#define EDESTADDREQ     EDESTADDRREQ /* Destination address required */
#define EMSGSIZE        59      /* Message too long */
#define EPROTOTYPE      60      /* Protocol wrong type for socket */
#define ENOPROTOOPT     61      /* Protocol not available */
#define EPROTONOSUPPORT 62      /* Protocol not supported */
#define ESOCKTNOSUPPORT 63      /* Socket type not supported */
#define EOPNOTSUPP      64      /* Operation not supported on socket */
#define EPFNOSUPPORT    65      /* Protocol family not supported */
#define EAFNOSUPPORT    66      /* Address family not supported by protocol family */
#define EADDRINUSE      67      /* Address already in use */
#define EADDRNOTAVAIL   68      /* Can't assign requested address */

        /* operational errors */
#define ENETDOWN        69      /* Network is down */
#define ENETUNREACH     70      /* Network is unreachable */
#define ENETRESET       71      /* Network dropped connection on reset */
#define ECONNABORTED    72      /* Software caused connection abort */
#define ECONNRESET      73      /* Connection reset by peer */
#define ENOBUFS         74      /* No buffer space available */
#define EISCONN         75      /* Socket is already connected */
#define ENOTCONN        76      /* Socket is not connected */
#define ESHUTDOWN       77      /* Can't send after socket shutdown */

#define ETIMEDOUT       78      /* Connection timed out */
#define ECONNREFUSED    79      /* Connection refused */

#define EHOSTDOWN       80      /* Host is down */
#define EHOSTUNREACH    81      /* No route to host */

/* ERESTART is used to determine if the system call is restartable */
#define ERESTART        82      /* restart the system call */

/* quotas and limits */
#define EPROCLIM        83      /* Too many processes */
#define EUSERS          84      /* Too many users */
#define ELOOP           85      /* Too many levels of symbolic links      */
#define ENAMETOOLONG    86      /* File name too long                     */

/*
 * AIX returns EEXIST where 4.3BSD used ENOTEMPTY;
 * but, the standards insist on unique errno values for each errno.
 * A unique value is reserved for users that want to code case
 * statements for systems that return either EEXIST or ENOTEMPTY.
 */
#if defined(_ALL_SOURCE) && !defined(_LINUX_SOURCE_COMPAT)
#define ENOTEMPTY       EEXIST  /* Directory not empty */
#else   /* not _ALL_SOURCE */
#define ENOTEMPTY       87
#endif  /* _ALL_SOURCE */

/* disk quotas */
#define EDQUOT          88      /* Disc quota exceeded */

#define ECORRUPT        89      /* Invalid file system control data */

/* errnos 90-92 reserved for future use compatible with AIX PS/2 */

/* network file system */
#define EREMOTE         93      /* Item is not local to host */

/* errnos 94-108 reserved for future use compatible with AIX PS/2 */

#define ENOSYS          109     /* Function not implemented  POSIX */

/* disk device driver */
#define EMEDIA          110     /* media surface error */
#define ESOFT           111     /* I/O completed, but needs relocation */

/* security */
#define ENOATTR         112     /* no attribute found */
#define ESAD            113     /* security authentication denied */
#define ENOTRUST        114     /* not a trusted program */

/* BSD 4.3 RENO */
#define ETOOMANYREFS    115     /* Too many references: can't splice */

#define EILSEQ          116     /* Invalid wide character */
#define ECANCELED       117     /* asynchronous i/o cancelled */

/* SVR4 STREAMS */
#define ENOSR           118     /* temp out of streams resources */
#define ETIME           119     /* I_STR ioctl timed out */
#define EBADMSG         120     /* wrong message type at stream head */
#define EPROTO          121     /* STREAMS protocol error */
#define ENODATA         122     /* no message ready at stream head */
#define ENOSTR          123     /* fd is not a stream */

#define ECLONEME        ERESTART /* this is the way we clone a stream ... */

#define ENOTSUP         124     /* POSIX threads unsupported value */

#define EMULTIHOP       125     /* multihop is not allowed */
#define ENOLINK         126     /* the link has been severed */
#define EOVERFLOW       127     /* value too large to be stored in data type */

#endif /* _ANSI_C_SOURCE */

#endif /* _H_ERRNO */
clsprod@starboss:/usr/include/sys $
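
The header comments on EWOULDBLOCK and ENOTEMPTY matter when the same source must build on systems where these names are aliases (EWOULDBLOCK == EAGAIN, ENOTEMPTY == EEXIST) and on systems where they are distinct. A switch that lists both names fails to compile when the values are equal. The sketch below shows two common ways around that; the function names are this note's own.

```c
#include <assert.h>   /* only needed for the checks in the note below */
#include <errno.h>

/* "Would block" test that works whether or not the two names share a
 * value: a plain comparison never produces duplicate case labels. */
int is_would_block(int err)
{
    return err == EAGAIN || err == EWOULDBLOCK;
}

/* "Directory not empty" test written as a switch. The second label is
 * compiled only where ENOTEMPTY has its own value. Meaningful only for
 * an errno that came from rmdir() or rename(). */
int is_dir_not_empty(int err)
{
    switch (err) {
    case EEXIST:
#if ENOTEMPTY != EEXIST
    case ENOTEMPTY:
#endif
        return 1;
    default:
        return 0;
    }
}
```

With these helpers, assert(is_would_block(EAGAIN)) and assert(is_dir_not_empty(ENOTEMPTY)) should hold regardless of which of the header's #if branches was taken.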


clsprod@starboss:/usr/include $ file sysexits.h
sysexits.h: ascii text
clsprod@starboss:/usr/include $ cat sysexits.h
/* IBM_PROLOG_BEGIN_TAG                                                   */
/* This is an automatically generated prolog.                             */
/*                                                                        */
/* bos530 src/bos/usr/include/sysexits.h 1.6                              */
/*                                                                        */
/* Licensed Materials - Property of IBM                                   */
/*                                                                        */
/* (C) COPYRIGHT International Business Machines Corp. 1989,1991          */
/* All Rights Reserved                                                    */
/*                                                                        */
/* US Government Users Restricted Rights - Use, duplication or            */
/* disclosure restricted by GSA ADP Schedule Contract with IBM Corp.      */
/*                                                                        */
/* IBM_PROLOG_END_TAG                                                     */
/* @(#)30       1.6  src/bos/usr/include/sysexits.h, incstd, bos530 6/16/90 00:14:57 */
#ifndef _H_SYSEXITS
#define _H_SYSEXITS
/*
 * COMPONENT_NAME: (INCSTD) Standard Include Files
 *
 * FUNCTIONS:
 *
 * ORIGINS: 27
 *
 * (C) COPYRIGHT International Business Machines Corp. 1989
 * All Rights Reserved
 * Licensed Materials - Property of IBM
 *
 * US Government Users Restricted Rights - Use, duplication or
 * disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
 */

/*
**  SYSEXITS.H -- Exit status codes for system programs.
**
**      This include file attempts to categorize possible error
**      exit statuses for system programs, notably delivermail
**      and the Berkeley network.
**
**      Error numbers begin at EX__BASE to reduce the possibility of
**      clashing with other exit statuses that random programs may
**      already return.  The meaning of the codes is approximately
**      as follows:
**
**      EX_USAGE -- The command was used incorrectly, e.g., with
**              the wrong number of arguments, a bad flag, a bad
**              syntax in a parameter, or whatever.
**      EX_DATAERR -- The input data was incorrect in some way.
**              This should only be used for user's data & not
**              system files.
**      EX_NOINPUT -- An input file (not a system file) did not
**              exist or was not readable.  This could also include
**              errors like "No message" to a mailer (if it cared
**              to catch it).
**      EX_NOUSER -- The user specified did not exist.  This might
**              be used for mail addresses or remote logins.
**      EX_NOHOST -- The host specified did not exist.  This is used
**              in mail addresses or network requests.
**      EX_UNAVAILABLE -- A service is unavailable.  This can occur
**              if a support program or file does not exist.  This
**              can also be used as a catchall message when something
**              you wanted to do doesn't work, but you don't know
**              why.
**      EX_SOFTWARE -- An internal software error has been detected.
**              This should be limited to non-operating system related
**              errors as possible.
**      EX_OSERR -- An operating system error has been detected.
**              This is intended to be used for such things as "cannot
**              fork", "cannot create pipe", or the like.  It includes
**              things like getuid returning a user that does not
**              exist in the passwd file.
**      EX_OSFILE -- Some system file (e.g., /etc/passwd, /etc/utmp,
**              etc.) does not exist, cannot be opened, or has some
**              sort of error (e.g., syntax error).
**      EX_CANTCREAT -- A (user specified) output file cannot be
**              created.
**      EX_IOERR -- An error occurred while doing I/O on some file.
**      EX_TEMPFAIL -- temporary failure, indicating something that
**              is not really an error.  In sendmail, this means
**              that a mailer (e.g.) could not create a connection,
**              and the request should be reattempted later.
**      EX_PROTOCOL -- the remote system returned something that
**              was "not possible" during a protocol exchange.
**      EX_NOPERM -- You did not have sufficient permission to
**              perform the operation.  This is not intended for
**              file system problems, which should use NOINPUT or
**              CANTCREAT, but rather for higher level permissions.
**              For example, kre uses this to restrict who students
**              can send mail to.
**
*/

# define EX_OK          0       /* successful termination */

# define EX__BASE       64      /* base value for error messages */

# define EX_USAGE       64      /* command line usage error */
# define EX_DATAERR     65      /* data format error */
# define EX_NOINPUT     66      /* cannot open input */
# define EX_NOUSER      67      /* addressee unknown */
# define EX_NOHOST      68      /* host name unknown */
# define EX_UNAVAILABLE 69      /* service unavailable */
# define EX_SOFTWARE    70      /* internal software error */
# define EX_OSERR       71      /* system error (e.g., can't fork) */
# define EX_OSFILE      72      /* critical OS file missing */
# define EX_CANTCREAT   73      /* can't create (user) output file */
# define EX_IOERR       74      /* input/output error */
# define EX_TEMPFAIL    75      /* temp failure; user is invited to retry */
# define EX_PROTOCOL    76      /* remote error in protocol */
# define EX_NOPERM      77      /* permission denied */
# define EX_CONFIG      78      /* configuration error */
# define EX_DB          79      /* database access error */

#endif /* _H_SYSEXITS */
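
As an illustration of how the sysexits.h codes are meant to be used, here is one possible mapping from an errno seen while opening an input file to an exit status. The choice of which errno goes to which code is this note's own reading of the header comments, not something the header defines.

```c
#include <assert.h>   /* only needed for the checks in the note below */
#include <errno.h>
#include <sysexits.h>

/* Map an errno from opening or reading a user-supplied input file to a
 * sysexits.h exit status. */
int exit_status_for_input_error(int err)
{
    switch (err) {
    case 0:
        return EX_OK;
    case ENOENT:
    case ENOTDIR:
    case EACCES:
        return EX_NOINPUT;      /* input file missing or not readable */
    case EIO:
        return EX_IOERR;        /* I/O error on the file */
    case ENOMEM:
    case ENFILE:
    case EMFILE:
        return EX_OSERR;        /* operating system resource problem */
    default:
        return EX_UNAVAILABLE;  /* catchall, as the header comment suggests */
    }
}
```

A program would typically end with exit(exit_status_for_input_error(errno)) after a failed open. With the values from the listing, ENOENT gives 66 and EIO gives 74.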


104 sh             2396372       6.338403430       0.002943           return from execve. error ENOEXEC [36 usec]
101 sh             2396372       6.338414084       0.010654           open LR = 10003088



ioctl and related:
==================


The ioctl subroutine performs a variety of control operations on the object associated with the specified open file descriptor. 
This function is typically used with character or block special files, sockets, or generic device support 
such as the termio general terminal interface.

The control operation provided by this function call is specific to the object being addressed, 
as are the data type and contents of the Argument parameter. The ioctlx form of this function can be used to pass 
an additional extension parameter to objects supporting it. The ioctl32 and ioctl32x forms of this function behave in 
the same way as ioctl and ioctlx, but allow 64-bit applications to call the ioctl routine for an object that does not 
normally work with 64-bit applications.

Performing an ioctl function on a file descriptor associated with an ordinary file results in an error being returned.

EBADF	 The FileDescriptor parameter is not a valid open file descriptor. 
EFAULT	 The Argument or Ext parameter is used to point to data outside of the process address space. 
EINTR	 A signal was caught during the ioctl, ioctlx, ioctl32, or ioctl32x subroutine and the process had not enabled re-startable subroutines for the signal. 
EINVAL	 The Command or Argument parameter is not valid for the specified object. 
ENOTTY	 The FileDescriptor parameter is not associated with an object that accepts control functions. 
ENODEV	 The FileDescriptor parameter is associated with a valid character or block special file, but the supporting device driver does not support the ioctl function. 
ENXIO	 The FileDescriptor parameter is associated with a valid character or block special file, but the supporting device driver is not in the configured state. 
	 Object-specific error codes are defined in the documentation for associated objects. 
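
Two of the errors above are easy to reproduce: EBADF for a descriptor that is not open, and ENOTTY for an ordinary file. The sketch below uses the terminal window-size request TIOCGWINSZ purely as an example command; the function names are this note's own, and the exact errno for an ordinary file is what Linux and the truss output further down show, so treat it as an assumption on other systems.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <termios.h>

/* Ask fd for its terminal window size; return 0 on success or the errno
 * left by ioctl(). */
int winsize_ioctl_errno(int fd)
{
    struct winsize ws;

    if (ioctl(fd, TIOCGWINSZ, &ws) == 0)
        return 0;
    return errno;
}

/* Run the same request against a temporary ordinary file. */
int winsize_ioctl_errno_on_tmpfile(void)
{
    FILE *fp = tmpfile();
    assert(fp != NULL);             /* the sketch needs a writable temp dir */
    int err = winsize_ioctl_errno(fileno(fp));
    fclose(fp);
    return err;
}
```

Calling winsize_ioctl_errno(-1) is expected to give EBADF, and winsize_ioctl_errno_on_tmpfile() is expected to give ENOTTY, because an ordinary file does not accept terminal control functions.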



tecad error:
============

IZ37728: TECAD_SNMP CRASHES ON AIX 5.3 SP8 OR GREATER

A fix is available:
3.9.0.8-TIV-TEC-IF0106 IBM Tivoli Enterprise Console Version 3.9 Interim Fix

APAR status
Closed as program error.

Error description 
TEC 3.9
AIX 5.3 SP8 or greater
When traps are received by the SNMP adapter, the adapter crashes.

Following is some data from the truss and adapter output:

Truss 1 Output:

open("/usr/lib/nls/msg/en_US/libc.cat", O_RDONLY) = 9
kioctl(9, 22528, 0x00000000, 0x00000000) Err#25 ENOTTY
kfcntl(9, F_SETFD, 0x00000001)   = 0
kioctl(9, 22528, 0x00000000, 0x00000000) Err#25 ENOTTY
kread(9, "\0\001 ?\007\007 I S O 8".., 4096) = 4096
lseek(9, 0, 1)     = 4096
lseek(9, 0, 1)     = 4096
lseek(9, 0, 1)     = 4096
_getpid()     = 101604
lseek(9, 0, 1)     = 4096
lseek(9, 8069, 0)    = 8069
kread(9, " T h e   s y s t e m   c".., 4096) = 4096
close(9)     = 0
__loadx(0x07000000, 0xF01E0438, 0x0000001A, 0xF015B6F8, 0x100140A3) = 0xF015C35C
__loadx(0x07000000, 0xF01E0444, 0x0000001A, 0xF015B6F8, 0x100140A3) = 0xF015C3A4
__loadx(0x07000000, 0xF01E0450, 0x0000001A, 0xF015B6F8, 0x100140A3) = 0xF015C314
__loadx(0x07000000, 0xF01E045C, 0x0000001A, 0xF015B6F8, 0x100140A3) = 0xF015C3EC
__loadx(0x07000000, 0xF01E0468, 0x0000001A, 0xF015B6F8, 0x100140A3) = 0xF015C428
__loadx(0x05000000, 0x2FF1F6A8, 0x00000960, 0xF015B6F8, 0x00000000) = 0x00000000
kread(8, " h o s t s   n i s _ l d".., 4096) = 0
close(8)     = 0
getdomainname(0xF023D178, 1024)   = 0
getdomainname(0xF023D178, 1024)   = 0
getdomainname(0xF023D178, 1024)   = 0
getdomainname(0xF023D178, 1024)   = 0
_getpid()     = 101604
getuidx(1)     = 0
kwrite(7, " 2 7", 2)    Err#32 EPIPE
    Received signal #13, SIGPIPE [default]
*** process killed ***
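
Side note on the end of the first trace: a write to a pipe or socket with no reader raises SIGPIPE, and the default action kills the process, which is what truss reports. If SIGPIPE is ignored, the same write returns -1 with errno EPIPE and the program can handle it. The sketch below only demonstrates that general behaviour; it is not the fix delivered by the APAR, and the function name and payload are this note's own.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Write to a pipe whose read end is closed, with SIGPIPE ignored, and
 * return the errno left by write() (0 if the write unexpectedly works). */
int write_to_closed_pipe_errno(void)
{
    int fds[2];
    int err = 0;

    if (pipe(fds) != 0)
        return errno;
    close(fds[0]);                          /* no reader left */

    void (*old)(int) = signal(SIGPIPE, SIG_IGN);
    assert(old != SIG_ERR);
    if (write(fds[1], "27", 2) < 0)         /* payload is arbitrary */
        err = errno;
    signal(SIGPIPE, old);                   /* restore previous disposition */

    close(fds[1]);
    return err;
}
```

The function is expected to return EPIPE instead of the process being terminated by signal 13.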

Truss 2 Output

open("/tmp/tec_ed ", O_WRONLY|O_CREAT|O_TRUNC, S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH) = 5
kioctl(5, 22528, 0x00000000, 0x00000000) Err#25 ENOTTY
kfcntl(5, F_GETFL, 0x00000008)   = 1
close(5)     = 0
kread(4, " #\r\n #   " $ I d :   @".., 4096) = 0
close(4)     = 0
open("/tmp/tec_ed ", O_WRONLY|O_CREAT|O_APPEND, S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH) = 4
klseek(4, 0, 0, 0x00000002)   = 0
kioctl(4, 22528, 0x00000000, 0x00000000) Err#25 ENOTTY
kioctl(4, 22528, 0x00000000, 0x00000000) Err#25 ENOTTY
kwrite(4, " S e p   2 2   2 3 : 2 2".., 114) = 114
close(4)     = 0
kread(3, " #   M o n   J u l   2 1".., 4096) = 0
kfcntl(3, F_GETFL, 0x00000008)   = 0
klseek(3, 0, 0, 0x00000000)   = 0
kread(3, " #   M o n   J u l   2 1".., 4096) = 195
kread(3, " #   M o n   J u l   2 1".., 4096) = 0
kfcntl(3, F_GETFL, 0x00000008)   = 0
klseek(3, 0, 0, 0x00000000)   = 0
kread(3, " #   M o n   J u l   2 1".., 4096) = 195
kread(3, " #   M o n   J u l   2 1".., 4096) = 0
close(3)     = 0
kioctl(1, 22528, 0x00000000, 0x00000000) = 0
kwrite(1, 0xF0220C70, 68)   = 68
sigprocmask(0, 0xF029D7B0, 0xF029D7A8)  = 0
kfork()      = 149840
thread_setmymask_fast(0x00000000, 0x00000000, 0x00000000, 0xD006DC80, 0x00000000, 0x1004671F, 0x1004671F, 0x00000000) = 0x00000000
_exit(0)

tecad_snmp.err output:

Tue Sep 23 15:05:57 2008  NORMAL: SELECT ,(00), ibtecad/select.c
line 0220: Correct is TRUE
Tue Sep 23 15:05:57 2008  NORMAL: SELECT ,(00), ibtecad/select.c
line 0267: Finished TECAD_EvalSelect, returning TRUE
Tue Sep 23 15:05:57 2008     LOW: KERNEL ,(00), ibtecad/kernel.c
line 0247: Found action is <DirectTalkStatus_Trap>
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0086: Entered Eval_Fetch
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0109: --get FetchVar, i=0
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0120: --calling EvalFetchExpression
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0187: Entered Eval_Fetch_Expression
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0190: -- argc1
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0197: -- argv not null
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0203: -- loop over all fetches, i=0
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0212: -- Current_Expression not null
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0187: Entered Eval_Fetch_Expression
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0190: -- argc0
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0197: -- argv not null
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0236: -- do the required operation
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0245:   -- Expression->Index=6
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0246:   -- argc=0
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0250:   -- argv not NULL
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0255:   -- Expression->Operator not NULL
Tue Sep 23 15:05:57 2008 VERBOSE: KERNEL ,(00), cad/evaluation.c
line 0271: TECAD_GetGlobalEntry Index <6>
Tue Sep 23 15:05:57 2008 VERBOSE: UTILS  ,(00), /configuration.c
line 0521: Entering TECAD_CopyAttributeEntry
Tue Sep 23 15:05:57 2008 VERBOSE: UTILS  ,(00), /configuration.c
line 0161: Entering TECAD_MakeAttributeEntry
Tue Sep 23 15:05:57 2008 VERBOSE: UTILS  ,(00), /configuration.c
line 0185: Leaving TECAD_MakeAttributeEntry
Tue Sep 23 15:05:57 2008 VERBOSE: UTILS  ,(00), /configuration.c
line 0563: Leaving TECAD_CopyAttributeEntry
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0267: -- clear the memory
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0270: Finished Eval_Fetch_Expression
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0222: -- result not null
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0236: -- do the required operation
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0245:   -- Expression->Index=1
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0246:   -- argc=1
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0250:   -- argv not NULL
Tue Sep 23 15:05:57 2008  NORMAL: FETCH  ,(00), libtecad/fetch.c
line 0255:   -- Expression->Operator not NULL
Local fix 
Problem summary 
****************************************************************
* USERS AFFECTED: All TEC users running the SNMP adapter on AIX.
****************************************************************
* PROBLEM DESCRIPTION: When traps are received by the SNMP
*   adapter running on AIX, it crashes.
****************************************************************
* RECOMMENDATION: Apply the maintenance listed below.
****************************************************************
Problem conclusion 
The adapter was being killed due to a SIGPIPE signal.  This
signal will now be ignored.

The fix for this APAR is contained in the following maintenance
packages:
  | interim fix | 3.9.0.8-TIV-TEC-IF0106




Universal Command:
==================

Universal Command (UC) facilitates job scheduling from the mainframe to AIX and HP-UX systems.

AIX:
# lslpp -La 'UCmdP'
HP:
# swlist -l subproduct UCmd


Tiger:
======

Tiger is a security tool that can be used as a security and intrusion detection system. It runs on many
platforms and is released under the GPL, so it is free software.
It is written entirely in shell script.


5FC2DD4B  PING TO REMOTE HOST FAILED:
=====================================

AIX ONLY:

In errpt you might find:

[pl101][tdbaprod][/home/tdbaprod] errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
5FC2DD4B   0225151808 I H ent2           PING TO REMOTE HOST FAILED
9F7B0FA6   0225151108 I H ent2           PING TO REMOTE HOST FAILED

LABEL:          ECH_PING_FAIL_BCKP
IDENTIFIER:     5FC2DD4B

Date/Time:       Mon Feb 25 14:41:06 2008
Sequence Number: 2140
Machine Id:      00CB85FF4C00
Node Id:         pl101
Class:           H
Type:            INFO
Resource Name:   ent2
Resource Class:  adapter
Resource Type:   ibm_ech
Location:

Description
PING TO REMOTE HOST FAILED

Probable Causes
CABLE
SWITCH
ADAPTER

Failure Causes
CABLES AND CONNECTIONS

        Recommended Actions
        CHECK CABLE AND ITS CONNECTIONS
        IF ERROR PERSISTS, REPLACE ADAPTER CARD.

Detail Data
FAILING ADAPTER
ent1
SWITCHING TO ADAPTER
ent0
Unable to reach remote host through backup adapter: switching over to primary adapter
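
To see how often an adapter logs these entries, the errpt summary lines can be tallied per identifier and
resource. The following is only a minimal sketch: count_errpt is a hypothetical helper name, and it assumes
the summary column layout shown above (IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION).

```shell
# count_errpt: tally errpt summary lines by identifier and resource name.
# Reads errpt summary output on stdin and prints "IDENTIFIER RESOURCE COUNT".
# Assumption: column 1 is the identifier and column 5 is the resource name.
count_errpt() {
    awk '$1 != "IDENTIFIER" && NF >= 5 { n[$1 " " $5]++ }
         END { for (k in n) print k, n[k] }' | sort
}
```

Typical use would be "errpt | count_errpt". A steadily growing count for 5FC2DD4B or 9F7B0FA6 on the same
EtherChannel device points at the backup path rather than at a one-off failover test.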


-- thread 1:

All our servers log this message every three minutes:

9F7B0FA6   0602080605 I H ent2           PING TO REMOTE HOST FAILED

The details of the message say it can't ping the default gateway through the backup adapter.
Why does it try this? And why does it fail, given that if we pull the primary cable it switches 
to the backup adapter with no problems?

Cheers


-- thread 2:

Hello:

I've seen similar things happen when the switch is not on "port host" (meaning the port begins receiving and sending 
packets quickly, instead of running Spanning Tree Protocol before going in the FORWARDING state): in this case, 
the EtherChannel sends the ping packets, they are dropped because the switch is still initializing, 
and the cycle continues on and on. Still, 30 minutes sounds like a long time.

You can try the following:

- verify that the EtherChannel switch ports are set to "port host" (i.e., STP should be disabled)

- on the VIOS, set num_retries to a higher value (default is 3) and/or set retry_time to a higher value (default is 1)

Does this ONLY happen when updating from FP74 to FP8, or every time the VIOS boots?

Kind regards,
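
The num_retries / retry_time suggestion above could be applied roughly as follows. This is a sketch under
assumptions: etherchannel_retry_cmds is a hypothetical helper that only prints the commands instead of
running them, the device name and the new values are examples, and on a live adapter chdev may refuse the
change because the device is busy.

```shell
# etherchannel_retry_cmds: print (do not run) the AIX commands that would
# show and then change the EtherChannel ping-retry attributes.
#   $1 = EtherChannel device (for example ent2)
#   $2 = new num_retries value
#   $3 = new retry_time value
etherchannel_retry_cmds() {
    dev=$1 retries=$2 interval=$3
    # Show the current values first, so the change can be reverted.
    echo "lsattr -El $dev -a num_retries -a retry_time"
    # Raise the number of lost pings tolerated and the interval between them.
    echo "chdev -l $dev -a num_retries=$retries -a retry_time=$interval"
}
```

For example, "etherchannel_retry_cmds ent2 5 2" prints the two commands for review before they are run as root.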


-- thread 3:

Hi All, 
I am getting the following error consistently on one of my servers. When I 
do an entstat -d ent3 | grep "Active channel", it does come back with Active 
channel: primary channel. Could you please provide me with any suggestions 
or steps I can take to fix this error? 

entstat -d ent2 | grep "Active channel"

Hi,
Is this just an EtherChannel, or an EtherChannel with a backup adapter connected to a failover switch in case everything fails?
If so, please read the following:
http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.rsct.doc/rsct_aix5l53/bl5adm05/bl5adm0559.html
Hope this helps


-- thread 4:

A VIOS network failover test produces the above error messages, so in that case there is no real problem.



Perl "out of memory!" error:
============================

Note 1:
-------

Q:

I'm getting errors like this 


Out of memory during "large" request for 528384 bytes, total sbrk() is 
16302080 bytes at ... line 176. 


when the server is busy. This is for a background perl script. What 
I would like to do is check whether memory for the request is 
available before running the offending command, and just pause 
everything until memory is available. Is that possible? 


Thanks all, 

A:

You could try investigating $^M... 


Note 2:
-------

In the last episode (Mar 01), Per olof Ljungmark said:
> I'm running imapsync that also uses p5-Mail-IMAPClient-2.2.9 to transfer 
> mailboxes between imap servers.
> 
> The following error occurs when a message has an attachment of more than 
> approx 35MB in size:
> "Out of memory during "large" request for 67112960 bytes, total sbrk() 
> is 487512064 bytes at /usr/local/bin/imapsync line 790."

According to that output, perl was already using 464MB, and a malloc
request for 64MB failed, which is reasonable since the default hard
datasize limit on FreeBSD is 512MB.  To raise it, put this in
/boot/loader.conf and reboot:

kern.maxdsiz="1024M"

> Running 5.3-STABLE a week or so old
> Perl 5.8.6 from ports.
> 
> What bothers me especially is that this error will not occur when I run 
> the same command from an old RH Linux (7.2) box. Appreciate comments 

Also running perl 5.8.6, on the same mailbox?  Maybe different perl
versions allocate memory differently.
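
The arithmetic behind that answer can be checked with shell integer math. A minimal sketch; to_mib and
would_exceed are hypothetical helper names, and the 512MB limit is the figure quoted above.

```shell
# to_mib: convert a byte count to whole mebibytes (integer division).
to_mib() {
    echo $(( $1 / 1048576 ))
}

# would_exceed: succeed if current usage plus a new request is over the limit.
#   $1 = bytes already allocated (the "total sbrk()" figure)
#   $2 = bytes requested
#   $3 = data size limit in bytes
would_exceed() {
    [ $(( $1 + $2 )) -gt "$3" ]
}
```

With the numbers from the error message, "to_mib 487512064" gives 464 and "to_mib 67112960" gives 64, and
"would_exceed 487512064 67112960 536870912" succeeds: the two together are more than 512MB, so the request fails.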


Note 3:
-------

Out of memory during "large" request for 528384 bytes, total sbrk() is 2330624 bytes at citationBuilder.pl line 37.

The script parses through a table to build another table. The first table has about 550 entries and the second 
that gets built by the script should have around 3500.

Not sure what to do to try to fix this. Any suggestions?


I am trying to use cpan and met the same problem.
Error message is as follows:

[xxx@alberni ~]$ cpan -i Module::Build


Out of memory during request for 4064 bytes, total sbrk() is 41822208 bytes!
Segmentation fault: 11

I want to use Plagger(http://plagger.org).
Does anyone use Plagger on Textdrive?


Re: out of memory error in perl script

Are you loading either of the tables in memory completely?

i.e.: instead of this code:


Code:
foreach my $line (<FILE>) # list context: reads the whole file into a temporary list first
{
    process($line);
}

Do this: 


Code:
while (my $line = <FILE>) # only reads one line into memory at a time
{
    process($line);
}

Re: out of memory error in perl script

Hi, stewartj,
Thank you for your response.

Well, I am not sure whether it reads all the tables into memory.
I am just trying to use the cpan command, which I think is built into the perl distribution.


Note 4:
-------

Out of memory!
Callback called exit.


"Callback called exit" is just a generic message when Perl encounters an unrecoverable error during perl_call_sv( ). 
mod_perl uses perl_call_sv( ) to invoke all handler subroutines. Such problems seem to occur far less often 
with Perl Version 5.005_03 than 5.004. It shouldn't appear with Perl Version 5.6.1 and higher.

Sometimes you discover that your server is not responding and its error_log file has filled up the remaining space 
on the filesystem. When you finally get to see the contents of the error_log file, it includes millions of lines like this:

Callback called exit at -e line 33, <HTML> chunk 1.

This is because Perl can get very confused inside an infinite loop in your code. It doesn't necessarily mean 
that your code called exit( ). It's possible that Perl's malloc( ) went haywire and called croak( ), 
but no memory was left to properly report the error, so Perl gets stuck in a loop writing that same message to STDERR.

Perl Version 5.005 and higher is recommended for its improved malloc.c, and also for other features that improve 
the performance of mod_perl and are turned on by default.


Note 5:
-------

Glenn wrote:
[...]
>>http://perl.apache.org/docs/1.0/guide/troubleshooting.html#Callback_called_exit
> 
> 
> I've followed that advice and explicitly allocated memory into $^M.
> I have the following in my mod_perl_startup.pl, which I run from
> httpd.conf with PerlRequire /path/to/mod_perl_startup.pl
> If 64K is not enough for you, try increasing the allocation.
> 
> Cheers,
> Glenn
> 
> use strict;
>                                                                                 
> ## ----------
> ## ----------
>                                                                                 
> ## This section is similar in scope to Apache::Debug.
> ## Delivers a stack backtrace to the error log when perl code dies.
>                                                                                 
> ## Allocate 64K as an emergency memory pool for use in out of memory situation
> $^M = 0x00 x 65536;
>                                                                                 
> ## Little trick to initialize this routine here so that in the case of OOM,
> ## compiling this routine doesn't eat memory from the emergency memory pool $^M
> use CGI::Carp ();
> eval { CGI::Carp::confess('init') };
>                                                                                 
> ## Importing CGI::Carp sets $main::SIG{__DIE__} = \&CGI::Carp::die;
> ## Override that to additionally give a stack backtrace
> $main::SIG{__DIE__} = \&CGI::Carp::confess;

Brian, you may want to include Glenn's useful tips as well in the patch.

__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com



Note 6:
-------








midaemon, scopeux, measureware:
===============================

The scopeux data collector, the midaemon (measurement interface daemon), and the alarmgen (alarm generator)
process are all part of the HP OpenView MeasureWare software that can run on a node.

A typical process list might show the following:

zd57l08:/home/root>ps -ef | grep /usr/lpp/perf/bin/
    root  176232       1   0   Dec 20      -  0:17 /usr/lpp/perf/bin/midaemon
    root  188536       1   0   Dec 20      -  0:00 /usr/lpp/perf/bin/ttd
    root  200830  254112   0   Dec 20      -  5:06 /usr/lpp/perf/bin/alarmgen -svr 254112 -t alarmgen /var/opt/perf/datafiles//
    root  204918       1   0   Dec 20      - 106:40 /usr/lpp/perf/bin/scopeux
    root  233644       1   0   Dec 20      -  0:07 /usr/lpp/perf/bin/llbd
    root  254112  266370   0   Dec 20      -  6:56 /usr/lpp/perf/bin/agdbserver -t alarmgen /var/opt/perf/datafiles/
    root  266370       1   0   Dec 20      -  3:42 /usr/lpp/perf/bin/perflbd
    root  307362  266370   0   Dec 20      -  8:22 /usr/lpp/perf/bin/rep_server -t SCOPE /var/opt/perf/datafiles/logglob

You can start, stop, or view the status of the processes by using the mwa command:

oranh202:/home/se1223>mwa status
MeasureWare scope status:
WARNING: scopeux    is not active (MWA data collector)

MeasureWare background daemon status:
(Should always be running when the system is up)
    Running ttd                   (Transaction Tracker daemon) pid 1493

MeasureWare server status:
    Running alarmgen              (alarm generator) pid 12547
    Running agdbserver            (alarm database server) pid 12546
    Running perflbd               (location broker) pid 12075

    The following data sources have running repository servers:
                            PID  DATA SOURCE
    Running rep_server    12521  SCOPE


How does mwa start on boot?
---------------------------

There is an entry in "/etc/inittab":

root@zd110l01.nl.eu.abnamro.com:/etc#cat inittab | grep mwa
mwa:2:wait:/etc/rc.mwa start >/dev/console  # Start MeasureWare


Check whether it is running:

oper@zd110l03:/home/oper$ ps -ef | grep /usr/lpp/
    root 147480      1   0   May 15      -  0:10 /usr/lpp/perf/bin/midaemon
    root 151694      1   0   May 02      - 38:59 /usr/lpp/perf/bin/llbd
    root 229394 340068   0   May 15      -  7:30 /usr/lpp/perf/bin/rep_server -t SCOPE /var/opt/perf/datafiles/logglob
    root 258142 393340   0   Jul 18      -  9:04 /usr/lpp/mmfs/bin/aix64/mmfsd64
    root 270588 372908   0   May 15      -  4:07 /usr/lpp/perf/bin/alarmgen -svr 372908 -t alarmgen /var/opt/perf/datafiles//
    root 315572      1   0   May 15      - 40:33 /usr/lpp/perf/bin/scopeux
    root 340068      1   0   May 15      -  4:45 /usr/lpp/perf/bin/perflbd
    root 352326      1   0   May 15      -  0:00 /usr/lpp/perf/bin/ttd
    root 372908 340068   0   May 15      -  6:19 /usr/lpp/perf/bin/agdbserver -t alarmgen /var/opt/perf/datafiles/
    root 393340      1   0   Jul 18      -  0:00 /bin/ksh /usr/lpp/mmfs/bin/runmmfs
  uctsp0 434430      1   0   May 04      - 58:36 /usr/lpp/uctsp0/control-sa/exe/AIX-51/p_ctsce
  uctsp0 503814 434430   0 22:41:01      -  0:00 /usr/lpp/uctsp0/control-sa/exe/AIX-51/./p_ctscd 57671683 57671682 434430
  uctsp0 536756 434430   0 22:41:01      -  0:00 /usr/lpp/uctsp0/control-sa/exe/AIX-51/./p_ctscs 57671681 60817408 434430


-- How to stop and start the agents:

1. Use the /etc/rc.mwa script, if it is available on your system:

root@zd110l13:/etc#rc.mwa stop

Shutting down Measureware collection software
         Shutting down scopeux, pid(s) 192688
         Waiting on 192688  (10 more tries)
         The Measureware collector, scopeux has been shut down successfully.

Shutting down the MeasureWare server daemons
         Shutting down the alarmgen process.  This may take a while
         depending upon how many monitoring systems have to be
         notified that MeasureWare Server is shutting down.


         Shutting down the perflbd process

         The perflbd process has terminated

         Shutting down the agdbserver process

         The agdbserver process has terminated

         Shutting down the rep_server processes

         The rep_server processes have terminated

The MeasureWare Server has been shut down successfully
root@zd110l13:/etc#rc.mwa start

The MeasureWare scope collector is being started.
         The Transaction Tracking daemon
         /usr/lpp/perf/bin/ttd has been started.

         The performance collection daemon
         /usr/lpp/perf/bin/scopeux has been started.

The MeasureWare server daemon "llbd" is being started.

The MeasureWare server daemons are being started.
         The MeasureWare Location Broker daemon
         /usr/lpp/perf/bin/perflbd has been started.


root@zd110l13:/etc#


Notes about mwa:
----------------

About MWA (MeasureWare Agent)
Introduction:
MeasureWare Agent software captures performance, resource, and transaction data from your 
HP 9000 server or workstation system.
Using minimal system resources, the software continuously collects, logs, summarizes, and 
timestamps data, and detects alarm conditions on current and historical data across your system.
You can analyze the data using spreadsheet programs, Hewlett-Packard analysis products such as PerfView,
or third-party analysis products.
Also, MeasureWare Agent provides data access to PerfView and sends alarm notifications to PerfView, 
HP OpenView, and IT/Operations.

HP OpenView MeasureWare Agent for UNIX has been renamed to HP OpenView Performance Agent for UNIX.

MeasureWare Agent uses data source integration (DSI) technology to receive, alarm on, and log data 
from external data sources such as applications, databases, networks, and other operating systems.
The comprehensive data logged and stored by MeasureWare Agent allows you to:
- Characterize the workloads in the environment.
- Analyze resource usage and load balance.
- Perform trend analyses on historical data to isolate and identify bottlenecks.
- Perform service-level management based on transaction response time.
- Perform capacity planning.
- Respond to alarm conditions.
- Solve system management problems before they arise.

Starting MWA Automatically
The process of starting MeasureWare Agent automatically whenever the system reboots is controlled 
by the configuration file /etc/rc.config.d/mwa.

This file defines two shell variables, MWA_START and MWA_START_COMMAND.

The default /etc/rc.config.d/mwa configuration file shipped with this version of MeasureWare Agent 
resides in /opt/perf/newconfig/
and assigns the following values to these variables:
MWA_START=1
MWA_START_COMMAND="/opt/perf/bin/mwa start"

When MeasureWare Agent is installed, the file is conditionally copied to /etc/rc.config.d/mwa and will 
not replace any existing /etc/rc.config.d/mwa configuration file that may have been customized by the user 
for a previous version of MeasureWare Agent.
When the file is copied to /etc/rc.config.d/mwa, the variable MWA_START=1 causes MeasureWare Agent to automatically 
start when the system reboots.
The variable MWA_START_COMMAND="/opt/perf/bin/mwa start" causes all MeasureWare Agent processes to initiate 
when the system reboots.

If you want MeasureWare Agent to start at system reboot using special options,
modify the /etc/rc.config.d/mwa file by changing MWA_START_COMMAND from its default value of 
"/opt/perf/bin/mwa start" to the desired value.

For example, to start up scopeux but not the servers, change the value to "/opt/perf/bin/mwa start scope".
To disable MeasureWare Agent startup when the system reboots, change the variable MWA_START=1 to MWA_START=0.
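
Whether autostart is currently enabled can be checked by reading the same file. A minimal sketch, assuming
the file only sets shell variables as described above; mwa_autostart is a hypothetical helper name.

```shell
# mwa_autostart: succeed if the rc config file enables MWA at boot.
#   $1 = config file (defaults to /etc/rc.config.d/mwa)
# Returns 0 if MWA_START=1, 1 if not, 2 if the file cannot be read.
mwa_autostart() {
    conf=${1:-/etc/rc.config.d/mwa}
    [ -r "$conf" ] || return 2
    # Source the file in a subshell so its variables do not leak into the caller.
    ( MWA_START=0; . "$conf"; [ "$MWA_START" = 1 ] )
}
```

For example: "mwa_autostart && echo enabled || echo disabled". Sourcing is used here because the file is
itself a shell fragment; the config file path should be given with a directory component.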

MWA Command:
SYNOPSIS
      mwa [action] [subsystem] [parms]

DESCRIPTION
      mwa is a script  that  is  used  to  start,  stop,  and  re-initialize MeasureWare Agent processes.

 ACTION
-?       List all mwa options.
         If your shell interprets ? as a wildcard character, use an invalid option such as -xxx instead of -?.
start    Start all or part of MeasureWare Agent. (default)
stop     Stop all or part of MeasureWare Agent.
restart  Reinitialize all or part of MWA. This option causes some processes to be stopped and restarted.
status   List the status of all or part of the MWA processes.
version  List the version of all or part of the MWA files.

 SUBSYSTEM
all      Perform the selected action on all MWA components. (default)

scope    Perform the selected action on the scopeux collector.
         The restart operation causes the scopeux collector to stop, then restart.
         This causes the parm and ttd.conf files to be re-read.

server   Perform the selected action on the MWA server components.
         This affects the data repositories as well as the alarm generation subsystem.
         The restart operation causes all repository server processes to terminate and restart.
         This causes the perflbd.rc and alarmdef files to be re-read.

alarm    Perform the selected action on the MWA server alarm component.
         Restart is the only valid option and causes the alarmdef file to be reprocessed.

db       Perform the selected action on the MWA server db component.

 PARMS
-midaemon <miparms>  Provide the midaemon with parameters to initiate it with other than default parameters.

Example:
phred01:/> mwa status
MeasureWare scope status:
WARNING: scopeux    is not active (MWA data collector)

MeasureWare background daemon status:
(Should always be running when the system is up)
    Running ttd                   (Transaction Tracker daemon) pid 1900

MeasureWare server status:
    Running alarmgen              (alarm generator) pid 2816
    Running agdbserver            (alarm database server) pid 2815
    Running perflbd               (location broker) pid 1945

    The following data sources have running repository servers:
                            PID  DATA SOURCE
    Running rep_server     2810  SCOPE
phred01:/> mwa stop

         Shutting down Measureware collection software..
NOTE:    The Transaction Tracker daemon, ttd  will be left running. pid 1900

         Shutting down the MeasureWare server daemons..
         Shutting down the alarmgen process.  This may take awhile
         depending upon how many monitoring systems have to be
         notified that MeasureWare Server is shutting down.
        
         The alarmgen process has terminated

         Shutting down the perflbd process

         The perflbd process has terminated

         The agdbserver process terminated

         The rep_server processes have terminated

         The MeasureWare Server has been shut down successfully

phred01:/> mwa start

The Transaction Tracker daemon is being started.
         The Transaction Tracker daemon
         /opt/perf/bin/ttd, is already running.

The MeasureWare scope collector is being started.
         The performance collection daemon
         /opt/perf/bin/scopeux has been started.
The MeasureWare server daemons are being started.
         The MeasureWare Location Broker daemon
         /opt/perf/bin/perflbd has been started.

phred01:/> mwa status
MeasureWare scope status:
    Running scopeux               (MWA data collector) pid 12361
    Running midaemon              (measurement interface daemon) pid 1936

MeasureWare background daemon status:
    Running ttd                   (Transaction Tracker daemon) pid 1900

MeasureWare server status:
    Running alarmgen              (alarm generator) pid 12907
    Running agdbserver            (alarm database server) pid 12906
    Running perflbd               (location broker) pid 12369

    The following data sources have running repository servers:
                            PID  DATA SOURCE
    Running rep_server    12905  SCOPE

References:
HP OpenView Performance Agent for HP-UX 10.20 and 11 Installation & Configuration Guide
man mwa(Command)



Thread 1 about scopeux:
-----------------------

Subject: restart vrs. start scopeux

mwa status reports scopeux not running. Manual states to use restart command to retain existing logs (status.scope). 
But, I'm more concerned about the database collected prior to "mysterious" end of scopeux. Will restart (or start) 
of scope (scopeux) preserve existing data?

Thanks.
Vic. 

Once the data is written to the log files, it stays there when scopeux stops and starts. The data is deleted 
after the logfile reaches its size limit and starts to wrap. The oldest data is overwritten first.

I just do a "mwa start scope" to restart scope. I usually don't do a "mwa restart" as sometimes one of the 
processes may not stop, usually perflbd. I do a "mwa stop", check everything with "mwa status", then do a "mwa start".

Sometimes when scopeux crashes, it was in the act of writing a record and only a partial record is written. 
This partial record will corrupt the database and scopeux will not start. In this case, the only way to 
start scopeux is to delete the database. It is a good idea to back up the databases frequently if the data is 
important to you.

HTH
Marty

If you only want to work with the Scope Collector itself (i.e., all other MeasureWare processes are running), 
do the following:

mwa start scope
or
mwa restart scope

This will narrow down what part of the MeasureWare product you are working with.

The status.scope file might help you figure out why scope stopped.


To see what may have happened to the scope daemon, look in its status file /var/opt/perf/status.scope. 
You can also use "perfstat -t" to see the last few lines of all the OVPA status files.

Since the "perfstat" command shows glance and OVPA (mwa) status, I recommend using perfstat instead of 
"mwa status" (also less to type!).

Since I'm in a recommending mood, I also recommend AGAINST doing a "mwa start scope" (or restart scope). 
The reason is that it's always better to restart the datacomm daemons when the underlying collector is restarted. 
Thus it's better to just do a "mwa restart" or "mwa start" instead of restarting scope out from under 
rep_server and friends. 

In any case, if perfstat shows everything running but scopeux, then first find out why scope died 
(by looking at status.scope) before doing any restarts.


Thread 2 about scopeux:
-----------------------

Subject: HELP OVPA: midaemon and scopeux won't start        
Jamie Chui 
 Jun 21, 2007 03:23:12 GMT      

I could not get midaemon and scopeux to start. When using glance, the following error message appears. 
What does it mean?

midaemon: Mon Jun 11 15:51:05 2007
mi_ki_init - only able to allocate 3 bufsets
Not enough space. 

I am using HP-UX 11.11 with OVPA C.04.55.00. 

Measureware ran for 10 days, and during this period, it had the following error message and then finally one day 
it stopped running. 

**** /opt/perf/bin/scopeux : 06/15/07 13:35:01 ****
WARNING: Measurement Buffers Lost see metric GBL_LOST_MI_TRACE_BUFFERS. (PE221-50)

How can I troubleshoot and get it running again?  
 

It looks like your OS is not allocating enough buffer space. You will need to increase your kernel parameters 
pertaining to buffer space and regen the kernel.

HTH
Marty




ctcasd daemon:
==============

The ctcasd daemon is used by the cluster security services library when UNIX-identity-based authentication 
is configured and active within the cluster environment. The cluster security services uses ctcasd 
when service requesters and service providers try to create a secured execution environment through 
a network connection. ctcasd is not used when service requesters and providers establish 
a secured execution environment through a local operating system connection such as a UNIX domain socket.

The daemon is actually part of RSCT, the reliable scalable cluster technology.
The Cluster Security (CtSec) component of RSCT provides a subservice
that performs authentication functions based on host identities.  This
subservice, called "ctcas", comes with a text formatted configuration
file shipped in /usr/sbin/rsct/cfg/ctcasd.cfg. 




Tivoli endpoint, lcfd process:
==============================


Tivoli Management Framework is the software infrastructure for many Tivoli software products. 
Using Tivoli Management Framework and a combination of Tivoli Enterprise products, you can manage 
large distributed networks with multiple operating systems, various network services, diverse system tasks, 
and constantly changing nodes and users. Tivoli Management Framework provides management services that are used 
by the installed Tivoli Enterprise products.

Tivoli Management Framework provides centralized control of a distributed environment, which can include 
mainframes, UNIX(R) operating systems, or Microsoft Windows operating systems. Using Tivoli Enterprise products, 
a single system administrator can perform the following tasks for thousands of networked systems:

-Manage user and group accounts 
-Deploy new or upgrade existing software 
-Inventory existing system configurations 
-Monitor the resources of systems either inside or outside the Tivoli environment 
-Manage Internet and intranet access and control 
-Manage third-party applications

Tivoli Management Framework lets you securely delegate system administration tasks to other administrators, 
giving you control over which systems an administrator can manage and what tasks that administrator 
can perform. Tivoli Management Framework includes the base infrastructure and base set of services 
that its related products use to provide direct control over specific resources in a distributed 
computing environment. Tivoli Management Framework provides a simple, consistent interface to 
diverse operating systems, applications, and distributed services.


>> Architecture overview:

Using this three-tiered hierarchy, the amount of communication with the Tivoli server is reduced. 
Endpoints do not communicate with the Tivoli server, except during the initial login process. 
All endpoint communication goes through the gateway. In most cases, the gateway provides all the support 
an endpoint needs without requiring communication with the Tivoli server.

In a smaller workgroup-size installation, you can create the gateway on the Tivoli server. 
The server can handle communication requirements when fewer computer systems are involved. 
This is not an acceptable option in large deployments. The Tivoli server in a large installation 
will be overloaded if it also serves as a gateway. Refer to Endpoints and gateways for more information 
about endpoint communication.


-- Tivoli servers

The Tivoli server includes the libraries, binaries, data files, and the graphical user interface (GUI) 
(the Tivoli desktop) needed to install and manage your Tivoli environment. The Tivoli server performs 
all authentication and verification necessary to ensure the security of Tivoli data. The following components 
comprise a Tivoli server: 

- An object database, which maintains all object data for the entire Tivoli region. 
- An object dispatcher, which coordinates all communication with managed nodes and gateways. 
  The object dispatcher process is the oserv, which is controlled by the oserv command. 
- An endpoint manager, which is responsible for managing all of the endpoints in the Tivoli region.

When you install the Tivoli server on a UNIX operating system, the Tivoli desktop is automatically installed. 
When you install the Tivoli server on a Windows operating system, you must install Tivoli Desktop for Windows 
separately to use the Tivoli desktop.


-- Managed nodes

A managed node runs the same software that runs on a Tivoli server. Managed nodes maintain their own 
object databases that can be accessed by the Tivoli server. When managed nodes communicate directly 
with other managed nodes, they perform the same communication or security operations that are performed 
by the Tivoli server.

The difference between a Tivoli server and a managed node is that the Tivoli server object database is global 
to the entire region including all managed nodes. In contrast, the managed node database is local to the 
particular managed node.

To manage a computer system that hosts the managed node, install an endpoint on that managed node.


-- Gateways

A gateway controls communication with and operations on endpoints. Each gateway can support thousands of endpoints. 
A gateway can launch methods on an endpoint or run methods on behalf of the endpoint.

A gateway is generally created on an existing managed node. This managed node provides access to the 
endpoint methods and provides the communication with the Tivoli server that the endpoints occasionally require. 
Refer to Endpoints and gateways for more information about gateways.

-- Endpoints
An endpoint provides the primary interface for system management. An endpoint is any system that runs 
the lcfd service (or daemon), which is configured using the lcfd command.

Typically, an endpoint is installed on a computer system that is not used for daily management operations. 
Endpoints run a very small amount of software and do not maintain an object database. The majority of systems 
in a Tivoli environment should be endpoints. The Tivoli desktop is not installed with the endpoint software. 
If you choose to run a desktop on an endpoint, you must install Tivoli Desktop for Windows or 
telnet to a UNIX managed node. Refer to Endpoints and gateways for more information about endpoints.


--- Autostart:

thread:

Autostart of the LCFD process on AIX server
  
 Technote (FAQ) 
  
Problem 
The AIX server comes preloaded with the Tivoli Endpoint software installed. How can you make this process 
autostart at bootup?  
  
Solution 
Create the /etc/inittab entry:

mkitab "rctma1:2:wait:/etc/rc.tma1 > /dev/console 2>&1 # Tivoli Management Agent"

Create the startup file "/etc/rc.tma1"
#!/bin/sh
#
# Start the Tivoli Management Agent
#
if [ -f /Tivoli/lcf/dat/1/lcfd.sh ]; then
/Tivoli/lcf/dat/1/lcfd.sh start
fi

When the OS reboots, the LCFD process starts automatically.

thread:

AIX comes with the Tivoli agents installed (but not configured). You should configure the last.cfg file (usually 
found in /../lcf/dat/1): add a line 'lcs.logininterfaces=gatewayip+9494'. This line instructs 
the endpoint to report to your existing Tivoli gateway.

Hi, the previous info is correct except for a typo.
It should be lcs.login_interfaces=ipoftmrserver
Stop the endpoint (sh lcfd.sh stop) and start it again (sh lcfd.sh start).
After successful registration you should see an lcf.dat file in the lcf/dat/1 folder.
 
 
 

Note 1:
-------

thread:

Q:

Dear friends, 

When I run "lslpp -l" I see the line:
"Tivoli_Management_Agent.client.rte 3.7.1.0 COMMITTED Management Framework Endpoint Runtime".
What is the purpose of this fileset?

PS: I have AIX 5.3.

Thanks a lot.

A:

The purpose of Tivoli Management Agent
Reply from nlx6976 on 3/6/2006 5:56:00 PM  

It's an agent that runs on your system as part of Tivoli Distributed 
Monitoring. It reports various things about your system back to the Tivoli 
Enterprise Console - usually your help desk. The basic monitors include 
things like file system usage (e.g. if a FS is more than 80% used the system 
gets flagged at the console), or monitoring log files. Basically you can 
configure it to monitor whatever you want.


Note 3:
-------

The lcfd.log file, found on each endpoint in the lcf/dat directory, contains logging messages for upcall methods, 
downcall methods, and the login activities of the endpoint. You also can view this log file from the http interface. 
In addition, lcfd.log can have different levels of debugging information written to it. 
To set the level of debugging, use the lcfd command with the -dlevel option, which sets the log_threshold option 
in the last.cfg file. Set the log_threshold at level 2 for problem determination, because level 3 often provides 
too much information.

Of the three log files, the lcfd.log file is sometimes the most useful for debugging endpoint problems. 
However, remote access to the endpoint is necessary for one-to-one contact.

Endpoint log messages have the following format: 

timestamp level app_name message

The message elements are as follows: 

timestamp 
  Displays the date and time that the message was logged. 
level 
  Displays the logging level of the message. 
app_name 
  Displays the name of the application that generated the message. 
message 
  Displays the full message text. The content of message is provided by the application specified in app_name. 
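
The field layout above can be split with awk. The following is only a sketch: it assumes the 
timestamp consists of three tokens, as in the sample log further down ("Nov 15 09:14:13"), so that 
level is field 4 and app_name is field 5. The function name lcfd_filter is hypothetical and not part 
of Tivoli.

```shell
#!/bin/sh
# Print level and message text of every log line written by one application.
# Assumes a three-token timestamp (e.g. "Nov 15 09:14:13"), so that
# field 4 is the level and field 5 is app_name.
# lcfd_filter is a hypothetical helper name, not a Tivoli command.
lcfd_filter() {
    app=$1
    logfile=$2
    awk -v app="$app" '
        $5 == app {
            # Drop the first five fields; what is left is the message text.
            msg = $0
            for (i = 1; i <= 5; i++) sub(/^[ \t]*[^ \t]+/, "", msg)
            sub(/^[ \t]+/, "", msg)
            print $4 ": " msg
        }' "$logfile"
}

# Usage: lcfd_filter lcfd /path/to/lcfd.log
```

Called with app_name lcfd, this would list only the "Spawning: ..." lines of a log like the sample below.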

The default limit of the log file is 1 megabyte, which you can adjust with the lcfd (or lcfd.sh) command 
with the -D log_size=max_size option. The valid range is 10240 through 10240000 bytes. When the maximum size 
is reached, the file is trimmed to approximately 200 messages and logging continues. 

In addition to these three log files, the following files, located on the endpoint, 
help troubleshoot endpoint problems: 

last.cfg 
  A text file that contains the endpoint and gateway login configuration information from the last time 
  the endpoint successfully logged in to its assigned gateway. Use this file to review the configuration settings 
  for an endpoint. 
lcf.id 
  A text file that contains a unique ID number to represent the endpoint. This file is uniquely generated 
  if the TMEID.tag file does not exist. 
lcf.dat 
  A binary file that contains the gateway login information. You cannot modify this information; however, you can 
  view network configuration information from the http interface. 

Of these files, the last.cfg file can be useful in determining problems with an endpoint. 
The last.cfg file resides in the \dat subdirectory of the endpoint installation and also can be viewed 
from the http interface. This file contains configuration information for the endpoint. 

The following example shows the contents of a last.cfg file: 

lcfd_port=9495
lcfd_preferred_port=9495
gateway_port=9494
protocol=TCPIP
log_threshold=1
start_timeout=120
run_timeout=120
lcfd_version=41100
logfile=C:\Program Files\Tivoli\lcf\dat\1\lcfd.log
config_path=C:\Program Files\Tivoli\lcf\dat\1\last.cfg
run_dir=C:\Program Files\Tivoli\lcf\dat\1
load_dir=C:\Program Files\Tivoli\lcf\bin\w32-ix86\mrt
lib_dir=C:\Program Files\Tivoli\lcf\bin\w32-ix86\mrt
cache_loc=C:\Program Files\Tivoli\lcf\dat\1\cache
cache_index=C:\Program Files\Tivoli\lcf\dat\1\cache\Index.v5
cache_limit=20480000
log_queue_size=1024
log_size=1024000
udp_interval=300
udp_attempts=6
login_interval=1800
lcs.machine_name=andrew1
lcs.crypt_mode=196608
lcfd_alternate_port=9496
recvDataTimeout=2
recvDataNumAttempts=10
recvDataQMaxNum=50
login_timeout=300
login_attempts=3

When you change endpoint configuration with the lcfd command, the last.cfg file changes. Therefore, you should 
not modify the last.cfg file. If you require changes, use the lcfd command to make any changes. 
However, running the lcfd command requires stopping and restarting the endpoint.
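
Since last.cfg should only be read, not edited, a small read-only lookup can be handy when reviewing 
settings. This is a minimal sketch, assuming the plain key=value layout shown in the example above; 
the helper name cfg_get is hypothetical.

```shell
#!/bin/sh
# Read one setting from a last.cfg-style file without modifying it.
# cfg_get is a hypothetical helper name. Values may themselves contain '='
# or backslashes, so only the first '=' on a line is treated as separator.
cfg_get() {
    key=$1
    file=$2
    awk -v key="$key" '
        {
            pos = index($0, "=")
            if (pos > 0 && substr($0, 1, pos - 1) == key) {
                print substr($0, pos + 1)
                exit
            }
        }' "$file"
}

# Usage: cfg_get log_threshold /path/to/lcf/dat/1/last.cfg
```

The key is compared exactly, so asking for lcfd_port does not also match lcfd_preferred_port.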

Another useful tool for endpoint problem determination is the output from the wtrace command. 
The wtrace command is useful for tracking upcall and downcall method failures. To learn more about the wtrace command, 
see Troubleshooting the Tivoli environment.


sample logfile "root@zd110l13:/beheer/Tivoli/lcf/dat/1/lcfd.log"


Nov 15 09:14:13 1 engineUpdate Sending msg amRaAdd
Nov 15 09:14:13 1 engineUpdate Sending msg amTmrRemove
Nov 15 09:14:13 1 engineUpdate Sending msg amMpeRemove
Nov 15 09:14:13 1 engineUpdate Sending msg amRaRemove
Nov 15 09:14:13 1 engineUpdate Sending msg amTmrAdd
Nov 15 09:14:13 1 engineUpdate Sending msg amMpeAdd
Nov 15 09:14:13 1 engineUpdate Sending msg amRaAdd
Nov 15 09:14:13 1 engineUpdate Sending msg amRaAdd
Nov 15 09:14:13 1 engineUpdate Sending msg amRaAdd
Nov 15 09:14:13 1 engineUpdate Sending msg amRaAdd
Nov 15 09:14:13 1 engineUpdate Sending msg amRaAddTasks
Nov 15 09:14:13 1 engineUpdate Sending msg amEndPush
Nov 15 09:18:48 1 lcfd Spawning: /beheer/Tivoli/lcf/dat/1/cache/bin/aix4-r1/TME/Tmw2k/tmw2k_ep, ses: 215c89e6
Nov 15 09:28:46 1 lcfd Spawning: /beheer/Tivoli/lcf/dat/1/cache/bin/aix4-r1/TME/Tmw2k/tmw2k_ep, ses: 215c8a22
Nov 15 09:33:51 1 lcfd Spawning: /beheer/Tivoli/lcf/dat/1/cache/bin/aix4-r1/TME/Tmw2k/tmw2k_ep, ses: 215c8a4e
Nov 15 09:48:55 1 lcfd Spawning: /beheer/Tivoli/lcf/dat/1/cache/bin/aix4-r1/TME/Tmw2k/tmw2k_ep, ses: 215c8ac6
Nov 15 10:03:58 1 lcfd Spawning: /beheer/Tivoli/lcf/dat/1/cache/bin/aix4-r1/TME/Tmw2k/tmw2k_ep, ses: 215c8b5b
Nov 15 10:19:02 1 lcfd Spawning: /beheer/Tivoli/lcf/dat/1/cache/bin/aix4-r1/TME/Tmw2k/tmw2k_ep, ses: 215c8c4a
Nov 15 10:34:05 1 lcfd Spawning: /beheer/Tivoli/lcf/dat/1/cache/bin/aix4-r1/TME/Tmw2k/tmw2k_ep, ses: 215c8cb6


root@zd110l05:/#find . -name "*tecad*" -print

./tmp/.tivoli/.tecad_logfile.fifo.zd110l05.aix-default
./tmp/.tivoli/.tecad_logfile.lock.zd110l05.aix-default
./tmp/.tivoli/.tecad_logfile.fifo.zd110l05.aix-defaultlogsourcepipe
./etc/Tivoli/tecad
./etc/Tivoli/tecad.1011792
./etc/Tivoli/tecad.1011792/bin/init.tecad_logfile
./etc/Tivoli/tec/tecad_logfile.cache
./etc/rc.tecad_logfile
./etc/rc.shutdown-pre-tecad_logfile
./etc/rc.tecad_logfile-pre-tecad_logfile
./etc/rc.tivoli_tecad_mqseries
find: 0652-023 Cannot open file ./proc/278708.
find: 0652-023 Cannot open file ./proc/315572.
find: 0652-023 Cannot open file ./proc/442616.
find: 0652-023 Cannot open file ./proc/475172.
./beheer/Tivoli/lcf/dat/1/cache/out-of-date/init.tecad_logfile
./beheer/Tivoli/lcf/dat/1/cache/out-of-date/tecad-remove-logfile.sh
./beheer/Tivoli/lcf/dat/1/cache/bin/aix4-r1/TME/TEC/adapters/bin/tecad_logfile.cfg
./beheer/Tivoli/lcf/dat/1/LCFNEW/CTQ/logs/trace_mqs_start_tecad__MQS_CC.Q3P0063__1__p1052790.log
./beheer/Tivoli/lcf/bin/aix4-r1/TME/TEC/adapters/bin/tecad_logfile
./beheer/Tivoli/lcf/bin/aix4-r1/TME/TEC/adapters/bin/init.tecad_logfile
./beheer/Tivoli/lcf/bin/aix4-r1/TME/TEC/adapters/bin/tecad-remove-logfile.sh
./beheer/Tivoli/lcf/bin/aix4-r1/TME/TEC/adapters/aix-default/etc/C/tecad_logfile.fmt
./beheer/Tivoli/lcf/bin/aix4-r1/TME/TEC/adapters/aix-default/etc/tecad_logfile.err
./beheer/Tivoli/lcf/bin/aix4-r1/TME/TEC/adapters/aix-default/etc/tecad_logfile.conf
./beheer/Tivoli/lcf/bin/aix4-r1/TME/TEC/adapters/aix-default/etc/tecad_logfile.cds
./beheer/Tivoli/lcf/bin/aix4-r1/TME/MQS/bin/tecad_mqseries.cfg
./beheer/Tivoli/lcf/bin/aix4-r1/TME/MQS/bin/tecad_mqseries.mqsc
./beheer/Tivoli/lcf/bin/aix4-r1/TME/MQS/bin/tecad_mqseries_nontme
./beheer/Tivoli/lcf/bin/aix4-r1/TME/MQS/bin/tecad_mqseries_tmegw
./beheer/Tivoli/lcf/bin/generic_unix/TME/MQS/sh/mqs_start_tecad.sh
./beheer/Tivoli/lcf/bin/generic_unix/TME/MQS/sh/mqs_stop_tecad.sh
./beheer/Tivoli/lcf/bin/generic_unix/TME/MQS/teccfg/tecad_mqseries.Q3P0063.cfg



dircmp:
=======

  
Linux / Unix dircmp command

About dircmp
Lists files in both directories and indicates whether the files in the directories are the same and/or different.

Syntax
dircmp [-d] [-s] [-w n] directoryone directorytwo

-d Compare the contents of files with the same name in both directories and output a list telling what 
must be changed in the two files to bring them into agreement. The list format is described in diff(1). 
-s Does not tell you about the files that are the same. 
-w n Change the width of the output line to n characters. The default width is 72. 
directoryone The first directory for comparing. 
directorytwo The second directory for comparing. 

Examples
dircmp dir1 dir2 - Compares the directory dir1 with the directory dir2. Below is an example of the output 
you may receive when typing this command.

Feb 8 17:18 2001 Comparison of help issues Page 1

directory .
same ./favicon.ico
same ./logo.gif
same ./question.gif
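
On systems where dircmp is not installed, a similar report can be approximated with cmp. The sketch 
below is not a replacement for dircmp: it looks only at regular files directly inside the two 
directories, and the function name compare_dirs is hypothetical.

```shell
#!/bin/sh
# Rough stand-in for "dircmp -s": report files that exist in both
# directories but differ, plus files that exist on one side only.
# compare_dirs is a hypothetical name; only regular files at the top
# level of each directory are examined.
compare_dirs() {
    dir1=$1
    dir2=$2
    for path in "$dir1"/*; do
        [ -f "$path" ] || continue
        name=${path##*/}
        if [ ! -f "$dir2/$name" ]; then
            echo "only in $dir1: $name"
        elif ! cmp -s "$path" "$dir2/$name"; then
            echo "different: $name"
        fi
    done
    for path in "$dir2"/*; do
        [ -f "$path" ] || continue
        name=${path##*/}
        [ -f "$dir1/$name" ] || echo "only in $dir2: $name"
    done
}

# Usage: compare_dirs dir1 dir2
```

Identical files produce no output, which mirrors what the -s option does.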


kmcrca:
=======

Part of the IBM Tivoli OMEGAMON XE for WebSphere MQ suite.


KMCRCA Starts IBM Tivoli OMEGAMON XE for WebSphere MQ Configuration
KMQIRA Starts IBM Tivoli OMEGAMON XE for WebSphere MQ Monitoring
KRARLOFF Converts the historical data file to a neutral file format for use with
various analytical programs



FLASHCOPY:
==========

Some notes about flashcopy implementations:


Note 1:
=======

What is FlashCopy?
FlashCopy is a function designed to create an instant "copy" of some data. When an administrator issues a 
FlashCopy command that essentially says "make a copy of this data," SVC via FlashCopy immediately provides 
the appearance of having created a copy of the data, when in reality it creates the physical copy 
in the background before moving that copy to an alternative data-storage device, which can take some time 
depending on the size of the backup copy. However, it creates the appearance of having completed 
the copy instantaneously, so customers can have a backup copy available as soon as the command is issued, 
even though copying to a different storage medium takes place behind the scenes.

"Because it operates very quickly in this way, FlashCopy allows customers to make a copy and immediately 
move on to other work without having to wait for the data to actually physically be copied from one place 
to another," says Saul. "In that regard, SVC FlashCopy is very similar to FlashCopy on the DS8000, for example, 
with the difference being SVC FlashCopy operates on most storage devices attached to the SVC, spanning many 
different disk systems."



Note 2:
=======

FlashCopy
FlashCopy is an IBM feature supported on ESS (Enterprise Storage Servers) that allows you to make nearly 
instantaneous Point in Time copies of entire logical volumes or data sets. The HDS (Hitachi Data Systems) 
implementation providing similar function is branded as ShadowImage. Using either implementation, 
the copies are immediately available for both read and write access. 

-- FlashCopy Version 1
The first implementation of FlashCopy, Version 1 allowed entire volumes to be instantaneously "copied" to 
another volume by using the facilities of the newer Enterprise Storage Subsystems (ESS). 

Version 1 of FlashCopy had its limitations, however. Although the copy (or "flash") of a volume occurred 
instantaneously, the FlashCopy commands were issued sequentially and the ESS required a brief moment 
to establish the new pointers. Because of this minute processing delay, the data residing on two volumes 
that were FlashCopied is not exactly time consistent. 

-- FlashCopy Version 2
FlashCopy Version 2 introduced the ability to flash individual data sets and more recently added support 
for "consistency groups". FlashCopy consistency groups can be used to help create a consistent point-in-time 
copy across multiple volumes, and even across multiple ESSs, thus managing the consistency of dependent writes. 

FlashCopy consistency groups are used in a single-site scenario in order to create a time-consistent copy of data 
that can then be backed-up and sent offsite, or in a multi-site Global Mirror for ESS implementation to force 
time consistency at the remote site. 

The implementation of consistency groups is not limited to FlashCopy. Global Mirror for z/Series (formerly known 
as XRC or eXtended Remote Copy) also creates consistency groups to asynchronously mirror disk data from one site 
to another over any distance. 



Note 3:
-------

http://www.ibm.com/developerworks/forums/thread.jspa?messageID=13967589


Q:

Using target volume from FlashCopy on same LPAR as source volume going thru VIO server 
Posted: Jun 28, 2007 12:21:09 PM
 
Synopsis:

DS4500 logical drive mapped to a p5 550 VIO server, then mapped to an AIX partition. Without interrupting 
the source drive, created a flashcopy of the drive and mapped it to the same VIO server, then again to 
the same partition. This caused duplicate VGID on the system. Had to varyoff and export the volume group 
to run recreatevg against the flashcopy hdisk and make a new volume group with it. This works fine 
the first time, however after I varyoff the vg and export it, then disable the flashcopy, and re-create 
it I cannot import or varyon the vg on the partition. importvg and recreatevg both say the hdisk belongs 
to a different vg so they don't work. The varyonvg fails because the descriptors are not consistent.

How do I create a flashcopyvg on this partition using virtual disk from the VIO so that the process 
is repeatable and thus scriptable without having to interrupt the source volume group every time I do this? 
The intent is to be able to run a backup process against the flashcopy, then disable it and do it again a few hours 
later, repeating this several times each day. We are using legacy VSAM instead of a DB and need to keep the 
data accessible to our CICS system, while being able to capture point-in-time backups throughout the day.  
 

A:
 
Did you rmdev the vpath and hdisks before recreating the flash copy? Then you will need to run recreatevg again, 
as restarting the flash copy will change the pvid back to the same as the source volume.

Why not just attach the flash copy to another host? Then you won't need to run recreate vg and you could assign 
the flash copy to the original host if you need to recover the data. 
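
The sequence suggested in the answer (remove the devices, re-create the copy, run recreatevg again) 
could be scripted roughly as follows. This is only a sketch under stated assumptions: the volume 
group name fcvg and the disk name hdisk5 are hypothetical, file systems in the copy are assumed to be 
unmounted already, and re-creating the FlashCopy on the storage side is left as a placeholder because 
the thread does not say which tool is used. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of a repeatable cycle for a FlashCopy target on the source's LPAR.
# FC_VG and FC_DISK are hypothetical names; adjust them for the real system.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
FC_VG=${FC_VG:-fcvg}
FC_DISK=${FC_DISK:-hdisk5}
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

release_copy() {
    run varyoffvg "$FC_VG"      # take the copy's volume group offline
    run exportvg "$FC_VG"       # remove its definition from this system
    run rmdev -dl "$FC_DISK"    # delete the hdisk definition as well
}

attach_copy() {
    run cfgmgr                              # rediscover the re-created copy
    run recreatevg -y "$FC_VG" "$FC_DISK"   # build a new VG from the copy
}

release_copy
# ... disable and re-create the FlashCopy on the storage subsystem here ...
attach_copy
```

Run it with DRY_RUN=0 only after checking the printed commands against the actual device names.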
 




==============================
2. NOTES ABOUT SHELL PROGRAMS:
==============================

-------------------------------------------------------------
NOTE 1:

# sh dothat         (the dothat file contains commands)

# chmod 755 dothat  (or use: chmod +x dothat)
# dothat            (now it's executable)
-------------------------------------------------------------
NOTE 2:

# now=`date`    (variable is output of a command)
# echo $now

This means that commands are read from the string between two ` `.
Usage in a nested command goes like this:
font=`grep font \`cat filelist\``
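
The same substitutions can also be written as $(...), which POSIX shells such as ksh and bash accept 
(a very old Bourne shell may not) and which nests without backslashes. A short sketch; the file 
named filelist and the helper find_fonts are hypothetical.

```shell
#!/bin/sh
# Command substitution with $(...) instead of backticks.
now=$(date)
echo "$now"

# Nested substitution needs no backslashes: the inner $(cat ...) runs first
# and its output (a list of file names) becomes the argument list of grep.
# The list is left unquoted on purpose so that it is split into file names.
# find_fonts and the list file passed to it are hypothetical.
find_fonts() {
    grep font $(cat "$1")
}

# Usage: font=$(find_fonts filelist)
```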

-------------------------------------------------------------
NOTE 3:

To extend the PATH variable, on most systems use a statement like the following example:

$ export PATH=$PATH:$ORACLE_HOME/bin
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib

PATH=.:/usr/bin:$HOME/bin:/net/glrr/files1/bin  
export PATH  
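
When a profile is sourced more than once, plain appending makes PATH grow with duplicates. A small 
sketch that appends a directory only when it is not yet listed; the function name path_append and 
the directory used in the call are illustrative assumptions.

```shell
#!/bin/sh
# Append a directory to PATH only if it is not already listed.
# path_append is a hypothetical helper name.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

path_append /usr/local/bin
```

Wrapping PATH in colons lets one pattern match an entry at the start, in the middle, or at the end.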


-------------------------------------------------------------
NOTE 4:

Positional parameters:


program, function or shell                         $0
argument 1 through 9                               $1 .. $9
nth argument                                       ${n}
number of positional parameters                    $#
every positional parameter                         $@, $*
decimal value returned by last executed cmd        $?     if the former was successful this will be 0
pid of shell                                       $$
pid of last backgrounded command                   $!


Examples of usage of $n parameter:
----------------------------------

# dothis grapes apples pears

$1=grapes, $2=apples, $3=pears


# cat makelist
sort +1 -2 people | tr -d "[0-9]" | pr -h Distribution | lp

  This works only on the one file named in the script, in this example people.

# cat makelist
sort +1 -2 $1 | tr -d "[0-9]" | pr -h Distribution | lp

  This will work on ANY file given as an argument to makelist

  # makelist file1
  # makelist file2
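
The special parameters from the table can be tried out inside a function, because a function sees 
its own arguments as $1, $2 and so on, just as a script does. A minimal sketch; show_args is a 
hypothetical name.

```shell
#!/bin/sh
# Demonstrate $#, $1, shift and $* with the arguments of a function.
# show_args is a hypothetical name.
show_args() {
    echo "count: $#"
    echo "first: $1"
    shift                      # drop $1; the old $2 becomes $1
    echo "after shift: $*"
}

show_args grapes apples pears

# $? holds the exit status of the command that ran last.
true
echo "exit status of true: $?"
```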

-------------------------------------------------------------
NOTE 5:

# echo "Hi there $LOGNAME"
Hi there Albert

# echo 'Hi there $LOGNAME'
Hi there $LOGNAME

# echo "You owe me \$$amount" | mail 

Single-quotes are literal quotes.
Double-quotes can have their contents expanded

-------------------------------------------------------------
NOTE 6: 

How to set variables:
--------------------

- A variable name must begin with a letter or an underscore and can contain letters, digits, and underscores,
but no other special characters.

- A variable is set with an assignment of NAME=value
Be sure not to have any white space before or after the equals sign =.
Double quotes are used when white space is present in the text string you are assigning to the variable.
So here are a few examples:

ME=bill
BC="bill clinton"

Now the shell can react and use the variable $ME and it substitutes the value for that variable.

Note that there must be no spaces around the "=" sign: VAR=value works; VAR = value doesn't work


Local and environment variables:
--------------------------------

Variables that you set are local to the current shell unless you mark them for export.
Variables marked for export are called environment variables, and will be made available
to any command that the shell creates. The following command marks the variable BC for export:

export BC

You can list all variables, local and exported, by typing the set command.
You can list the environment variables by using the env command.
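
The difference between a local and an exported variable shows up as soon as a child process is 
started. A short sketch using the two variables from the examples above; child_view is a 
hypothetical name.

```shell
#!/bin/sh
# A plain assignment stays in the current shell; export makes the variable
# part of the environment that child processes inherit.
unset ME BC                  # start from a clean state
ME=bill
BC="bill clinton"
export BC

# The child shell started by sh -c sees BC but not ME.
# child_view is a hypothetical name.
child_view() {
    sh -c 'echo "ME=${ME:-<unset>} BC=${BC:-<unset>}"'
}

child_view
```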

-------------------------------------------------------------
NOTE 7:

for file in list_of_values
do
  sort +1 -2 $file | tr -d "[0-9]" | pr -h Distribution | lp
done


if test $# -eq 0
  then echo "You must give a filename"
  exit 1
fi


-eq=equal, -ne=not equal, -gt=greater than, -lt=less than, -ge=greater or equal, -le=less or equal

-------------------------------------------------------------
NOTE 8:

UNIX$ for file in `ls /local/ssl/misc/*` 
> do 
> echo I found a config file $file
> echo Its type is `/usr/bin/file $file`
> done

-------------------------------------------------------------
NOTE 9:

If a script is to accept arguments then these can be referred to as ` $1 $2 $3..$9'. 
There is a logical limit of nine arguments to a Bourne script, but Bash handles the next arguments as `${10}'. 
`$0' is the name of the script itself. 

Here is a simple Bash script which prints out all its arguments. 

#!/bin/bash
# 
# Print all arguments (version 1)
#

for arg in $*
do
  echo Argument $arg
done

echo Total number of arguments was $#


The `$*' symbol stands for the entire list of arguments and `$#' is the total number of arguments.
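
An unquoted $* splits arguments that contain spaces into separate words. Where that matters, the 
quoted form "$@" keeps every argument intact. A sketch of the same loop written that way, as a 
function so it can be tried directly; print_args is a hypothetical name.

```shell
#!/bin/sh
# Print all arguments (version 2): quoted "$@" expands to one word per
# original argument, even when an argument contains spaces.
# print_args is a hypothetical name.
print_args() {
    for arg in "$@"
    do
        echo "Argument $arg"
    done
    echo "Total number of arguments was $#"
}

print_args "bill clinton" apples
```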

-------------------------------------------------------------
NOTE 10: Start and End of Command

A command starts with the first word on a line or, if it's the second command on a line, 
with the first word after a ";".
A command ends either at the end of the line or with a ";". So one can put several commands onto one line:

print -n "Name: "; read name; print ""


One can continue a command over more than one line with a "\" immediately followed by a newline, 
which is produced by the return key:

grep filename | sort -u | awk '{print $4}' | \
uniq -c >> /longpath/file

-------------------------------------------------------------
NOTE 11:

Bash and the Bourne shell have an array of tests. They are written as follows. 
Notice that `test' was originally not a part of the shell, but a separate program which works out 
conditions and provides a return code (most current shells also implement it as a builtin). 
See the manual page on `test' for more details. 

test on file characteristics:

test -f file        True if the file is a plain file 
test -d file        True if the file is a directory 
test -r file        True if the file is readable 
test -w file        True if the file is writable 
test -x file        True if the file is executable 
test -h file        True if the file is a symbolic link
test -s file        True if the file contains something 
test -g file        True if setgid bit is set 
test -u file        True if setuid bit is set 

string comparisons:

test s1 = s2        True if strings s1 and s2 are equal 
test s1 != s2       True if strings s1 and s2 are unequal 

numeric comparisons:

test x -eq y        True if the integers x and y are numerically equal 
test x -ne y        True if integers are not equal 
test x -gt y        True if x is greater than y 
test x -lt y        True if x is less than y 
test x -ge y        True if x>=y 
test x -le y        True if x <= y 

! Logical NOT operator 
-a Logical AND 
-o Logical OR 

Note that an alternate syntax for writing these commands is to use the square brackets,
instead of writing the word test. 

 [ $x -lt $y ]   "=="    test $x -lt $y

Just as with the arithmetic expressions, Bash 2.x provides a syntax for conditionals 
which are more similar to Java and C. While arithmetic C-like expressions can be used within double parentheses, 
C-like tests can be used within double square brackets. 

 [[ $var == "OK" || $var == "yes" ]]

This C-like syntax is not allowed in the Bourne shell, but is equivalent to 

[ $var = "OK" -o $var = "yes" ]

which is valid in both shells. 
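
With single brackets, an empty or unset variable disappears from the command line and can leave a 
malformed test behind. Quoting the variable avoids that. A minimal sketch; is_affirmative is a 
hypothetical name.

```shell
#!/bin/sh
# Quote variables inside [ ] so that an empty or unset value still leaves
# a well-formed test. is_affirmative is a hypothetical helper name.
is_affirmative() {
    var=$1
    # Two separate tests joined by || avoid the ambiguities of -o.
    [ "$var" = "OK" ] || [ "$var" = "yes" ]
}

if is_affirmative ""; then
    echo "matched"
else
    echo "no match, and no syntax error from the empty string"
fi
```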

Arithmetic C-like tests can be used within double parentheses so that under Bash 2.x the following tests are equivalent: 

 [ $x -lt $y ]   "==" (( x < y ))


Test is used by virtually every shell script written. It may not seem that way, because test 
is not often called directly. test is more frequently called as [. [ is a symbolic link to test, 
just to make shell programs more readable. It is also normally a shell builtin (which means that the shell 
itself will interpret [ as meaning test, even if your Unix environment is set up differently):

$ type [
[ is a shell builtin

$ which [
/usr/bin/[

$ ls -l /usr/bin/[
lrwxrwxrwx 1 root root 4 Mar 27 2000 /usr/bin/[ -> test

This means that '[' is actually a program, just like ls and other programs, so it must be surrounded by spaces:

if [$foo == "bar" ]    # this is wrong !

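will not work; it is interpreted as if you had written test$foo == "bar" ], that is, the shell looks for 
a command whose name is "[" immediately followed by the value of $foo. 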

if [ $foo == "bar" ]   # this is OK !


Test is a simple but powerful comparison utility. For full details, run man test on your system, 
but here are some usages and typical examples. 

Test is most often invoked indirectly via the if and while statements. It is also the reason you will come 
into difficulties if you create a program called test and try to run it, as this shell builtin 
will be called instead of your program! 

The syntax for if...then...else... is:

if [ ... ]
then
  # if-code
else
  # else-code
fi

Note that fi is if backwards! This is used again later with case and esac.

Also, be aware of the syntax - the "if [ ... ]" and the "then" commands must be on different lines. 
Alternatively, the semicolon ";" can separate them:

if [ ... ]; then
  # do something
fi

You can also use the elif, like this:

if [ something ]; then
  echo "Something"
elif [ something_else ]; then
  echo "Something else"
else
  echo "None of the above"
fi

This will echo "Something" if the [ something ] test succeeds, otherwise it will test [ something_else ], 
and echo "Something else" if that succeeds. 
If all else fails, it will echo "None of the above". 


Example:

#!/bin/ksh


if [ `whoami` != root ]             # remember string comparison: test s1 != s2
then
  echo RUN AS ROOT !!!
  exit
fi


Sometimes you will encounter the (( variable == value )) syntax, as in

if (( FILE_FOUND == 0 ))
then

FILE_FOUND is numeric (an integer), and this means: if FILE_FOUND equals 0 then ...


Some more testing examples:
---------------------------

     if [ -s file ]         # file is more than 0 bytes, it contains something
     then
        whatever code
     fi

Checking char/strings:

     if [ $myvar = "hello" ]
     then
        echo "We have a match"
     fi

Checking numbers:

  n1 -eq n2      Check to see if n1 equals n2.
  n1 -ne n2      Check to see if n1 is not equal to n2.
  n1 -lt n2      Check to see if n1 < n2.
  n1 -le n2      Check to see if n1 <= n2.
  n1 -gt n2      Check to see if n1 > n2.
  n1 -ge n2      Check to see if n1 >= n2.

     if [ $# -gt 1 ]
     then
        echo "ERROR: should have 0 or 1 command-line parameters"
     fi

Boolean operators
  !     not
  -a    and
  -o    or

if [ $num -lt 10 -o $num -gt 100 ]
then
  echo "Number $num is out of range"
elif [ ! -w $filename ]
then
  echo "Cannot write to $filename"
fi


if [ $myvar = "y" ]
then
   echo "Enter count of number of items"
   read num
      if [ $num -le 0 ]
      then
         echo "Invalid count of $num was given"
      else
         ... do whatever ...
      fi
fi




-------------------------------------------------------------
NOTE 12:

alphabet="a b c d e"			# Initialise a string
count=0					# Initialise a counter
for letter in $alphabet			# Set up a loop control
do					# Begin the loop
    count=`expr $count + 1`		# Increment the counter
    echo "Letter $count is [$letter]"	# Display the result
done					# End of loop


alphabet="a b c d e"						# Initialise a string
count=0								# Initialise a counter
while [ $count -lt 5 ]						# Set up a loop control
do								# Begin the loop
    count=`expr $count + 1`					# Increment the counter
    position=`expr $count + $count - 1`   			# Position of next letter (bc reads stdin, so expr is used)
    letter=`echo "$alphabet" | cut -c$position-$position`	# Get next letter 
    echo "Letter $count is [$letter]"				# Display the result
done								# End of loop
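
In ksh, bash and other POSIX shells the arithmetic can be done by the shell itself with $(( )), 
which saves starting expr for every calculation. A sketch of the same loop written that way; the 
function name letters_by_position is hypothetical.

```shell
#!/bin/sh
# Same loop as above, using $(( )) arithmetic expansion instead of expr.
# letters_by_position is a hypothetical name.
alphabet="a b c d e"

letters_by_position() {
    count=0
    while [ "$count" -lt 5 ]
    do
        count=$((count + 1))
        position=$((count * 2 - 1))     # letters sit at columns 1, 3, 5, ...
        letter=$(echo "$alphabet" | cut -c"$position")
        echo "Letter $count is [$letter]"
    done
}

letters_by_position
```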


if [ -f $dirname/$filename ]
then
    echo "This filename [$filename] exists"
elif [ -d $dirname ]
then
    echo "This dirname [$dirname] exists"
else
    echo "Neither [$dirname] nor [$filename] exists"
fi


if [ "$1" -le "$2" ] ; then
    echo "$2 is the largest"
else
    echo "$1 is the largest"
fi

-------------------------------------------------------------
NOTE 13: Loops and conditionals:

loops:

  for-do-done
  while-do-done
  until-do-done

conditionals:

  if-then-else-fi
  case-esac
  &&
  ||



IF
==
The basic type of condition is "if". 

if [ $? -eq 0 ] ; then
	print we are okay
else
	print something failed
fi

IF the variable $? is equal to 0, THEN print out a message. Otherwise (else), print out a different message. 
FYI, "$?" checks the exit status of the last command run. 
The final 'fi' is required. This is to allow you to group multiple things together. 
You can have multiple things between if and else, or between else and fi, or both.
You can even skip the 'else' altogether, if you don't need an alternate case. 

if [ $? -eq 0 ] ; then
	print we are okay
	print We can do as much as we like here
fi


if [ -f /tmp/errlog ] 
   then
      rm /tmp/errlog
   else
      echo "no errorlog found"
fi


if [ ! -f /tmp/errlog ]
   then
      echo "no errorlog found"
fi

#!/usr/bin/ksh
if [ `cat alert.log|wc -l` -gt 1 ]
then
   echo "something you want to say if alert.log contains more than 1 line"
else
   echo "something else you want to say"
fi


#  Based on: 
#  full core SAA running: 4 mxs processes, all hk BS_ processes       #
#  hk mode              : 0 mxs processes, all hk BS_ processes       #
#  not running          : 0 mxs processes, 0 hk BS_ processes         #

integer NO_MXS
integer NO_BS


NO_MXS=`ps -ef | grep -i MXS | grep -v grep | wc -l`
NO_BS=`ps -ef |  grep -i BS_ | grep -v grep | wc -l`

#echo $NO_MXS
#echo $NO_BS

if (( NO_MXS==4 ))
then
   echo "Running"
else
   if (( NO_BS==0 ))
   then
       echo "Notrunning"
   else
       echo "HousekeepingMode"
   fi
fi
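
The decision logic above can be separated from the ps calls, which makes it possible to try it out 
without the real processes running. A sketch in plain sh; the function name saa_state is 
hypothetical and the thresholds are the ones from the comment block above.

```shell
#!/bin/sh
# The decision logic as a function of the two process counts.
# saa_state is a hypothetical name; 4 mxs processes means running,
# 0 BS_ processes means not running, anything else is housekeeping mode.
saa_state() {
    no_mxs=$1
    no_bs=$2
    if [ "$no_mxs" -eq 4 ]; then
        echo "Running"
    elif [ "$no_bs" -eq 0 ]; then
        echo "Notrunning"
    else
        echo "HousekeepingMode"
    fi
}

# Usage with live counts, gathered the same way as in the script above:
#   NO_MXS=`ps -ef | grep -i MXS | grep -v grep | wc -l`
#   NO_BS=`ps -ef | grep -i BS_ | grep -v grep | wc -l`
#   saa_state $NO_MXS $NO_BS
```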




CASE
====
The case statement functions like 'switch' in some other languages. Given a particular variable, 
jump to a particular set of commands, based on the value of that variable. 
While the syntax is similar to C on the surface, there are some major differences:

- The variable being checked can be a string, not just a number.
- There is no "fall through". You hit only one set of commands.
- To make up for no "fall through", several patterns can share one set of commands (pattern1|pattern2).
- You can use WILDCARDS to match strings.

echo input yes or no
read  answer
case $answer in
	yes|Yes|y)
		echo got a positive answer
		# the following ';;' is mandatory for every set
		# of comparative xxx)  that you do
		;;
	no)
		echo got a 'no'
		;;
	q*|Q*)
		#assume the user wants to quit
		exit
		;;
		
	*)
		echo This is the default clause. we are not sure why or
		echo what someone would be typing, but we could take
		echo action on it here
		;;
esac
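
Shared patterns and wildcards also work inside a function. The sketch below is only an illustration;
the function name file_kind and the file names are made up:

```sh
# Hypothetical helper: guess a file type from its name.
file_kind()
{
    case "$1" in
        *.tar.gz|*.tgz)      # two patterns share one set of commands
            echo "compressed tar archive"
            ;;
        *.log|*.out)
            echo "log file"
            ;;
        [0-9]*)              # wildcard: any name starting with a digit
            echo "starts with a digit"
            ;;
        *)
            echo "unknown"
            ;;
    esac
}

file_kind backup.tgz
file_kind alert.log
file_kind 2024report
file_kind notes
```

The patterns are tried from top to bottom and only the first match is used, so a name like "1.log"
is reported as a log file, not as "starts with a digit".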

Sometimes you want to break out of a while loop, which contains a case, like in this example:

#!/bin/sh

echo "Please talk to me ..."
while :
do
  read INPUT_STRING
  case $INPUT_STRING in
	hello)
		echo "Hello yourself!"
		;;
	bye)
		echo "See you again!"
		break
		;;
	*)
		echo "Sorry, I don't understand"
		;;
  esac
done
echo 
echo "That's all folks!"

In this example, the program keeps looping as long as the user types anything other than "bye" (for example "hello").
When the user types "bye", the "break" statement quits the loop.

Note:  ":" evaluates to "true", but you might also have used "while true".


&& and ||
=========

The simplest conditional in the Bourne shell is the double ampersand &&.
When two commands are separated by a double ampersand, the second command executes
only if the first command returns a zero exit status (successful completion).

Example:

ls -ld /usr/bin > /dev/null && echo "Directory Found"

The opposite of && is the ||. When two commands are separated by ||, the second command executes
only if the first command returns a nonzero exit status (indicating failure).

Example:

ls -d /usr/foo || echo "No Directory Found"

If the directory does not exist, the message is displayed.
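
Both operators can be chained on one line. A small sketch (the function name check_dir is made up).
Keep in mind that "a && b || c" is not a full if-then-else: c also runs when a succeeds but b fails.

```sh
# Hypothetical helper: report whether a directory exists.
check_dir()
{
    [ -d "$1" ] && echo "Directory Found" || echo "No Directory Found"
}

check_dir /
check_dir /no/such/dir

# Caveat: here the first command succeeds, but 'false' fails,
# so the command after || runs as well.
true && false || echo "this also runs"
```

When the "then" part itself can fail, a normal if-then-else is the safer choice.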


Loops

WHILE
=====
The basic loop is the 'while' loop; "while" something is true, keep looping.
There are two ways to stop the loop. The obvious way is when the 'something' is no longer true. 
The other way is with a 'break' command. 


keeplooping=1;
while [[ $keeplooping -eq 1 ]] ; do
	read quitnow
	if [[ "$quitnow" = "yes" ]] ; then
		keeplooping=0
	fi
	if [[ "$quitnow" = "q" ]] ; then
		break;
	fi
done



UNTIL
=====
The other kind of loop in ksh is 'until'. The difference between them is that 'while' implies looping while 
something remains true,
whereas 'until' implies looping until something that is false becomes true.

until [[ $stopnow -eq 1 ]] ; do
	echo just run this once
	stopnow=1;
	echo we should not be here again.
done
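
Another 'until' sketch, this time with a counter. The function name countdown is only an illustration:

```sh
# Hypothetical helper: count down from $1 to 1 with an until loop.
countdown()
{
    n=$1
    until [ "$n" -le 0 ]; do
        echo "$n"
        n=$((n - 1))
    done
}

countdown 3
```

The condition is tested before each pass, so "countdown 0" prints nothing at all.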



FOR
===
A "for loop", is a "limited loop". It loops a specific number of times, to match a specific number of items. 
Once you start the loop, the number of times you will repeat is fixed. 
The basic syntax is 

for i in eat run jump play
do
    echo See Albert $i
done

  See Albert eat
  See Albert run
  See Albert jump
  See Albert play


for i in "eat run jump play"
do
    echo See Albert $i
done

  See Albert eat run jump play


for var in one two three ; do
	echo $var
done

Whatever name you put in place of 'var', will be updated by each value following "in". 
So the above loop will print out 

one
two
three

But you can also have variables defining the item list. They will be checked ONLY ONCE, when you start the loop. 

list="one two three"
for var in $list ; do
	echo $var
	# Note: Changing this does NOT affect the loop items
	list="nolist"
done

The two things to note are: 
It still prints out "one" "two" "three" 
Do NOT quote "$list", if you want the 'for' command to use multiple items 
If you used "$list" in the 'for' line, it would print out a SINGLE LINE, "one two three" 
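
The effect of the quotes can be checked by counting the loop passes. A minimal sketch, assuming
the default IFS (space, tab, newline); the counter names are made up:

```sh
list="one two three"

# Unquoted: the shell splits $list into three words.
count=0
for var in $list; do
    count=$((count + 1))
done
echo "unquoted: $count items"

# Quoted: the whole string is a single item.
qcount=0
for var in "$list"; do
    qcount=$((qcount + 1))
done
echo "quoted: $qcount item"
```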


for i in 1 2 3 4 5 6 7
do
    cp x.txt $i
done


-------------------------------------------------------------
NOTE 14: Arrays

Arrays 
Yes, you CAN have arrays in ksh, unlike old bourne shell. The syntax is as follows: 

# This is an OPTIONAL way to quickly null out prior values
set -A array
#
array[1]="one"
array[2]="two"
array[3]="three"
three=3

print ${array[1]}
print ${array[2]}
print ${array[3]}
print ${array[three]}


Briefly, an array contains a collection of values (elements) that may be accessed individually or as a group.  
Although newer versions of the Korn shell support more than one type of array, this tip will only apply 
to indexed arrays.

When assigning or accessing array elements, a subscript is used to indicate each element's position 
within the array.  The subscript is enclosed by brackets after the array name:
 
arrayname[subscript]
 
The first element in an array uses a subscript of 0, and the last element position (subscript value) 
is dependent on what version of the Korn shell you are using.  Review your system's Korn shell (ksh) 
man page to identify this value.

In this first example, the colors red, green, and blue are assigned to the first three positions of an array 
named colors:
 
$ colors[0]=RED
$ colors[1]=GREEN
$ colors[2]=BLUE
 
Alternatively, you can perform the same assignments using a single command:
 
$ set -A colors RED GREEN BLUE
 
Adding a dollar sign and an opening brace to the front of the general syntax and a closing brace on the end 
allows you to access individual array elements:
 
${arrayname[subscript]}

Using the array we defined above, let's access (print) each array element one by one:
 
$ print ${colors[0]}
RED
$ print ${colors[1]}
GREEN
$ print ${colors[2]}
BLUE
$
 
If you access an array without specifying a subscript, 0 will be used:
 
$ print ${colors}
RED
$

 
The while construct can be used to loop through each position in the array:
 
$ i=0
$ while [ $i -lt 3 ]
> do
> print ${colors[$i]}
> (( i=i+1 ))
> done
RED
GREEN
BLUE
$
 
Notice that a variable (i) was used for the subscript value each time through the loop.  
 
As another example:

array=(red green blue yellow magenta)
len=${#array[*]}
echo "The array has $len members. They are:"
i=0
while [ $i -lt $len ]; do
	echo "$i: ${array[$i]}"
	let i++
done
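
You can also walk over all elements without a counter, using "${array[@]}". This sketch assumes
ksh93 or bash, because of the array=( ... ) assignment; in ksh88 you would fill the array with "set -A" instead.

```sh
colors=(RED GREEN BLUE)          # compound assignment (ksh93, bash)

echo "number of elements: ${#colors[@]}"

# Walk over all elements; the quotes keep each element as one word.
joined=""
for c in "${colors[@]}"; do
    joined="$joined$c,"
done
echo "$joined"
```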





Special variables
There are some "special" variables that ksh itself gives values to. Here are the ones I find interesting 
PWD - always the current directory 
RANDOM - a different number every time you access it 
$$ - the current process id (of the script, not the user's shell) 
PPID - the "parent process"s ID. (BUT NOT ALWAYS, FOR FUNCTIONS) 
$? - exit status of last command run by the script 
PS1 - your "prompt". "PS1='$PWD:> '" is interesting. 
$1 to $9 - arguments 1 to 9 passed to your script or function 

Tweaks with variables
Both bourne shell and KSH have lots of strange little tweaks you can do with the ${} operator. 
The ones I like are below. 


To give a default value if and ONLY if a variable is not already set, use this construct: 


APP_DIR=${APP_DIR:-/usr/local/bin}

(KSH only)
You can also get funky, by running an actual command to generate the value. For example 


DATESTRING=${DATESTRING:-$(date)}


(KSH only)
To count the number of characters contained in a variable string, use ${#varname}. 


  echo num of chars in stringvar is ${#stringvar}
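
A few more ${} tweaks in one place, as a sketch. The path below is made up. The # and % operators cut
a pattern from the front or the end of the value; they work in ksh and other POSIX shells, but not in the
old Bourne shell.

```sh
path=/var/log/app/server.log

file=${path##*/}      # strip longest match of */ from the front -> server.log
dir=${path%/*}        # strip shortest match of /* from the end  -> /var/log/app
base=${file%.*}       # strip the extension                      -> server
ext=${file##*.}       # keep only the extension                  -> log

unset APP_DIR
APP_DIR=${APP_DIR:-/usr/local/bin}   # default is used, because APP_DIR was unset

echo "$dir | $file | $base | $ext | $APP_DIR | ${#file} chars"
```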

-------------------------------------------------------------
NOTE 15:

Appending dates to files and such:

Example 1:
----------

# mv logfile "logfile.`date`"
# mv logfile logfile.`date +%Y.%m.%d`

Example 2:
----------

MS korn shell:
# now=`date -u +%d`;export now
# echo $now
24
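
If you need the dated name in more than one place, a small function keeps the format in one spot.
A sketch; the function name dated_name is made up:

```sh
# Hypothetical helper: build a name such as logfile.2024.01.31
dated_name()
{
    echo "$1.$(date +%Y.%m.%d)"
}

dated_name logfile
# To rename for real (not run here):
#   mv logfile "$(dated_name logfile)"
```

The year-month-day order has the advantage that a plain "ls" lists the files in date order.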


------------------------------------------------------------
NOTE 16: tput

What is tput?

The tput command initializes and manipulates your terminal session through the terminfo database. Using tput, you can alter 
several terminal capabilities, such as moving or altering the cursor, changing text properties, 
and clearing specific areas of the terminal screen. 

Command-line introduction to tput

The tput command, like most commands in UNIX, can be used either at your shell command line or inside a shell script. 
To gain a better understanding of tput, this article starts with the command line, and then continues into shell script examples. 

Cursor attributes

Moving the cursor or altering its attributes can be helpful in UNIX shell scripts or at the command line. 
There may be times when you're required to enter sensitive information, such as a password, or enter information in two 
different areas of the screen. Using tput can help you in such conditions. 

Moving the cursor

Moving the cursor's position on the respective device is easily done with tput. Using the cup option, or cursor position, in tput, 
you can move the cursor to any row (Y) and column (X) of the device. 
The top left coordinates of the device are 0,0. 

The arguments are given as row first, then column. To move the cursor to row 5 (Y), column 1 (X) on a device, simply execute tput cup 5 1. 
Another example would be tput cup 23 45, which would move the cursor to row 23, column 45. 

Moving the cursor and displaying information

Another useful cursor position trick is to move the cursor, execute a command to display information, 
and then return to the previous cursor location: 

(tput sc ; tput cup 23 45 ; echo "Input from tput/echo at 23/45" ; tput rc)
 

Let's break down the subshell commands:

tput sc

The current cursor location must be saved first. To save the current cursor position, include the sc option, 
or "save cursor position." 

tput cup 23 45
 
After the cursor location has been saved, the cursor will be moved to row 23, column 45. 

echo "Input from tput/echo at 23/45"
 
Display information to stdout.

tput rc
 
When the information has been displayed, the cursor must return to the original location that was saved with tput sc. 
To return the cursor to its last saved location, include the rc option, or "restore cursor position." 
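
The same sc / cup / rc sequence can be wrapped in a function. This is only a sketch: the name print_at
is made up, and the fallback for output that is not a terminal is an addition, not part of the tput article.

```sh
# Hypothetical helper: print a message at a given row and column,
# then put the cursor back where it was.
# $1 = row, $2 = column (both counted from 0), rest = message
print_at()
{
    row=$1
    col=$2
    shift 2
    if [ -t 1 ] && command -v tput >/dev/null 2>&1; then
        tput sc                  # save the cursor position
        tput cup "$row" "$col"   # row first, then column
        printf '%s' "$*"
        tput rc                  # restore the saved position
    else
        # Output is not a terminal (pipe, file): print the plain message.
        printf '%s\n' "$*"
    fi
}

print_at 23 45 "Input from tput/echo at 23/45"
```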


------------------------------------------------------------
NOTE 17: Doing some arithmetic


CleanOldArchiveFiles()
{
cd $T2_ARCH_DIR
COUNT_BEFORE=$(find ${T2_ARCH_DIR} -type f -name "T2*" -exec ls -al {} \; | wc -l)
PRESENT_DIR=`pwd`

if [ "$PRESENT_DIR" = "$T2_ARCH_DIR" ]       # Let's make sure we are in the right directory.
   then
      find . -name "T2*" -mtime +30 -exec rm {} \;
      AF_EXITCODE=$?
      if (( AF_EXITCODE == 0 ))
      then
        EXIT_CODE=0
      else
        EXIT_CODE=1
      fi
fi

COUNT_AFTER=$(find ${T2_ARCH_DIR} -type f -name "T2*" -exec ls -al {} \; | wc -l)

DELTA=`expr $COUNT_BEFORE - $COUNT_AFTER`

}
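
In ksh and other POSIX shells the subtraction can also be done with $(( )) instead of expr.
A sketch with made-up counts:

```sh
COUNT_BEFORE=42
COUNT_AFTER=30

DELTA=$((COUNT_BEFORE - COUNT_AFTER))      # same result as: expr 42 - 30
PERCENT=$((DELTA * 100 / COUNT_BEFORE))    # integer division, remainder dropped

echo "removed $DELTA files ($PERCENT%)"
```

The shell only does integer arithmetic here, so multiply first and divide last to lose as little as possible.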



------------------------------------------------------------
NOTE 18: shell script debugging

-x option to debug a shell script

Run a shell script with -x option, like in 

$ bash -x script-name

Use of set builtin command
Bash and the Korn shell offer debugging options which can be turned on or off using the set command.
=> set -x : Display commands and their arguments as they are executed.
=> set -v : Display shell input lines as they are read.

set

set -x        # if put in top of script: debugging mode
set -e        # if put in top of script: exit at first error

Note 1:
------

Write your shell scripts this way to make them easier to test and debug. 

Shell script debugging is not easy. You have to put set -x or the echo command into the script. You may want to test the script 
in a test environment. And before publishing the script, at the latest, you have to delete the debug lines. 
The following tip gives a hint about how to solve this problem without errors.

Put the following lines at the beginning of the script:

if [[ -z $DEBUG ]];then
  alias DEBUG='# ' 
else
  alias DEBUG=''
fi

Everywhere you put a line that is only for testing, write it in the following way:

DEBUG set -x

Or echo a parameter:

DEBUG echo $PATH

Or set a parameter that is valid only during the test:

DEBUG export LOGFILE=$HOME/tmp

Before executing the script, set the DEBUG variable in the shell:

# export DEBUG=yes

During the execution of the script, the DEBUG lines will be executed. 
If you publish the script, you can forget the deletion of the debug lines; they will not disturb the functionality.

Sample Script
#!/usr/bin/ksh

# test script to show the DEBUG alias

if [[ -z $DEBUG ]];then
  alias DEBUG='# '
else
  alias DEBUG=''
fi

LOG_FILE=/var/script.out
DEBUG LOG_FILE=$HOME/tmp/script.out

function add {
    DEBUG set -x

    a=$1
    b=$2

    return $((a + b))
}


# MAIN

DEBUG echo "test execution"

echo "$(date) script execution" >>$LOG_FILE
echo "if you do not know it:"

add 2 2
echo "  2 + 2 = $?"
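
Note that bash does not expand aliases in scripts by default (only with "shopt -s expand_aliases"), so the
alias trick is mainly a ksh thing. A function gives roughly the same effect in any POSIX shell. This is a
sketch of an alternative, not the method from the tip above; the function name debug is made up.

```sh
# Run the given command only when DEBUG is set to a non-empty value.
debug()
{
    if [ -n "$DEBUG" ]; then
        "$@"
    fi
}

LOG_FILE=/var/script.out

DEBUG=yes
debug echo "test execution"                         # printed
debug export "LOG_FILE=$HOME/tmp/script.out"        # assignment done through 'export'
echo "LOG_FILE is $LOG_FILE"

DEBUG=
debug echo "this line is not printed"
```

A plain assignment such as "debug LOG_FILE=..." does not work with a function, because the shell would
look for a command with that name; going through 'export' avoids this.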



------------------------------------------------------------
NOTE 19: Examples



Example 1:
----------

#!/usr/bin/ksh
# Monitor the SPL p550 server
# By Albert
# version 0.1

umask 022

date=`date +%d-%m-%y`
time=`date +%H:%M`
emailers=albertvandersel@zonnet.nl

echo "$date $time" > /tmp/topper
df >> /tmp/topper

mailx -r albertvandersel@zonnet.nl -s "::: Disk info p550 :::" $emailers < /tmp/topper
rm /tmp/topper

exit 0



cat /export/home/fas/RSP/RSP.log | grep "ORA-01000" > /tmp/brokencursor.err
mailx -r noreply@ricoh-europe.com -s "::: Process info NLIHblabla-08 :::" $emailers < /tmp/topper


'mailx' to send someone an email


Example 2:
----------

#!/bin/ksh
# Monitor rsp logfile
#
PATH=/usr/ucb:/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/etc:/usr/opt/SUNWmd/sbin
export PATH

umask 022

date=`date +%d-%m-%y`
time=`date +%H:%M`

emailers=nobuya.horii@ricoh-europe.com,Nathan.Bohn@firepond.com

cat /export/home/fas/RSP/RSP.log | grep "ORA-01000" > /tmp/brokencursor.err

if [ -s /tmp/brokencursor.err ] 
   then
      # echo "$date $time" > /tmp/brokencursor.err
      mailx -r noreply@ricoh-europe.com -s "::: Check on ORA-01000 :::" $emailers < /tmp/brokencursor.err
   else
      echo "all OK" >> /tmp/brokencursor.log
fi

/bin/rm /tmp/brokencursor.err

exit 0



Example 3: Automatic startup of an application
----------------------------------------------

#!/bin/ksh

# name: spl
# purpose: script that will start or stop the spl stuff.


case "$1" in
start )
        echo "starting spl"
        echo "su - ccbsys -c '/prj/spl/SPLS3/bin/splenviron.sh -e SPLS3 -c "spl.sh -t start"'"
        su - ccbsys -c '/prj/spl/SPLS3/bin/splenviron.sh -e SPLS3 -c "spl.sh -t start"'
        ;;
stop )
        echo "stopping spl"
        echo "su - ccbsys -c '/prj/spl/SPLS3/bin/splenviron.sh -e SPLS3 -c "spl.sh -t stop"'"
        su - ccbsys -c '/prj/spl/SPLS3/bin/splenviron.sh -e SPLS3 -c "spl.sh -t stop"'
        ;;
* )
        echo "Usage: $0 (start | stop)"
        exit 1
esac



Example 4: scheduled cleanup of logs
------------------------------------

#!/usr/bin/ksh

SAVEPERIOD=5

echo "/prj/spl/splapp/SPLQ3"| \
while read DIR
do
   cd ${DIR}
   find . -type f -mtime +${SAVEPERIOD} -exec rm {} \;
done

exit 0
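
A somewhat more defensive variant, as a sketch. It does not "cd" at all, so find can never start in
the wrong directory when the target is missing. The function name cleanup_old and the scratch files are made up.

```sh
# Hypothetical helper: remove regular files under $1 older than $2 days.
cleanup_old()
{
    dir=$1
    days=$2
    # Refuse to run when the directory is missing, so that 'find'
    # never starts from an unintended place.
    [ -d "$dir" ] || return 1
    find "$dir" -type f -mtime +"$days" -exec rm -f {} \;
}

# Demo in a scratch directory.
scratch=$(mktemp -d)
touch -t 200001010000 "$scratch/old.log"     # timestamp far in the past
touch "$scratch/new.log"
cleanup_old "$scratch" 5
ls "$scratch"
```

While testing, replace "-exec rm -f {} \;" with "-print" to see what would be removed.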


Example 5: some testing examples
--------------------------------

Initialise()
{
	export SPLcomplog=$SPLSYSTEMLOGS/initialSetup.sh.log
        if [ -f $SPLcomplog ]
        then
           rm -f $SPLcomplog
           export RSP=$?
           if [ $RSP -ne 0 ]
           then
              echo "ERROR - Cannot remove the old Log file $SPLcomplog "
              exitFunc $RSP
           fi
        fi
        touch $SPLcomplog
        export RSP=$?
        if [ $RSP -ne 0 ]
        then
           echo "ERROR - Cannot create Log file $SPLcomplog "
           exitFunc $RSP
        fi
	export TMP1=$SPLSYSTEMLOGS/initialSetup.sh.tmp
}

exitFunc()
{
	export RSP=$1
	Log "Exiting $SCRIPTNAME with return code $RSP"
	if [ -f $TMP1 ]
	then
		rm -f $TMP1 > /dev/null 2>&1
	fi
	exit $RSP
}

testDBconnection()
{
	Log "Testing Database connection parameters entered in configureEnv.sh"
	if [ `which db2|wc -w` -gt 1 ]
	then
		Log "ERROR : cannot find \"db2\" Program. This is a PATH prerequisite to the Install"
		exitFunc 1
	fi
	. cisconnect.sh > $TMP1 2>&1
	export RSP=$?
	if [ $RSP -ne 0 ]
	then
		Log "ERROR : connecting to Database:"
		Log -f "$TMP1"
		Log "ERROR : Rerun configureEnv.sh and ensure database connection parameters are correct"
		Log "ERROR : Check DB2 Connect configuration to ensure connection is o.K."
		exitFunc $RSP
	fi
		
}


Other example:

check_cron() {
# check whether the command is run by cron or by hand #
CRON_PID=`ps -ef | grep check_sudo | grep -v grep | awk '{print $3}'`
    if [[ `ps -p ${CRON_PID} | grep -v TIME | awk '{print $4}'` == "cron" ]]
    then
        CRON_RUN="yes"
        # Generate a sleep time number, so that not all clients contact the Distroserver at the same time #
        random_sleeptime
    else
        CRON_RUN="no"
        SLEEPTIME="1"
    fi
}



Example 6:
----------

P550:/home/reserve/bin $ cat CheckAppl
#!/usr/bin/ksh
#
#
# variable initialisation
appl=/prj/etm/1.5.20
#
# to start trace option execute set -x
if [ "$1" = d ]
then
        set -x
fi
#set -x
#
# cleanscreen
clear
echo "Just a moment"
#
for i in `cat /etc/cistab | sed -e 's/:/ /g'| awk '{ print $1 }'`
do
        # number of processes
        # initialisation
        aantal_processen=0
        aantal_jsl=0
        aantal_jrepsvr=0
        aantal_bbl=0
        aantal_tuxfulladm=0
        aantal_tuxfullall=0
        #
        aantal_processen=`ps -ef|grep $i| grep -v grep | wc -l`
        if [ $aantal_processen = 0 ]
        then
                status=DOWN
        else
                status=UP
                # number of JSL
                aantal_jsl=`ps -ef|grep $i | grep -i JSL | wc -l`
                # number of JREPSVR
                aantal_jrepsvr=`ps -ef|grep $i | grep -i JREPSVR | wc -l`
                # number of BBL
                aantal_bbl=`ps -ef|grep $i | grep -i BBL | wc -l`
                # number of TUXFULLadm
                aantal_tuxfulladm=`ps -ef|grep $i | grep -i tuxfulladm | wc -l`
                # number of TUXFULLall
                aantal_tuxfullall=`ps -ef|grep $i | grep -i tuxfullall | wc -l`

        fi

        if [ $status = UP ] ; then

                 echo "$i $status BBL($aantal_bbl) TUXFULLall($aantal_tuxfullall) TUXFULLadm($aantal_tuxfulladm)    JSL( $aantal_jsl )(  JREPSVR(  $aantal_jrepsvr ) "
        else
                echo $i $status
        fi
done   | sort +1

# check logs
echo
echo "Check backup to rmt0"
echo "--------------------"
tail -2 /opt/back*/backup_to_rmt0.log
echo
echo "Check backup to rmt1"
echo "--------------------"
tail -7 /opt/backupscripts/backup_to_rmt1.log

echo
echo "Check backup from 520"
echo "---------------------"
ls -l /backups/520backups/oradb/conv.dmp
ls -l /backups/520backups/splenvs/*tar*


Example 7:
----------

#!/bin/sh

getinfo() {
        USER=$1
    PASS=$2
    DB=$3

    CONN="${USER}/${PASS}@${DB}"

    echo "
    set linesize 1000
    set pagesize 1000
    set trimspool on
    SELECT CIS_RELEASE_ID,':', CM_RELEASE_ID
    FROM CI_INSTALLATION;
    " | sqlplus -S $CONN | grep '[0-9a-zA-Z]'
}

if [ $# -gt 0 ]
then
        DB="$1"
else
        DB="$SPLENVIRON"
fi

if [ "x$DB" = x ]
then
        echo "dbvers: no environment"
        exit 1
fi

getinfo cisread cisread $DB | sed -e 1d -e 's/[         ]//g'


Example 8:
----------

#!/usr/bin/sh

. $HOME/.profile >/dev/null 2>&1

[ $# -ne 1 ] && exit 1

MARKER=/home/cissys/etc/marker-file

if [ $1 = "setmarker" ]
then
        /bin/touch $MARKER

        exit 0
fi

if [ $1 = "cleanup" ]
then
        [ \! -f $MARKER ] && exit 1

        for DIR  in `cut -d: -f4 /etc/cistab`
        do
                 /usr/bin/find $DIR \! -newer $MARKER -type f -exec rm -f {} \;
        done

        exit 0
fi

if [ $1 = "runbatch" ]
then
        for ETM  in `cut -d: -f1 /etc/cistab`
        do
                DIR1=`grep $ETM /etc/cistab|cut -d: -f3`
                DIR2=`grep $ETM /etc/cistab|cut -d: -f4`
                $DIR1/bin/splenviron.sh -q -e $ETM -c cdxcronbatch.sh \
                        >>$DIR2/cdxcronbatch.out 2>&1
        done

        exit 0
fi

exit 1


Example 9:
----------

date >> /opt/backupscripts/backupdatabases.log

cd /backups/oradb

if [ -f spltrain.dmp ]
   then
      echo "backup of spltrain is OK" >> /opt/backupscripts/backupdatabases.log
   else
      echo "error backup of spltrain " >> /opt/backupscripts/backupdatabases.log
fi


Example 10:
-----------

#!/usr/bin/ksh

# target : print configuration
#set -x
DAY=`date +%Y%m%d`
HOSTNAME=`hostname`
LOG=/home/reserve/log/ListConfiguration/LsConf.$DAY.$HOSTNAME.P522

date     >> $LOG
echo "++++++++++++++++++++++++++++++++++++++++++++" >> $LOG
echo "lsdev -P " >> $LOG
lsdev -P >> $LOG
echo "++++++++++++++++++++++++++++++++++++++++++++" >> $LOG
echo "lsdev -C " >> $LOG
lsdev -C >> $LOG
echo "++++++++++++++++++++++++++++++++++++++++++++" >> $LOG
echo "lsconf">> $LOG
lsconf   >> $LOG
echo "++++++++++++++++++++++++++++++++++++++++++++" >> $LOG
for i in `lsdev -C | awk '{ print $1 }'`
do
        echo "lsattr -E -l $i" >> $LOG
        lsattr -E -l $i >> $LOG
        echo >> $LOG
done
echo "++++++++++++++++++++++++++++++++++++++++++++" >> $LOG
for i in `lsvg`
do
        lsvg -p $i
        lsvg -l $i
done >> $LOG
echo "++++++++++++++++++++++++++++++++++++++++++++" >> $LOG
for i in `lspv | awk '{ print $1 }'`
do
        lspv -l $i
        lspv -L $i
        lspv -M $i
        lspv -p $i
done >>$LOG


echo "++++++++++++++++++++++++++++++++++++++++++++" >> $LOG
echo "/etc/passwd">> $LOG
cat /etc/passwd >> $LOG
echo "++++++++++++++++++++++++++++++++++++++++++++" >> $LOG
echo "/etc/group" >> $LOG
cat /etc/group  >> $LOG


Example 11:
-----------

Make dynamic Oracle exports from a shell script. You do not need to list exp statements per database;
the database names are extracted from some file, like /etc/oratab.

#!/usr/bin/ksh
DATE=`date +%Y%m%d`
HOSTNAME=`hostname`
ORACONF=/etc/rc.oratab
set -x
# MAKE SURE THE ENVIRONMENT IS OK
ORACLE_BASE=/apps/oracle; export ORACLE_BASE
ORACLE_HOME=/apps/oracle/product/9.2; export ORACLE_HOME
LIBPATH=/apps/oracle/product/9.2/lib; export LIBPATH
ORACLE_TERM=xterm;export ORACLE_TERM
export ORA_NLS33=$ORACLE_HOME/ocommon/nls/admin/data
LD_LIBRARY_PATH=$ORACLE_HOME/lib; export LD_LIBRARY_PATH
export TNS_ADMIN=/apps/oracle/product/9.2/network/admin
export ORAENV_ASK=NO

PATH=/usr/local/bin:/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin:/usr/java131/jre/bin;export PATH
PATH=$ORACLE_HOME/bin:$PATH;export PATH

# SAVE THE FORMER BACKUPS: LETS KEEP 1 EXTRA DAY ONLINE
# Lets copy the current file to another filesystem:
cd /backups/oradb
# Now lets save the current file on the same filesystem in 1dayago
cd /backups/oradb/1dayago
mv spl*dmp /backups/oradb/2dayago

cd /backups/oradb
mv spl*dmp /backups/oradb/1dayago

# Now create a new export file in /backups/oradb

ExpOracle()
{
set -x
        for i in `cat ${ORACONF} | grep -v \# | awk '{ print $1 }'`
        do
                SID_NAME=$i
                BOOT=`grep $SID_NAME $ORACONF | awk '{ print $2}'`
                if [ $BOOT = Y ] ;
                then
                        su - oracle -c "
                        ORACLE_SID=${SID_NAME}
                        export ORACLE_SID
                        cd /backups/oradb
                        exp system/cygnusx1@$SID_NAME file=$SID_NAME.$HOSTNAME.$DATE.dmp full=y statistics=none
                        "
                fi
                sleep 5
                if [ -f $SID_NAME.$HOSTNAME.$DATE.dmp ]
                then
                        echo "backup of $SID_NAME is OK" >> /opt/backupscripts/backupdatabases.log
                else
                        echo "error backup of $SID_NAME " >> /opt/backupscripts/backupdatabases.log
                fi

        done
}

ExpOracle


Example 12:
-----------

Running sqlplus from a shell script.

$ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<EOF 1> $tmp_file 2>&1
   set heading off feedback off
   whenever sqlerror exit
   select 'DB_NAME=' || name from v\$database;
   .. # possible other stuff
   exit
EOF

sqlplus /nolog << EOF
connect / as sysdba
startup
EOF


Example 13:
-----------

Kill a particular process, that runs from "/dir/dir/abc" :

kill `ps -ef | grep /dir/dir/abc | grep -v grep | awk '{print $2}'`


Example 14:
-----------

#!/usr/bin/ksh
#
# description: start and stop the Documentum Content Server environment from dmadmin account
# called by:   dmadmin
#

DOCBASE_NM1=dmw_et
DOCBASE_NM2=dmw_et3

function log
{
        echo $(date +"%Y/%m/%d %H.%M.%S %Z") 'documentum.sh:' ${@}
}


# See how we were called.
case $1 in

  start)
    # Starting DocBroker
    cd $DOCUMENTUM/dba
    ./dm_launch_Docbroker
    ./dm_start_$DOCBASE_NM1
    ./dm_start_$DOCBASE_NM2
    # Starting Tomcat services
    cd $DM_HOME/tomcat/bin
    ./startup.sh
  ;;

  stop)
    # Stopping Tomcat services
    cd $DM_HOME/tomcat/bin
    ./shutdown.sh
    # Stopping DocBroker
    cd $DOCUMENTUM/dba
    ./dm_shutdown_$DOCBASE_NM1
    ./dm_shutdown_$DOCBASE_NM2
    ./dm_stop_Docbroker
  ;;

  clean_logs)
    # Call myself to stop stuff
    ${0} stop
    # Removing the log files
    find $DOCUMENTUM/dba/log -type f -name "*" -exec rm -rf {} \;
    # Call myself to restart stuff
    ${0} start
  ;;

  clean_performance)
    # Call myself to stop stuff
    ${0} stop
    # Removing the performance test log directories
    find $DOCUMENTUM/dba/log -type d -name "perftest*" -exec rm -rf {} \;
    find $DOCUMENTUM/dba/log -type d -name "perfuser*" -exec rm -rf {} \;
    find $DOCUMENTUM/dba/log -type d -name "RefAppUser*" -exec rm -rf {} \;
    # Call myself to restart stuff
    ${0} start
  ;;

  kill)
    cd $DOCUMENTUM/dba
    ./dm_launch_Docbroker -k
  ;;

  *)
  echo "Usage: $0 {start|stop|kill|clean_logs|clean_performance}"
  exit 1
esac

exit 0


Example 15: 
-----------

# At login, check whether a dmgr or nodeagent is running for this domain
WASUSR=`whoami`
echo ""

DMAOK=`ps -ef | grep $WASUSR | grep dmgr | grep -v grep`
if [ ! "$DMAOK" = "" ]; then
  echo "Deployment Manager is running!"
else
  echo "NOTE: Deployment Manager is NOT running!"
fi
echo ""

WASOK=`ps -ef | grep $WASUSR | grep nodeagent | grep -v grep`
if [ ! "$WASOK" = "" ]; then
  echo "Nodeagent is running!"
else
  echo "NOTE: Nodeagent is NOT running!"
fi
echo ""

if [ "$WASOK" = "" ] || [ "$DMAOK" = "" ]; then
  echo "The $WASUSR-menu can be used to start the Deployment manager or Nodeagent."
  echo "Type 'menu' on the command-prompt followed by ENTER to start the menu."
  echo ""
fi



if (( $# == 1 )) && [[ "${1}" = "?" ]]
if [ "$WASOK" = "" ] || [ "$DMAOK" = "" ]

Example 16:
-----------

Example with input from user:

echo ""
read CLEARMSG?"Press C or c to clear this message or any other key to keep it : "
if [ "${CLEARMSG}" = "C" ] || [ "${CLEARMSG}" = "c" ]; then
  if [ -f ~/ExtraMessage.txt ]; then
    rm ~/ExtraMessage.txt
  fi
fi


Example 17:
-----------

# Check arguments
if [ ${#} != 3 ]
then
    log "Usage: ${0} <environment> <installFilesFolder> <installTarget>"
    exit 1
fi


#
if [ -z "$1" ]
then
	echo "use : build.sh PROGNAME
e.g.  build.sh  CLBDSYNC
"
	exit 1
fi


if [ "$OPSYS" = "AIX" ]||[ "$OPSYS" = "HP-UX" ]||[ "$OPSYS" = "Linux" ]
then
   ....
else
   ....
fi



Example 18:
-----------


Read Input from User and from Files

- Read in a Variable
From a user we read with: read var. Then the user can type something in. One should first print a prompt, like:

print -n "Enter your favorite haircolor: ";read var; print ""

The -n suppresses the trailing newline.

- Read a File Line by Line
To get each line of a file into a variable iteratively, do:

{ while read myline;do
   # process $myline
done } < filename

- To catch the output of a pipeline each line at a time in a variable use:

last | sort | {
while read myline;do
   # commands
done }
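
A sketch of the file-reading loop in a function (the name count_lines and the sample text are made up).
"IFS=" and "-r" keep leading blanks and backslashes in each line as they are. The file is redirected into
the loop instead of piped, because in some shells (bash, for example) a loop at the end of a pipeline runs
in a subshell and its variables are lost afterwards.

```sh
# Hypothetical helper: count the non-empty lines of a file.
count_lines()
{
    n=0
    while IFS= read -r myline; do
        [ -n "$myline" ] && n=$((n + 1))
    done < "$1"
    echo "$n"
}

tmpfile=$(mktemp)
printf 'first line\n\n  third line with leading blanks\n' > "$tmpfile"
count_lines "$tmpfile"
rm -f "$tmpfile"
```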



Example 19:
-----------

#!/bin/sh
# ****************************************************************************
# This script is used to start  Tomcat
# It calls the startup.sh script under $CATALINA_HOME/bin.
#
# ****************************************************************************

#JAVA_OPTS="-Xms512m -Xmx512m -XX:NewSize=128m -XX:SurvivorRatio=8 -verbosegc"
JAVA_OPTS="-Xms384m -Xmx384m"
export JAVA_OPTS
CATALINA_BASE=$SPLEBASE/tomcatBase 
export CATALINA_BASE

if [ ! -d $CATALINA_HOME/bin ]
then
   echo "Unable to find directory $CATALINA_HOME/bin"
else
   $CATALINA_HOME/bin/startup.sh
fi


Example 20:
-----------

export SCRIPTNAME=$0
export SPLQUITE=N
export SPLCOMMAND=""
export SPLENVIRON=""
export MYID=`id |cut -d'(' -f2|cut -d')' -f1`
export SPLSUBSHELL=ksh

# Current Platform and OS
export OPSYS=`uname -s`          
case $OPSYS in
        SunOS)  export OPSYSVER=`uname -r`
#               Supplied and Supported Tuxedo Version
                TUXVERS=tuxedo8.1
#               Supplied and Supported Cobol Application Server
                SPLCOBDIR=/opt/SPLcobAS40sp2
#               Supplied and Supported Java Version
		AWK=nawk
                ;;
        AIX)   export OPSYSVER=`oslevel | cut -c1-3`
               TUXVERS=tuxedo8.1
#              Supplied and Supported Cobol Application Server
               SPLCOBDIR=/opt/SPLcobAS40sp2
#              Supplied and Supported Java Version
               AWK=nawk
               ;;
	Linux) export OPSYSVER=`uname -r| cut -c1-3`
               TUXVERS=tuxedo8.1
#              Supplied and Supported Cobol Application Server
               SPLCOBDIR=/opt/SPLcobAS40sp2
               AWK=awk
	       SPLSUBSHELL=bash
               ;;
	      
        HP-UX)   export OPSYSVER=`uname -r|sed 's/^[A-Z]\.//'|cut -c1-2`
               TUXVERS=tuxedo8.1
#              Supplied and Supported Cobol Application Server
               SPLCOBDIR=/opt/SPLcobAS40sp2
#              Supplied and Supported Java Version
               AWK=awk
               ;;
esac




Example 21:
-----------

Get a certain number of columns from df -k output:

df -k |awk '{print $1,$2,$5,$8}' |grep -v "Filesystem" > /tmp/df.txt 


#!/usr/bin/ksh
for i in `df -k |awk '{print $7}' |grep -v "Filesystem"`
do
   echo "Albert"

done


#!/usr/bin/ksh
cd ~
rm -rf /root/alert.log
echo "Important alerts in errorlog: " >> /root/alert.log
errpt | grep -i STORAGE >> /root/alert.log
errpt | grep -i QUORUM >> /root/alert.log
errpt | grep -i ADAPTER >> /root/alert.log
errpt | grep -i VOLUME >> /root/alert.log
errpt | grep -i PHYSICAL >> /root/alert.log
errpt | grep -i STALE >> /root/alert.log
errpt | grep -i DISK >> /root/alert.log
errpt | grep -i LVM >> /root/alert.log
errpt | grep -i LVD >> /root/alert.log
errpt | grep -i UNABLE >> /root/alert.log
errpt | grep -i USER >> /root/alert.log
errpt | grep -i CORRUPT >> /root/alert.log
cat /root/alert.log

if [ `cat alert.log|wc -l` -eq 1 ]
then
   echo "No critical errors found."
fi

echo " "
echo "Filesystems that might need attention, e.g. %used:"
df -k |awk '{print $4,$7}' |grep -v "Filesystem"|grep -v tmp  > /root/tmp.txt
cat /root/tmp.txt | sort -n | tail -3



Example 22:
-----------

for sid in `grep -v "^#" /etc/oratab | sed -e 's/:.*$//'`
do
echo "/data/oracle/$sid/admin/bdump/alert_$sid.log:/beheer/log/history/oracle/$sid:cat::::" >> /tmp/test
done


Notes:
------

lsnrctl>set Log_status off
mv Listener.log to listenerold.log
lsnrctl>set Log_Status on 

% cd /u01/app/oracle/product/9.2.0/network/log
% lsnrctl set log_status off
% mv listener.log listener.old
% lsnrctl set log_status on

case $IN in
start)
for dbase in `grep -v "^#" /etc/oratab | sed -e 's/:.*$//'`
do
su - $dbase -c "/beheer/oracle/cluster/orapkg.sh start"
done;;

for dbase in `grep -v "^#" /etc/oratab | sed -e 's/:.*$//'`
do
echo $dbase
done;;

for sid in `grep -v "^#" /etc/oratab | sed -e 's/:.*$//'`
do
echo "/data/oracle/$sid/admin/bdump/alert_$sid.log:/beheer/log/history/oracle/$sid:cat::::" >> /tmp/test
done



Example 23: Get a substring of a filename:
------------------------------------------

Note 1:
-------

Get substring of a filename:

LONGFILE=myfile
filename=`echo "$LONGFILE" | awk '{print substr($0,1,3);exit}'`
echo $filename


Note 2:
-------

filename=whatever
first3chars=`echo $filename | cut -c 1-3`
echo $first3chars

Note 3:
-------

Q:
hi

i need to name a file with a substring of a another file name.

i.e. if the old filename is abc.txt , the new filename should be abc_1.txt
i should get the substring of the file name and then name the new one

please let me know how to do it

A:
#!/bin/sh
F=abc.txt
F1="${F%.*}_1.${F##*.}"
echo "F1=$F1"

A:
t1=`basename $0 .txt`
t2=${t1}_1.txt
echo "new file name:-${t2}"



Example 24:
-----------

split a file in a shell script:

split command:
--------------

To split large files into smaller files in Unix, use the split command. At the Unix prompt, enter:

split [options] filename prefix 
Replace filename with the name of the large file you wish to split. Replace prefix with the name you wish 
to give the small output files. You can exclude [options], or replace it with either of the following:

  -l linenumber

  -b bytes 

If you use the  -l  (a lowercase L) option, replace linenumber with the number of lines you'd like 
in each of the smaller files (the default is 1,000). If you use the  -b  option, replace bytes with the number of bytes 
you'd like in each of the smaller files.

The split command will give each output file it creates the name prefix with an extension tacked to the end 
that indicates its order. By default, the split command adds aa to the first output file, proceeding through 
the alphabet to zz for subsequent files. If you do not specify a prefix, most systems use  x .

Examples
In this simple example, assume myfile is 3,000 lines long: 
  split myfile 
This will output three 1000-line files: xaa, xab, and xac.

Working on the same file, this next example is more complex: 
  split -l 500 myfile segment 
This will output six 500-line files: segmentaa, segmentab, segmentac, segmentad, segmentae, and segmentaf.

Finally, assume myfile is a 160KB file: 
  split -b 40k myfile segment 
This will output four 40KB files: segmentaa, segmentab, segmentac, and segmentad.

split -l 15000 originalFile.txt

for f in x*
do
runDataProcessor $f > $f.out &
done

wait

for k in *.out
do
cat $k >> combinedResult.txt
done
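
The split / process / combine loop above leaves its pieces in the current directory. Below is a sketch of the same idea that works in a scratch directory. The function process_chunk is a made-up stand-in for runDataProcessor (here it only upper-cases the text), and the names split_and_process and chunk. are assumptions for this example.

```shell
# Hypothetical stand-in for runDataProcessor: upper-case every line.
process_chunk() {
    tr 'a-z' 'A-Z' < "$1"
}

# split_and_process FILE LINES_PER_CHUNK OUTFILE
split_and_process() {
    src=$1; lines=$2; out=$3
    work=$(mktemp -d) || return 1
    split -l "$lines" "$src" "$work/chunk."
    for f in "$work"/chunk.*
    do
        process_chunk "$f" > "$f.out" &      # one background job per piece
    done
    wait                                     # block until every job has finished
    # The glob expands in sorted order, so the pieces are joined in their original order.
    cat "$work"/chunk.*.out > "$out"
    rm -rf "$work"
}
```

Usage would be something like: split_and_process originalFile.txt 15000 combinedResult.txt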




csplit command:
---------------

csplit [ -f Prefix ] [ -k ] [ -n Number ] [  -s ] File Argument ...

$ csplit original.txt 11 72 98

Assuming original.txt has 108 lines, the csplit command creates four files: the xx00 file contains lines 1-10, the xx01 file contains lines 11-71,
the xx02 file contains lines 72-97, and the xx03 file contains lines 98-108.

The Argument parameter can also contain the following symbols and pattern strings:
/Pattern/
Creates a file that contains the segment from the current line up to, but not including, the line containing the specified pattern. 
The line containing the pattern becomes the current line.


Example:

csplit -k -s -f part. split_file /^-----/ "{100}" 2>/dev/null

ls part.?? | while read file; do
   sed '/^-----/ d' ${file} > ${file}.new && mv ${file}.new ${file}
   echo "File ${file} contains:"
   cat ${file}
done

exit 0
$ ./split_it.sh 
File part.00 contains:
a
b
c
File part.01 contains:
d
e
f
File part.02 contains:
g
h
i
File part.03 contains:
j
k
l

Example:

To split the text of book into a separate file for each chapter, enter:

$ csplit book "/^ Chapter *[k.0-9]k./" {9}

This creates 10 files, xx00 through xx09. The xx00 file contains the front matter that comes before the first chapter. 
Files xx01 through xx09 contain individual chapters. Each chapter begins with a line that contains only the word Chapter and the chapter number.

Example:

To specify the prefix chap for the files created from book, enter: 

$ csplit -f chap book "/^ Chapter *[k.0-9]k./" {9}

This splits book into files named chap00 through chap09.

Example:

#!/bin/sh
#
# Split a file
#
if [ $# -lt 2 ]
then
echo "Syntax is ./lineSplit.sh <file name> <split line number>"
echo "Example: ./lineSplit.sh bp.out 25000"
exit 1
fi
file=$1
split=$2
lineMax=`wc -l $file | awk '{print $1}'`
counter=1
i=1
# Using -le means the last (possibly shorter) fragment is written inside the loop,
# so no extra, possibly empty, fragment is created afterwards.
while [ $counter -le $lineMax ]
do
  split1=$counter
  split2=`expr $counter + $split - 1`
  sed -n "${split1},${split2}p" $file > fragment."$i"
  i=`expr $i + 1`
  counter=`expr $split2 + 1`
done



Example 25:
-----------


Use of the IFS, or internal field separator, variable

For any line of text, you can tell the shell which character separates the fields.
In normal text the separator is a space, a tab, etc., but in your $PATH variable,
for example, the field separator is the ":" character.

#!/usr/bin/ksh
IFS=:		# Here we say that the field separator is the : character.
for p in $PATH
do
  if [ -x "$p/$1" ]
  then
    echo "$p/$1"
    exit 0
  fi
done
echo "No $1 found in path"
exit 1
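
Changing IFS affects every later word split in the same shell, so inside a larger script it is safer to restore it afterwards. A minimal sketch of the same PATH search as a function (the name find_in_path is an assumption for this example):

```shell
# find_in_path CMD: print the first executable file named CMD found in $PATH.
find_in_path() {
    cmd=$1
    old_ifs=$IFS
    IFS=:                              # split $PATH on colons
    for dir in $PATH
    do
        if [ -x "$dir/$cmd" ] && [ ! -d "$dir/$cmd" ]
        then
            IFS=$old_ifs               # restore before returning
            printf '%s\n' "$dir/$cmd"
            return 0
        fi
    done
    IFS=$old_ifs                       # restore on the "not found" path too
    return 1
}
```

For example, find_in_path ls prints the full path of ls, and the return code is 1 when nothing is found.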



Example 26:
-----------

How to handle long text in a variable.

A very long text, spanning multiple lines, can be handled as in the following example:

  MAIL_TEXT="Good morning,"
  MAIL_TEXT="${MAIL_TEXT}\n\nThere's no file found in ${ARCH_DIR} younger than 1 day."
  MAIL_TEXT="${MAIL_TEXT}\nThat means that we have one of the following situations:"
  MAIL_TEXT="${MAIL_TEXT}\n- Something went wrong trying to receive the file."
  MAIL_TEXT="${MAIL_TEXT}\n- Or it was a special day where no files are sent."

FIXED_TEXT="A periodic check shows that the SWIFT certificate of the users listed below,"
FIXED_TEXT="${FIXED_TEXT}\nused for SWIFT services, will expire soon."
FIXED_TEXT="${FIXED_TEXT}\nThe user is requested to log in to the indicated environment with his SWIFT account within a week"
FIXED_TEXT="${FIXED_TEXT}\nso that his certificate is renewed. Log in with SWIFT Alliance WebStation."
FIXED_TEXT="${FIXED_TEXT}\nNote: each environment uses its own certificate. Log in separately for each notification on the indicated environment."
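
The variables above contain the two characters \ and n, not real newlines. Whether echo turns them into line breaks depends on the shell, so a portable way to print such a text is printf with %b. A small sketch (the function name render_text is made up for this example):

```shell
# render_text: print $1, expanding backslash escapes such as \n.
render_text() {
    printf '%b\n' "$1"
}

MAIL_TEXT="Good morning,"
MAIL_TEXT="${MAIL_TEXT}\n\nNo file was found."
render_text "$MAIL_TEXT"
```

The output of render_text can be piped straight into a mail command.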


Example 27: again checking status
---------------------------------



ABC_INST=$1


typeset -i ABC_PROCS

STATUS=""



Status_ABC()
{
  # Description: Function which gives the status of all processes: Running or Notrunning.
  # Calling: Status_ABC

  case ${ABC_INST} in
       aa)
          ABC_PROCS=$(ps -ef | grep Command | grep -v grep | grep "s 1" | wc -l)
          ;;
       bb)
          ABC_PROCS=$(ps -ef | grep Command | grep -v grep | grep "s 2" | wc -l)
          ;;
       cc)
          ABC_PROCS=$(ps -ef|grep $USER |grep -v grep|grep "abc.$(uname -n)" |wc -l)
          ;;
        *)
          echo "Wrong parameter. Use \"aa\", \"bb\" or \"cc\""
          exit 1
          ;;
  esac
  if [[ ${ABC_PROCS} = 2 ]]
  then
    STATUS="Running"
  else
    STATUS="Notrunning"
  fi
  echo "${STATUS}"
}

# MAIN
Status_ABC


Note:
-----

typeset

This command creates a shell variable, assigns it a value, and specifies certain attributes for the variable, 
such as integer and read-only.

The syntax is:

typeset [-HLRZfilprtux[n]] [name[=value]] ...

where name is the shell variable to be created, value is to be assigned according to the options set.

The following example makes year read-only.

 
$ typeset -r year=2000
$ echo $year
2000
$ year=2001
ksh: year: is readonly 




Example 28:
-----------


Put some output of some commands in html:

#!/bin/bash

# system_page - A script to produce a system information HTML file

##### Constants

TITLE="System Information for $HOSTNAME"
RIGHT_NOW=$(date +"%x %r %Z")
TIME_STAMP="Updated on $RIGHT_NOW by $USER"

##### Functions

function system_info
{
    echo "<h2>System release info</h2>"
    echo "<p>Function not yet implemented</p>"

}   # end of system_info


function show_uptime
{
    echo "<h2>System uptime</h2>"
    echo "<pre>"
    uptime
    echo "</pre>"

}   # end of show_uptime


function drive_space
{
    echo "<h2>Filesystem space</h2>"
    echo "<pre>"
    df
    echo "</pre>"

}   # end of drive_space


function home_space
{
    # Only the superuser can get this information

    if [ "$(id -u)" = "0" ]; then
        echo "<h2>Home directory space by user</h2>"
        echo "<pre>"
        echo "Bytes Directory"
        du -s /home/* | sort -nr
        echo "</pre>"
    fi

}   # end of home_space



##### Main

cat <<- _EOF_
  <html>
  <head>
      <title>$TITLE</title>
  </head>
  <body>
      <h1>$TITLE</h1>
      <p>$TIME_STAMP</p>
      $(system_info)
      $(show_uptime)
      $(drive_space)
      $(home_space)
  </body>
  </html>
_EOF_


Or...

#!/bin/bash

# system_page - A script to produce a system information HTML file

##### Constants

TITLE="System Information for $HOSTNAME"
RIGHT_NOW=$(date +"%x %r %Z")
TIME_STAMP="Updated on $RIGHT_NOW by $USER"

##### Functions

function system_info
{
    echo "<h2>System release info</h2>"
    echo "<p>Function not yet implemented</p>"

}   # end of system_info


function show_uptime
{
    echo "<h2>System uptime</h2>"
    echo "<pre>"
    uptime
    echo "</pre>"

}   # end of show_uptime


function drive_space
{
    echo "<h2>Filesystem space</h2>"
    echo "<pre>"
    df
    echo "</pre>"

}   # end of drive_space


function home_space
{
    # Only the superuser can get this information

    if [ "$(id -u)" = "0" ]; then
        echo "<h2>Home directory space by user</h2>"
        echo "<pre>"
        echo "Bytes Directory"
        du -s /home/* | sort -nr
        echo "</pre>"
    fi

}   # end of home_space


function write_page
{
    cat <<- _EOF_
    <html>
        <head>
        <title>$TITLE</title>
        </head>
        <body>
        <h1>$TITLE</h1>
        <p>$TIME_STAMP</p>
        $(system_info)
        $(show_uptime)
        $(drive_space)
        $(home_space)
        </body>
    </html>
_EOF_

}

function usage
{
    echo "usage: system_page [[[-f file ] [-i]] | [-h]]"
}


##### Main

interactive=
filename=~/system_page.html

while [ "$1" != "" ]; do
    case $1 in
        -f | --file )           shift
                                filename=$1
                                ;;
        -i | --interactive )    interactive=1
                                ;;
        -h | --help )           usage
                                exit
                                ;;
        * )                     usage
                                exit 1
    esac
    shift
done


# Test code to verify command line processing

if [ "$interactive" = "1" ]; then
	echo "interactive is on"
else
	echo "interactive is off"
fi
echo "output file = $filename"


# Write page (comment out until testing is complete)

# write_page > $filename
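
The hand-written while/case loop above also accepts long options such as --file. If only the short options are needed, the POSIX getopts builtin can do the parsing. The following is a sketch under that assumption; the function name parse_args is made up, and the variable names follow the script above.

```shell
# parse_args: set $interactive and $filename from -f FILE, -i and -h.
parse_args() {
    interactive=0
    filename=$HOME/system_page.html
    OPTIND=1                               # restart option parsing on every call
    while getopts "f:ih" opt "$@"
    do
        case $opt in
            f) filename=$OPTARG ;;
            i) interactive=1 ;;
            h) echo "usage: system_page [-f file] [-i] [-h]"
               return 0 ;;
            *) echo "usage: system_page [-f file] [-i] [-h]" >&2
               return 1 ;;
        esac
    done
}
```

Called as parse_args "$@" at the start of the main section, it leaves the results in the same two variables the test code prints.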







=====================
3. BOOT and Shutdown:
=====================


3.1 Shutdown systems:
=====================


3.1.1 Shutdown a Solaris system:
================================

Under Solaris, use one of the following commands, depending on the desired effect:

  init or shutdown are normally best: they run the kill scripts
  halt or reboot do not run the kill scripts properly

  /usr/sbin/shutdown -i5 -g0      -- brings the system to the powerdown state
  /usr/sbin/shutdown -i6 -g0 -y   -- reboots the system
  /usr/sbin/shutdown -i0 -g0      -- shuts everything down, unmounts all fs

  shutdown -i6  (is a reboot in Solaris8)

  shutdown [-y no interactive confirmations] [-g grace period in seconds] [-i init state] [message]

- If you say init 6, or shutdown -i6, the system reboots and restarts into the runstate defined as the default
  in the inittab file.

- If you say init 0, the system cleanly shuts down, and you can power off the system.
  init 5 is equivalent to the poweroff command: the system cleanly shuts down,
  and you can power off the system.


Be sure to read the man page for shutdown for your operating system.
With no argument, shutdown will take the system into single user mode.

- The /usr/sbin/reboot command: 
  is used when you want to reboot. The system does not go through the shutdown scripts. 
  Also, it usually syncs the filesystem.
  Thus, the following is a safe bet on all Unixes: 
  sync;sync;sync;reboot

- The /usr/sbin/halt command: 
  syncs the filesystem and stops the processor. No shutdown scripts are fired up. 

- The fastboot/fasthalt command: 
  The fasthalt command halts the processor and creates a /fastboot file to tell the system to skip the 
  fsck operation upon reboot 

- The sync command: completes pending filesystem writes to disk (in other words, the buffer cache is dumped to disk). 
  Most Unix shutdown, reboot, and halt commands will do a sync. However, the reboot, fastboot, or halt commands will not 
  go through the shutdown scripts. 

If you manually sync, it is customary to do it multiple times (as we saw before). This is partly Unix superstition and 
part fact. 
The first sync is supposed to schedule a sync, not actually perform it. The second and subsequent syncs force the sync. 

sync<enter>

sync<enter>

init 0<enter>


- Shutdown scripts:
Like startup scripts, the system initialization directories (usually /etc/rcN.d) contain shutdown scripts which are fired up
by init during an orderly shutdown (i.e. when either the init command is used to change the runlevel or when the 
shutdown command is used). 
The usual convention is to use the letter K in front of a number, followed by a service name, such as K56network. 
The number determines the order in which the scripts are fired up when the system transitions into a particular run level. 


3.1.2 Shutdown an AIX system:
=============================

You can use the init, shutdown and halt commands. The shutdown command stops the system in an orderly fashion.

To bring the system from multi-user mode to maintenance mode, use
# shutdown -m

To restart the system, use
# shutdown -r

To gracefully shutdown the system, use
# shutdown

If you need a customized shutdown sequence, you can create a file called /etc/rc.shutdown.
If this file exists, it is called by the shutdown command and is executed first.
This can be useful, for example, if you need to close a database prior to a shutdown.
If rc.shutdown fails (non zero return code value), the shutdown cycle is terminated.

Example rc.shutdown:
--------------------

#cat /etc/rc.shutdown

#!/bin/ksh


# stop Control-SA/Agent
/etc/rc.ctsa stop
/etc/rc.mwa stop
/etc/rc.opc stop

# Stop TSM dsmcad and scheduler
/etc/rc.dsm stop


# Stop TSCM client
/opt/IBM/SCM/client/jacclient stop

# /etc/rc.shutdown SHOULD always end with a # Stop db2 instances as last line
/etc/rc.ihs stop
/etc/rc.ihs stop des
/etc/rc.appserver stop PRM1DES
/etc/rc.nodeagent stop
/etc/rc.dmgr stop
# Stop db2 instances
/etc/rc.db2_udb stop all

/etc/rc.directoryserver stop
 #Stop the Tivoli Enterprise Console Logfile Adapter
if [ -f /beheer/Tivoli/lcf/bin/aix4-r1/TME/TEC/adapters/bin/init.tecad_logfile ]; then
   /beheer/Tivoli/lcf/bin/aix4-r1/TME/TEC/adapters/bin/init.tecad_logfile stop aix-default >/dev/null 2>&1
   echo "Tivoli Enterprise Console Logfile Adapter stopped."
fi
exit 0
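
The example above always ends with "exit 0", whatever the individual stop commands returned. Since a non-zero return code from rc.shutdown terminates the shutdown cycle, you may want to decide deliberately what to report. A sketch of a helper that records failures (the names stop_service and rc are assumptions for this example, not part of AIX):

```shell
rc=0    # becomes 1 as soon as any stop command fails

# stop_service DESCRIPTION COMMAND [ARGS...]: run a stop command and record a failure.
stop_service() {
    desc=$1
    shift
    if "$@"
    then
        echo "stopped: $desc"
    else
        echo "FAILED to stop: $desc" >&2
        rc=1
    fi
}
```

In rc.shutdown this could be used as: stop_service "TSM scheduler" /etc/rc.dsm stop
The script can then end with "exit $rc" if a failed stop should abort the shutdown, or log $rc and still "exit 0" if the shutdown must always continue.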





3.1.3 Shutdown a Linux system:
==============================

Note 1: Redhat systems:

To shut down Red Hat Linux, issue the shutdown command. You can read the shutdown man page for complete details, 
but the two most common uses are: 

/sbin/shutdown -h now
/sbin/shutdown -r now
 
You must run shutdown as root. After shutting everything down, the -h option will halt the machine, 
and the -r option will reboot. 

Non-root users can use the reboot and halt commands to shutdown the system while in runlevels 1 through 5. 
However, not all Linux operating systems support this feature. 

If your computer does not power itself down, be careful not to turn off the computer until you see a message indicating 
that the system is halted. 




3.2 Booting:
============


3.2.1 Generic description booting a unix system:
================================================
 
1. A ROM menu, a command prompt, or an automatic procedure
finds or selects the initial bootstrap code
on some device
 
2. kernel is loaded (unix, vmunix etc.. in /, or /boot, or /kernel, or some place else)
3. start init
4. run scripts

SunOs:    /vmunix
Solaris8 = SunOs 5.8: /kernel/unix
AIX: /unix



3.2.2 Bootprocedure Solaris:
============================


1. Boot overview
----------------

- Openboot PROM: 

  After a diagnostic phase, it either 
    - shows the "ok" prompt, or 
    - if the Openboot parameter "auto-boot?" is true, goes on with the boot procedure.
      The Openboot parameter "boot-device" tells it the path to the bootblk.

  You can also use
  "ok boot alias/physical_device_name" to boot from the specified device, for example
  "ok boot disk3"

  You can also pass options to boot:
    -a   interactive boot
    -r   reconfiguration boot
    -s   boot to single-user state
    -v   verbose mode

  With the command "ok boot disk5 kernel/unix -s", the PROM looks for the primary boot program bootblk
  on the alias disk5, which could be a physical device such as /iommu/sbus/espdma@f,400000/esp@f,800000/sd@0,0
  The primary boot program then loads "ufsboot", which in turn loads the kernel as specified.

  Thus, after the simple boot command, the boot process goes on in the following manner:

- bootblk is loaded (from the default boot-device)
- bootblk finds ufsboot (plus filesystem drivers)
- ufsboot loads and starts the kernel, merging genunix and unix (the root fs is mounted)
- kernel mounts other fs
- start sched (swapper)
- start /sbin/init 
- init starts daemons in rc scripts

You can also view the startup information via:

$ more /var/adm/messages
$ /usr/sbin/dmesg 

2. Booting with Other system file:
----------------------------------

1. login as root

2. create a backup copy of the /etc/system file
   cp /etc/system /etc/system.orig

3. Halt the system
   /usr/sbin/shutdown -y -g0 -i0

4. at the OK prompt, boot the system using the interactive option
   OK boot -a

5. You will be prompted to enter a filename for the kernel, and a default
   directory for modules. Enter a return for each of these questions.
   When prompted to use the default /etc/system file:

   Name of system file [etc/system]:

   enter the following:

   /etc/system.orig 


3. More about init:

init uses the /etc/inittab File
When you boot the system or change run levels with the init or shutdown command, the init daemon starts processes 
by reading information from the /etc/inittab file. This file defines three important items for the init process: 

-The system's default run level
-What processes to start, monitor, and restart if they terminate
-What actions to be taken when the system enters a new run level

Each entry in the /etc/inittab file has the following fields:

id:rstate:action:process
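
Because the fields are separated by colons, an entry is easy to pick apart with awk. As an illustration only, the sketch below prints the rstate field of the initdefault entry, which is the default run level. The function name default_runlevel is made up, and the file argument exists so it can be tried on a copy of the file.

```shell
# default_runlevel [FILE]: print the run level named in the initdefault entry.
default_runlevel() {
    awk -F: '$1 !~ /^#/ && $3 == "initdefault" { print $2; exit }' "${1:-/etc/inittab}"
}
```

For an inittab that contains the line "is:3:initdefault:", this prints 3.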


in /etc we find the links:
rc0 -> /sbin/rc0
rc1 -> /sbin/rc1
rc2 -> /sbin/rc2
rc3 -> /sbin/rc3
rc4 -> /sbin/rc4
rc5 -> /sbin/rc5
rc6 -> /sbin/rc6

in /sbin we find the scripts rc0 - rc6, rcS. These are not links, but true shell scripts.
In /etc  we find the links   rc0 - rc6, rcS.

In /etc we find the (true) directories /etc/rc#.d.
So suppose the runlevel=3

1. init reads inittab

Contents /etc/inittab

\u@\h[\w]> more inittab
ap::sysinit:/sbin/autopush -f /etc/iu.ap
ap::sysinit:/sbin/soconfig -f /etc/sock2path
fs::sysinit:/sbin/rcS                   >/dev/console 2<>/dev/console </dev/console
is:3:initdefault:
p3:s1234:powerfail:/usr/sbin/shutdown -y -i5 -g0 >/dev/console 2<>/dev/console
s0:0:wait:/sbin/rc0                     >/dev/console 2<>/dev/console </dev/console
s1:1:wait:/usr/sbin/shutdown -y -iS -g0 >/dev/console 2<>/dev/console </dev/console
s2:23:wait:/sbin/rc2                    >/dev/console 2<>/dev/console </dev/console
s3:3:wait:/sbin/rc3                     >/dev/console 2<>/dev/console </dev/console
s5:5:wait:/sbin/rc5                     >/dev/console 2<>/dev/console </dev/console
s6:6:wait:/sbin/rc6                     >/dev/console 2<>/dev/console </dev/console
fw:0:wait:/sbin/uadmin 2 0              >/dev/console 2<>/dev/console </dev/console
of:5:wait:/sbin/uadmin 2 6              >/dev/console 2<>/dev/console </dev/console
rb:6:wait:/sbin/uadmin 2 1              >/dev/console 2<>/dev/console </dev/console
sc:234:respawn:/usr/lib/saf/sac -t 300
co:234:respawn:/usr/lib/saf/ttymon -g -h -p "`uname -n` console login: " -T sun -d /dev/console -l console -m ldterm,ttcompat


2. init determines the runlevel; the default here is 3 (the initdefault entry in inittab)


For each rc script in the /sbin directory, there is a corresponding directory named /etc/rcn.d that contains 
scripts to perform various actions for that run level. 
For example, /etc/rc2.d contains files used to start and stop processes for run level 2.


# ls /etc/rc2.d
K20spc@             S70uucp*            S80lp*
K60nfs.server*      S71rpc*             S80spc@
K76snmpdx*          S71sysid.sys*       S85power*
K77dmi*             S72autoinstall*     S88sendmail*
README              S72inetsvc*         S88utmpd*
S01MOUNTFSYS*       S73nfs.client*      S89bdconfig@
S05RMTMPFILES*      S74autofs*          S91leoconfig*
S20sysetup*         S74syslog*          S92rtvc-config*
S21perf*            S74xntpd*           S92volmgt*
S30sysid.net*       S75cron*            S93cacheos.finish*
S47asppp*           S76nscd*            S99audit*
S69inet*            S80PRESERVE*        S99dtlogin*
 

The /etc/rcn.d scripts are always run in ASCII sort order. The scripts have names of the form:

[K,S][0-9][0-9][A-Z][0-99]

Files beginning with K are run to terminate (kill) a system process. Files beginning with S are run to start a system process.
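
ASCII sort order is not numeric order, which matters when you pick the number for a new script: a three-digit name such as S100xyz sorts before S20sysetup, because the character "1" comes before "2". A small sketch to check the order of a set of names (the function name run_order is made up for this example):

```shell
# run_order NAME...: print the given script names in plain ASCII (byte) order.
run_order() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

run_order S20sysetup S100xyz S99audit K60nfs.server
```

This lists K60nfs.server first, then S100xyz, S20sysetup and S99audit.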

Run control scripts are also located in the /etc/init.d directory. 
These files are linked to corresponding run control scripts in the /etc/rc*.d directories.
One advantage of having individual scripts for each run level is that you can run scripts in the /etc/init.d directory individually 
to turn off functionality without changing a system's run level.

- Start and Stop of an individual process

The advantage to have individual scripts, is that you can stop or start individual processes
by running such a script, without rebooting or changing the run level.


Turn off functionality. 
# /etc/init.d/filename stop
 
Restart functionality
# /etc/init.d/filename start

For example, if you want to restart the NFS server, you can do the following:

# /etc/init.d/nfs.server stop
# /etc/init.d/nfs.server start


Use the ps and grep commands to verify whether the service has been stopped or started.
# ps -ef | grep service 


  Adding a Run Control Script:
  ----------------------------

  All scripts are in /etc/init.d
  You create the necessary links in the corresponding /etc/rcn.d directory

  The /sbin/rcN scripts run the /etc/rcN.d scripts

  If you want to add a run control script to start and stop a service, 
  copy the script into the /etc/init.d directory and create links in the rc*.d 
  directory you want the service to start and stop. 
  See the README file in each /etc/rc*.d directory for more information on naming run control scripts. 
  The procedure below describes how to add a run control script.

  How to Add a Run Control Script
  Become superuser.

  Add the script to the /etc/init.d directory. 

  # cp filename /etc/init.d 
  # chmod 744 /etc/init.d/filename
  # chown root:sys /etc/init.d/filename

  Create links to the appropriate rc*.d directory.

  # cd /etc/init.d
  # ln filename /etc/rc2.d/Snnfilename
  # ln filename /etc/rcn.d/Knnfilename 

  (or 
  cd /etc/rc2.d
  ln /etc/init.d/filename S22filename
  )

  Use the ls command to verify that the script has links in the specified directories.

  # ls /etc/init.d/ /etc/rc2.d/ /etc/rcn.d/
 
  Example-Adding a Run Control Script

  # cp xyz /etc/init.d
  # cd /etc/init.d
  # ln xyz /etc/rc2.d/S100xyz
  # ln xyz /etc/rc0.d/K100xyz
  # ls /etc/init.d /etc/rc2.d /etc/rc0.d


#!/bin/ksh

# name: spl
# purpose: script that will start or stop the spl stuff.


case "$1" in
start )
        echo "starting spl"
        echo "su - ccbsys -c '/prj/spl/SPLS3/bin/splenviron.sh -e SPLS3 -c \"spl.sh -t start\"'"
        su - ccbsys -c '/prj/spl/SPLS3/bin/splenviron.sh -e SPLS3 -c "spl.sh -t start"'
        ;;
stop )
        echo "stopping spl"
        echo "su - ccbsys -c '/prj/spl/SPLS3/bin/splenviron.sh -e SPLS3 -c \"spl.sh -t stop\"'"
        su - ccbsys -c '/prj/spl/SPLS3/bin/splenviron.sh -e SPLS3 -c "spl.sh -t stop"'
        ;;
* )
        echo "Usage: $0 (start | stop)"
        exit 1
esac


3. /sbin/rc3 script will be started


Contents /sbin/rc3

#!/sbin/sh
#       Copyright (c) 1984, 1986, 1987, 1988, 1989 AT&T
#         All Rights Reserved

#       THIS IS UNPUBLISHED PROPRIETARY SOURCE CODE OF AT&T
#       The copyright notice above does not evidence any
#       actual or intended publication of such source code.

#ident  "@(#)rc3.sh     1.12    94/12/19 SMI"   SVr4.0 1.11.2.2
#       "Run Commands" executed when the system is changing to init state 3,
#       same as state 2 (multi-user) but with remote file sharing.

PATH=/usr/sbin:/usr/bin
set `/usr/bin/who -r`
if [ -d /etc/rc3.d ]
   then
        for f in /etc/rc3.d/K*
        {
                if [ -s ${f} ]
                then
                        case ${f} in
                                *.sh)   .        ${f} ;;        # source it
                                *)      /sbin/sh ${f} stop ;;   # sub shell
                        esac
                fi
        }

for f in /etc/rc3.d/S*
        {
                if [ -s ${f} ]
                then
                        case ${f} in
                                *.sh)   .        ${f} ;;        # source it
                                *)      /sbin/sh ${f} start ;;  # sub shell
                        esac
                fi
        }
fi
 
modunload -i 0 & > /dev/null 2>&1

if [ $9 = 'S' -o $9 = '1' ]
then
  echo 'The system is ready.'
fi
       

4. From /sbin/rc3 all K* and S* scripts in /etc/rc3.d will be run

Oracle is installed on this host, so there should be an /etc/rc3.d/S99oracle or similar
script. On this host it is the S88dbora script.


5. There is an S88dbora script, so it will be called:

Oracle S88dbora script in /etc/rc3.d


Example:
--------

mt -f /dev/rm  rewind
tar -xvf /dev/rmt1.1  filename
mt -f /dev/rmt0.1 fsf 2 (to reach file three) (after that the tape pointer is at the beginning of file 4)

fsf bsf

\u@\h[\w]> more S88dbora
#!/bin/sh

# 
# Startup for Oracle Databases
#

ORACLE_HOME=/opt/oracle/product/8.0.6
ORACLE_OWNER=oracle

if [ ! -f $ORACLE_HOME/bin/dbstart ] ;then
  echo "Oracle startup: cannot start"
  exit
fi

case "$1" in
  'start')
        # Start the Oracle databases
        su - $ORACLE_OWNER -c "$ORACLE_HOME/bin/dbstart" > /dev/null 2>&1
        su - $ORACLE_OWNER -c "$ORACLE_HOME/bin/lsnrctl start" > /dev/null 2>&1
        su - $ORACLE_OWNER -c "$ORACLE_HOME/bin/lsnrctl dbsnmp_start" > /dev/null 2>&1
        ;;

  'stop')
        # Stop the Oracle databases
        su - $ORACLE_OWNER -c "$ORACLE_HOME/bin/lsnrctl dbsnmp_stop" > /dev/null 2>&1
        su - $ORACLE_OWNER -c "$ORACLE_HOME/bin/lsnrctl stop" > /dev/null 2>&1
        su - $ORACLE_OWNER -c "$ORACLE_HOME/bin/dbshut" > /dev/null 2>&1
        ;;

  *)
        echo "Usage: $0 { start | stop }"
        ;;
esac

Another example:
----------------

more /etc/init.d/dbora

nlih30207858-08:/etc/init.d $ more dbora
# Set ORA_HOME to be equivalent to the ORACLE_HOME
# from which you wish to execute dbstart and
# dbshut
# set ORA_OWNER to the user id of the owner of the
# Oracle database in ORA_HOME
ORA_HOME=/u01/app/oracle/product/9.2
ORA_OWNER=oraclown
if [ ! -f $ORA_HOME/bin/dbstart -o ! -d $ORA_HOME ]
then
echo "Oracle startup: cannot start"
exit
fi
case "$1" in
'start')
# Start the Oracle databases and listener:
su - $ORA_OWNER -c "$ORA_HOME/bin/lsnrctl start" &
su - $ORA_OWNER -c $ORA_HOME/bin/dbstart &
;;
'stop')
# Stop the Oracle databases and listener:
su - $ORA_OWNER -c $ORA_HOME/bin/lsnrctl stop &
su - $ORA_OWNER -c $ORA_HOME/bin/dbshut &
;;
esac


6. To start the database(s) and listener(s), the dbstart script is run:

\u@\h[\w]> more dbstart
:
#
# $Header: dbstart.sh.pp 1.1 95/02/22 14:37:29 rdhoopar Osd<unix> $ dbstart.sh.pp Copyr (c) 1991 Oracle
#

###################################
#
# usage: dbstart
#
# This script is used to start ORACLE from /etc/rc(.local).
# It should ONLY be executed as part of the system boot procedure.
#
#####################################

ORATAB=/var/opt/oracle/oratab

trap 'exit' 1 2 3
case $ORACLE_TRACE in
    T)  set -x ;;
esac

# Set path if path not set (if called from /etc/rc)
case $PATH in
    "") PATH=/bin:/usr/bin:/etc
        export PATH ;;
esac

#
# Loop for every entry in oratab file and and try to start
# that ORACLE
#

cat $ORATAB | while read LINE
do
    case $LINE in
        \#*)            ;;      #comment-line in oratab
        *)
#       Proceed only if third field is 'Y'.
        if [ "`echo $LINE | awk -F: '{print $3}' -`" = "Y" ] ; then
            ORACLE_SID=`echo $LINE | awk -F: '{print $1}' -`
            if [ "$ORACLE_SID" = '*' ] ; then
                ORACLE_SID=""
            fi
#           Called programs use same database ID
            export ORACLE_SID
            ORACLE_HOME=`echo $LINE | awk -F: '{print $2}' -`
#           Called scripts use same home directory
            export ORACLE_HOME
#           Put $ORACLE_HOME/bin into PATH and export.
            PATH=$ORACLE_HOME/bin:/bin:/usr/bin:/etc ; export PATH

            PFILE=${ORACLE_HOME}/dbs/init${ORACLE_SID}.ora

#       Figure out if this is a V5, V6, or V7 database. Do we really need V5?
            if [ -f $ORACLE_HOME/bin/sqldba ] ; then
                VERSION=`$ORACLE_HOME/bin/sqldba command=exit | awk '
                        /SQL\*DBA: (Release|Version)/ {split($3, V, ".") ;
                        print V[1]}'`
            else
                if test -f $ORACLE_HOME/bin/svrmgrl; then
                        VERSION="7.3"

                else
                        VERSION="5"
                fi
            fi

           if test  -f $ORACLE_HOME/dbs/sgadef${ORACLE_SID}.dbf  -o \
                     -f $ORACLE_HOME/dbs/sgadef${ORACLE_SID}.ora
            then
                STATUS="-1"
            else
                STATUS=1
            fi
            case $STATUS in
                1)  if [ -f $PFILE ] ; then
                        case $VERSION in
                            5)  ior w pfile=$PFILE
                                ;;

                            6)  sqldba command=startup
                                ;;

                            7)  sqldba <<EOF
connect internal
startup
EOF
                                ;;

                           7.3) svrmgrl <<EOF
connect internal
startup
EOF
                                ;;
                        esac

                        if test $? -eq 0 ; then
                            echo ""
                            echo "Database \"${ORACLE_SID}\" warm started."
                        else
                            echo ""
                            echo "Database \"${ORACLE_SID}\" NOT started."
                        fi
                    else
                        echo ""
                        echo "Can't find init file for Database \"${ORACLE_SID}\"."
                        echo "Database \"${ORACLE_SID}\" NOT started."
                    fi
                    ;;

                -1) echo ""
                    echo "Database \"${ORACLE_SID}\" possibly left running when system went down (system crash?)."
                    echo "Notify Database Administrator."
                    case $VERSION in
                        5)  ior c
                            ;;

                        6)  sqldba "command=shutdown abort"
                            ;;

                        7)  sqldba <<EOF
connect internal
shutdown abort
EOF
                            ;;

                      7.3)  svrmgrl <<EOF
connect internal
shutdown abort
EOF

                           ;;
                    esac

                    if test $? -eq 0 ; then
                        if [ -f $PFILE ] ; then
                            case $VERSION in
                                5)  ior w pfile=$PFILE
                                    ;;

                                6)  sqldba command=startup
                                    ;;

                                7)  sqldba <<EOF
connect internal
startup
EOF
                                    ;;
                              7.3)  svrmgrl <<EOF
connect internal
startup
EOF
                                    ;;
                            esac
                            if test $? -eq 0 ; then
                                echo ""
                                echo "Database \"${ORACLE_SID}\" warm started."
                            else
                                echo ""
                                echo "Database \"${ORACLE_SID}\" NOT started."
                            fi
                        else
                            echo ""
                            echo "Can't find init file for Database \"${ORACLE_SID}\"."
                            echo "Database \"${ORACLE_SID}\" NOT started."
                        fi
                    else
                        echo "Database \"${ORACLE_SID}\" NOT started."
                    fi
                    ;;
            esac
        fi
        ;;
    esac
done




environment oracle user

DBPASSWORD=abc
DBPASSWORDFE=mrx
DBUSER=xyz
DBUSERFE=mry
EDITOR=vi
HOME=/opt/home/oracle
HZ=100
INPUTRC=/usr/local/etc/inputrc
LD_LIBRARY_PATH=/opt/oracle/product/8.0.6/lib
LESSCHARSET=latin1
LOG=/var/opt/oracle
LOGNAME=oracle
MANPATH=/usr/share/man:/usr/openwin/share/man:/usr/opt/SUNWmd/man:/opt/SUNWsymon/man:/opt/SUNWswusg/man:/opt/SUNWadm/2.2/man:/opt/local/man
NLS_LANG=american_america.we8iso8859p1
OPENWINHOME=/usr/openwin
ORACLE_BASE=/opt/oracle
ORACLE_HOME=/opt/oracle/product/8.0.6
ORACLE_SID=ORCL
ORA_NLS33=/opt/oracle/product/8.0.6/ocommon/nls/admin/data
PATH=/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/ucb:/usr/openwin/bin:/opt/oracle/product/8.0.6/bin
PROGRAMS=/opt/local/bin/oracle
PS1=\u@\h[\w]>
SHELL=/sbin/sh
TERM=vt100
TZ=MET
\u@\h[\w]>



3.2.3 Bootprocedure AIX 5.x: 
============================

http://publib.boulder.ibm.com/infocenter/pseries/index.jsp?topic=/com.ibm.aix.doc/aixbman/admnconc/under_sys.htm


Understanding the Boot Process
During the boot process, the system tests the hardware, loads and runs the operating system, 
and configures devices. To boot the operating system, the following resources are required:

. A boot image that can be loaded after the machine is turned on or reset. 
. Access to the root (/) and /usr file systems.

There are three types of system boots:

Hard Disk Boot           A machine is started for normal operations. For more information, 
                         see Understanding System Boot Processing. 
Diskless Network Boot    A diskless or dataless workstation is started remotely over a network. 
                         A machine is started for normal operations. One or more remote file servers provide the files 
                         and programs that diskless or dataless workstations need to boot. 
Maintenance Boot         A machine is started from a hard disk, network, tape, or CD-ROM in maintenance mode. 
                         A system administrator can perform tasks such as installing new or updated software and running 
                         diagnostic checks. For more information, see Understanding the Maintenance Boot Process. 

During a hard disk boot, the boot image is found on a local disk created when the operating system was installed. 
During the boot process, the system configures all devices found in the machine and initializes other basic software 
required for the system to operate (such as the Logical Volume Manager). At the end of this process, 
the file systems are mounted and ready for use. For more information about the file system used during boot processing, 
see Understanding the RAM File System.

The same general requirements apply to diskless network clients. They also require a boot image and access 
to the operating system file tree. Diskless network clients have no local file systems and get all their 
information by way of remote access.


Understanding System Boot Processing:

Most users perform a hard disk boot when starting the system for general operations. 
The system finds all information necessary to the boot process on its disk drive.

When the system is started by turning on the power switch (a cold boot) or restarted with the 
reboot or shutdown commands (a warm boot), a number of events must occur before the system is ready for use. 
These events can be divided into the following phases:

. Read Only Storage (ROS) Kernel Init Phase 
. Base Device Configuration Phase 
. System Boot Phase.

-- ROS Kernel Init Phase:

The ROS kernel resides in firmware. Its initialization phase involves the following steps:

1. The firmware checks to see if there are any problems with the system motherboard. Control is passed to ROS, 
which performs a power-on self-test (POST). 

2. The ROS initial program load (IPL) checks the user bootlist, a list of available boot devices. 
This boot list can be altered to suit your requirements using the bootlist command. If the user boot list 
in non-volatile random access memory (NVRAM) is not valid or if a valid boot device is not found, 
the default boot list is then checked. In either case, the first valid boot device found in the boot list 
is used for system startup. If a valid user boot list exists in NVRAM, the devices in the list are checked in order. 
If no user boot list exists, all adapters and devices on the bus are checked. In either case, devices are checked 
in a continuous loop until a valid boot device is found for system startup. 

Note:
The system maintains a default boot list located in ROS and a user boot list stored in NVRAM, 
for a normal boot. Separate default and user boot lists are also maintained for booting from the Service key position.

3. When a valid boot device is found, the first record or program sector number (PSN) is checked. 
If it is a valid boot record, it is read into memory and is added to the IPL control block in memory. 
Included in the key boot record data are the starting location of the boot image on the boot device, 
the length of the boot image, and instructions on where to load the boot image in memory. 

4. The boot image is read sequentially from the boot device into memory starting at the location 
specified in NVRAM. The disk boot image consists of the kernel, a RAM file system, and base customized 
device information (customized reduced ODM). 

5. Control is passed to the kernel, which begins system initialization. 

6. The kernel runs init, which runs phase 1 of the "/sbin/rc.boot" script.
When the kernel initialization phase is completed, base device configuration begins.


-- Base Device Configuration Phase:

The init process starts the rc.boot script. Phase 1 of the rc.boot script performs the base device configuration, 
and it includes the following steps:

. The boot script calls the restbase program to build the customized Object Data Manager (ODM) database 
  in the RAM file system from the compressed customized data. 
. The boot script starts the configuration manager, which accesses phase 1 ODM configuration rules to configure 
  the base devices. 
. The configuration manager starts the sys, bus, disk, SCSI, and the Logical Volume Manager (LVM) and 
  rootvg volume group configuration methods. 
. The configuration methods load the device drivers, create special files, and update the customized data 
  in the ODM database.

-- System Boot Phase:

The System Boot Phase involves the following steps:

The init process starts phase 2 of the rc.boot script. Phase 2 of rc.boot includes the following steps: 
. Call the ipl_varyon program to vary on the rootvg volume group. 
. Mount the hard disk file systems onto their normal mount points. 
. Run the swapon program to start paging. 
. Copy the customized data from the ODM database in the RAM file system to the ODM database in the hard disk file system. 
. Exit the rc.boot script.

- After phase 2 of rc.boot, the boot process switches from the RAM file system to the hard disk root file system. 
- Then the init process runs the processes defined by records in the /etc/inittab file. 
One of the instructions in the /etc/inittab file runs phase 3 of the rc.boot script, 

  cat /etc/inittab | grep rc.boot
  brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot

which includes the following steps: 
. Mount the /tmp hard disk file system. 
. Start the configuration manager phase 2 to configure all remaining devices. 
. Use the savebase command to save the customized data to the boot logical volume. 
. Exit the rc.boot script.

At the end of this process, the system is up and ready for use.

- To display the current runlevel:

# who -r
# cat /etc/.init.state

- To display a history of previous runlevels:

# /usr/lib/acct/fwtmp < /var/adm/wtmp | grep run-level

- To change the runlevel:

# telinit m    # m in 0-9,   S,s,M,m,   a,b,c,    Q,q

S,s,M,m: maintenance mode
0,1    : reserved
2      : normal
3-9    : user defined
Q,q    : reparse inittab file


The telinit command directs the actions of the init process by taking a one character parameter
and signaling the init process to perform the appropriate action. 
So the telinit command sets the system at a specific runlevel.
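
The runlevel table above can be turned into a small lookup, for example to annotate the output of "who -r". This is only a sketch: the function name runlevel_meaning is made up for this note, and it covers only the values listed in the table (a, b, c are not described there, so they fall through to the default branch).

```shell
#!/bin/sh
# runlevel_meaning ARG: print what the table above says about a telinit argument.
# The function name is hypothetical; the texts are copied from the table.
runlevel_meaning() {
    case "$1" in
        S|s|M|m) echo "maintenance mode" ;;
        0|1)     echo "reserved" ;;
        2)       echo "normal" ;;
        [3-9])   echo "user defined" ;;
        Q|q)     echo "reparse inittab file" ;;
        *)       echo "not covered by the table"; return 1 ;;
    esac
}

runlevel_meaning 2
runlevel_meaning q
```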


Some important PCI systems and pSeries LED codes:
-------------------------------------------------


hd4= /, hd5=boot, hd6=paging, hd2=/usr, hd3=/tmp, hd9var=/var
/etc is in "/", so the ODM database is in "/".

Describe LED codes (121, 223, 229, 551, 552, 553, 581, OC31, OC32) 


- reduced ODM from BLV copied into RAMFS: OK=510, NOT OK=LED 548
- LED 511: bootinfo -b is called to determine the last boot device
- ipl_varyon of rootvg: OK=517, ELSE 551, 552, 554, 556
- LED 555, 557: mount /dev/hd4 on temporary mountpoint /mnt
- LED 518: mount /usr, /var
- LED 553: syncvg rootvg, or inittab problem
- LED 549
- LED 581: TCP/IP is being configured, and there is some problem

Last phases in the boot is where cfgcon is called, to configure the console.
cfgcon LED codes include:
C31: Console not yet configured.
C32: Console is an LFT terminal
C33: Console is a TTY
C34: Console is a file on disk
C99: Could not detect a console device

LED 551: ipl_varyon of rootvg

201           : Damaged boot image
223-229       : Invalid boot list
551,555,557   : Corrupted filesystem, corrupted JFS log
552,554,556   : Superblock corrupted, corrupted customized ODM database
553           : Corrupted /etc/inittab file
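
For quick reference during a failed boot, the summary above can be kept as a small lookup function. This is only a sketch of that table: the name led_hint is made up for this note, and the range "223-229" is assumed to mean every code from 223 through 229.

```shell
#!/bin/sh
# led_hint CODE: print the likely cause from the summary table above.
# Hypothetical helper; it knows only the codes listed in that table.
led_hint() {
    case "$1" in
        201)
            echo "Damaged boot image" ;;
        223|224|225|226|227|228|229)   # assumption: 223-229 is an inclusive range
            echo "Invalid boot list" ;;
        551|555|557)
            echo "Corrupted filesystem, corrupted JFS log" ;;
        552|554|556)
            echo "Superblock corrupted, corrupted customized ODM database" ;;
        553)
            echo "Corrupted /etc/inittab file" ;;
        *)
            echo "No entry in this summary for LED $1"
            return 1 ;;
    esac
}

led_hint 552
```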


More Detail on LED codes:
-------------------------


105
CPU planar board is not securely seated in the adapter slot on the microchannel bus.


--------------------------------------------------------------------------------

200 
Key is in SECURE mode and the system will NOT boot until the key is turned to either 
NORMAL or SERVICE mode.


--------------------------------------------------------------------------------

201 
LV hd5 (boot logical volume) has been corrupted. To correct this situation, perform the following: 

. Boot system in service mode. Either boot the system from boot diskettes or boot tape OF THE SAME VERSION AND 
LEVEL AS THE SYSTEM. 
. To perform system maintenance functions from the INSTALL and MAINTENANCE menu, enter the following command, 
where hdisk0 is the drive that contains the boot logical volume (/blv) 
/usr/sbin/getrootfs hdisk0 

. From maintenance mode make sure /tmp has at least enough free disk space to create the tape image when 
the 'bosboot' command is executed. 
. Make sure /dev/hd6 is swapped on via the lsps -a command. 
You don't want to get 'paging space low' messages when creating a new boot image on /dev/hd5. Recreate 
a new boot image by executing the command: 
bosboot -a -d /dev/hdisk0 
Turn key to normal mode 
shutdown -Fr 


--------------------------------------------------------------------------------
221 

The NVRAM is potentially corrupted. To correct this situation, perform the following steps:

. Boot system in service mode. Either boot the system from boot diskettes or boot tape. 
. Select option to perform system maintenance functions from the INSTALL and MAINTENANCE menu. 
. From maintenance mode, enter the following command: 
/usr/sbin/getrootfs hdisk0 
. Enter the following command, substituting the name of your boot drive (e.g., hdisk1) for hdisk0: 
bootlist -m normal hdisk0 
shutdown -Fr 

If the above method fails, try the following:

. Shut down your machine and unplug your system battery before you power up. 
. Wait 30 minutes for battery to drain. 
. Reconnect battery. 
. Once you power up and a 221 is displayed on your LED, flip the key to service mode, then back to normal mode. 
. Plug in system battery. 
Once this is done, the NVRAM should return to normal.


--------------------------------------------------------------------------------

223/229 
Cannot boot in normal mode from any of the devices listed in the NVRAM bootlist.

Typically the cause of this problem is the machine has just been moved and the SCSI adapter card is not 
firmly seated in the adapter slot on the microchannel bus. Make sure the card is seated properly and all 
internal and external SCSI connectors are firmly attached. 
Another possibility is that a NEW SCSI device has been added to the system and there are two or more devices 
with the same SCSI ID. 


--------------------------------------------------------------------------------

233 
Attempting to IPL from devices specified in NVRAM device list. If diagnostics indicate a bad drive is 
suspected, BEFORE replacing the physical volume, replace the LOGIC ASSEMBLY on the drive housing first. 
This saves the time of rebuilding the system, especially if full backups haven't been made recently.


--------------------------------------------------------------------------------

552
BAD ERROR. The VG rootvg could not be varied on. Most likely scenario is that the VGDA on the default 
boot drive (hdisk0) got hammered/corrupted. To resolve this problem, try the following:

1) Boot system in service mode. Either boot the system from boot diskettes or boot tape
2) Select option to perform system maintenance functions from the INSTALL and MAINTENANCE menu.
3) From maintenance mode, enter the following command: /usr/sbin/getrootfs hdisk0. If the VG rootvg has at least two PVs and one fails to work with this command, try any of the remaining PVs (e.g., /etc/continue hdisk0 or /etc/continue hdisk1)
4) If the importvg command fails, as should the varyonvg command, then perform the following from the command line:

exportvg <VG_NAME>                EXAMPLE: exportvg vg2
    removes LV references from ODM but won't write any info to VGDA
importvg -y <VG_NAME> <PV_NAME>   EXAMPLE: importvg -y vg2 hdisk1
    restores ODM database from information read from VGDA
varyonvg -m1 <VG_NAME>            EXAMPLE: varyonvg vg2
    this command will ENSURE that the ODM database MATCHES the characteristics stored in the VGDA (syncs VGDA to ODM)

5) If no error messages are reported by importvg or varyonvg, then goto step '11'
6) Execute the command: mount
7) If /dev/ram0 is the only mounted filesystem, try the following script entered interactively from the command line: EXAMPLE: for VG rootvg - if it fails to varyon

for i in hd2 hd3 hd4
    do 
        synclvodm rootvg $i
        if [ "$?" -eq 0 ]; then
            fsck -fp /dev/$i
        fi
    done

8) If there are no error messages from the synclvodm command or the fsck command, then mount the following 
file systems:

mount /dev/hd3 /tmp
mount /dev/hd2 /usr
mount /dev/hd4 /mnt

9) If there are no error messages from these mount commands, then goto step '11'
10) If the previous step fails or the log redo process fails or indicates any filesystems with an 
unknown log device, then do the following 2 steps:    

/etc/aix/logform /dev/hd8 ( Answer 'y' to the "Destroy /dev/hd8 (y)?" prompt )
LogForm will reformat the log logical volume. The next IPL will take a little longer.

11) Turn key to normal mode
12) Shutdown the system via the command shutdown -Fr. If this doesn't appear to be working, type the following at the command line:

    sync; sync;
    halt

13) If the problem still persists, consult your local SE before you attempt to RE-INSTALL your system.


--------------------------------------------------------------------------------

553 
Your /etc/inittab file has been corrupted or truncated. To correct this situation, perform the following:

. Boot system in service mode. Either boot the system from boot diskettes or boot tape. 
. Select option 5 (perform system maintenance) from the INSTALL and MAINTENANCE menu. 
. Enter the command /etc/continue hdisk0 from maintenance mode. 
. Check to see that you have free space on those file systems that are mounted on logical volumes /dev/hd3 and /dev/hd4. 
  If they are full, erase files that aren't needed. 
  Some space needs to be free on these logical volumes for the system to boot properly. 
. Check to see if the /etc/inittab file looks ok. If not, goto the next step, else consult your local SE 
  for further advice. 
. Place the MOST recent 'mksysb' tape into the tape drive. If you don't have a 'mksysb' tape, get your 
  INSTALL/MAINT floppy and insert into your diskette drive. 
. Extract the /etc/inittab file from the media device mentioned. 
  Change directories to root (e.g., cd /) first, then execute the following command: 
  restore -xvf /dev/fd0 ./etc/inittab     - if a floppy disk 
  restore -xvf /dev/rmt0 ./etc/inittab    - if a tape device 
  This will restore the contents of the /etc/inittab file to a reasonable format to boot the system up with. 
  Depending on how current the /etc/inittab file is, you may have to manually add, modify, or delete the 
  contents of this file. 

shutdown -Fr 


--------------------------------------------------------------------------------

581 
This LED is displayed when the /etc/rc.net script is executed.

Verify this script is correct or if modifications have been made since the system was last rebooted. 
Any errors logged during the execution of this script are sent to the /tmp/rc.net.out file. 


--------------------------------------------------------------------------------

727
Printer port is being configured BUT there is NO cable connected to the configured port on the 16-port 
concentrator OR the RJ-45 cable from the concentrator back to the 64-port card isn't connected.

Either remove the printer in question from the ODM database (eg., rmdev -l lp0 -d) OR 
Reconnect the printer cable back to the port on the 16-port concentrator OR 
Re-connect the 16-port concentrator back to the 64-port adapter card. 
To determine WHICH concentrator box that printer is connected to:

Count the number of 727s displayed on the LED 
Subtract two (first two 727s deal with the native serial ports). 
For example, if the LED count is 17 (minus the two for the native ports), then the second concentrator 
box is the problem. 


--------------------------------------------------------------------------------

869 
Most likely scenario is that you have two or more SCSI devices with the same SCSI id on one SCSI controller. 
To correct this situation...

Change one of the conflicting SCSI devices to use an UNUSED SCSI address (0-7). 
If this case fails, RESET your SCSI adapter(s). 

--------------------------------------------------------------------------------




Steps Required to Obtain a System Dump
If your CONSOLE device is still operational, perform the following steps:

sysdumpdev -l     (shows which devices have been assigned as the primary and secondary dump devices) 
sysdumpstart -p   (initiate dump to primary device) 
sysdumpstart -s   (initiate dump to secondary device) 
sysdumpdev -z     (indicates if a NEW dump exists) 
sysdumpdev -L     (indicates info about a previous dump) 
Press keyboard sequence: CTRL-ALT-NUMPAD1 (for primary device) 
Press keyboard sequence: CTRL-ALT-NUMPAD2 (for secondary device)

Insert a tape in the tape device you wish to dump the kernel data to, then run:
/usr/sbin/snap -gfkD -o /dev/rmt0

If your system is hung, the user MUST initiate or force a dump of the kernel data via the following:

Turn the Key Mode Switch to the SERVICE position 
Press the RESET button 



Other remarks AIX 5.x bootprocess:
======================================


ROS IPL (Read Only Storage Initial Program Load). This phase includes a power-on selftest, the location
of the bootdevice, and loading of the boot kernel into memory.

At boot time, once the POST is completed, the system will search the boot list for a
bootable image. The system will attempt to boot from the first entry in the bootlist.
Pressing the F5 key (or 5) during boot, will invoke the service bootlist, which includes
the CDROM.

  Note: If you want to install AIX on a machine, insert the product media, start the machine,
        press the F5 key (or 5) to let it boot from CD, then press 1 (graphic display) or
        2 (ascii terminal) to define your terminal as the Console

In normal operation of AIX, to view the normal boot list, use
# bootlist -m normal -o

fd0
cd0
hdisk0

The bootlist can be changed using the same command, for example
# bootlist -m normal hdisk0 cd0

To see or trace the bootprocess, use the alog command.

Because no console is available during the bootphase, the boot messages are collected
in a special file, which by default is /var/adm/ras/bootlog.

To view the boot log, use
# alog -o -t boot

To record the current date and time in alog file named /tmp/mylog, enter
# date | alog -f /tmp/mylog

To list the logs defined in the alog database, run
# alog -L

AIX uses the default runlevel 2. This is the normal multi-user mode.
Runlevels 0,1 are reserved, 2 is normal, and 3-9 are configurable by the Administrator.

At a certain stage, /etc/init is started, and invokes
/sbin/rc.boot 3, and runs the entries in /etc/inittab.


Example of an AIX /etc/inittab file:
------------------------------------

init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
mkatmpvc:2:once:/usr/sbin/mkatmpvc >/dev/console 2>&1
atmsvcd:2:once:/usr/sbin/atmsvcd >/dev/console 2>&1
load64bit:2:wait:/etc/methods/cfg64 >/dev/console 2>&1 # Enable 64-bit execs
tunables:23456789:wait:/usr/sbin/tunrestore -R > /dev/console 2>&1 # Set tunables
rc:23456789:wait:/etc/rc 2>&1 | alog -tboot > /dev/console # Multi-User checks
fbcheck:23456789:wait:/usr/sbin/fbcheck 2>&1 | alog -tboot > /dev/console # run /etc/firstboot
srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller
rctcpip:23456789:wait:/etc/rc.tcpip > /dev/console 2>&1 # Start TCP/IP daemons
rcnfs:23456789:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
cron:23456789:respawn:/usr/sbin/cron
nimclient:2:once:/usr/sbin/nimclient -S running
piobe:2:wait:/usr/lib/lpd/pio/etc/pioinit >/dev/null 2>&1  # pb cleanup
qdaemon:23456789:wait:/usr/bin/startsrc -sqdaemon
writesrv:23456789:wait:/usr/bin/startsrc -swritesrv
uprintfd:23456789:respawn:/usr/sbin/uprintfd
shdaemon:2:off:/usr/sbin/shdaemon >/dev/console 2>&1 # High availability daemon
l2:2:wait:/etc/rc.d/rc 2
l3:3:wait:/etc/rc.d/rc 3
l4:4:wait:/etc/rc.d/rc 4
l5:5:wait:/etc/rc.d/rc 5
l6:6:wait:/etc/rc.d/rc 6
l7:7:wait:/etc/rc.d/rc 7
l8:8:wait:/etc/rc.d/rc 8
l9:9:wait:/etc/rc.d/rc 9
logsymp:2:once:/usr/lib/ras/logsymptom # for system dumps
itess:23456789:once:/usr/IMNSearch/bin/itess -start search >/dev/null 2>&1
diagd:2:once:/usr/lpp/diagnostics/bin/diagd >/dev/console 2>&1
httpdlite:23456789:once:/usr/IMNSearch/httpdlite/httpdlite -r /etc/IMNSearch/httpdlite/httpdlite.conf & >/dev/console 2>&1
ha_star:h2:once:/etc/rc.ha_star >/dev/console 2>&1
dt_nogb:2:wait:/etc/rc.dt
cons:0123456789:respawn:/usr/sbin/getty /dev/console
srv:2:wait:/usr/bin/startsrc -s sddsrv > /dev/null 2>&1
perfstat:2:once:/usr/lib/perf/libperfstat_updt_dictionary >/dev/console 2>&1
ctrmc:2:once:/usr/bin/startsrc -s ctrmc > /dev/console 2>&1
lsof:2:once:/usr/lpp/aix4pub/lsof/mklink
monitor:2:once:/usr/lpp/aix4pub/monitor/mklink
nmon:2:once:/usr/lpp/aix4pub/nmon/mklink
ptxnameserv:2:respawn:/usr/java14/jre/bin/tnameserv -ORBInitialPort 2279 2>&1 >/dev/null # Start jtopasServer
ptxfeed:2:respawn:/usr/perfagent/codebase/jtopasServer/feed 2>&1 >/dev/null # Start jtopasServer
ptxtrend:2:once:/usr/bin/xmtrend -f /etc/perf/jtopas.cf -d /etc/perf/Top -n jtopas 2>&1 >/dev/null # Start trend
direct:2:once:/tmp/script_execute_after_reboot_pSeries 2>>/tmp/pSeries.050527_16:56.log
fmc:2:respawn:/usr/opt/db2_08_01/bin/db2fmcd #DB2 Fault Monitor Coordinator
smmonitor:2:wait:/usr/sbin/SMmonitor start > /dev/console 2>&1 # start SMmonitor daemon
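
Each inittab record has the form id:runlevels:action:command, so the file is easy to inspect with awk. The sketch below works on a few records copied from the listing above instead of the real /etc/inittab; the function names (sample_inittab, default_runlevel, entries_for_runlevel) are made up for this note.

```shell
#!/bin/sh
# A few records copied from the example above; on a real system,
# feed /etc/inittab to the functions instead.
sample_inittab() {
cat <<'EOF'
init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
rc:23456789:wait:/etc/rc 2>&1 | alog -tboot > /dev/console # Multi-User checks
cron:23456789:respawn:/usr/sbin/cron
l3:3:wait:/etc/rc.d/rc 3
EOF
}

# Print the default runlevel: the runlevel field of the initdefault record.
default_runlevel() {
    awk -F: '$3 == "initdefault" { print $2; exit }'
}

# Print "id action" for every record whose runlevel field contains $1.
# Records with an empty runlevel field (such as brc) are not matched here.
entries_for_runlevel() {
    awk -F: -v rl="$1" '$2 != "" && index($2, rl) > 0 { print $1, $3 }'
}

sample_inittab | default_runlevel
sample_inittab | entries_for_runlevel 3
```

Only the first three fields are used, so colons inside the command field do no harm.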


The inittab is reread by the init daemon every 60 secs. 
To add records into the inittab file, you should use the mkitab command. 
For example, to add an entry for tty4, enter the following command:

# mkitab "tty4:2:respawn:/usr/sbin/getty /dev/tty4"


Other observations:
-------------------

{dbserver2:root}/etc/rc.d -> cd /etc/rc.d
{dbserver2:root}/etc/rc.d -> ls -al
total 40
drwxr-xr-x  11 root     system         4096 Oct 08 2002  .
drwxr-xr-x  30 root     system        12288 Aug 08 11:41 ..
drwxr-xr-x   2 root     system          256 May 27 16:56 init.d
-r-xr--r--   1 root     system         1586 Sep 16 2002  rc
drwxr-xr-x   2 root     system          256 May 27 17:00 rc2.d
drwxr-xr-x   2 root     system          256 Oct 08 2002  rc3.d
drwxr-xr-x   2 root     system          256 Oct 08 2002  rc4.d
drwxr-xr-x   2 root     system          256 Oct 08 2002  rc5.d
drwxr-xr-x   2 root     system          256 Oct 08 2002  rc6.d
drwxr-xr-x   2 root     system          256 Oct 08 2002  rc7.d
drwxr-xr-x   2 root     system          256 Oct 08 2002  rc8.d
drwxr-xr-x   2 root     system          256 Oct 08 2002  rc9.d


The bootlist command:
---------------------

Purpose
Displays and alters the list of boot devices available to the system.

Syntax
bootlist [ { -m Mode } [ -r ] [  -o  ] [ [  -i ] [ -V ] [ -F ]| [ [ -f File ] [  Device [ Attr=Value ... ] ... ] ] ] [ -v ]


The bootlist command allows the user to display and alter the list of possible boot devices from which 
the system may be booted. When the system is booted, it will scan the devices in the list and attempt to 
boot from the first device it finds containing a boot image. 

The AIX "bootlist" command can be used to select the boot disk. This is useful if you want to test 
different AIX levels on the same system. 

For example, assume hdisk0 has AIX 4.2.1 installed and hdisk1 AIX 4.3.3 installed. Use one of the following "bootlist" 
commands** to select which version will come up on the next reboot: 

bootlist -m normal hdisk0      # Reboots to AIX421
bootlist -m normal hdisk1      # Reboots to AIX433 

The second disk can be installed from CD, a "mksysb" tape, or using AIX 4.3's "alt_disk_install" capability. 
Both CD and mksysb installs require downtime. The "alt_disk_install" allows you to install the second disk from 
a "mksysb" or clone your existing OS while the system is running. 

** Comment: In practice, I recommend the following "bootlist" syntax which specifies that if hdisk0 fails to boot, 
try booting from hdisk1, then tape, and finally CD ROM. 

bootlist -m normal hdisk0 hdisk1 rmt cd 


The bosboot command:
--------------------

Purpose
Creates boot image.

Syntax
For General Use:
bosboot -Action [ -d Device ] [ -Options ... ]

To Create a Device Boot Image:
bosboot -a [ -d Device ] [ -p Proto ] [ -k Kernel ] [ -I | -D ] [ -l LVdev ] [ -L] [ -M { Norm | Serv | Both } ] [ -T Type ] [ -b FileName ] [ -q ]

Description
The bosboot command creates the boot image that interfaces with the machine boot ROS (Read-Only Storage) 
EPROM (Erasable Programmable Read-Only Memory).

The bosboot command creates a boot file (boot image) from a RAM (Random Access Memory) disk file system and a kernel. 
This boot image is transferred to a particular media that the ROS boot code recognizes. 
When the machine is powered on or rebooted, the ROS boot code loads the boot image from the media into memory. 
ROS then transfers control to the loaded image's kernel.


Examples

- To make a disk bootable, or recreate the boot image, use:
# bosboot -a -d /dev/hdiskn

- To create a boot image on the default boot logical volume on the fixed disk from which the system is booted, enter: 
bosboot -a

- To create a bootable image called /tmp/tape.bootimage for a tape device, enter: 
bosboot -ad /dev/rmt0 -b /tmp/tape.bootimage

- To copy a given tape boot image to a tape device, enter: 
bosboot -w /tmp/tape.bootimage -d rmt0

- To create a boot image file for an Ethernet boot, enter: 
bosboot -ad /dev/ent0 -M both

- When you have migrated a disk like disk0 to disk1, and you need to make the second disk bootable,
proceed as follows:

bosboot -a -d /dev/DestinationDiskNumber  # bosboot -ad  /dev/hdiskxx

Then:
bootlist -m normal DestinationDiskNumber

Then:
mkboot -c -d /dev/SourceDiskNumber
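
The three steps above can be wrapped in a small script that prints the commands first, so the disk names can be checked before anything is changed. This is a sketch only: the helper names run and make_bootable and the DRY_RUN variable are made up for this note, and hdisk1/hdisk0 are example disk names. The commands and options are the ones shown above.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints the commands.
# Set DRY_RUN=0 on a real AIX host to execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# make_bootable DEST SRC, for example: make_bootable hdisk1 hdisk0
# Stops at the first command that fails.
make_bootable() {
    dest=$1
    src=$2
    run bosboot -a -d "/dev/$dest" &&   # create the boot image on the destination disk
    run bootlist -m normal "$dest" &&   # put the destination disk in the normal boot list
    run mkboot -c -d "/dev/$src"        # clear the boot record on the source disk
}

make_bootable hdisk1 hdisk0
```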



3.2.5 Bootprocedure Linux:
==========================

Note 1: Redhat system
---------------------

1. On an x86 system, the BIOS loads.
2. BIOS loads the MBR of the first (primary) disk

Once loaded, the BIOS tests the system, looks for and checks peripherals and then locates a valid device 
with which to boot the system. Usually, it first checks any floppy drives and CD-ROM drives present for 
bootable media, then it looks to the system's hard drives. The order of the drives searched for booting 
can often be controlled with a setting in BIOS. Often, the first hard drive set to boot is the C drive or 
the master IDE device on the primary IDE bus. The BIOS loads whatever program is residing in the first sector 
of this device, called the Master Boot Record or MBR, into memory. The MBR is only 512 bytes in size and 
contains machine code instructions for booting the machine along with the partition table. Once found and loaded 
the BIOS passes control to whatever program (the bootloader) is on the MBR. 
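
A valid MBR ends with the two-byte boot signature 55 AA at offsets 510 and 511 (this detail is not in the text above, it is added here). The sketch below checks for it on a scratch image file built on the spot; the function name has_boot_signature is made up for this note. On a real system the same dd call could be pointed at the disk device, which needs root.

```shell
#!/bin/sh
# has_boot_signature FILE: succeed if bytes 510-511 of FILE are 55 aa.
has_boot_signature() {
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# Build a scratch 512-byte image: 510 zero bytes followed by the signature
# (octal 125 252 = hex 55 aa).
img=$(mktemp)
trap 'rm -f "$img"' EXIT
{ dd if=/dev/zero bs=1 count=510 2>/dev/null; printf '\125\252'; } > "$img"

if has_boot_signature "$img"; then
    echo "boot signature present"
else
    echo "no boot signature"
fi
```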

3. bootloader in MBR

Linux boot loaders for the x86 platform are broken into at least two stages. The first stage is a small 
machine code binary on the MBR. Its sole job is to locate the second stage boot loader and load the first part 
of it into memory. Under Red Hat Linux you can install one of two boot loaders: GRUB or LILO. 
GRUB is the default boot loader, but LILO is available for those who require it for their hardware setup 
or who prefer it.  

> If you are using LILO under Red Hat Linux, the second stage boot loader uses information on the MBR 
  to determine what boot options are available to the user. This means that any time a configuration change 
  is made or you upgrade your kernel manually, you must run the /sbin/lilo -v -v command to write the appropriate 
  information to the MBR. For details on doing this, see the Section called LILO in Chapter 4. 

> GRUB, on the other hand, can read ext2 partitions and therefore simply loads its configuration file 
  - /boot/grub/grub.conf - when the second stage loader is called. 

Once the second stage boot loader is in memory, it presents the user with the Red Hat Linux initial, 
graphical screen showing the different operating systems or kernels it has been configured to boot. 
If you have only Red Hat Linux installed and have not changed anything in the 

/etc/lilo.conf or /boot/grub/grub.conf, 

you will only see one option for booting. 
If you have configured the boot loader to boot other operating systems, this screen gives you the opportunity 
to select it. Use the arrow keys to highlight the operating system and press [Enter]. If you do nothing, 
the boot loader will load the default selection. 

4. Kernel

Once the second stage boot loader has determined which kernel to boot, it locates the corresponding 
kernel binary in the /boot/ directory. The proper binary is the /boot/vmlinuz-2.4.x-xx file that corresponds 
to the boot loader's settings. Next the boot loader places the appropriate initial RAM disk image, 
called an initrd, into memory. The initrd is used by the kernel to load any drivers not compiled into it 
that are necessary to boot the system. This is particularly important if you have SCSI hard drives or 
are using the ext3 file system [1]. 

When the kernel loads, it immediately initializes and configures the computer's memory. 
Next it configures the various hardware attached to the system, including all processors and I/O subsystems, 
as well as any storage devices. It then looks for the compressed initrd image in a predetermined location 
in memory, decompresses it, mounts it, and loads all necessary drivers. Next it initializes file system-related 
virtual devices, such as LVM or software RAID before unmounting the initrd disk image and freeing up all 
the memory it once occupied. 

After the kernel has initialized all the devices on the system, it creates a root device, mounts the root partition 
read-only, and frees unused memory. 

At this point, the kernel is loaded into memory and operational. However, with no user applications to give 
the user the ability to provide meaningful input to the system, not much can be done with it. 

To set up the user environment, the kernel starts the /sbin/init command.
 
5. init

The init program coordinates the rest of the boot process and configures the environment for the user. 
When the init command starts, it becomes the parent or grandparent of all of the processes that start up 
automatically on a Red Hat Linux system. First, it runs the /etc/rc.d/rc.sysinit script, which sets 
your environment path, starts swap, checks the file systems, and so on. Basically, rc.sysinit takes care of 
everything that your system needs to have done at system initialization. For example, most systems use a clock, 
so on them rc.sysinit reads the /etc/sysconfig/clock configuration file to initialize the clock. 
Another example is if you have special serial port processes which must be initialized, rc.sysinit will 
execute the /etc/rc.serial file. 

This is what init runs:

/sbin/init
          -> reads /etc/inittab
          -> runs /etc/rc.d/rc.sysinit
          -> takes the default runlevel N from inittab and runs the links for that runlevel in /etc/rc.d/rcN.d/
          -> runs /etc/rc.d/rc.local

Below is an example listing for a runlevel 5, /etc/rc.d/rc5.d/ directory: 

K01pppoe -> ../init.d/pppoe
K05innd -> ../init.d/innd
K10ntpd -> ../init.d/ntpd
K15httpd -> ../init.d/httpd
K15mysqld -> ../init.d/mysqld
K15pvmd -> ../init.d/pvmd
K16rarpd -> ../init.d/rarpd
K20bootparamd -> ../init.d/bootparamd
K20nfs -> ../init.d/nfs
..
..
K80nscd -> ../init.d/nscd
K84ypserv -> ../init.d/ypserv
K90ups -> ../init.d/ups
K96irda -> ../init.d/irda
S05kudzu -> ../init.d/kudzu
S06reconfig -> ../init.d/reconfig
S08ipchains -> ../init.d/ipchains
S10network -> ../init.d/network
S12syslog -> ../init.d/syslog
..
etc..

As you can see, none of the scripts that actually start and stop the services are located in the 
/etc/rc.d/rc5.d/ directory. Rather, all of the files in /etc/rc.d/rc5.d/ are symbolic links pointing 
to scripts located in the /etc/rc.d/init.d/ directory. Symbolic links are used in each of the rc directories 
so that the runlevels can be reconfigured by creating, modifying, and deleting the symbolic links without 
affecting the actual scripts they reference. 

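As usual, the K* links are kill/stop scripts and the S* links are start scripts. On entering a runlevel, the K* 
links are run first with the stop argument, then the S* links with the start argument, each in sequence by number.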
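
The ordering described above can be sketched as a small shell function. This is only an illustration, not 
the actual /etc/rc.d/rc script: the function name rc_plan is hypothetical, and it assumes two-digit sequence 
numbers and relies on the sorted order of the shell glob.

```shell
# Print, in execution order, what would be stopped and started for one
# runlevel directory (hypothetical helper, not part of Red Hat Linux).
rc_plan() {
    dir=$1
    # K links first: services to stop, in glob (sorted) order
    for link in "$dir"/K[0-9][0-9]*; do
        [ -e "$link" ] || continue
        name=${link##*/}
        echo "stop ${name#K[0-9][0-9]}"
    done
    # then S links: services to start, in glob (sorted) order
    for link in "$dir"/S[0-9][0-9]*; do
        [ -e "$link" ] || continue
        name=${link##*/}
        echo "start ${name#S[0-9][0-9]}"
    done
}
```

For example, "rc_plan /etc/rc.d/rc5.d" would list the services of the runlevel 5 directory shown above 
without running anything.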

In runlevel 5, /etc/inittab runs a script called /etc/X11/prefdm. The prefdm script runs the preferred 
X display manager, gdm if you are running GNOME or kdm if you are running KDE, based on the contents 
of the /etc/sysconfig/desktop file. 

The last thing the init program does is run the /etc/rc.d/rc.local script. 
You can use this file to add additional commands necessary for your environment. For instance, you can start 
additional daemons or initialize a printer. 
At this point, the system is considered to be operating at runlevel 5. 

- Differences in the Boot Process of Other Architectures
Once the Red Hat Linux kernel loads and hands off the boot process to the init command, 
the same sequence of events occurs on every architecture. So the main difference between each architecture's 
boot process is in the application used to find and load the kernel. 

For example, the Alpha architecture uses the aboot boot loader, while the Itanium architecture uses 
the ELILO boot loader. 

- Runlevels
SysV Init
SysV init is the standard process used by Red Hat Linux to control which software the init command 
launches or shuts off on a given runlevel. SysV init was chosen because it is easier to use and more flexible 
than the traditional BSD style init process. 

The configuration files for SysV init are in the /etc/rc.d/ directory. Within this directory 
are the rc, rc.local, and rc.sysinit scripts as well as the following directories: 

init.d
rc0.d
rc1.d
rc2.d
rc3.d
rc4.d
rc5.d
rc6.d
 

The init.d directory contains the scripts used by the init command when controlling services. 
Each of the numbered directories represents one of the seven runlevels (0 through 6) configured by default 
under Red Hat Linux. 

The default runlevel is listed in /etc/inittab. To find out the default runlevel for your system, 
look for the line similar to the one below near the top of /etc/inittab: 

id:3:initdefault:
 
Generally, Red Hat Linux operates in runlevel 3 or runlevel 5 - both full multi-user modes. 
The following runlevels are defined in Red Hat Linux: 

0 - Halt 
1 - Single-user mode 
2 - Not used (user-definable) 
3 - Full multi-user mode 
4 - Not used (user-definable) 
5 - Full multi-user mode (with an X-based login screen) 
6 - Reboot 
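
As a quick check of the default runlevel, the initdefault entry can be extracted with awk. This is a 
minimal sketch; the function name default_runlevel is hypothetical, and it assumes the usual 
id:runlevels:action:process layout of inittab lines.

```shell
# Print the default runlevel from an inittab-style file
# (defaults to /etc/inittab when no file is given).
default_runlevel() {
    # Skip comment lines; pick the first entry whose action field is "initdefault"
    awk -F: '$1 !~ /^#/ && $3 == "initdefault" { print $2; exit }' "${1:-/etc/inittab}"
}
```

With the example line id:3:initdefault: this prints 3.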

If you are using LILO, you can enter single-user mode by typing "linux single" at the LILO boot: prompt. 

If you are using GRUB as your boot loader, you can enter single-user mode using the following steps. 
- In the graphical GRUB boot loader screen, select the Red Hat Linux boot label and press [e] to edit it.
- Arrow down to the kernel line and press [e] to edit it.
- At the prompt, type single and press [Enter].
- You will be returned to the GRUB screen with the kernel information. Press the [b] key to boot the system 
  into single user mode.


In case of boot problems, like a corrupt /etc/inittab file, you might try the following:

Boot by typing linux init=/bin/bash at the LILO boot: prompt. 
This places you at a shell prompt; note that no file systems other than the root file system are mounted, 
and the root file system is mounted in read-only mode. To mount it in read-write mode 
(to allow editing of a broken /etc/inittab, for example) do: 

mount -n /proc
mount -o rw,remount /

 
- Installing GRUB:

Once the GRUB rpm package is installed, open a root shell prompt and run the command 
/sbin/grub-install <location>, 

where <location> is the location where the GRUB Stage 1 boot loader should be installed. 

The following command installs GRUB to the MBR of the master IDE device on the primary IDE bus, 
also known as the C drive: 

/sbin/grub-install /dev/hda

 
- GRUB and bootpaths:

In Linux, entire hard disks are listed as devices without numbers, such as "/dev/hda" (IDE) or "/dev/sda" (SCSI).
Partitions on a disk are referred to with a number, such as "/dev/hda1".

GRUB uses a different naming scheme.

Device Names in GRUB:
The first hard drive of a system is called (hd0) by GRUB. 
The first partition on that drive is called (hd0,0), and the fifth partition on the second hard drive 
is called (hd1,4). In general, the naming convention for file systems when using GRUB breaks down in this way: 

(<type-of-device><bios-device-number>,<partition-number>)
 
The parentheses and comma are very important to the device naming conventions. The <type-of-device> refers 
to whether a hard disk (hd) or floppy disk (fd) is being specified. 
The <bios-device-number> is the number of the device according to the system's BIOS, starting with 0. 
The primary IDE hard drive is numbered 0, while the secondary IDE hard drive is numbered 1. 
The ordering is roughly equivalent to the way the Linux kernel arranges the devices by letters, 
where the a in hda relates to 0, the b in hdb relates to 1, and so on. 
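
The mapping between the two naming schemes can be sketched in shell. This is only an approximation under 
stated assumptions: IDE disks (/dev/hdX), a BIOS drive order that matches the kernel's lettering, and 
Linux partition numbers that start at 1 where GRUB starts at 0. The function name grub_to_linux is hypothetical.

```shell
# Translate a GRUB device such as "(hd1,4)" into the corresponding
# Linux IDE device name, e.g. /dev/hdb5 (hypothetical helper).
grub_to_linux() {
    spec=${1#"(hd"}          # "(hd1,4)" -> "1,4)"
    spec=${spec%")"}         # "1,4)"    -> "1,4"
    disk=${spec%,*}          # BIOS device number, counted from 0
    part=${spec#*,}          # GRUB partition number, counted from 0
    # 0 -> a, 1 -> b, and so on
    letter=$(echo abcdefghijklmnopqrstuvwxyz | cut -c "$((disk + 1))")
    echo "/dev/hd${letter}$((part + 1))"
}
```

Under these assumptions, grub_to_linux "(hd0,0)" gives /dev/hda1 and grub_to_linux "(hd1,4)" gives /dev/hdb5.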

File Names
When typing commands to GRUB involving a file, such as a menu list to use when allowing the booting 
of multiple operating systems, it is necessary to include the file immediately after specifying 
the device and partition. A sample file specification to an absolute filename is organized as follows: 

(<type-of-device><bios-device-number>,<partition-number>)/path/to/file, for example, (hd0,0)/grub/grub.conf. 

- Example grub.conf:

default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz

# section to load linux
title Red Hat Linux (2.4.18-5.47)
        root (hd0,0)
        kernel /vmlinuz-2.4.18-5.47 ro root=/dev/sda2
        initrd /initrd-2.4.18-5.47.img

# section to load Windows 2000
title windows
        rootnoverify (hd0,0)
        chainloader +1
 

This file would tell GRUB to build a menu with Red Hat Linux as the default operating system, set to autoboot 
it after 10 seconds. Two sections are given, one for each operating system entry, with commands specific 
to this system's disk partition table. 

- Example lilo.conf:

The file /etc/lilo.conf is used by lilo to determine which operating system or kernel to start, as well as 
to know where to install itself (for example, /dev/hda for the MBR of the first IDE hard drive). 
A sample /etc/lilo.conf file looks like this (your /etc/lilo.conf may look a little different): 

boot=/dev/hda
map=/boot/map
install=/boot/boot.b
prompt
timeout=50
message=/boot/message
lba32
default=linux

image=/boot/vmlinuz-2.4.0-0.43.6
	label=linux
	initrd=/boot/initrd-2.4.0-0.43.6.img
	read-only
	root=/dev/hda5

other=/dev/hda1
	label=dos
 


- Creating a boot diskette in Red Hat:

Change to the directory that contains the image file. That might be on the original Red Hat CD.
Then use the following command:

# dd if=boot.img of=/dev/fd0 bs=1440k 

 


3.2.6 Bootprocedure HP-UX 11.x:
===============================


3.2.6.1 Shutdown the system:
----------------------------


Note 1: Shutdown HP-UX
----------------------

System Shutdown
To shut down HP-UX for power-off, you can do any of the following: 
# init 0
# shutdown -h -y now

To shut down and reboot HP-UX: 
# reboot
# shutdown -r -y now

To shut down HP-UX to single-user mode: 
# init S
# shutdown -y now
# shutdown 0

The -h option to the shutdown command halts the system completely but will prompt you for a message to issue users. 
The -y option completes the shutdown without asking you any of the questions it would normally ask. 


Note 2: Shutdown HP-UX:
-----------------------

When HP-UX is running on an nPartition, you can shut down HP-UX using the shutdown command.

On nPartitions you have the following options when shutting down HP-UX:

To shut down HP-UX and reboot an nPartition: shutdown -r

On nPartition-capable HP Integrity servers, the shutdown -r command is equivalent to the shutdown -R command.

To shut down HP-UX and halt an nPartition: shutdown -h

On nPartition-capable HP Integrity servers, the shutdown -h command is equivalent to the shutdown -R -H command.

To perform a reboot for reconfig of an nPartition: shutdown -R

To hold an nPartition at a shutdown for reconfig state: shutdown -R -H


Note 3: Shutdown HP-UX:
-----------------------

Shutting down 
/sbin/shutdown -r -y now         Reboot
/sbin/shutdown -h -y now         Stop system
/sbin/shutdown -y now            Single user mode


Note 4: Shutdown HP-UX:
-----------------------

To reboot HP-UX use command
# reboot

To shut down HP-UX in 120 seconds (2 minutes) use command
# shutdown -hy 120

To shut down to single-user mode use command
# shutdown -y 0

To shut down a V-Class server use command
# cd /
# shutdown

When you are at the root prompt (after the restart into single-user mode), type the following command:

# reboot -h



3.2.6.2 Booting HP-UX:
----------------------


Note 1:
-------

PDC -> ISL -> hpux -> kernel -> init

PDC 
HP-UX systems come with firmware installed called Processor Dependent Code. After the system is powered on 
or the processor is RESET, the PDC runs self-test operations and initializes the processor. PDC also identifies 
the console path so it can provide messages and accept input. PDC would then begin the "autoboot" process 
unless you interrupt it during the 10-second interval that is supplied. If you interrupt the "autoboot" process, 
you can issue a variety of commands. The interface to PDC commands is called the Boot Console Handler (BCH). 
This is sometimes a point of confusion; that is, are we issuing PDC commands or BCH commands? 
The commands are normally described as PDC commands, and the interface through which you execute them is the BCH. 

ISL 
The Initial System Loader is run after PDC. You would normally just run an "autoboot" sequence from ISL; 
however, you can run a number of commands from the ISL prompt. 

hpux 
The hpux utility manages loading the HP-UX kernel and gives control to the kernel. ISL can have hpux run 
an "autoexecute" file, or commands can be given interactively. In most situations, you would just want to 
automatically boot the system; however, hpux also accepts options interactively, such as the -is option used 
for a single-user boot. hpux is sometimes called the Secondary System Loader (SSL). 


Note 2:
-------

HP-UX
Normal Boot

The bootstrap process involves the execution of three software components: 

pdc 
isl 
hpux 

- pdc

Automatic boot processes on various HP-UX systems follow similar general sequences. 
When power is applied to the HP-UX system processor, or the system Reset button is pressed, 
the firmware processor-dependent code (pdc) is executed to verify hardware and general system integrity. 
After checking the hardware, pdc gives the user the option to override the autoboot sequence by pressing 
the Esc key. A message resembling the following usually appears on the console. 

     (c) Copyright. Hewlett-Packard Company. 1994.
     All rights reserved.

     PDC ROM rev. 130.0
     32 MB of memory configured and tested.

     Selecting a system to boot.
     To stop selection process, press and hold the ESCAPE key...


If no keyboard activity is detected, pdc commences the autoboot sequence by loading isl and transferring control to it. 

- isl

The initial system loader (isl) implements the operating-system-independent portion of the bootstrap process. 
It is loaded and executed after self-test and initialization have completed successfully. Typically, when control 
is transferred to isl, an autoboot sequence takes place. An autoboot sequence allows a complete bootstrap 
operation to occur with no intervention from an operator. While an autoboot sequence occurs, isl finds and 
executes the autoexecute file which requests that hpux be run with appropriate arguments. Messages similar 
to the following are displayed by isl on the console: 

     Booting from: scsi.6  HP 2213A
     Hard booted.
     ISL Revision A.00.09  March 27, 1990
     ISL booting  hpux boot disk(;0)/stand/vmunix

- hpux

hpux, the secondary system loader, then announces the operation it is performing, in this case the boot operation, 
the device file from which the load image comes, and the TEXT size, DATA size, BSS size, and start address 
of the load image, as shown below, before control is passed to the image. 

    Booting disk(scsi.6;0)/stand/vmunix
    966616+397312+409688 start 0x6c50


Finally, the loaded image displays numerous configuration and status messages, and passes control to 
the init process. 


- Single-user Boot

A single-user boot in HP-UX is sometimes referred to as an interactive boot or attended mode boot. Pressing the 
Escape key at the boot banner on an older Series 700 workstation halts the automatic boot sequence, puts you into 
attended mode, and displays the Boot Console User Interface main menu, a sample of which is below. 

   Selecting a system to boot.
   To stop selection process, press and hold the ESCAPE key.

   Selection process stopped.

   Searching for Potential Boot Devices.
   To terminate search, press and hold the ESCAPE key.

   Device Selection    Device Path             Device Type
   -------------------------------------------------------------
   P0                  scsi.6.0                QUANTUM PD210S
   P1                  scsi.1.0                HP      2213A
   P2                  lan.ffffff-ffffff.f.f   hpfoobar

   b) Boot from specified device
   s) Search for bootable devices
   a) Enter Boot Administration mode
   x) Exit and continue boot sequence

      Select from menu:

In this case the system automatically searches the SCSI, LAN, and EISA interfaces for all potential boot devices, 
that is, devices for which boot I/O code (IODC) exists. The key to booting to single-user mode is first to boot to ISL 
using the b) option. The ISL is the program that actually controls the loading of the operating system. 
To do this using the above as an example, you would type the following at the Select from menu: prompt: 

Select from menu: b p0 isl


This tells the system to boot to the ISL using the SCSI drive at address 6 (since the device path of P0 is scsi.6.0). 
After displaying a few messages, the system then produces the ISL> prompt. 
Pressing the Escape key at the boot banner on newer Series 700 machines produces the Boot Administration Utility, 
as shown below. 

   Command                            Description
   -------                            -----------
   Auto [boot|search] [on|off]        Display or set auto flag
   Boot [pri|alt|scsi.addr][isl]      Boot from primary, alt or SCSI
   Boot lan[.lan_addr][install][isl]  Boot from LAN
   Chassis [on|off]                   Enable chassis code
   Diagnostic [on|off]                Enable/disable diag boot mode
   Fastboot [on|off]                  Display or set fast boot flag
   Help                               Display the command menu
   Information                        Display system information
   LanAddress                         Display LAN station addresses
   Monitor [type]                     Select monitor type
   Path [pri|alt] [lan.id|SCSI.addr]  Change boot path
   Pim [hpmc|toc|lpmc]                Display PIM info
   Search [ipl] [scsi|lan [install]]  Display potential boot devices
   Secure [on|off]                    Display or set security mode
   -----------------------------------------------------------------
   BOOT_ADMIN>


To display bootable devices with this menu you have to execute the Search command at the BOOT_ADMIN> prompt: 

BOOT_ADMIN> search
Searching for potential boot device.
This may take several minutes.

To discontinue, press ESCAPE.

   Device Path      Device Type
   --------------   ---------------
   scsi.6.0         HP C2247
   scsi.3.0         HP HP35450A
   scsi.2.0         Toshiba CD-ROM

BOOT_ADMIN>


To boot to ISL from the disk at device path scsi.6.0 type the following: 

BOOT_ADMIN>boot scsi.6.0 isl

Once you get the ISL prompt you can run the hpux utility to boot the kernel to single-user mode: 

ISL>hpux -is

   Note: the following can also be used; ISL>hpux -is -lq (;0)/stand/vmunix

This essentially tells hpux to load the kernel (/stand/vmunix) into single-user mode (-is) off the SCSI disk drive 
containing the kernel. The -is option says to pass the string s to the init process (i), and the command init s 
puts the system in single-user mode. In fact, you will see something similar to the following after typing the 
above command: 

Boot
: disk(scsi.6;0)/stand/vmunix
966616+397312+409688 start 0x6c50

   Kernel Startup Messages Omitted

INIT: Overriding default level with level 's'

INIT: SINGLE USER MODE
WARNING:  YOU ARE SUPERUSER!!
#

- Startup

Beginning with HP-UX 10 /etc/inittab calls /sbin/rc, which in turn calls execution scripts to start subsystems. 
This approach follows the OSF/1 industry standard and has been adopted by Sun, SGI, and other vendors. 
There are four components to this method of startup and shutdown: /sbin/rc, execution scripts, 
configuration variable scripts, and link files. 

/sbin/rc
This script invokes execution scripts based on run levels. It is also known as the startup and shutdown 
sequencer script. 

Execution scripts
These scripts start up and shut down various subsystems and are found in the /sbin/init.d directory. 
/sbin/rc invokes each execution script with one of four arguments, indicating the "mode": 

"start"		Bring the subsystem up 
"start_msg"	Report what the start action will do 
"stop"		Bring the subsystem down 
"stop_msg"	Report what the stop action will do 

These scripts are designed never to be modified. Instead, they are customized by sourcing in configuration files 
found in the /etc/rc.config.d directory. These configuration files contain variables that you can set. 
For example, in the configuration file /etc/rc.config.d/netconf you can specify routing tables by setting 
variables like these: 

ROUTE_DESTINATION[0]="default"
ROUTE_GATEWAY[0]="gateway_address"
ROUTE_COUNT[0]="1"


The execution script /sbin/init.d/net sources these and other network-related variables when it runs upon 
system startup. More on configuration files is described below. 
Upon startup a checklist similar to the one below will appear based upon the exit value of each of 
the execution scripts. 

HP-UX Startup in progress
-----------------------------------
Mount file systems..............................[ OK ]
Setting hostname................................[ OK ]
Set privilege group.............................[ OK ]
Display date...................................[FAIL]*
Enable auxiliary swap space....................[ N/A ]
Start syncer daemon.............................[ OK ]
Configure LAN interfaces........................[ OK ]
Start Software Distributor agent daemo..........[ OK ]


The execution scripts have the following exit values: 
0 Script exited without error. This causes the status OK to appear in the checklist. 
1 Script encountered errors. This causes the status FAIL to appear in the checklist. 
2 Script was skipped due to overriding control variables from /etc/rc.config.d files or for other reasons, 
  and did not actually do anything. This causes the status N/A to appear in the checklist. 
3 Script executed normally and requires an immediate system reboot for the changes to take effect. 
  (NOTE: Reserved for key system components). 
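
The mapping from exit value to checklist status can be summarized in a short shell function. This is a 
sketch, not code taken from /sbin/rc: the function name rc_status is hypothetical, and the labels used for 
exit value 3 and for any other value are assumptions, since the text above only names OK, FAIL, and N/A.

```shell
# Map the exit value of an execution script to the status shown in the
# startup checklist (hypothetical helper).
rc_status() {
    case $1 in
        0) echo "OK" ;;        # script exited without error
        1) echo "FAIL" ;;      # script encountered errors
        2) echo "N/A" ;;       # script was skipped
        3) echo "REBOOT" ;;    # assumed label: immediate reboot required
        *) echo "UNKNOWN" ;;   # assumed label: value not documented above
    esac
}
```

For example, after running "/sbin/init.d/cron start" by hand, rc_status $? shows how that result would be reported.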

Configuration variable scripts
Configuration variable scripts are designed to customize the execution scripts. The goal here is to separate 
startup files from configuration files so that upgrading your system does not overwrite its configuration. 
These scripts are written for the POSIX shell (/usr/bin/sh or /sbin/sh), and not the Bourne shell, ksh, or csh. 
In some cases, these files must also be read, and possibly modified, by other scripts or the SAM program. 
For this reason, each variable definition must appear on a separate line, in the syntax: 
variable=value
No trailing comments may appear on a variable definition line. Comment statements must be on separate lines, 
with the "#" comment character in column 1. An example of the required syntax for configuration files is given below: 

# Cron configuration. See cron(1m)
#
# CRON: Set to 1 to start cron daemon
#
CRON=1


Both the execution scripts and the configuration files are named after the subsystem they control. For example, 
the /sbin/init.d/cron execution script controls the cron daemon, and it is customized by the /etc/rc.config.d/cron 
configuration variable script. 

Link Files
These files control the order in which execution scripts run. The /sbin/rc#.d (where # is a run-level) directories 
are startup and shutdown sequencer directories. They contain only symbolic links to the execution scripts in 
/sbin/init.d that are executed by /sbin/rc on transition to a specific run level. For example, the /sbin/rc3.d 
directory contains symbolic links to scripts that are executed when entering run level 3. 
These directories contain two types of link files: start links and kill links. Start links have names beginning 
with the capital letter S and are invoked with the start argument at system boot time or on transition to a higher 
run level. Kill links have names beginning with the capital letter K and are invoked with the stop argument 
at system shutdown time, or when moving to a lower run level. 

Further, all link files in a sequencer directory are numbered to ensure a particular execution sequence. 
Each script has, as part of its name, a three-digit sequence number. This, in combination with the start and kill 
notation, provides all the information necessary to properly start up and shut down a system. 

The table below shows some samples from the run-level directories. (The sequence numbers shown are only for example 
and may not accurately represent your system.) 

/sbin/rc0.d     /sbin/rc1.d     /sbin/rc2.d       /sbin/rc3.d
-----------     -----------     -----------       -----------
K480syncer      S100hfsmount    S340net           S000nfs.server
K800killall     S320hostname    S500inetd
K900hfsmount    S440savecore    S540sendmail
                S500swapstart   S610rbootd
                S520syncer      S720lp
                                S730cron
                K270cron
                K280lp          K900nfs.server
                K390rbootd
                K460sendmail
                K500inetd
                K660net


Because each script in /sbin/init.d performs both the startup and shutdown functions, each will have two links 
pointing towards the script from /sbin/rc*.d; one for the start action and one for the stop action. 

Run Levels and /sbin/rc
In previous HP-UX releases, /etc/rc (now /sbin/rc) was run only once. Now it may run several times during the 
execution of a system, sequencing the execution scripts when moving between run levels. However, only the subsystems 
configured for execution, through configuration variables in /etc/rc.config.d, are started or stopped when 
transitioning the run levels. 
/sbin/rc sequences the startup and shutdown scripts in the appropriate sequencer directories in lexicographical order. 
Upon transition from a lower to a higher run level, the start scripts for the new run level and all intermediate 
levels between the old and new level are executed. Upon transition from a higher to a lower run level, the kill scripts 
for the new run level and all intermediate levels between the old and new level are executed. 

When a system is booted to a particular run level, it will execute startup scripts for all run levels up to and 
including the specified level (except run level 0). For example, if booting to run level 4, /sbin/rc looks at the 
old run level (S) and the new run level (4) and executes all start scripts in states 1, 2, 3, and 4. 
Within each level, the start scripts are sorted lexicographically and executed in that order. Each level is sorted 
and executed separately to ensure that the lower level subsystems are started before the higher level subsystems. 
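
The transition rules above can be sketched as a shell function that lists which sequencer directories are 
visited, and in which mode. This is an illustration of the description, not the logic of /sbin/rc itself: the 
function name rc_transition is hypothetical, and treating run level S as 0 for counting is an assumption.

```shell
# List the sequencer directories processed when moving from run level
# $1 to run level $2 (hypothetical helper).
rc_transition() {
    old=$1
    new=$2
    # Assumption: count the single-user state S as level 0
    case $old in [Ss]) old=0 ;; esac
    if [ "$new" -gt "$old" ]; then
        # Going up: start scripts of every level above the old one, up to the new one
        level=$((old + 1))
        while [ "$level" -le "$new" ]; do
            echo "start /sbin/rc${level}.d"
            level=$((level + 1))
        done
    elif [ "$new" -lt "$old" ]; then
        # Going down: kill scripts of every level below the old one, down to the new one
        level=$((old - 1))
        while [ "$level" -ge "$new" ]; do
            echo "stop /sbin/rc${level}.d"
            level=$((level - 1))
        done
    fi
}
```

"rc_transition S 4" lists the start pass over rc1.d through rc4.d, matching the boot example above; 
"rc_transition 3 1" lists the stop pass over rc2.d and rc1.d.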

Consequently, when shutting down a system, the reverse takes place. The kill scripts are executed in lexicographical 
order starting at the highest run level and working down, so as to stop the subsystems in the reverse order they 
were started. To achieve this, the sequence numbers of the kill links are reversed relative to the startup order. 

Example
If you want cron to start when entering run level 2, you would modify the configuration variable script 
/etc/rc.config.d/cron to read as follows: 

# cron config
#
# CRON=1 to start

CRON=1


This would be necessary because the execution script, /sbin/init.d/cron, contains the following: 
# cron startup
#
. /etc/rc.config

# start cron only when the configuration variable enables it
if [ "$CRON" = 1 ]
   then /usr/sbin/cron
fi

cron will start at run level 2 because in /sbin/rc2.d a link exists from S730cron to /sbin/init.d/cron. 
/sbin/rc will invoke /sbin/init.d/cron with a start argument because the link name starts with an S. 

http://bizsupport1.austin.hp.com/bc/docs/support/SupportManual/c01037913/c01037913.pdf



===========================================================================
4. Most important and current AIX, SOLARIS, and Linux fixes:
===========================================================================


4.1 AIX:
========


4.2 SOLARIS:
============


4.3 Linux Redhat:
=================
                  


===================
5. Oracle and UNIX:
===================

(From here on: old text. Ignore everything below, because it is too old. It is only of interest to Albert.)


5.1 Installing Oracle 8i:
-------------------------


5.1.1 Operating system dependencies:
------------------------------------

First determine which OS settings and patches are required for the Oracle version
you intend to use. 

For example, on Linux, Oracle version 8.1.7 requires glibc 2.1.3. 
Linux is very sensitive to the library versions used in combination with Oracle.

Parameters such as shmmax (maximum size of a shared memory segment)
may also need to be adjusted.  

# sysctl -w kernel.shmmax=100000000
# sysctl -w fs.file-max=65536
# echo "kernel.shmmax = 100000000"  >> /etc/sysctl.conf
# echo "kernel.shmmax = 2147483648" >> /etc/sysctl.conf


   Note: The following is general, but is also derived from an Oracle 8.1.7
   installation on Red Hat Linux 6.2.

   The 8.1.7 installation also requires the Java JDK 1.1.8.
   It can be downloaded from www.blackdown.org

   Download jdk-1.1.8_v3 (jdk118_v3-glibc-2.1.3.tar.bz2) into /usr/local
   bzip2 -dc jdk118_v3-glibc-2.1.3.tar.bz2 | tar xvf -
   ln -s /usr/local/jdk118_v3 /usr/local/java


5.1.2 Environment variables:
----------------------------

Make sure the correct Oracle variables are set. 
On every platform these are at least:

ORACLE_BASE=/u01/app/oracle; export ORACLE_BASE
(root directory for the Oracle software)

ORACLE_HOME=$ORACLE_BASE/product/8.1.5; export ORACLE_HOME
(determines the directory in which the instance software resides)

ORACLE_SID=brdb; export ORACLE_SID
(determines the name of the current instance)

ORACLE_TERM=xterm; export ORACLE_TERM
(or vt100, ansi, or something else)

ORA_NLSxx=$ORACLE_HOME/ocommon/nls/admin/data; export ORA_NLSxx
(determines the NLS directory holding the data files for multiple languages;
xx depends on the Oracle version, for example ORA_NLS33 for 8i)

NLS_LANG="Dutch_The Netherlands.WE8ISO8859P1"; export NLS_LANG
This specifies the language, territory, and character set for the client applications.

LD_LIBRARY_PATH=$ORACLE_HOME/lib; export LD_LIBRARY_PATH

PATH=$ORACLE_HOME/bin:/bin:/usr/bin:/usr/sbin; export PATH


Place these variables in the profile file of the oracle user:
.profile, .bash_profile, etc.


5.1.3 OFA directory structure:
------------------------------

Stick to OFA (Optimal Flexible Architecture). An example for database PROD:

/u01/app/oracle/product/8.1.6

/u01/app/oracle/admin/PROD

/u01/app/oracle/admin/PROD/pfile
/u01/app/oracle/admin/PROD/adhoc
/u01/app/oracle/admin/PROD/bdump
/u01/app/oracle/admin/PROD/udump
/u01/app/oracle/admin/PROD/adump
/u01/app/oracle/admin/PROD/cdump
/u01/app/oracle/admin/PROD/create

/u02/oradata/PROD
/u03/oradata/PROD
/u04/oradata/PROD
etc..




5.1.4 Users and groups:
-----------------------


If you want to use OS authentication, the init.ora must contain:
remote_login_passwordfile=none (password file authentication uses the value exclusive)

Required groups in UNIX: group dba. It must be present in the /etc/group file.
The group oinstall is often needed as well.

groupadd dba
groupadd oinstall
groupadd oper

Now create the user oracle:
adduser -g oinstall -G dba -d /home/oracle oracle


5.1.5 mount points and disks:
-----------------------------

Create the mount points:

mkdir /opt/u01
mkdir /opt/u02
mkdir /opt/u03
mkdir /opt/u04  

In a production environment these must be separate disks.

Now give ownership of these mount points to user oracle and group oinstall:

chown -R oracle:oinstall /opt/u01
chown -R oracle:oinstall /opt/u02
chown -R oracle:oinstall /opt/u03
chown -R oracle:oinstall /opt/u04

directories: drwxr-xr-x  oracle  dba
files      : -rw-r-----  oracle  dba
           : -rw-r--r--  oracle  dba

chmod 644 *
chmod u+x filename
chmod ug+x filename


5.1.6 testing user oracle:
--------------------------


Log in as user oracle and run the commands:

$ groups   shows the groups (oinstall, dba)
$ umask    shows 022; if not, put the line umask 022 in the .profile

umask determines the default mode of a file or directory when it is created:
the permission bits set in the umask are removed from the default mode.
rwxrwxrwx=777
rw-rw-rw-=666
rw-r--r--=644, which corresponds to umask 022 (666 with the 022 bits removed)
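
The umask arithmetic can be checked with a small shell function. This is a minimal sketch; the function 
name mode_for_umask is hypothetical. It assumes a base mode of 666 for new files and 777 for new 
directories, and removes the umask bits from that base.

```shell
# Print the resulting mode for a base mode ($1, e.g. 666 or 777)
# and a umask ($2, e.g. 022). Both are given as octal digits.
mode_for_umask() {
    base=$1
    mask=$2
    # A leading 0 makes the shell read the digits as octal;
    # "& ~mask" clears every bit that is set in the umask.
    printf '%03o\n' "$(( 0$base & ~0$mask ))"
}
```

mode_for_umask 666 022 prints 644 (a new file), and mode_for_umask 777 022 prints 755 (a new directory).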

Now change the .profile or .bash_profile of the user oracle.
Place the environment variables from 9.1 in the profile.

Log out and log in again as user oracle, and test the environment:
%env
%echo $variablename
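
Checking the variables one by one with echo gets tedious. The sketch below loops over a list of names and reports the ones that are empty or unset. The helper name check_env is hypothetical, and the list ORACLE_HOME, ORACLE_SID, PATH is an assumption; use the variables you actually placed in the profile:

```shell
# Sketch: report which of the named environment variables are set.
# check_env is a hypothetical helper; it returns non-zero if any name is unset or empty.
check_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "NOT SET: $name"
            missing=1
        else
            echo "$name=$value"
        fi
    done
    return $missing
}

# Assumed variable list, adjust to your own profile.
check_env ORACLE_HOME ORACLE_SID PATH || echo "fix the profile before continuing"
```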


5.1.7 Oracle Installer for 8.1.x on Linux:
------------------------------------------

Log in as user oracle. Now run the Oracle installer:

Linux:

  startx
  cd /usr/local/src/Oracle8iR3
  ./runInstaller

or

  Go to install/linux on the CD and run runIns.sh


A graphical setup follows. Answer the questions.

The installer may ask you to run scripts such as:
orainstRoot.sh and root.sh
To run them:

   open a new window
   su root
   cd $ORACLE_HOME
   ./orainstRoot.sh



5.2 Automatic start of Oracle at system boot:
---------------------------------------------


5.2.1 oratab:
-------------

Contents of ORATAB in /etc or /var/opt:

Example:

  #   $ORACLE_SID:$ORACLE_HOME:[N|Y]
  #
  ORCL:/u01/app/oracle/product/8.0.5:Y
  #


The Oracle scripts to start and stop the database are $ORACLE_HOME/bin/dbstart and dbshut,
or startdb and stopdb or something similar. These look in ORATAB to see which databases
must be started.
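
How a script can pick the autostart databases out of oratab is sketched below. This is not the code of the real dbstart script, only an illustration of the file format; list_autostart_sids is a hypothetical helper that takes the path of an oratab-format file:

```shell
# Sketch: print the SIDs whose third oratab field is Y.
# list_autostart_sids is a hypothetical helper, not part of Oracle's dbstart.
#   $1 = path to an oratab-format file (SID:ORACLE_HOME:Y|N)
list_autostart_sids() {
    awk -F: '
        /^[ \t]*#/ { next }                 # skip comment lines
        NF >= 3 && $3 == "Y" { print $1 }   # keep entries flagged Y
    ' "$1"
}
```

For the example file above, "list_autostart_sids /etc/oratab" would print ORCL.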


5.2.2 dbstart and dbshut:
-------------------------

The dbstart script reads oratab and also runs tests to determine the Oracle version.
Beyond that, its core consists of:

  starting sqldba, svrmgrl or sqlplus
  then doing a connect
  then issuing the startup command.

dbshut works in the corresponding way.


5.2.3 init, sysinit, rc:
------------------------

For an automatic start, add the appropriate entries to the /etc/rc2.d/S99dbstart
(or equivalent) file:

While Unix boots, the scripts in /etc/rc2.d whose names begin with an 'S' are executed
in alphabetical order.
The Oracle database processes will be started as (one of) the last processes.
The file S99oracle is linked into this directory.
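
To see where a script such as S99oracle falls in the start order, you can list the 'S' scripts the way the shell sorts them. A minimal sketch, assuming a POSIX shell; list_start_scripts is a hypothetical helper and the directory is passed in as an argument:

```shell
# Sketch: list the start scripts of an rc directory in execution order.
# list_start_scripts is a hypothetical helper for illustration.
#   $1 = rc directory, for example /etc/rc2.d
list_start_scripts() {
    # The shell expands S* in sorted order, so S99... comes last.
    for script in "$1"/S*; do
        if [ -f "$script" ]; then
            basename "$script"
        fi
    done
}
```

Running "list_start_scripts /etc/rc2.d" prints one script name per line; scripts starting with 'K' are ignored.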

Contents of S99oracle:

  su - oracle -c "/path/to/$ORACLE_HOME/bin/dbstart"         # Start DB's
  su - oracle -c "/path/to/$ORACLE_HOME/bin/lsnrctl start"   # Start listener
  su - oracle -c "/path/to/$ORACLE_HOME/bin/namesctl start"  # Start OraNames (optional)

The dbstart script is a standard Oracle script. It checks oratab for the SIDs set to 'Y'
and starts those databases.

Or customized, via a custom startdb script:

  ORACLE_ADMIN=/opt/oracle/admin; export ORACLE_ADMIN

  su - oracle -c "$ORACLE_ADMIN/bin/startdb WPRD 1>$ORACLE_ADMIN/log/WPRD/startWPRD.$$ 2>&1"
  su - oracle -c "$ORACLE_ADMIN/bin/startdb WTST 1>$ORACLE_ADMIN/log/WTST/startWTST.$$ 2>&1"
  su - oracle -c "$ORACLE_ADMIN/bin/startdb WCUR 1>$ORACLE_ADMIN/log/WCUR/startWCUR.$$ 2>&1"



5.3 Stopping Oracle in Unix:
----------------------------


While Unix is brought down (shutdown -i 0), the scripts in the directory /etc/rc2.d whose names
begin with a 'K' are executed in alphabetical order.
The Oracle database processes are among the first processes to be shut down.
The file K10oracle is linked to /etc/rc2.d/K10oracle

# Configuration File: /opt/oracle/admin/bin/K10oracle


ORACLE_ADMIN=/opt/oracle/admin; export ORACLE_ADMIN

su - oracle -c "$ORACLE_ADMIN/bin/stopdb WPRD 1>$ORACLE_ADMIN/log/WPRD/stopWPRD.$$ 2>&1"
su - oracle -c "$ORACLE_ADMIN/bin/stopdb WCUR 1>$ORACLE_ADMIN/log/WCUR/stopWCUR.$$ 2>&1"
su - oracle -c "$ORACLE_ADMIN/bin/stopdb WTST 1>$ORACLE_ADMIN/log/WTST/stopWTST.$$ 2>&1"


5.4 startdb and stopdb:
-----------------------

Startdb [ORACLE_SID]
--------------------

This script is called from the S99Oracle script. It takes one parameter, ORACLE_SID.

# Configuration File: /opt/oracle/admin/bin/startdb

# Set the general environment

. $ORACLE_ADMIN/env/profile

ORACLE_SID=$1
export ORACLE_SID
echo $ORACLE_SID

# Set the RDBMS environment
. $ORACLE_ADMIN/env/$ORACLE_SID.env

# Start the database
sqlplus /nolog << EOF
connect / as sysdba
startup
EOF

# Start the listener
lsnrctl start $ORACLE_SID

# Start the intelligent agent for all instances
#lsnrctl dbsnmp_start



Stopdb [ORACLE_SID]
-------------------

This script is called from the K10Oracle script. It takes one parameter, ORACLE_SID.

# Configuration File: /opt/oracle/admin/bin/stopdb

# Set the general environment
. $ORACLE_ADMIN/env/profile

ORACLE_SID=$1
export ORACLE_SID

# Set the RDBMS environment
. $ORACLE_ADMIN/env/$ORACLE_SID.env

# Stop the intelligent agent
#lsnrctl dbsnmp_stop

# Stop the listener
lsnrctl stop $ORACLE_SID

# Stop the database.
sqlplus /nolog << EOF
connect / as sysdba
shutdown immediate
EOF


5.5 Batches:
------------

The batches (jobs) are started by the Unix cron process.

# Batches (Oracle)

# Configuration File: /var/spool/cron/crontabs/root
# Format of lines:
# min	hour	daymo	month	daywk	cmd
#
# Dayweek 0=sunday, 1=monday...
0        9        *       *       6  /sbin/sh /opt/oracle/admin/batches/bin/batches.sh >> /opt/oracle/admin/batches/log/batcheserroroutput.log 2>&1
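
The entry must be a single line in the crontab: five schedule fields followed by the command. The sketch below splits such a line into its fields; describe_cron is a hypothetical helper written only to illustrate the format:

```shell
# Sketch: split a crontab line into its five schedule fields and the command.
# describe_cron is a hypothetical helper for illustration.
describe_cron() {
    echo "$1" | {
        read -r min hour daymo month daywk cmd
        echo "minute=$min hour=$hour day-of-month=$daymo month=$month day-of-week=$daywk"
        echo "command=$cmd"
    }
}

describe_cron '0 9 * * 6 /sbin/sh /opt/oracle/admin/batches/bin/batches.sh'
```

With day-of-week 6 and 0=sunday, the entry above runs the batches every Saturday at 09:00.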



# Configuration File: /opt/oracle/admin/batches/bin/batches.sh
# Setting ' BL_TRACE=T ; export BL_TRACE ' on the command line makes the script show every command it runs.
case $BL_TRACE in
    T)	set -x ;;
esac

ORACLE_ADMIN=/opt/oracle/admin; export ORACLE_ADMIN
ORACLE_HOME=/opt/oracle/product/8.1.6; export ORACLE_HOME

ORACLE_SID=WCUR ; export ORACLE_SID
su - oracle -c ". $ORACLE_ADMIN/env/profile ; . $ORACLE_ADMIN/env/$ORACLE_SID.env ; \
cd $ORACLE_ADMIN/batches/bin ; sqlplus /NOLOG @$ORACLE_ADMIN/batches/bin/Analyse_WILLOW2K.sql \
1>$ORACLE_ADMIN/batches/log/batches$ORACLE_SID.`date +"%y%m%d"` 2>&1"

ORACLE_SID=WCON ; export ORACLE_SID
su - oracle -c ". $ORACLE_ADMIN/env/profile ; . $ORACLE_ADMIN/env/$ORACLE_SID.env ; \
cd $ORACLE_ADMIN/batches/bin ; sqlplus /NOLOG @$ORACLE_ADMIN/batches/bin/Analyse_WILLOW2K.sql \
1>$ORACLE_ADMIN/batches/log/batches$ORACLE_SID.`date +"%y%m%d"` 2>&1"
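
The two blocks in batches.sh differ only in the SID, so the dated log name can be built in one place and the SIDs handled in a loop. The following is a dry-run sketch that only prints what it would do. batch_log_name is a hypothetical helper, and the default for ORACLE_ADMIN is taken from the script above:

```shell
# Sketch: build the dated log file name once and loop over the SIDs.
# batch_log_name is a hypothetical helper; nothing is started here, the loop only prints.
ORACLE_ADMIN=${ORACLE_ADMIN:-/opt/oracle/admin}

batch_log_name() {
    # $1 = ORACLE_SID; the suffix is the date as yymmdd, as in batches.sh
    echo "$ORACLE_ADMIN/batches/log/batches$1.$(date +%y%m%d)"
}

for sid in WCUR WCON; do
    echo "would run Analyse_WILLOW2K.sql for $sid, logging to $(batch_log_name "$sid")"
done
```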






=======================
7. INSTALLING SUNOS:
=======================

Installing Sun Solaris 2.8 

--------------------------------------------------------------------------------

Contents 

Overview 
Using Serial Console Connection 
Starting the Installation 
Answering the Screen Prompts 
Post-Installation Tasks 


--------------------------------------------------------------------------------


Overview 


This article documents installing the 2/02 release of Solaris 8 from CD-ROM. 
For the purpose of this example, I will be installing Solaris 8 on a Sun Blade 150 with the following configuration: 

Sun Blade 150 (UltraSPARC-IIe 650MHz), No Keyboard, OpenBoot 4.6 
1,792 MB RAM Memory 
40 GB IDE Western Digital Hard Drive - (/dev/dsk/c0t0d0) 
Built-in Ethernet - (eri0) 
CDROM - (/dev/dsk/c0t1d0) 

Installing Solaris 8 will require 2 CDs found in the Solaris media kit labeled SOLARIS 8 SOFTWARE 
- 1 of 2 / 2 of 2. Before starting the installation process, ensure that you have noted the following items: 


Determine the host name of the system you are installing 
Determine the language and locales you intend to use on the system 
If you intend to include the system in a network, gather the following information: 
  Host IP address 
  Subnet mask 
  Type of name service (DNS, NIS, or NIS+, for example) 
  Domain name 
  Host name of server 
  Host IP address of the name server 


Using Serial / Console Connection 


For a complete discussion of connecting to a Sun serial console from Linux, see my article "Using Serial Consoles 
- (Sun Sparcs)". 
For this particular installation, I will NOT be using a VGA monitor connected to the built-in 
frame-buffer (video card). The installation will be done using the serial port of the Sun Blade as a console. 
A serial cable (null modem) will be connected from the serial port of a Linux machine to the serial port 
of the Sun Blade. Keep in mind that you will not be able to make use of the serial console of the Sun Blade 
if it was booted with the keyboard/mouse plugged in. In order to make use of the serial console, you will need 
to disconnect the keyboard/mouse and reboot the Sun server. On the Sun Blade 100/150, if the keyboard/mouse 
are plugged in during the boot phase, all console output will be redirected to the VGA console. 

From the Linux machine, you can use a program called minicom. Start it up with the command "minicom". 
Press "Ctrl-A Z" to get to the main menu. Press "o" to configure minicom. Go to "Serial port setup" 
and make sure that you are set to the correct "Serial Device" and that the speed on line E matches the speed 
of the serial console you are connecting to. (In most cases with Sun, this is 9600.) Here are the settings 
I made when using Serial A / COM1 port on the Linux machine: 

+-----------------------------------------------------------------------+
| A -    Serial Device      : /dev/ttyS0                                |
| B - Lockfile Location     : /var/lock                                 |
| C -   Callin Program      :                                           |
| D -  Callout Program      :                                           |
| E -    Bps/Par/Bits       : 9600 8N1                                  |
| F - Hardware Flow Control : Yes                                       |
| G - Software Flow Control : No                                        |
|                                                                       |
|    Change which setting?                                              |
+-----------------------------------------------------------------------+

After making all necessary changes, hit the ESC key to go back to the "configurations" menu. 
Now go to "Modem and dialing". Change the "Init string" to "~^M~". Save the settings (as dflt), 
and then restart Minicom. You should now see a console login prompt. 

[root@bertha1 root]# minicom

Welcome to minicom 1.83.1

OPTIONS: History Buffer, F-key Macros, Search History Buffer, I18n
Compiled on Aug 28 2001, 15:09:33.

Press CTRL-A Z for help on special keys

alex console login: root
Password:
Last login: Tue Nov  4 18:55:41 on console
Nov  7 12:17:24 alex login: ROOT LOGIN /dev/console
Sun Microsystems Inc.   SunOS 5.8       Generic Patch   October 2001
#
# init 0
INIT: New run level: 0
The system is coming down.  Please wait.
System services are now being stopped.
Print services stopped.
Nov  7 12:17:38 alex syslogd: going down on signal 15
The system is down.
syncing file systems... done
Program terminated
ok


Starting the Installation 


The installation process starts at the ok prompt. The previous section of this document provides the steps 
required not only to gain access to the console port of the Sun SPARC server, but also to get the server 
to an ok prompt. If the machine is already booted when you log in (you have a console login like the following:
 "alex console login:"), you will need to bring the machine to its EEPROM (ok prompt) by running init 0 
as in the Using Serial / Console Connection section above. 
The first step in installing Solaris 8 is to boot the machine from Disk 1 of the SOLARIS 8 SOFTWARE CDs. 
Once at the ok prompt, type in boot cdrom. (Or in some cases, you can use reboot cdrom). From here, 
the installation program prompts you for system configuration information that is needed to complete the installation. 

NOTE: If you were performing a network installation, you would type: ok boot net. 

In almost all cases, you will be installing the Solaris 8 software on a new system where it will not be necessary 
to preserve any data already on the hard drive. Using this assumption, I will partition the single 40 GB IDE 
hard drive in the system. 

Answering the Screen Prompts 


Let's start the installation process! Put the SOLARIS 8 SOFTWARE (Disk 1 of 2) in the CDROM tray and boot to it: 
ok boot cdrom
Resetting ...

Sun Blade 150 (UltraSPARC-IIe 650MHz), No Keyboard
Copyright 1998-2002 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.6, 1792 MB memory installed, Serial #52928138.
Ethernet address 0:3:ba:27:9e:8a, Host ID: 83279e8a.

Rebooting with command: boot cdrom
Boot device: /pci@1f,0/ide@d/cdrom@1,0:f  File and args:
SunOS Release 5.8 Version Generic_108528-13 64-bit
Copyright 1983-2001 Sun Microsystems, Inc.  All rights reserved.
 
The boot process may take several minutes to complete, but once done, you will start answering a series of prompts. 

The following section will walk you through many of the screen prompts from the installation. 

The first three prompts are from the command line interface (CLI) and are used to specify the language, 
locale and terminal. Use English for both Language and Locale. As for the terminal setting, I commonly telnet 
to a Linux server whose serial port is cabled to the serial port of the Sun machine, and from there 
I use "minicom" to connect to the Sun server. 
The best terminal for this type of installation is "DEC VT100": 


  Language                             : English
  Locale                               : English
  What type of terminal are you using? : 3) DEC VT100

NOTE: You should be able to use a terminal type of "DEC VT100" or "X Terminal Emulator (xterms)".  

NOTE: Further installation through the terminal requires responses to the selections through ESC and function keys 
and space bar, which are mentioned on the installation screen.  


Many of the screens to follow will ask you about networking information. When asked if the system will be connected 
to a network, answer Yes. 

NOTE: Many of the screens should be easy to complete except for the "Name Service" section. In almost all cases, 
you will want to use DNS as the name service, but if your machine is not currently configured within DNS, this section 
will fail and none of the name service information you entered will be stored or configured. 
If this is the case, you will need to select None under the Name Service section. 
The network configuration will then need to be completed after the installation process by updating certain 
network files on the local hard drive. This is covered in the "Post-Installation Tasks" section of this document. 
 


--------------------------------------------------------------------------------


Screen 1 : The Solaris Installation Program 

This is the Solaris Installation Welcome screen. 

Hit ESC - F2 to continue 


Screen 2 : Identify This System 

This screen informs you about how you will need to identify the computer as it applies to network connectivity. 

Hit ESC - F2 to continue 


Screen 3 : Network Connectivity 


Networked
---------
[X] Yes
[ ] No
Hit ESC - F2 to continue 

Screen 4 : DHCP 


Use DHCP
--------
[ ] Yes
[X] No
Hit ESC - F2 to continue 

Screen 5 : Host Name 


Host name: alex
Hit ESC - F2 to continue 

Screen 6 : IP Address 


IP address: 192.168.1.102
Hit ESC - F2 to continue 

Screen 7 : Subnets 


System part of a subnet
-----------------------
[X] Yes
[ ] No
Hit ESC - F2 to continue 

Screen 8 : Netmask 


Netmask: 255.255.255.0
Hit ESC - F2 to continue 

Screen 9 : IPv6 


Enable IPv6
-----------
[ ] Yes
[X] No
Hit ESC - F2 to continue 

Screen 10 : Confirm Information 

This is a confirmation screen. Verify all data is correct. 

Hit ESC - F2 to continue 


Screen 11 : Configure Security Policy 


Configure Kerberos Security
---------------------------
[ ] Yes
[X] No
Hit ESC - F2 to continue 

Screen 12 : Confirm Information 

This is a confirmation screen. Verify all data is correct. 

Hit ESC - F2 to continue 


Screen 13 : Name Service 


Name service
------------
[ ] NIS+
[ ] NIS
[X] DNS
[ ] LDAP
[ ] None
Hit ESC - F2 to continue 

Screen 14 : Domain Name 


Domain name: idevelopment.info
Hit ESC - F2 to continue 

Screen 15 : DNS Server Addresses 


Server's IP address: 63.67.120.18
Server's IP address: 63.67.120.23
Server's IP address: 
Hit ESC - F2 to continue 

Screen 16 : DNS Search List 


Search domain:
Search domain:
Search domain: 
Search domain:
Search domain:
Search domain:
Hit ESC - F2 to continue 

Screen 17 : Confirm Information 

This is a confirmation screen. Verify all data is correct. 

Hit ESC - F2 to continue 


Screen 18 : Time Zone 


Regions
-------
[ ] Asia, Western
[ ] Australia / New Zealand
[ ] Canada
[ ] Europe
[ ] Mexico
[ ] South America
[X] United States
[ ] other - offset from GMT
[ ] other - specify time zone file
Hit ESC - F2 to continue 

Screen 19 : Time Zone 


Time zones
----------
[X] Eastern
[ ] Central
[ ] Mountain
[ ] Pacific
[ ] East-Indiana
[ ] Arizona
[ ] Michigan
[ ] Samoa
[ ] Alaska
[ ] Aleutian
[ ] Hawaii
Hit ESC - F2 to continue 

Screen 20 : Date and Time 


Date and time: YYYY-MM-DD HH:MM

  Year   (4 digits) : <enter year>
  Month  (1-12)     : <enter month>
  Day    (1-31)     : <enter day>
  Hour   (0-23)     : <enter hour>
  Minute (0-59)     : <enter minute>
Hit ESC - F2 to continue 

Screen 21 : Confirm Information 

This is a confirmation screen. Verify all data is correct. 

Hit ESC - F2 to continue 


Screen 22 : Solaris Interactive Installation 

This screen recognizes if a previous version of Solaris is installed and whether you would like to upgrade or not. 
Always select the install option (F4_Initial). 

Hit ESC - F4 to continue 


Screen 23 : Solaris Interactive Installation 

There are two ways to install your Solaris software: "Standard" or "Flash". 
Choose the "Standard" method (Esc-2_Standard). 

Hit ESC - F2 to continue 


Screen 24 : Select Geographic Regions 


Select the geographic regions for which support should be installed.
--------------------------------------------------------------------
> [ ] Asia
> [ ] Eastern Europe
> [ ] Middle East
> [ ] Central America
> [ ] South America
> [ ] Northern Europe
> [ ] Southern Europe
> [ ] Central Europe
V [/] North America
  [ ]     Canada-English (ISO8859-1)
  [ ]     Canada-French (ISO8859-1)
  [ ]     French
  [ ]     Mexico (ISO8859-1)
  [X]     U.S.A. (en_US.ISO8859-1)
> [ ] Australasia
> [ ] Western Europe
> [ ] Northern Africa
Hit ESC - F2 to continue 

Screen 25 : Select Software 


Select the Solaris software to install on the system.
-----------------------------------------------------
[ ] Entire Distribution plus OEM support 64-bit  1432.00 MB
[X] Entire Distribution 64-bit ................. 1401.00 MB
[ ] Developer System Support 64-bit ............ 1350.00 MB
[ ] End User System Support 64-bit ............. 932.00 MB
[ ] Core System Support 64-bit ................. 396.00 MB
Hit ESC - F2 to continue 

Screen 26 : Select Disks 

You must select the disks for installing Solaris software. If there are several disks available, 
I always install the Solaris software on the boot disk c0t0d0. 

----------------------------------------------------------
Disk Device (Size)        Available Space
=============================================
[X] c0t0d0   (14592 MB) boot disk    14592 MB  (F4 to edit)

                    Total Selected:  14592 MB
                 Suggested Minimum:    974 MB




--------------------------------------------------------------------------------

I generally select ESC - F4 to edit the c0t0d0 disk to ensure that the root directory is going 
to be located on this disk. 

----------------------------------------------------------
On this screen you can select the disk for installing the 
root (/) file system of the Solaris software.

Original Boot Device : c0t0d0

          Disk
      ==============================
      [X] c0t0d0    (F4 to select boot device)




--------------------------------------------------------------------------------

On this screen, I typically select ESC - F4 to select boot device to ensure the root file system will be 
located on slice zero, c0t0d0s0. 

----------------------------------------------------------
On this screen you can select the specific slice for the root (/) file
system. If you choose Any of the Above, the Solaris installation program
will choose a slice for you.

Original Boot Device : c0t0d0s0

          [X]  c0t0d0s0
          [ ]  c0t0d0s1
          [ ]  c0t0d0s2
          [ ]  c0t0d0s3
          [ ]  c0t0d0s4
          [ ]  c0t0d0s5
          [ ]  c0t0d0s6
          [ ]  c0t0d0s7
          [ ]  Any of the Above
Hit ESC - F2 after selecting the disk slice 


--------------------------------------------------------------------------------

Hit ESC - F2 to continue with your Boot Disk selection 


--------------------------------------------------------------------------------


Screen 27 : Reconfigure EEPROM? 

Do you want to update the system's hardware (EEPROM) to always boot from c0t0d0? 

Hit ESC - F2 to Reconfigure EEPROM and Continue 


Screen 28 : Preserve Data? 

Do you want to preserve existing data? At least one of the disks you've selected for installing Solaris software 
has file systems or unnamed slices that you may want to save. 

Hit ESC - F2 to continue 


Screen 29 : Automatically Layout File Systems? 

Do you want to use auto-layout to automatically layout file systems? Manually laying out file systems 
requires advanced system administration skills. 

I typically perform an "Auto" File System Layout (F2_Auto Layout). 

Hit ESC - F2 to Perform Auto Layout. 


Screen 30 : Automatically Layout File Systems 

On this screen you must select all the file systems you want auto-layout to create, or accept the 
default file systems shown. 

File Systems for Auto-layout
========================================
[X]  /
[ ]  /opt
[ ]  /usr
[ ]  /usr/openwin
[ ]  /var
[X]  swap
Hit ESC - F2 to continue 

Screen 31 : File System and Disk Layout 

The summary below is your current file system and disk layout, based on the information you've supplied. 

NOTE: If you choose to customize, you should understand file systems, their intended purpose on the disk, 
and how changing them may affect the operation of the system. 

File system/Mount point           Disk/Slice             Size
=============================================================
/                                 c0t0d0s0            1338 MB
swap                              c0t0d0s1             296 MB
overlap                           c0t0d0s2           38162 MB
/export/home                      c0t0d0s7           36526 MB




--------------------------------------------------------------------------------

I generally select ESC - F4 (F4_Customize) to edit the partitions for disk c0t0d0. If this is a workstation, 
I make only three partitions: 

/ : I often get the sizes for the individual filesystems (/usr, /opt, and /var) wrong. This is one reason 
    I typically create only one partition, /, that is used for the entire system (minus swap space). 
    In most cases, I will be installing additional disks for large applications like the Oracle RDBMS, 
    Oracle Application Server, or other J2EE application servers. 
overlap : The overlap partition represents the entire disk and is slice s2 of the disk. 
swap : The swap partition size depends on the amount of RAM in the system. If you are not sure what size to use, 
    make it double the amount of RAM in your system. I typically like to make swap 1GB. 

------------------------------------------------
Boot Device: c0t0d0s0
=================================================
  Slice  Mount Point                 Size (MB)
     0   /                               37136
     1   swap                             1025
     2   overlap                         38162
     3                                       0
     4                                       0
     5                                       0
     6                                       0
     7                                       0
=================================================
                         Capacity:       38162 MB
                        Allocated:       38161 MB
                   Rounding Error:           1 MB
                             Free:           0 MB
Hit ESC - F2 to continue 
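
The numbers on the customize screen can be checked with a little shell arithmetic. This sketch uses the sizes shown above and the 1,792 MB of RAM from the hardware list; the variable names are only illustrative:

```shell
# Sketch: check the slice sizes from the customize screen.
ram_mb=1792        # RAM of the Sun Blade 150 in this example
capacity_mb=38162  # slice 2 (overlap), the whole disk
root_mb=37136      # size given to /
swap_mb=1025       # swap size chosen above

# The "double the RAM" rule of thumb, for comparison with the 1 GB chosen here.
echo "Swap by the double-the-RAM rule: $(( ram_mb * 2 )) MB"

allocated_mb=$(( root_mb + swap_mb ))
echo "Allocated: $allocated_mb MB"
echo "Left over: $(( capacity_mb - allocated_mb )) MB"
```

The allocated total comes to 38161 MB, which leaves the 1 MB the installer reports as rounding error.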


--------------------------------------------------------------------------------

This is what the File System and Disk Layout screen looks like now. 

File system/Mount point           Disk/Slice             Size
=============================================================
/                                 c0t0d0s0           37136 MB
swap                              c0t0d0s1            1025 MB
overlap                           c0t0d0s2           38162 MB
Hit ESC - F2 to continue 

Screen 32 : Mount Remote File Systems? 

Do you want to mount software from a remote file server? This may be necessary if you had to remove software 
because of disk space problems. 

Hit ESC - F2 to continue 


Screen 33 : Confirm Information 

This is a confirmation screen. Verify all data is correct. 

Hit ESC - F2 to continue 


Screen 34 : Reboot After Installation? 

After Solaris software is installed, the system must be rebooted. You can choose to have the system 
automatically reboot, or you can choose to manually reboot the system if you want to run scripts or do other 
customizations before the reboot. You can manually reboot a system by using the reboot(1M) command. 

[X] Auto Reboot
[ ] Manual Reboot
Hit ESC - F2 to Begin the Installation 

Screen 35 : Installation Progress 

The installer then configures the disk, creates the partitions, and installs the software, showing its progress. 

Preparing system for Solaris install

Configuring disk (c0t0d0)
        - Creating Solaris disk label (VTOC)

Creating and checking UFS file systems
        - Creating / (c0t0d0s0)

==================================================================

MBytes Installed: 392.08

MBytes Remaining: 428.09

      Installing: JavaVM run time environment

***************
|    |     |     |     |     |  

0   20    40    60    80    100 

After the installation is complete, the installer customizes system files, devices, and logs. 
The system then reboots or asks you to reboot, depending on the choice made earlier on the Reboot 
After Installation? screen. 


Screen 36 : Create a root Password 

On this screen you can create a root password. 

A root password can contain any number of characters, but only the first eight characters in the password 
are significant. (For example, if you create `a1b2c3d4e5f6' as your root password, you can use `a1b2c3d4' 
to gain root access.) 
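
The effect of the eight-character limit can be shown with plain string truncation. This sketch only illustrates which part of the typed password counts; it says nothing about how Solaris stores or hashes passwords, and significant_part is a made-up helper name:

```shell
# Sketch: show the part of a password that is significant under an 8-character limit.
# significant_part is a hypothetical helper; it only truncates a string.
significant_part() {
    printf '%.8s\n' "$1"
}

significant_part a1b2c3d4e5f6   # prints a1b2c3d4
```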

You will be prompted to type the root password twice; for security, the password will not be displayed 
on the screen as you type it. 

> If you do not want a root password, press RETURN twice. 


Root password:
Enter Your root Password and Press Return to continue. 

Screen 37 : Solaris 8 Software 2 of 2 

Please specify the media from which you will install Solaris 8 Software 2 of 2 (2/02 SPARC Platform Edition). 

Alternatively, choose the selection for "Skip" to skip this disc and go on to the next one. 


Media:

1. CD/DVD
2. Network File System
3. Skip

   Media [1]: 1

Screen 38 : Insert the CD/DVD for Solaris 8 Software 2 of 2 

Please insert the CD/DVD for Solaris 8 Software 2 of 2 (2/02 SPARC Platform Edition). 

After you insert the disc, please press Enter. 

Enter S to skip this disc and go on to the next one. To select a different media, enter B to go Back. 

[]

Screen 39 : Solaris 8 packages (part 2) 

After hitting <Enter> in the previous screen, the installation will continue installing the Solaris software (part 2) 

Reading Solaris 8 Software 2 of 2 (2/02 SPARC Platform Edition).... \

Launching installer for Solaris 8 Software 2 of 2 (2/02 SPARC Platform
Edition). Please Wait...

Installing Solaris 8 packages (part 2)
|-1%--------------25%-----------------50%-----------------75%--------------100%|


Installation details:

     Product                      Result     More Info
 1.  Solaris 8 packages (part 2)  Installed  Available

 2.  Done

   Enter the number corresponding to the desired selection for more
   information, or enter 2 to continue [2]:2

   <Press Return to reboot the system> 


Post-Installation Tasks 


After successfully installing the Solaris operating platform software, there may be several tasks that need 
to be performed depending on your configuration. 

Networking: 
If you will be using network database files for your TCP/IP networking configuration, several files 
will need to be created and/or modified by hand. I provided a step-by-step document on how to 
enable TCP/IP networking using files: 
Configuring TCP/IP on Solaris - TCP/IP Configuration Files - (Quick Config Guide) 


Solaris 8 Patch Cluster: 
It is advisable to install the latest Sun Solaris Patch Cluster to ensure a stable operating environment. 
I provided a step-by-step document on how to download and install the latest Sun Solaris 8 Patch Cluster: 
Patching Sun Solaris 2.8 



=======================
8. RAID Volumes on SUN:
=======================



8.1 SCSI, DISKS AND RAID:
=========================

8.1.1 General
-------------

 SCSI HBA-----------SCSI ID 5----Lun 0 Primary CDROM drive
               |              |--Lun 1 Slave CDROM drive
               |              |-- ....
               |              |--Lun 7 Slave CDROM drive
               |
               |----SCSI ID 6----Lun 0 Primary CDROM
               |              |--...
               |
               |----SCSI ID 0----...

Every SCSI device (target) can have 8 LUNs, numbered 0-7


A logical unit number (LUN) is a unique identifier used on a SCSI bus that enables it to differentiate between 
up to eight separate devices (each of which is a logical unit). Each LUN is a unique number that identifies 
a specific logical unit, which may be a disk. 

SCSI (Small Computer System Interface) is a parallel interface that can have up to eight devices 
(sixteen on a wide bus) all attached through a single cable; the cable and the host (computer) adapter make up the SCSI bus. 
The bus allows the interchange of information between devices independently of the host. 
In the SCSI scheme, each device is assigned a unique number (its SCSI ID), which is either a number between 
0 and 7 for an 8-bit (narrow) bus, or between 0 and 15 for a 16-bit (wide) bus. 
The devices that request input/output (I/O) operations are initiators and the devices that perform 
these operations are targets. Each target has the capacity to connect up to eight additional devices 
through its own controller; these devices are the logical units, each of which is assigned a unique number 
for identification to the SCSI controller for command processing. 

Put another way: LUN is short for logical unit number, a unique identifier used on a SCSI bus to distinguish between devices 
that share the same bus. SCSI is a parallel interface that allows up to 16 devices to be connected along a single cable. 
The cable and the host adapter form the SCSI bus, and this operates independently of the rest of the computer. 
Each device is given a unique address by the SCSI BIOS, ranging from 0 to 7 for an 8-bit bus or 
0 to 15 for a 16-bit bus. Devices that request I/O processes are called initiators. Targets are devices that perform 
operations requested by initiators. Each target can accommodate up to eight other devices, known as logical units, 
and each is assigned a LUN. Commands that are sent to the SCSI controller identify devices based on their LUNs. 
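
The addressing rules can be made concrete with two small shell helpers. The first checks a SCSI ID against the bus width as described above. The second splits a Solaris device name such as c0t0d0s0 (used in the installation chapter) into controller, target, disk/LUN and slice. Both function names are hypothetical and the code is only a sketch:

```shell
# Sketch: validate a SCSI ID for a given bus width, and split a Solaris cXtYdZsN name.
# valid_scsi_id and parse_ctds are hypothetical helpers for illustration.

valid_scsi_id() {
    # $1 = bus width in bits (8 or 16), $2 = SCSI ID
    case $1 in
        8)  max=7 ;;
        16) max=15 ;;
        *)  return 2 ;;
    esac
    [ "$2" -ge 0 ] && [ "$2" -le "$max" ]
}

parse_ctds() {
    # Prints nothing if the name does not have the cXtYdZsN form.
    echo "$1" | sed -n 's/^c\([0-9][0-9]*\)t\([0-9][0-9]*\)d\([0-9][0-9]*\)s\([0-9][0-9]*\)$/controller=\1 target=\2 lun=\3 slice=\4/p'
}

parse_ctds c0t0d0s0
```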

So we might have a situation as:

- Drive C: is standard, Drive D: is SCSI Target 0 LUN 0.
- Drive C: is SCSI Target 0 LUN 0; Drive D:, if installed, is SCSI Target 0 LUN 1 or Target 1 LUN 0.


8.1.2 single-initiator
----------------------


A single-initiator SCSI bus has only one node connected to it, and provides host isolation and better 
performance than a multi-initiator bus. Single-initiator buses ensure that each node is protected 
from disruptions due to the workload, initialization, or repair of the other nodes.

When using a single- or dual-controller RAID array that has multiple host ports and provides 
simultaneous access to all the shared logical units from the host ports on the storage enclosure, 
you can set up single-initiator SCSI buses to connect each cluster node to the RAID array. 
If a logical unit can fail over from one controller to the other, the process must be transparent 
to the operating system. Note that some RAID controllers restrict a set of disks to a specific 
controller or port. In this case, single-initiator bus setups are not possible.


To set up a single-initiator SCSI bus configuration, perform the following steps:


Enable the onboard termination for each host bus adapter.

Enable the termination for each RAID controller. 

Use the appropriate SCSI cable to connect each host bus adapter to the storage enclosure.

Setting host bus adapter termination is done in the adapter BIOS utility during system boot. 
To set RAID controller termination, refer to the vendor documentation. 


  ---------   SI SCSI bus                   --------------
  |      T|---------------                  |  HBA        |
  |HBA    |               |       ----------|T            |
  |       |               |       |         --------------
  ---------               |       |
                          |       |
                     -------------------
                     |    T       T    |
                     |Storage Enclosure|
                     -------------------

Recommended for Linux and Sun clusters.


8.1.3 Multi Initiator SCSI
--------------------------


Multi-initiator SCSI configurations have two SCSI host adapter boards connected 
to a single SCSI bus, as in the following example: 

  ______________                                              ______________
 |   System 1   |  SCSI   ___________    ___________   SCSI  |   System 2   |
 |(SCSI Adapter)|--------|SCSI Device|--|SCSI Device|--------|(SCSI Adapter)|
 |______________|  Bus   |___________|  |___________|  Bus   |______________|
                                                



  ---------   MI SCSI bus                   --------------
  |      T|-------------------------------- |T            |
  |       |                  |              |             |
  |HBA    |                  |              |HBA          |
  |       |                  |              |             |
  ---------                  |              ---------------           
                     -------------------
                     |       T         |
                     |Storage Enclosure|
                     -------------------


Not recommended for Linux or Solaris clusters.



8.2 Installing an A1000 on Solaris8:
====================================

contributed by Jim Shumpert, edited by Doug Hughes
Here is what you need to do to install an A1000 on Solaris8. The order is very particular. 
Much of this is by way of example. The particulars of your site will differ. Substitute the latest version of 
Raid Manager if there is a newer one available. Also, the exact firmware versions will change over time, 
so do not take this too literally. 

- Install Solaris8 
- Install required OS patches
  (If you have an Ultra60, install 106455-09 or better - firmware patch - before proceeding) 
- Install Raid Manager 6.22 (RM 6.22) or better. 
# pkgadd -d . SUNWosar SUNWosafw SUNWosamn SUNWosau
  See also section 6.2

(contributed by Greg Whalin) Check /etc/osa/mnf and make sure that your controller name does NOT contain any periods. 
Change them to a _ instead. The RM software does not have any clue how to deal with a period. 
This kept me screwed up for quite a while. 

/etc/osa >more mnf
rebv-pegasu_001~1T94516518~ 0 1 2~~~0~3~~c1t0d0~~
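
The period check can be automated along these lines. This is a sketch only: it assumes that the first "~"-separated field of each mnf line is the module name, which matches the sample above but is not confirmed by any RM documentation, and both function names are made up.

```python
def module_name(mnf_line):
    """Return the first '~'-separated field, assumed to be the module name."""
    return mnf_line.split("~", 1)[0].strip()


def safe_module_name(name):
    """Replace periods, which RM reportedly cannot handle, with underscores."""
    return name.replace(".", "_")


line = "rebv-pegasu_001~1T94516518~ 0 1 2~~~0~3~~c1t0d0~~"
name = module_name(line)
print(name, "." in name)
```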

Install patches 109571-02 (for Solaris8 FCS) and 108553-07 (or newer) 
(for Solaris7/2.6: patch 108834-07 or newer) [ NOTE: 112125-01 and 112126-01 or better for RM 6.22.1] 
# patchadd 109571-02
# patchadd 108553-07 

Perform a reconfiguration boot (boot -r) 
# touch /reconfigure
# reboot -- -r 

Upgrade the firmware on the A1000 

/usr/lib/osa/bin/raidutil -c c1t0d0 -i

LUNs found on c1t0d0.
  LUN 0    RAID 0    0 MB

Vendor ID         Symbios 
ProductID         StorEDGE A1000  
Product Revision  0205
Boot Level        02.05.01.00
Boot Level Date   12/02/97
Firmware Level    02.05.02.11
Firmware Date     04/09/98
raidutil succeeded!


Find the lowest-numbered firmware upgrade that is still higher than the firmware installed on your A1000. 
For the above example, with patch 108553, upgrade to 2.05.06.32 (do this first, VERY IMPORTANT!) 
# cd /usr/lib/osa/fw
# /usr/lib/osa/bin/fwutil 02050632.bwd c1t0d0
# /usr/lib/osa/bin/fwutil 02050632.apd c1t0d0 

Upgrade to each next higher firmware in succession until you get to the most recent version. 
It is recommended that you do the upgrades in order. For this example, upgrade to 3.01.02.33/5 
# /usr/lib/osa/bin/fwutil 03010233.bwd c1t0d0
# /usr/lib/osa/bin/fwutil 03010235.apd c1t0d0 

Upgrade to 03.01.03.60 (or better) 
# /usr/lib/osa/bin/fwutil 03010304.bwd c1t0d0
# /usr/lib/osa/bin/fwutil 03010360.apd c1t0d0 
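
The "lowest version that is still higher, then each next one in order" rule can be sketched as follows. This is a hypothetical helper, not part of Raid Manager: it assumes the eight digits in a firmware file name encode the four-part level, as the file names above suggest, and it does not model the pairing of .bwd and .apd files.

```python
def parse_level(level):
    """Turn '02.05.02.11' or '02050632' into a comparable tuple of ints."""
    digits = level.replace(".", "")
    return tuple(int(digits[i:i + 2]) for i in range(0, 8, 2))


def upgrade_path(installed, available_files):
    """Return firmware files newer than the installed level, lowest first."""
    current = parse_level(installed)
    newer = [f for f in available_files if parse_level(f.split(".")[0]) > current]
    return sorted(newer, key=lambda f: parse_level(f.split(".")[0]))


files = ["03010360.apd", "02050632.apd", "03010235.apd"]
print(upgrade_path("02.05.02.11", files))
```

Comparing tuples of integers rather than raw strings keeps the ordering correct even if a level is written without leading zeros in the dotted form.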

Check that the array has the correct versions: 

# /usr/lib/osa/bin/raidutil -c c1t0d0 -i

LUNs found on c1t0d0.
  LUN 0    RAID 0    0 MB

Vendor ID         Symbios 
ProductID         StorEDGE A1000  
Product Revision  0301
Boot Level        03.01.03.00
Boot Level Date   10/22/99
Firmware Level    03.01.03.54
Firmware Date     03/30/00
raidutil succeeded!

Check to make sure that the RAID is attached and looks good 

# /usr/lib/osa/bin/drivutil -i c1t0d0

Drive Information for satl-adb1_a_001


Location  Capacity   Status         Vendor  Product          Firmware     Serial
	    (MB)                              ID             Version      Number
[1,0]     17274      Optimal        SEAGATE ST318404LSUN18G  4207         3BT0RHKA00    
[2,0]     17274      Optimal        SEAGATE ST318404LSUN18G  4207         3BT0QZM600    
[1,1]     17274      Optimal        SEAGATE ST318404LSUN18G  4207         3BT0QLRG00    
[2,1]     17274      Optimal        SEAGATE ST318404LSUN18G  4207         3BT0RHM400    
[1,2]     17274      Optimal        SEAGATE ST318404LSUN18G  4207         3BT0R9FZ00    
[2,2]     17274      Optimal        SEAGATE ST318404LSUN18G  4207         3BT0R9SZ00    
[1,3]     17274      Optimal        SEAGATE ST318404LSUN18G  4207         3BT0R9FY00    
[2,3]     17274      Optimal        SEAGATE ST318404LSUN18G  4207         3BT0QKVR00    
[1,4]     17274      Optimal        SEAGATE ST318404LSUN18G  4207         3BT0R79X00    
[2,4]     17274      Optimal        SEAGATE ST318404LSUN18G  4207         3BT0QX8500    
[1,5]     17274      Optimal        SEAGATE ST318404LSUN18G  4207         3BT0R9JS00    
[2,5]     17274      Optimal        SEAGATE ST318404LSUN18G  4207         3BT0RCY600    

drivutil succeeded!

Example: Create 1 large 10-disk RAID 5 configuration (LUN 0) of max size and then create 2 Hot Spare disks 

# /usr/lib/osa/bin/raidutil -c c1t0d0 -D 0

LUNs found on c1t0d0.
  LUN 0    RAID 0    0 MB
Deleting LUN 0.
Press Control C to abort.

 LUNs successfully deleted

raidutil succeeded!

# /usr/lib/osa/bin/raidutil -c c1t0d0 -l 5 -n 0 -s 0 -r fast -g 10,20,11,21,12,22,13,23,14,24

No existing LUNs were found on c1t0d0.
Capacity available in drive group:  317669184 blocks  (155111 MB).
Creating LUN 0

Registering new logical unit 0 with system.
Formatting logical unit 0  RAID 5   155111 MB 
LUNs found on c1t0d0.
  LUN 0    RAID 5    155111 MB

 LUNs successfully created

raidutil succeeded!

# /usr/lib/osa/bin/raidutil -c c1t0d0 -h 15,25

LUNs found on c1t0d0.
  LUN 0    RAID 5    155111 MB

raidutil succeeded!
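
As a sanity check on the numbers raidutil printed, the arithmetic can be sketched like this. It assumes 512-byte blocks and 1 MB = 1024 * 1024 bytes, which is consistent with the block and MB figures in the output above; the function names are made up. The RAID 5 figure is only an upper bound, since the array reserves some space of its own.

```python
SECTOR_BYTES = 512  # assumed block size, consistent with the raidutil figures


def blocks_to_mb(blocks):
    """Convert 512-byte blocks to whole megabytes (1 MB = 1024 * 1024 bytes)."""
    return blocks * SECTOR_BYTES // (1024 * 1024)


def raid5_upper_bound_mb(disks, disk_mb):
    """RAID 5 keeps one disk's worth of parity, so at most (n - 1) disks hold data."""
    return (disks - 1) * disk_mb


print(blocks_to_mb(317669184))          # capacity reported by raidutil, in MB
print(raid5_upper_bound_mb(10, 17274))  # ten 17274 MB drives
```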

Format new RAID by making only one slice 2 partition: 

# prtvtoc /dev/rdsk/c1t0d0s2
* /dev/rdsk/c1t0d0s2 partition map
*
* Dimensions:
*     512 bytes/sector
*      75 sectors/track
*      64 tracks/cylinder
*    4800 sectors/cylinder
*   65535 cylinders
*   65533 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*       First     Sector    Last
*       Sector     Count    Sector 
*           0 314558400 314558399
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       2      5    01          0 314558400 314558399


Create a file system on the new RAID 
# newfs /dev/dsk/c1t0d0s2 

Mount the RAID as /raid 
# mkdir /raid
# echo "/dev/dsk/c1t0d0s2 /dev/rdsk/c1t0d0s2 /raid ufs 3 yes -" >> /etc/vfstab
# mount /raid 
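
For reference, the seven fields of the vfstab entry added above are, in order: device to mount, device to fsck, mount point, file system type, fsck pass, mount at boot, and mount options. A small sketch that splits an entry into named fields (the field keys and the function name are made up for this note):

```python
VFSTAB_FIELDS = (
    "device_to_mount", "device_to_fsck", "mount_point",
    "fs_type", "fsck_pass", "mount_at_boot", "mount_options",
)


def parse_vfstab_line(line):
    """Split one vfstab entry into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(VFSTAB_FIELDS):
        raise ValueError("a vfstab entry needs exactly 7 fields")
    return dict(zip(VFSTAB_FIELDS, values))


entry = parse_vfstab_line("/dev/dsk/c1t0d0s2 /dev/rdsk/c1t0d0s2 /raid ufs 3 yes -")
print(entry["mount_point"], entry["fs_type"], entry["mount_at_boot"])
```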

Check to make sure that the new array is available via "df -lk" 

 # df -lk
 Filesystem            kbytes    used   avail capacity  Mounted on
 /dev/md/dsk/d0       2056211   43031 1951494     3%    /
 /dev/md/dsk/d6       4131866 1133180 2957368    28%    /usr
 /proc                      0       0       0     0%    /proc
 fd                         0       0       0     0%    /dev/fd
 mnttab                     0       0       0     0%    /etc/mnttab
 /dev/md/dsk/d5       2056211    9092 1985433     1%    /var
 swap                 1450208       8 1450200     1%    /var/run
 swap                 1450208       8 1450200     1%    /tmp
 /dev/md/dsk/d7       8089425  182023 7826508     3%    /export
 /dev/dsk/c1t0d0s2    154872105       9 153323375     1%    /raid


6.2 Install problem A1000
-------------------------

Hi.

Thanks for your kind responses. There were a few replies, but tons of 
out-of-office mail. Sorry for forgetting to state that the A1000
is not a brand new one but a used one. After some research I found
this. Here is my summary.

Conclusion:
If an A1000 has previously defined LUNs and is to be set up as a new
array, you have to remove the old LUNs before defining new LUNs,
or rm6 complains that it cannot find any raid modules.

---
If you can see more than one LUN at the boot prom via the command "probe-scsi-all",
you have to insert as many disks into the slots as there are LUNs, then reboot with boot -rs.
Then you can see the configured LUNs via /usr/lib/osa/bin/lad
and use /usr/lib/osa/bin/raidutil -c c#t#d# -X to delete all old LUNs.
Once you have deleted the old LUNs you can boot normally with just one disk and 
the raid module can be found.




6.3 Installing RM 6.22 on Solaris:
----------------------------------

Raid Manager 6.22 and A1000 config

-- Config and setup
-- ----------------

Firstly install the Raid manager 6.22 (6.221) software on the Solaris 8 system. 

	# pkgadd -d . SUNWosar SUNWosafw SUNWosamn SUNWosau

Depending on your Raid Manager version and SCSI/fibre card type you will need to patch the system. 
The following patches are recommended for Solaris 8.

Solaris 8 & Raid manager 6.22        108553-07  108982-09  111085-02 
Solaris 8 & Raid manager 6.221       112125-01  108982-09  111085-02 
Ultra 60                             106455-09 
Fibre channel card                   109571-02 

It is probably worth giving the system a reconfigure reboot at this stage.


-- Firmware
-- --------

The first thing to do is check the firmware of the A1000. This can be done with the raidutil command. 
(I assume the A1000 is on controller 1. If not, change the controller as appropriate.) 

	# raidutil -c c1t0d0 -i

If the returned values are lower than those shown below you will have to upgrade the firmware using fwutil.

	Product Revision  0301
	Boot Level        03.01.03.04
	Boot Level Date   07/06/00
	Firmware Level    03.01.03.60
	Firmware Date     06/30/00

To upgrade the firmware perform the following.

	# cd /usr/lib/osa/fw
	# fwutil 02050632.bwd c1t0d0
	# fwutil 02050632.apd  c1t0d0
	# fwutil 03010233.bwd  c1t0d0
	# fwutil 03010235.apd  c1t0d0
	# fwutil 03010304.bwd  c1t0d0
	# fwutil 03010360.apd  c1t0d0

You can now run the "raidutil -c c1t0d0 -i" command again to verify the firmware changes.


Clean up the array
I am assuming that the array is free for our full use, and intend to remove any old LUNs that might be lying around. 

	# raidutil -c c1t0d0 -X
The above command resets the array internals.
We can now remove any old LUNs. To do this, run "raidutil -c c1t0d0 -i" and note any LUNs that are configured.

Then delete each LUN with "raidutil -c c1t0d0 -D lun_number", as in the following example.
	# raidutil -c c1t0d0 -i
			LUNs found on c1t0d0.
 			LUN 0    RAID 1    10 MB

			Vendor ID         Symbios
			ProductID         StorEDGE A1000
			Product Revision  0301
			Boot Level        03.01.03.04
			Boot Level Date   07/06/00
			Firmware Level    03.01.03.60
			Firmware Date     06/30/00
			raidutil succeeded!

	# raidutil -c c1t0d0 -D 0
In the above example we are removing LUN 0. Repeat this command, changing the LUN number as appropriate.

We can now give the array a name of our choice. (Do not use a .)
	# storutil -c c1t0d0 -n "dragon_array"


Creating LUNs
The disks are labelled on the front of the A1000 with the controller number and disk number separated by a comma, e.g. 1,0 1,2 and 2,0. We refer to the disks without the comma, so disk 0 on controller 1 is disk 10 and disk 3 on controller 2 is disk 23. We will use disks on both controllers when creating the mirrors. I am starting with the disks on each controller as viewed from the left.

The next stage is to create the LUNs we require. In the example below I configure a fully populated system (12 disks) with 18 GB drives into the following sizes. Here we will use the raidutil command again. 

	# raidutil -c controller -n lun_number -l  raid_type  -s  size  -g  disk_list

LUN 0  	Size 8617 MB, a striped/mirrored configuration across half of the first two disks.
		# raidutil -c c1t0d0 -n 0 -l 1+0 -s 8617 -g 10,20

LUN 1  	Size 8617 MB, a striped/mirrored configuration across the second half of the first two disks.
		# raidutil -c c1t0d0 -n 1 -l 1+0 -s 8617 -g 10,20

LUN 2  	Size 8617 MB, a striped/mirrored configuration across half of the next two disks.
		# raidutil -c c1t0d0 -n 2 -l 1+0 -s 8617 -g 11,21

LUN 3  	Size 8617 MB, a striped/mirrored configuration across the second half of the next two disks.
		# raidutil -c c1t0d0 -n 3 -l 1+0 -s 8617 -g 11,21

LUN 4  	Size 34468 MB, a striped/mirrored configuration across the next four disks.
		# raidutil -c c1t0d0 -n 4 -l 1+0 -s 34468 -g 12,13,22,23

LUN 5  	Size 17234 MB, a striped/mirrored configuration across the next two disks.
		# raidutil -c c1t0d0 -n 5 -l 1+0 -s 17234 -g 14,24

LUN 6 	Size 17234 MB, a non-mirrored configuration on the next disk.
		# raidutil -c c1t0d0 -n 6 -l 0 -s 17234 -g 15

This then leaves disk 25 (disk 5 on the second controller) free as a hot spare.
To set up this disk as a hot spare, run
                # raidutil -c c1t0d0 -h 25
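
Before creating the LUNs it can help to check that no disk is over-committed. The sketch below uses the sizes given in the LUN descriptions above and assumes 17274 MB drives, as in the drivutil listing in the A1000 install notes. The accounting rule (a mirrored LUN stores every block twice, a RAID 0 LUN once) is a simplification, and the function names are made up.

```python
from collections import defaultdict


def disk_id(label):
    """Convert a front-panel label such as '1,0' to the raidutil form '10'."""
    return label.replace(",", "")


def per_disk_usage(luns):
    """Sum the MB each LUN takes from each of its disks.

    luns: list of (raid_level, size_mb, [disk ids]).
    A mirrored LUN (1 or 1+0) stores each block twice, so every disk
    carries size * 2 / n; a RAID 0 LUN spreads size / n over its disks.
    """
    usage = defaultdict(float)
    for level, size_mb, disks in luns:
        copies = 2 if level in ("1", "1+0") else 1
        for d in disks:
            usage[d] += size_mb * copies / len(disks)
    return dict(usage)


layout = [
    ("1+0", 8617, ["10", "20"]), ("1+0", 8617, ["10", "20"]),
    ("1+0", 8617, ["11", "21"]), ("1+0", 8617, ["11", "21"]),
    ("1+0", 34468, ["12", "13", "22", "23"]),
    ("1+0", 17234, ["14", "24"]),
    ("0", 17234, ["15"]),
]
usage = per_disk_usage(layout)
print(max(usage.values()) <= 17274)  # no disk is over-committed
```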


Finishing off
We are now ready to reboot the system performing a reconfigure. When this is done we can format, partition, newfs 
and mount the disks in the normal way. 

Other commands
The following is a list of possibly useful raid manager commands 

rm6 (GUI interface) 
drivutil (drive / lun management) 
healthck (health check on a raid module) 
lad (list array devices) 
logutil (log formatting program) 
nvutil (edit / modify NVSRAM) 
parityck (parity checker and repair) 
rdacutil (redundancy controller for failed bits and load balancing) 
storutil (host and naming info) 



7.3 Sun StorEdge D1000:
=======================


Overview
The Sun StorEdge D1000 is a disk tray with the following hot-pluggable components: 


- Power supplies 
- Fans 
- Disks (if SPARCstorage Volume Manager is configured). 

In this setup, the D1000 disk array attached to the host is configured as a RAID5 metadevice. 


Disk Terminology
Before you can effectively use the information in this section, you should be familiar with basic disk architecture. 
In particular, you should be familiar with the following terms: 

Track 
Cylinder 
Sector 
Disk controller 
Disk label 
Device drivers 
 

Disk Slices
Files stored on a disk are contained in file systems. Each file system on a disk is assigned to a slice, a group of 
cylinders set aside for use by that file system. Each disk slice appears to the operating system 
(and to the system administrator) as though it were a separate disk drive. 
Slices are sometimes referred to as partitions. 

Each disk slice holds only one file system. 

No file system can span multiple slices. 
On SPARC based systems, Solaris defines eight disk slices and assigns to each a conventional use. 
These slices are numbered 0 through 7. 

Slice  File System   Purpose 
0      root          Holds files and directories that make up the operating system.  
1      swap          Provides virtual memory, or swap space. Swap space is used when running programs are too large to fit 
                     in a computer's memory. The Solaris operating environment then "swaps" programs from memory to the disk 
                     and back as needed.  
2      -             Refers to the entire disk, by convention. It is defined automatically by the format and the Solaris 
                     installation programs. The size of this slice should not be changed.  
3      /export       Holds alternative versions of the operating system. These alternative versions are required by client systems 
                     whose architectures differ from that of the server. Clients with the same architecture type as the server 
                     obtain executables from the /usr file system, usually slice 6.  
4      /export/swap  Provides virtual memory space for client systems.  
5      /opt          Holds application software added to a system. If a slice is not allocated for this file system 
                     during installation, the /opt directory is put in slice 0.  
6      /usr          Holds operating system commands (also known as executables) designed to be run by users. 
                     This slice also holds documentation, system programs (init and syslogd, for example) 
                     and library routines.  
7      /export/home  Holds files created by users.  

Or.. something like this is also seen on a single disk system:

/        Slice 0, partition  about 2G
swap     Slice 1, partition  about 4G
/export  Slice 3, partition  about 50G, maybe you link it to /u01
/var     Slice 4, partition  about 2G
/opt     Slice 5, partition  about 10G if you plan to install apps here
/usr     Slice 6, partition  about 2G
/u01     Slice 7, partition  optional, standard it's /home
         Depending on how you configure /export, size could be around 20G 

Raw Data Slices
The SunOS operating system stores the disk label in block 0, cylinder 0 of each disk. This means that raw data slices 
created by third-party database applications must not start at block 0, cylinder 0, or the disk label 
will be overwritten and the data on the disk will be inaccessible. 

Do not use the following areas of the disk for raw data slices, which are sometimes created by third-party 
database applications: 

Block 0, cylinder 0  Where the disk label is stored.  
Cylinder 0  Avoid for improved performance. 
Slice 2  Represents the entire disk. 


Slice Arrangements on Multiple Disks 
Although a single disk that is large enough can hold all slices and their corresponding file systems, two or more disks are often used to hold a system's slices and file systems. A slice cannot be split between two or more disks. However, multiple swap slices on separate disks are allowed. 

For instance, a single disk might hold the root (/) file system, a swap area, and the /usr file system, while a separate disk is provided for the /export/home file system and other file systems containing user data. 

In a multiple disk arrangement, the disk containing the operating system software and swap space (that is, the disk holding the root (/) or /usr file systems or the slice for swap space) is called the system disk. Disks other than the system disk are called secondary disks or non-system disks. 

Locating a system's file systems on multiple disks allows you to modify file systems and slices on the secondary disks without having to shut down the system or reload operating system software. 

Having more than one disk also increases input-output (I/O) volume. By distributing disk load across multiple disks, you can avoid I/O bottlenecks. 

 

Determining Which Slices to Use
When you set up a disk's file systems, you choose not only the size of each slice, but also which slices to use. The system configuration requires the use of different slices. The table below lists these requirements. 


Slice Server 
0 root 
1 swap 
2  -  
3  /export  
4  /export/swap  
5 /opt 
6 /usr 
7 /export/home 

 

The format Utility
The format utility can be used to manipulate hard disk drives: 


Display slice information 
Divide a disk into slices 
Add a disk drive 
Reformat a disk drive 
Repair a disk drive 
 

Disk Labels
A special area of every disk is set aside for storing information about the disk's controller, geometry, and slices. That information is called the disk's label. Another term used to describe the disk label is the VTOC (Volume Table of Contents). To label a disk means to write slice information onto the disk. You usually label a disk after changing its slices. 

If you fail to label a disk after creating slices, the slices will be unavailable because the operating system has no way of "knowing" about the slices. The partition table identifies a disk's slices, the slice boundaries (in cylinders), and total size of the slices. A disk's partition table can be displayed using the format utility. Partition flags and tags are assigned by convention and require no maintenance. 

The following partition table example is displayed from a 1.05-Gbyte disk using the format utility: 

Total disk cylinders available: 2036 + 2 (reserved cylinders)
Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm       0 -  300      148.15MB    (301/0/0)   303408
  1       swap    wu     301 -  524      110.25MB    (224/0/0)   225792
  2     backup    wm       0 - 2035     1002.09MB    (2036/0/0) 2052288
  3 unassigned    wm       0               0         (0/0/0)          0
  4 unassigned    wm       0               0         (0/0/0)          0
  5 unassigned    wm       0               0         (0/0/0)          0
  6        usr    wm     525 - 2035      743.70MB    (1511/0/0) 1523088
  7 unassigned    wm       0               0         (0/0/0)          0

The partition table contains the following information: 


Column Name  Description 
Part         Partition (or slice) number. Valid numbers are 0-7. 
Tag          A numeric value that usually describes the file system mounted on this partition. 
             0=UNASSIGNED 
             1=BOOT 
             2=ROOT 
             3=SWAP 
             4=USR 
             5=BACKUP 
             7=VAR 
             8=HOME  
Flags        wm  Partition is writable and mountable. 
             wu  Partition is writable and unmountable. Default state of partitions dedicated for swap areas. 
                 The mount command does not check the "not mountable" flag. 
             rm  Partition is read-only and mountable. 
Cylinders    The starting and ending cylinder number for the slice. 
Size         The slice size in Mbytes. 
Blocks       The total number of cylinders and the total number of sectors per slice (far right column). 

The following example displays a disk label using the prtvtoc command. 


# prtvtoc /dev/rdsk/c0t1d0s0
* /dev/rdsk/c0t1d0s0 partition map
*
* Dimensions:
*     512 bytes/sector
*      72 sectors/track
*      14 tracks/cylinder
*    1008 sectors/cylinder
*    2038 cylinders
*    2036 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00          0    303408    303407   /
       1      3    01     303408    225792    529199
       2      5    00          0   2052288   2052287
       6      4    00     529200   1523088   2052287   /usr

The disk label includes the following information: 


Dimensions - Physical dimensions of the disk drive. 

Flags - Flags listed in the partition table section. 

Partition (or Slice) Table - Contains the following information: 

Column Name  Description  
Partition Slice number  
Flags Partition flag.  
First Sector  The first sector of the slice. 
Sector Count  The total number of sectors in the slice. 
Last Sector The last sector number in the slice. 
Mount Directory The last mount point directory for the file system. 
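
The two listings above describe the same disk, so the numbers can be cross-checked. The sketch below assumes the geometry printed by prtvtoc (72 sectors/track, 14 tracks/cylinder, 512 bytes/sector). The mapping of the numeric prtvtoc flags to the wm/wu/rm letters used by format is inferred from comparing the two listings, not taken from a man page, and the function names are made up.

```python
SECTOR_BYTES = 512
SECTORS_PER_CYLINDER = 72 * 14  # sectors/track * tracks/cylinder = 1008

# Inferred mapping: prtvtoc flag code -> format flag letters
FLAG_NAMES = {"00": "wm", "01": "wu", "10": "rm"}


def slice_blocks(first_cyl, last_cyl):
    """Number of 512-byte blocks in a slice covering first_cyl..last_cyl."""
    return (last_cyl - first_cyl + 1) * SECTORS_PER_CYLINDER


def blocks_to_mb(blocks):
    """Size in megabytes, rounded to two decimals as format prints it."""
    return round(blocks * SECTOR_BYTES / (1024 * 1024), 2)


root = slice_blocks(0, 300)
print(root, blocks_to_mb(root), FLAG_NAMES["00"])
```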

 

Dividing a Disk Into Slices
The format utility is most often used by system administrators to divide a disk into slices. The steps are: 

Determining which slices are needed 
Determining the size of each slice 
Using the format utility to divide the disk into slices 
Labeling the disk with new slice information 
Creating the file system for each slice 

The easiest way to divide a disk into slices is to use the modify command from the partition menu. The modify command allows you to create slices by specifying the size of each slice in megabytes without having to keep track of starting cylinder boundaries. It also keeps track of any disk space remainder in the "free hog" slice. 

 

Using the Free Hog Slice
When you use the format utility to change the size of one or more disk slices, you designate a temporary slice that will expand and shrink to accommodate the resizing operations. 

This temporary slice donates, or "frees," space when you expand a slice, and receives, or "hogs," the discarded space when you shrink a slice. For this reason, the donor slice is sometimes called the free hog. 

The donor slice exists only during installation or when you run the format utility. There is no permanent donor slice during day-to-day, normal operations. 
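
The bookkeeping behind the free hog idea can be sketched as follows. This only models the accounting described above (the total stays constant while the donor slice absorbs every change); it is not how the format utility is implemented, and the function name and sample sizes are made up.

```python
def resize_slice(sizes_mb, slice_no, new_size_mb, free_hog):
    """Resize one slice and let the free hog slice absorb the difference.

    sizes_mb maps slice number -> size in MB; a new dict is returned.
    """
    if slice_no == free_hog:
        raise ValueError("the free hog slice is resized implicitly")
    delta = new_size_mb - sizes_mb[slice_no]
    if delta > sizes_mb[free_hog]:
        raise ValueError("free hog slice is too small to donate that much")
    result = dict(sizes_mb)
    result[slice_no] = new_size_mb
    result[free_hog] -= delta   # shrinks on expansion, grows on shrinkage
    return result


before = {0: 148, 1: 110, 6: 744}
after = resize_slice(before, 0, 200, free_hog=6)
print(after, sum(after.values()) == sum(before.values()))
```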

 

 

How to Identify the Disks on a System

Become superuser. 

Run the format utility. 

# format 
The format utility displays a list of disks that it recognizes under AVAILABLE DISK SELECTIONS. 
Here is sample format output: 


# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>
          /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf785d11,0
       1. c1t1d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>
          /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf78670e,0
       2. c2t0d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>
          /pci@8,600000/scsi@1/sd@0,0
       3. c2t1d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>
          /pci@8,600000/scsi@1/sd@1,0
       4. c2t8d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>
          /pci@8,600000/scsi@1/sd@8,0
       5. c2t9d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>
          /pci@8,600000/scsi@1/sd@9,0


The format output associates a disk's physical and logical device name to the disk's marketing name, which appears in angle brackets <>. This is an easy way to identify which logical device names represent the disks connected to your system. The following example uses a wildcard to display the disks connected to a second controller. 

# format /dev/rdsk/c2*
AVAILABLE DISK SELECTIONS:
  0. /dev/rdsk/c2t0d0s0 
     /io-unit@f,e0200000/sbi@0,0/QLGC,isp@2,10000/sd@0,0
  1. /dev/rdsk/c2t1d0s0 
     /io-unit@f,e0200000/sbi@0,0/QLGC,isp@2,10000/sd@1,0
  2. /dev/rdsk/c2t2d0s0 
     /io-unit@f,e0200000/sbi@0,0/QLGC,isp@2,10000/sd@2,0
  3. /dev/rdsk/c2t3d0s0 
     /io-unit@f,e0200000/sbi@0,0/QLGC,isp@2,10000/sd@3,0
  4. /dev/rdsk/c2t5d0s0 
     /io-unit@f,e0200000/sbi@0,0/QLGC,isp@2,10000/sd@5,0
Specify disk (enter its number): 

The format output shows that the disks on controller 2 (targets 0-3 and 5) are connected to a SCSI host adapter (QLGC,isp@...), 
which is attached to the SBus interface (sbi@...) of the I/O unit (io-unit@...). 
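
The logical device names in these listings follow the cXtYdZ[sN] pattern: controller, target, disk (LUN) and an optional slice. A minimal parser, assuming purely numeric fields (so it does not cover names whose target is a WWN), with a made-up function name:

```python
import re

# cXtYdZ with an optional sN slice suffix, e.g. c2t8d0 or c2t0d0s0
CTD_PATTERN = re.compile(r"^c(\d+)t(\d+)d(\d+)(?:s(\d+))?$")


def parse_device(name):
    """Split a logical device name into controller, target, disk and slice."""
    match = CTD_PATTERN.match(name.rsplit("/", 1)[-1])
    if match is None:
        raise ValueError(f"not a cXtYdZ[sN] name: {name!r}")
    controller, target, disk, slice_no = match.groups()
    return {
        "controller": int(controller),
        "target": int(target),
        "disk": int(disk),
        "slice": None if slice_no is None else int(slice_no),
    }


print(parse_device("/dev/rdsk/c2t5d0s0"))
```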



--------------------------------------------------------------------------------


Displaying Disk Slices
You can use the format utility to check whether or not a disk has the appropriate disk slices. If you determine 
that a disk does not contain the slices you want to use, use the format utility to re-create them and label the disk. 
The format utility uses the term partition in place of slice. 



Become superuser. 

Enter the format utility. 

Identify the disk for which you want to display slice information by selecting a disk listed 
under AVAILABLE DISK SELECTIONS. 

Specify disk (enter its number):1 

Enter the partition menu by typing partition at the format> prompt. 

format> partition 

Display the slice information for the current disk drive by typing print at the partition> prompt. 

partition> print 

Exit the format utility by typing q at the partition> prompt and typing q at the format> prompt. 

partition> q 
format> q 
# 

Verify displayed slice information by identifying specific slice tags and slices. If the screen output shows that 
no slice sizes are assigned, the disk probably does not have slices. 
 


Examples--Displaying Disk Slice Information
The following example displays slice information for disk /dev/rdsk/c2t0d0s0 


Total disk cylinders available: 24620 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0 unassigned    wm       0                0         (0/0/0)            0
  1 unassigned    wm       0                0         (0/0/0)            0
  2     backup    wu       0 - 24619       33.92GB    (24620/0/0) 71127180
  3 unassigned    wm       0                0         (0/0/0)            0
  4 unassigned    wm       0                0         (0/0/0)            0
  5 unassigned    wm       0                0         (0/0/0)            0
  6 unassigned    wm       0 - 24618       33.91GB    (24619/0/0) 71124291
  7 unassigned    wm       0                0         (0/0/0)            0

The following example displays slice information for disk /dev/rdsk/c2t8d0s0 


Total disk cylinders available: 24620 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0 unassigned    wm       0                0         (0/0/0)            0
  1 unassigned    wm       0                0         (0/0/0)            0
  2     backup    wu       0 - 24619       33.92GB    (24620/0/0) 71127180
  3 unassigned    wm       0                0         (0/0/0)            0
  4 unassigned    wm       0                0         (0/0/0)            0
  5 unassigned    wm       0                0         (0/0/0)            0
  6 unassigned    wm       0 - 24618       33.91GB    (24619/0/0) 71124291
  7 unassigned    wm       0                0         (0/0/0)            0

 



--------------------------------------------------------------------------------

 

Creating and Examining a Disk Label
Labeling a disk is usually done during system installation or when you are creating new disk slices. 
You might need to relabel a disk if the disk label is corrupted (for example, from a power failure). 
The format utility will attempt to automatically configure any unlabeled SCSI disk. If format is able 
to automatically configure an unlabeled disk, it will display a message like the following: 


c1t0d0: configured with capacity of 404.65MB 
 

How to Label a Disk

1. Become superuser. 

2. Enter the format utility. 

3. Enter the number of the disk that you want to label from the list displayed on your screen. 
   Specify disk (enter its number):1 

4. If the disk is unlabeled and was successfully configured, format will ask if you want to label the disk. 
   Go to step 5 to label the disk. 
   If the disk was labeled and you want to change the type, or format was not able to automatically configure 
   the disk, you must specify the disk type. Go to steps 6-8 to set the disk type and label the disk. 

5. Label the disk by typing y at the Label it now? prompt. 

   Disk not labeled. Label it now? y 

   The disk is now labeled. Go to step 10 to exit the format utility. 

6. Enter type at the format> prompt. 

   format> type 
   Format displays the Available Drive Types menu. 

7. Select a disk type from the list of possible disk types. 

   Specify disk type (enter its number)[12]: 12 

8. Label the disk. If the disk is not labeled, the following message is displayed. 

   Disk not labeled. Label it now? y 
   Otherwise you are prompted with this message: 

   Ready to label disk, continue? y 

9. Use the verify command from the format main menu to verify the disk label. 

   format> verify 

10. Exit the format utility by typing q at the format> prompt. 

    format> q 
    # 
 

Example--Labeling a Disk
The following example automatically configures and labels a 1.05-Gbyte disk. 


# format
 c1t0d0: configured with capacity of 1002.09MB
 AVAILABLE DISK SELECTIONS:
   0. c0t3d0 
     /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@1,0
   1. c1t0d0 
     /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@1,0
Specify disk (enter its number): 1
Disk not labeled.  Label it now?  yes
format> verify
#

 

How to Examine a Disk Label


Examine disk label information by using the prtvtoc(1M) command. See Chapter 28, Disk Management (Overview) for 
a detailed description of the disk label and the information displayed by the prtvtoc command. 

Become superuser. 

Display the disk label information by using the prtvtoc command. 

# prtvtoc /dev/rdsk/device-name 
 


Automatically Configuring SCSI Disk Drives
In the Solaris 2.3 release and compatible versions, the format utility automatically configures SCSI disk drives even if 
that specific type of drive is not listed in the /etc/format.dat file. This feature enables you to format, slice, 
and label any disk drive compliant with the SCSI-2 specification for disk device mode sense pages. 
The following steps are involved in configuring a SCSI drive using autoconfiguration: 

Shutting down the system 
Attaching the SCSI disk drive to the system 
Turning on the disk drive 
Performing a reconfiguration boot 
Using the format utility to automatically configure the SCSI disk drive 
After the reconfiguration boot, invoke the format utility. The format utility will attempt to configure the disk and, 
if successful, alert the user that the disk was configured. See How to Automatically Configure a SCSI Drive 
for step-by-step instructions on configuring a SCSI disk drive automatically. 

Here are the default slice rules that format uses to create the partition table. 


Disk Size                  Root File System   Swap Slice
0 - 180 Mbytes             16 Mbytes          16 Mbytes
180 Mbytes - 280 Mbytes    16 Mbytes          32 Mbytes
280 Mbytes - 380 Mbytes    24 Mbytes          32 Mbytes
380 Mbytes - 600 Mbytes    32 Mbytes          32 Mbytes
600 Mbytes - 1.0 Gbytes    32 Mbytes          64 Mbytes
1.0 Gbytes - 2.0 Gbytes    64 Mbytes          128 Mbytes
More than 2.0 Gbytes       128 Mbytes         128 Mbytes

In all cases, slice 6 (for the /usr file system) gets the remainder of the space on the disk. 
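
The default slice rules above can be read as a simple lookup. The following is a minimal sketch, not the format utility's actual logic: the function name default_slices is made up for illustration, it assumes a POSIX shell, it treats each range's upper bound as inclusive, and it takes 1 Gbyte as 1024 Mbytes. The table itself does not say how boundary values are handled.

```shell
#!/bin/sh
# Sketch: look up the default root and swap sizes (in Mbytes) for a disk
# size given in whole Mbytes.
# Assumptions (not stated in the table): upper bounds are inclusive and
# 1 Gbyte = 1024 Mbytes.
default_slices() {
    size_mb=$1
    if   [ "$size_mb" -le 180 ];  then echo "16 16"
    elif [ "$size_mb" -le 280 ];  then echo "16 32"
    elif [ "$size_mb" -le 380 ];  then echo "24 32"
    elif [ "$size_mb" -le 600 ];  then echo "32 32"
    elif [ "$size_mb" -le 1024 ]; then echo "32 64"
    elif [ "$size_mb" -le 2048 ]; then echo "64 128"
    else                               echo "128 128"
    fi
}

# Example: a 1.3-Gbyte disk, taken here as roughly 1300 Mbytes.
set -- $(default_slices 1300)
echo "root=${1}MB swap=${2}MB"    # prints: root=64MB swap=128MB
```

Whatever is left after root and swap goes to slice 6, as stated above.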

Here's an example of a format-generated partition table for a 1.3-Gbyte SCSI disk drive. 


Part    Tag    Flag     Cylinders     Size        Blocks
   0     root    wm       0 -   96    64.41MB      (97/0/0)
   1     swap    wu      97 -  289   128.16MB     (193/0/0)
   2   backup    wu       0 - 1964     1.27GB    (1965/0/0)
   6      usr    wm     290 - 1964     1.09GB    (1675/0/0)

 

How to Automatically Configure a SCSI Drive


Become superuser. 

Create the /reconfigure file that will be read when the system is booted. 

# touch /reconfigure 

Shut down the system. 

# shutdown -i0 -g30 -y 
The ok or > prompt is displayed after the operating environment is shut down. 


Turn off power to the system and all external peripheral devices. 

Make sure the disk you are adding has a different target number than the other devices on the system. 
You will often find a small switch located at the back of the disk for this purpose. 

Connect the disk to the system and check the physical connections. 

Turn on the power to all external peripherals. 

Turn on the power to the system. The system will boot and display the login prompt. 

Login as superuser, invoke the format utility, and select the disk to be configured automatically. 

# format
Searching for disks...done
c1t0d0: configured with capacity of 1002.09MB
AVAILABLE DISK SELECTIONS:
  0. c0t1d0 
     /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@1,0
  1. c0t3d0 
     /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0
Specify disk (enter its number): 1


Reply yes to the prompt to label the disk. Replying y will cause the disk label to be generated and written 
to the disk by the autoconfiguration feature. 

Disk not labeled. Label it now? y 

Verify the disk label with the verify command. 

format> verify 

Exit the format utility. 

format> q 
 



--------------------------------------------------------------------------------

 

SPARC: How to Create Disk Slices and Label a Disk

Become superuser. 

Start the format(1M) utility. 

# format 
A list of available disks is displayed. 

Enter the number of the disk that you want to repartition from the list displayed on your screen. 

Specify disk (enter its number): disk-number 

Go into the partition menu (which lets you set up the slices). 

format> partition 

Display the current partition (slice) table. 

partition> print 

Start the modification process. 

partition> modify 

Set the disk to all free hog. 

Choose base (enter number) [0]? 1 
See Using the Free Hog Slice for more information about the free hog slice. 

Create a new partition table by answering y when prompted to continue. 

Do you wish to continue creating a new partition table based on above table[yes]? y 

Identify the free hog partition (slice) and the sizes of the slices when prompted. When adding a system disk, 
you must set up slices for root (slice 0) and swap (slice 1) and/or /usr (slice 6). After you identify the slices, 
the new partition table is displayed. 

Make the displayed partition table the current partition table by answering y when asked. 

Okay to make this the current partition table[yes]? y 
If you don't want the current partition table and you want to change it, answer no and return to the 
"Start the modification process" step (partition> modify). 

Name the partition table. 

Enter table name (remember quotes): "partition-name" 

Label the disk with the new partition table when you have finished allocating slices on the new disk. 

Ready to label disk, continue? yes 

Quit the partition menu. 

partition> q 

Verify the disk label using the verify command. 

format> verify 

Quit the format menu. 

format> q 
 

SPARC: Example-Creating Disk Slices and Labeling a System Disk
The following example uses the format utility to divide a 1-Gbyte disk into three slices: 
one for the root (/) file system, one for the swap area, and one for the /usr file system. 


# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
   0. c0t1d0 
      /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@1,0
   1. c0t3d0 
      /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0
Specify disk (enter its number): 0
selecting c0t1d0
[disk formatted]
format> partition
partition> print
partition> modify
Select partitioning base:
 0. Current partition table (original)
 1. All Free Hog
Choose base (enter number) [0]? 1
 Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm       0               0         (0/0/0)          0
  1       swap    wu       0               0         (0/0/0)          0
  2     backup    wu       0 - 2035     1002.09MB    (2036/0/0) 2052288
  3 unassigned    wm       0               0         (0/0/0)          0
  4 unassigned    wm       0               0         (0/0/0)          0
  5 unassigned    wm       0               0         (0/0/0)          0
  6        usr    wm       0               0         (0/0/0)          0
  7 unassigned    wm       0               0         (0/0/0)          0
Do you wish to continue creating a new partition
table based on above table[yes]? yes
Free Hog partition[6]? 6
Enter size of partition `0' [0b, 0c, 0.00mb]: 200mb
Enter size of partition `1' [0b, 0c, 0.00mb]: 200mb
Enter size of partition `3' [0b, 0c, 0.00mb]:
Enter size of partition `4' [0b, 0c, 0.00mb]:
Enter size of partition `5' [0b, 0c, 0.00mb]:
Enter size of partition `7' [0b, 0c, 0.00mb]:
  Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm       0 -  406      200.32MB    (407/0/0)   410256
  1       swap    wu     407 -  813      200.32MB    (407/0/0)   410256
  2     backup    wu       0 - 2035     1002.09MB    (2036/0/0) 2052288
  3 unassigned    wm       0               0         (0/0/0)          0
  4 unassigned    wm       0               0         (0/0/0)          0
  5 unassigned    wm       0               0         (0/0/0)          0
  6        usr    wm     814 - 2035      601.45MB    (1222/0/0) 1231776
  7 unassigned    wm       0               0         (0/0/0)          0
 Okay to make this the current partition table[yes]? yes
Enter table name (remember quotes): "disk0"
Ready to label disk, continue? yes
partition> quit
format> verify
format> quit
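
The free hog arithmetic in this example can be checked by hand: the free hog slice receives whatever cylinders the other slices do not claim. The sketch below assumes a POSIX shell; the function name free_hog_cylinders is made up for illustration and is not part of the format utility.

```shell
#!/bin/sh
# Sketch: compute the cylinders left over for the free hog slice.
# Arguments: total cylinders on the disk, then the cylinder count of
# each slice that was given an explicit size.
free_hog_cylinders() {
    remaining=$1
    shift
    for used in "$@"; do
        remaining=$((remaining - used))
    done
    echo "$remaining"
}

# Values from the example: 2036 cylinders in total, slices 0 and 1
# at 407 cylinders each.
free_hog_cylinders 2036 407 407    # prints: 1222
```

The result matches the (1222/0/0) shown for slice 6. The block counts follow the same way: slice 2 has 2052288 blocks over 2036 cylinders, which is 1008 blocks per cylinder, and 1222 x 1008 = 1231776.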

 

SPARC: Example-Creating Disk Slices and Labeling a Secondary Disk
The following example uses the format utility to divide a 1-Gbyte disk into one slice for the /export/home file system. 


# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
   0. c0t1d0 
      /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@1,0
   1. c0t3d0 
      /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0
Specify disk (enter its number): 0
selecting c0t1d0
[disk formatted]
format> partition
partition> print
partition> modify
Select partitioning base:
 0. Current partition table (original)
 1. All Free Hog
Choose base (enter number) [0]? 1
 Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm       0               0         (0/0/0)          0
  1       swap    wu       0               0         (0/0/0)          0
  2     backup    wu       0 - 2035     1002.09MB    (2036/0/0) 2052288
  3 unassigned    wm       0               0         (0/0/0)          0
  4 unassigned    wm       0               0         (0/0/0)          0
  5 unassigned    wm       0               0         (0/0/0)          0
  6        usr    wm       0               0         (0/0/0)          0
  7 unassigned    wm       0               0         (0/0/0)          0
Do you wish to continue creating a new partition
table based on above table[yes]? y
Free Hog partition[6]? 7
Enter size of partition '0' [0b, 0c, 0.00mb, 0.00gb]: 
Enter size of partition '1' [0b, 0c, 0.00mb, 0.00gb]: 
Enter size of partition '3' [0b, 0c, 0.00mb, 0.00gb]: 
Enter size of partition '4' [0b, 0c, 0.00mb, 0.00gb]: 
Enter size of partition '5' [0b, 0c, 0.00mb, 0.00gb]: 
Enter size of partition '6' [0b, 0c, 0.00mb, 0.00gb]:
 Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm       0               0         (0/0/0)          0
  1       swap    wu       0               0         (0/0/0)          0
  2     backup    wu       0 - 2035     1002.09MB    (2036/0/0) 2052288
  3 unassigned    wm       0               0         (0/0/0)          0
  4 unassigned    wm       0               0         (0/0/0)          0
  5 unassigned    wm       0               0         (0/0/0)          0
  6        usr    wm       0               0         (0/0/0)          0
  7 unassigned    wm       0 - 2035     1002.09MB    (2036/0/0) 2052288 
Okay to make this the current partition table[yes]? yes
Enter table name (remember quotes): "home"
Ready to label disk, continue? y
partition> q
format> verify
format> q
# 


 

SPARC: How to Create File Systems


Become superuser. 

Create a file system for each slice with the newfs(1M) command. 

# newfs /dev/rdsk/cwtxdysz 

Verify the new file system by mounting it on an unused mount point. 

# mount /dev/dsk/cwtxdysz /mnt 
# ls /mnt 
lost+found 
 

 

How to Stop All Processes Accessing a File System

Become superuser. 

List all the processes that are accessing the file system, so you know which processes you are going to stop. 

# fuser -c [ -u ] mount-point 

Stop all processes accessing the file system. You should not stop a user's processes without warning. 

# fuser -c -k mount-point 
A SIGKILL is sent to each process using the file system. 

Verify that there are no processes accessing the file system. 

# fuser -c mount-point 
 



--------------------------------------------------------------------------------

 

Add Disk 
Follow the steps below to add a new external/internal disk: 


Bring the system down to the ok prompt. 

# init 0


Find an available target setting. This command, run at the ok prompt, shows what you currently have on your system. 

ok> probe-scsi

If the disk is on another SCSI controller (another card in an SBus slot), use: 

ok> probe-scsi-all


Attach the new disk with the correct target setting. Run probe-scsi again to make sure the system sees it. If it doesn't, the disk is either not connected properly, has a target conflict, or is defective. Resolve this issue before continuing. 
In this example, we'll say: 


T3 original internal drive 
T1 new/other internal drive where a duplicate copy of the OS will be placed. 

Perform a reconfiguration boot. 

ok> boot -rv
(-r requests a reconfiguration boot; -v enables verbose output.)



Run format and partition the disk. (Here's our example): 


# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:

1. c0t1d0 
/iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000/sd@1,0
2. c0t3d0 
/iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000/sd@3,0
Specify disk (enter its number): 1
selecting c0t1d0
[disk formatted]

FORMAT MENU:
disk 		- select a disk
type 		- select (define) a disk type
partition 	- select (define) a partition table
current 	- describe the current disk
format 		- format and analyze the disk
repair 		- repair a defective sector
label 		- write label to the disk
analyze 	- surface analysis
defect 		- defect list management
backup 		- search for backup labels
verify 		- read and display labels
save 		- save new disk/partition definitions
inquiry 	- show vendor, product and revision
volname 	- set 8-character volume name
quit
format> part

PARTITION MENU:
0 	- change `0' partition
1 	- change `1' partition
2 	- change `2' partition
3 	- change `3' partition
4 	- change `4' partition
5 	- change `5' partition
6 	- change `6' partition
7 	- change `7' partition
select 	- select a predefined table
modify 	- modify a predefined partition table
name 	- name the current table
print 	- display the current table
label 	- write partition map and label to the disk
quit

partition> print

Current partition table (original):
Total disk cylinders available: 2036 + 2 (reserved cylinders)

Part 	Tag 	Flag 	Cylinders 	Size 			Blocks
0 	root 	wm 	0 - 203 	100.41MB 	(204/0/0) 	205632
1 	swap 	wu 	204 - 407 	100.41MB 	(204/0/0) 	205632
2 	backup 	wm 	0 - 2035 	1002.09MB 	(2036/0/0) 	2052288
3   unassigned 	wm 	0 		0 		(0/0/0) 	0
4 	var 	wm 	408 - 611 	100.41MB 	(204/0/0) 	205632
5   unassigned 	wm 	612 - 1018 	200.32MB 	(407/0/0) 	410256
6 	usr 	wm 	1019 - 2034 	500.06MB 	(1016/0/0) 	1024128
7   unassigned 	wm 	0 		0 		(0/0/0) 	0

partition>

****** Modify partitions to suit your needs ******
****** Do NOT alter partition 2, backup !!! ******


In this example we'll go with the current displayed partition table listed: 

partition> 0
Part 	    Tag 	Flag 	Cylinders 	Size 	     Blocks
0 	unassigned 	wm 	0 - 162 	80.23MB (163/0/0) 164304

Enter partition id tag[unassigned]:
Enter partition permission flags[wm]:
Enter new starting cyl[0]: o
`o' is not an integer.
Enter new starting cyl[0]: 0
Enter partition size[164304b, 163c, 80.23mb, 0.08gb]: 100.41mb
partition> pr
Current partition table (unnamed):
Total disk cylinders available: 2036 + 2 (reserved cylinders)

Part 	   Tag 		Flag Cylinders 		Size 		Blocks
0 	unassigned 	wm 	0 - 203 	100.41MB 	(204/0/0) 	205632
1 	unassigned 	wu 	163 - 423 	128.46MB 	(261/0/0) 	263088
2 	backup 		wu 	0 - 2035 	1002.09MB 	(2036/0/0) 	2052288
3 	unassigned 	wm 	0 		0 		(0/0/0) 	0
4 	unassigned 	wm 	424 - 749 	160.45MB 	(326/0/0) 	328608
5 	unassigned 	wm 	750 - 1109 	177.19MB 	(360/0/0) 	362880
6 	unassigned 	wm 	1110 - 2035 	455.77MB 	(926/0/0) 	933408
7 	unassigned 	wm 	0 		0 		(0/0/0) 	0

partition> 1
Part 	Tag 		Flag Cylinders 		Size 		Blocks
1 	unassigned 	wu 	163 - 423 	128.46MB 	(261/0/0) 	263088

Enter partition id tag[unassigned]:
Enter partition permission flags[wu]:
Enter new starting cyl[163]: 204
Enter partition size[263088b, 261c, 128.46mb, 0.13gb]: 100.41mb
partition> pr
Current partition table (unnamed):
Total disk cylinders available: 2036 + 2 (reserved cylinders)

Part 	Tag 		Flag Cylinders 		Size 			Blocks
0 	unassigned 	wm 	0 - 203 	100.41MB 	(204/0/0) 	205632
1 	unassigned 	wu 	204 - 407 	100.41MB 	(204/0/0) 	205632
2 	backup 		wu 	0 - 2035 	1002.09MB 	(2036/0/0) 	2052288
3 	unassigned 	wm 	0 		0 		(0/0/0) 	0
4 	unassigned 	wm 	424 - 749 	160.45MB 	(326/0/0) 	328608
5 	unassigned 	wm 	750 - 1109 	177.19MB 	(360/0/0) 	362880
6 	unassigned 	wm 	1110 - 2035 	455.77MB 	(926/0/0) 	933408
7 	unassigned 	wm 	0 		0 		(0/0/0) 	0


partition> 4
Part 	Tag 		Flag 	Cylinders 	Size 		Blocks
4 	unassigned 	wm 	424 - 749 	160.45MB 	(326/0/0) 328608

Enter partition id tag[unassigned]:
Enter partition permission flags[wm]:
Enter new starting cyl[424]: 408
Enter partition size[328608b, 326c, 160.45mb, 0.16gb]: 100.41mb
partition> pr
Current partition table (unnamed):
Total disk cylinders available: 2036 + 2 (reserved cylinders)

Part 	Tag 		Flag 	Cylinders 	Size 			Blocks
0 	unassigned 	wm 	0 - 203 	100.41MB 	(204/0/0) 	205632
1 	unassigned 	wu 	204 - 407 	100.41MB 	(204/0/0) 	205632
2 	backup 		wu 	0 - 2035 	1002.09MB 	(2036/0/0) 	2052288
3 	unassigned 	wm 	0 		0 		(0/0/0) 	0
4 	unassigned 	wm 	408 - 611 	100.41MB 	(204/0/0) 	205632
5 	unassigned 	wm 	750 - 1109 	177.19MB 	(360/0/0) 	362880
6 	unassigned 	wm 	1110 - 2035 	455.77MB 	(926/0/0) 	933408
7 	unassigned 	wm 	0 		0 		(0/0/0) 	0

partition> 5
Part 	Tag 		Flag 	Cylinders 	Size 			Blocks
5 	unassigned 	wm 	750 - 1109 	177.19MB 	(360/0/0) 	362880

Enter partition id tag[unassigned]:
Enter partition permission flags[wm]:
Enter new starting cyl[750]: 612
Enter partition size[362880b, 360c, 177.19mb, 0.17gb]: 177mb
partition> pr
Current partition table (unnamed):
Total disk cylinders available: 2036 + 2 (reserved cylinders)

Part 	Tag 		Flag 	Cylinders 	Size 			Blocks
0 	unassigned 	wm 	0 - 203 	100.41MB 	(204/0/0) 	205632
1 	unassigned 	wu 	204 - 407 	100.41MB 	(204/0/0) 	205632
2 	backup 		wu 	0 - 2035 	1002.09MB 	(2036/0/0) 	2052288
3 	unassigned 	wm 	0 		0 		(0/0/0) 	0
4 	unassigned 	wm 	408 - 611 	100.41MB 	(204/0/0) 	205632
5 	unassigned 	wm 	612 - 971 	177.19MB 	(360/0/0) 	362880
6 	unassigned 	wm 	1110 - 2035 	455.77MB 	(926/0/0) 	933408
7 	unassigned 	wm 	0 		0 		(0/0/0) 	0

partition> 6
Part 	Tag 		Flag 	Cylinders 	Size 			Blocks
6 	unassigned 	wm 	1110 - 2035 	455.77MB 	(926/0/0) 	933408

Enter partition id tag[unassigned]:
Enter partition permission flags[wm]:
Enter new starting cyl[1110]: 972
Enter partition size[933408b, 926c, 455.77mb, 0.45gb]: $
partition> pr
Current partition table (unnamed):
Total disk cylinders available: 2036 + 2 (reserved cylinders)

Part 	Tag 		Flag 	Cylinders 	Size 			Blocks
0 	unassigned 	wm 	0 - 203 	100.41MB 	(204/0/0) 	205632
1 	unassigned 	wu 	204 - 407 	100.41MB 	(204/0/0) 	205632
2 	backup 		wu 	0 - 2035 	1002.09MB 	(2036/0/0) 	2052288
3 	unassigned 	wm 	0 		0 		(0/0/0) 	0
4 	unassigned 	wm 	408 - 611 	100.41MB 	(204/0/0) 	205632
5 	unassigned 	wm 	612 - 971 	177.19MB 	(360/0/0) 	362880
6 	unassigned 	wm 	972 - 2035 	523.69MB 	(1064/0/0) 	1072512
7 	unassigned 	wm 	0 		0 		(0/0/0) 	0

partition>

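NOTE: A quick check that your partitioning covers the whole disk is to add up the cylinder counts of all slices except slice 2 [the first number inside each ( )], like so: 204+204+204+360+1064=2036. The total should equal the cylinder count of slice 2, which represents the whole disk (Tag = backup). Also check the Cylinders column to make sure no two slices overlap. 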
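
The cylinder sum in the note above can also be computed from the printed table. This is a minimal sketch, assuming a POSIX shell and a POSIX awk (nawk on older Solaris releases); the function name check_cylinders is made up for illustration. It reads partition table rows on standard input, adds the cylinder counts of every slice except slice 2, and compares the total with slice 2. It does not detect overlapping slices.

```shell
#!/bin/sh
# Sketch: sum the cylinder counts (the first number in each
# (cyl/head/sect) triple) of all slices except slice 2 and compare the
# total with slice 2. Exit status is 0 when they match, 1 otherwise.
check_cylinders() {
    awk '
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^\([0-9]+\/[0-9]+\/[0-9]+\)$/) {
                    split($i, triple, "/")
                    cyl = substr(triple[1], 2) + 0   # drop the leading "("
                    if ($1 == 2) whole = cyl; else total += cyl
                }
        }
        END {
            printf "slices=%d whole=%d\n", total, whole
            exit (total == whole ? 0 : 1)
        }'
}

# The final partition table from the example above.
check_cylinders <<'EOF'
0 unassigned wm 0 - 203 100.41MB (204/0/0) 205632
1 unassigned wu 204 - 407 100.41MB (204/0/0) 205632
2 backup wu 0 - 2035 1002.09MB (2036/0/0) 2052288
3 unassigned wm 0 0 (0/0/0) 0
4 unassigned wm 408 - 611 100.41MB (204/0/0) 205632
5 unassigned wm 612 - 971 177.19MB (360/0/0) 362880
6 unassigned wm 972 - 2035 523.69MB (1064/0/0) 1072512
7 unassigned wm 0 0 (0/0/0) 0
EOF
```

For the table above this prints slices=2036 whole=2036.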

Now label the disk. This is important as this is what saves the partition table in your VTOC (Virtual Table Of Contents). It's also always recommended to do the labeling part twice to be certain that the VTOC gets saved. 


partition> label
partition> q
format> q

After partitioning c0t1d0 to be exactly the same as c0t3d0, be sure you label the disk so that the VTOC gets updated with the correct partition table. 

To recap, our scenario is: 


c0t3d0 (running Solaris 2.6) being copied to c0t1d0 (which will have the copied Solaris 2.6 slices/partitions) 
c0t3d0s0 / -> c0t1d0s0 /
c0t3d0s4 /var -> c0t1d0s4 /var
c0t3d0s5 /opt -> c0t1d0s5 /opt
c0t3d0s6 /usr -> c0t1d0s6 /usr


For each of the partitions that you wish to mount, run newfs to construct a UNIX file system. 
So, newfs each partition: 


# newfs -v /dev/rdsk/c0t1d0s0
# newfs -v /dev/rdsk/c0t1d0s4
# newfs -v /dev/rdsk/c0t1d0s5
# newfs -v /dev/rdsk/c0t1d0s6


To ensure that the new file systems are clean before you mount them, run fsck on each of these partitions: 

# fsck /dev/rdsk/c0t1d0s0
# fsck /dev/rdsk/c0t1d0s4
# fsck /dev/rdsk/c0t1d0s5
# fsck /dev/rdsk/c0t1d0s6


Make the mount points. 

# mkdir /mount_point

Create mountpoints for each slice/partition, like so: 


# mkdir /root2
# mkdir /var2
# mkdir /opt2
# mkdir /usr2


Mount the new partitions. 

# mount /dev/dsk/c0t1d0sX /mount_point

Mount each partition (of the new disk), like so: 

# mount /dev/dsk/c0t1d0s0 /root2
# mount /dev/dsk/c0t1d0s4 /var2
# mount /dev/dsk/c0t1d0s5 /opt2
# mount /dev/dsk/c0t1d0s6 /usr2


Now ufsdump each slice/partition. Copying from one disk to another is often difficult: if you try to use dd and the disks are of differing sizes, you will run into trouble. Use this method to copy from disk to disk and you should not have any problems. You are still running from the old disk (c0t3d0, the one you booted from): 

# cd /

(This just ensures that you are in the root (top-level) directory.) 


# ufsdump 0f - /dev/rdsk/c0t3d0s0 | (cd /root2; ufsrestore rf -)
# ufsdump 0f - /dev/rdsk/c0t3d0s4 | (cd /var2; ufsrestore rf -)
# ufsdump 0f - /dev/rdsk/c0t3d0s5 | (cd /opt2; ufsrestore rf -)
# ufsdump 0f - /dev/rdsk/c0t3d0s6 | (cd /usr2; ufsrestore rf -)

The gotcha here is that you can't really specify a directory name, as ufsdump will interpret it as not being a block or character device. To illustrate this error: 

# cd /usr
# ufsdump 0f - /usr | (cd /usr2; ufsrestore xf - )
DUMP: Writing 32 Kilobyte records
DUMP: Date of this level 0 dump: Wed Dec 10 17:33:42 1997
DUMP: Date of last level 0 dump: the epoch
DUMP: Dumping /dev/rdsk/c0t3d0s0 (tmpdns:/usr) to standard output
DUMP: Mapping (Pass I) [regular files]
DUMP: Mapping (Pass II) [directories]
DUMP: Estimated 317202 blocks (154.88MB)
DUMP: Dumping (Pass III) [directories]
DUMP: Broken pipe
DUMP: The ENTIRE dump is aborted

If you want to use the directory names to simplify your command line, use the tar command instead of ufsdump as follows: 

Example: 


# cd /usr
# tar cvfp - . | (cd /usr2; tar xvfp - )
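
The four ufsdump pipelines shown earlier follow one pattern, so they can be generated rather than typed. The sketch below only prints the commands; it does not run them. It assumes a POSIX shell, and the function name print_copy_cmds is made up for illustration. The disk name, slice numbers and mount points are the ones from this example.

```shell
#!/bin/sh
# Sketch: print (do not run) one ufsdump | ufsrestore pipeline per slice.
# Arguments: source disk, then pairs of "slice-number mount-point".
print_copy_cmds() {
    src=$1
    shift
    while [ $# -ge 2 ]; do
        echo "ufsdump 0f - /dev/rdsk/${src}s$1 | (cd $2; ufsrestore rf -)"
        shift 2
    done
}

# Slices and mount points from the example in the text.
print_copy_cmds c0t3d0 0 /root2 4 /var2 5 /opt2 6 /usr2
```

Review the printed commands against your own slice layout before running any of them as root.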


OPTIONAL: Checking the integrity of a file system is always highly recommended, even when it seems redundant. To confirm that the copied files are clean and consistent, run fsck on the new partitions/slices again: 

# fsck /dev/rdsk/c0t1d0s0
# fsck /dev/rdsk/c0t1d0s4
# fsck /dev/rdsk/c0t1d0s5
# fsck /dev/rdsk/c0t1d0s6


Edit the /mount_point/etc/vfstab file on the new disk so that the system boots from and mounts the correct devices (c0t1d0 as opposed to c0t3d0). 

# cd /root2
# vi /root2/etc/vfstab

Change c0tXd0sX devices to reflect the new disk! 

#device            device              mount    FS     fsck   mount     mount
#to mount          to fsck             point    type   pass   at boot   options
#
#/dev/dsk/c1d0s2   /dev/rdsk/c1d0s2    /usr     ufs    1      yes       -
fd                 -                   /dev/fd  fd     -      no        -
/proc              -                   /proc    proc   -      no        -
/dev/dsk/c0t1d0s1  -                   -        swap   -      no        -
/dev/dsk/c0t1d0s0  /dev/rdsk/c0t1d0s0  /        ufs    1      no        -
/dev/dsk/c0t1d0s6  /dev/rdsk/c0t1d0s6  /usr     ufs    1      no        -
/dev/dsk/c0t1d0s4  /dev/rdsk/c0t1d0s4  /var     ufs    1      no        -
/dev/dsk/c0t1d0s5  /dev/rdsk/c0t1d0s5  /opt     ufs    2      yes       -
swap               -                   /tmp     tmpfs  -      yes       -
:wq!
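
Instead of editing every device name by hand in vi, the same change can be made with sed. This is a minimal sketch, assuming the old disk is c0t3d0 and the new one is c0t1d0 as in this example; the function name retarget_vfstab is made up for illustration. It reads vfstab-style text on standard input and writes the edited text to standard output, so the original file is not touched.

```shell
#!/bin/sh
# Sketch: rewrite every c0t3d0 slice reference to c0t1d0.
# Reads standard input, writes standard output.
retarget_vfstab() {
    sed 's/c0t3d0s/c0t1d0s/g'
}

# Two sample vfstab lines as they would look when copied from the old disk.
retarget_vfstab <<'EOF'
/dev/dsk/c0t3d0s1 - - swap - no -
/dev/dsk/c0t3d0s0 /dev/rdsk/c0t3d0s0 / ufs 1 no -
EOF
```

In practice you would redirect /root2/etc/vfstab into the function, write the result to a temporary file, compare the two, and only then copy the new file into place.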


Now you must run installboot to load a new bootblk on that disk. Without a bootblk the disk is unbootable: the bootblk contains the bootstrap program, which interfaces with the OBP (Open Boot PROM) and then loads the boot file called ufsboot. 
You can do this from your currently booted disk, or you may choose to boot from CD-ROM via ok> boot cdrom -sw (single-user mode, writable, off the CD-ROM's mini-root). 

If you choose to get bootblk from your current disk the location of the bootblk in Solaris 2.5 or higher is under: 


/usr/platform/`uname -i`/lib/fs/ufs/bootblk


# /usr/sbin/installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk \
/dev/rdsk/c0t1d0s0

If you choose to get bootblk from your cdrom image: 


ok> boot cdrom -sw
# installboot /cdrom/solaris_2_5_sparc/s0/export/exec/sparc.Solaris_2.5 \
/usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0txd0s0

ANOTHER SPARC EXAMPLE: 
To install a ufs bootblock on slice 0 of target 0 on controller 1, of the platform where the command is being run, use: 


example# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk \
/dev/rdsk/c1t0d0s0


Now create an alias for the other disk (this may be existent if it's off of the onboard/first scsi controller). 

ok> probe-scsi
    T3 original boot disk
    T1 new disk with copied slices

Use the devalias command to see the current aliases. disk1 is the alias for sd@1,0, which is SCSI ID/target 1. 


ok> devalias

ok> setenv boot-device disk1
ok> boot -rv

You do not necessarily need to do a reconfiguration boot, as the devices have already been created. The -r option is needed only if you attached new devices to your system. 

With boot-device set to disk1, the system now boots from the new disk by default. If you want to boot from the old disk you can manually tell it to boot from that alias, like so: 


ok> boot disk
or
ok> boot disk3

(This will boot off from any Target 3/scsi id 3 internal disk). Also see INFODOC #'s 14046, 11855, 11854 for setting different boot devalias'es. 

NOTE: If the new disk encounters a problem on booting, the most likely cause is inappropriate devlinks. The course of action is the /etc/path_to_inst, /dev, /devices fix. The following routine solves problems with /dev, /devices, and/or /etc/path_to_inst by extracting the defaults (with links intact) from the Solaris 2.x CD-ROM. 


ok> boot cdrom -sw

# mount /dev/dsk/c0t1d0s0 /a      ** This step assumes your boot disk is c0t1d0s0
# cd /tmp/dev
# tar cvfp - . | (cd /a/dev; tar xvfp - )
# cd /tmp/devices
# tar cvfp - . | (cd /a/devices; tar xvfp - )
# cd /tmp/root/etc
# cp path_to_inst /a/etc/path_to_inst
# reboot -- -rv


If you plan to move this new disk you copied the OS on, you MUST ensure that it will be moved to a similar architecture and machine type as hardware address paths are usually different from one machine to another. 
Each hardware platform has a hardware device tree which must match the device tree information saved during installation in /devices and the /dev directories. 

Another reason is that a kernel from one architecture cannot boot on a machine of a different architecture. Customers often overlook these architecture differences (Sun 4/4c/4m/4d/4u). A boot drive moved from a SPARCstation 2 (sun4c architecture) cannot boot on a SPARCstation 5 (sun4m architecture). 

For more details on why you can't move Solaris 2.X boot disk between machines please see INFODOC 13911 and 13920. 

Also ensure that you have the correct hostname, IP address and vfstab entries for this new drive if you plan to move it to another machine. 

 

Add a New Disk II
In this example, the Sun StorEdge D1000 tray is connected to a UDWIS host adapter corresponding to controller c2 and a drive was added to slot 4 on the tray. The new drive appears as /dev/dsk/c2t4d0s[0-7] and /dev/rdsk/c2t4d0s[0-7]. 


Add the new device: 

# drvconfig (or devfsadm) 
# disks 

Verify the new disk has been created: 

# ls -l /dev/dsk/c2t4d0s* 

The new disk drive is now available for use as a block or character device. Refer to the sd(7D) man page for more information. 


7.4 bare-metal restore procedure.
=================================

SUMMARY: Help troubleshooting bare-metal restore procedure.

Thank you:
   Anand Chouthai
   Roy Erickson

With help from contributors I identified two things wrong
w/ the procedure I was implementing:

 1. I needed to update /dev,/devices, and /etc/path_to_inst
    so that the replacement FC drive w/ a new WWN 
    was correctly recognized as the root disk (this was on a Sun 280R).

 2. Fixing that helped me uncover the cause of the "lockup": I hadn't
    added the "-H" option to bprestore, which tells it to rename
    the hardlink targets as well as the normal source files
    (I thought bprestore had an AI module that figured
    all that out ;-).

 After making these two changes I was able to get the system
 back to a sane state.

 Below are the updated scripts. 

 The first one is the revised script I am using to do the actual
 bare metal restore.

 The second is a script I use to build a recovery image. Certain
 packages and tarballs are needed which aren't included but
 the general idea is there.

 --

 The only quirk I have observed is that /var/run doesn't get
 mounted as swap (the system creates a normal directory).

    mount: mount-point /var/run does not exist.

 Not sure about this one but for now I will live w/ it.

 Thanks again Sun Managers!

 Kevin Counts 

 --

Script #1:

#!/bin/sh
#------------------------------------------------------------------------
# $Id: recover-egate2.sh,v 1.7 2004/03/01 19:36:06 countskm Exp $
#------------------------------------------------------------------------
# Custom script to restore egate2 (run from jumpstart recovery image).
#-------------------------------------------------------------------------

#-------------------------------------------------------------------------
# Create pre-defined vtoc for 36GB FC Drive
#-------------------------------------------------------------------------

/usr/sbin/fmthard -s - /dev/rdsk/c1t0d0s2 <<EOF
       0      2    00          0   8389656   8389655
       1      3    01    8389656   8389656  16779311
       2      5    00          0  71127180  71127179
       3      7    00   16779312  16779312  33558623
       4      0    00   33558624  37516554  71075177
       6      0    00   71075178     26001  71101178
       7      0    00   71101179     26001  71127179
EOF

echo "y" | /usr/sbin/newfs /dev/rdsk/c1t0d0s0
echo "y" | /usr/sbin/newfs /dev/rdsk/c1t0d0s3
echo "y" | /usr/sbin/newfs /dev/rdsk/c1t0d0s4

/usr/sbin/fsck /dev/rdsk/c1t0d0s0
/usr/sbin/fsck /dev/rdsk/c1t0d0s3
/usr/sbin/fsck /dev/rdsk/c1t0d0s4

mount /dev/dsk/c1t0d0s0 /a
mkdir -p /a/var
mkdir -p /a/opt
mount /dev/dsk/c1t0d0s3 /a/var
mount /dev/dsk/c1t0d0s4 /a/opt

#------------------------------------------------------------------------
server=veritas
log=/var/tmp/bprestore.log
rename=/var/tmp/bprestore.rename
filelist=/var/tmp/bprestore.filelist

# extra_opt="-e 2/01/2004 -C egate2"
  extra_opt="-C egate2"

cat <<EOF > ${filelist}
/
!/egate
EOF

cat <<EOF > ${rename}
change / to /a
EOF

cat /dev/null > ${log}

cat <<EOF

--------------------------------------------------------------------
 Running bprestore in foreground.                                   

 View logfile: $log in another login session for status.
 
 (A message will appear in this window when the restore is complete)
--------------------------------------------------------------------

EOF

echo \
/usr/openv/netbackup/bin/bprestore -w                \
                                   -H                \
                                   -S ${server}      \
                                   -L ${log}         \
                                   -R ${rename}      \
                                   ${extra_opt}      \
                                   -f ${filelist}

/usr/openv/netbackup/bin/bprestore -w                \
                                   -H                \
                                   -S ${server}      \
                                   -L ${log}         \
                                   -R ${rename}      \
                                   ${extra_opt}      \
                                   -f ${filelist}


#-------------------------------------------------------------------------
# Make excluded /egate mountpoint
#-------------------------------------------------------------------------
mkdir -p /a/egate

#-------------------------------------------------------------------------
# Unconfigure disksuite mirror
#-------------------------------------------------------------------------
mv /a/etc/lvm/mddb.cf /a/etc/lvm/mddb.cf.bak

sed -e 's!md/!!g'         \
    -e 's!d10!c1t0d0s0!g' \
    -e 's!d20!c1t0d0s1!g' \
    -e 's!d30!c1t0d0s3!g' \
    -e 's!d40!c1t0d0s4!g' \
/a/etc/vfstab > /a/etc/vfstab.tmp

cp /a/etc/vfstab     /a/etc/vfstab.bak
cp /a/etc/vfstab.tmp /a/etc/vfstab

sed -e '/^rootdev/ s/^/*/' \
    -e '/^set md/  s/^/*/' \
/a/etc/system > /a/etc/system.tmp

cp /a/etc/system     /a/etc/system.bak
cp /a/etc/system.tmp /a/etc/system

#-------------------------------------------------------------------------
# Rebuild /dev and /devices and /etc/path_to_inst
# Typically we don't back up /dev, so check whether it is even there.
#-------------------------------------------------------------------------
[ -d /a/dev ]     && mv /a/dev      /a/dev.bak
[ -d /a/devices ] && mv /a/devices  /a/devices.bak

mkdir /a/dev
mkdir /a/devices

cd /dev     ;     find . -depth -print  | cpio -pdm /a/dev
cd /devices ;     find . -depth -print  | cpio -pdm /a/devices
cd

mv /a/etc/path_to_inst \
   /a/etc/path_to_inst.bak

cp /tmp/root/etc/path_to_inst \
   /a/etc/path_to_inst


#-------------------------------------------------------------------------
# Make mount points excluded from backup
#-------------------------------------------------------------------------
mkdir          /a/tmp
chmod 1777     /a/tmp
chown root:sys /a/tmp

#-------------------------------------------------------------------------
# Unmount the slices and install the ufs boot block
#-------------------------------------------------------------------------
umount /a/var
umount /a/opt
umount /a

/usr/sbin/installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c1t0d0s0

echo "--------------------------------------------------------------------"
echo " Restore complete - type \"reboot -- -r\" to reboot the system."
echo "--------------------------------------------------------------------"

#-------------------------------------------------------------------------
# End.
#-------------------------------------------------------------------------
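
Note on Script #1: the sed step that rewrites /a/etc/vfstab can be tried on sample text
before it is run against a restored system. The sketch below wraps the same five
substitutions in a function; the function name and the sample vfstab line are made up
for illustration. The patterns are plain substring matches, so a metadevice such as
d100 would also be hit by the d10 rule.

```shell
#!/bin/sh
# unmirror_vfstab is a hypothetical helper: it applies the same substitutions
# as the recovery script to vfstab text read from stdin.
unmirror_vfstab() {
    sed -e 's!md/!!g'         \
        -e 's!d10!c1t0d0s0!g' \
        -e 's!d20!c1t0d0s1!g' \
        -e 's!d30!c1t0d0s3!g' \
        -e 's!d40!c1t0d0s4!g'
}

# Made-up sample line for a mirrored root file system
sample='/dev/md/dsk/d10 /dev/md/rdsk/d10 / ufs 1 no -'
printf '%s\n' "$sample" | unmirror_vfstab
```

The sample line comes back with /dev/dsk/c1t0d0s0 and /dev/rdsk/c1t0d0s0 in place of
the two metadevice paths.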


Script #2:


#!/bin/sh
#-------------------------------------------------------------------------
# Configuring Solaris 8 Boot Image
#-------------------------------------------------------------------------
root=/export/install/SOL8-RECOVER-TEST/Solaris_8/Tools/Boot/
noask=/export/depot/fileset/isconf/plat/sunos/5.8/etc/noask_pkgadd
depot=/export/depot/pkg/sunos/5.8

#-------------------------------------------------------------------------
perl -pi -e '/^root/ && s/NP/<your own hash>/' $root/etc/shadow

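# NOTE: this "exit 0" ends the script here, so none of the steps below run.
# Remove it to apply the remaining customizations.
exit 0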
pkgadd -d ${depot}/SMC/SMCncurs-5.3 -R $root  \
       -n -a ${noask} all

pkgadd -d ${depot}/MCC/MCCssh2-3.2.3 -R $root \
       -n -a ${noask} all

pkgadd -d ${depot}/SMC/SMCbash-2.05 -R $root  \
       -n -a ${noask} all


#-------------------------------------------------------------------------
perl -pi -e ' /^\s*install\)/ and print <<EOF

                recover)
                         cat < /dev/null > /tmp/._recover_startup
                         shift
                         ;;

EOF
' $root/sbin/rcS

#-------------------------------------------------------------------------
perl -pi -e ' m!#/usr/sbin/inetd -s! and print <<EOF

if [ -f /tmp/._recover_startup ] ; then
   /usr/sbin/inetd -s
fi
EOF
' $root/sbin/sysconfig

#-------------------------------------------------------------------------
perl -pi -e ' m!exec /sbin/suninstall! and print <<EOF
if [ -f /tmp/._recover_startup ] ; then
   exec /bin/ksh -o vi
fi

EOF
' $root/sbin/sysconfig

#-------------------------------------------------------------------------
cp -rp tmp_proto/openv $root/.tmp_proto/

ln -s /tmp/openv $root/usr/openv

#-------------------------------------------------------------------------
cat <<EOF >> $root/etc/services
#
# NetBackup services
#
bprd    13720/tcp       bprd
bpcd    13782/tcp       bpcd
vopied  13783/tcp       vopied
bpjava-msvc     13722/tcp       bpjava-msvc
EOF

#-------------------------------------------------------------------------
cat <<EOF >> $root/etc/inetd.conf
#
# netbackup services
#
bpcd    stream  tcp     nowait  root    /usr/openv/netbackup/bin/bpcd bpcd
vopied  stream  tcp     nowait  root    /usr/openv/bin/vopied vopied
bpjava-msvc     stream  tcp     nowait  root    /usr/openv/netbackup/bin/bpjava-msvc bpjava-msvc -transient
EOF
_______________________________________________






Example about FileSystems on a SunOS 5.9 server
===============================================


Devices are described in three ways in the Solaris environment, using three distinct naming
conventions: the physical device name, the instance name, and the logical device name.

- Physical devices:
A "physical device name" represents the full pathname of the device. 
Physical device files are found in the /devices directory and have the following
naming convention:

/devices/sbus@1,f8000000/esp@0,40000/sd@3,0:a

Each device has a unique name representing both the type of device and the location of that device
in the system-addressing structure called the "device tree". The OpenBoot firmware builds the 
device tree for all devices from information gathered at POST. The device tree is loaded in memory
and is used by the kernel during boot to identify all configured devices.
A device pathname is a series of node names separated by slashes. Each node name has the following form:
  
driver-name@unit-address:device-arguments
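
As an illustration, a node name of this form can be taken apart with shell parameter
expansion. This is only a sketch: split_node is a hypothetical helper, not a Solaris
command, and it assumes a POSIX shell (ksh or /usr/xpg4/bin/sh on Solaris, not the old
Bourne /bin/sh).

```shell
#!/bin/sh
# Split one node name of the form driver-name@unit-address:device-arguments.
split_node() {
    node=$1
    case $node in
        *@*) ;;                                   # has a unit address
        *)   echo "driver=$node address= arguments="; return ;;
    esac
    name=${node%%@*}          # text before the first '@'
    rest=${node#*@}           # text after the first '@'
    case $rest in
        *:*) addr=${rest%%:*}; args=${rest#*:} ;; # optional ':arguments' part
        *)   addr=$rest;       args= ;;
    esac
    echo "driver=$name address=$addr arguments=$args"
}

split_node 'sd@3,0:a'
split_node 'pci@1c,600000'
```

For sd@3,0:a this reports driver sd, unit address 3,0 and argument a (the slice letter).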

On our test machine, we find:

/devices>ls -al
total 70
drwxr-xr-x   7 root     sys          512 Aug 10  2004 .
drwxr-xr-x  25 root     root         512 Aug 17  2004 ..
crw-------   1 root     sys      201,  0 Aug 10  2004 memory-controller@0,0:mc-us3i
drwxr-xr-x   4 root     sys          512 Aug 10  2004 pci@1c,600000
crw-------   1 root     sys      109,767 Aug 10  2004 pci@1c,600000:devctl
drwxr-xr-x   2 root     sys          512 Aug 10  2004 pci@1d,700000
crw-------   1 root     sys      109,1023 Aug 10  2004 pci@1d,700000:devctl
drwxr-xr-x   4 root     sys          512 Aug 10  2004 pci@1e,600000
crw-------   1 root     sys      109,511 Aug 10  2004 pci@1e,600000:devctl
drwxr-xr-x   2 root     sys          512 Aug 10  2004 pci@1f,700000
crw-------   1 root     sys      109,255 Aug 10  2004 pci@1f,700000:devctl
drwxr-xr-x   2 root     sys        29696 Aug 11  2004 pseudo


- Instance name:
The "instance name" represents the kernel's abbreviated name for every possible device
on the system. For example, sd0 and sd1 represent the instance names of two SCSI disk devices.
Instance names are mapped in the /etc/path_to_inst file, and are displayed by using the
commands dmesg, sysdef, and prtconf.

/devices>cd /etc
/etc>more path_to_inst
#
#       Caution! This file contains critical kernel state
#
"/options" 0 "options"
"/pci@1f,700000" 0 "pcisch"
"/pci@1f,700000/network@2" 0 "bge"
"/pci@1f,700000/network@2,1" 1 "bge"
"/pci@1e,600000" 1 "pcisch"
"/pci@1e,600000/ide@d" 0 "uata"
"/pci@1e,600000/ide@d/sd@0,0" 30 "sd"
"/pci@1e,600000/isa@7" 0 "ebus"
"/pci@1e,600000/isa@7/power@0,800" 0 "power"
"/pci@1e,600000/isa@7/rmc-comm@0,3e8" 0 "rmc_comm"
"/pci@1e,600000/isa@7/i2c@0,320" 0 "pcf8584"
"/pci@1e,600000/isa@7/i2c@0,320/motherboard-fru-prom@0,a2" 0 "seeprom"
"/pci@1e,600000/isa@7/i2c@0,320/chassis-fru-prom@0,a8" 1 "seeprom"
"/pci@1e,600000/isa@7/i2c@0,320/power-supply-fru-prom@0,b0" 2 "seeprom"
"/pci@1e,600000/isa@7/i2c@0,320/power-supply-fru-prom@0,a4" 3 "seeprom"
"/pci@1e,600000/isa@7/i2c@0,320/dimm-spd@0,b6" 4 "seeprom"
"/pci@1e,600000/isa@7/i2c@0,320/dimm-spd@0,b8" 5 "seeprom"
"/pci@1e,600000/isa@7/i2c@0,320/dimm-spd@0,c6" 6 "seeprom"
"/pci@1e,600000/isa@7/i2c@0,320/dimm-spd@0,c8" 7 "seeprom"
"/pci@1e,600000/isa@7/i2c@0,320/nvram@0,50" 8 "seeprom"
"/pci@1e,600000/isa@7/i2c@0,320/gpio@0,70" 0 "pca9556"
"/pci@1e,600000/isa@7/i2c@0,320/gpio@0,44" 1 "pca9556"
"/pci@1e,600000/isa@7/i2c@0,320/gpio@0,46" 2 "pca9556"
"/pci@1e,600000/isa@7/i2c@0,320/gpio@0,4a" 3 "pca9556"
"/pci@1e,600000/isa@7/i2c@0,320/gpio@0,68" 4 "pca9556"
"/pci@1e,600000/isa@7/i2c@0,320/gpio@0,88" 5 "pca9556"
"/pci@1e,600000/isa@7/serial@0,3f8" 0 "su"
"/pci@1e,600000/isa@7/serial@0,2e8" 1 "su"
"/pci@1e,600000/pmu@6" 0 "pmubus"
"/pci@1e,600000/pmu@6/gpio@8a" 0 "pmugpio"
"/pci@1e,600000/pmu@6/i2c@0" 0 "smbus"
"/pci@1e,600000/pmu@6/gpio@80000000" 1 "pmugpio"
"/pci@1e,600000/pmu@6/i2c@0,0" 1 "smbus"
"/pci@1e,600000/usb@a" 0 "ohci"
"/pci@1c,600000" 2 "pcisch"
"/pci@1c,600000/scsi@2" 0 "glm"
"/pci@1c,600000/scsi@2/sd@0,0" 0 "sd"
"/pci@1c,600000/scsi@2/sd@1,0" 1 "sd"
"/pci@1c,600000/scsi@2/sd@2,0" 2 "sd"
"/pci@1c,600000/scsi@2/sd@3,0" 3 "sd"
"/pci@1c,600000/scsi@2/sd@4,0" 4 "sd"
"/pci@1c,600000/scsi@2/sd@5,0" 5 "sd"
"/pci@1c,600000/scsi@2/sd@6,0" 6 "sd"
"/pci@1c,600000/scsi@2/sd@8,0" 7 "sd"
"/pci@1c,600000/scsi@2/sd@9,0" 8 "sd"
"/pci@1c,600000/scsi@2/sd@a,0" 9 "sd"
"/pci@1c,600000/scsi@2/sd@b,0" 10 "sd"
"/pci@1c,600000/scsi@2/sd@c,0" 11 "sd"
"/pci@1c,600000/scsi@2/sd@d,0" 12 "sd"
"/pci@1c,600000/scsi@2/sd@e,0" 13 "sd"
"/pci@1c,600000/scsi@2/sd@f,0" 14 "sd"
"/pci@1c,600000/scsi@2/st@0,0" 0 "st"
"/pci@1c,600000/scsi@2/st@1,0" 1 "st"
"/pci@1c,600000/scsi@2/st@2,0" 2 "st"
"/pci@1c,600000/scsi@2/st@3,0" 3 "st"
"/pci@1c,600000/scsi@2/st@4,0" 4 "st"
"/pci@1c,600000/scsi@2/st@5,0" 5 "st"
"/pci@1c,600000/scsi@2/st@6,0" 6 "st"
"/pci@1c,600000/scsi@2/ses@0,0" 0 "ses"
"/pci@1c,600000/scsi@2/ses@1,0" 1 "ses"
"/pci@1c,600000/scsi@2/ses@2,0" 2 "ses"
"/pci@1c,600000/scsi@2/ses@3,0" 3 "ses"
"/pci@1c,600000/scsi@2/ses@4,0" 4 "ses"
"/pci@1c,600000/scsi@2/ses@5,0" 5 "ses"
"/pci@1c,600000/scsi@2/ses@6,0" 6 "ses"
"/pci@1c,600000/scsi@2/ses@7,0" 7 "ses"
"/pci@1c,600000/scsi@2/ses@8,0" 8 "ses"
"/pci@1c,600000/scsi@2/ses@9,0" 9 "ses"
"/pci@1c,600000/scsi@2/ses@a,0" 10 "ses"
"/pci@1c,600000/scsi@2/ses@b,0" 11 "ses"
"/pci@1c,600000/scsi@2/ses@c,0" 12 "ses"
"/pci@1c,600000/scsi@2/ses@d,0" 13 "ses"
"/pci@1c,600000/scsi@2/ses@e,0" 14 "ses"
"/pci@1c,600000/scsi@2/ses@f,0" 15 "ses"
"/pci@1c,600000/scsi@2,1" 1 "glm"
"/pci@1c,600000/scsi@2,1/sd@0,0" 15 "sd"
"/pci@1c,600000/scsi@2,1/sd@1,0" 16 "sd"
"/pci@1c,600000/scsi@2,1/sd@2,0" 17 "sd"
"/pci@1c,600000/scsi@2,1/sd@3,0" 18 "sd"
"/pci@1c,600000/scsi@2,1/sd@4,0" 19 "sd"
"/pci@1c,600000/scsi@2,1/sd@5,0" 20 "sd"
"/pci@1c,600000/scsi@2,1/sd@6,0" 21 "sd"
"/pci@1c,600000/scsi@2,1/sd@8,0" 22 "sd"
"/pci@1c,600000/scsi@2,1/sd@9,0" 23 "sd"
"/pci@1c,600000/scsi@2,1/sd@a,0" 24 "sd"
"/pci@1c,600000/scsi@2,1/sd@b,0" 25 "sd"
"/pci@1c,600000/scsi@2,1/sd@c,0" 26 "sd"
"/pci@1c,600000/scsi@2,1/sd@d,0" 27 "sd"
"/pci@1c,600000/scsi@2,1/sd@e,0" 28 "sd"
"/pci@1c,600000/scsi@2,1/sd@f,0" 29 "sd"
"/pci@1c,600000/scsi@2,1/st@0,0" 7 "st"
"/pci@1c,600000/scsi@2,1/st@1,0" 8 "st"
"/pci@1c,600000/scsi@2,1/st@2,0" 9 "st"
"/pci@1c,600000/scsi@2,1/st@3,0" 10 "st"
"/pci@1c,600000/scsi@2,1/st@4,0" 11 "st"
"/pci@1c,600000/scsi@2,1/st@5,0" 12 "st"
"/pci@1c,600000/scsi@2,1/st@6,0" 13 "st"
"/pci@1c,600000/scsi@2,1/ses@0,0" 16 "ses"
"/pci@1c,600000/scsi@2,1/ses@1,0" 17 "ses"
"/pci@1c,600000/scsi@2,1/ses@2,0" 18 "ses"
"/pci@1c,600000/scsi@2,1/ses@3,0" 19 "ses"
"/pci@1c,600000/scsi@2,1/ses@4,0" 20 "ses"
"/pci@1c,600000/scsi@2,1/ses@5,0" 21 "ses"
"/pci@1c,600000/scsi@2,1/ses@6,0" 22 "ses"
"/pci@1c,600000/scsi@2,1/ses@7,0" 23 "ses"
"/pci@1c,600000/scsi@2,1/ses@8,0" 24 "ses"
"/pci@1c,600000/scsi@2,1/ses@9,0" 25 "ses"
"/pci@1c,600000/scsi@2,1/ses@a,0" 26 "ses"
"/pci@1c,600000/scsi@2,1/ses@b,0" 27 "ses"
"/pci@1c,600000/scsi@2,1/ses@c,0" 28 "ses"
"/pci@1c,600000/scsi@2,1/ses@d,0" 29 "ses"
"/pci@1c,600000/scsi@2,1/ses@e,0" 30 "ses"
"/pci@1c,600000/scsi@2,1/ses@f,0" 31 "ses"
"/pci@1d,700000" 3 "pcisch"
"/pci@1d,700000/network@2" 2 "bge"
"/pci@1d,700000/network@2,1" 3 "bge"
"/memory-controller@0,0" 0 "mc-us3i"
"/memory-controller@1,0" 1 "mc-us3i"
"/pseudo" 0 "pseudo"
"/scsi_vhci" 0 "scsi_vhci"
/etc>
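
The instance name is simply the driver name (third field) followed by the instance
number (second field). A minimal sketch that prints it next to the physical path is
shown below. inst_names is a hypothetical helper, and it assumes a POSIX awk (nawk or
/usr/xpg4/bin/awk on Solaris). The sample lines are taken from the listing above.

```shell
#!/bin/sh
# Print "instance-name physical-path" for each path_to_inst entry on stdin.
# Lines starting with '#' are comments and are skipped.
inst_names() {
    awk '!/^#/ && NF == 3 {
        gsub(/"/, "")          # strip the double quotes around fields 1 and 3
        print $3 $2, $1        # driver name + instance number, then the path
    }'
}

inst_names <<'EOF'
#       Caution! This file contains critical kernel state
"/pci@1e,600000/ide@d/sd@0,0" 30 "sd"
"/pci@1c,600000/scsi@2/sd@1,0" 1 "sd"
EOF
```

On a live system the same function could be fed with: inst_names < /etc/path_to_inst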


- Logical device names:
The "Logical device names" are used with most Solaris file system commands to refer to devices.
Logical device files in the /dev directory are symbolically linked to physical device files
in the /devices directory. Logical device names are used to access disk devices in the
following circumstances:
  - adding a new disk to the system and partitioning the disk
  - moving a disk from one system to another
  - accessing or mounting a file system residing on a local disk
  - backing up a local file system
  - repairing a file system

  Logical devices are organized by device type in subdirectories under the /dev directory:
  /dev/dsk    block interface to disk devices
  /dev/rdsk   raw or character interface to disk devices
  /dev/rmt    tape devices
  /dev/term   serial line devices 
  etc..

  Logical device files have a major and minor number that indicate device drivers, 
  hardware addresses, and other characteristics.
  Furthermore, a device filename must follow a specific naming convention.
  A logical device name for a disk drive has the following format:

  /dev/[r]dsk/cxtxdxsx

  where cx refers to the SCSI controller number, tx to the SCSI bus target number,
  dx to the disk number (always 0 except on storage arrays)
  and sx to the slice or partition number.
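
  As a sketch of this naming convention, the four numbers can be pulled out of a
  logical disk name with sed. parse_ctds is a hypothetical helper name. It accepts
  either a bare name or a full /dev/[r]dsk path and prints nothing for names that do
  not follow the cxtxdxsx pattern.

```shell
#!/bin/sh
# Break a logical disk name such as c1t0d0s3 into its four numbers.
parse_ctds() {
    basename "$1" | sed -n \
      's/^c\([0-9][0-9]*\)t\([0-9][0-9]*\)d\([0-9][0-9]*\)s\([0-9][0-9]*\)$/controller=\1 target=\2 disk=\3 slice=\4/p'
}

parse_ctds c1t0d0s3
parse_ctds /dev/rdsk/c1t1d0s7
```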
  
/dev>ls -al
..
lrwxrwxrwx   1 root     root          13 Aug 10  2004 rsd1a -> rdsk/c1t1d0s0
lrwxrwxrwx   1 root     root          13 Aug 10  2004 rsd1b -> rdsk/c1t1d0s1
lrwxrwxrwx   1 root     root          13 Aug 10  2004 rsd1c -> rdsk/c1t1d0s2
lrwxrwxrwx   1 root     root          13 Aug 10  2004 rsd1d -> rdsk/c1t1d0s3
lrwxrwxrwx   1 root     root          13 Aug 10  2004 rsd1e -> rdsk/c1t1d0s4
lrwxrwxrwx   1 root     root          13 Aug 10  2004 rsd1f -> rdsk/c1t1d0s5
lrwxrwxrwx   1 root     root          13 Aug 10  2004 rsd1g -> rdsk/c1t1d0s6
lrwxrwxrwx   1 root     root          13 Aug 10  2004 rsd1h -> rdsk/c1t1d0s7
lrwxrwxrwx   1 root     root          13 Aug 10  2004 rsd3a -> rdsk/c1t0d0s0
lrwxrwxrwx   1 root     root          13 Aug 10  2004 rsd3b -> rdsk/c1t0d0s1
lrwxrwxrwx   1 root     root          13 Aug 10  2004 rsd3c -> rdsk/c1t0d0s2
lrwxrwxrwx   1 root     root          13 Aug 10  2004 rsd3d -> rdsk/c1t0d0s3
lrwxrwxrwx   1 root     root          13 Aug 10  2004 rsd3e -> rdsk/c1t0d0s4
lrwxrwxrwx   1 root     root          13 Aug 10  2004 rsd3f -> rdsk/c1t0d0s5
lrwxrwxrwx   1 root     root          13 Aug 10  2004 rsd3g -> rdsk/c1t0d0s6
lrwxrwxrwx   1 root     root          13 Aug 10  2004 rsd3h -> rdsk/c1t0d0s7
lrwxrwxrwx   1 root     root          27 Aug 10  2004 rsm -> ../devices/pseudo/rsm@0:rsm
lrwxrwxrwx   1 root     root          13 Aug 10  2004 rsr0 -> rdsk/c0t0d0s2
lrwxrwxrwx   1 root     root           7 Aug 10  2004 rst12 -> rmt/0lb
lrwxrwxrwx   1 root     root           7 Aug 10  2004 rst20 -> rmt/0mb
lrwxrwxrwx   1 root     root           7 Aug 10  2004 rst28 -> rmt/0hb
lrwxrwxrwx   1 root     root           7 Aug 10  2004 rst36 -> rmt/0cb
lrwxrwxrwx   1 root     other         27 Aug 10  2004 rts -> ../devices/pseudo/rts@0:rts
drwxr-xr-x   2 root     sys          512 Aug 10  2004 sad
lrwxrwxrwx   1 root     root          12 Aug 10  2004 sd1a -> dsk/c1t1d0s0
lrwxrwxrwx   1 root     root          12 Aug 10  2004 sd1b -> dsk/c1t1d0s1
lrwxrwxrwx   1 root     root          12 Aug 10  2004 sd1c -> dsk/c1t1d0s2
lrwxrwxrwx   1 root     root          12 Aug 10  2004 sd1d -> dsk/c1t1d0s3
lrwxrwxrwx   1 root     root          12 Aug 10  2004 sd1e -> dsk/c1t1d0s4
lrwxrwxrwx   1 root     root          12 Aug 10  2004 sd1f -> dsk/c1t1d0s5
lrwxrwxrwx   1 root     root          12 Aug 10  2004 sd1g -> dsk/c1t1d0s6
lrwxrwxrwx   1 root     root          12 Aug 10  2004 sd1h -> dsk/c1t1d0s7
lrwxrwxrwx   1 root     root          12 Aug 10  2004 sd3a -> dsk/c1t0d0s0
lrwxrwxrwx   1 root     root          12 Aug 10  2004 sd3b -> dsk/c1t0d0s1
lrwxrwxrwx   1 root     root          12 Aug 10  2004 sd3c -> dsk/c1t0d0s2
lrwxrwxrwx   1 root     root          12 Aug 10  2004 sd3d -> dsk/c1t0d0s3
lrwxrwxrwx   1 root     root          12 Aug 10  2004 sd3e -> dsk/c1t0d0s4
lrwxrwxrwx   1 root     root          12 Aug 10  2004 sd3f -> dsk/c1t0d0s5
lrwxrwxrwx   1 root     root          12 Aug 10  2004 sd3g -> dsk/c1t0d0s6
lrwxrwxrwx   1 root     root          12 Aug 10  2004 sd3h -> dsk/c1t0d0s7
..

/dev>cd dsk
/dev/dsk>ls -al
total 58
drwxr-xr-x   2 root     sys          512 Aug 10  2004 .
drwxr-xr-x  14 root     sys         4096 Oct  4 14:15 ..
lrwxrwxrwx   1 root     root          42 Aug 10  2004 c0t0d0s0 -> ../../devices/pci@1e,600000/ide@d/sd@0,0:a
lrwxrwxrwx   1 root     root          42 Aug 10  2004 c0t0d0s1 -> ../../devices/pci@1e,600000/ide@d/sd@0,0:b
lrwxrwxrwx   1 root     root          42 Aug 10  2004 c0t0d0s2 -> ../../devices/pci@1e,600000/ide@d/sd@0,0:c
lrwxrwxrwx   1 root     root          42 Aug 10  2004 c0t0d0s3 -> ../../devices/pci@1e,600000/ide@d/sd@0,0:d
lrwxrwxrwx   1 root     root          42 Aug 10  2004 c0t0d0s4 -> ../../devices/pci@1e,600000/ide@d/sd@0,0:e
lrwxrwxrwx   1 root     root          42 Aug 10  2004 c0t0d0s5 -> ../../devices/pci@1e,600000/ide@d/sd@0,0:f
lrwxrwxrwx   1 root     root          42 Aug 10  2004 c0t0d0s6 -> ../../devices/pci@1e,600000/ide@d/sd@0,0:g
lrwxrwxrwx   1 root     root          42 Aug 10  2004 c0t0d0s7 -> ../../devices/pci@1e,600000/ide@d/sd@0,0:h
lrwxrwxrwx   1 root     root          43 Aug 10  2004 c1t0d0s0 -> ../../devices/pci@1c,600000/scsi@2/sd@0,0:a
lrwxrwxrwx   1 root     root          43 Aug 10  2004 c1t0d0s1 -> ../../devices/pci@1c,600000/scsi@2/sd@0,0:b
lrwxrwxrwx   1 root     root          43 Aug 10  2004 c1t0d0s2 -> ../../devices/pci@1c,600000/scsi@2/sd@0,0:c
lrwxrwxrwx   1 root     root          43 Aug 10  2004 c1t0d0s3 -> ../../devices/pci@1c,600000/scsi@2/sd@0,0:d
lrwxrwxrwx   1 root     root          43 Aug 10  2004 c1t0d0s4 -> ../../devices/pci@1c,600000/scsi@2/sd@0,0:e
lrwxrwxrwx   1 root     root          43 Aug 10  2004 c1t0d0s5 -> ../../devices/pci@1c,600000/scsi@2/sd@0,0:f
lrwxrwxrwx   1 root     root          43 Aug 10  2004 c1t0d0s6 -> ../../devices/pci@1c,600000/scsi@2/sd@0,0:g
lrwxrwxrwx   1 root     root          43 Aug 10  2004 c1t0d0s7 -> ../../devices/pci@1c,600000/scsi@2/sd@0,0:h
lrwxrwxrwx   1 root     root          43 Aug 10  2004 c1t1d0s0 -> ../../devices/pci@1c,600000/scsi@2/sd@1,0:a
lrwxrwxrwx   1 root     root          43 Aug 10  2004 c1t1d0s1 -> ../../devices/pci@1c,600000/scsi@2/sd@1,0:b
lrwxrwxrwx   1 root     root          43 Aug 10  2004 c1t1d0s2 -> ../../devices/pci@1c,600000/scsi@2/sd@1,0:c
lrwxrwxrwx   1 root     root          43 Aug 10  2004 c1t1d0s3 -> ../../devices/pci@1c,600000/scsi@2/sd@1,0:d
lrwxrwxrwx   1 root     root          43 Aug 10  2004 c1t1d0s4 -> ../../devices/pci@1c,600000/scsi@2/sd@1,0:e
lrwxrwxrwx   1 root     root          43 Aug 10  2004 c1t1d0s5 -> ../../devices/pci@1c,600000/scsi@2/sd@1,0:f
lrwxrwxrwx   1 root     root          43 Aug 10  2004 c1t1d0s6 -> ../../devices/pci@1c,600000/scsi@2/sd@1,0:g
lrwxrwxrwx   1 root     root          43 Aug 10  2004 c1t1d0s7 -> ../../devices/pci@1c,600000/scsi@2/sd@1,0:h

/dev/dsk>cd ..
/dev>cd rdsk
/dev/rdsk>ls -al
total 58
drwxr-xr-x   2 root     sys          512 Aug 10  2004 .
drwxr-xr-x  14 root     sys         4096 Oct  4 14:15 ..
lrwxrwxrwx   1 root     root          46 Aug 10  2004 c0t0d0s0 -> ../../devices/pci@1e,600000/ide@d/sd@0,0:a,raw
lrwxrwxrwx   1 root     root          46 Aug 10  2004 c0t0d0s1 -> ../../devices/pci@1e,600000/ide@d/sd@0,0:b,raw
lrwxrwxrwx   1 root     root          46 Aug 10  2004 c0t0d0s2 -> ../../devices/pci@1e,600000/ide@d/sd@0,0:c,raw
lrwxrwxrwx   1 root     root          46 Aug 10  2004 c0t0d0s3 -> ../../devices/pci@1e,600000/ide@d/sd@0,0:d,raw
lrwxrwxrwx   1 root     root          46 Aug 10  2004 c0t0d0s4 -> ../../devices/pci@1e,600000/ide@d/sd@0,0:e,raw
lrwxrwxrwx   1 root     root          46 Aug 10  2004 c0t0d0s5 -> ../../devices/pci@1e,600000/ide@d/sd@0,0:f,raw
lrwxrwxrwx   1 root     root          46 Aug 10  2004 c0t0d0s6 -> ../../devices/pci@1e,600000/ide@d/sd@0,0:g,raw
lrwxrwxrwx   1 root     root          46 Aug 10  2004 c0t0d0s7 -> ../../devices/pci@1e,600000/ide@d/sd@0,0:h,raw
lrwxrwxrwx   1 root     root          47 Aug 10  2004 c1t0d0s0 -> ../../devices/pci@1c,600000/scsi@2/sd@0,0:a,raw
lrwxrwxrwx   1 root     root          47 Aug 10  2004 c1t0d0s1 -> ../../devices/pci@1c,600000/scsi@2/sd@0,0:b,raw
lrwxrwxrwx   1 root     root          47 Aug 10  2004 c1t0d0s2 -> ../../devices/pci@1c,600000/scsi@2/sd@0,0:c,raw
lrwxrwxrwx   1 root     root          47 Aug 10  2004 c1t0d0s3 -> ../../devices/pci@1c,600000/scsi@2/sd@0,0:d,raw
lrwxrwxrwx   1 root     root          47 Aug 10  2004 c1t0d0s4 -> ../../devices/pci@1c,600000/scsi@2/sd@0,0:e,raw
lrwxrwxrwx   1 root     root          47 Aug 10  2004 c1t0d0s5 -> ../../devices/pci@1c,600000/scsi@2/sd@0,0:f,raw
lrwxrwxrwx   1 root     root          47 Aug 10  2004 c1t0d0s6 -> ../../devices/pci@1c,600000/scsi@2/sd@0,0:g,raw
lrwxrwxrwx   1 root     root          47 Aug 10  2004 c1t0d0s7 -> ../../devices/pci@1c,600000/scsi@2/sd@0,0:h,raw
lrwxrwxrwx   1 root     root          47 Aug 10  2004 c1t1d0s0 -> ../../devices/pci@1c,600000/scsi@2/sd@1,0:a,raw
lrwxrwxrwx   1 root     root          47 Aug 10  2004 c1t1d0s1 -> ../../devices/pci@1c,600000/scsi@2/sd@1,0:b,raw
lrwxrwxrwx   1 root     root          47 Aug 10  2004 c1t1d0s2 -> ../../devices/pci@1c,600000/scsi@2/sd@1,0:c,raw
lrwxrwxrwx   1 root     root          47 Aug 10  2004 c1t1d0s3 -> ../../devices/pci@1c,600000/scsi@2/sd@1,0:d,raw
lrwxrwxrwx   1 root     root          47 Aug 10  2004 c1t1d0s4 -> ../../devices/pci@1c,600000/scsi@2/sd@1,0:e,raw
lrwxrwxrwx   1 root     root          47 Aug 10  2004 c1t1d0s5 -> ../../devices/pci@1c,600000/scsi@2/sd@1,0:f,raw
lrwxrwxrwx   1 root     root          47 Aug 10  2004 c1t1d0s6 -> ../../devices/pci@1c,600000/scsi@2/sd@1,0:g,raw
lrwxrwxrwx   1 root     root          47 Aug 10  2004 c1t1d0s7 -> ../../devices/pci@1c,600000/scsi@2/sd@1,0:h,raw

# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1c,600000/scsi@2/sd@0,0
       1. c1t1d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1c,600000/scsi@2/sd@1,0
Specify disk (enter its number):


# prtvtoc /dev/rdsk/c1t0d0s2
* /dev/rdsk/c1t0d0s2 partition map
*
* Dimensions:
*     512 bytes/sector
*     424 sectors/track
*      24 tracks/cylinder
*   10176 sectors/cylinder
*   14089 cylinders
*   14087 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00    9514560   8191680  17706239
       1      3    01          0   8395200   8395199
       2      5    00          0 143349312 143349311
       3      7    00    8466432   1048128   9514559
       4      0    00   51266688  33560448  84827135
       5      0    00   17706240  33560448  51266687
       6      8    00   84827136  58522176 143349311
       7      0    00    8395200     71232   8466431


# prtvtoc /dev/rdsk/c1t1d0s2
* /dev/rdsk/c1t1d0s2 partition map
*
* Dimensions:
*     512 bytes/sector
*     424 sectors/track
*      24 tracks/cylinder
*   10176 sectors/cylinder
*   14089 cylinders
*   14087 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00    9514560   8191680  17706239
       1      3    01          0   8395200   8395199
       2      5    00          0 143349312 143349311
       3      7    00    8466432   1048128   9514559
       4      0    00   51266688  33560448  84827135
       5      0    00   17706240  33560448  51266687
       6      8    00   84827136  58522176 143349311
       7      0    00    8395200     71232   8466431
#
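
In the prtvtoc output each row should satisfy: last sector = first sector + sector
count - 1, and with 512 bytes/sector a slice size in MB is the sector count divided
by 2048. The sketch below checks both for rows read from stdin. check_vtoc is a
hypothetical helper and assumes a POSIX awk. The 512-byte sector size is taken from
the "Dimensions" header above and is hard-coded.

```shell
#!/bin/sh
# Read prtvtoc-style rows (partition tag flags first count last) from stdin,
# skip the '*' comment lines, and report size and consistency per slice.
check_vtoc() {
    awk '!/^\*/ && NF >= 6 {
        first = $4; count = $5; last = $6
        status = (first + count - 1 == last) ? "ok" : "MISMATCH"
        # 512-byte sectors, so 2048 sectors per MB (fraction is dropped)
        printf "slice %d: %d MB %s\n", $1, count / 2048, status
    }'
}

check_vtoc <<'EOF'
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00    9514560   8191680  17706239
       3      7    00    8466432   1048128   9514559
EOF
```

On a live system the function could be fed directly: prtvtoc /dev/rdsk/c1t0d0s2 | check_vtoc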



Intel's x86 processors and their clones are little endian.
Sun's SPARC, Motorola's 68K, and the PowerPC families are all big endian.
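
A quick way to see the byte order of the machine at hand is to write two known bytes
and let od read them back as one 2-byte word. This is a sketch only: byte_order is a
hypothetical helper name, and it assumes an od that supports the POSIX options
-A n and -t x2.

```shell
#!/bin/sh
# Report the byte order of the machine this runs on.
byte_order() {
    # Bytes 0x01 0x02 read as one 16-bit word: 0201 on little endian,
    # 0102 on big endian.
    word=$(printf '\001\002' | od -An -tx2 | tr -d ' \t\n')
    case $word in
        0201) echo little ;;
        0102) echo big ;;
        *)    echo unknown ;;
    esac
}

byte_order
```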


Contents on AIX /dev:
---------------------

# ls -al

..
..
crw-------   1 root     system       17,  0 Aug 08 12:00 tty0
crw-rw-rw-   1 root     system       22,  0 May 27 17:45 ttyp0
crw-rw-rw-   1 root     system       22,  1 May 27 17:45 ttyp1
crw-rw-rw-   1 root     system       22,  2 May 27 17:45 ttyp2
..
..
brw-rw----   1 root     system       10,  7 May 27 17:46 hd3
brw-rw----   1 root     system       10,  4 Jun 27 15:51 hd4
brw-rw----   1 root     system       10,  1 Aug 08 11:41 hd5
brw-rw----   1 root     system       10,  2 May 27 17:46 hd6
brw-rw----   1 root     system       10,  3 May 27 17:46 hd8
brw-rw----   1 root     system       10,  6 May 27 17:46 hd9var
brw-------   1 root     system       16,  7 May 27 17:44 hdisk0
brw-------   1 root     system       16,  2 May 27 17:44 hdisk1
brw-------   1 root     system       16, 10 May 27 18:23 hdisk10
brw-------   1 root     system       16, 12 May 27 18:23 hdisk11
brw-------   1 root     system       16,  5 May 27 17:44 hdisk2
brw-------   1 root     system       16, 20 May 27 18:23 hdisk20
brw-------   1 root     system       16, 21 May 27 18:23 hdisk21
brw-------   1 root     system       16, 22 May 27 18:23 hdisk22
..
..

A leading 'c' in the mode field means that the device is a character device, as
opposed to a block device ('b' prefix) or an ordinary file ('-' prefix).
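
The same distinction can be made in a script by looking at the first character of an
"ls -l" line. A minimal sketch follows. file_kind is a hypothetical helper name, and
the two sample lines are taken from the listing above.

```shell
#!/bin/sh
# Classify an "ls -l" line by the first character of its mode field.
file_kind() {
    case $1 in
        c*) echo "character device" ;;
        b*) echo "block device" ;;
        d*) echo "directory" ;;
        l*) echo "symbolic link" ;;
        -*) echo "ordinary file" ;;
        *)  echo "other" ;;
    esac
}

file_kind 'crw-------   1 root     system       17,  0 Aug 08 12:00 tty0'
file_kind 'brw-rw----   1 root     system       10,  7 May 27 17:46 hd3'
```

When the file itself is at hand, the standard test operators do the same job without
parsing ls output: [ -c /dev/tty0 ] is true for a character device and [ -b /dev/hd3 ]
for a block device.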




=========================
8. Current machines 2005:
=========================


8.1 AIX machines:
=================

1. eServer p5 family:
---------------------

  model            dimensions             cpu            GHz           Max mem GB
  p5-510 Express   19" 2U                 1 or 2         1.5           32
  p5-520 Express   19" 4U or deskside     1 or 2         1.5           32
  p5-550 Express   19" 4U or deskside     1,2,4          1.5           64
  p5-570 Express   19" rack               2,4,8          1.5           128

  model            dimensions             cpu            GHz           Max mem GB
  p5-510           19" 2U                 1 or 2         1.65          32
  p5-520           19" 4U or deskside     2              1.65          32
* p5-550           19" 4U or deskside     1,2 or 4       1.5 or 1.65   64
  p5-570           19" rack               2,4,8,12,16    1.65 or 1.90  128

  model            dimensions             cpu            GHz           Max mem GB
  p5-575           24" frame              8              1.90          256
  p5-590           24" frame              8 to 32        1.65          256
  p5-595           24" frame              16 to 64       1.65 or 1.90  256 / 512


More info on the 550:
---------------------

5 hotplug PCI slots,
1 embedded Ultra320 SCSI dual channel controller,
1 10/100/1000 Mbps integrated dual port Ethernet controller,
2 service processor communications ports,
2 USB 2 ports,
2 HMC ports,
2 RIO ports = Remote IO ports, e.g. for connecting the system to an external drawer,
2 System Power Control Network (SPCN) ports,
1 or 2 hotswap capable 4-disk bays, holding 4 x 146 GB or 8 x 146 GB Ultra320 SCSI disks.

SPCN interface: System Power Control Network is a microprocessor based operating system that controls all aspects
of the IBM power network, including power on/off, sequencing, FRU isolation, battery testing, etc.
So SPCN is used for power control.

RIO interface: for connecting the system to external drawer.

SPC interface: Service Processor Communications port. Can be used to connect a terminal.


8.2 former pSeries:
===================

- pSeries 655
  (Rack-mount)

Easy-to-manage 4- or 8-way ultra-dense cluster-optimized server for HPC and BI.

Processor 64-bit POWER4+ 
Clock rates (Min/Max) 1.50GHz / 1.70GHz 
System memory (Std/Max) 4GB / 64GB 
Internal storage (Std/Max) 72.8GB / 2.6TB 
Performance (rPerf range) 15.22 to 21.87

- pSeries 670
  (Rack-mount)

The 4- to 16-way p670 is packed with the same Capacity on Demand (CoD) capabilities and innovative 
technology as the flagship p690.

Processor POWER4+ 
Clock rates (Min/Max) 1.50GHz 
System memory (Std/Max) 4GB / 256GB 
Internal storage (Std/Max) 72.8GB / 7.0TB 
Performance (rPerf range) 13.66 to 46.79

- pSeries 690

8- to 32-way enterprise-class AIX 5L/Linux server 

Processor 64-bit POWER4+ 
Clock rates (Min/Max) 1.50GHz / 1.90GHz 
System memory (Std/Max) 8GB / 1TB 
Internal storage (Std/Max) 72.8GB / 18.7TB 
Performance (rPerf range) 27.11 to 104.17
 


8.3 Recent Sun machines:
========================

Sun Fire High-End:
------------------

Sun Fire High-end Servers
Sun's flagship servers powered by the new UltraSPARC IV+ processors work with the Solaris Operating System 
to offer a stellar enterprise consolidation platform for customers: over five times greater performance, 
doubled memory capacity and significantly increased I/O performance over previous generation systems. 
Compatible with multiple platforms and CPU speeds, the Solaris 10 OS allows customers to quickly and easily 
take advantage of the latest server technology and the industry's only true technology systems 
investment protection.

Sun Fire E20K

High-end computing, affordability, and scalability: The Sun Fire E20K server gives you 36 UltraSPARC IV+ processors 
and 72 simultaneous threads, with mainframe class reliability and security. Later, scale it up to the full 
Sun Fire E25K capacities.

Scales up to 36 UltraSPARC IV+ processors with 72 threads, over 5x faster performance over UltraSPARC III 
processor-based servers
Doubled memory capacity with 2GB DIMMs: 576 GB memory
Unique Uniboard technology to provision up to 9 CPU/Memory Uniboards on the fly
Easily upgraded up to 72 processors with 144 threads and over 1TB memory
High-bandwidth PCI-X support for I/O intensive applications
Mix and match UltraSPARC III, IV and IV+ processors in the same system
Dynamic System Domains and Solaris Containers
Predictive Self-Healing/Automatic System Recovery
Solaris Security Toolkit
Capacity on Demand 2.0 to add resources only when needed
Solaris 9 9/05 and Solaris 10 3/05 HW1


Sun Fire E25K 

Scales up to 72 UltraSPARC IV+ processors with 144 threads, over 5x faster performance compared to UltraSPARC III 
processor-based servers
Doubled memory capacity with 2GB DIMMs: over 1TB memory
Unique Uniboard technology to provision up to 18 CPU/Memory Uniboards on the fly
High-bandwidth PCI-X support for I/O intensive applications
Mix and match UltraSPARC III, IV and IV+ processors in the same system
Dynamic System Domains and Solaris Containers
Predictive Self-Healing/Automatic System Recovery
Solaris Security Toolkit
Capacity on Demand 2.0 to add resources only when needed
Solaris 9 9/05 and Solaris 10 3/05 HW1


Sun Fire Mid-Range:
-------------------

Sun Fire Midrange Servers
Designed for compute density backed by enterprise-class features, Sun's midrange servers deliver high performance 
and protect your IT investments over time. Sun has enhanced the industry's leading 64-bit server line with the 
UltraSPARC IV+ processor, offering up to five times the performance of previous UltraSPARC III systems. 
Yet you retain Sun's migration-enabling unbroken binary application compatibility with the Solaris OS. 

 
Sun Fire V490
Scales up to four processors, eight simultaneous compute threads 

Sun Fire V890
Scales up to eight processors, 16 simultaneous compute threads 

Sun Fire E2900
Scales up to 12 processors, 24 simultaneous compute threads 

Sun Fire E4900
Scales up to 12 processors, 24 simultaneous compute threads 

Sun Fire E6900
Scales up to 24 processors, 48 simultaneous compute threads 


Sun Fire Low-End:
-----------------

V125
V215
V245
V445

V210
V240
etc..

Sun Fire x64 Servers:
---------------------

These servers can run many operating systems, including Solaris OS, Linux, Windows or VMware.
These are the Sun Blade machines. 


8.4 Recent HP Servers:
======================

 _ HP ProLiant servers 
 _ HP ProLiant DL 
 _ HP ProLiant ML 
 _ HP ProLiant BL blades 
 
 _ HP Integrity servers 
 _ Entry-class 
 _ Mid-range 
 _ Superdome (high-end) 
 _ HP Integrity BL blades 
 
 _ HP Integrity NonStop servers 
 _ HP 9000 servers 
 _ HP AlphaServer systems 
 _ Telco and carrier-grade servers 
 _ HP e3000 servers 
 

HP servers running HP-UX 11i :

 _ HP 9000 servers 
  PA-RISC powered servers

 _ HP Integrity servers 
  Industry standard 
  Itanium 2 based servers 

 _ HP Telco servers 
Specially designed for the telecom and service provider industries 


 
 





=======================================
9. Most important pSeries LED Codes:
=======================================

MCA LED codes:
--------------

Booting BIST phase: LEDs 100-195, reporting hardware status
Booting POST phase: LEDs 200-2E7, while locating the BLV (boot logical volume)
LED 200: key in secure position
LED 299: BLV will be loaded
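
A small sketch of how the two ranges above could be used to label an LED value. The function name
led_phase is made up for this example, and it assumes the three-character LED value can be read as
a hexadecimal number (so that values such as 20c or 2E7 compare correctly):

```shell
#!/bin/sh
# led_phase CODE -> prints BIST, POST or other, using the ranges noted above.
led_phase() {
    # Assumption: the LED value is hexadecimal; printf converts it to decimal.
    code=$(printf '%d' "0x$1" 2>/dev/null) || return 1
    if [ "$code" -ge $((0x100)) ] && [ "$code" -le $((0x195)) ]; then
        echo "BIST"
    elif [ "$code" -ge $((0x200)) ] && [ "$code" -le $((0x2E7)) ]; then
        echo "POST"
    else
        echo "other"
    fi
}

led_phase 164
led_phase 20c
led_phase 551
```

Anything outside the two ranges is simply reported as "other"; the later phases are covered by the
lists further down.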

PCI systems and pSeries LED codes:
----------------------------------

Reduced ODM from BLV copied into RAMFS: OK=510, NOT OK=548
LED 511: bootinfo -b is called to determine the last boot device
ipl_varyon of rootvg: OK=517, ELSE 551, 552, 554, 556
LED 555,557: mount /dev/hd4 on temporary mountpoint /mnt
LED 518: mount /usr, /var
LED 553: syncvg rootvg, or inittab problem
LED 549
LED 581: tcp/ip is being configured, and there is some problem

The last phase of the boot is where cfgcon is called to configure the console.
cfgcon LED codes include:
C31: Console not yet configured.
C32: Console is an LFT terminal
C33: Console is a TTY
C34: Console is a file on disk
C99: Could not detect a console device

LED 551: ipl_varyon of rootvg

201           : Damaged boot image
223-229       : Invalid boot list
551,555,557   : Corrupted filesystem, corrupted JFS log
552,554,556   : Superblock corrupted, corrupted customized ODM database
553           : Corrupted /etc/inittab file
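
The table above can be turned into a quick lookup, for example in a notes or helpdesk script.
This is only a sketch: the function name led_cause is hypothetical, and it knows nothing beyond
the five rows listed above.

```shell
#!/bin/sh
# led_cause CODE -> prints the likely cause from the table above,
# returns 1 if the code is not in that table.
led_cause() {
    case "$1" in
        201)          echo "Damaged boot image" ;;
        22[3-9])      echo "Invalid boot list" ;;
        551|555|557)  echo "Corrupted filesystem, corrupted JFS log" ;;
        552|554|556)  echo "Superblock corrupted, corrupted customized ODM database" ;;
        553)          echo "Corrupted /etc/inittab file" ;;
        *)            echo "No entry for LED $1 in this table"; return 1 ;;
    esac
}

led_cause 553
led_cause 225
```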

Firmware that leads to LED code:
--------------------------------

LED code 888 right after boot: followed by 102 it indicates a software problem; followed by 103, a hardware or software problem

rc.boot LED codes:
------------------

rc.boot1

  init          success=F05 error=c06                             
  restbase      copies bootimage ODM -> RAM fs ODM:  success=510  error=548
  cfgmgr -f     configures all base devices needed to access rootvg
  bootinfo -b

end rc.boot 1   LED=511
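
The success/error pairs above follow one pattern: run a step, then show one LED value if it worked
and another if it failed. A minimal simulation of that pattern follows. It is not the real rc.boot
code; run_step is a hypothetical helper, and "true" / "false" only stand in for the exit status of
the real commands.

```shell
#!/bin/sh
# run_step NAME OK_LED ERR_LED COMMAND [ARGS...]
# Runs COMMAND and prints the LED value associated with its outcome.
run_step() {
    name=$1
    ok_led=$2
    err_led=$3
    shift 3
    if "$@"; then
        echo "$name: LED $ok_led"
    else
        echo "$name: LED $err_led"
        return 1
    fi
}

# restbase succeeded: 510 is shown and the boot continues.
run_step restbase 510 548 true

# restbase failed: 548 is shown and the boot would stop here.
run_step restbase 510 548 false || echo "boot halted"
```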




PCI / RS6000 LED Codes:
========================


=============
1.

Built-In Self-Test (BIST) Indicators
------------------------------------

100 BIST completed successfully; control was passed to IPL ROS.
101 BIST started following reset.
102 BIST started, following the system unit's power-on reset.
103 BIST could not determine the system model number.
104 Equipment conflict; BIST could not find the CBA.
105 BIST could not read from the OCS EPROM.
106 BIST failed: CBA not found
111 OCS stopped; BIST detected a module error.
112 A checkstop occurred during BIST; checkstop results could not be logged out.
113 Three checkstops have occurred.
120 BIST starting a CRC check on the 8752 EPROM.
121 BIST detected a bad CRC in the first 32K bytes of the OCS EPROM.
122 BIST started a CRC check on the first 32K bytes of the OCS EPROM.
123 BIST detected a bad CRC on the OCS area of NVRAM.
124 BIST started a CRC check on the OCS area of NVRAM.
125 BIST detected a bad CRC on the time-of-day area of NVRAM.
126 BIST started a CRC check on the time-of-day area of NVRAM.
127 BIST detected a bad CRC on the 8752 EPROM.
130 BIST presence test started.
140 Running BIST. (Box Manufacturing Mode Only)
142 Box manufacturing mode operation.
143 Invalid memory configuration.
144 Manufacturing test failure.
151 BIST started AIPGM test code.
152 BIST started DCLST test code.
153 BIST started ACLST test code.
154 BIST started AST test code.
160 Bad EPOW Signal/Power status signal.
161 BIST being conducted on BUMP I/O.
162 BIST being conducted on JTAG.
163 BIST being conducted on Direct I/O.
164 BIST being conducted on CPU.
165 BIST being conducted on DCB and Memory.
166 BIST being conducted on Interrupts.
170 BIST being conducted on Multi-Processors.
180 Logout in progress.
182 BIST COP bus not responding.
185 A checkstop condition occurred during the BIST.
186 System logic-generated checkstop (Model 250 only).
187 Graphics-generated checkstop (Model 250).
195 Checkstop logout complete
199 Generic SCSI backplane
888 BIST did not start.

Power-On Self-Test (POST) Indicators
------------------------------------ 

200 IPL attempted with keylock in the Secure position.
201 IPL ROM test failed or checkstop occurred (irrecoverable).
202 Unexpected machine check interrupt.
203 Unexpected data storage interrupt.
204 Unexpected instruction storage interrupt.
205 Unexpected external interrupt.
206 Unexpected alignment interrupt.
207 Unexpected program interrupt.
208 Unexpected floating point unavailable interrupt.
209 Unexpected SVC interrupt.
20c L2 cache POST error. (The display shows a solid 20c for 5 seconds.)
210 Unexpected SVC interrupt.
211 IPL ROM CRC comparison error (irrecoverable).
212 RAM POST memory configuration error or no memory found (irrecoverable).
213 RAM POST failure (irrecoverable).
214 Power status register failed (irrecoverable).
215 A low voltage condition is present (irrecoverable).
216 IPL ROM code being uncompressed into memory.
217 End of boot list encountered.
218 RAM POST is looking for good memory.
219 RAM POST bit map is being generated.
21c L2 cache is not detected. (The display shows a solid 21c for 2 seconds.)
220 IPL control block is being initialized.
221 NVRAM CRC comparison error during AIX IPL (key mode switch in Normal mode).
Reset NVRAM by repeating the IPL in Service mode. For systems with an
internal, direct-bus-attached (DBA) disk, IPL ROM attempted to perform an IPL from
that disk before halting with this operator panel display value.
222 Attempting a Normal mode IPL from Standard I/O planar-attached devices specified 
in NVRAM IPL Devices List.
223 Attempting a Normal mode IPL from SCSI-attached devices specified in NVRAM IPL 
Devices List.
224 Attempting a Normal mode IPL from 9333 subsystem device specified in NVRAM IPL 
Devices List.
225 Attempting a Normal mode IPL from 7012 DBA disk-attached devices specified in 
NVRAM IPL Devices List.
226 Attempting a Normal mode IPL from Ethernet specified in NVRAM IPL Devices List.
227 Attempting a Normal mode IPL from Token-Ring specified in NVRAM IPL Devices List.
228 Attempting a Normal mode IPL from NVRAM expansion code.
229 Attempting a Normal mode IPL from NVRAM IPL Devices List; cannot IPL from any
of the listed devices, or there are no valid entries in the Devices List.
22c Attempting a normal mode IPL from FDDI specified in NVRAM IPL device list.
230 Attempting a Normal mode IPL from adapter feature ROM specified in IPL ROM
Device List.
231 Attempting a Normal mode IPL from Ethernet specified in IPL ROM Device List.
232 Attempting a Normal mode IPL from Standard I/O planar-attached devices specified
in ROM Default Device List.
233 Attempting a Normal mode IPL from SCSI-attached devices specified in IPL ROM
Default Device List.
234 Attempting a Normal mode IPL from 9333 subsystem device specified in IPL ROM
Device List.
235 Attempting a Normal mode IPL from 7012 DBA disk-attached devices specified in
IPL ROM Default Device List.
236 Attempting a Normal mode IPL from Ethernet specified in IPL ROM Default Device
List.
237 Attempting a Normal mode IPL from Token-Ring specified in IPL ROM Default
Device List.
238 Attempting a Normal mode IPL from Token-Ring specified by the operator.
239 System failed to IPL from the device chosen by the operator.
23c Attempting a normal mode IPL from FDDI specified in IPL ROM device list.
240 Attempting a Service mode IPL from adapter feature ROM.
241 Attempting a normal boot from devices specified in the NVRAM boot list.
242 Attempting a Service mode IPL from Standard I/O planar-attached devices specified
in the NVRAM IPL Devices List.
243 Attempting a Service mode IPL from SCSI-attached devices specified in the
NVRAM IPL Devices List.
244 Attempting a Service mode IPL from 9333 subsystem device specified in the
NVRAM IPL Devices List.
245 Attempting a Service mode IPL from 7012 DBA disk-attached devices specified in
the NVRAM IPL Devices List.
246 Attempting a Service mode IPL from Ethernet specified in the NVRAM IPL Devices
List.
247 Attempting a Service mode IPL from Token-Ring specified in the NVRAM Device
List.
248 Attempting a Service mode IPL from NVRAM expansion code.
249 Attempting a Service mode IPL from the NVRAM IPL Devices List; cannot IPL from
any of the listed devices, or there are no valid entries in the Devices List.
24c Attempting a service mode IPL from FDDI specified in NVRAM IPL device list.
250 Attempting a Service mode IPL from adapter feature ROM specified in the IPL ROM
Device List.
251 Attempting a Service mode IPL from Ethernet specified in the IPL ROM Default
Device List.
252 Attempting a Service mode IPL from Standard I/O planar-attached devices specified
in the ROM Default Device List.
253 Attempting a Service mode IPL from SCSI-attached devices specified in the IPL
ROM Default Device List.
254 Attempting a Service mode IPL from 9333 subsystem device specified in the IPL
ROM Devices List.
255 Attempting a Service mode IPL from 7012 DBA disk-attached devices specified in
IPL ROM Default Device List.
256 Attempting a Service mode IPL from Ethernet specified in the IPL ROM Devices
List.
257 Attempting a Service mode IPL from Token-Ring specified in the IPL ROM Devices
List.
258 Attempting a Service mode IPL from Token-Ring specified by the operator.
259 Attempting a Service mode IPL from FDDI specified by the operator.
25c Attempting a service mode IPL from FDDI specified in IPL ROM device list.
260 Information is being displayed on the display console.
261 No supported local system display adapter was found.
262 Keyboard not detected as being connected to the system's keyboard port.
263 Attempting a Normal mode IPL from adapter feature ROM specified in the NVRAM
Device List.
269 Stalled state - the system is unable to IPL.
270 Low Cost Ethernet Adapter (LCE) POST executing
271 Mouse and Mouse port POST.
272 Tablet Port POST.
276 10/100Mbps MCA Ethernet Adapter POST executing
277 Auto Token-Ring LANstreamer MC 32 Adapter.
278 Video ROM scan POST.
279 FDDI POST.
280 3com Ethernet POST.
281 Keyboard POST executing.
282 Parallel port POST executing.
283 Serial port POST executing.
284 POWER Gt1 graphics adapter POST executing.
285 POWER Gt3 graphics adapter POST executing.
286 Token-Ring adapter POST executing.
287 Ethernet adapter POST executing.
288 Adapter card slots being queried.
289 POWER GT0 Display Adapter POST.
290 IOCC POST error (irrecoverable).
291 Standard I/O POST running.
292 SCSI POST running.
293 7012 DBA disk POST running.
294 IOCC bad TCW memory module in slot location J being tested.
295 Graphics Display adapter POST, color or grayscale.
296 ROM scan POST.
297 System model number does not compare between OCS and ROS (irrecoverable).
298 Attempting a software IPL.
299 IPL ROM passed control to the loaded program code.
301 Flash Utility ROM test failed or checkstop occurred (irrecoverable).
302 Flash Utility ROM: User prompt, move the key to the service position in order to
perform an optional Flash Update. LED 3d2 will only appear if the key switch is in
the secure position. This signals the user that a Flash Update may be initiated by
moving the key switch to the service position. If the key is moved to the service
position then LED 3d3 will be displayed, this signals the user to press the Reset
button and select optional Flash Update.
303 Flash Utility ROM: User prompt, press the Reset button in order to perform an
optional Flash Update. LED 3d2 will only appear if the key switch is in the secure
position. This signals the user that a Flash Update may be initiated by moving the
key switch to the service position. If the key is moved to the service position LED
3d3 will be displayed, this signals the user to press the Reset button and select
optional Flash Update.
304 Flash Utility ROM IOCC POST error (irrecoverable).
305 Flash Utility ROM standard I/O POST running.
306 Flash Utility ROM is attempting IPL from Flash Update media device.
307 Flash Utility ROM system model number does not compare between OCS and
ROM (irrecoverable).
308 Flash Utility ROM: IOCC TCW memory is being tested.
309 Flash Utility ROM passed control to a Flash Update Boot Image.
311 Flash Utility ROM CRC comparison error (irrecoverable).
312 Flash Utility ROM RAM POST memory configuration error or no memory found
(irrecoverable).
313 Flash Utility ROM RAM POST failure (irrecoverable).
314 Flash Utility ROM Power status register failed (irrecoverable).
315 Flash Utility ROM detected a low voltage condition.
318 Flash Utility ROM RAM POST is looking for good memory.
319 Flash Utility ROM RAM POST bit map is being generated.
322 CRC error on media Flash Image. No Flash Update performed.
323 Current Flash Image is being erased.
324 CRC error on new Flash Image after Update was performed. (Flash Image is corrupted.)
325 Flash Update successful and complete.

Configuration Program Indicators
--------------------------------

500 Querying Standard I/O slot.
501 Querying card in Slot 1.
502 Querying card in Slot 2.
503 Querying card in Slot 3.
504 Querying card in Slot 4.
505 Querying card in Slot 5.
506 Querying card in Slot 6.
507 Querying card in Slot 7.
508 Querying card in Slot 8.
510 Starting device configuration.
511 Device configuration completed.
512 Restoring device configuration files from media.
513 Restoring basic operating system installation files from media.
516 Contacting server during network boot.
517 Mounting client remote file system during network IPL.
518 Remote mount of the root and /usr file systems failed during network boot.
520 Bus configuration running.
521 /etc/init invoked cfgmgr with invalid options; /etc/init has been corrupted or incorrectly
modified (irrecoverable error).
522 The configuration manager has been invoked with conflicting options (irrecoverable
error).
523 The configuration manager is unable to access the ODM database (irrecoverable
error).
524 The configuration manager is unable to access the config.rules object in the ODM
database (irrecoverable error).
525 The configuration manager is unable to get data from a customized device object in
the ODM database (irrecoverable error).
526 The configuration manager is unable to get data from a customized device driver
object in the ODM database (irrecoverable error).
527 The configuration manager was invoked with the phase 1 flag; running phase 1 at
this point is not permitted (irrecoverable error).
528 The configuration manager cannot find sequence rule, or no program name was
specified in the ODM database (irrecoverable error).
529 The configuration manager is unable to update ODM data (irrecoverable error).
530 The program savebase returned an error.
531 The configuration manager is unable to access the PdAt object class (irrecoverable
error).
532 There is not enough memory to continue (malloc failure); irrecoverable error.
533 The configuration manager could not find a configure method for a device.
534 The configuration manager is unable to acquire database lock (irrecoverable error).
535 HIPPI diagnostics interface driver being configured.
536 The configuration manager encountered more than one sequence rule specified in
the same phase (irrecoverable error).
537 The configuration manager encountered an error when invoking the program in the
sequence rule.
538 The configuration manager is going to invoke a configuration method.
539 The configuration method has terminated, and control has returned to the configuration
manager.
551 IPL vary-on is running.
552 IPL varyon failed.
553 IPL phase 1 is complete.
554 The boot device could not be opened or read, or unable to define NFS swap device
during network boot.
555 An ODM error occurred when trying to varyon the rootvg, or unable to create an
NFS swap device during network boot.
556 Logical Volume Manager encountered error during IPL vary-on.
557 The root filesystem will not mount.
558 There is not enough memory to continue the system IPL.
559 Less than 2 M bytes of good memory are available to load the AIX kernel.
570 Virtual SCSI devices being configured.
571 HIPPI common function device driver being configured.
572 HIPPI IPI-3 master transport driver being configured.
573 HIPPI IPI-3 slave transport driver being configured.
574 HIPPI IPI-3 transport services user interface device driver being configured.
575 A 9570 disk-array driver is being configured.
576 Generic async device driver being configured.
577 Generic SCSI device driver being configured.
578 Generic commo device driver being configured.
579 Device driver being configured for a generic device.
580 HIPPI TCPIP network interface driver being configured.
581 Configuring TCP/IP.
582 Configuring Token-Ring data link control.
583 Configuring an Ethernet data link control.
584 Configuring an IEEE Ethernet data link control.
585 Configuring an SDLC MPQP data link control.
586 Configuring a QLLC X.25 data link control.
587 Configuring a NETBIOS.
588 Configuring a Bisync Read-Write (BSCRW).
589 SCSI target mode device being configured.
590 Diskless remote paging device being configured.
591 Configuring an LVM device driver.
592 Configuring an HFT device driver.
593 Configuring SNA device drivers.
594 Asynchronous I/O being defined or configured.
595 X.31 pseudo-device being configured.
596 SNA DLC/LAPE pseudo-device being configured.
597 OCS software being configured.
598 OCS hosts being configured during system reboot.
599 Configuring FDDI data link control.
5c0 Streams-based hardware drive being configured.
5c1 Streams-based X.25 protocol being configured.
5c2 Streams-based X.25 COMIO emulator driver being configured.
5c3 Streams-based X.25 TCP/IP interface driver being configured.
5c4 FCS adapter device driver being configured.
5c5 SCB network device driver for FCS is being configured.
5c6 AIX SNA channel being configured.
600 Starting network boot portion of /sbin/rc.boot
602 Configuring network parent devices.
603 /usr/lib/methods/defsys, /usr/lib/methods/cfgsys, or /usr/lib/methods/cfgbus
failed.
604 Configuring physical network boot device.
605 Configuration of physical network boot device failed.
606 Running /usr/sbin/ifconfig on logical network boot device.
607 /usr/sbin/ifconfig failed.
608 Attempting to retrieve the client.info file with tftp. Note that a flashing 608 indicates
that multiple attempts to retrieve the client.info file are occurring.
609 The client.info file does not exist or it is zero length.
610 Attempting remote mount of NFS file system.
611 Remote mount of the NFS file system failed.
612 Accessing remote files; unconfiguring network boot device.
614 Configuring local paging devices.
615 Configuration of a local paging device failed.
616 Converting from diskless to dataless configuration.
617 Diskless to dataless configuration failed.
618 Configuring remote (NFS) paging devices.
619 Configuration of a remote (NFS) paging device failed.
620 Updating special device files and ODM in permanent filesystem with data from boot
RAM filesystem.
622 Boot process configuring for operating system installation.
650 IBM SCSD disk drive being configured
668 25MB ATM MCA Adapter being configured
680 POWER GXT800M Graphics Adapter
689 4.5GB Ultra SCSI Single Ended Disk Drive being configured
690 9.1GB Ultra SCSI Single Ended Disk Drive being configured
694 Eicon ISDN DIVA MCA Adapter for PowerPC Systems
700 Progress indicator. A 1.1 GB 8-bit SCSI disk drive being identified or configured.
701 Progress indicator. A 1.1 GB 16-bit SCSI disk drive is being identified or configured.
702 Progress indicator. A 1.1 GB 16-bit differential SCSI disk drive is being identified or
configured.
703 Progress indicator. A 2.2 GB 8-bit SCSI disk drive is being identified or configured.
704 Progress indicator. A 2.2 GB 16-bit SCSI disk drive is being identified or configured.
705 The configuration method for the 2.2 GB 16-bit differential SCSI disk drive is being
run. If an irrecoverable error occurs, the system halts.
706 Progress indicator. A 4.5 GB 16-bit SCSI disk drive is being identified or configured.
707 Progress indicator. A 4.5 GB 16-bit differential SCSI disk drive is being identified or
configured.
708 Progress indicator. An L2 cache is being identified or configured.
710 POWER GXT150M graphics adapter being identified or configured.
711 Unknown adapter being identified or configured.
712 Graphics slot bus configuration is executing.
713 The IBM ARTIC960 device is being configured.
714 A video capture adapter is being configured.
715 The Ultimedia Services audio adapter is being configured. This LED displays briefly
on the panel.
717 TP Ethernet Adapter being configured.
718 GXT500 Graphics Adapter being configured.
720 Unknown read/write optical drive type being configured.
721 Unknown disk or SCSI device being identified or configured.
722 Unknown disk being identified or configured.
723 Unknown CD-ROM being identified or configured.
724 Unknown tape drive being identified or configured.
725 Unknown display adapter being identified or configured.
726 Unknown input device being identified or configured.
727 Unknown async device being identified or configured.
728 Parallel printer being identified or configured.
729 Unknown parallel device being identified or configured.
730 Unknown diskette drive being identified or configured.
731 PTY being identified or configured.
732 Unknown SCSI initiator type being configured.
733 7GB 8mm tape drive being configured.
734 4x SCSI-2 640MB CD-ROM Drive
741 1080MB SCSI Disk Drive
745 16GB 4mm Tape Auto Loader
748 MCA keyboard/mouse adapter being configured.
749 7331 Model 205 Tape Library
754 1.1GB 16-bit SCSI disk drive being configured.
755 2.2GB 16-bit SCSI disk drive being configured.
756 4.5GB 16-bit SCSI disk drive being configured.
757 External 13GB 1.5M/s 1/4 inch tape being configured.
772 4.5GB SCSI F/W Disk Drive
773 9.1GB SCSI F/W Disk Drive
774 9.1GB External SCSI Disk Drive
77c Progress indicator. A 1.0 GB 16-bit SCSI disk drive being identified or configured.
783 4mm DDS-2 Tape Autoloader
789 2.6GB External Optical Drive
794 10/100MB Ethernet PX MC Adapter
797 Turboways 155 UTP/STP ATM Adapter being identified or configured.
798 Video streamer adapter being identified or configured.
800 Turboways 155 MMF ATM Adapter being identified or configured.
803 7336 Tape Library Robotics being configured
804 8x Speed SCSI-2 CD ROM drive being configured
807 SCSI Device Enclosure being configured
808 System Interface Full (SIF) configuration process
80c SSA 4-Port Adapter being identified or configured.
811 Processor complex being identified or configured.
812 Memory being identified or configured.
813 Battery for time-of-day, NVRAM, and so on being identified or configured, or system
I/O control logic being identified or configured.
814 NVRAM being identified or configured.
815 Floating-point processor test
816 Operator panel logic being identified or configured.
817 Time-of-day logic being identified or configured.
819 Graphics input device adapter being identified or configured.
821 Standard keyboard adapter being identified or configured.
823 Standard mouse adapter being identified or configured.
824 Standard tablet adapter being identified or configured.
825 Standard speaker adapter being identified or configured.
826 Serial Port 1 adapter being identified or configured.
827 Parallel port adapter being identified or configured.
828 Standard diskette adapter being identified or configured.
831 3151 adapter being identified or configured, or Serial Port 2 being identified or configured.
834 64-port async controller being identified or configured.
835 16-port async concentrator being identified or configured.
836 128-port async controller being identified or configured.
837 16-port remote async node being identified or configured.
838 Network Terminal Accelerator Adapter being identified or configured.
839 7318 Serial Communications Server being configured.
841 8-port async adapter (EIA-232) being identified or configured.
842 8-port async adapter (EIA-422A) being identified or configured.
843 8-port async adapter (MIL-STD 188) being identified or configured.
844 7135 RAIDiant Array disk drive subsystem controller being identified or configured.
845 7135 RAIDiant Array disk drive subsystem drawer being identified or configured.
846 RAIDiant Array SCSI 1.3GB Disk Drive
847 16-port serial adapter (EIA-232) being identified or configured.
848 16-port serial adapter (EIA-422) being identified or configured.
849 X.25 Interface Co-Processor/2 adapter being identified or configured.
850 Token-Ring network adapter being identified or configured.
851 T1/J1 Portmaster adapter being identified or configured.
852 Ethernet adapter being identified or configured.
854 3270 Host Connection Program/6000 connection being identified or configured.
855 Portmaster Adapter/A being identified or configured.
857 FSLA adapter being identified or configured.
858 5085/5086/5088 adapter being identified or configured.
859 FDDI adapter being identified or configured.
85c Progress indicator. Token-Ring High-Performance LAN adapter is being identified or
configured.
861 Optical adapter being identified or configured.
862 Block Multiplexer Channel Adapter being identified or configured.
865 ESCON Channel Adapter or emulator being identified or configured.
866 SCSI adapter being identified or configured.
867 Async expansion adapter being identified or configured.
868 SCSI adapter being identified or configured.
869 SCSI adapter being identified or configured.
870 Serial disk drive adapter being identified or configured.
871 Graphics subsystem adapter being identified or configured.
872 Grayscale graphics adapter being identified or configured.
874 Color graphics adapter being identified or configured.
875 Vendor generic communication adapter being configured.
876 8-bit color graphics processor being identified or configured.
877 POWER Gt3/POWER Gt4 being identified or configured.
878 POWER Gt4 graphics processor card being configured.
879 24-bit color graphics card, MEV2
880 POWER Gt1 adapter being identified or configured.
887 Integrated Ethernet adapter being identified or configured.
889 SCSI adapter being identified or configured.
890 SCSI-2 Differential Fast/Wide and Single-Ended Fast/Wide Adapter/A.
891 Vendor SCSI adapter being identified or configured.
892 Vendor display adapter being identified or configured.
893 Vendor LAN adapter being identified or configured.
894 Vendor async/communications adapter being identified or configured.
895 Vendor IEEE 488 adapter being identified or configured.
896 Vendor VME bus adapter being identified or configured.
897 S/370 Channel Emulator adapter being identified or configured.
898 POWER Gt1x graphics adapter being identified or configured.
899 3490 attached tape drive being identified or configured.
89c Progress indicator. A multimedia SCSI CD-ROM is being identified or configured.
901 Vendor SCSI device being identified or configured.
902 Vendor display device being identified or configured.
903 Vendor async device being identified or configured.
904 Vendor parallel device being identified or configured.
905 Vendor other device being identified or configured.
908 POWER GXT1000 Graphics subsystem being identified or configured.
910 1/4GB Fibre Channel/266 Standard Adapter being identified or configured.
911 Fibre Channel/1063 Adapter Short Wave
912 2.0GB SCSI-2 differential disk drive being identified or configured.
913 1.0GB differential disk drive being identified or configured.
914 5GB 8 mm differential tape drive being identified or configured.
915 4GB 4 mm tape drive being identified or configured.
916 Non-SCSI vendor tape adapter being identified or configured.
917 Progress indicator. 2.0GB 16-bit differential SCSI disk drive is being identified or
configured.
918 Progress indicator. 2GB 16-bit single-ended SCSI disk drive is being identified or
configured.
920 Bridge Box being identified or configured.
921 101 keyboard being identified or configured.
922 102 keyboard being identified or configured.
923 Kanji keyboard being identified or configured.
924 Two-button mouse being identified or configured.
925 Three-button mouse being identified or configured.
926 5083 tablet being identified or configured.
927 5083 tablet being identified or configured.
928 Standard speaker being identified or configured.
929 Dials being identified or configured.
930 Lighted program function keys (LPFK) being identified or configured.
931 IP router being identified or configured.
933 Async planar being identified or configured.
934 Async expansion drawer being identified or configured.
935 3.5-inch diskette drive being identified or configured.
936 5.25-inch diskette drive being identified or configured.
937 An HIPPI adapter is being configured.
942 POWER GXT 100 graphics adapter being identified or configured.
943 Progress indicator. 3480 and 3490 control units attached to a System/370 Channel
Emulator/A adapter are being identified or configured.
944 100MB ATM adapter being identified or configured
945 1.0GB SCSI differential disk drive being identified or configured.
946 Serial port 3 adapter is being identified or configured.
947 Progress indicator. A 730MB SCSI disk drive is being configured.
948 Portable disk drive being identified or configured.
949 Unknown direct bus-attach device being identified or configured.
950 Missing SCSI device being identified or configured.
951 670MB SCSI disk drive being identified or configured.
952 355MB SCSI disk drive being identified or configured.
953 320MB SCSI disk drive being identified or configured.
954 400MB SCSI disk drive being identified or configured.
955 857MB SCSI disk drive being identified or configured.
956 670MB SCSI disk drive electronics card being identified or configured.
957 120MB DBA disk drive being identified or configured.
958 160 MB DBA disk drive being identified or configured.
959 160MB SCSI disk drive being identified or configured.
960 1.37GB SCSI disk drive being identified or configured.
964 Internal 20GB 8mm tape drive being identified or configured.
968 1.0GB SCSI disk drive being identified or configured.
970 Half-inch, 9-track tape drive being identified or configured.
971 150MB 1/4-inch tape drive being identified or configured.
972 2.3GB 8 mm SCSI tape drive being identified or configured.
973 Other SCSI tape drive being identified or configured.
974 CD-ROM drive being identified or configured.
975 Progress indicator. An optical disk drive is being identified or configured.
977 M-Audio Capture and Playback Adapter being identified or configured.
981 540MB SCSI-2 single-ended disk drive being identified or configured.
984 1GB 8-bit disk drive being identified or configured.
985 M-Video Capture Adapter being identified or configured.
986 2.4GB SCSI disk drive being identified or configured.
987 Progress indicator. Enhanced SCSI CD-ROM drive is being identified or configured.
989 200MB SCSI disk drive being identified or configured.
990 2.0GB SCSI-2 single-ended disk drive being identified or configured.
991 525MB 1/4-inch cartridge tape drive being identified or configured.
994 5GB 8 mm tape drive being identified or configured.
995 1.2GB 1/4 inch cartridge tape drive being identified or configured.
996 Progress indicator. Single-port, multi-protocol communications adapter is being
identified or configured.
997 FDDI adapter being identified or configured.
998 2.0GB 4 mm tape drive being identified or configured.
999 7137 or 3514 Disk Array Subsystem being configured.
D81 T2 Ethernet Adapter being configured.

Diagnostic Load Progress Indicators
-----------------------------------

Note: When a lowercase c is listed, it displays in the lower half of the seven-segment
character position.

c00 AIX Install/Maintenance loaded successfully.
c01 Insert the first diagnostic diskette.
c02 Diskettes inserted out of sequence.
c03 The wrong diskette is in diskette drive.
c04 The loading stopped with a nonrecoverable error.
c05 A diskette error occurred.
c06 The rc.boot configuration shell script is unable to determine type of boot.
c07 Insert the next diagnostic diskette.
c08 RAM file system started incorrectly.
c09 The diskette drive is reading or writing a diskette.
c20 An unexpected halt occurred, and the system is configured to enter the kernel
debug program instead of entering a system dump.
c21 The ifconfig command was unable to configure the network for the client network
host.
c22 The tftp command was unable to read client's ClientHostName.info file during a
client network boot.
c24 Unable to read client's ClientHostName.info file during a client network boot.
c25 Client did not mount remote miniroot during network install.
c26 Client did not mount the /usr file system during the network boot.
c29 The system was unable to configure the network device.
c31 Select the console display for the diagnostics. To select No console display, set the 
key mode switch to Normal then to Service. The diagnostic programs will then load
and run the diagnostics automatically.
c32 A direct-attached display (HFT) was selected.
c33 A tty terminal attached to serial ports S1 or S2 was selected.
c34 A file was selected. The console messages are stored in a file.
c40 Configuration files are being restored.
c41 Could not determine the boot type or device.
c42 Extracting data files from diskette.
c43 Cannot access the boot/install tape.
c44 Initializing installation database with target disk information.
c45 Cannot configure the console.
c46 Normal installation processing.
c47 Could not create a physical volume identifier (PVID) on disk.
c48 Prompting you for input.
c49 Could not create or form the JFS log.
c50 Creating root volume group on target disks.
c51 No paging devices were found.
c52 Changing from RAM environment to disk environment.
c53 Not enough space in the /tmp directory to do a preservation installation.
c54 Installing either BOS or additional packages.
c55 Could not remove the specified logical volume in a preservation installation.
c56 Running user-defined customization.
c57 Failure to restore BOS.
c58 Displaying message to turn the key.
c59 Could not copy either device special files, device ODM, or volume group information
from RAM to disk.
c61 Failed to create the boot image.
c62 Loading platform dependent debug files
c63 Loading platform dependent data files
c64 Failed to load platform dependent data files
c70 Problem Mounting diagnostic CDROM disc
c99 Diagnostics have completed. This code is only used when there is no console.



=============
2.

0c0 The dump completed successfully 
0c1 The dump failed due to an I/O error. 
0c2 A user-requested dump has started. You requested a dump using the sysdumpstart command, a dump key sequence, or the Reset button.

0c3 The dump is inhibited.
0c4 The dump did not complete. A partial dump was written to the dump device. There is not enough space on the dump device to contain the entire dump. To prevent this problem from occurring again, you must increase the size of your dump media.


0c5 The dump failed to start. An unexpected error occurred while the system was attempting to write to the dump media.
0c6 A dump to the secondary dump device was requested. Make the secondary dump device ready, then press CTRL-ALT-NUMPAD2. 
0c7 Reserved. 
0c8 The dump function is disabled. No primary dump device is configured. 
0c9 A dump is in progress. 
0cc Unknown dump failure 
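
As a rough illustration, the dump status codes above can be wrapped in a small lookup function for use in notes or scripts. The function name dump_led_meaning is made up for this sketch, and the short wordings are paraphrases of the table above, not output from any AIX tool.

```sh
# Hypothetical helper: map a dump status LED code to a short meaning.
# Returns non-zero for codes that are not in the dump status table.
dump_led_meaning() {
    case "$1" in
        0c0) echo "dump completed successfully" ;;
        0c1) echo "dump failed: I/O error" ;;
        0c2) echo "user-requested dump started" ;;
        0c3) echo "dump inhibited" ;;
        0c4) echo "partial dump: dump device too small" ;;
        0c5) echo "dump failed to start" ;;
        0c6) echo "secondary dump device requested" ;;
        0c7) echo "reserved" ;;
        0c8) echo "dump disabled: no primary dump device configured" ;;
        0c9) echo "dump in progress" ;;
        0cc) echo "unknown dump failure" ;;
        *)   echo "not a dump status code"; return 1 ;;
    esac
}

dump_led_meaning 0c4
```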


---------- Diagnostics Load Progress Indicators ----------- 

c00 AIX Install/Maintenance loaded successfully. 
c01 Insert the first diagnostic diskette. 
c02 Diskettes inserted out of sequence. 
c03 The wrong diskette is in the drive. 
c04 The loading stopped with an irrecoverable error. 
c05 A diskette error occurred. 
c07 Insert the next diagnostic diskette.
c08 RAM filesystem started incorrectly.
c09 The diskette drive is reading or writing a diskette. 
c20 An unexpected halt occurred, and the system is configured to enter the kernel debug program instead of entering a system dump.

c21 The 'ifconfig' command was unable to configure the network for the client network host. 
c22 The 'tftp' command was unable to read client's ClientHostName.info file during a client network boot. 
c24 Unable to read client's ClientHostName.info file during a client network boot. 
c25 Client did not mount remote miniroot during network install. 
c26 Client did not mount the /usr filesystem during the network boot. 
c29 System was unable to configure the network device. 
c31 Select the console display for the diagnostics. To select "No console display", set the key mode switch to Normal then to Service. The diagnostic program will then load and run the diagnostics automatically.

c32 A direct-attached display (HFT) was selected. 
c33 A tty terminal attached to serial ports S1 or S2 was selected.
c34 A file was selected. The console messages are stored in a file.
c40 Configuration files are being restored.
c41 Could not determine the boot type or device. 
c42 Extracting data files from diskette. 
c43 Diagboot cannot be accessed. 
c44 Initializing installation database with target disk information.
c45 Cannot configure the console. 
c46 Normal installation processing. 
c47 Could not create a physical volume identifier (PVID) on disk. 
c48 Prompting you for input. 
c49 Could not create or form the JFS log. 
c50 Creating rootvg volume group on target disk 
c51 No paging space was found.
c52 Changing from RAM environment to disk environment. 
c53 Not enough space in the /tmp directory to do a preservation installation. 
c54 Installing either BOS or additional packages.
c55 Could not remove the specified logical volume in a preservation installation. 
c56 Running user-defined customization. 
c57 Failure to restore BOS. 
c58 Displaying message to turn the key.
c59 Could not copy either device special files, device ODM, or volume group information from RAM to disk. 
c61 Failed to create the boot image. 
c70 Problem Mounting diagnostics CDROM disc. 
c99 Diagnostics have completed. This code is only used when there is no console. 


--------Debugger Progress Indicators ---------- 

c20 Kernel debug program activated. An unexpected system halt has occurred, and you have configured the system
to enter the kernel debug program instead of performing a dump.


---------Built-In Self Test (BIST) Indicators---------

100 BIST completed successfully. Control was passed to IPL ROS. 
101 BIST started following RESET 
102 BIST started following Power-on Reset 
103 BIST could not determine the system model number. 
104 Equipment conflict. BIST could not find the CBA. 
105 BIST could not read the OCS EPROM. 
106 BIST detected a module error. 
111 OCS stopped. BIST detected a module error. 
112 A checkstop occurred during BIST.
113 BIST checkstop count is greater than 1. 
120 BIST starting a CRC check on the 8752 EPROM. 
121 BIST detected a bad CRC in the first 32K of the OCS EPROM. 
122 BIST started a CRC check on the first 32K of the OCS EPROM. 
123 BIST detected a bad CRC on the OCS area of NVRAM. 
124 BIST started a CRC check on the OCS area of NVRAM. 
125 BIST detected a bad CRC on the time-of-day area of NVRAM. 
126 BIST started a CRC check on the time-of-day area of the NVRAM. 
127 BIST detected a bad CRC on the 8752 EPROM. 
130 BIST presence test started. 
140 BIST failed: procedure error 
142 BIST failed: procedure error 
143 Invalid memory configuration. 
144 BIST failed; procedure error. 
151 BIST started AIPGM test code. 
152 BIST started DCLST test code. 
153 BIST started ACLST test code. 
154 BIST started AST test code. 
160 Bad EPOW Signal/Power status signal 
161 BIST being conducted on BUMP I/O 
162 BIST being conducted on JTAG 
163 BIST being conducted on Direct I/O 
164 BIST being conducted on CPU 
165 BIST being conducted on DCB and Memory 
166 BIST being conducted on interrupts 
170 BIST being conducted on Multi-Processor
180 BIST logout failed. 
182 BIST COP bus not responding 
185 A checkstop condition occurred during the BIST
186 System logic-generated checkstop (Model 250 only) 
187 Graphics-generated checkstop (Model 250) 
195 BIST logout completed. 
888 BIST did not start 


------- Power-On Self Test ------- 

200 IPL attempted with keylock in the SECURE position. 
201 IPL ROM test failed or checkstop occurred (irrecoverable)
202 IPL ROM test failed or checkstop occurred (irrecoverable)
203 Unexpected data storage interrupt. 
204 Unexpected instruction storage interrupt. 
205 Unexpected external interrupt. 
206 Unexpected alignment interrupt. 
207 Unexpected program interrupt. 
208 Unexpected floating point unavailable interrupt. 
209 Unexpected SVC interrupt. 
20c L2 cache POST error. (The display shows a solid 20c for 5 seconds.)
210 Unexpected SVC interrupt. 
211 IPL ROM CRC comparison error (irrecoverable). 
212 RAM POST memory configuration error or no memory found (irrecoverable). 
213 RAM POST failure (irrecoverable). 
214 Power status register failed (irrecoverable). 
215 A low voltage condition is present (irrecoverable). 
216 IPL ROM code being uncompressed into memory. 
217 End of bootlist encountered. 
218 RAM POST is looking for 1M bytes of good memory. 
219 RAM POST bit map is being generated. 
21c L2 cache is not detected. (The display shows a solid 21c for 5 sec) 
220 IPL control block is being initialized. 
221 NVRAM CRC comparison error during AIX IPL (Key Mode Switch in Normal mode).
Reset NVRAM by reaccomplishing IPL in Service mode. For systems with an internal, direct-bus-attached (DBA) disk, IPL
ROM attempted to perform an IPL from that disk before halting with this three-digit display value.
222 Attempting a Normal mode IPL from Standard I/O planar attached devices specified in NVRAM IPL Devices List. 
223 Attempting a Normal mode IPL from SCSI attached devices specified in NVRAM IPL Devices List. 
Note: May be caused by incorrect jumper setting for external SCSI devices or by incorrect SCSI terminator. 
REFER FFC B88 
224 Attempting a Normal mode restart from 9333 subsystem device specified in NVRAM device list. 
225 Attempting a Normal mode IPL from IBM 7012 DBA disk attached devices specified in NVRAM IPL Devices List. 
226 Attempting a Normal mode restart from Ethernet specified in NVRAM device list. 
227 Attempting a Normal mode restart from Token Ring specified in NVRAM device list. 
228 Attempting a Normal mode IPL from NVRAM expansion code. 
229 Attempting a Normal mode IPL from NVRAM IPL Devices List; cannot IPL from any of the listed devices, or there are
no valid entries in the Devices List.
22c Attempting a normal mode IPL from FDDI specified in NVRAM IPL device list. 
230 Attempting a Normal mode restart from adapter feature ROM specified in IPL ROM devices list. 
231 Attempting a Normal mode restart from Ethernet specified in IPL ROM devices list. 
232 Attempting a Normal mode IPL from Standard I/O planar attached devices specified in ROM Default Device List.
233 Attempting a Normal mode IPL from SCSI attached devices specified in IPL ROM Default Device List. 
234 Attempting a Normal mode restart from 9333 subsystem device specified in IPL ROM device list. 
235 Attempting a Normal mode IPL from IBM 7012 DBA disk attached devices specified in IPL ROM Default Device List. 
236 Attempting a Normal mode restart from Ethernet specified in IPL ROM default devices list. 
237 Attempting a Normal mode restart from Token Ring specified in IPL ROM default device list. 
238 Attempting a Normal mode restart from Token Ring specified by the operator. 
239 System failed to restart from the device chosen by the operator. 
23c Attempting a normal mode IPL from FDDI specified in IPL ROM device list. 
240 Attempting a Service mode restart from adapter feature ROM. 
241 Attempting a Normal mode IPL from devices specified in the NVRAM IPL Devices List. 
242 Attempting a Service mode IPL from Standard I/O planar attached devices specified in NVRAM IPL Devices List. 
243 Attempting a Service mode IPL from SCSI attached devices specified in NVRAM IPL Devices List. 
244 Attempting a Service mode restart from 9333 subsystem device specified in NVRAM device list. 
245 Attempting a Service mode IPL from IBM 7012 DBA disk attached devices specified in NVRAM IPL Devices List. 
246 Attempting a Service mode restart from Ethernet specified in NVRAM device list. 
247 Attempting a Service mode restart from Token Ring specified in NVRAM device list. 
248 Attempting a Service mode IPL from NVRAM expansion code. 
249 Attempting a Service mode IPL from NVRAM IPL Devices List; cannot IPL from any of the listed devices, or there are no valid entries in the Devices List.

24c Attempting a service mode IPL from FDDI specified in NVRAM IPL device list. 
250 Attempting a Service mode restart from adapter feature ROM specified in IPL ROM device list. 
251 Attempting a Service mode restart from Ethernet specified in IPL ROM device list. 
252 Attempting a Service mode IPL from standard I/O planar attached devices specified in ROM Default Device List.
253 Attempting a Service mode IPL from SCSI attached devices specified in IPL ROM Default Device List. 
254 Attempting a Service mode restart from 9333 subsystem device specified in IPL ROM device list. 
255 Attempting a Service mode IPL from IBM 7012 DBA disk attached devices specified in IPL ROM Default Devices List.
256 Attempting a Service mode restart from Ethernet specified in IPL ROM default device list. 
257 Attempting a Service mode restart from Token Ring specified in IPL ROM default device list. 
258 Attempting a Service mode restart from Token Ring specified by the operator. 
259 Attempting a Service mode restart from FDDI specified by the operator. 

25c Attempting a Service mode IPL from FDDI specified in IPL ROM device list.
260 Information is being displayed on the display console. 
261 Information will be displayed on the tty terminal when the "1" key is pressed on the tty terminal keyboard. 
262 A keyboard was not detected as being connected to the system's 
NOTE: Check for blown planar fuses or for a corrupted boot on disk drive 
263 Attempting a Normal mode restart from adapter feature ROM specified in NVRAM device list. 
269 Stalled state - the system is unable to IPL 
271 Mouse port POST. 
272 Tablet port POST. 
277 Auto Token-Ring LANstreamer MC 32 Adapter 
278 Video ROM Scan POST. 
279 FDDI adapter POST. 
280 3COM Ethernet POST. 
281 Keyboard POST executing. 
282 Parallel port POST executing 
283 Serial port POST executing 
284 POWER Gt1 graphics adapter POST executing
285 POWER Gt3 graphics adapter POST executing
286 Token Ring adapter POST executing. 
287 Ethernet adapter POST executing. 
288 Adapter card slots being queried. 
289 GTO POST. 
290 IOCC POST error (irrecoverable). 
291 Standard I/O POST running. 
292 SCSI POST running. 
293 IBM 7012 DBA disk POST running. 
294 IOCC bad TCW SIMM in slot location J being tested. 
295 Graphics Display adapter POST, color or grayscale. 
296 ROM scan POST. 
297 System model number does not compare between OCS and ROS 
(irrecoverable). Attempting a software IPL. 
298 Attempting a software IPL (warm boot). 
299 IPL ROM passed control to the loaded program code. 
301 Flash Utility ROM failed or checkstop occurred (irrecoverable)
302 Flash Utility ROM failed or checkstop occurred (irrecoverable)
302 Flash Utility ROM: User prompt, move the key to the SERVICE position in order to perform an optional Flash Update. LED
will only appear if the key switch is in the SECURE position. This signals the user that a Flash Update may be 
initiated by moving the key switch to the SERVICE position. If the key is moved to the SERVICE position, 
LED 303 will be displayed. This signals the user to press the reset button and select optional Flash Update. 
303 Flash Utility ROM: User prompt, press the reset button in order to perform an optional Flash Update. LED
will only appear if the key switch is in the SECURE position. This signals the user that a Flash Update may be initiated
by moving the key switch to the SERVICE position. If the key is moved to the SERVICE position, LED 303 will be 
displayed. This signals the user to press the reset button and select optional Flash Update. 
304 Flash Utility ROM IOCC POST error (irrecoverable) 
305 Flash Utility ROM standard I/O POST running. 
306 Flash Utility ROM is attempting IPL from Flash Update Boot Image. 
307 Flash Utility ROM system model number does not compare between OCS and ROM (irrecoverable). 
308 Flash Utility ROM: IOCC TCW memory is being tested. 
309 Flash Utility ROM passed control to a Flash Update Boot Image. 
311 Flash Utility ROM CRC comparison error (irrecoverable). 
312 Flash Utility ROM RAM POST memory configuration error or no memory found (irrecoverable).
313 Flash Utility ROM RAM POST failure (irrecoverable).
314 Flash Utility ROM Power status register failed (irrecoverable). 
315 Flash Utility ROM detected a low voltage condition. 
318 Flash Utility ROM RAM POST is looking for good memory. 
319 Flash Utility ROM RAM POST bit map is being generated. 
322 CRC error on media Flash Image. No Flash Update performed. 
323 Current Flash Image is being erased. 
324 CRC error on new Flash Image after Update was performed. (Flash Image is corrupted). 
325 Flash Image successful and complete. 

500 Querying Native I/O slot. 
501 Querying card in Slot 1 
502 Querying card in Slot 2 
503 Querying card in Slot 3 
504 Querying card in Slot 4 
505 Querying card in Slot 5 
506 Querying card in Slot 6 
507 Querying card in Slot 7 
508 Querying card in Slot 8 
510 Starting device configuration. 
511 Device configuration completed. 
512 Restoring device configuration files from media. 
513 Restoring basic operating system installation files from media. 
516 Contacting server during network boot 
517 Mounting client remote file system during network IPL. 
518 Remote mount of the root and /usr filesystems failed during network boot. 
520 Bus configuration running. 
521 /etc/init invoked cfgmgr with invalid options; /etc/init has been corrupted or incorrectly modified 
(irrecoverable error). 
522 The configuration manager has been invoked with conflicting options (irrecoverable error). 
523 The configuration manager is unable to access the ODM database (irrecoverable error). 
524 The configuration manager is unable to access the config rules object in the ODM database (irrecoverable error). 
525 The configuration manager is unable to get data from a customized device object in the ODM database 
(irrecoverable error). 
526 The configuration manager is unable to get data from a customized device driver object in the ODM database
(irrecoverable error). 
527 The configuration manager was invoked with the phase 1 flag; running phase 1 at this point
is not permitted (irrecoverable error). 
528 The configuration manager cannot find sequence rule, or no program was specified in the ODM database 
(irrecoverable error). 
529 The configuration manager is unable to update ODM data 
(irrecoverable error). 
530 The program "savebase" returned an error. 
531 The configuration manager is unable to access PdAt object class 
(irrecoverable error).
532 There is not enough memory to continue (malloc failure); 
irrecoverable error. 
533 The configuration manager could not find a configure method for a device. 
534 The configuration manager is unable to acquire database lock (irrecoverable error).
536 The configuration manager encountered more than one sequence rule specified in the same phase. (irrecoverable error). 
537 The configuration manager encountered an error when invoking the program in the sequence rule. 
538 The configuration manager is going to invoke a configuration method.
539 The configuration method has terminated, and control has returned to the configuration manager. 
551 IPL Varyon is running 

552 IPL Varyon failed. 
553 IPL phase 1 is complete. 
554 Unable to define NFS swap device during network boot 
555 Unable to define NFS swap device during network boot 
556 Logical Volume Manager encountered error during IPL varyon. 
557 The root filesystem will not mount. 
558 There is not enough memory to continue the IPL. 
559 Less than 2MB of good memory are available to load the AIX kernel. 
570 Virtual SCSI devices being configured. 
571 HIPPI common function device driver being configured. 
572 HIPPI IPI-3 master transport driver being configured. 
573 HIPPI IPI-3 slave transport driver being configured. 
574 HIPPI IPI-3 transport services user interface device driver being configured. 
576 Generic async device driver being configured. 
577 Generic SCSI device driver being configured. 
578 Generic commo device driver being configured. 
579 Device driver being configured for a generic device. 
580 HIPPI TCPIP network interface driver being configured. 
581 Configuring TCP/IP. 
582 Configuring token ring data link control. 
583 Configuring an Ethernet data link control. 
584 Configuring an IEEE ethernet data link control. 
585 Configuring an SDLC MPQP data link control. 
586 Configuring a QLLC X.25 data link control. 
587 Configuring NETBIOS. 
588 Configuring a Bisync Read-Write (BSCRW). 
589 SCSI target mode device being configured. 
590 Diskless remote paging device being configured. 
591 Configuring an LVM device driver 
592 Configuring an HFT device driver 
593 Configuring SNA device drivers. 
594 Asynchronous I/O being defined or configured. 
595 X.31 pseudo device being configured. 
596 SNA DLC/LAPE pseudo device being configured. 
597 OCS software being configured. 
598 OCS hosts being configured during system reboot. 
599 Configuring FDDI data link control. 
5c0 Streams-based hardware drive being configured. 
5c1 Streams-based X.25 protocol being configured. 
5c2 Streams-based X.25 COMIO emulator driver being configured. 
5c3 Streams-based X.25 TCP/IP interface driver being configured. 
5c4 FCS adapter device driver being configured. 
5c5 SCB network device driver for FCS is being configured. 
5c6 AIX SNA channel being configured. 
600 Starting network boot portion of /sbin/rc.boot
602 Configuring network parent devices. 
603 /usr/lib/methods/defsys,
/usr/lib/methods/cfgsys, or
/usr/lib/methods/cfgbus failed.
604 Configuring physical network boot device. 
605 Configuring physical network boot device failed. 
606 Running /usr/sbin/ifconfig on logical network boot device. 
607 /usr/sbin/ifconfig failed. 
608 Attempting to retrieve the client.info file with tftp. Note that a flashing 608 indicates multiple attempts
to retrieve the client.info file are occurring.
609 The client.info file does not exist or it is zero length. 
610 Attempting remote mount of NFS file system 
611 Remote mount of the NFS filesystem failed. 
612 Accessing remote files; unconfiguring network boot device. 
614 Configuring local paging devices. 
615 Configuring of a local paging device failed. 
616 Converting from diskless to dataless configuration.
617 Diskless to dataless configuration failed. 
618 Configuring remote (NFS) paging devices. 
619 Configuration of a remote (NFS) paging device failed. 
620 Updating special device files and ODM in permanent filesystem with data from boot RAM filesystem. 
622 Boot process configuring for operating system installation. 

650 IBM SCSD disk drive being configured
700 Progress indicator. A 1.1GB 8-bit SCSI disk drive being identified or configured. 
701 Progress indicator. A 1.1GB 16-bit SCSI SE disk drive being identified or configured. 
702 Progress indicator. A 1.1GB 16-bit SCSI differential disk drive being identified or configured. 
703 Progress indicator. A 2.2GB 8-bit SCSI disk drive being identified or configured. 
704 Progress indicator. A 2.2GB 16-bit SCSI SE disk drive being identified or configured. 
705 The configuration method for the 2.2GB 16-bit differential SCSI disk drive is being run. If an irrecoverable error occurs, the system halts.

706 Progress indicator. A 4.5GB 16-bit SE SCSI disk drive is being identified or configured. 
707 Progress indicator. A 4.5GB 16-bit differential SCSI drive is being identified or configured. 
708 Progress indicator: An L2 cache is being identified or configured.
710 POWER GXT150M graphics adapter being identified or configured.
711 Unknown adapter being identified or configured. 
712 Graphics slot bus configuration is executing. 
713 The IBM ARTIC960 device is being configured. 
714 A video capture adapter is being configured. 
715 The Ultimedia Services audio adapter is being configured. This LED displays briefly on the panel. 
720 Unknown read/write optical drive type being configured. 
721 Unknown disk or SCSI device being identified or configured. 
722 Unknown disk being identified or configured. 
723 Unknown CDROM being identified or configured. 
724 Unknown tape drive being identified or configured. 
725 Unknown display being identified or configured. 
726 Unknown input device being identified or configured.
727 Unknown async device being identified or configured.



=========================================== 
10. Diskless machines, NFS Implementations:
===========================================


Setting up nfs, NetBSD
Setting up nfs, OpenBSD
Setting up nfs, FreeBSD
Setting up nfs, Mac OS X and Darwin
Setting up nfs, Linux
Setting up nfs, SunOS
Setting up nfs, Solaris
Setting up nfs, NEWS-OS
Setting up nfs, NEXTSTEP
Setting up nfs, HP-UX 7 (couldn't get it to work)
Setting up nfs, HP-UX 9
Setting up nfs, HP-UX 10 and later


--------------------------------------------------------------------------------

NetBSD and OpenBSD
If you have built your own kernel, you need to make sure you have the following in your config file: 
options         NFSSERVER
The GENERIC kernel distributed with NetBSD has this compiled in. 

# mkdir -p /export/client/root/dev 

# mkdir /export/client/usr 

# mkdir /export/client/home 

# touch /export/client/swap 

# cd /export/client/root 

# tar -xvpzf /export/client/NetBSD-release/binary/sets/kern.tgz 

# mknod /export/client/root/dev/console c 0 0 

Add the following lines to /etc/exports: 
#/etc/exports
/export/client/root -maproot=root:wheel    client.test.net
/export/client/swap -maproot=root:wheel    client.test.net
/export/client/usr  -maproot=nobody:nobody client.test.net
/export/client/home -maproot=nobody:nobody client.test.net
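
A minimal sketch of generating the four export lines above for a given client, which may help when adding more than one diskless client. The function name client_exports is hypothetical; the paths and -maproot options are the ones from the example above.

```sh
# Print NetBSD/OpenBSD-style /etc/exports lines for one diskless client.
# $1 = per-client export directory (e.g. /export/client)
# $2 = client host name (e.g. client.test.net)
client_exports() {
    dir=$1
    host=$2
    # root and swap are exported with root access preserved
    printf '%s/root -maproot=root:wheel %s\n' "$dir" "$host"
    printf '%s/swap -maproot=root:wheel %s\n' "$dir" "$host"
    # usr and home are exported with root mapped to nobody
    printf '%s/usr -maproot=nobody:nobody %s\n' "$dir" "$host"
    printf '%s/home -maproot=nobody:nobody %s\n' "$dir" "$host"
}

# Review the output before appending it to /etc/exports.
client_exports /export/client client.test.net
```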

# ps -aux | grep mountd
If mountd is running, then kill -HUP that process to force it to reread /etc/exports. Otherwise, you'll need to start it:
# /usr/sbin/mountd
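
Note that "ps -aux | grep mountd" also lists the grep process itself. A minimal sketch of a stricter check follows. The helper name pids_of is made up, and the usage lines assume a ps that accepts "-ax -o pid= -o comm=" (check ps(1) on your system before relying on it).

```sh
# Read "PID COMMAND" lines on stdin and print the PIDs whose command
# name (with any leading directory stripped) is exactly $1.
pids_of() {
    awk -v name="$1" '{
        n = split($2, parts, "/")
        if (parts[n] == name) print $1
    }'
}

# Hypothetical usage on the NFS server (left commented out on purpose):
#   pids=$(ps -ax -o pid= -o comm= | pids_of mountd)
#   if [ -n "$pids" ]; then kill -HUP $pids; else /usr/sbin/mountd; fi
```

Matching on the command name rather than on the whole ps line avoids false hits from grep or from an editor that happens to have a file named mountd open.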


# ps -aux | grep nfsd
If the NFS daemons are not running, then you need to start them:
# /usr/sbin/nfsd -tun 4 
If the server isn't running the NFS daemons, the client will print: 

le(0,0,0,0): Unknown error: code -1
boot: Unknown error: code -1
If the server is running NFS, but isn't exporting the root directory to the client, the client will print: 
boot: no such file or directory
If everything is working properly, you will see a few numbers and a spinning cursor on the client. This means you have succeeded! At this point, your client isn't bootable. If you let it continue, it will panic when attempting to start init. 
Continue on to setting up the client filesystem 


--------------------------------------------------------------------------------

FreeBSD
The setup for FreeBSD 4.x is similar to NetBSD, but mountd needs different options and /etc/exports has a different format. 
# mkdir -p /export/client/root/dev 

# mkdir /export/client/usr 

# mkdir /export/client/home 

# touch /export/client/swap 

# cd /export/client/root 

# tar [--numeric-owner] -xvpzf /export/client/NetBSD-release/binary/sets/kern.tgz 

# mknod /export/client/root/dev/console c 0 0 

Add the following line to /etc/exports (see the FreeBSD Handbook, Section 17.4 on NFS): 
#/etc/exports
/export/client/root /export/client/swap -maproot=root:wheel    client.test.net 

FreeBSD is unable to export multiple directories within a filesystem (such as /export) to a client unless all of the directories are listed on a single line in /etc/exports. 
You will also need to make sure that your client's /home and /usr are stored in /export/client/root. FreeBSD is unable to set different properties for exported directories, defeating the point of exporting those directories separately (and without -maproot=root:wheel).
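
Because all directories from one filesystem have to share a single line, the entry can be assembled from a list of directories. A minimal sketch, with a hypothetical function name; the -maproot option and the host are the ones from the example above.

```sh
# Build one FreeBSD-style /etc/exports line.
# $1 = client host name; remaining arguments = directories to export,
# all of which are assumed to live on the same server filesystem.
export_line() {
    host=$1
    shift
    # printf repeats its format once for every remaining argument
    printf '%s ' "$@"
    printf '%s %s\n' '-maproot=root:wheel' "$host"
}

export_line client.test.net /export/client/root /export/client/swap
```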


# ps -aux | grep mountd
If mountd is running, then kill that process. You need it to be running with the -r option for the swap file to be mountable, and the -2 option is to force it to use NFS V2.
# /sbin/mountd -2r


# ps -aux | grep nfsd
If the NFS daemons are not running, then you need to start them:
# /sbin/nfsd -tun 4 
If the server isn't running the NFS daemons, the client will print: 

le(0,0,0,0): Unknown error: code -1
boot: Unknown error: code -1
If the server is running NFS, but isn't exporting the root directory to the client, the client will print: 
boot: no such file or directory
If everything is working properly, you will see a few numbers and a spinning cursor on the client. This means you have succeeded! At this point, your client isn't bootable. If you let it continue, it will panic when attempting to start init. 
Continue on to setting up the client filesystem 


--------------------------------------------------------------------------------

Mac OS X and Darwin
The setup for Mac OS X and Darwin uses the NetInfo system. There are ways to use typical BSD-style configuration files, but most systems are by default configured to use NetInfo. Here, we describe how to set up a default install of Mac OS X/Darwin (i.e. in its own local NetInfo domain). Read your netinfo(5) man page for more information.

# mkdir -p /export/client/root/dev 

# mkdir /export/client/usr 

# mkdir /export/client/home 

# touch /export/client/swap 

# cd /export/client/root 

# tar -xvpzf /export/client/NetBSD-release/binary/sets/kern.tgz 

# mknod /export/client/root/dev/console c 0 0 

Modify the NetInfo database to export your shares. Note that you must escape the forward slashes in the path to your export twice: once for the shell, and once for the NetInfo parser (since it uses forward slashes to delimit NetInfo properties). Just to add to the confusion, the NetInfo property we're adding to is called /exports.
# nicl . -create /exports/\\/export\\/client\\/root opts maproot=root:wheel
# nicl . -create /exports/\\/export\\/client\\/root clients 192.168.0.10
# nicl . -create /exports/\\/export\\/client\\/swap opts maproot=root:wheel
# nicl . -create /exports/\\/export\\/client\\/swap clients 192.168.0.10
# nicl . -create /exports/\\/export\\/client\\/usr opts maproot=nobody:nobody
# nicl . -create /exports/\\/export\\/client\\/usr clients 192.168.0.10
# nicl . -create /exports/\\/export\\/client\\/home opts maproot=nobody:nobody
# nicl . -create /exports/\\/export\\/client\\/home clients 192.168.0.10
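
The double escaping can be reduced to one level by generating the backslashes and then single-quoting the result. The following is a minimal sketch, not part of nicl: netinfo_escape is a hypothetical helper name, and the nicl command is only printed, never run.

```shell
#!/bin/sh
# Sketch: put a backslash in front of every "/" in a path. This is the form
# the NetInfo parser needs for a directory name that itself contains slashes.
# netinfo_escape is a hypothetical helper name.
netinfo_escape() {
    printf '%s\n' "$1" | sed 's|/|\\/|g'
}

# Single quotes stop the shell from consuming the backslashes, so the
# second (shell) level of escaping is no longer needed.
dir=$(netinfo_escape /export/client/root)
printf 'nicl . -create %s opts maproot=root:wheel\n' "'/exports/$dir'"
```

The printed command passes the same directory name to nicl as the unquoted \\/ form used above.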

To later add another client for the same export, you would append to that property (as opposed to the initial create): 
# nicl . -append /exports/\\/export\\/client\\/root clients 192.168.0.12

To verify that everything looks good, read it back: 

# nicl . -read /exports/\\/export\\/client\\/root
name: /export/client/root
opts: maproot=root:wheel
clients: 192.168.0.10 192.168.0.12

# ps -aux | grep portmap
If portmap is not running, then you need to start it:
# /usr/sbin/portmap 

# ps -aux | grep nfsd
If the NFS daemons are not running, then you need to start them:
# /sbin/nfsd -t -u -n 6 

# ps -aux | grep mountd
If mountd is running, then kill -HUP that process to force it to reread the NetInfo database. If it's not running, then you need to start it:
# /usr/sbin/mountd 
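
The "HUP it if running, otherwise start it" decision can be sketched as a dry run that only prints the command it would use. mountd_pids and mountd_action are hypothetical helper names. The sketch assumes the PID is the second column of the ps output, which holds for both the "ps aux" and "ps -ef" layouts.

```shell
#!/bin/sh
# Sketch: read ps output on stdin and print the PID (second column) of every
# mountd process, skipping the grep command that also matches "mountd".
mountd_pids() {
    awk '/mountd/ && !/grep/ { print $2 }'
}

# Dry run: print the command that would be used instead of running it.
mountd_action() {
    pids=$(mountd_pids)
    if [ -n "$pids" ]; then
        # $pids is left unquoted on purpose so several PIDs join on one line.
        echo "kill -HUP" $pids
    else
        echo "/usr/sbin/mountd"
    fi
}
```

A typical call would be "ps -aux | mountd_action". Check the printed command before running it by hand.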

Your system will always start the NFS daemons after reboots if the NetInfo /exports property is present. To remove all exports and prevent your system from starting NFS in the future, run:
# nicl . -delete /exports 
If the server isn't running the NFS daemons, the client will print: 

le(0,0,0,0): Unknown error: code -1
boot: Unknown error: code -1
If the server is running NFS, but isn't exporting the root directory to the client, the client will print: 
boot: no such file or directory
If everything is working properly, you will see a few numbers and a spinning cursor on the client. This means you have succeeded! At this point, your client isn't bootable. If you let it continue, it will panic when attempting to start init. 
Continue on to setting up the client filesystem 


--------------------------------------------------------------------------------

Linux
# mkdir -p /export/client/root/dev 

# mkdir /export/client/usr 

# mkdir /export/client/home 

# touch /export/client/swap 

# cd /export/client/root 

# tar [--numeric-owner] -xvpzf /export/client/NetBSD-release/binary/sets/kern.tgz 

# mknod /export/client/root/dev/console c 0 0 

Add the following lines to /etc/exports: 
#/etc/exports
/export/client/root client.test.net(rw,no_root_squash)
/export/client/swap client.test.net(rw,no_root_squash)
/export/client/usr client.test.net(rw,root_squash)
/export/client/home client.test.net(rw,root_squash)
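
If you serve several diskless clients, the same four lines are needed for each one. A minimal sketch that prints them for a given client name follows. linux_exports is a hypothetical helper, and it assumes every client uses the same /export/client tree as in the listing above, so adjust the paths if each client has its own directory.

```shell
#!/bin/sh
# Sketch: print the four /etc/exports lines for one client.
# root and swap need root access (no_root_squash); usr and home do not.
linux_exports() {
    client=$1
    for share in root swap; do
        printf '/export/client/%s %s(rw,no_root_squash)\n' "$share" "$client"
    done
    for share in usr home; do
        printf '/export/client/%s %s(rw,root_squash)\n' "$share" "$client"
    done
}

# Print the lines for review; append them to /etc/exports by hand.
linux_exports client.test.net
```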

# ps aux | grep mountd
If mountd is running, then kill -HUP that process. This will force it to reread the /etc/exports file. If it's not already running, then you need to start it:
# /sbin/rpc.mountd [--no-nfs-version 3]
You may need to add --no-nfs-version 3 if you're having problems. See "Kernel NFS Problem" below. 

# ps aux | grep nfsd
If the NFS daemons are running, then you need to restart them so that they reread the /etc/exports file. If they're not already running, then you need to start them:
# /sbin/rpc.nfsd 
If the server isn't running the NFS daemons, the client will print: 

le(0,0,0,0): Unknown error: code -1
boot: Unknown error: code -1
If the server is running NFS, but isn't exporting the root directory to the client, the client will print: 
boot: no such file or directory
If everything is working properly, you will see a few numbers and a spinning cursor on the client. This means you have succeeded! At this point, your client isn't bootable. If you let it continue, it will panic when attempting to start init. 
Kernel NFS Problem: 

Most versions of Linux only implement NFS2, in which case NetBSD will try NFS3 and then automatically fall back. Some versions (notably Red Hat 6.0) will incorrectly answer both NFS2 and NFS3 mount requests, then ignore any attempt to access the filesystem using NFS3. This causes untold pain and hassle.

The workaround is to kill mountd and start it with options preventing NFS3 problems (i.e., rpc.mountd --no-nfs-version 3). 
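
One way to check whether the workaround took effect is to look at which mountd versions the server registers with the portmapper. The sketch below filters "rpcinfo -p" output for RPC program number 100005 (mountd). mountd_versions is a hypothetical helper name, and the column layout (program, version, protocol, port) is assumed to match your rpcinfo.

```shell
#!/bin/sh
# Sketch: read "rpcinfo -p" output on stdin and print each protocol version
# registered for mountd (RPC program 100005), one per line, without repeats.
mountd_versions() {
    awk '$1 == 100005 { print $2 }' | sort -n -u
}
```

Run it as "rpcinfo -p | mountd_versions" on the server. If 3 is still listed after restarting rpc.mountd with --no-nfs-version 3, the old mountd is probably still registered.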
Continue on to setting up the client filesystem 


--------------------------------------------------------------------------------

SunOS
# mkdir -p /export/client/root/dev 

# mkdir /export/client/usr 

# mkdir /export/client/home 

# touch /export/client/swap 

# cd /export/client/root 

# tar [--numeric-owner] -xvpzf /export/client/NetBSD-release/binary/sets/kern.tgz 

# mknod /export/client/root/dev/console c 0 0 

Create (or add to) your /etc/exports file:

#/etc/exports
/export/client/root -root=client
/export/client/swap -root=client
/export/client/usr
/export/client/home

# rm -f /etc/xtab;touch /etc/xtab 

# exportfs -a 

# ps aux | grep nfsd
If nfsd is not already running, then run:
# nfsd 8 & 

# ps aux | grep mountd
If mountd is not already running, then run:
# rpc.mountd -n & 
If the server isn't running the NFS daemons, the client will print: 

le(0,0,0,0): Unknown error: code -1
boot: Unknown error: code -1
If the server is running NFS, but isn't exporting the root directory to the client, the client will print: 
boot: no such file or directory
If everything is working properly, you will see a few numbers and a spinning cursor on the client. This means you have succeeded! At this point, your client isn't bootable. If you let it continue, it will panic when attempting to start init. 
Continue on to setting up the client filesystem 


--------------------------------------------------------------------------------

Solaris
# mkdir -p /export/client/root/dev 

# mkdir /export/client/usr 

# mkdir /export/client/home 

# touch /export/client/swap 

# cd /export/client/root 

# tar [--numeric-owner] -xvpzf /export/client/NetBSD-release/binary/sets/kern.tgz 

# mknod /export/client/root/dev/console c 0 0 

Add the following lines to /etc/dfs/dfstab: 
share -F nfs -o root=client /export/client/root
share -F nfs -o root=client /export/client/swap
share -F nfs -o rw=client   /export/client/usr
share -F nfs -o rw=client   /export/client/home
Be certain to use names. If you use numeric IP addresses, Solaris will deny access without any error messages. 
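
Because the failure is silent, it can help to check client names before they go into dfstab. The following is a rough sketch only: is_numeric_ip is a hypothetical helper that flags anything made up solely of digits and dots with at least three dots. It does not validate the address.

```shell
#!/bin/sh
# Sketch: return 0 if the argument looks like a dotted numeric address.
is_numeric_ip() {
    case $1 in
        ''|*[!0-9.]*) return 1 ;;   # empty, or contains a non-digit/non-dot
        *.*.*.*)      return 0 ;;   # digits and dots only, three or more dots
        *)            return 1 ;;
    esac
}

for host in client 192.168.0.10; do
    if is_numeric_ip "$host"; then
        echo "$host: numeric address, use a hostname instead"
    else
        echo "$host: ok"
    fi
done
```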


# /usr/bin/ps -ef | grep nfs
If the nfs daemons are running, then you merely need to run:
# shareall
Normally, you'd need to run unshareall;shareall, but you've only added entries, not deleted anything. 
If the nfs daemons aren't running, then you will need to run:
# /etc/init.d/nfs.server start 

If the server isn't running the NFS daemons, the client will print: 

le(0,0,0,0): Unknown error: code -1
boot: Unknown error: code -1
If the server is running NFS, but isn't exporting the root directory to the client, the client will print: 
boot: no such file or directory
If everything is working properly, you will see a few numbers and a spinning cursor on the client. This means you have succeeded! At this point, your client isn't bootable. If you let it continue, it will panic when attempting to start init. 
Continue on to setting up the client filesystem 


--------------------------------------------------------------------------------

NEWS-OS
# mkdir -p /export/client/root/dev 

# mkdir /export/client/usr 

# mkdir /export/client/home 

# touch /export/client/swap 

# cd /export/client/root 

# tar [--numeric-owner] -xvpzf /export/client/NetBSD-release/binary/sets/kern.tgz 

# mknod /export/client/root/dev/console c 0 0 

Create (or add to) your /etc/exports file:

#/etc/exports
/export/client/root -root=client
/export/client/swap -root=client
/export/client/usr
/export/client/home

# rm -f /etc/xtab;touch /etc/xtab 

# /usr/etc/exportfs -av 

# ps -aux | grep nfsd
If nfsd is not already running, then run:
# /etc/nfsd 4 & 
If the server isn't running the NFS daemons, the client will print: 

le(0,0,0,0): Unknown error: code -1
boot: Unknown error: code -1
If the server is running NFS, but isn't exporting the root directory to the client, the client will print: 
boot: no such file or directory
If everything is working properly, you will see a few numbers and a spinning cursor on the client. This means you have succeeded! At this point, your client isn't bootable. If you let it continue, it will panic when attempting to start init. 
Continue on to setting up the client filesystem 


--------------------------------------------------------------------------------

NEXTSTEP
Note that NEXTSTEP doesn't support exporting a file. This means that swap will have to be a file on your root (NFS) filesystem, and not its own NFS-mounted file. Keep this in mind in later steps involving swap. 
You may also wish to keep with NEXTSTEP convention and place all of your client files in /private/export/client instead of /export/client. 


# mkdir -p /export/client/root/dev 

# mkdir /export/client/usr 

# mkdir /export/client/home 

# touch /export/client/root/swap 

# cd /export/client/root 

# tar [--numeric-owner] -xvpzf /export/client/NetBSD-release/binary/sets/kern.tgz 

# mknod /export/client/root/dev/console c 0 0 

Launch /NextAdmin/NFSManager.app 

Click on the "Export From ..." menu item 

Select your NetInfo Domain (probably /) and click OK. 

Click on the top Add button to pick your Directory Name


Type in your client's name under "Root Access" and click that "Add" button. 

Click OK. If your client doesn't have a DNS or /etc/hosts entry, NEXTSTEP will not serve correctly. 

Click the "Quit" menu item. 
For reference, here is a snapshot of what the NFSManager Exported Directories window should look like. 

If the server isn't running the NFS daemons, the client will print: 

le(0,0,0,0): Unknown error: code -1
boot: Unknown error: code -1
If the server is running NFS, but isn't exporting the root directory to the client, the client will print: 
boot: no such file or directory
If everything is working properly, you will see a few numbers and a spinning cursor on the client. This means you have succeeded! At this point, your client isn't bootable. If you let it continue, it will panic when attempting to start init. 
Continue on to setting up the client filesystem 


--------------------------------------------------------------------------------

HP-UX 7
I couldn't get the HP-UX 7 rpc.mountd to start. Here's what I tried, in case it works for you. If you find out what we're doing wrong, let us know. 
I don't think HP-UX 7's NFS server allows for restricting root read/write access. 


# mkdir -p /export/client/root/dev 

# mkdir /export/client/usr 

# mkdir /export/client/home 

# touch /export/client/swap 

# cd /export/client/root 

# tar [--numeric-owner] -xvpzf /export/client/NetBSD-release/binary/sets/kern.tgz 

# mknod /export/client/root/dev/console c 0 0 

Add the following lines to /etc/exports: 
#/etc/exports
/export/client/root client.test.net
/export/client/swap client.test.net
/export/client/usr  client.test.net
/export/client/home client.test.net

# ps -ef | grep nfsd
If they're not running, then run:
# /etc/nfsd 4 

Make sure the rpc.mountd entry in /etc/inetd.conf is uncommented. 
If the server isn't running the NFS daemons, the client will print: 

le(0,0,0,0): Unknown error: code -1
boot: Unknown error: code -1
If the server is running NFS, but isn't exporting the root directory to the client, the client will print: 
boot: no such file or directory
If everything is working properly, you will see a few numbers and a spinning cursor on the client. This means you have succeeded! At this point, your client isn't bootable. If you let it continue, it will panic when attempting to start init. 
Continue on to setting up the client filesystem 


--------------------------------------------------------------------------------

HP-UX 9
# mkdir -p /export/client/root/dev 

# mkdir /export/client/usr 

# mkdir /export/client/home 

# touch /export/client/swap 

# cd /export/client/root 

# tar [--numeric-owner] -xvpzf /export/client/NetBSD-release/binary/sets/kern.tgz 

# mknod /export/client/root/dev/console c 0 0 

Open sam and make sure that the kernel has NFS support compiled in.
Kernel Configuration -> Subsystems, NFS/9000
If it isn't, adding it will require a reboot. 

Add the following lines to /etc/exports: 
#/etc/exports
/export/client/root   -root=client.test.net
/export/client/swap   -root=client.test.net
/export/client/usr  -access=client.test.net
/export/client/home -access=client.test.net

# ps -ef | grep mountd
If mountd is not already running, then run:
# /usr/etc/rpc.mountd 

# ps -ef | grep nfsd
If nfsd isn't already running, then run:
# /etc/nfsd 4 

# /usr/etc/exportfs -a 
If the server isn't running the NFS daemons, the client will print: 

le(0,0,0,0): Unknown error: code -1
boot: Unknown error: code -1
If the server is running NFS, but isn't exporting the root directory to the client, the client will print: 
boot: no such file or directory
If everything is working properly, you will see a few numbers and a spinning cursor on the client. This means you have succeeded! At this point, your client isn't bootable. If you let it continue, it will panic when attempting to start init. 
Continue on to setting up the client filesystem 


--------------------------------------------------------------------------------

HP-UX 10
# mkdir -p /export/client/root/dev 

# mkdir /export/client/usr 

# mkdir /export/client/home 

# touch /export/client/swap 

# cd /export/client/root 

# tar [--numeric-owner] -xvpzf /export/client/NetBSD-release/binary/sets/kern.tgz 

# mknod /export/client/root/dev/console c 0 0 

Edit /etc/rc.config.d/nfsconf and make sure that:
NFS_SERVER=1
START_MOUNTD=1
If those were not already set, then after setting them you will need to run:
# /sbin/init.d/nfs.server start 
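
The check on the two settings can be scripted. A minimal sketch, assuming each setting sits on its own line in plain NAME=value form; nfsconf_ready is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: succeed only if the file given as $1 sets both NFS_SERVER and
# START_MOUNTD to exactly 1 (a value such as 10 does not count).
nfsconf_ready() {
    grep -Eq '^[[:space:]]*NFS_SERVER=1([^0-9]|$)' "$1" &&
    grep -Eq '^[[:space:]]*START_MOUNTD=1([^0-9]|$)' "$1"
}
```

For example, "nfsconf_ready /etc/rc.config.d/nfsconf || echo 'edit nfsconf first'" reports when the file still needs editing.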

Add the following lines to /etc/exports: 
#/etc/exports
/export/client/root   -root=client.test.net
/export/client/swap   -root=client.test.net
/export/client/usr  -access=client.test.net
/export/client/home -access=client.test.net

# /usr/sbin/exportfs -a 
If the server isn't running the NFS daemons, the client will print: 

le(0,0,0,0): Unknown error: code -1
boot: Unknown error: code -1
If the server is running NFS, but isn't exporting the root directory to the client, the client will print: 
boot: no such file or directory
If everything is working properly, you will see a few numbers and a spinning cursor on the client. This means you have succeeded! At this point, your client isn't bootable. If you let it continue, it will panic when attempting to start init. 
Continue on to setting up the client filesystem 



#######################################################################
SOME MORE INFO ON ERROR CODES:
#######################################################################


====================================
SECTION 1: IBM lpar reference codes:
====================================

(A2xx, B2xx) Logical partition reference codes

When the server posts these SRCs, you can find them in the Serviceable Event View or the view that you use 
to see informational logs (such as the Product Activity Log or ASM).

Characters 3 and 4 of word 1 are the partition ID of the logical partition with the problem. 
If the SRC begins with A2xx, no service action is required. If the SRC begins with B2xx, find 
the next 4 characters of the SRC (called the unit reference code) in the following table.
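
The character positions described above can be pulled apart mechanically. The following is a minimal sketch: decode_src is a hypothetical helper name, and the two SRC values passed to it are made-up examples. A partition ID of 00 is reported as "see extended word one", since several table entries below say the ID is then given as LP=xxx.

```shell
#!/bin/sh
# Sketch: split word 1 of an A2xx/B2xx SRC into prefix, partition ID and
# unit reference code.
decode_src() {
    src=$1
    prefix=$(printf '%s' "$src" | cut -c1-2)
    pid_hex=$(printf '%s' "$src" | cut -c3-4)
    urc=$(printf '%s' "$src" | cut -c5-8)
    case $prefix in
        A2) action="no service action required" ;;
        B2) action="look up unit reference code $urc" ;;
        *)  echo "not an A2xx/B2xx SRC: $src"; return 1 ;;
    esac
    if [ "$pid_hex" = 00 ]; then
        part="(see extended word one, LP=xxx)"
    else
        # Characters 3 and 4 are hexadecimal; print them in decimal.
        part=$((0x$pid_hex))
    fi
    echo "partition $part: $action"
}

decode_src B2031230
decode_src A20A3000
```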
Table 1. (A2xx, B2xx) Logical partition reference codes

Columns: Reference Code | Description/Action (perform all actions before exchanging failing items) | Failing Item 
1150 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format).

This is a partitioning configuration problem. The LPARCFG Symbolic FRU will help correct the problem.

If the problem persists, call your next level of support.
 LPARCFG
 
1225 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format).

The partition attempted to IPL prior to the platform fully initializing. Retry the partition IPL after the platform IPL 
has fully completed and the platform is not in standby mode. If that IPL fails, call your next level of support.
 SVCDOCS
 

1230 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format).

This is a partitioning configuration problem. The partition is lacking the necessary resources to IPL.

This error might occur when you shut down a partition that is set to automatically IPL and then turn the managed system off 
and back on. When the partition automatically IPLs, it uses the resources specified in PHYP NVRAM, and this error 
occurs when the server does not find the exact resources specified in NVRAM. The solution is to activate the partition 
by using the partition profile on the HMC. The HMC applies the values in the profile to NVRAM. When the partition IPLs, 
it uses the resources specified in the profile.
 LPARCFG
LICCODE
 
1260 A problem occurred during the IPL of a partition. 
The partition could not IPL at the Timed Power On setting because the IPL setting of the partition was not set to Normal. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format).
 SVCDOCS
 
1265 A problem occurred during the IPL of a partition. 
The partition could not IPL. The partition ID is characters 3 and 4 of the B2xx reference code in 
word 1 of the SRC (in hexadecimal format). If characters 3 and 4 are both zero, then the partition ID is 
in extended word one as LP=xxx (in decimal format).

An operating system MSD IPL was attempted with the IPL side on D-mode. This is not a valid operating system IPL scenario, 
and the IPL will be halted. This SRC is usually seen when a D-mode SLIC install fails and attempts an MSD.
 SVCDOCS
 
1266 A problem occurred during the IPL of a partition. 
The partition could not IPL. The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC 
(in hexadecimal format). If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format).

You are attempting to IPL an operating system that is not supported.
 SVCDOCS
 
1280 A problem occurred during a partition Main Storage Dump. 
A mainstore dump IPL did not complete due to configuration mismatch. Contact your next level of support.
 NEXTLVL
 
1281 A partition memory error occurred 
An attempt to perform a partition dump failed. A partition memory error occurred. The failed memory will 
no longer be used. The partition dump was terminated. The partition ID is in extended word one as 
LP=xxx (in decimal format). Re-IPL the partition.
 SVCDOCS
 
1310 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
No alternate (D-mode) IPL IOP was selected. The IPL will attempt to continue, but there may not be enough 
information to find the correct D-mode load source.

Have the customer configure an alternate IPL IOP for the partition. Then retry the partition IPL.
 SVCDOCS
 
1320 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
No default load source IOP was selected for an A/B-mode IPL. The IPL will attempt to continue, but there may 
not be enough information to find the correct load source.

Have the customer configure a load source IOP for the partition. Then retry the partition IPL.
 SVCDOCS
 
1321 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
The IOA for the load source device needed an IOP, and none was detected. Check your LPAR configuration and make sure 
the correct slot is specified for the IPL load source. Then retry the partition IPL.
 SVCDOCS
 
1322 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
During the partition IPL, code tried to determine if the device in a slot was an I/O Processor or an I/O Adapter. 
That check failed. Check your LPAR configuration and make sure that the correct slot is specified for the IPL load source. 
Then retry the partition IPL. If this does not resolve the problem, perform LICIP15.
 SVCDOCS
 
2048 A problem occurred during a partition Main Storage Dump. 
A mainstore dump IPL did not complete due to a copy error. Contact your next level of support.
 NEXTLVL
 
2058 A problem occurred during a partition Main Storage Dump. 
A mainstore dump IPL did not complete due to a copy error. Contact your next level of support.
 NEXTLVL
 
2250, 2300 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
A slot that was needed for the partition was unavailable. See the Symbolic FRU SLOTUSE for more information on the cause of this error.
 SLOTUSE
 
2310, 2320, 2425 to 2426 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
The platform LIC for this partition attempted an operation. There was a failure. Contact your next level of support.
 NEXTLVL
 
2475 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
A slot that was needed for the partition was either empty or the device in the slot has failed. See the Symbolic 
FRU SLOTUSE for more information on the cause of this error.

If you have a RAID enablement card (CCIN 5709) on your system, it will disable an embedded SCSI adapter. If that embedded 
slot is called out in the error, you can safely ignore this error.
 SLOTUSE
 
2485 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
The platform LIC for this partition attempted an operation. There was a failure. Contact your next level of support.
 NEXTLVL
 
3000 System log entry only, no service action required 
A user requested an immediate termination and main store dump of a partition. The partition ID is in extended word 
one as LP=xxx in decimal format.
 
 
3081 A problem occurred during the IPL of a partition. 
IPL did not complete due to a copy error. Contact your next level of support.
 LICCODE
 
3110 A problem occurred during the IPL of a partition. 
The search for a valid load source device was exhausted. The partition ID is characters 3 and 4 of the B2xx reference code 
in word 1 of the SRC (in hexadecimal format). If characters 3 and 4 are both zero, then the partition ID is in extended word 
one as LP=xxx (in decimal format). Perform LICIP15.
 SVCDOCS
 
3113 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
A problem occurred on the path to the load source for the partition.

If present, look in the Serviceable Event View for a B7xx xxxx during the partition's IPL. Correct that error and retry the partition IPL.
 SVCDOCS
 
3114 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format).

The B2xx xxxx SRC Format is Word 1: B2xx3114, Word 3: Bus, Word 4: Board, Word 5: Card.
 NEXTLVL
 
3120 System log entry only, no service action required 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
Retry count exceeded. This is logged for each unsuccessful attempt to IPL with a loadsource candidate. 
If the IPL fails, look for other serviceable errors.
 
 
3123 System log entry only, no service action required 
 
3125 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format).

This is a platform LIC main store utilization problem. The platform LIC could not obtain a segment of main storage 
within the platform's main store to use for managing the creation of a partition.
 LICCODE
 
3128 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
An unexpected failure return code was returned when attempting to query the IOA slots that are assigned to an IOP.

Look for B700 69xx errors in the Serviceable Event View and work those errors.
 NEXTLVL
 
3130 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
If word 3 is zero, then this SRC is informational and can be ignored.

Otherwise there is a problem in the platform LIC. A nonzero bus number has no associated bus object.

Look for B700 69xx errors in the Serviceable Event View and work those errors.

If there are no serviceable B700 69xx errors, or if correcting the errors did not correct this problem, contact your next level of support.
 NEXTLVL
 
3135 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
An unknown bus type was detected.
 NEXTLVL
 
3140 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
The load source IOP is not owned by the partition. This is a configuration problem in the partition. 
Have the customer reconfigure the partition to have the intended load source IOP.

If there is not a configuration problem then contact your next level of support.
 SVCDOCS
 
3141 System log entry only, no service action required 
The IOP in the slot used for the last successful IPL of the operating system was replaced with an I/O Adapter. 
The IPL will continue by searching for a valid load source device.

Check the LPAR configuration if required, and ensure that the tagged I/O for the partition is correct.
 
 
3142 System log entry only, no service action required 
The I/O Adapter in the slot used for the last successful IPL of the operating system was replaced with an I/O Processor. 
The IPL will continue by searching for a valid load source device.

Check the LPAR configuration if required, and ensure that the tagged I/O for the partition is correct.
 
 
3143 System log entry only, no service action required 
The I/O Adapter in the slot used for the last successful IPL of the operating system was removed. 
The IPL will continue by searching for a valid load source device.

Check the LPAR configuration if required, and ensure that the tagged I/O for the partition is correct.
 
 
3144 System log entry only, no service action required 
The I/O Processor in the slot used for the last successful IPL of the operating system was removed. 
The IPL will continue by searching for a valid load source device.

Check the LPAR configuration if required, and ensure that the tagged I/O for the partition is correct.
 
 
3200 System log entry only, no service action required 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format).

Look for a SRC in the Serviceable Event View logged at the time the partition was performing an IPL.

This error indicates a failure during a search for the load source. There may be a number of these failures 
prior to finding a good load source. This is normal. If a B2xx3110 error is logged, a B2xx3200 may be posted to the control panel. 
Work the B2xx3110 error in the Serviceable Event View. If the system IPL hangs at B2xx3200 and you cannot check the SRC history, 
perform the actions indicated for the B2xx3110 SRC.
 
 
4158 System log entry only, no service action required 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format).

Look for a SRC in the Serviceable Event View logged at the time the partition was performing an IPL.

This error indicates a failure during a search for the load source. It is usual for a number of these failures 
to occur prior to finding a valid load source. This is normal. If a B2xx3110 error is logged, a B2xx3200 may be 
posted to the control panel. Work the B2xx3110 error in the Serviceable Event View. If the system IPL hangs at B2xx3200 
and you cannot check the SRC history, perform the actions indicated for the B2xx3110 SRC.
 
 
5106 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
There is not enough space to contain the partition main storage dump.

Contact your next level of support.
 NEXTLVL
 
5109 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
There was a partition main storage dump problem. Contact your next level of support.
 NEXTLVL
 
5114 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
There is not enough space to contain the partition main storage dump. Contact your next level of support.
 NEXTLVL
 
5115 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
There was an error reading the partition's main storage dump from the partition's load source into main storage.
 NEXTLVL
 
5117 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
A partition main storage dump has occurred but cannot be written to the load source device because a valid dump already exists.

Use the Main Storage Dump Manager to rename or copy the current main storage dump.
 SVCDOCS
 
5121 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
There was an error writing the partition's main storage dump to the partition's load source.
 NEXTLVL
 
5122 to 5123 System log entry only, no service action required 
A problem occurred during the IPL of a partition.

The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format).

An error occurred when writing the partition's main storage dump to the partition's load source. No service action required.
 
 
5135 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
There was an error writing the partition's main storage dump to the partition's load source.
 NEXTLVL
 
5137 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
There was an error writing the partition's main storage dump to the partition's load source.
 NEXTLVL
 
5145 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
There was an error writing the partition's main storage dump to the partition's load source.
 NEXTLVL
 
5148 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
An error occurred while doing a main storage dump that would have caused another main storage dump.

Contact your next level of support.
 NEXTLVL
 
6006 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
A platform LIC error occurred when the partition's memory initialized. The IPL will not continue.

Contact your next level of support.
 NEXTLVL
 
6012 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
The partition's LID failed to completely load into the partition's mainstore area.

Contact your next level of support.
 NEXTLVL
 
6015 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format).

The load source media is corrupted or not valid.
 LSERROR
 
6025 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format).

This is a problem with the load source media being corrupt or not valid.
 LSERROR
 
6027 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format).

A failure occurred when allocating memory for an internal object used for LID load operations. Ensure the partition 
was allocated enough main storage, verify that no memory leaks are present, and then retry the operation.
 NEXTLVL
 
6110 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
Error on load source device.
 LSERROR
 
690A A problem occurred during the IPL of a partition. 
An error occurred while copying Open Firmware into the partition load area. Contact your next level of support.
 NEXTLVL
 
7200 System log entry only, no service action required 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
An error condition was encountered when communicating with the load source I/O Processor for the partition 
identified in the xx field of the B2xx SRC.

This informational error indicates a failure resetting the I/O Processor in the preceding B2xx3200 error. 
This may be normal. If there is a hardware failure there will be a different serviceable event. If the system IPL hangs 
at B2xx7200 and you cannot check the SRC history, perform the actions indicated for the B2xx3110 SRC.
 
 
8080 System log entry only, no service action required 
 
8081 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
An internal LIC timeout has occurred. The partition may continue to IPL but it may experience problems while running.
 LICCODE
 
8105 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
There was a failure loading the VPD areas of the partition. Possible causes are: 

Corrupted/unsupported load source media 
Insufficient resources allocated to the partition 
Unsupported partition configuration by the operating system
If the problem is due to media, replace the load source media. If the problem is due to insufficient resources, 
allocate enough resources to the partition. If the problem is due to unsupported partition configuration, 
correct the partition configuration.
 SVCDOCS
 
8107 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
There was a problem getting a segment of main storage in the platform's main store.
 LICCODE
 
8109 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
A failure occurred. The IPL is terminated. Ensure that there is enough memory to IPL the partition.
 LICCODE
 
8112 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
A failure occurred. The IPL is terminated.
 LICCODE
 
8113 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
A problem occurred on the path to the load source for the partition.

There was an error mapping memory for the partition's IPL. Call your next level of support.
 LICCODE
 
8114 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
A problem occurred on the path to the load source for the partition.

There was a failure verifying VPD for the partition's resources during IPL. Call your next level of support.
 LICCODE
 
8115 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
There was a low level partition to partition communication failure.
 LICCODE
 
8117, 8121, 8123, 8125, 8127, 8129 A problem occurred during the IPL of a partition. 
Partition did not IPL due to platform Licensed Internal Code error.

Contact your next level of support.
 NEXTLVL
 
813A A problem occurred during the IPL of a partition. 
Ensure that the console device cables are connected properly. If the cables are already connected properly, replace the cables. 
Re-IPL the partition. If the problem reoccurs, contact your next level of support.
 SVCDOCS
 
A100 to A101 A problem occurred after a partition ended abnormally. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
This partition could not stay running and shut itself down.

Work any error logs in the Serviceable Event View. If there are no errors, contact your next level of support.
 SVCDOCS
 
B07B System log entry only, no service action required 
 
B215 A problem occurred after a partition ended abnormally. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
There was a communications problem between this partition's service processor and the platform's service processor.

The platform will need to be re-IPLed before that partition can be used. Call your next level of support.
 NEXTLVL
 
C1F0 A problem occurred during a power off of a partition 
Internal platform Licensed Internal Code error occurred during partition shutdown or re-IPL.

Contact your next level of support.
 NEXTLVL
 
D150 A problem occurred after a partition ended abnormally. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
There was a communications problem between this partition and code that handles resource allocation. Call your next level of support.
 LICCODE
 
E0AA A problem occurred during the IPL of a partition. 
Ensure that the console device cables are connected properly. If the cables are already connected properly, replace the cables. 
Re-IPL the partition. If the problem reoccurs, contact your next level of support.
 SVCDOCS
 
F001 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
An operation has timed out.

Ignore this error if there are other serviceable errors. Work those error logs for this partition and for the platform 
from the Serviceable Event View. If there are no errors, contact your next level of support.
 SVCDOCS
 
F003 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
Partition processors did not start LIC within the timeout window.

Capture a Partition Dump and call your next level of support.
 NEXTLVL
 
F004 A system request to power off a partition failed 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
The partition did not respond to a system request to power off the partition. This partition had a communications problem.

If the partition is an i5/OS partition, capture a Partition Dump. Contact your next level of support.
 NEXTLVL
 
F005 A system request to power off a partition failed 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
The partition did not respond to a system request to power off the partition. This partition had a communications problem.

If the partition is an i5/OS partition, perform a Partition Dump and contact your next level of support.

For all other partition types, a Partition Dump is not supported. If the system is Hardware Management Console (HMC) 
or Integrated Virtualization Manager (IVM) controlled, do an immediate partition power off. If the system is not HMC 
or IVM controlled, perform a Function 8 on the control panel. After the partition has powered off, re-IPL the partition, 
collect error logs and contact your next level of support.
 NEXTLVL
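
The F005 recovery steps branch on the partition type and on how the system is managed. A minimal sketch of that decision in Python follows; the function and parameter names are hypothetical, and the returned strings simply restate the steps from the entry above.

```python
def f005_actions(is_i5os, hmc_or_ivm_controlled):
    """List the B2xxF005 recovery steps in order (illustrative helper only)."""
    if is_i5os:
        return ["perform a Partition Dump", "contact next level of support"]
    # A Partition Dump is not supported for other partition types.
    if hmc_or_ivm_controlled:
        power_off = "do an immediate partition power off"
    else:
        power_off = "perform Function 8 on the control panel"
    return [
        power_off,
        "re-IPL the partition",
        "collect error logs",
        "contact next level of support",
    ]
```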
 
F006 A problem occurred during the IPL of a partition. 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
The code load operation for the partition's IPL timed out.

Work any error logs for this partition in the Serviceable Event View. If there are no errors, contact your next level of support.
 SVCDOCS
 
F007 A problem occurred during a power off of a partition 
The partition ID is characters 3 and 4 of the B2xx reference code in word 1 of the SRC (in hexadecimal format). 
If characters 3 and 4 are both zero, then the partition ID is in extended word one as LP=xxx (in decimal format). 
A problem occurred on the path to the load source for the partition.

A timeout occurred during the process of trying to stop a partition from running. Contact your next level of support.
 LICCODE
 
F008 A problem occurred during the IPL of a partition. 
During an IPL, a timeout occurred while waiting for a ready message from the partition.

Look for other errors. Re-IPL the partition to recover.
 
 
F009 System log entry only, no service action required 
During an IPL, a timeout occurred while waiting for a response to a message.

Look for other errors. Re-IPL the partition to recover.
 
 
F00A to F00B System log entry only, no service action required 
During an IPL, a timeout occurred while waiting for a response to a message.

Look for other errors. If this SRC is displayed in the operator panel, then panel function 34 might be used to retry 
the current IPL while the partition is still in the failed state.
 
 
F00C System log entry only, no service action required 
During an IPL, a timeout occurred while waiting for a response to a message.

Look for other errors. If this SRC is displayed in the operator panel, then panel function 34 might be used to retry 
the current IPL while the partition is still in the failed state.
 
 
F00D Timeout occurred during a main store dump IPL 
If this SRC is displayed in the operator panel during a main store dump IPL, then panel function 34 might be used to retry 
the current main store dump IPL while the partition is still in the failed state.
 


#################################################################################


==========================================================
SECTION 2: IBM Partition firmware reference (error) codes:
==========================================================


(BAxx) Partition firmware reference (error) codes
The partition firmware detected a failure. The first eight characters in the display represent the SRC. 
Any additional characters represent the associated location code. Record the location code as well as the reference code, 
then find the SRC in the following table.
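
As a rough illustration of the rule above (first eight characters are the SRC, anything after that is the location code), the display text could be split like this in Python. The function name is hypothetical and the location code in the usage note is a made-up sample.

```python
def split_display(display):
    """Split operator panel text into (SRC, location code or None)."""
    text = display.strip()
    if len(text) < 8:
        raise ValueError("display text is shorter than an 8-character SRC")
    # The first eight characters are the SRC itself.
    src = text[:8].upper()
    # Whatever follows is the associated location code, if any.
    location = text[8:].strip()
    return src, (location or None)
```

For example, split_display("BA180013 U787A.001.DNZ00XV-P1-C3") returns ("BA180013", "U787A.001.DNZ00XV-P1-C3").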

Table 1. (BAxx) Partition firmware reference (error) codes

Columns: Reference Code, Description/Action (perform all actions before exchanging failing items), Failing Item 
BA000010 The device data structure is corrupted FWFLASH 
BA000020 Incompatible firmware levels were found 
Reflash the platform firmware.
  
BA000030 An lpevent communication failure occurred FWFLASH 
BA000032 The firmware failed to register the lpevent queues FWFLASH 
BA000034 The firmware failed to exchange capacity and allocate lpevents FWFLASH 
BA000038 The firmware failed to exchange virtual continuation events FWFLASH 
BA000040 The firmware was unable to obtain the RTAS code lid details FWFLASH 
BA000050 The firmware was unable to load the RTAS code lid FWFLASH 
BA000060 The firmware was unable to obtain the open firmware code lid details FWFLASH 
BA000070 The firmware was unable to load the open firmware code lid FWFLASH 
BA000080 The user did not accept the license agreement 
There is no further action required. If the user did not accept the license agreement, the system will not function.
  
BA000081 Failed to get the firmware license policy FWFLASH 
BA000082 Failed to set the firmware license policy FWFLASH 
BA010000 There is insufficient information to boot the partition FWIPIPL 
BA010001 The client IP address is already in use by another network device FWIPIPL 
BA010002 Cannot get gateway IP address 
If the system is a model 185 or A50, refer to partition firmware progress code E174. 
For all other systems, refer to partition firmware progress code CA00E174.
 FWHOST 
BA010003 Cannot get server hardware address 
If the system is a model 185 or A50, refer to partition firmware progress code E174. 
For all other systems, refer to partition firmware progress code CA00E174.
 FWHOST 
BA010004 Bootp failed  
BA010005 File transmission (TFTP) failed 
Refer to partition firmware progress code CA00E174
 FWHOST
FWADIPL
 
BA010006 The boot image is too large FWADIPL 
BA020001 Partition firmware password entry error 
Reenter the password.
  
BA020009 Invalid password entered - system locked 
A password was entered incorrectly three times. Deactivate the partition using the HMC, 
then reactivate it. When asked for the password, enter the correct password.
  
BA030011 RTAS attempt to allocate memory failed FWFWPBL 
BA04000F Self test failed on device; no error or location code information available 
If there was a location code reported with the error, replace the device specified by the location code.
 NEXTLVL 
BA040010 Self test failed on device; cannot locate package NEXTLVL 
BA040020 The machine type and model are not recognized by the server firmware 
Check for server firmware updates, and apply them, if available.
 NEXTLVL 
BA040030 The firmware was not able to build the UID properly for this system. As a result, 
problems may occur with the licensing of the AIX® operating system. 
Using the Advanced System Management Interface (ASMI) menus, ensure that the machine type, 
model, and serial number in the VPD for this system are correct. 
If this is a new system, check for server firmware updates and apply them, if available.
  
BA040035 The firmware was unable to find the "plant of manufacture" in the VPD. This may cause problems 
with the licensing of the AIX operating system. 
Verify that the machine type, model, and serial number are correct for this system. If this is a new system, 
check for server firmware updates and apply them, if available.
  
BA040040 Setting the machine type, model, and serial number failed. FWFWPBL 
BA040050 The h-call to switch off the boot watchdog timer failed. FWFWPBL 
BA040060 Setting the firmware boot side for the next boot failed. FWFWPBL 
BA050001 Rebooting a partition in logical partition mode failed. FWFWPBL 
BA050004 Locating a service processor device tree node failed. FWFWPBL 
BA05000A Failed to send boot failed message to the service processor FWFWPBL 
BA060003 IP parameter requires 3 period (.) characters 
Enter a valid IP parameter. Example: 000.000.000.000
  
BA060004 Invalid IP parameter 
Enter a valid IP parameter. Example: 000.000.000.000
  
BA060005 Invalid IP parameter (>255) 
Enter a valid IP parameter. Example: 000.000.000.000
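
The three IP parameter errors above (BA060003, BA060004, BA060005) can be mimicked with a small Python check before typing the address into SMS. This is a sketch only: the function name is hypothetical, and the order of the checks is an assumption, not the documented behaviour of the partition firmware.

```python
def check_ip_parameter(value):
    """Return the BA06xxxx code a bad IP parameter would map to, or None if it looks valid."""
    # BA060003: the parameter needs exactly 3 period (.) characters.
    if value.count(".") != 3:
        return "BA060003"
    fields = value.split(".")
    # BA060004: every field must be a non-empty run of decimal digits.
    if not all(f.isascii() and f.isdigit() for f in fields):
        return "BA060004"
    # BA060005: no field may be greater than 255.
    if any(int(f) > 255 for f in fields):
        return "BA060005"
    return None
```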
  
BA060007 A keyboard was not found 
Make sure that a keyboard is attached to the USB port that is assigned to the partition. 
Replace the USB card to which the keyboard is attached.
  
BA060008 No configurable adapters found by the remote IPL menu in the System Management Services (SMS) utilities 
This error occurs when the remote IPL menu in the SMS utilities cannot locate any LAN adapters that 
are supported by the remote IPL function.
 FWRIPL 
BA06000B The system was not able to find an operating system on the devices in the boot list. 
See "Problems with loading and starting the operating system (AIX and Linux®)".
  
BA06000C A pointer to the operating system was found in non-volatile storage. FWPTR 
BA060020 The boot-device environment variable exceeded the allowed character limit. FWNIM 
BA060021 The boot-device environment variable contained more than five entries. FWNIM 
BA060022 The boot-device environment variable contained an entry that exceeded 255 characters in length FWNIM 
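
The boot-device limits behind BA060020 to BA060022 can be checked with a short Python sketch. The function name is hypothetical, entries are assumed to be separated by spaces, and because the overall character limit for BA060020 is not stated here it is left as an optional parameter.

```python
def check_boot_device(value, max_total=None):
    """Return the BA0600xx code a boot-device value would trigger, or None."""
    entries = value.split()
    # BA060020: overall character limit (not given in this table, so optional).
    if max_total is not None and len(value) > max_total:
        return "BA060020"
    # BA060021: more than five entries.
    if len(entries) > 5:
        return "BA060021"
    # BA060022: an entry longer than 255 characters.
    if any(len(entry) > 255 for entry in entries):
        return "BA060022"
    return None
```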
BA060030 Logical partitioning with shared processors is enabled and the operating system does not support it. 
Install or boot a level of the operating system that supports shared processors. 
Disable logical partitioning with shared processors in the operating system.
  
BA060040 The system or partition is configured to use huge pages, but the operating system image does not support huge pages. 
Do one of the following:

Install a newer version of the operating system that supports huge pages. 
Use the ASMI to remove the huge pages.
 
BA060050 The Hypervisor supports dynamic partitioning of the huge page-type of memory allocation, 
but dynamic partitioning of huge pages is not supported. 
Use the ASMI to disable dynamic partitioning of huge pages.
  
BA060060 The operating system expects an IOSP partition, but the operating system failed to make the transition to alpha mode. 
Ensure that the alpha-mode operating system image is intended for this partition. 
Ensure that the configuration of the partition supports an alpha-mode operating system.
  
BA060061 The operating system expects a non-IOSP partition, but the operating system failed to make the transition to MGC mode. 
Ensure that the nonalpha-mode operating system image is intended for this partition. 
Ensure that the configuration of the partition supports a nonalpha-mode operating system.
  
BA07xxxx SCSI controller failure FWSCSI1 
BA080001 An IDE device remained busy for a longer period than the time out period FWFWPBL 
BA080002 The IDE controller senses IDE devices but with errors. 
Verify that the IDE devices are properly seated and cabled correctly 
Replace the IDE controller (model-dependent)
  
BA080010 An IDE device is busy longer than the specified time-out period. 
Retry the operation.
 FWIDE1 
BA080011 An IDE command timed out; command is exceeding the period allowed to complete. 
Retry the operation.
 FWIDE1 
BA080012 The ATA command failed FWIDE2 
BA080013 The media is not present in the tray 
Retry the operation.
 FWIDE1 
BA080014 The media has been changed 
Retry the operation.
 FWIDE1 
BA080015 The packet command failed; the media might not be readable. 
Retry the operation.
 FWIDE1 
BA09xxxx SCSI controller failure. 
This checkpoint might remain in the control panel for up to 15 minutes. If the checkpoint persists 
longer than 15 minutes, do the following:

Power off the server and reboot from the permanent side. Reject the firmware image on the temporary side. 
If the problem persists, before replacing any components, refer to the actions for BA090001.
  
BA090001 SCSI disk unit: test unit ready failed; hardware error FWSCSI1 
BA090002 SCSI disk unit: test unit ready failed; sense data available FWSCSI2 
BA090003 SCSI disk unit: send diagnostic failed; sense data available FWSCSI3 
BA090004 SCSI disk unit: send diagnostic failed: devofl command FWSCSI3 
BA100001 SCSI tape: test unit ready failed; hardware error FWSCSI1 
BA100002 SCSI tape: test unit ready failed; sense data available FWSCSI4 
BA100003 SCSI tape: send diagnostic failed; sense data available FWSCSI3 
BA100004 SCSI tape: send diagnostic failed: devofl command FWSCSI3 
BA110001 SCSI changer: test unit ready failed; hardware error FWSCSI1 
BA110002 SCSI changer: test unit ready failed; sense data available FWSCSI4 
BA110003 SCSI changer: send diagnostic failed; sense data available FWSCSI3 
BA110004 SCSI changer: send diagnostic failed: devofl command FWSCSI3 
BA120001 On an undetermined SCSI device, test unit ready failed; hardware error FWSCSI5 
BA120002 On an undetermined SCSI device, test unit ready failed; sense data available FWSCSI4 
BA120003 On an undetermined SCSI device, send diagnostic failed; sense data available FWSCSI4 
BA120004 On an undetermined SCSI device, send diagnostic failed; devofl command FWSCSI4 
BA130001 SCSI CD-ROM: test unit ready failed; hardware error FWSCSI1 
BA130002 SCSI CD-ROM: test unit ready failed; sense data available FWSCSI3 
BA130003 SCSI CD-ROM: send diagnostic failed; sense data available FWSCSI3 
BA130004 SCSI CD-ROM: send diagnostic failed: devofl command FWSCSI3 
BA130010 USB CD-ROM: device remained busy longer than the time-out period 
Retry the operation.
 FWFWPBL 
BA130011 USB CD-ROM: execution of ATA/ATAPI command was not completed within the allowed time. 
Retry the operation.
 FWCD1 
BA130012 USB CD-ROM: execution of ATA/ATAPI command failed. 
Verify that the power and signal cables going to the USB CD-ROM are properly connected and are not damaged. 
If any problems are found, correct them, then retry the operation. 
If the problem persists, the CD in the USB CD-ROM drive might not be readable. Remove the CD and insert another CD.
 NEXTLVL 
BA130013 USB CD-ROM: bootable media is missing from the drive 
Insert a bootable CD-ROM in the USB CD-ROM drive, then retry the operation.
 FWCD1 
BA130014 USB CD-ROM: the media in the USB CD-ROM drive has been changed. 
Retry the operation.
 FWCD2 
BA130015 USB CD-ROM: ATA/ATAPI packet command execution failed. 
If the problem persists, the CD in the USB CD-ROM drive might not be readable. Remove the CD and insert another CD.
 FWCD2 
BA131010 The USB keyboard was removed. 
Plug in the USB keyboard and reboot the partition. 
Check for system firmware updates and apply them, if available.
  
BA140001 SCSI read/write optical: test unit ready failed; hardware error FWSCSI1 
BA140002 SCSI read/write optical: test unit ready failed; sense data available FWSCSI1 
BA140003 SCSI read/write optical: send diagnostic failed; sense data available FWSCSI3 
BA140004 SCSI read/write optical: send diagnostic failed; devofl command FWSCSI3 
BA150001 PCI Ethernet BNC/RJ-45 or PCI Ethernet AUI/RJ-45 adapter: internal wrap test failure 
Replace the adapter specified by the location code.
  
BA151001 10/100 MBPS Ethernet PCI adapter: internal wrap test failure 
Replace the adapter specified by the location code.
  
BA151002 10/100 MBPS Ethernet card FWENET 
BA153002 Gigabit Ethernet adapter failure 
Verify that the MAC address programmed in the FLASH/EEPROM is correct.
  
BA153003 Gigabit Ethernet adapter failure 
Check for adapter firmware updates; apply if available. 
Remove other cards from the PHB in which the gigabit Ethernet adapter is plugged and retry the operation. 
If the operation is successful, plug the cards in again, one at a time, until the failing card is isolated. 
After you identify the failing card, replace it. 
Replace the adapter.
  
BA160001 PCI auto LANstreamer™ token ring adapter: failed to complete hardware initialization. 
Replace the adapter specified by the location code.
  
BA161001 PCI token ring adapter: failed to complete hardware initialization. 
Replace the adapter specified by the location code.
  
BA170xxx NVRAM problems FWNVR1 
BA170000 NVRAMRC initialization failed; device test failed FWNVR2 
BA170100 NVRAM data validation check failed 
Turn off, then turn on the system.
 FWNVR2 
BA170201 The firmware was unable to expand target partition - saving configuration variable FWNVR1 
BA170202 The firmware was unable to expand target partition - writing error log entry FWNVR1 
BA170203 The firmware was unable to expand target partition - writing VPD data FWNVR1 
BA170210 Setenv/$Setenv parameter error - name contains a null character FWNVR1 
BA170211 Setenv/$Setenv parameter error - value contains a null character FWNVR1 
BA170220 The firmware was not able to write a variable value into NVRAM because not enough space exists in NVRAM. 
Do the following:

Reduce the number of partitions, if possible, so that each of the remaining partitions has more NVRAM allocated to it. 
Contact your next level of support.
  
BA170221 The setenv/$setenv function had to delete network boot information to free space in NVRAM. 
You might need to use the SMS menus to reenter the parameters for network installation or boot.
  
BA170998 NVRAMRC script evaluation error - command line execution error. FWNVR3 
BA170999 NVRAMRC script evaluation error - stack unbalanced on completion. 
This is a firmware debug environment error. There is no user action or FRU replacement for this error.
 NEXTLVL 
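
Some rows in this table use x as a wildcard (BA07xxxx, BA09xxxx, BA170xxx), and a specific code such as BA170000 can also fall under a wildcard row. A small Python sketch of a lookup is shown below. The function name and the sample TABLE are illustrative, and preferring the row with the fewest wildcards is an assumption, not a rule stated in the IBM table.

```python
def match_src(src, table):
    """Return the failing item for src, preferring the row with the fewest wildcards."""
    src = src.upper()
    best = None
    for pattern, item in table.items():
        if len(pattern) != len(src):
            continue
        # A lowercase x in the pattern matches any character in that position.
        if all(p == "x" or p == c for p, c in zip(pattern, src)):
            wildcards = pattern.count("x")
            if best is None or wildcards < best[0]:
                best = (wildcards, item)
    return best[1] if best else None

# A few rows copied from the table above, for illustration.
TABLE = {
    "BA07xxxx": "FWSCSI1",
    "BA170xxx": "FWNVR1",
    "BA170000": "FWNVR2",
    "BA170998": "FWNVR3",
}
```

With this sample table, match_src("BA170000", TABLE) returns "FWNVR2", while match_src("BA170123", TABLE) falls back to the wildcard row and returns "FWNVR1".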
BA180008 PCI device Fcode evaluation error. FWPCI1 
BA180009 The Fcode on a PCI adapter left a data stack imbalance 
You should load the new adapter Fcode before you use the adapter (specified by the location code 
associated with this error) for booting.
 FWPCI1 
BA180010 PCI probe error, bridge in freeze state FWPCI2 
BA180011 PCI bridge probe error, bridge is not usable FWPCI3 
BA180012 PCI device runtime error, bridge in freeze state FWPCI3 
BA180013 A PCI adapter was found that this machine type and model does not support. 
Is the system an IBM® Intellistation model?

Yes: Complete the following steps. 
Check for and apply any available server firmware updates. 
Replace the adapter at the location code that was reported with the error.
No: Remove the PCI adapter specified by the location code.
  
BA180014 MSI software error FWFLASH 
BA180100 FDDI adapter Fcode driver is not supported on this system. 
This server does not support the Fcode driver of this adapter. Service support might have additional information.
  
BA180101 Stack underflow from fibre-channel adapter FWFWPBL 
BA188000 An unsupported adapter was found in a PCI slot 
Remove the unsupported adapter in the slot identified by the location code.
  
BA188001 EEH recovered a failing I/O adapter 
This is an informational code only, and no action is required. Since it is informational, no location code will be reported.
  
BA188002 EEH could not recover the failed I/O adapter 
Replace the adapter in the slot identified by the location code.
  
BA190001 Firmware function to get/set time-of-day reported an error FWFWPBL 
BA191001 The server firmware function to turn on the speaker reported an error FWFWPBL 
BA201001 The serial interface dropped data packets FWFWPBL 
BA201002 The serial interface failed to open 
Note:
Check console settings to ensure the console is defined to the correct port. Ensure the console cables 
are connected to the port that is defined as the console. FWFWPBL 
BA201003 The firmware failed to handshake properly with the serial interface FWFWPBL 
BA210000 Partition firmware reports a default catch FWFWPBL 
BA210001 Partition firmware reports a stack underflow was caught FWFWPBL 
BA210002 Partition firmware was ready before standout was ready FWFWPBL 
BA210010 The transfer of control to the SLIC loader failed FWFWPBL 
BA210020 The I/O configuration exceeds the maximum size allowed by partition firmware. 
Increase the logical memory block size to 256 megabytes (MB) and reboot the managed system.

Note:
If the logical memory block size is already 256 MB, contact your next level of support.  
BA210100 The partition firmware was unable to log an error with the server firmware. No reply was received 
from the server firmware to an error log that was sent previously NEXTLVL 
BA210101 The partition firmware error log queue is full NEXTLVL 
BA250010 dlpar error in open firmware FWLPAR 
BA250020 dlpar error in open firmware due to an invalid dlpar entity. This error may have been caused 
by an errant or hung operating system process. 
Check for operating system updates that resolve problems with dynamic logical partitioning (dlpar) and apply them, if available. 
Check for server firmware updates and apply them, if available.
  
BA250030 A hotplug operation in dynamic logical partitioning (dlpar) was terminated for concurrent firmware update. 
Retry the hotplug operation after the concurrent firmware update is complete.  
BA250040 The firmware was unable to generate a device tree node 
After you replace the FRU indicated in the Failing Items column, check for operating system updates 
and apply them, if available.
 FWFLASH 
BA278001 Failed to flash firmware: invalid image file 
Obtain a valid firmware update (flash) image for this system.
  
BA278002 Flash file is not designed for this eServer™ platform 
Obtain a valid firmware update (flash) image for this system.
  
BA278003 Unable to lock the firmware update lid manager 
Reboot the system. 
Make sure that the operating system is authorized to update the firmware. If the system is running 
multiple partitions, verify that this partition has service authority.
  
BA278004 An invalid firmware update lid was requested 
Obtain a valid firmware update (flash) image for this system.
  
BA278005 Failed to flash a firmware update lid 
Obtain a valid firmware update (flash) image for this system.
  
BA278006 Unable to unlock the firmware update lid manager 
Reboot the system.
  
BA278007 Failed to reboot the system after a firmware flash update 
Reboot the system.
  
BA278008 A server firmware update was attempted from the operating system. You must perform the update 
by using the Hardware Management Console (HMC). 
Perform the server firmware update by using the HMC.
  
BA278009 The server firmware update management tools for the version of Linux that you are running are 
incompatible with this system. 
Go to Service and productivity tools for Linux on POWER™ and download the latest service aids and productivity tools 
for the version of Linux that you are running.
  
BA280000 RTAS discovered an invalid operation that may cause a hardware error NEXTLVL 
BA290000 RTAS discovered an internal stack overflow FWFWPBL 
BA300010 The partition exceeded the maximum number of logical memory blocks allowed under the new memory allocation scheme. 
Reduce the total logical memory block limit in the partition profile, then reactivate the partition.

Note:
The maximum number of logical memory blocks per partition is 128 kilobytes (K) under the new memory allocation scheme.  
BA300020 Function call to isolate a logical memory block failed under the standard memory allocation scheme. 
Do the following:

Upgrade the firmware of the managed system to the latest level, if a newer level is available. 
Upgrade the operating system to a level that supports the new memory representation, or edit the profile to have fewer 
logical memory blocks than the 8K maximum. 
Reboot the partition.
  
BA300030 Function call to make a logical memory block unusable failed under the standard memory allocation scheme. 
Do the following:

Upgrade the firmware of the managed system to the latest level, if a newer level is available. 
Upgrade the operating system to a level that supports the new memory representation, or edit the profile to have 
fewer logical memory blocks than the 8K maximum. 
Reboot the partition.
  
BA300040 The partition, which is running the traditional memory representation, exceeded the limit of 8192 logical 
memory blocks allowed by the standard memory allocation scheme. 
Do the following:

Upgrade the operating system to one that supports the new memory representation, or edit the profile 
to have fewer than 8192 logical memory blocks. 
Reboot the partition.
  
BA310010 The firmware could not obtain the SRC history FWFLASH 
BA310020 The firmware received an invalid SRC history FWFLASH 
BA310030 The firmware operation to write the MAC address to vital product data (VPD) failed FWFLASH 


#################################################################################



=============================================
SECTION 3: IBM: Using system reference codes:
=============================================



Using system reference codes
System reference codes (SRCs) indicate a server hardware or software problem that can originate in hardware, 
in Licensed Internal Code, or in the operating system.

A server component generates an error code when it detects a problem. An SRC identifies the component that detected 
the error code and describes the error condition. Use the SRC information to identify a list of possible failing items 
and to find information about any additional isolation procedures.

SRC formats

SRCs are strings of either six or eight alphanumeric characters. The characters in the SRC typically represent the reference 
code type and the unit reference code (URC):

For SRCs displayed on the control panel, the first four characters designate the reference code type and the second four 
characters designate the URC. 
For SRCs displayed on software displays, characters 1 through 4 of word 1 designate the reference code type and characters 
5 through 8 of word 1 designate the URC.
Note:
For partition firmware SRCs (AAxx, BAxx, and DAxx) and service processor SRCs (A1xx and B1xx), only the first two characters 
of the SRC indicate the necessary action. For partition firmware SRCs that begin with 2xxx, only the first character indicates 
the necessary action. In these cases, the term URC does not apply.
A reference code that is 6 or 8 characters long and appears in either of the following formats (xxxxxx or xxxxxxxx) is an SRC, 
unless it fits one of the following conditions:

An 8-character code that begins with a C (except CB) or D (except DA) is a progress code 
An 8-character code that begins with an H is a Hardware Management Console (HMC) error code or message 
A 6-character code that begins with a zero (0) and does not include a hyphen is an HMC error code 
A code that begins with a number sign character (#) represents an AIX® diagnostics message.
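
The rules above can be sketched as a small classifier. This is only an
illustration of the listed conditions, not an IBM tool: the function names
classify_reference_code and split_src are hypothetical, and the sketch
assumes the code is passed in as a plain string. Note that for partition
firmware and service processor SRCs the URC half returned by split_src
has no meaning, as the note above explains.

```python
def classify_reference_code(code):
    """Classify a reference code string using the rules listed above."""
    code = code.strip().upper()
    if code.startswith("#"):
        return "AIX diagnostics message"
    if len(code) == 8:
        # C (except CB) and D (except DA) are progress codes.
        if code[0] == "C" and not code.startswith("CB"):
            return "progress code"
        if code[0] == "D" and not code.startswith("DA"):
            return "progress code"
        if code[0] == "H":
            return "HMC error code or message"
        return "SRC"
    if len(code) == 6:
        if code[0] == "0" and "-" not in code:
            return "HMC error code"
        return "SRC"
    return "not an SRC"


def split_src(code):
    """Split an 8-character SRC into (reference code type, URC)."""
    code = code.strip().upper()
    if len(code) != 8:
        raise ValueError("only 8-character SRCs can be split")
    return code[:4], code[4:]


print(classify_reference_code("BA170220"))  # SRC
print(split_src("BA170220"))                # ('BA17', '0220')
```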
Using the list of reference codes

The list of system reference codes is organized in hexadecimal sequence, with numeric characters listed before 
alphabetic characters. Each entry in the list represents the first four characters (the reference code type) of the SRC. 
The entries link to more information, typically a table that lists the URCs that are associated with that reference code type.

Unless specified otherwise on a particular SRC page, the SRC tables contain the following columns:

The Reference Code column contains numbers that represent the unit reference code (URC). 
The Description/Action column offers a brief description of the failure that this SRC represents. It may also contain 
instructions for continuing the problem analysis. 
The Failing Item column represents functional areas of the system unit. When available, the failing function code links 
to the FRU that contains this function for each specific system unit.
To use the list of system reference codes, complete the following steps:

Click the item in the list of system reference codes that matches the reference code type that you want to find. 
Note:
The SRC tables support only 8-character reference code formats. If the reference code provided contains only 4 or 6 characters, 
contact your next level of support for assistance.
When the SRC table appears, select the appropriate URC from the first column of the table. The tables list URCs 
in hexadecimal sequence, with numeric characters listed before alphabetic characters. 
Perform the action indicated for the URC in the Description/Action column of the table. 
If the table entry does not indicate an action or if performing the action does not correct the problem, exchange 
the failing items or parts listed in the Failing Item column in the order that they are listed. Use the following 
instructions to exchange failing items: 
Note:
Some failing items are required to be exchanged in groups until the problem is solved. Other failing items are flagged 
as mandatory exchange and must be exchanged before the service action is complete, even if the problem appears 
to have been repaired. For more information, see Block replacement of FRUs.

Exchange the failing item listed first. 
If exchanging the first failing item does not correct the problem, reinstall the original item 
and exchange the next failing item listed. 
Continue to exchange and reinstall the failing items, one at a time, until the problem is corrected. 
If exchanging the failing items does not correct the problem, ask your next level of support for assistance.


#################################################################################








==========================================================
SECTION: Some more SOLARIS (and GENERIC) Errors:
==========================================================

 
 

A command window has exited because its child exited. 
=====================================================

The argument to a cmdtool(1) or a shelltool(1) window looks like
it is supposed to be a command, but the system cannot find the
command.

To run this command inside a cmdtool or a shelltool, make sure
the command is spelled correctly and is in your search path (if
necessary, use a full path name). If you intended this argument
as an option setting, use a minus sign (-) at the beginning of
the option.

Both the cmdtool and the shelltool are OpenWindows terminal
emulators.

admintool: Received communication service error 4 
=================================================

AdminTool could not start a display method because a remote
procedure call timed out, so it can't send the request. This
error results when admintool tries to access the NIS or NIS+
tables when networking is not enabled.

Verify the system network status with ifconfig -a to make sure
the system is connected to the network. Make sure the ethernet
cable is connected and the system is configured to run NIS or
NIS+.

answerbook: XView error: NULL pointer passed to xv_set 
======================================================

The AnswerBook navigator window comes up, but the document viewer
window does not. This message appears on the console, and the
message "Could not start new viewer" appears in the navigator
window. This situation indicates that you have an unknown client
or a problem with the network naming service.

Run the ypmatch(1) or nismatch(1) command to determine if the
client hostname is in the hosts map. If it isn't, add it to the
NIS hosts map on the NIS master server. Then make sure the
/etc/hosts file on the client contains an IP address and entry
for that hostname followed by loghost (reboot if you changed the
/etc/hosts file). Check that the ypmatch or nismatch client hosts
command returns the same IP host address as in the /etc/hosts
file. Finally, quit all existing AnswerBooks and restart.

For more information on the NIS hosts map, see the section on the
default search criteria in the NIS+ and FNS Administration Guide.
If you are using the AnswerBook, "NIS hosts map" is a good search
string.

Arg list too long 
=================

The system could not handle the number of arguments given to a
command or program when it combined those arguments with the
environment's exported shell variables. The argument list limit
is the size of the argument list plus the size of the
environment's exported shell variables.

The easiest solution is to reduce the size of the parent process
environment by unsetting extraneous environment variables. (See
the man page for the shell you're using to find out how to list
and change your environment variables.) Then run the program
again.

An argument list longer than ARG_MAX bytes was presented to a
member of the exec() family of system calls.

The symbolic name for this error is E2BIG, errno=7.
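
As a rough illustration of how the arguments and the exported
environment add up against the limit, here is a minimal Python sketch.
The helper names exec_payload_bytes and fits_in_arg_max are
hypothetical, and the byte count is an approximation (each string plus
its terminating NUL); the kernel's own accounting may differ.

```python
import os


def exec_payload_bytes(argv, env):
    """Approximate bytes handed to exec(): every string plus its NUL."""
    arg_bytes = sum(len(os.fsencode(a)) + 1 for a in argv)
    # Each environment entry is stored as "NAME=value" plus a NUL.
    env_bytes = sum(
        len(os.fsencode(k)) + len(os.fsencode(v)) + 2 for k, v in env.items()
    )
    return arg_bytes + env_bytes


def fits_in_arg_max(argv, env, arg_max=None):
    """Compare the approximate payload with ARG_MAX (queried if not given)."""
    if arg_max is None:
        arg_max = os.sysconf("SC_ARG_MAX")
    return exec_payload_bytes(argv, env) <= arg_max


print(exec_payload_bytes(["ls", "-l"], dict(os.environ)))
print(fits_in_arg_max(["ls", "-l"], dict(os.environ)))
```

Unsetting large exported variables lowers the second term, which is why
trimming the environment can make the same command line succeed.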

Argument out of domain 
======================

This is a programming error or a data input error.

Ask the program's author to fix this condition, or supply data in
a different format.

This indicates an attempt to evaluate a mathematical programming
function at a point where its value is not defined. The argument
of a programming function in the math package (3M) is out of the
domain of the function. This could happen when taking the square
root, power, or log of a negative number, when computing a power
to a non-integer, or when passing an out-of-range argument to a
hyperbolic programming function.

To help pinpoint a program's math errors, use the matherr(3M)
facility.

The symbolic name for this error is EDOM, errno=33.
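
The same class of error can be reproduced from Python, whose math
module raises ValueError in the cases where the C library would report
EDOM. The sketch below is illustrative only; the wrapper name evaluate
is hypothetical.

```python
import errno
import math


def evaluate(func, *args):
    """Call a math function; return (value, None) or (None, "EDOM")."""
    try:
        return func(*args), None
    except ValueError:
        # Python signals an out-of-domain argument with ValueError.
        return None, errno.errorcode[errno.EDOM]


print(evaluate(math.sqrt, 9.0))
print(evaluate(math.sqrt, -1.0))            # square root of a negative number
print(evaluate(math.log, -1.0))             # log of a negative number
print(evaluate(math.pow, -8.0, 1.0 / 3.0))  # negative base, non-integer power
```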

Arguments too long 
==================

This C shell error message indicates that there are too many
arguments after a command. For example, this can happen by
invoking rm * in a huge directory. The C shell cannot handle more
than 1706 arguments.

Temporarily start a Bourne shell with sh and run the command
again. The Bourne shell dynamically allocates command line
arguments. Return to your original shell by typing exit.
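
A different workaround, not the one described above, is to hand the
command its arguments in smaller batches, which is the approach
xargs(1) takes. A minimal Python sketch of the batching step follows;
the function name batch_arguments and the batch size are assumptions
chosen for illustration.

```python
def batch_arguments(args, max_args):
    """Yield successive slices of args, each with at most max_args items."""
    if max_args < 1:
        raise ValueError("max_args must be at least 1")
    for start in range(0, len(args), max_args):
        yield args[start:start + max_args]


# Example: 2500 file names split into batches of at most 1000.
names = ["file%d" % i for i in range(2500)]
for batch in batch_arguments(names, 1000):
    print(len(batch))
```

Each batch can then be passed to a separate invocation of the command.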

assertion failed: variable, file variable, line N 
=================================================

A condition in the program that was never expected to happen has
happened.

Contact the vendor or author of the program to ask why it failed.
If you have the source code for the program, you can look at the
file and line number where the assertion failed. This might give
you an idea of how to run the program differently.

This message results from a diagnostic macro called assert() that
a programmer inserted into the specified line of a source file.
The expression that evaluated untrue precedes the file name and
line number.

automountd[N]: No network locking on variable:  
contact admin to install server change 
======================================= 

See "WARNING: No network locking on variable: contact admin to
install server" message for details. If the server is not
changed, data loss is possible in applications that depend on
locking.

automountd[N]: server variable not responding 
=============================================

This automounter message indicates that the system tried to mount
a filesystem from an NFS server that is either down or extremely
slow to respond. In some cases this message indicates that the
network link to the NFS server is broken, although that condition
produces other error messages as well.

If you are the system administrator responsible for the non-
responding NFS server, check it out to see whether the machine
needs repair or rebooting. Encourage your user community to
report such problems quickly but only once. When the NFS server
is back in operation, the automounter will be able to access the
requested file system.

For more information on NFS failures, see the section on NFS
troubleshooting in the NFS Administration Guide. If you are using
the AnswerBook, a good search string is "NFS Service."

automount[N]: variable: Not a directory 
=======================================

The file specified after the first colon is not a valid mount
point because it is not a directory.

Ensure that the mount point is a directory, and not a regular
file or a symbolic link.
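
The check described above can be sketched in Python as follows. The
function name is_valid_mount_point is hypothetical, and the sketch
tests only the two conditions named in the text.

```python
import os


def is_valid_mount_point(path):
    """True only for a real directory: not a regular file, not a symlink."""
    return os.path.isdir(path) and not os.path.islink(path)


print(is_valid_mount_point("/"))
```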

Bad address 
===========

The system encountered a hardware fault in attempting to access a
parameter of a programming function.

Check if the bad address resulted from supplying the wrong device
or option to a command. If that is not the problem, contact the
vendor or author of the program for an update.

This error could occur any time a function that takes a pointer
argument is passed an invalid address. Because processors differ
in their ability to detect bad addresses, on some architectures
passing bad addresses can result in undefined behaviors.

The symbolic name for this error is EFAULT, errno=14.

BAD/DUP FILE I=i OWNER=o MODE=m SIZE=s MTIME=t CLEAR? 
=====================================================

While checking inode link counts during phase 4, fsck(1M) found a
file (or directory) that either does not exist or exists
somewhere else.

To clear the inode of its reference to this file or directory,
answer yes. With the -p (preen) option, fsck automatically clears
bad or duplicate file references, so answering yes to this
question seldom causes a problem.

Bad file number 
===============

Generally this is a program error, not a usage error.

Contact the vendor or author of the program for an update.

Either a file descriptor refers to no open file, or a read (or
write) request is made to a file that is open only for writing
(or reading).

The symbolic name for this error is EBADF, errno=9.
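
The second case (reading from a descriptor that is open only for
writing) is easy to reproduce. The following Python sketch is
illustrative; the function name read_from_write_only_fd is
hypothetical.

```python
import errno
import os
import tempfile


def read_from_write_only_fd():
    """Open a file write-only, try to read it, and return the errno."""
    with tempfile.NamedTemporaryFile() as tmp:
        fd = os.open(tmp.name, os.O_WRONLY)
        try:
            os.read(fd, 1)  # not permitted: descriptor is write-only
        except OSError as exc:
            return exc.errno
        finally:
            os.close(fd)
    return None


code = read_from_write_only_fd()
print(code, errno.errorcode.get(code))
```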

N BAD I=N 
=========

Upon detecting an out-of-range block, fsck(1M) prints the bad
block number and its containing inode (after I=).

In fsck phases 2 and 4, you will decide whether or not to clear
these bad blocks.  Before committing to repair with fsck, you
could determine which file contains this inode by passing the
inode number to the ncheck(1M) command:

# ncheck -i inum filesystem

For more information, see the chapter on checking file system
integrity in the System Administration Guide, Volume I.

bad module/chip at: variable 
============================

This message from the memory management system often appears with
parity errors, and indicates a bad memory module or chip at the
position listed. Data loss is possible if the problem occurs
other than at boot time.

Replace the memory module or chip at the indicated position.
Refer to the vendor's hardware manual for help finding this
location.

BAD SUPER BLOCK: variable 
=========================

This message from fsck(1M) indicates that a filesystem's super-
block is damaged beyond repair and must be replaced. At boot time
(with the -p option) this message is prefaced by the file system's
device name. After this message comes the actual damage
recognized (see Action). Unfortunately fsck does not print the
number of the damaged super-block.

The most common cause of this error is overlapping disk
partitions. Do not immediately rerun fsck as suggested by the
lines that display after the error message.  First make sure that
you have a recent backup of the file system involved; if not, try
to back up the file system now using ufsdump(1M). Then run the
format(1M) command, select the disk involved, and print out the
partition information.

# format : N > partition > print

Note whether the overlap occurs at the beginning or end of the
file system involved.  Then run newfs(1M) with the -N option to
print out the file system parameters, including the location of
backup super-blocks.

# newfs -N /dev/dsk/device

Select a super-block from a non-overlapping area of the disk, but
note that in most cases you have only one chance to select the
proper replacement super-block, which fsck soon propagates to all
the cylinders. If you select the wrong replacement super-block,
data corruption will probably occur, and you will have to restore
from backup tapes.  After you select a new super-block, provide
fsck with the new master super-block number:

# fsck -o b=NNNN /dev/dsk/device

Specific reasons for a damaged super-block include: a wrong magic
number, out of range NCG (number of cylinder groups) or CPG
(cylinders per group), the wrong number of cylinders, a
preposterously large super-block size, and trashed values in
super-block. These reasons are generally not meaningful because a
corrupt super-block is usually extremely corrupt.

For more information on bad super blocks, see the sections on
restoring bad super blocks in the System Administration Guide,
Volume I. If you are using the AnswerBook, "superblock" is a good
search string.

BAD TRAP 
========

A bad trap can indicate faulty hardware or a mismatch between
hardware and its configuration information. Data loss is possible
if the problem occurs other than at boot time.

If you recently installed new hardware, verify that the software
was correctly configured. Check the kernel trace back displayed on
the console to see which device generated the trap. If the
configuration files are correct, you will probably have to
replace the device.

In some cases, the bad trap message indicates a bad or down-rev
CPU.

A hardware processor trap occurred, and the kernel trap handler
was unable to restore system state. This is a fatal error that
usually precedes a panic, after which the system performs a sync,
dump, and reboot. The following conditions can cause a bad trap:
a system text or data access fault, a system data alignment
error, or certain kinds of user software traps.

bad trap = N 
============

See the message "BAD TRAP" for details.

/bin/sh: variable: too big 
==========================

This Bourne shell message indicates a classic "no memory" error.
While trying to load the program specified after the first colon,
the shell noticed that the system ran out of virtual memory (swap
space).

See the message "Not enough space" for information on
reconfiguring your system to add more swap space.

Block device required 
=====================

A raw (character special) device was specified where a block
device was required, such as during a call to the mount(1M)
command.

To see which block devices are available, use ls -l to look in
/devices. Then specify a block device instead of a character
device. Block device modes start with a b, whereas raw character
device modes start with a c.

The symbolic name for this error is ENOTBLK, errno=15.
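
The same distinction that ls -l shows in the first mode character can
be checked programmatically. This Python sketch uses the standard stat
module; the function name device_kind is hypothetical.

```python
import stat


def device_kind(mode):
    """Return "block", "character", or "other" for a st_mode value."""
    if stat.S_ISBLK(mode):
        return "block"
    if stat.S_ISCHR(mode):
        return "character"
    return "other"


# Synthetic mode values, so the example does not depend on real devices.
block_mode = stat.S_IFBLK | 0o640
char_mode = stat.S_IFCHR | 0o600
print(stat.filemode(block_mode), device_kind(block_mode))
print(stat.filemode(char_mode), device_kind(char_mode))
```

For a real device node, pass os.stat(path).st_mode to device_kind.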

Boot device: /iommu/sbus/variable/variable/sd@3,0 
=================================================

This message always appears at the beginning of rebooting. If
there is a problem, the system hangs, and no other messages
appear. This condition is caused by conflicting SCSI targets for
the boot device, which is almost always target 3.

The boot device is usually the machine's internal disk drive,
target 3. Make sure that external and secondary disk drives are
targeted to 1, 2, or 0, and do not conflict with each other. Also
make sure that tape drives are targeted to 4 or 5, and CD drives
to 6, avoiding any conflict with each other or with the disk
drives. You can set a device's target number using pushbutton
switches or a dial on the back near the SCSI cables. If the
targeting of the internal disk drive is in question, check it by
powering off the machine, removing all external drives, turning
the power on, and running the probe-scsi-all or probe-scsi
command from the PROM monitor.

Broadcast Message from root (pts/N) on server [date] 
====================================================

This message from the wall(1M) command gets transmitted to all
users logged into a system. You could see it during a rlogin or
telnet session, or on terminals connected to a timesharing
system.

Carefully read the broadcast message. Often this broadcast is
followed by a shutdown warning.

See the message "The system will be shut down in N minutes" for
details about system shutdown.

For more information on bringing down the system, see the section
on halting the system in the System Administration Guide, Volume
I. If you are using the AnswerBook, "halting the system" is a
good search string.

Broken pipe 
===========

This condition is often normal, and the message is merely
informational (as when piping many lines to the head program).
The condition occurs when a write on a pipe does not find a
reading process. This usually generates a signal to the executing
program, but this message displays when the program ignores the
signal.

Check the process at the end of the pipe to see why it exited.

The symbolic name for this error is EPIPE, errno=32.
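
The condition can be reproduced by writing to a pipe whose read end
has been closed. The Python sketch below is illustrative only (the
function name write_to_pipe_without_reader is hypothetical). It relies
on the fact that the Python interpreter ignores SIGPIPE by default, so
the write fails with EPIPE instead of killing the process.

```python
import errno
import os


def write_to_pipe_without_reader():
    """Write to a pipe that has no reader and return the resulting errno."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # no reading process remains
    try:
        os.write(write_fd, b"data")
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
    return None


code = write_to_pipe_without_reader()
print(code, errno.errorcode.get(code))
```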

Bus Error 
=========

A process has received a signal indicating that it attempted to
perform I/O to a device that is restricted or that does not
exist. This message is usually accompanied by a core dump, except
on read-only filesystems.

Use a debugger to examine the core file and determine what
program fault or system problem led to the bus error. If
possible, check the program's output files for data corruption
that might have occurred before the bus error.

Bus errors can result from either programming error or device
corruption on your system. Some common causes of bus errors are:
invalid file descriptors, unreasonable I/O requests, bad memory
allocation, misaligned data structures, compiler bugs, and
corrupt boot blocks.

Cannot allocate color map entry for "variable" 
=============================================

This message from libXt (X Intrinsics library) indicates that the
system color map was full even before the color name specified in
quotes was requested. Some applications can continue after this
message. Other applications, such as Workspace Properties Color,
fail to come up when the color map is full.

Exit the programs that make heavy use of the color map, then
restart the failed application and try again.

Can't create public message device (Device busy) 
================================================

This message comes from the lp print scheduler, indicating that
it is either extremely busy or hung.

If print jobs are coming out of the printer in question, wait
until they are finished and then resubmit this print job. If you
see this message again, the lp system is probably hung.

See the message "lp hang" for a procedure to clear the queue.

If lp is unable to create a device for printer messages, the
message FIFO could be already in use, or locked by another print
job.

For more information on the print scheduler, see the section on
administrating printers in the System Administration Guide Volume
II.

Can't invoke /etc/init, error N 
===============================

This message can appear while a system is booting, indicating
that the init program is missing or corrupted. Note that
/etc/init is a symbolic link to /sbin/init.

Boot the miniroot so you can replace init. Halt the machine by
typing Stop-A or by pressing the reset button. Reboot single-user
from CDROM, the net, or diskette. For example, type boot cdrom -s
at the ok prompt to boot from CDROM. After the system comes up
and gives you a # prompt, mount the device corresponding to the
original / partition somewhere, with a command similar to the
mount command below. Then copy the init program from the miniroot
to the original / partition, and reboot the system.

# mount /dev/dsk/c0t3d0s0 /mnt
# cp /sbin/init /mnt/sbin/init
# reboot

If this doesn't work, other files might be corrupted, and you
might need to reinstall the entire system.

The error number is 2 if /sbin/init is missing, or 8 if
/sbin/init has an incorrect executable format. This is usually
followed by a "panic:icode" message. The system tries to reboot
itself, but goes into a loop, because rebooting is impossible
without init.
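
The two error numbers mentioned above map to the standard errno names
ENOENT (2) and ENOEXEC (8). A short Python sketch for looking up a
number follows; the helper name describe_errno is hypothetical, and the
message text returned by the system can vary between platforms.

```python
import errno
import os


def describe_errno(number):
    """Return "NAME: message" for an errno value."""
    name = errno.errorcode.get(number, "unknown")
    return "%s: %s" % (name, os.strerror(number))


for number in (2, 8):
    print(number, describe_errno(number))
```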

For more information on booting the system, see the section on
halting and booting the system in the System Administration
Guide, Volume I.

can't synchronize with hayes 
============================

This message sometimes appears when using a modem that the system
regards as a "Hayes" type modem, which includes most modems
manufactured today. The message can be caused by incorrect switch
settings, by poor cable connections, or by not turning the modem
on.

Check that the modem is on and that the cables between the modem
and your system are securely connected. Check the internal and
external modem switch settings. Turn the modem off and then on
again, if necessary.

cd: Too many arguments 
======================

The C shell's cd(1) command takes only one argument. Either more
than one directory was specified, or a directory name containing
a space was specified.  Directory names with spaces are easy to
create with File Manager.

Use only one directory name. To change to a directory whose name
contains spaces, enclose the directory name in double (") or
single (') quotes, or use File Manager.

Channel number out of range 
===========================

The system has run out of stream devices. This error results when
a stream head attempts to open a minor device that does not exist
or that is currently in use.

Check that the stream device in question exists and was created
with an appropriate number of minor devices. Make sure that the
hardware corresponds to this configuration. If the stream device
configuration is correct, try again later when more system
resources might be available.

The symbolic name for this error is ECHRNG, errno=37.

chmod: ERROR: invalid mode 
==========================

This message from the chmod(1) command indicates a problem in the
first non-option argument.

If you are specifying a numeric file mode, you can provide any
number of digits (although only the final one to four are
considered), but all digits must be between 0 and 7. If you are
specifying a symbolic file mode, use the syntax provided in the
chmod usage message to avoid the "invalid mode" error message:

Usage: chmod [ugoa][+-=][rwxlstugo] file ...

Note that some combinations of symbolic keyletters produce no
error message but fail to have any effect. The first group,
[ugoa], is truly optional. The second group, [+-=], is mandatory
for chmod to have an effect. The third group, [rwxlstugo], is
also mandatory for effect, and can be used in combination when
that combination does not conflict.
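
The numeric rule above (every digit between 0 and 7) can be checked
before chmod is ever called. This is only a sketch; the function name
is_octal_mode is hypothetical, and it does not validate symbolic
modes at all.

```shell
# Succeed only if the argument is a non-empty string of octal digits.
is_octal_mode() {
    case "$1" in
        '' | *[!0-7]*) return 1 ;;
        *) return 0 ;;
    esac
}

for mode in 755 0644 789 u+x; do
    if is_octal_mode "$mode"; then
        echo "$mode: acceptable numeric mode"
    else
        echo "$mode: not a numeric mode"
    fi
done
```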

Command not found 
=================

The C shell could not find the program you gave as a command.

Check the form and spelling of the command line. If that looks
correct, echo $path to see if the user's search path is correct.
When communications are garbled, it is possible to unset a search
path to such an extent that only built-in shell commands are
available. Here is a command to reset a basic search path:

 % set path = (/usr/bin /usr/ccs/bin /usr/openwin/bin .)

If the search path looks correct, check the directory contents
along the search path to see if programs are missing or if
directories are not mounted.

For more information about the C shell, see csh(1).
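
For Bourne-type shells, the same checks might be sketched as follows:
walk the search path, flag directories that are missing or not
mounted, then ask the shell where it would find a given command. The
function name check_path_dirs is hypothetical, and ls is used only as
an example command.

```shell
# List each directory in the search path and flag any that is missing.
check_path_dirs() (
    IFS=:
    set -f                      # no filename expansion while splitting
    for dir in $PATH; do
        dir=${dir:-.}           # an empty entry means the current directory
        if [ -d "$dir" ]; then
            echo "ok       $dir"
        else
            echo "missing  $dir"
        fi
    done
)

check_path_dirs

# Ask the shell where it would find a command.
command -v ls || echo "ls is not on the search path"
```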

Connection closed. 
==================

This message can appear when using rlogin(1) to another system if
the remote host cannot create a process for this user, if the
user takes too long to type the correct password, if the user
interrupts the network connection, or if the remote host goes
down. Data loss is possible if files were modified and not saved
before the connection closed.

Just try again. If the other system has gone down, wait for it to
reboot first.

Connection closed by foreign host. 
==================================

When a user telnets to another system, this message can appear if
the user takes too long to type the correct password, if the
remote host cannot create a login for this user, or if the remote
host goes down or terminates the connection. Data loss is
possible if files were modified and not saved before the
connection closed.

Just try again. If the other system has gone down, wait for it to
reboot first.

[Connection closed. Exiting] 
============================

After using the talk(1) command to communicate with another user,
the other person enters an interrupt (usually Control-c), and
this message appears on your screen.

Sending an interrupt like this is the usual way of exiting the
talk program. The talk session is over and you can return to your
work.

Connection refused 
==================

No connection could be made because the target machine actively
refused it. This happens either when trying to connect to an
inactive service or when a service process is not present at the
requested address.

Activate the service on the target machine, or start it up again
if it has disappeared. If for security reasons you do not intend
to provide this service, inform the user community, possibly
suggesting an alternative.

The symbolic name for this error is ECONNREFUSED, errno=146.

Connection timed out 
====================

This occurs either when the destination host is down or when
problems in the network cause lost transmission.

First check the operation of the host system, for example by
using ping(1M) and ftp(1), then repair or reboot as necessary.
If that doesn't solve the problem, check the network cabling and
connections.

No connection was established in a specified time. A connect or
send request failed because the destination host did not properly
respond after a reasonable interval. (The timeout period is
dependent on the communication protocol.)

The symbolic name for this error is ETIMEDOUT, errno=145.

console login: ^J^M^Q^K^K^P 
===========================

This usually occurs because OpenWindows exited abnormally,
leaving the system's keyboard in the wrong mode. The characters
that appear when someone attempts to login are garbage
transliterations of what someone types.

Find another machine and remote login to this system, then run
this command:

$ /usr/openwin/bin/kbd_mode -a

This puts the console back into ASCII mode. Note that kbd_mode is
not a windows program, it just fixes the console mode.

The usual reason for this problem occurring is an automated
script run from cron that clears out the /tmp directory every so
often. Ensure that any such scripts do not remove the
/tmp/.X11-pipe or /tmp/.X11-unix directories, or any files therein.
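
One way a cleanup script can leave those directories alone is to
prune them from the find(1) traversal. The sketch below only prints
candidate files rather than deleting them. The function name
list_stale_tmp_files and the age threshold are assumptions, not part
of any standard cleanup script.

```shell
# Print regular files under $1 not modified in the last $2 days,
# skipping the X11 socket directories entirely.
list_stale_tmp_files() {
    find "$1" \( -name .X11-pipe -o -name .X11-unix \) -prune \
        -o -type f -mtime +"$2" -print
}

# Example: list_stale_tmp_files /tmp 7
```

Review the printed list before handing it to rm.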

core dumped 
===========

A core file contains an image of memory at the point of software
failure, and is used by programmers to find the reason for the
failure.

To see which program produced a core file, run either the file(1)
command or the adb(1) command. The following examples show the
output of the file and adb commands on a core file from the
dtmail program.

$ file core
core: ELF 32-bit MSB core file SPARC Version 1, from `dtmail'

$ adb core
core file = core -- program `dtmail'
SIGSEGV 11: segmentation violation
^D      (use Control-d to quit the program)

Ask the vendor or author of this program for a debugged version.

Some signals, such as SIGQUIT, SIGBUS, and SIGSEGV, produce a
core dump. See the signal(5) man page for a complete list.

If you have the source code for the program, you can try
compiling it with cc -g, and debugging it yourself using dbx or a
similar debugger. The where directive of dbx provides a stack
trace.

On mixed networks, it can be difficult to discern which machine
architecture produced a particular core dump, since adb on one
type of system generally cannot read a core file from another
type of system, and will produce an "unrecognized file" message.
Run adb on various machine architectures until you find the right
one.

The term "core" is archaic: ferrite core memory was supplanted
by silicon RAM in the 1970s, although spaceships still employ
core memory for its imperviousness to radiation.

For information on saving and viewing crash information see the
System Administration Guide, Volume II. If you are using the
AnswerBook, "system crash" is a good search string.

Could not initialize tooltalk (tt_open): TT_ERR_NOMP 
====================================================

Various desktop tools display or print this message when the
ttsession(1) process is not available. The ToolTalk service
generally tries to restart ttsession if it is not running. So
this error indicates that the ToolTalk service is either not
installed or is not installed correctly.

Verify that the ttsession command exists in /usr/openwin/bin or
/usr/dt/bin. If this command is not present, ToolTalk is not
installed correctly. The packages constituting ToolTalk are the
runtime SUNWtltk, developer support SUNWtltkd, and the manual
pages SUNWtltkm. CDE ToolTalk packages have the same names with
".2" appended.

The full TT_ERR_NOMP message string reads as follows: "No
ttsession is running, probably because tt_open() has not been
called yet. If this is returned from tt_open() it means ttsession
could not be started, which generally means ToolTalk is not
installed on the system."
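
The check for the ttsession command could be scripted along these
lines. This is a sketch only: find_in_dirs is a hypothetical helper,
and the two directories are the ones named above.

```shell
# Print the first directory in "$2 ..." that holds an executable named $1.
find_in_dirs() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -x "$dir/$name" ]; then
            echo "$dir/$name"
            return 0
        fi
    done
    return 1
}

find_in_dirs ttsession /usr/openwin/bin /usr/dt/bin ||
    echo "ttsession not found; check the ToolTalk packages"
```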

Could not start new viewer 
==========================

This message appears in the AnswerBook navigator window, along
with an XView error message on the console.

See the message "answerbook: XView error: NULL pointer passed to
xv_set" for details.

cpio: Bad magic number/header. 
==============================

A cpio(1) archive has either become corrupted or was written out
with an incompatible version of cpio.

Use the -k option to cpio to skip I/O errors and corrupted file
headers. This might permit you to extract other files from the
cpio archive. To extract files with corrupted headers, try
editing the archive with a binary editor such as emacs. Each cpio
file header contains a filename as a string.

For more information on magic numbers, see magic(4).

Cross-device link 
=================

An attempt was made to make a hard link to a file on another
device, such as on another file system.

Establish a symbolic link using ln -s instead. Symbolic links are
permitted across file system boundaries.

The symbolic name for this error is EXDEV, errno=18.
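
A script that cannot know in advance whether two paths share a file
system might try the hard link first and fall back to a symbolic
link. The helper name link_or_symlink is hypothetical. Note that it
falls back on any ln failure, not only on EXDEV.

```shell
# Try a hard link first; fall back to a symbolic link if ln fails
# (for example with "Cross-device link").
link_or_symlink() {
    target=$1
    name=$2
    if ln "$target" "$name" 2>/dev/null; then
        echo "hard link: $name"
    else
        ln -s "$target" "$name" && echo "symbolic link: $name -> $target"
    fi
}
```

Pass an absolute path as the target, because a symbolic link's
target is interpreted relative to the directory holding the link.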

data access exception 
=====================

This message can result from running an old version of the
operating system that does not support new hardware, or by
running an operating system that is not configured for new
hardware. It can also result from incorrectly installed DSIMMs or
from a disk problem.

Upgrade your operating system to a version that supports the new
hardware or machine architecture. For example, upgrading a
SPARCstation 2 (with sun4c kernel architecture) to a SPARCstation
20 (with sun4m kernel architecture) requires an operating system
upgrade or reconfiguration.

For more information on upgrades, see the section describing
system and device configuration in the Solaris 1.x to Solaris 2.x
Transition Guide.

Data fault 
==========

This is a kind of bad trap that usually causes a system panic.
When this message appears after a bad trap message, a system text
or data access fault probably occurred. (See the message "BAD
TRAP" for more information.) In the absence of a bad trap
message, this message might indicate a user text or data access
fault. Data loss is possible if the problem occurs other than at
boot time.

Make sure the machine can reboot, then check the log file
/var/adm/messages for hints about what went wrong.

Deadlock situation detected/avoided 
===================================

A programming deadlock situation was detected and avoided.

If the system had not detected and avoided a deadlock, a piece of
software would have hung. Run the program again. The deadlock
might not reoccur.

This error usually relates to file and record locking, but can
also apply to mutexes, semaphores, condition variables, and
read/write locks.

The symbolic name for this error is EDEADLK, errno=45.

See the section on deadlock handling in the System Interface
Guide. See the section on avoiding deadlock in the Multithreaded
Programming Guide.

Device busy 
===========

An attempt was made to mount a device that was already mounted or
to unmount a device containing an active file (such as an open
file, a current directory, a mount point, or a running program).
This message also occurs when trying to enable accounting that is
already enabled.

To unmount a device containing active processes, close all the
files under that mount point, quit any programs started from
there, and change directories out of that hierarchy. Then try to
unmount again.

Mutexes, semaphores, condition variables, and read/write locks
set this error condition to indicate that a lock is held.

The symbolic name for this error is EBUSY, errno=16.

/dev/rdsk/variable: CAN'T CHECK FILE SYSTEM. 
============================================

The system cannot automatically clean (preen) this file system
because it appears to be set up incorrectly or is having hard
disk problems. This message asks that you run fsck(1M) manually,
since data corruption might already have occurred.

Run fsck to clean the file system in question. See the message
"/dev/rdsk/N:  UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY" for
proper procedures.

/dev/rdsk/variable: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY. 
================================================================

At boot time the /etc/rcS script runs the fsck(1M) command to
check the integrity of file systems marked "fsck" in /etc/vfstab.
If fsck cannot repair a file system automatically, it interrupts
the boot procedure and produces this message. When fsck gets into
this state, it cannot repair a file system without losing one or
more files, so it wants to defer this responsibility to you, the
administrator. Data corruption has probably already occurred.

First run fsck -n on the file system, to see how many and what
type of problems exist.  Then run fsck again to repair the
file system. If you have a recent backup of the file system, you
can generally answer "y" to all the fsck questions. It's a good
idea to keep a record of all problematic files and inode numbers
for later reference. To run fsck yourself, specify options as
recommended by the boot script. For example:

# fsck /dev/rdsk/c0t4d0s0

Usually the files lost during fsck repair are those that were
created just before a crash or power outage, and they cannot be
recovered. If you lose important files, you can recover them from
backup tapes.

If you don't have a backup, ask an expert to run fsck for you.

For more information on file checking, see the section on
checking file system integrity in the System Administration Guide,
Volume I.

Directory not empty 
===================

The directory operation that was attempted, such as directory
removal with rmdir, can be performed only on an empty directory.

To remove the directory, first remove all the files that it
contains. A quick way to remove a non-empty directory hierarchy
is with the rm -r command.

The symbolic name for this error is ENOTEMPTY, errno=93.
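
The difference between rmdir and rm -r can be seen in a scratch
directory. The paths below are hypothetical and are created only for
the demonstration.

```shell
demo=$(mktemp -d)
mkdir "$demo/project"
touch "$demo/project/notes.txt"

# rmdir refuses because the directory still holds a file.
if ! rmdir "$demo/project" 2>/dev/null; then
    echo "rmdir failed: directory not empty"
fi

# rm -r removes the directory and everything beneath it.
rm -r "$demo/project"
```

Because rm -r cannot be undone, double-check the path before running
it on real data.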

Disc quota exceeded 
===================

The user's disk limit has been exceeded on a user filesystem,
usually because a file was just created or enlarged beyond the
limit. This almost always refers to a magnetic disk, and not to
an optical disc. Any data created after this condition occurs
will be lost.

The user can delete files to bring disk usage under the limit, or
the server administrator can use the edquota(1M) command to
increase the user's disk limit.

The symbolic name for this error is EDQUOT, errno=49.

dumptm: Cannot open `/dev/rmt/variable': Device busy 
====================================================

During file system backup, the dump program cannot open the tape
drive because some other process is holding it open.

Find the process that has the tape drive open, and either kill(1)
the process or wait for it to finish.

# ps -ef | grep /dev/rmt
# kill -9 processID

DUP/BAD I=i OWNER=o MODE=m SIZE=s MTIME=t FILE=f REMOVE? 
=========================================================

During phase 1, fsck(1M) found duplicate blocks or bad blocks
associated with the file or directory specified after FILE= whose
inode number appears after I= (with other information).

To remove this file or directory, answer yes. If you end up
removing more than a few files in this manner, data loss will
result, so it might be preferable to restore the filesystem from
backup tapes.

For more information on checking filesystems, see the section on
checking filesystem integrity in the System Administration Guide,
Volume I.

N DUP I=N 
=========

Upon detecting a block that is already claimed by another inode,
fsck(1M) prints the duplicate block number and its containing
inode (after I=).

In fsck phases 2 and 4, you will decide whether or not to clear
these bad blocks.  Before committing to repair with fsck, you
could determine which file contains this inode by passing the
inode number to the ncheck(1M) command:

# ncheck -i inum filesystem

For more information, see the chapter on checking filesystem
integrity in the System Administration Guide, Volume I.
 
 
error: DPS has not initialized or server connection failed 
==========================================================

This message appears when trying to run AnswerBook with a generic
X11 window server or on a generic X terminal.

Running AnswerBook requires Display PostScript (DPS), or a NeWS
server, or the Adobe DPS NS remote display software. In addition,
a complete LaserWriterII Type-1 font set (including Palatino)
should be installed on the X server. To find out if your X server
has DPS, run xdpyinfo(1) to verify the presence of an
"Adobe-DPS-Extension" line. X servers without this line don't
know about DPS.

ERROR: missing file arg (cm3) 
=============================

An attempt was made to run some sccs(1) operation that requires a
filename, such as create, edit, delget, or prt.

Supply the appropriate filename after the SCCS operation.

ERROR [SCCS/s.variable]: `SCCS/p.variable' nonexistent (ut4) 
============================================================

An attempt was made to sccs edit or sccs get a file that is not
yet under SCCS control.

Run sccs create on that file to place it under SCCS control.

ERROR [SCCS/s.variable]: writable `variable' exists (ge4) 
=========================================================

An attempt was made to sccs edit a file that is writable,
probably because it is already checked out.

Run sccs info to see who has the file checked out. If it is you,
go ahead and edit it. If it is somebody else, ask that person to
check in the file.

esp0: data transfer overrun 
===========================

When a user tries to mount a CDROM on a third-party CD drive,
mount(1M) fails with the above error, followed by the "sr0: SCSI
transport failed" message. The CD drive probably comes from a
vendor unknown to the system.

Third-party CD drives generally have an 8192 block size, as
opposed to the 512 block size on supported Sun drives. Check with
the vendor to see if any special configuration is possible to
allow the drive to operate on a Sun workstation.

Event not found 
===============

This C shell message indicates that a user tried to repeat a
command from the history list, but that command or number does
not exist in the list.

Run the C shell history command to display recent events in the
history list. If a user often tries to run commands that have
disappeared from the history list, make the list longer by
setting history to a higher value.

For more information about the C shell, see csh(1).

EXCESSIVE BAD BLKS I=N CONTINUE?
================================

During phase 1, fsck(1M) found more than 10 bad (out-of-range)
blocks associated with the specified inode number.

With this many bad blocks, it might be preferable to restore the
filesystem from backup tapes.

For more information on bad blocks, see the section on checking
filesystem integrity in the System Administration Guide, Volume
I. If you are using the AnswerBook, "bad blocks" is a good search
string.

EXCESSIVE DUP BLKS I=N CONTINUE? 
==================================

During phase 1, fsck(1M) found more than 10 duplicate (previously
claimed) blocks associated with the specified inode number.

With this many duplicate blocks, it might be preferable to
restore the filesystem from backup tapes.

For more information on blocks, see the section on checking
filesystem integrity in the System Administration Guide, Volume
I. If you are using the AnswerBook, "bad blocks" is a good search
string.

Exec format error 
=================

This often happens when trying to run software compiled for
different systems or architectures, such as when executing
Solaris 2.x programs on a SunOS 4.1.x system, or when trying to
execute SPARC-specific programs on an x86 machine. On a Solaris
2.x system, it can also occur if the Binary Compatibility Package
was not installed.

Make sure that the software matches the architecture and system
you're using. The file(1) command can help you determine the
target architecture. If you're using SunOS 4.1.x software on a
Solaris 2.x system, make sure that the Binary Compatibility
Package is installed. You can check for it using this command:

$ pkginfo | grep SUNWbcp

A request was made to execute a file that, although it has the
appropriate permissions, does not start with a valid format.

The symbolic name for this error is ENOEXEC, errno=8.

See the a.out(4) man page for a description of executable files.
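
The file(1) command is the proper tool for identifying a file's
format. Where it is unavailable, a rough check is to look at the
first four bytes of the file. The sketch below is an assumption-laden
illustration: it recognizes only the ELF magic number and the "#!"
interpreter line, and the function name exec_header_kind is
hypothetical.

```shell
# Classify a file by its first four bytes: elf, script, or unknown.
exec_header_kind() {
    magic=$(dd if="$1" bs=1 count=4 2>/dev/null | od -An -c | tr -d ' \n')
    case "$magic" in
        177ELF) echo "elf" ;;       # byte 0177 followed by "ELF"
        '#!'*)  echo "script" ;;    # interpreter line
        *)      echo "unknown" ;;
    esac
}

exec_header_kind /bin/sh
```

A result of "elf" says nothing about the target architecture; that
still requires file(1) or a similar tool.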

fd0: unformatted diskette or no diskette in the drive 
=====================================================

This message appears on the system console to indicate that the
floppy driver fd(7) could not read the label on a diskette.
Usually this is either because a new diskette has not yet been
formatted, or a formatted diskette has become corrupted. This
message often appears along with "read failed" and "bad format"
messages after volcheck(1) is run.

If you are certain that the diskette contains no data, run
fdformat -d to format the diskette in DOS format. (You can also
format a diskette in UFS format if you like, although then it is
not transportable to most other systems.) When the diskette is
formatted, you can write on it, if it was not corrupted beyond
repair.

File exists 
===========

The name of an existing file was mentioned in an inappropriate
context. For example, it is not allowed to establish a link to an
existing file, or to overwrite an existing file when the csh(1)
noclobber option is set.

Look at the names of files in the directory, then try again with
a different name or after renaming or removing the existing file.

The symbolic name for this error is EEXIST, errno=17.
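
Bourne-type shells have a counterpart to the csh noclobber option:
set -C, with >| as the explicit override (in csh the corresponding
forms are set noclobber and >!). The following sketch shows the
refusal and the override on a hypothetical scratch file.

```shell
demo=$(mktemp -d)
echo "first draft" > "$demo/report.txt"

set -C      # noclobber: ">" may no longer overwrite an existing file
if ! (echo "second draft" > "$demo/report.txt") 2>/dev/null; then
    echo "refused: report.txt already exists"
fi

# ">|" overrides noclobber for this one redirection.
echo "second draft" >| "$demo/report.txt"
set +C
```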

File locking deadlock 
=====================

This is a programming problem, in some cases unavoidable.

All a user can do is restart the program and hope deadlock does
not reoccur.

In the file locking subsystem, two processes tried to modify some
lock at the same time. In the multithreading subsystem, two
threads became deadlocked and could not continue. When a program
using the threads library encounters this error, it should
restart the deadlocked threads.

The symbolic name for this error is EDEADLOCK, errno=56.

filemgr: mknod: Permission denied 
=================================

File Manager issues this message and fails to come up whenever
the /tmp/.removable directory is owned by another user and is not
1777 mode. This can happen, for example, when multiple users
share a workstation.

Have the original owner change the mode (chmod(1)) of this
directory back to 1777, its default creation mode. Rebooting the
workstation also resolves this problem.

This is a known problem that was fixed in Solaris 2.4.
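
Setting and verifying the 1777 mode looks like this. To keep the
sketch safe to run, it operates on a scratch directory that stands in
for /tmp/.removable; the variable name dir is hypothetical.

```shell
# Hypothetical stand-in for /tmp/.removable.
dir=$(mktemp -d)

chmod 1777 "$dir"
ls -ld "$dir"       # the mode column should begin with drwxrwxrwt
```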

File name too long 
==================

The specified file name has too many characters.

If a file name or path name component is too long, devise a
shorter name. If the total path name is longer than PATH_MAX
characters, first change to an intermediate directory, then
specify a shorter path name. Newly-created data will be lost
unless written to another file with a shorter name.

In a UFS or NFS-mounted UFS filesystem, the length of a path name
component exceeds MAXNAMLEN (255) characters, or the total length
of the path name exceeds PATH_MAX (1024) characters. In a System
V filesystem, the length of a path name component exceeds
NAME_MAX (14) characters while no-truncation mode is in effect.
These values are defined in the /usr/include/limits.h(4) file.

The symbolic name for this error is ENAMETOOLONG, errno=78.
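
A path can be checked against these limits before it is used. The
sketch below defaults to the UFS values quoted above (255 and 1024)
and accepts other limits as optional arguments; the function name
check_path_length is hypothetical.

```shell
# Usage: check_path_length path [name_max [path_max]]
check_path_length() {
    path=$1
    name_max=${2:-255}
    path_max=${3:-1024}

    if [ "${#path}" -gt "$path_max" ]; then
        echo "path name longer than $path_max characters"
        return 1
    fi
    # Split on "/" in a subshell so IFS and set -f do not leak out.
    if ! (
        IFS=/
        set -f
        for component in $path; do
            [ "${#component}" -le "$name_max" ] || exit 1
        done
    ); then
        echo "a component is longer than $name_max characters"
        return 1
    fi
    echo "ok"
}
```

For example, check_path_length /export/home/report.txt 14 applies the
System V component limit instead of the UFS one.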

FILE SYSTEM STATE IN SUPERBLOCK IS WRONG; FIX? 
==============================================

The fsck(1M) command has just checked a filesystem, and has
determined that the filesystem is clean. The filesystem's
superblock, however, still thinks the filesystem is "dirty" in
some way.

If you believe that the filesystem is adequately repaired, answer
yes to mark the filesystem as clean.

Different "dirty" filesystem types are listed in
/usr/include/sys/fs/ufs_fs.h, and include FSACTIVE, FSBAD, FSFIX,
FSLOG, and FSSUSPEND.

For more information on superblocks, see the section on checking
filesystem integrity in the System Administration Guide, Volume
I. If you are using the AnswerBook, "bad superblock" is a good
search string.

File table overflow 
===================

The kernel file table is full because too many files are open on
the system.  Temporarily, no more files can be opened. New data
created under this condition will probably be lost.

Simply waiting often gives the system time to close files.
However, if this message occurs often, reconfigure the kernel to
allow more open files. To increase the size of the file table in
Solaris 2.x, increase the value of maxusers in the /etc/system
file.  The default maxusers value is the amount of main memory in
MB, minus 2.

The symbolic name for this error is ENFILE, errno=23.

File too large 
==============

The file size exceeded the limit specified by ulimit(1), or the
file size exceeds the maximum supported by the file system. New
data created under this condition will probably be lost.

In the C shell, use the limit command to see or set the default
file size. In the Bourne or Korn shells, use the ulimit -a
command. Even when the shells claim that the file size is
unlimited, in fact the system limit is FCHR_MAX (usually 1
gigabyte).

The symbolic name for this error is EFBIG, errno=27.
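
In a Bourne or Korn shell, the file size limit alone can be read with
ulimit -f, as in the sketch below. The size of a "block" in the
reported value depends on the shell, so check the shell's manual page
before converting it to bytes.

```shell
# Show the current file size limit for this shell and its children.
limit=$(ulimit -f)
if [ "$limit" = "unlimited" ]; then
    echo "no file size limit is set in this shell"
else
    echo "file size limit: $limit blocks"
fi
```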

FREE BLK COUNT(S) WRONG IN SUPERBLK SALVAGE? 
=============================================

During phase 5, fsck(1M) detected that the actual number of free
blocks in the filesystem did not match the superblock's free
block count. The df(1M) command accesses this free block count
when measuring filesystem capacity.

Generally you can answer yes to this question without harming the
filesystem.

For more information on superblocks, see the section on checking
filesystem integrity in the System Administration Guide, Volume
I. If you are using the AnswerBook, "bad superblock" is a good
search string.

fsck: Can't open /dev/dsk/variable 
==================================

The fsck(1M) command cannot open the disk device, because
although a similar filesystem exists, the partition specified
does not.

Run the mount(1M) or the format(1M) command to see what
filesystems are configured on the machine. Then run fsck again on
an existing partition.

fsck: Can't stat /dev/dsk/variable 
==================================

The fsck(1M) command cannot open the disk device, because the
specified filesystem does not exist.

Run the mount(1M) or the format(1M) command to see what
filesystems are configured on the machine. Then run fsck again on
an existing filesystem.

giving up 
=========

This message appears in the SCSI log to indicate that a read or
write operation has been retried until it timed out. With SCSI
disk the timeout period is usually 30 seconds; with tape the
period is usually 20 attempts. Timeout periods are generally
coded into the drivers.

Check that all SCSI devices are connected and powered on. Make
sure that SCSI target numbers are correct and not in conflict.
Verify that all cables are no longer than six meters, total, and
that all SCSI connections are properly terminated.

The scsi_log(9F) routine usually displays messages on the system
console and in the /var/adm/messages file. Run the dmesg(1M)
command to see the most recent message buffer.

Graphics Adapter device /dev/fb is of unknown type
==================================================

The /dev/fb driver is either missing or corrupted.

See "InitOutput: Error loading module for /dev/fb" for details.

group.org_dir: NIS+ servers unreachable 
=======================================

This is the second of three messages that an NIS+ client prints
when it cannot locate an NIS+ server on the network.

See the message "hosts.org_dir: NIS+ servers unreachable" for
details.

/home/variable: No such file or directory
=========================================

An attempt was made to change to a user's home directory, but
either that user does not exist or the user's fileserver has not
shared (exported) that filesystem.

To check on the existence of a particular user, run the
ypmatch(1) or nismatch(1) command, specifying the user name and
then the passwd map.

To export filesystems from the remote fileserver, become
superuser on that system and run the share(1M) command with the
appropriate options. If that system is sharing (exporting)
filesystems for the first time, also invoke
/etc/init.d/nfs.server start to begin NFS service.

For more information on sharing filesystems, see the
share_nfs(1M) man page.

Host is down 
============

A transport connection failed because the destination host was
down. For example, mail delivery was attempted over several days,
but the destination machine was not available during any of these
attempts.

Report this error to the system administrator for the host. If
you are the person responsible for this system, check to see if
the machine needs repair or rebooting.

This error results from status information delivered by the
underlying communication interface. If there is no known
connection to the host, a different message usually results. See
"No route to host" for details.

The symbolic name for this error is EHOSTDOWN, errno=147.

host name configuration error 
=============================

This is an old sendmail message, which replaced "I refuse to talk
to myself" and is now replaced by the "Local configuration error"
message.

See the message "554 variable... Local configuration error" for
details.

hosts.org_dir: NIS+ servers unreachable 
=======================================

This is the third of three messages that an NIS+ client prints
when it cannot locate an NIS+ server on the network.

If other NIS+ clients are behaving normally, check the Ethernet
cabling on the workstation showing this message. On SPARC
machines, disconnected network cabling also produces a series of
"no carrier" messages. On x86 machines, the NIS+ messages might
be your only indication that network cabling is disconnected.

If many NIS+ clients on the network are giving this message, go
to the NIS+ server in question and reboot or repair it, as
necessary. When the server machine is back in operation, NIS+
clients will give an "NIS server for domain OK" message.

I can't read your attachments. What mailer are you using? 
=========================================================

The SunView mailtool and pre-3.3 OpenWindows mailtool produce
this message when they cannot cope with an attachment. The
attachment is probably in MIME (Multipurpose Internet Mail
Extensions) format, using base64 encoding.

To read a mail message containing MIME attachments, use
mailtool(1) from Solaris 2.3 or later. If you are running an
earlier version of Solaris, rlogin(1) to a later version of
Solaris, set the DISPLAY environment variable back to the first
system, and run mailtool remotely. If those options prove
impossible, ask the originator to send the message again using
mailtool, or using the CDE dtmail compose
File->SendAs->SunMailTool option.

Standard MIME attachments with base64 encoding, for example,
produce this message and fail to display in older mailtools.

Look into using metamail, available on the Internet, which allows
you to send and receive MIME attachments.

ie0: Ethernet jammed 
====================

This message can appear on SPARCservers or x86 machines with an
Intel 82586 Ethernet chip. It indicates that 16 successive
transmission attempts failed, causing the driver to give up on
the current packet.

If this error occurs sporadically or at busy times, it probably
means that the network is saturated. Wait for network traffic to
clear. If bottlenecks arise frequently, think about reconfiguring
the network or adding subnets.

Another possible cause of this message is a noise source
somewhere in the network, such as a loose transceiver connection.
Use snoop(1M) or a similar program to isolate the problem area,
then check and tighten network connectors as necessary.

ie0: no carrier 
===============

This message can appear on SPARCservers or x86 machines with an
Intel 82586 Ethernet chip. It indicates that the chip has lost
input to its carrier detect pin while trying to transmit a
packet, causing the packet to be dropped.

Check that the Ethernet connector is not loose or disconnected.
Other possible causes include an open circuit somewhere in the
network and noise on the carrier detect line from the
transceiver. Use snoop(1M) or a similar program to isolate the
problem area, then check the network connectors and transceivers,
as needed.

Illegal Instruction 
===================

A process has received a signal indicating that it attempted to
execute an instruction that is not allowed by the kernel. This
usually results from running programs compiled for a slightly
different machine architecture. This message is usually
accompanied by a core dump, except on read-only filesystems.

If you are booting from CDROM or from the net, check README files
to make sure you are using an image appropriate for your machine
architecture. Run df to make sure there is enough swap space on
the system; too little swap space can cause this error. If you
recently upgraded your CPU to a new architecture, replace your
operating system with one that supports the new architecture (an
operating system upgrade might be required).

Sometimes this condition results from programming error, such as
when a program attempts to execute data as instructions. This
condition can also indicate device file corruption on your
system.

Illegal instruction "0xN" was encountered at PC 0xN 
===================================================

The machine is trying to boot from a non-boot device, or from a
boot device for a different hardware architecture.

If you are booting from the net, check README files to make sure
you are using a boot image for that architecture. If you are
booting from disk, make sure the system is looking at the right
disk, which is usually SCSI target 3. Failing these solutions,
connect a CD drive to the system and boot from CDROM.

Illegal seek 
============

Using a pipe ("|") on the command line doesn't work here.

Rather than using a pipe on the command line, redirect the output
of the first program into a file and then run the second program
on that file.

A call to lseek(2) was issued to a pipe. This error condition can
also be fixed by altering the program to avoid using lseek().

The symbolic name for this error is ESPIPE, errno=29.
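
The file-based workaround can be sketched as a small shell function.
This is only an illustration: the name run_via_file is hypothetical,
and it assumes a POSIX shell and that mktemp(1) is available on the
system.

```shell
# run_via_file PRODUCER CONSUMER
# Save PRODUCER's output in a temporary regular file, then run CONSUMER
# with that file as its last argument, so CONSUMER reads a seekable
# file instead of a pipe. Both arguments are shell command strings.
run_via_file() {
    producer=$1
    consumer=$2
    tmpfile=$(mktemp "${TMPDIR:-/tmp}/seekable.XXXXXX") || return 1
    sh -c "$producer" > "$tmpfile"
    # Pass the file name as $1 to the consumer command string.
    sh -c "$consumer \"\$1\"" sh "$tmpfile"
    status=$?
    rm -f "$tmpfile"
    return "$status"
}

# Usage (instead of: producer | consumer):
#   run_via_file 'ls -l /etc' 'tail -n 1'
```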

Image Tool: Unable to open XIL Library. 
=======================================

This message follows multiple multi-line "XilDefaultErrorFunc"
errors, indicating that ImageTool could not locate the X Imaging
Library. Many OpenWindows and CDE deskset programs require XIL.

Run pkginfo(1) to determine what packages are installed on the
system. If the following packages are not present, install them
from CDROM or over the net:  SUNWxildg, SUNWxiler, SUNWxilow, and
SUNWxilrt.
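
A quick way to see which of these packages are missing is sketched
below. The function name is hypothetical; the sketch assumes a Bourne
style shell and relies on pkginfo -q, which prints nothing and only
sets its exit status.

```shell
# missing_xil_packages
# Print the name of each XIL package that pkginfo does not report as
# installed. No output means all four packages are present.
missing_xil_packages() {
    for pkg in SUNWxildg SUNWxiler SUNWxilow SUNWxilrt; do
        pkginfo -q "$pkg" 2>/dev/null || echo "$pkg"
    done
}

# Usage:
#   missing_xil_packages
```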

Inappropriate ioctl for device 
==============================

This is a programming error.

Ask the program's author to fix this condition. The program needs
to be changed so it employs a device driver that can accept
special character device controls.

The ioctl() system call was given, as an argument, a file that
is not a special character device. This message replaces the
traditional but puzzling "Not a typewriter" message.

The symbolic name for this error is ENOTTY, errno=25.

INCORRECT BLOCK COUNT I=N (should be N) CORRECT? 
=================================================

During phase 1, fsck(1M) determined that the specified inode
pointed to a number of bad or duplicate blocks, so the block
count should be corrected to the actual number shown.

Generally you can answer yes to this question without harming the
filesystem.

For more information on bad blocks, see the section on checking
filesystem integrity in the System Administration Guide, Volume
I.

inetd[N]: execv /usr/sbin/in.uucpd: No such file or directory 
=============================================================

This message indicates that the Internet services daemon
inetd(1M) tried to start up the UUCP service without the UUCP
daemon existing on the system.

The SUNWbnuu package must be installed before the machine can run
UUCP. Run pkgadd(1M) to install this package from the
distribution CDROM or over the network.

inetd[N]: variable/tcp: unknown service 
=======================================

This message indicates that the Internet services daemon
inetd(1M) could not locate the TCP service specified after the
first colon.

Check the current machine's /etc/services file, and the NIS
services map, to see if the service is described. To start this
service, add an appropriate entry into the /etc/services file and
possibly the services map as well. Note that NIS+ does not
consult the local /etc/services file unless you put "files" right
after "nisplus" on the services line of the system's
/etc/nsswitch.conf file.

If you do not want to start this service, edit the system's
/etc/inetd.conf file and delete the entry that tries to start it
up.
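
To check whether the local services file describes a service, a
helper along these lines can be used. It is a sketch only: the name
service_defined is hypothetical, it needs a POSIX awk (nawk on older
Solaris releases), and it looks only at the file, not at the NIS or
NIS+ services map.

```shell
# service_defined NAME PROTO [FILE]
# Succeed if FILE (default /etc/services) has an entry for NAME with
# protocol PROTO, either as the official name or as an alias.
service_defined() {
    name=$1
    proto=$2
    file=${3:-/etc/services}
    awk -v name="$name" -v proto="$proto" '
        { sub(/#.*/, "") }              # drop comments
        NF >= 2 {
            split($2, pp, "/")          # field 2 is port/protocol
            if (pp[2] != proto) next
            if ($1 == name) found = 1
            for (i = 3; i <= NF; i++)   # remaining fields are aliases
                if ($i == name) found = 1
        }
        END { exit !found }
    ' "$file"
}

# Usage:
#   service_defined uucp tcp || echo "uucp/tcp is not in /etc/services"
```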

For more information about NIS+, see the NIS+ and FNS
Administration Guide.

inetd[N]: variable/udp: unknown service 
=======================================

This message indicates that the Internet services daemon
inetd(1M) could not locate the UDP service specified after the
first colon.

See the message "inetd[N]: variable/tcp: unknown service" for a
solution.

inetd: Too many open files 
==========================

This message can appear when someone runs a command from the
shell or uses a third-party application. The sar(1M) command does
not indicate that the system-wide open file limit has been
exceeded.

The probable cause for this is that the shell limit has been
exceeded. The default open file limit is 64, but can be raised to
256.

See the message "Too many open files" for a solution.
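
For Bourne-style shells, checking and changing the per-shell limit
might look like the sketch below. The function name set_fd_limit is
hypothetical, and the -H, -S, and -n options of ulimit are assumed to
be supported by the shell in use. C shell users would use the limit
built-in instead.

```shell
# set_fd_limit [COUNT]
# Set the soft open-file limit of the current shell to COUNT
# (default 256), capped at the hard limit.
set_fd_limit() {
    want=${1:-256}
    hard=$(ulimit -H -n)
    if [ "$hard" != "unlimited" ] && [ "$hard" -lt "$want" ]; then
        want=$hard          # the soft limit cannot exceed the hard limit
    fi
    ulimit -S -n "$want"
}

# Usage:
#   ulimit -S -n        # show the current soft limit
#   set_fd_limit 256
```

The new limit applies to the current shell and to commands started
from it afterwards.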

INIT: Cannot create /var/adm/utmp or /var/adm/utmpx 
===================================================

This console message indicates that init(1M) cannot write in the
/var directory, which is usually part of the / (root) filesystem.
Some other messages follow, and the system usually comes up
single-user. The problem is often that / or /var is mounted
read-only. Sometimes a brief power outage leaves the system
believing that many filesystems are still mounted.

If /var is a separate filesystem on the machine, and is not yet
mounted, mount it now. If the filesystem containing /var is
mounted read-only, remount it read-write with a command similar
to this:

# mount -o rw,remount /

Then type Control-d and try to bring up the system multi-user. If
that fails, the root filesystem is probably corrupted.  Run
fsck(1M) on the root filesystem, halt the machine, power cycle
the CPU, and wait for the system to reboot. Should this problem
still occur, restore the root filesystem from backup tapes, or
re-install the system from net or CDROM to replace the root
filesystem.

InitOutput: Error loading module for /dev/fb 
============================================

This fatal X server error message indicates that /dev/fb, the
"dumb frame buffer," is either missing or corrupted. It is
usually followed by a "giving up" message and a few xinit errors.

If other devices on the system are working correctly, the most
likely reason for this error is that the SUNWdfb package was
removed or never installed. Insert the installation CD-ROM,
change to the Solaris_2.x directory, and run the following
command to install the packages SUNWdfbh and SUNWdfb (for your
machine architecture):

pkgadd -d .

If other devices on the system are not working correctly, the
system might have a corrupt /devices directory. Halt the system
and boot using the -r (reconfigure) option.  The system will run
fsck(1M) if the /devices filesystem is corrupted, most likely
fixing the problem.

Interrupted system call 
=======================

The user issued an interrupt signal (usually Control-c) while the
system was in the middle of executing a system call. When network
service is slow, interrupting cd(1) to a remote-mounted directory
can produce this message.

Proceed with your work; this message is purely informational.

An asynchronous signal (such as interrupt or quit), which a
program was set up to catch, occurred during an internal system
call. If execution is resumed after processing the signal, it
will appear as if the interrupted programming function returned
this error condition, so the program might exit with an incorrect
error message.

The symbolic name for this error is EINTR, errno=4.

Invalid argument 
================

An invalid parameter was specified that the system cannot
interpret. For example, trying to mount an uncreated filesystem,
printing without sufficient system support, or providing an
undefined signal to a signal(3C) library function, can all
produce this message.

If you see this message when you are trying to mount a
filesystem, make sure that you have run newfs(1M) to create the
filesystem. If you see this message when you are trying to read a
diskette, make sure that the diskette was properly formatted with
fdformat(1), either in DOS format (pcfs) or as a UFS filesystem.
If you see this message while you are trying to print, make sure
that the print service is configured correctly.

The symbolic name for this error is EINVAL, errno=22.

Invalid null command 
====================

This C shell message results from a command line with two pipes
(|) in a row or from a pipe without a command afterwards.

Change the command line so that each pipe is followed by a
command.

I/O error 
=========

Some physical Input/Output error has occurred. If the process was
writing a file, data corruption is possible.

First find out which device is experiencing the I/O error. If the
device is a tape drive, make sure a tape is inserted into the
drive. When this error occurs with a tape in the drive, it is
likely that the tape contains an unrecoverable bad spot.

If the device is a floppy drive, an unformatted or defective
diskette could be at fault.  Format the diskette, or obtain a
replacement.

If the device is a hard disk drive, you might need to run
fsck(1M) and possibly even reformat the disk.

In some cases this error might occur on a call following the one
to which it actually applies.

The symbolic name for this error is EIO, errno=5.

Is a directory 
==============

An attempt was made to read or write a directory as if it were a
file.

Look at a listing of all the files in the current directory and
try again, specifying a file instead of a directory.

The symbolic name for this error is EISDIR, errno=21.

kernel read error 
=================

This message appears when savecore(1M), if activated, tries to
copy a debugging image of kernel memory to disk but cannot read
various kernel data structures correctly. Generally this occurs
after a system panic has corrupted main memory.  Data corruption
on the system is possible.

Look at the kernel error messages that preceded this one to try
to determine the cause of the problem. Error messages such as
"BAD TRAP" usually indicate faulty hardware. Until the problem
that caused the kernel panic is resolved, a kernel core image
cannot be saved for debugging.

Killed 
======

This message is purely informational. If the killed process was
writing a file, some data might be lost.

Continue with your work.

This message from the signal handler or various shells indicates
that a process has been terminated with a SIGKILL. However, if
you don't see this message and cannot terminate a process with a
SIGKILL, you might have to reboot the machine to get rid of that
process.

kmem_free block already free 
============================

This is a programming error, probably from a device driver.

Determine which driver is giving this message and contact the
vendor for a software update, as this message indicates a bug in
the driver.

This message is from the DDI programming function kmem_free(9F),
which releases a block of memory at address addr of size siz that
was previously allocated by the DDI function kmem_alloc(9F). Both
addr and siz must correspond to the original allocation. If you
have source code for the driver, follow kmem_alloc() and
kmem_free() in the code to make sure they allocate and free the
same chunk of memory.

  
last message repeated N times 
=============================

This message comes from syslog(1M), the facility that prints
messages on the console and records them in /var/adm/messages. To
reduce the log size and minimize buffer usage, syslog collapses
any identical messages it sees during a 20 second period, then
prints this message with the number of repetitions.

Look above this message to see which message was repeated so
often. Then consider the repeated message and take action
accordingly. If repeated log entries such as "su ...  failed"
appear, consider the possibility of a security breach.
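
Finding the repeated messages in a large log can be automated with a
short awk filter. The sketch below is an illustration only; the
function name is hypothetical and a POSIX awk (nawk on older Solaris
releases) is assumed.

```shell
# repeated_messages [FILE]
# For every "last message repeated N times" line in FILE (default
# /var/adm/messages), print N, a tab, and the log line that was
# repeated.
repeated_messages() {
    awk '
        /last message repeated [0-9]+ times/ {
            n = 0
            for (i = 1; i < NF; i++)
                if ($i == "repeated") n = $(i + 1)
            printf "%d\t%s\n", n, prev
            next                    # keep prev pointing at the real message
        }
        { prev = $0 }
    ' "${1:-/var/adm/messages}"
}

# Usage: show the most frequently repeated messages first
#   repeated_messages | sort -rn | head
```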

ld.so.1: variable: fatal: relocation error: symbol not found: variable 
======================================================================

This message from the run-time linker ld.so.1 indicates that in
trying to execute the application given after the first colon,
the specified symbol could not be found for relocation. The
message goes on to say in what file the symbol was referenced.
Since this is a fatal error, the application terminates with this
message.

Run the ldd -d command on the application to show its shared
object dependencies and symbols that aren't found. Probably your
system contains an old version of the shared object that should
contain this symbol. Contact the library vendor or author for an
update.

This error does not necessarily occur when you first bring up an
application. It could take months to develop, if ordinary use of
the application seldom references the undefined symbol.

ld.so.1: variable: fatal: variable: can't open file: errno=2 
============================================================

This message indicates that the run-time linker, ld.so.1, while
running the program specified after the first colon, could not
find the shared object specified after the third colon. (A shared
object is sometimes called a dynamically linked library.) Error
number 2 translates to "No such file or directory" (ENOENT).

As a workaround, set the environment variable LD_LIBRARY_PATH to
include the location of the shared object in question, for
example:

/usr/dt/lib:/usr/openwin/lib

Better yet, if you have access to source code, recompile the
program using the -Rpath loader option. Using LD_LIBRARY_PATH is
discouraged because it slows down performance.
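
If the workaround is needed, the variable can be extended without
introducing empty path components, as in this sketch. The function
name add_lib_dir is hypothetical and a POSIX shell is assumed.

```shell
# add_lib_dir DIR
# Append DIR to LD_LIBRARY_PATH unless it is already listed, and
# export the result for programs started from this shell.
add_lib_dir() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;    # already present, nothing to do
        *) LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$1 ;;
    esac
    export LD_LIBRARY_PATH
}

# Usage:
#   add_lib_dir /usr/dt/lib
#   add_lib_dir /usr/openwin/lib
```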

le0: Memory error! 
==================

This message indicates that the network interface encountered an
access time-out from the CPU's main memory. There is probably
nothing wrong except system overload.

If the system is busy with other processes, this error can occur
frequently. If possible, try to reduce the system load by
quitting applications or killing some processes.

The Lance Ethernet chip timed out while trying to acquire the bus
for a DVMA transfer. Most network applications wait for a
transfer to occur, so generally no data gets lost. However, data
transfer might fail after too many time-outs.

For more information about the Lance Ethernet chip, see the
le(7D) man page.

le0: No carrier-- cable disconnected or hub link test disabled? 
===============================================================

Standalone machines with no Ethernet port connection get this
error when the system tries to access the network. If the
Ethernet cable is disconnected, SPARC machines with the sun4m
architecture usually display this message, whereas machines with
the sun4c architecture usually display the "le0: No carrier--
transceiver cable problem" message instead. If the Ethernet cable
is connected, this message could result from a mismatch between
the machine's NVRAM settings and the Ethernet hub settings.

If this message is continuous, try to save any work to local
disk.

When a machine is configured as a networked system, it must be
plugged into the Ethernet with a twisted pair RJ45 connector.

If the Ethernet cable is plugged in, find out whether or not the
Ethernet hub does a Link Integrity Test. Then become superuser to
check and possibly set the machine's NVRAM. If the hub's Link
Integrity Test is disabled, set this variable to false.

# eeprom | grep tpe
tpe-link-test?=true
# eeprom 'tpe-link-test?=false'

The default setting is true. If for some reason tpe-link-test?
was set to false, and the hub's Link Integrity Test is enabled,
set this variable to true.

le0: No carrier-- transceiver cable problem? 
============================================

Standalone machines with no Ethernet port connection get this
error when the system tries to access the network.

If this message is continuous, try to save any work to local
disk.

When a machine is configured as a networked system, it must be
plugged into the Ethernet with either a twisted pair RJ45
connector or a thicknet 10Base5 connector (depending on the
building's Ethernet cable type).

Older workstations have a thicknet connection on the back instead
of a twisted pair Ethernet connection, so they require a thicknet
to twisted pair transceiver to translate between cabling types.

LINK COUNT FILE I=i OWNER=o MODE=m SIZE=s MTIME=t COUNT... ADJUST? 
===================================================================

During phase 4, fsck(1M) determined that the inode's link count
for the specified file is wrong, and asks if you want to adjust
it to the value given.

Generally you can answer yes to this question without harming the
filesystem.

For more information on fsck, see the section on checking
filesystem integrity in the System Administration Guide, Volume
I.

LL105W: Protocol error detected. 
================================

This error message comes from Lifeline Mail, an unbundled PC
compatibility application.

The likeliest cause for this problem is that someone set up a
user account without a password. Assign the user a password to
solve this problem.

ln: cannot create /dev/fb: Read-only file system 
================================================

During device reconfiguration at boot time, the system cannot
link to the frame buffer because /dev is on a read-only
filesystem.

Check that /dev/fb is a symbolic link to the hardware frame
buffer, such as cgsix or tcx. Ensure that the filesystem
containing /dev is mounted read-write.

lockd[N]: create_client: no name for inet address 0xN 
=====================================================

This lock daemon message usually indicates that the NIS
hosts.byname and hosts.byaddr maps are not coordinated.

Wait a short time for the maps to synchronize. If they don't,
take steps to coordinate them.

For information on updating NIS data, see the section on NIS maps
in the NIS+ and FNS Administration Guide. If you are using the
AnswerBook, "hosts.byaddr" is a good search string.

Login incorrect 
===============

This message from the login(1) program indicates an incorrect
combination of login name and password. There is no way to tell
whether what's wrong is the login name, the password, or both.
Other programs such as ftp(1), rexecd(1M), sulogin(1M), and
uucp(1C) also give this error under similar conditions.

Check the /etc/passwd file and the NIS or NIS+ passwd map on the
local system to see if an entry exists for this user. If a user
has simply forgotten the password, su and set a new one with the
passwd username command. This command automatically updates the
NIS+ passwd map, but with NIS you'll need to coordinate the
update with the passwd map.

The "Login incorrect" problem can also occur with older versions
of NIS when the user name has more than eight characters. If this
is the case, edit the NIS password file, change the user name to
have eight or fewer characters, and then remake the NIS passwd
map.
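
Over-long user names are easy to find with awk. The following is a
minimal sketch; the function name is hypothetical, and the file
argument is whatever passwd-format source file the NIS map is built
from.

```shell
# long_login_names [FILE]
# Print every login name longer than eight characters found in a
# passwd-format FILE (default /etc/passwd).
long_login_names() {
    awk -F: 'length($1) > 8 { print $1 }' "${1:-/etc/passwd}"
}

# Usage:
#   long_login_names /etc/passwd
```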

If you cannot log in to the system as root, despite knowing the
proper password, it is possible that the /etc/passwd file is
corrupted. Try to log in as a regular user and su to root.

If that doesn't work, see the message "su: No shell" and follow
most of the instructions given there. Instead of changing the
default shell however, make the password field blank in
/etc/shadow.

lp hang 
=======

On a print server, the queue continues to grow but nothing comes
out of the printer.  The printer daemon is hung.

Here is a simple procedure for flushing a hung printing queue:

 1. Log in or switch user to root.
 2. Issue the reject printername command to make sure no one sends
    any job to the printer.
 3. Turn off power to the printer.
 4. If the active job appears to be causing the hang, remove it from
    the print queue with the cancel jobnumber command, and ask the
    owner to requeue that print job.
 5. Shut down the print queue with the /usr/lib/lpshut command.
 6. Remove the lock file /var/spool/lp/SCHEDLOCK and the temporary
    files /var/spool/lp/tmp/*/*.
 7. Turn the printer back on.
 8. Restart the print queue with the /usr/lib/lpsched command.
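
The procedure can be laid out as a checklist generator, sketched
below. The function name lp_flush_plan is hypothetical. It only
prints the commands in order and runs none of them, so the output can
be reviewed before anything is typed as root. The final accept
reminder is not part of the list above; it is added on the assumption
that the queue should take jobs again once the printer is back,
because reject stays in effect until accept is run.

```shell
# lp_flush_plan PRINTER [JOBNUMBER]
# Print, without running them, the commands for flushing a hung
# print queue. Lines starting with "#" are manual steps.
lp_flush_plan() {
    printer=$1
    job=${2:-}
    echo "reject $printer"
    echo "# turn off power to the printer"
    if [ -n "$job" ]; then
        echo "cancel $job"
    fi
    echo "/usr/lib/lpshut"
    # Single quotes keep the shell from expanding the wildcard here.
    echo 'rm -f /var/spool/lp/SCHEDLOCK /var/spool/lp/tmp/*/*'
    echo "# turn the printer back on"
    echo "/usr/lib/lpsched"
    echo "# assumption: run 'accept $printer' to undo the reject"
}

# Usage:
#   lp_flush_plan printername jobnumber
```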

For more information on print queuing, see the System
Administration Guide, Volume II. If you are using the AnswerBook,
"print server" is a good search string.

mailtool: Can't create dead letter: Permission denied 
=====================================================

An attempt was made to send a message with mailtool(1) from a
directory where the user does not have write permission, and the
user's home directory is currently unavailable.

Change to another directory and start mailtool again, or use
chmod(1) to change permissions for the directory (if possible).

mailtool: Could not initialize the Classing Engine 
==================================================

When a user runs mailtool(1) on a remote machine, setting the
DISPLAY environment back to the local machine, this message might
appear inside a dialog box window. The dialog box goes on to say
that the Classing Engine must be installed to use Attachments.
This problem occurs because rlogin(1) does not propagate the
user's environment.

Exit mailtool and set your OPENWINHOME environment variable to
/usr/openwin.  Then run mailtool again. The error message will
not appear, and you will be able to use Attachments.

Classing Engine is a new name for Tool Talk. Earlier versions of
mailtool said "Tool Talk: TT_ERR_NOMP" instead of Classing
Engine.

Mail Tool is confused about the state of your Mail File. 
========================================================

This message appears in a pop-up dialog box whenever you ask
mailtool(1) to access messages after another mail reader has
modified your inbox. A request follows:  "Please Quit this Mail
Tool."

Click "Continue" to close the dialog box, then exit mailtool. If
you continue trying to read mail, messages deleted by the other
mail reader will never appear, and mailtool will fail to see any
new messages.

mail: Your mailfile was found to be corrupted (Content-length mismatch). 
=======================================================================

This message comes from mail(1) or mailx(1) whenever it detects
messages with a different content length than advertised. The
mail program tells you which message might be truncated or might
have another message concatenated to it.

Two common causes of content length mismatches are the
simultaneous use of different mail readers (such as mail and
mailtool), or using a mail reading program (or an editor) that
does not update the Content-Length field after altering a
message.

The mailx program can usually recover from this error and
delineate mail message boundaries correctly. Pay close attention
to the message that might be truncated or combined with another
message, and to all messages after that one. If a mail file
becomes hopelessly corrupted, run it through a text editor to
eliminate all Content-Length lines, and ensure that each message
begins with a From (no colon) line that is preceded by a blank
line.
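
Removing the Content-Length lines by hand is tedious in a large
mailbox. The awk filter below is one possible sketch of that cleanup.
The function name is hypothetical and a POSIX awk is assumed. It
writes the cleaned mailbox to standard output, so run it on a copy of
the mail file while no mail reader is active.

```shell
# strip_content_length FILE
# Print FILE with Content-Length header lines removed. A header block
# starts at a "From " line that is first in the file or follows a
# blank line, and ends at the next blank line; message bodies are
# left untouched.
strip_content_length() {
    awk '
        /^From / && (NR == 1 || prev == "") { in_header = 1 }
        in_header && $0 == ""               { in_header = 0 }
        !(in_header && tolower($0) ~ /^content-length:/) { print }
        { prev = $0 }
    ' "$1"
}

# Usage:
#   strip_content_length mbox.copy > mbox.clean
```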

To avoid mailfile corruption, exit from mailtool without saving
changes when you are currently running mail or mailx.

Memory address alignment 
========================

This message can occur when printing large files on a
SPARCprinter attached to a SPARCstation 2.

Replace the SPARCstation 2 CPU with one that is at the most
recent dash level.

memory leaks 
============

An application uses up more and more memory, until all swap space
is exhausted.

Many developers have found that third party software (such as
Purify) can help identify memory leaks in their applications. If
you suspect that you have a memory leak, you can use sar(1) to
check on the Kernel Memory Allocation (KMA). Any driver or module
that uses KMA resources, but does not specifically return the
resources before it exits, can create a memory leak.

For more information on memory leaks, see the section on
monitoring system activity in the System Administration Guide,
Volume II. If you are using the AnswerBook, "displaying disk
usage" is a good search string. Also, see the section on system
resource problems in the NIS+ and FNS Administration Guide.

mount: /dev/dsk/variable is already mounted, /variable is busy, or... 
=====================================================================

While trying to mount a filesystem, the mount(1M) command
received a "Device busy" (EBUSY) error code. There are several
possible reasons: this /dev/dsk filesystem is already mounted on
a different directory, the busy path name is the working
directory of an active process, or the system has exceeded its
maximum number of mount points (unlikely).

Run /etc/mount to see if the filesystem is already mounted. If
not, check to see if any shells are active in the busy directory
(did the user cd into the directory?), or if any processes in the
ps(1) listing are active in that directory. If the reason for the
error message isn't obvious, try using a different directory for
the mount point.

mount: giving up on: /variable 
==============================

An existing server did not respond to an NFS mount request, so
after retrying a number of times (default 1000), the mount(1M)
command has given up. Nonexistent servers or bad mount points
produce different messages.

If the "RPC: Program not registered" message precedes this one,
the requested mount server probably did not share (export) any
filesystems, so it has no NFS daemons running. Have the superuser
on the mount server share(1M) the filesystem, then run
/etc/init.d/nfs.server start to begin NFS service.

If the requested mount server is down or slow to respond, check
to see whether the machine needs repair or rebooting.

mount: mount-point /variable does not exist. 
============================================

Someone tried to mount a filesystem onto the specified directory,
but there is no such directory.

If this is the directory name you want, run mkdir(1) to create
this directory as a mount point.
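
In a script that mounts filesystems, the check can be done before
calling mount. This is a minimal sketch; the function name
ensure_mount_point is hypothetical and a POSIX shell is assumed.

```shell
# ensure_mount_point DIR
# Create DIR (and any missing parent directories) if it does not
# exist. Fail if the name exists but is not a directory.
ensure_mount_point() {
    dir=$1
    if [ -d "$dir" ]; then
        return 0
    fi
    if [ -e "$dir" ]; then
        echo "$dir exists but is not a directory" >&2
        return 1
    fi
    mkdir -p "$dir"
}

# Usage:
#   ensure_mount_point /mnt/data && mount /mnt/data
```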

mount: the state of /dev/dsk/variable is not okay 
=================================================

The system was unable to mount the filesystem that was specified
because the super-block indicates that the filesystem might be
corrupted. This is not an impediment for read-only mounts.

If you don't need to write on this filesystem, mount(1M) it using
the -o ro option.  Otherwise, do as one of the message
continuation lines suggests and run fsck(1M) to correct the
filesystem state and update the super-block.

For more information on using fsck, see the section on checking
filesystem integrity in the System Administration Guide, Volume
I.

/net/variable: No such file or directory 
========================================

A user tried to change directory (for example with cd) to a
network partition on the system specified after /net/, but this
host either does not exist or has not shared (exported) any
filesystem.

To gain access to files on this system, try rlogin(1).

To export filesystems from the remote system, become superuser on
that system and run the share(1M) command with the appropriate
options. If that system is sharing filesystems for the first
time, also run /etc/init.d/nfs.server start to begin NFS service.

Network is down 
===============

A transport connection failed because it encountered a dead
network.

Report this error to the system administrator for the network. If
you are the person responsible for this network, check to see why
the network is dead and what repairs are necessary.

This error results from status information delivered by the
underlying communication interface.

The symbolic name for this error is ENETDOWN, errno=127.

Network is unreachable 
======================

An operational error occurred either because there was no route
to the network or because negative status information was
returned by intermediate gateways or switching nodes.

The returned status is not always sufficient to distinguish
between a network that is down and a host that is down. See the
"No route to host" message.

Check the network routers and switches to see if they are
disallowing these packet transfers. If they are allowing all
packet transfers, check network cabling and connections.

The symbolic name for this error is ENETUNREACH, errno=128.

NFS getattr failed for server variable: RPC: Timed out 
======================================================

This message appears on an NFS client that requested a service
from an NFS server whose hardware is failing. Often the message
"NFS read failed" appears along with this message. If the server
were merely down or slow to respond, the "NFS server not
responding" message would appear instead. Data corruption on the
server system is possible.

Because this message usually indicates server hardware failure,
initiate repair procedures as soon as possible. Check the memory
modules, disk controllers, and CPU board.

For more information on NFS tuning, see the chapter on monitoring
network performance in the System Administration Guide, Volume
II.

nfs mount: Couldn't bind to reserved port 
=========================================

This message appears when a client attempts to NFS mount a
filesystem from a server that has more than one Ethernet
interface configured on the same physical subnet.

Always connect multiple Ethernet interfaces on one router system
to different physical subnetworks.

nfs mount: mount: variable: Device busy 
=======================================

This message appears when the superuser attempts to NFS mount on
top of an active directory. The busy device is actually the
working directory of a process.

Determine which shell on the workstation is currently located
below the mount point, and change out of that directory. Be wary
of subshells (such as su shells) that could be in different
working directories while the parents remain below the mount
point.

NFS mount: /variable mounted OK 
===============================

While booting, the system failed to mount the directory specified
after the first colon, probably because the NFS server involved
was down or slow to respond. The mount ran in the background and
successfully contacted the NFS server.

This is a purely informative message to let you know that the
mount process has completed.

NFS read failed for server variable 
===================================

This is generally a permissions problem. Perhaps a directory or
file permission was changed while the client held the file open.
Perhaps the filesystem's share or netgroup permissions changed.
If the server were down or the network saturated, the "NFS server
not responding" message would appear instead.

Log in to the NFS server and check the permissions of directories
leading to the file.  Make certain that the filesystem is shared
with (exported to) the client experiencing an NFS read failure.

For more information, see the chapter on NFS troubleshooting in
the NFS Administration Guide.

nfs_server: bad getargs for N/N 
===============================

This message comes from the NFS server when it gets a request
with unrecognized or incorrect arguments. Typically, it means the
request could not be XDR decoded properly. This can result from
corruption of the packet over the network, or from an
implementation bug causing the NFS client to improperly encode
its arguments.

If this message originates from a single client, investigate that
machine for NFS client software bugs. If this message appears all
over a network, especially accompanied by other networking
errors, investigate the network cabling and connectors.

NFS server variable not responding still trying 
===============================================

In most cases this very common message indicates that the system
has requested a service from an NFS server that is either down or
extremely slow to respond. In some cases this message indicates
that the network link to this NFS server is broken, although
usually that condition generates other error messages as well. In
a few cases this message indicates NFS client set-up problems.

Check the non-responding NFS server to see whether the machine
needs repair or rebooting. Encourage your user community to
report such problems quickly but only once.

Should this message appear when booting a diskless client, make
sure that the client's /etc/hosts file and the network naming
service (NIS, NIS+, or other /etc/hosts files on the network)
have been updated.

For more information, see the chapter on NFS troubleshooting in
the NFS Administration Guide.

NFS server variable ok 
======================

This message is the follow-up to the "NFS server not responding"
error. It indicates that the NFS server is back in operation.

When an NFS server first comes up, it will be busy fulfilling
client requests for a while. Be patient and wait for your client
system to respond. Making many extraneous requests only further
slows the NFS server response time.

nfs umount: variable: is busy
=============================

This message appears when the superuser attempts to unmount an
active NFS filesystem. The busy point is the working directory of
a process.

Determine which shell (or process) on the workstation is
currently located in the remotely mounted filesystem, and change
(cd) out of that directory. Be wary of subshells (such as su
shells) that could be in different directories while the parent
shells remain in the NFS filesystem.
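The search for the process holding the filesystem busy can be
sketched in Python. This is a minimal sketch, assuming the system
exposes each process's working directory as the symbolic link
/proc/PID/cwd (true on Linux; where it is not, fuser(1M) is the
usual tool). The function names and the /mnt/nfs mount point are
hypothetical.

```python
import os

def is_under(path, mount_point):
    """Return True if path is mount_point itself or lies beneath it."""
    path = os.path.normpath(path)
    mount_point = os.path.normpath(mount_point)
    return os.path.commonpath([path, mount_point]) == mount_point

def pids_with_cwd_under(mount_point, proc_root="/proc"):
    """List PIDs whose working directory is inside mount_point.

    Assumes proc_root/PID/cwd is a readable symlink; processes that
    cannot be inspected (permission denied, already exited) are skipped.
    """
    mount_point = os.path.abspath(mount_point)
    found = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return found
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            cwd = os.readlink(os.path.join(proc_root, entry, "cwd"))
        except OSError:
            continue
        if os.path.isabs(cwd) and is_under(cwd, mount_point):
            found.append(int(entry))
    return sorted(found)

if __name__ == "__main__":
    # /mnt/nfs is a hypothetical mount point used only for illustration.
    for pid in pids_with_cwd_under("/mnt/nfs"):
        print("busy: PID", pid)
```

Run it as superuser to see other users' processes. Subshells show
up as separate PIDs, which helps with the su case described above.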

NFS write error on host variable: No space left on device. 
==========================================================

This console message indicates that an NFS-mounted partition has
filled up and cannot accept writing of new data. Unfortunately,
software that attempts to overwrite existing files will usually
zero out all data in these files. This is particularly
destructive on NFS-mounted /home partitions.

Find the user or process that is filling up the filesystem, and
get the out-of-control process stopped as soon as you can. Then
delete files as necessary to create more space on the filesystem
(large core files are good candidates for deletion). Have users
write any modified files to local disk if possible. If this error
occurs often, redistribute directories to ease demand on this
partition.

For more information on disk usage, see the System Administration
Guide, Volume II.  If you are using the AnswerBook, "managing
disk use" is a good search string.

NFS write failed for server variable: RPC: Timed out 
====================================================

This error can occur when a file system is soft-mounted, and
server or network response time lags. Any data written to the
server during this period could be corrupted.

If you intend to write on a filesystem, never specify the soft
mount option. Use the default hard mount for all the filesystems
that are mounted read-write.

For more information, see the chapter on NFS troubleshooting in
the NFS Administration Guide.

NIS+ authentication failure 
===========================

This is a Federated Naming Service message. The operation could
not be completed because the principal making the request could
not be authenticated with the name service involved.

Run the nisdefaults(1) command to verify that you are identified
as the correct NIS+ principal. Also check that the system has
specified the correct public key source.

For more information, see the authentication and authorization
overview in the NIS+ and FNS Administration Guide.

No buffer space available 
=========================

An operation on a transport endpoint or pipe was not performed
because the system lacked sufficient buffer space or because a
queue was full. The target system probably ran out of memory or
swap space. Any data written during this condition will probably
be lost.

To add more swap area, use the swap -a command on the target
system.  Alternatively, reconfigure the target system to have
more swap space. As a general rule, swap space should be two to
three times as large as physical memory.

The symbolic name for this error is ENOBUFS, errno=132.
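The errno numbers quoted in these entries are the Solaris values.
Other systems number some errors differently (ENOBUFS is 105 on
Linux, for example), so scripts are safer testing the symbolic
name. A minimal sketch in Python, using only the standard errno
module; the helper name describe is hypothetical.

```python
import errno
import os

def describe(symbol):
    """Return (number, message) for a symbolic errno name on this host."""
    number = getattr(errno, symbol)
    return number, os.strerror(number)

if __name__ == "__main__":
    for symbol in ("ENOENT", "ECHILD", "ENOBUFS", "EHOSTUNREACH"):
        number, text = describe(symbol)
        print(f"{symbol:<14} errno={number:<4} {text}")
```

Low-numbered errors such as ENOENT and ECHILD tend to agree across
UNIX variants; the networking errors are the ones that differ.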

No child processes 
==================

This message can appear when an application tries to communicate
with cooperating processes that do not exist.

Restart the parent process so it can create the child processes
again. If that doesn't help, this could be the result of
programming error; contact the vendor or author of the program
for an update.

A wait(2) system call was executed by a process that had no
existing or unwaited-for child processes. The child processes
could have exited prematurely, or might never have been created.

The symbolic name for this error is ECHILD, errno=10.
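The wait(2) condition can be reproduced deliberately. The sketch
below assumes a UNIX system where Python's os.fork is available;
it reaps a child once and then waits for the same child again, so
the second call has no unwaited-for child left. The function name
wait_twice is hypothetical.

```python
import errno
import os

def wait_twice():
    """Reap one child, then wait for it again to provoke ECHILD."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)          # child: exit immediately
    os.waitpid(pid, 0)       # parent: first wait reaps the child
    try:
        os.waitpid(pid, 0)   # second wait: no such child remains
    except ChildProcessError as exc:
        return exc.errno
    return None

if __name__ == "__main__":
    code = wait_twice()
    if code is not None:
        print(errno.errorcode.get(code), "-", os.strerror(code))
```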

No default media available 
==========================

The volume manager issues this message if a user makes an
eject(1) request when the drives contain no diskette or CDROM to
eject.

Insert a diskette or CDROM. If the volume manager is confused and
there actually is a diskette or CDROM in a drive, run volcheck to
update the volume manager. If the system remains confused, try
booting with the -r option to reconfigure devices.

No directory! Logging in with home=/ 
====================================

The login(1) program could not find the home directory listed in
the password file or NIS passwd map, so it deposited the user in
the root directory.

Check that the user's home directory is mounted and is owned by
and accessible to that user. Perhaps the automounter tried to
mount the home directory, but the NFS server did not respond
quickly enough. Try listing the files in /home/username. If the
NFS server responds to this request, have the user log out and
log in again.

It is possible that the automounter daemon is not running. Run
the ps command to see if automountd is present. If not, run the
second command below; if it appears to be wedged, run both of
these commands:

# /etc/init.d/autofs stop
# /etc/init.d/autofs start

When the automounter daemon is running, verify that the
/etc/auto_master file has a line like this:

/home  auto_home

Verify that the /etc/auto_home file has a line like this:

+auto_home

These entries depend on the NIS auto_home map.

It is also possible that the NFS server has not shared (exported)
this /home directory, or that the NFS daemons on the server have
disappeared.

For more information on NFS, see the NFS Administration Guide.

No message of desired type 
==========================

An attempt was made to receive a message of a type that does not
exist on the specified message queue. See the msgop(2) man page
for details.

This indicates an error in the System V IPC message facility.
Generally the message queue is empty or devoid of the desired
message type, while IPC_NOWAIT is set.

The symbolic name for this error is ENOMSG, errno=35.

No recipients specified 
=======================

This message comes from the mailx(1) command whenever a user
doesn't provide an address in the To: field.

See the message "Recipient names must be specified" for details.

No record locks available 
=========================

No more record locks are available. The system lock table is
full.

The symbolic name for this error is ENOLCK, errno=46.

Perhaps a process called fcntl(2) with the F_SETLK or F_SETLKW
option, and the system maximum was exceeded. The system contains
several different locking subsystems, including fcntl, the NFS
lock daemon, and mail locking, all of which can produce this
error.

Try again later, when more locks might be available.
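For a program, "try again later" usually means a bounded retry. A
minimal sketch, assuming Python's standard fcntl module; the
function name, attempt count, and delay are hypothetical and
should be tuned to the application.

```python
import errno
import fcntl
import tempfile
import time

def lock_with_retry(fileobj, attempts=5, delay=0.5):
    """Try to take an exclusive fcntl record lock, retrying on ENOLCK.

    Returns True once the lock is held, False if every attempt failed
    because the lock table was full or another process held the lock.
    """
    for attempt in range(attempts):
        try:
            fcntl.lockf(fileobj, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except OSError as exc:
            if exc.errno not in (errno.ENOLCK, errno.EAGAIN, errno.EACCES):
                raise
            time.sleep(delay * (attempt + 1))   # wait a little longer each time
    return False

if __name__ == "__main__":
    with tempfile.TemporaryFile() as scratch:
        print("locked:", lock_with_retry(scratch))
```

EAGAIN and EACCES are included because a non-blocking request
reports a lock held by another process with one of those codes;
only ENOLCK means the lock table itself is full.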

No route to host 
================

An operational error occurred because there was no route to the
destination host, or because of status information returned by
intermediate gateways or switching nodes.

The returned status is not always sufficient to distinguish
between a host that is down and a network that is down. See the
"Network is unreachable" message.

Check the network routers and switches to see if they are
disallowing these packet transfers. If they are allowing all
packet transfers, check network cabling and connections.

The symbolic name for this error is EHOSTUNREACH, errno=148.

No shell Connection closed 
===========================

A user has attempted to remote login to the system, and has a
valid account name and password, but the shell specified for
their account is not available on that system. For example, the
seventh field could request the GNU Bourne-again shell /bin/bash,
which does not exist on standard Solaris distributions.

If you have a copy of the requested shell, become superuser and
install the missing shell on that system. Otherwise, change the
user's password file entry (perhaps only in the NIS+ or NIS
passwd map) to specify an available shell such as /bin/csh or
/bin/ksh.

No space left on device 
=======================

While writing an ordinary file or creating a directory entry,
there was no free space left on the device. The disk, tape, or
diskette is full of data. Any data written to that device during
this condition will be lost.

Remove unneeded files from the hard disk or diskette until there
is space for all the data you are writing. It might be advisable
to move some directories onto another filesystem and create
symbolic links accordingly. When a tape is full, continue on
another one, use a higher density setting, or obtain a higher-
capacity tape.

To create multi-volume tapes or diskettes, use the pax(1) or
cpio(1) command; tar(1) is still limited to a single volume.

The symbolic name for this error is ENOSPC, errno=28.
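Because data written while the device is full is lost, a program
can check free space before a large write. This is a minimal
sketch using Python's shutil.disk_usage; the function name
has_room and the reserve_bytes margin are hypothetical.

```python
import shutil

def has_room(directory, needed_bytes, reserve_bytes=0):
    """Return True if the filesystem holding directory can take needed_bytes.

    reserve_bytes is a hypothetical safety margin left free after the write.
    """
    free = shutil.disk_usage(directory).free
    return free - reserve_bytes >= needed_bytes

if __name__ == "__main__":
    usage = shutil.disk_usage("/tmp")
    print(f"/tmp: {usage.free // 2**20} MB free of {usage.total // 2**20} MB")
    print("room for 16 MB:", has_room("/tmp", 16 * 2**20))
```

The check is advisory only: another process can consume the space
between the check and the write, so the write itself must still
handle ENOSPC.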

No such device 
==============

An attempt was made to apply an operation to an inappropriate
device, such as writing to a nonexistent device.

Look in the /devices directory to see why this device does not
exist, or why the program expects it to exist. The similar "No
such device or address" message tends to indicate I/O problems
with an existing device, whereas this message tends to indicate a
device that does not exist at all.

The symbolic name for this error is ENODEV, errno=19.

No such device or address 
=========================

This can occur when a tape drive is off-line or when a device has
been powered off or removed from the system.

For tape drives, make sure the device is connected, powered on,
and toggled on-line (if applicable). For disk and CDROM drives,
check that the device is connected and powered on.

With all SCSI devices, ensure that the target switch or dial is
set to the number where the system originally mounted it. To
inform the system of a change to the target device number, reboot
using the -r (reconfigure) option.

This message results from I/O to a special file's subdevice that
either does not exist or that exists beyond the limit of the
device.

The symbolic name for this error is ENXIO, errno=6.

No such file or directory 
=========================

The specified file or directory does not exist. Either the file
name or path name was entered incorrectly.

Check the file name and path name for correctness and try again.
If the specified file or directory is a symbolic link, it
probably points to a nonexistent file or directory.

The symbolic name for this error is ENOENT, errno=2.
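The symbolic-link case is easy to miss because the link itself
appears in a directory listing. A minimal sketch in Python that
separates the two causes; the function name classify and the file
names are hypothetical.

```python
import os
import tempfile

def classify(path):
    """Return 'present', 'dangling symlink', or 'missing' for path."""
    if os.path.exists(path):          # follows symbolic links
        return "present"
    if os.path.islink(path):          # the link exists, its target does not
        return "dangling symlink"
    return "missing"

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, "link")
        os.symlink(os.path.join(tmp, "no-such-target"), link)
        print(classify(link))
        print(classify(os.path.join(tmp, "typo")))
        print(classify(tmp))
```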

no such map in server's domain 
==============================

A user or an application tried to look up something using Network
Information Services (NIS), but NIS has no corresponding database
for this request.

Make sure the NIS map name is spelled correctly. To see a list of
nicknames for the various NIS maps, run the ypcat -x command. To
see a full list of the various NIS maps (databases), run the
ypwhich -m command. If the NIS service were not running on the
current machine, these commands would result in a "can't
communicate with ypbind" message.

No such process 
===============

This process cannot be found. The process could have finished
execution and disappeared, or it might still be in the system
under a different numeric ID.

Use the ps(1) command to check that the process ID you're
supplying is correct.

No process corresponds to the specified process ID (PID), light-
weight process ID, or thread_t.

The symbolic name for this error is ESRCH, errno=3.
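A script can test whether a PID is still valid before acting on
it by sending signal 0, which performs the error checking without
delivering a signal. A minimal sketch in Python; the function name
pid_exists is hypothetical.

```python
import os

def pid_exists(pid):
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        raise ValueError("pid must be a positive process ID")
    try:
        os.kill(pid, 0)               # signal 0: error checking only
    except ProcessLookupError:        # ESRCH: no such process
        return False
    except PermissionError:           # EPERM: exists, owned by someone else
        return True
    return True

if __name__ == "__main__":
    print("this process:", pid_exists(os.getpid()))
```

A True result only says that some process has that ID now; PIDs
are reused, so it may not be the process originally intended.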

No such user as variable-- cron entries not created 
===================================================

A file exists in /var/spool/cron/crontabs for the specified user,
but this user is not in /etc/passwd or the NIS passwd map. The
system cannot create cron entries for nonexistent users.

To eliminate this message at boot time, remove the cron file for
the nonexistent user, or rename it if the user's login name has
changed. If this is a valid user, create an appropriate password
entry for this name.

Not a directory 
===============

A non-directory was specified where a directory is required, such
as in a path prefix or as an argument to the chdir(2) system
call.

Look at a listing of all the files in the current directory and
try again, specifying a directory instead of a file.

The symbolic name for this error is ENOTDIR, errno=20.

Not enough space 
================

This message indicates that the system is running many large
applications simultaneously, and has run out of swap space
(virtual memory). It could also indicate that applications failed
without freeing pages from the swap area. Swap space is an area
of disk set aside to store portions of applications and data not
immediately required in memory. Any data written during this
condition will probably be lost.

Reinstall or reconfigure the system to have more swap space. A
general rule of thumb is that swap space should be two to three
times as large as physical memory.  Alternatively, use mkfile(1M)
and swap(1M) to add more swap area. This example shows how to add
16 MB of virtual memory in the /usr/swap file (any filesystem
with enough free space would work):

# mkfile 16m /usr/swap
# swap -a /usr/swap

To make this automatic at boot time, add the following line to
the /etc/vfstab file:

/usr/swap   -   -   swap   -   no  -

In calling the fork(2), exec(2), sbrk(2), or malloc(3C) routine,
a program asked for more memory than the system could supply.
This is not a temporary condition; swap space is a system
parameter.

The symbolic name for this error is ENOMEM, errno=12.

not found 
=========

This message indicates that the Bourne shell could not find the
program name given as a command.

Check the form and spelling of the command line. If that looks
correct, echo $PATH to see if the user's search path is correct.
When communications are garbled, it is possible to unset a search
path to such an extent that only built-in shell commands are
available. Here is a command to reset a basic search path:

$ PATH=/usr/bin:/usr/ccs/bin:/usr/openwin/bin:.

If the search path looks correct, check the directory contents
along the search path to see if programs are missing or if
directories are not mounted.
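Checking each directory along the search path by hand is tedious.
The following is a minimal sketch in Python that reports PATH
entries that are not directories (missing or not mounted); the
function name audit_path is hypothetical.

```python
import os
import shutil

def audit_path(path_value):
    """Split a PATH string and report entries that are not directories."""
    problems = []
    for entry in path_value.split(os.pathsep):
        if entry in ("", "."):
            continue                   # current directory: nothing to check
        if not os.path.isdir(entry):
            problems.append(entry)
    return problems

if __name__ == "__main__":
    path_value = os.environ.get("PATH", "")
    for entry in audit_path(path_value):
        print("not a directory (missing or unmounted?):", entry)
    # shutil.which does a lookup similar to the shell's for a command name.
    print("ls resolves to:", shutil.which("ls", path=path_value))
```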

NOTICE: /variable: out of inodes 
================================

The filesystem specified after the first colon probably contains
many small files, exceeding the per-filesystem limit for inodes
(file information nodes).

If many small files were created unintentionally, removing them
will resolve the problem.

Otherwise, follow these steps to increase filesystem capacity for
small files. Make several backup copies of the filesystem on
different tapes (for safety), then bring the machine down to
single-user mode. Use the newfs(1M) command with the -i option to
increase inode density for this filesystem. Here is an example:

# newfs -i 1024 /dev/rdsk/partition

Finally, restore the filesystem from a backup tape. Note that
increasing the inode density slightly reduces total filesystem
capacity.
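The trade-off can be estimated before running newfs. The value
given to -i is the number of bytes of filesystem per inode, so a
smaller value yields more inodes. This is a rough sketch in
Python: the 128-byte on-disk inode size is an assumption, the 2048
comparison value is hypothetical, and the real newfs rounds to
cylinder-group boundaries, so actual counts will differ somewhat.

```python
def estimate_inodes(fs_bytes, bytes_per_inode, inode_size=128):
    """Estimate inode count and the space the inodes themselves occupy.

    bytes_per_inode is the value given to newfs -i; inode_size is an
    assumed on-disk inode size in bytes.
    """
    inodes = fs_bytes // bytes_per_inode
    return inodes, inodes * inode_size

if __name__ == "__main__":
    one_gb = 2**30
    for nbpi in (2048, 1024):
        inodes, overhead = estimate_inodes(one_gb, nbpi)
        print(f"-i {nbpi}: about {inodes} inodes, "
              f"{overhead // 2**20} MB used by inodes")
```

Under these assumptions, halving the -i value doubles both the
number of inodes and the space they occupy.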

Not login shell 
===============

This message results when a user tries to logout(1) from a shell
other than the one started at login time.

To quit a non-login shell, use the exit(1) command. Continue
doing so until you have logged out.

For more general information on the login shell, see the section
on customizing your work environment in the Solaris Advanced
User's Guide.

Not on system console 
=====================

A user tried to login(1) to a system as the superuser (uid=0,
which is not necessarily root) from a terminal other than the
console.

Log in to that system as a normal user, then run su(1M) to become
superuser. To allow superuser logins from any terminal, comment
out the CONSOLE line in /etc/default/login (this is not
recommended for security reasons).

Not owner 
=========

Either an ordinary user tried to do something reserved for the
superuser, or the user tried to modify a file in a way restricted
to the file's owner or to the superuser.

Switch user to root and try again.

The symbolic name for this error is EPERM, errno=1.

Not supported 
=============

This version of the system does not support the feature
requested, although future versions of the system might provide
support.

This is generally not a system message from the kernel, but an
error returned by an application. Contact the vendor or author of
the application for an update.

The symbolic name for this error is ENOTSUP, errno=48. 
 

operation failed [error 185], unknown group error 0, variable 
=============================================================

When you use admintool to add a user to a newly-created group,
admintool issues this error.

Apply patch 101384-05 to fix bug ID 1151837 and to provide a
workaround for bug ID 1153087.

Operation not applicable 
========================

This error indicates that no system support exists for some
function that the application requested.

Ask the system vendor for an upgrade, or contact the vendor or
author of the application for an update.

This message indicates that no system support exists for an
operation. Many modules set this error when a programming
function is not yet implemented. If you are writing a program
that produces this message while calling a system library, try to
find and use an alternative library function. Future versions of
the system might support this operation; check system release
notes for further information.

The symbolic name for this error is ENOSYS, errno=89.

out of memory 
=============

Hundreds of different programs can produce this message when the
system is running many large applications simultaneously. This
message usually means that the system has run out of swap space
(virtual memory).

See the message "Not enough space" for details. Any data written
during this condition will probably be lost.

PARTIALLY ALLOCATED INODE I=N CLEAR? 
=====================================

During phase 1, fsck(1M) found that the specified inode was
neither allocated nor unallocated. The reason is probably that
the system crashed in the middle of a sync(2) or write(2)
operation.

Should you answer yes to this question, "UNALLOCATED" messages
might result during phase 2, if any directory entries point to
this inode. If you are being careful, exit fsck(1M) and run
ncheck(1M) (specifying the inode number after the -i option) to
determine which file or directory is involved here. You might be
able to restore this file or directory from another system. It is
also possible that fsck will copy this file to the lost+found
directory in a later phase.

For more information, see the chapter on checking filesystem
integrity in the System Administration Guide, Volume I.

passwd.org_dir: NIS+ servers unreachable 
========================================

This is the first of three messages that an NIS+ client prints
when it cannot locate an NIS+ server on the network.

See the message "hosts.org_dir: NIS+ servers unreachable" for
details.

Password does not decrypt secret key for unix.uid@variable 
==========================================================

This message appears at login time when a user's password is not
identical to the user's keylogin network password. When a system
is running NIS+, the login program first performs UNIX
authentication, and then attempts a keylogin(1) for secure RPC
authentication.

To gain credentials for secure RPC, users can run keylogin (after
login) and type in their secret key. To stop this message from
appearing at login time, users can run the chkey -p command and
set their network password to be the same as their NIS+ password.
If a user doesn't remember the network password, the system
administrator should delete and re-create the user's credentials
table entry so the user can establish a new network password with
chkey.

Permission denied 
=================

An attempt was made to access a file in a way forbidden by the
protection system.

Check the ownership and protection mode of the file (with a long
listing from the ls -l command) to see who is allowed to access
the file. Then change the file or directory permissions as
needed.

The symbolic name for this error is EACCES, errno=13.
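Access can be denied by any directory leading to the file, not
only by the file itself. A minimal sketch in Python that prints
the mode and ownership of every component of a path, much like
running ls -ld on each one; the function name modes_along is
hypothetical.

```python
import os
import stat

def modes_along(path):
    """Yield (component, mode string, uid, gid) for path and each parent."""
    path = os.path.abspath(path)
    parts = []
    while True:
        parts.append(path)
        parent = os.path.dirname(path)
        if parent == path:             # reached the root directory
            break
        path = parent
    for component in reversed(parts):
        info = os.stat(component)
        yield component, stat.filemode(info.st_mode), info.st_uid, info.st_gid

if __name__ == "__main__":
    for component, mode, uid, gid in modes_along("/tmp"):
        print(f"{mode} {uid:>5} {gid:>5} {component}")
```

If os.stat itself fails with a permission error, the component
printed just before the failure is the directory that lacks
search (execute) permission.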

Please specify a recipient. 
===========================

With mailtool, this message comes up in a dialog box whenever a
user tries to deliver a message with no address in the To: field.

See the message "Recipient names must be specified" for details.

Protocol not supported 
======================

The requested networking protocol has not been configured into
the system, or no implementation for it exists. (A protocol is a
formal description of the messages to be exchanged and the rules
to be followed when systems exchange information.)

Verify that the protocol is in the /etc/inet/protocols file and
in the NIS protocols map, if applicable. If the protocol is not
listed, and you want to permit its use, configure the protocol as
documented or as required.

The symbolic name for this error is EPROTONOSUPPORT, errno=120.

Protocol wrong type for socket 
==============================

This message indicates either application programming error, or
badly configured protocols.

Make sure that the /etc/protocols file corresponds number-for-
number with the NIS protocols map. If it does, ask the vendor or
author of the application for an update.

A protocol was specified that does not support the semantics of
the socket type requested. This amounts to a request for an
unsupported type of socket. Look at the source code that made
this socket request and check that it requested one of the types
specified in /usr/include/sys/socket.h.

The symbolic name for this error is EPROTOTYPE, errno=98.

Read error from network: Connection reset by peer 
=================================================

This message appears when a user is remotely logged into a
machine that crashes or gets rebooted during the rlogin(1) or
rsh(1) session. Any data changes that were not saved are probably
lost. Sometimes this message appears only when the user types
something, even though the system went down hours before.

Try to rlogin again, perhaps after waiting a few minutes for the
system to reboot.

Read-only file system 
=====================

Files and directories on filesystems that are mounted read-only
cannot be changed.

If you only modify these files and directories occasionally,
rlogin(1) to the servers from which the filesystems are mounted
and change the files or directories there. If you change these
files and directories frequently, mount(1M) the filesystems
read/write.

The symbolic name for this error is EROFS, errno=30.
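A program can find out in advance whether a path sits on a
read-only filesystem by examining the mount flags. A minimal
sketch using Python's os.statvfs; the function name is_read_only
is hypothetical.

```python
import os

def is_read_only(path):
    """Return True if the filesystem containing path is mounted read-only."""
    flags = os.statvfs(path).f_flag
    return bool(flags & os.ST_RDONLY)

if __name__ == "__main__":
    for path in ("/", "/tmp"):
        print(path, "read-only" if is_read_only(path) else "read-write")
```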

rebooting... 
============

This message appears on the console to indicate that the machine
is booting, either after the superuser issued a reboot command,
or after a system panic if the EEPROM's watchdog-reboot? variable
is set to true.

Allow the machine to boot itself. In case of a system panic, look
above this message for other indications of what went wrong.

Recipient names must be specified 
=================================

Somebody sent mail without a valid recipient in the To: field, so
sendmail could not deliver the mail message. Using mail(1), the
recipient's address might have been specified using spaces or
non-alphanumeric characters. The mailtool(1) and mailx(1)
commands try to prevent this by issuing "Please specify a
recipient" or "No recipients specified" messages instead. If
there is at least one valid recipient, each invalid recipient
address will generate a "User unknown" message.

Look in the sender's dead.letter file for the automatically saved
message, and have the originator send it again, this time
specifying a recipient.

For more information about sendmail, see the Mail Administration
Guide.

Reset tty pgrp from N to N 
==========================

The C shell sometimes issues this message when it clears away the
window process group after the user exits the window system. This
can happen when the window system doesn't clean up after itself.

Proceed with your work. This message is purely informational.

Resource temporarily unavailable 
================================

This indicates that the fork(2) system call failed because the
system's process table is full, or that a system call failed
because of insufficient memory or swap space. It is also possible
that a user is not allowed to create any more processes.

Simply waiting often gives the system time to free resources.
However if this message occurs often on a system, reconfigure the
kernel and allow more processes.  To increase the size of the
process table in Solaris 2.x, increase the value of maxusers in
the /etc/system file. The default maxusers value is the amount of
main memory in MB, minus 2.

If one user is not allowed to create any more processes, that
user has probably exceeded the memorysize limit; see the limit(1)
man page for details.

The symbolic name for this error is EAGAIN, errno=11.
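Since waiting often clears this condition, programs commonly wrap
the failing call in a bounded retry. A minimal sketch in Python;
the function name retry_on_eagain, the attempt count, and the
delay are hypothetical.

```python
import errno
import time

def retry_on_eagain(operation, attempts=5, delay=1.0):
    """Call operation(), retrying while it fails with EAGAIN."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError as exc:
            # Give up on any other error, or when attempts are exhausted.
            if exc.errno != errno.EAGAIN or attempt == attempts:
                raise
            time.sleep(delay)

if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        # Stand-in for a call such as fork(2) that fails twice, then works.
        state["calls"] += 1
        if state["calls"] < 3:
            raise OSError(errno.EAGAIN, "Resource temporarily unavailable")
        return "ok"

    print(retry_on_eagain(flaky, delay=0.1), "after", state["calls"], "calls")
```

A retry only hides the symptom. If the message recurs, the process
table size or the per-user limits still need attention.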

Result too large 
================

This is a programming error or a data input error.

Ask the program's author to fix this condition.

This indicates an attempt to evaluate a mathematical programming
function at a point where its value would overflow or underflow.
The value of a programming function in the math package (3M) is
not representable within machine precision. This could occur
after floating point overflow or underflow (either single or
double precision), or after total loss of numeric significance in
Bessel functions.

Note that this message can indicate "Result too small" in the
case of floating point underflow.

To help pinpoint a program's math errors, use the matherr(3M)
facility.

The symbolic name for this error is ERANGE, errno=34.
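The overflow and underflow cases behave differently in practice.
The following is a minimal sketch in Python, whose math module
raises OverflowError when the result is too large but returns
zero silently on underflow; the function name safe_exp is
hypothetical.

```python
import math

def safe_exp(x):
    """Return exp(x), or None when the result overflows a double."""
    try:
        return math.exp(x)
    except OverflowError:              # result too large to represent
        return None

if __name__ == "__main__":
    print(safe_exp(1.0))       # ordinary result
    print(safe_exp(1000.0))    # overflow is reported: None
    print(safe_exp(-1000.0))   # underflow is silent: 0.0
```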

rmdir: variable: Directory not empty 
====================================

The rmdir(1) command can remove only empty directories. The
directory whose name appears after the first colon in the message
still contains some files or directories.

Use rm(1) instead of rmdir. To remove this directory and
everything underneath it, use the rm -ir command to recursively
descend the directory, being asked if you want to delete each
element. To remove the directory and all its contents without
being asked for approval, use the rm -r command.
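Scripts meet the same distinction. A minimal sketch in Python,
where os.rmdir behaves like rmdir and shutil.rmtree like rm -r;
the function name remove_dir is hypothetical.

```python
import errno
import os
import shutil
import tempfile

def remove_dir(path, recursive=False):
    """Remove path; with recursive=True also remove everything beneath it.

    Returns True on success, False if the directory was not empty.
    """
    if recursive:
        shutil.rmtree(path)            # comparable to rm -r
        return True
    try:
        os.rmdir(path)                 # comparable to rmdir: empty dirs only
        return True
    except OSError as exc:
        # Some systems report EEXIST instead of ENOTEMPTY.
        if exc.errno in (errno.ENOTEMPTY, errno.EEXIST):
            return False
        raise

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "project")
        os.mkdir(target)
        open(os.path.join(target, "notes"), "w").close()
        print("plain removal worked:", remove_dir(target))
        print("recursive removal worked:", remove_dir(target, recursive=True))
```

Like rm -r, the recursive form asks no questions, so check the
path carefully before calling it.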

ROOT LOGIN /dev/console 
=======================

This syslog message indicates that someone has logged in as root
on the system console.

If you have just logged in as root, don't worry. If this is not
you, consider the possibility of a security breach. The best
site-wide policy is for all system administrators to su instead
of logging in as root.

ROOT LOGIN /dev/pts/N FROM variable 
===================================

This syslog message indicates that someone has remote logged in
as root on a pseudo-terminal from the system specified after the
FROM keyword.

For security reasons, it is a bad idea to allow root logins from
anywhere besides the console. To restrict superuser logins to the
console, remove the comment from the CONSOLE line in
/etc/default/login.

rx framing error 
================

Usually this error indicates a hardware problem.

Check the Ethernet cabling and connectors to locate a problem.

A framing error occurs when the Ethernet I/O driver receives a
non-integral unit of octets, such as 63 bytes and then 3 bits.
(Ethernet specifies the use of octets.) Framing errors are caused
by corruption of the starting or ending frame delimiters. These
can be corrupted by some violation of the encoding scheme.

Framing errors are a subset of CRC errors, which are usually
caused by anomalies on the physical media. An "alignment/framing
error" is a type of CRC error where octet boundaries do not line
up.

SCSI bus DATA IN phase parity error 
===================================

The most common cause of this problem is unapproved hardware.
Some SCSI devices for the PC market do not meet the high I/O
speed requirements for the UNIX market.  Other possible causes of
this problem are improper cabling or termination, and power
fluctuations. Data corruption is possible but unlikely to occur,
because this parity error prevents data transfer.

Check that all SCSI devices on the bus are Sun approved hardware.
Then verify that all cables are no longer than six meters, total,
and that all SCSI connections are properly terminated. If power
fluctuations are occurring, invest in an uninterruptible power
supply.

SCSI transport failed: reason 'reset' 
=====================================

This message indicates that the system sent data over the SCSI
bus, but the data never reached its destination because of a SCSI
bus reset. The most common cause of this condition is conflicting
SCSI targets.¤ Data corruption is possible but unlikely to
occur, because this failure prevents data transfer.

Verify that all cables are no longer than six meters, total, and
that all SCSI connections are properly terminated. If power
surges are a problem, acquire a surge suppressor or
uninterruptible power supply.

A machine's internal disk drive is usually SCSI target 3. Make
sure that external and secondary disk drives are targeted to 1,
2, or 0, and do not conflict with each other.  Also make sure
that tape drives are targeted to 4 or 5, and CD drives to 6,
avoiding any conflict with each other or with disk drives. If the
targeting of the internal disk drive is in question, power off
the machine, remove all external drives, turn the power on, and
from the PROM monitor run the probe-scsi-all or probe-scsi
command.

If SCSI device targeting is acceptable, memory configuration
could be the problem, especially for machines with the sun4c
architecture. Ensure that high-capacity memory chips (such as 4MB
SIMMs) are in lower banks, while lower-capacity memory chips
(such as 1MB SIMMs) are in the upper banks.

Note that SPARC systems do not always support third party CDROM
drives, and might generate a similar "unknown vendor" error
message. Check with the CDROM vendor for specific configuration
requirements.

Some third party disk drives have a read-ahead cache that
interferes with Solaris device drivers. Make sure that any
existing read-ahead cache facility is turned off.

¤ For more information on SCSI targets, see the section on
device naming conventions in the Solaris 1.x to Solaris 2.x
Transition Guide. If you are using the AnswerBook, "scsi targets"
is a good search string.

Segmentation Fault 
==================

Segmentation faults usually result from programming error. This
message is usually accompanied by a core dump, except on read-
only filesystems.

To see which program produced a core file, run either the file(1)
command or the adb(1) command. The following examples show the
output of the file and adb commands on a core file from the
dtmail program.

$ file core
core: ELF 32-bit MSB core file SPARC Version 1, from `dtmail'

$ adb core
core file = core -- program `dtmail'
SIGSEGV  11: segmentation violation
^D      (use Control-d to quit the adb program)

Ask the vendor or author of this program for a debugged version.

A process has received a signal indicating that it attempted to
access an area of memory that is protected or that does not
exist. The two most common causes of segmentation faults are
attempting to dereference a null pointer or indexing past the
bounds of an array.

sendmail[N]: NOQUEUE: SYSERR: net hang reading from variable 
============================================================

This is a sendmail message that appears on the console and in the
log file /var/adm/messages. If this message occurs once for a
particular user, it is possible that a mail message from this
user ends with a partial line (having no terminating newline
character). If this message appears frequently or at busy times,
especially along with other networking errors, it could indicate
network problems.

Check the user's mail spool file to see if a message ends without
a newline character.  If so, talk with the user and determine how
to prevent the problem from occurring again. If these messages
are the result of network problems, you could try moving the mail
spool directory to another machine with a faster network
interface.
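
One way to test a spool file for a missing final newline is to
look at its last byte. The Python sketch below is a minimal
example of that check; the function name ends_with_newline is an
assumption made for illustration, and the sketch looks only at
the end of the whole file, not at each message inside it.

```python
import os


def ends_with_newline(path):
    """Return True if the file is empty or its last byte is a newline."""
    with open(path, "rb") as spool:
        spool.seek(0, os.SEEK_END)
        if spool.tell() == 0:
            return True          # nothing to terminate
        spool.seek(-1, os.SEEK_END)
        return spool.read(1) == b"\n"
```

A False result for a user's spool file suggests that the last
message in it ends with a partial line.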

During the SMTP receipt of DATA phase, a message-terminating
period on a line of its own never arrived, so sendmail timed out
and produced this error.

setmnt: Cannot open /etc/mnttab for writing 
===========================================

The system is having problems writing to /etc/mnttab. It is
possible that the filesystem containing /etc is mounted read-
only, or is not mounted at all.

Check that this file exists and is writable by root. If so,
ensure that the /etc filesystem has been mounted, and is mounted
read-write rather than read-only.

share_nfs: /home: Operation not applicable 
==========================================

This message usually indicates that the system has a local
filesystem mounted on /home, which is where the automounter
usually mounts users' home directories.

When a system is running the automounter, do not mount local
filesystems on the /home directory. Mount them on another
directory, such as /disk2, which on most systems you will have to
create. You could also change the automounter auto_home entry,
but that is a more difficult solution.

Soft error rate (N%) during writing was too high 
================================================

This message from the SCSI tape drive appears when Exabyte or DAT
tapes generate too many soft (recoverable) errors. It is followed
by the advisory "Please, replace tape cartridge" message. Soft
errors are an indication that hard errors could soon occur,
causing data corruption.

First clean the tape head with a cleaning tape as recommended by
the manufacturer. If that doesn't work, replace the tape
cartridge. You might need to replace the tape drive if the
problem still occurs with new tape cartridges.

Soft error rate (retries = N) during writing was too high 
=========================================================

This message from the SCSI tape drive appears when Archive tapes
generate too many soft (recoverable) errors. It is followed by
the advisory "Periodic head cleaning required and/or replace tape
cartridge" message. Soft errors are an indication that hard
errors could soon occur, causing data corruption.

First clean the tape head with a cleaning tape as recommended by
the manufacturer. If that doesn't work, replace the tape
cartridge. You might need to replace the tape drive if the
problem still occurs with new tape cartridges.

Stale NFS file handle 
=====================

A file or directory that was opened by an NFS client was either
removed or replaced on the server.

If you were editing this file, write it to a local filesystem
instead. Try remounting the filesystem on top of itself or
shutting down any client processes that refer to stale file
handles. If neither of these solutions works, reboot the system.

The original vnode is no longer valid. The only way to get rid of
this error is to force the NFS server and client to renegotiate
file handles.

The symbolic name for this error is ESTALE, errno=151.

statd: cannot talk to statd at variable 
=======================================

This message comes from the NFS status monitor daemon statd,
which provides crash recovery services for the NFS lock daemon
lockd. The message indicates that statd has left old references
in the /var/statmon/sm and /var/statmon/sm.bak directories. After
a user has removed or modified a host in the hosts database,
statd might not properly purge files in these directories, which
results in its trying to communicate with a nonexistent host.

Remove the file named variable (where variable is the hostname)
from both the /var/statmon/sm and /var/statmon/sm.bak
directories. Then kill the statd daemon and restart it. If that
doesn't get rid of the message, kill and restart lockd as well.
If that doesn't work, reboot the machine at your convenience.
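
Before removing anything, it can help to list which entries in
the two statmon directories no longer correspond to a known host.
The Python sketch below does only that; it reports candidates and
deletes nothing. The function names and the pluggable
is_known_host check are assumptions made for this example. By
default it asks the system resolver, which might not match the
hosts database your site uses.

```python
import os
import socket


def resolves(hostname):
    """Return True if the system resolver can look up hostname."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:
        return False


def stale_statd_entries(directories, is_known_host=resolves):
    """List files named after hosts that is_known_host rejects."""
    stale = []
    for directory in directories:
        if not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            if not is_known_host(name):
                stale.append(os.path.join(directory, name))
    return stale
```

On the affected machine the call would be
stale_statd_entries(["/var/statmon/sm", "/var/statmon/sm.bak"]).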

stty: TCGETS: Operation not supported on socket 
===============================================

This message results when a user tries to remote copy with rcp(1)
or remote shell with rsh(1) from one machine to another, but has
an stty(1) command in the remote .cshrc file.

The solution is to move the stty command to the user's .login (or
equivalent) file.  Alternatively, execute the stty command in
.cshrc only when the shell is interactive.  Here is a test to do
just that:

if ($?prompt) stty ...

The rcp and rsh commands make a connection using sockets, which
do not support stty's TCGETS ioctl.
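
The same guard applies to any startup code that changes terminal
settings: run it only when the file descriptor is a terminal. The
Python sketch below illustrates the idea with a socket pair, which
is not a terminal; the function name safe_to_run_stty is an
assumption made for this example.

```python
import os
import socket


def safe_to_run_stty(fd):
    """Return True only if fd refers to a terminal device."""
    return os.isatty(fd)


# A socket, like the connection rcp and rsh use, is not a terminal.
left, right = socket.socketpair()
socket_is_terminal = safe_to_run_stty(left.fileno())
left.close()
right.close()
```

Here socket_is_terminal is False, which is the case in which the
terminal settings should be left alone.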

su: No shell 
============

This message indicates that someone changed the default login
shell for root to a program missing from the system. For example,
the final colon-separated field in /etc/passwd could have been
changed from /sbin/sh to /usr/bin/bash, which does not exist in
that location. Possibly an extra space was appended at the end of
the line. The outcome is that you cannot log in as root or switch
user to root, and so cannot directly fix this problem.

The only solution is to reboot the system from another source,
then edit the password file to correct this problem. Invoke
sync(1M) several times, then halt the machine by typing Stop-A or
by pressing the reset button. Reboot single-user from CDROM, the
net, or diskette, such as by typing boot cdrom -s at the ok
prompt.

After the system comes up and gives you a # prompt, mount the
device corresponding to the original / partition somewhere, such
as with a mount(1M) command similar to the one below. Then run an
editor on the newly-mounted system password file (use ed(1) if
terminal support is lacking):

# mount /dev/dsk/c0t3d0s0 /mnt
# ed /mnt/etc/passwd

Use the editor to change the password file's root entry to call
an existing shell, such as /usr/bin/csh or /usr/bin/ksh.

To keep the "No shell" problem from happening, habitually use
admintool or /usr/ucb/vipw to edit the password file. These tools
make it difficult to change password entries in ways that make
the system unusable.
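
A password file can also be checked for this kind of damage
before anyone is locked out. The Python sketch below scans
passwd-format text for the two mistakes described above: a shell
that does not exist and stray whitespace in the shell field. The
function name and the pluggable shell_exists check are
assumptions made for this example, and the sample entries in the
usage note are made up.

```python
import os


def bad_shell_entries(passwd_text, shell_exists=os.path.exists):
    """Return (user, problem) pairs for suspicious login shells."""
    problems = []
    for line in passwd_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7:
            problems.append((fields[0], "expected 7 colon-separated fields"))
            continue
        user, shell = fields[0], fields[6]
        if shell != shell.strip():
            problems.append((user, "whitespace around shell field"))
        elif shell and not shell_exists(shell):
            # An empty shell field means the default shell, so skip it.
            problems.append((user, "shell not found: " + shell))
    return problems
```

For example, given a root entry ending in :/usr/bin/bash on a
system without that file, the result contains ("root", "shell not
found: /usr/bin/bash").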

su: 'su root' failed for variable on /dev/pts/N 
===============================================

The user specified after "for" tried to become superuser, but
typed the wrong password.

If the user is supposed to know the root password, wait to see if
the correct password is supplied. If the user is not supposed to
know the root password, ask why he or she is attempting to become
superuser.

su: 'su root' succeeded for variable on /dev/pts/N 
==================================================

The user specified after "for" just became superuser by typing
the root password.

If the user is supposed to know the root password, this message
is purely informational. If the user is not supposed to know the
root password, change this password immediately and ask how the
user learned it.
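
When reviewing many of these su messages at once, the fields can
be pulled out with a regular expression. The Python sketch below
assumes the message layout shown in the two headings above; the
function name parse_su_message and the user name in the usage
note are made up for illustration.

```python
import re

# Layout taken from the message text: su: 'su root' <result> for <user> on <tty>
SU_LINE = re.compile(
    r"su: 'su (?P<target>\S+)' (?P<result>failed|succeeded) "
    r"for (?P<user>\S+) on (?P<tty>\S+)"
)


def parse_su_message(line):
    """Return (target, result, user, tty), or None if the line differs."""
    match = SU_LINE.search(line)
    if match is None:
        return None
    return match.group("target", "result", "user", "tty")
```

For the line "su: 'su root' failed for jdoe on /dev/pts/3" the
function returns ("root", "failed", "jdoe", "/dev/pts/3").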

syncing file systems... 
=======================

This indicates that the kernel is updating the super-blocks
before taking the system down, to ensure filesystem integrity.
This message appears after a halt(1M) or reboot(1M) command. It
can also appear after a system panic, in which case the system
might contain corrupted data.

If you just halted or rebooted the machine, don't worry; this
message is normal. In case of a system panic, look up the panic
messages that appear above this one. Your system vendor might be
able to help diagnose the problem. So that you can describe the
panic to the vendor, either leave your system in its panicked
state or be sure that you can reproduce the problem.

Numbers that sometimes display after the three dots in the
message show the count of dirty pages that are being written out.
Numbers in brackets show an estimate of the number of busy
buffers in the system.

syslog service starting. 
========================

During system reboot, this message might appear and the boot
seems to hang. After starting syslogd(1M) service, the system
runs /etc/rc2.d/S75cron, which in turn calls ps(1). Sometimes
after an abrupt system crash /dev/bd.off becomes a link to
nowhere, causing the ps command to hang indefinitely.

Reboot single user (for example with boot -s) and run ls -l
/dev/bd* to see if this is the problem. If so, remove
/dev/bd.off, then run bdconfig off or reboot with the -r
(reconfigure) option.

This is the most commonly reported situation that causes ps to
hang.


tar: /dev/rmt/0: No such file or directory 
==========================================

The default tape device /dev/rmt/0, or possibly the device
specified by the TAPE environment variable, is not currently
connected to the system, is not configured, or its hardware
symbolic link is broken.

List the files in the /dev/rmt directory to see which tape
devices are currently configured. If none are configured, ensure
that a tape device is correctly attached to the system, and
reboot with the -r option to reconfigure devices.

If tape devices other than /dev/rmt/0 are configured, you could
specify one of them after the -f option of tar(1).
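
A broken hardware symbolic link shows up as a link whose target
no longer exists. The Python sketch below lists such links in a
directory; the function name dangling_links is an assumption made
for this example.

```python
import os


def dangling_links(directory):
    """Return the names of symbolic links whose targets are missing."""
    dangling = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # islink() inspects the link itself; exists() follows it.
        if os.path.islink(path) and not os.path.exists(path):
            dangling.append(name)
    return dangling
```

Calling dangling_links("/dev/rmt") on the affected system would
show which tape device names point nowhere.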

tar: directory checksum error 
=============================

This error message from tar(1) indicates that the checksum of the
directory and the files it has read from tape does not match the
checksum advertised in the header block. Usually this indicates
the wrong blocking factor, although it could indicate corrupt
data on tape.

To resolve this problem, make certain that the blocking factor
you specify on the command line (after -b) matches the blocking
factor originally specified. If in doubt, leave out the block
size and let tar determine it automatically. If that doesn't
help, tape data could be corrupted.
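
For reference, the checksum in a tar header block is the sum of
all 512 bytes of that block, with the eight bytes of the checksum
field itself counted as spaces. The Python sketch below recomputes
it and compares it with the stored octal value. The function names
are assumptions made for this example; it checks a single header
block and does not deal with blocking factors.

```python
BLOCK = 512
CHKSUM_START, CHKSUM_END = 148, 156   # position of the checksum field


def header_checksum(block):
    """Sum the header bytes, counting the checksum field as spaces."""
    return (
        sum(block[:CHKSUM_START])
        + (CHKSUM_END - CHKSUM_START) * ord(" ")
        + sum(block[CHKSUM_END:BLOCK])
    )


def stored_checksum(block):
    """Decode the octal checksum recorded in the header."""
    field = block[CHKSUM_START:CHKSUM_END].rstrip(b"\0 ")
    return int(field.decode("ascii") or "0", 8)


def header_is_consistent(block):
    """Return True if the recorded and recomputed checksums agree."""
    return len(block) == BLOCK and header_checksum(block) == stored_checksum(block)
```

If the archive is read with the wrong blocking factor, the bytes
that tar treats as a header are not a real header block, so a
comparison like this one fails.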

tar: tape write error 
=====================

A physical write error has occurred on the tar(1) output file,
which is usually a tape, although it could be a diskette or disk
file. Look on the system console, where the device driver should
provide the actual error condition. This might be a write-
protected tape, a physical I/O error, an end-of-tape condition,
or a File too large limitation.

In the case of write-protected tapes, enable the write switch.
For physical I/O errors, the best course of action is to replace
the tape with a new one. For end-of-tape conditions, try using a
higher density if the device supports one, or use cpio(1) or
pax(1) for their multi-volume support. When encountering File too
large limitations, use the parent shell's limit(1) or ulimit
facility to increase the maximum file size.

For more information on tar tapes, see the section on copying UFS
files in the System Administration Guide, Volume I.

Text is lost because the maximum edit log size has been exceeded. 
=================================================================

This message appears at the beginning of a cmdtool(1) session
after 100,000 characters have gone by in the scrolling window.
Clicking on the top rectangle of the scrollbar might display this
message. No data were lost, but the user cannot scroll back
before this wraparound point.

To increase the maximum size of the Command Tool log file, use
cmdtool with the -M option, specifying more than 100,000 bytes.

THE FOLLOWING FILE SYSTEM(S) HAD AN UNEXPECTED INCONSISTENCY: 
============================================================

At boot time the /etc/rcS script runs the fsck(1M) command to
check the integrity of filesystems marked "fsck" in /etc/vfstab.
If fsck cannot repair a filesystem automatically, it interrupts
the boot procedure and produces this message. When fsck gets into
this state, it cannot repair filesystems without losing one or
more files, so it wants to defer this responsibility to you, the
administrator. Data corruption has probably already occurred.
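
To see which filesystems the boot-time check covers, you can list
the /etc/vfstab entries that name a device to fsck. The Python
sketch below assumes the usual seven-field vfstab layout (device
to mount, device to fsck, mount point, FS type, fsck pass, mount
at boot, mount options) and treats a "-" in the second field as
"not checked". The function name is an assumption made for this
example, and the sketch ignores the fsck pass field.

```python
def filesystems_to_check(vfstab_text):
    """Return (fsck_device, mount_point) pairs from vfstab-format text."""
    checked = []
    for line in vfstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 7:
            continue             # skip malformed entries in this sketch
        fsck_device, mount_point = fields[1], fields[2]
        if fsck_device != "-":
            checked.append((fsck_device, mount_point))
    return checked
```

Each fsck device in the result is a raw device name of the kind
that fsck expects on its command line.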

First run fsck -n on the filesystem, to see how many and what
type of problems exist.  Then run fsck again to repair the
filesystem. If you have a backup of the filesystem, you can
generally answer "y" to all the fsck questions. It's a good idea
to keep a record of all problematic files and inode numbers for
later reference. To run fsck yourself, specify options as
recommended by the boot script. For example:

# fsck /dev/rdsk/c0t4d0s0

Usually, files lost during fsck repair were created just before a
crash or power outage, and cannot be recovered. If important
files are lost, you can recover them from backup tapes.

If you don't have a backup, ask an expert to run fsck for you.

For more information, see the section on checking filesystem
integrity in the System Administration Guide, Volume I.

The SCSI bus is hung. Perhaps an external device is turned off. 
===============================================================

This message appears near the beginning of rebooting, immediately
after a "Boot device: ..." message, and then the system hangs.
The problem is conflicting SCSI targets for a non-boot device.
Having an external device turned off is unlikely to cause this
problem.

See the message "Boot device:
/iommu/sbus/variable/variable/sd@3,0" for a solution.

For more information, see the section on halting and booting in
the System Administration Guide, Volume I.

THE SYSTEM IS BEING SHUT DOWN NOW !!! 
=====================================

This message means the system is going down immediately and it's
too late to save any changes.

This message is often preceded by messages telling you that the
system is going down in 15 minutes, 10 minutes, and so on. When
you see these initial broadcast shutdown messages, save all your
work, send any e-mail you're working on, and close your files.
Fortunately vi sessions are automatically saved for later
recovery, but many other applications have no crash protection
mechanism. Data loss is likely.

For more information on shutting down the system, see the System
Administration Guide, Volume I. If you are using the AnswerBook,
"halting the system" is a good search string.

The system will be shut down in N minutes 
=========================================

This message from the system shutdown(1M) script informs you that
the superuser is taking down the system.

Save all changes now or your work will be lost. Write out any
files you were changing, send any e-mail messages you were
composing, and close your files.

For more information on shutting down the system, see the System
Administration Guide, Volume I. If you are using the AnswerBook,
"halting the system" is a good search string.

This mail file has been changed by another mail reader. 
=======================================================

This message appears in a pop-up dialog box whenever you start
mailtool(1) while another mail reader has the inbox locked. A
question follows: "Do you wish to ask that mail reader to save
the changes?" You are given three choices.

If you choose "Save Changes" mailtool will request the other mail
reader to relinquish its lock and write out any changes it has
made to your inbox. If you choose "Ignore" mailtool will read
your inbox without locking it. If you choose "Cancel" mailtool
will exit.

Timeout waiting for ARP/RARP packet 
===================================

This problem can occur while booting from the net, and indicates
a network connection problem.

Make sure the Ethernet cable is connected to the network. Check
that this system has an entry in the NIS ethers map or locally on
the boot server. Then check the IP address of the server and the
client to make sure they are on the same subnet. Local /etc/hosts
files must agree with each other and with the NIS hosts map.
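
Whether the client and the boot server are on the same subnet
depends on their IP addresses and the netmask in use. The Python
sketch below makes that comparison; the function name same_subnet
is an assumption made for this example, and the addresses in the
usage note are made up.

```python
import ipaddress


def same_subnet(client, server, netmask):
    """Return True if both addresses fall in the same network."""
    # strict=False lets a host address stand in for its network.
    client_net = ipaddress.ip_network("%s/%s" % (client, netmask), strict=False)
    server_net = ipaddress.ip_network("%s/%s" % (server, netmask), strict=False)
    return client_net == server_net
```

For example, with netmask 255.255.255.0, the addresses
192.168.10.5 and 192.168.10.200 are on the same subnet, but
192.168.10.5 and 192.168.11.5 are not.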

If those are not causing the problem, go to the system's PROM
monitor ok prompt and run test net to test the network
connection. (On older PROM monitors, use test-net instead.) If
the network test fails, check the Ethernet port, card, fuse, and
cable, replacing them if necessary. Also check the twisted pair
port to make sure it is patched to the correct subnet.

For more information on packets, see SPARC: Installing Solaris
Software. If you are using the AnswerBook, "ARP/RARP" is a good
search string.

Too many links 
==============

An attempt was made to create more than the maximum number of
hard links (LINK_MAX, by default 32767) to a file. Because each
subdirectory is a link to its parent directory, the same error
results from trying to create too many subdirectories.

Check to see why there are so many links to the same file. To get
more than the maximum number of hard links, use symbolic links
instead.

The symbolic name for this error is EMLINK, errno=31.
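
A program that creates links can apply the same advice by falling
back to a symbolic link when the hard link limit is reached. The
Python sketch below shows one way to do that; the function name
link_or_symlink is an assumption made for this example.

```python
import errno
import os


def link_or_symlink(source, link_name):
    """Create a hard link, or a symbolic link if EMLINK is returned."""
    try:
        os.link(source, link_name)
        return "hard"
    except OSError as exc:
        if exc.errno != errno.EMLINK:
            raise                # some other problem; do not hide it
        os.symlink(os.path.abspath(source), link_name)
        return "symbolic"
```

The return value tells the caller which kind of link was made.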

Too many open files 
===================

A process has too many files open at once. The system imposes a
per-process soft limit on open files, OPEN_MAX (usually 64),
which can be increased, and a per-process hard limit (usually
1024), which cannot be increased.

You can control the soft limit from the shell. In the C shell,
use the limit command to increase the number of descriptors. In
the Bourne or Korn shells, use the ulimit command with the -n
option to increase the number of file descriptors.

If the window system refuses to start new applications because of
this error, increase the open file limit in your login shell
before starting the window system.

The symbolic name for this error is EMFILE, errno=24.
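
A program can also raise its own soft limit, up to the hard
limit, before opening many files. The Python sketch below uses
the standard resource module for that; the function name
raise_open_file_limit is an assumption made for this example.

```python
import resource


def raise_open_file_limit(wanted):
    """Raise the soft open-file limit toward wanted; return the new value."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)       # never ask for more than the hard limit
    if soft != resource.RLIM_INFINITY and wanted > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

The function never lowers the limit, so calling it with a value
below the current soft limit changes nothing.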

umount: warning: /variable not in mnttab 
========================================

This message results when the superuser attempts to unmount a
filesystem that is not mounted. Note that subdirectories of
filesystems, such as /var, cannot be unmounted.

Run the mount(1M) or df(1M) command to see what filesystems are
mounted. If you really want to unmount one of them, specify the
existing mount point.

Unable to install/attach driver 'variable' 
==========================================

These messages appear in /var/adm/messages at boot time, when the
system tries to load drivers for devices the machine does not
have.

Despite the alarmist tone, this message is intended as purely
informational. You probably don't want all these device drivers,
because they make your system kernel larger, requiring more
memory.

undefined control 
=================

This message, prefaced by the file name and line number involved,
is from the C preprocessor /usr/ccs/lib/cpp, and indicates a line
starting with a sharp (#) but not followed by a valid keyword
such as define or include.

A piece of software might be running the C preprocessor on an
initialization file that you thought was interpreted by a shell.
In most shells, the sharp (#) indicates a comment. The C
preprocessor considers comments to be anything between /* and */
delimiters.
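
To find the lines in an initialization file that would trip the
preprocessor, look for lines that start with a sharp (#) followed
by something other than a directive keyword. The Python sketch
below is a rough approximation of that test. The function name
and the keyword list are assumptions made for this example, and a
particular cpp might accept more or fewer keywords.

```python
import re

KNOWN_DIRECTIVES = {
    "define", "undef", "include", "if", "ifdef", "ifndef",
    "else", "elif", "endif", "line", "error", "pragma",
}
FIRST_WORD = re.compile(r"[A-Za-z_]\w*")


def undefined_controls(text):
    """Return (line_number, line) pairs that look like shell comments."""
    suspects = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip()
        if not stripped.startswith("#"):
            continue
        rest = stripped[1:].strip()
        if not rest:
            continue             # a lone '#' is a valid null directive
        word = FIRST_WORD.match(rest)
        if word is None or word.group(0) not in KNOWN_DIRECTIVES:
            suspects.append((number, line))
    return suspects
```

A shell-style comment such as "# set the search path" is
reported, while "#define DEBUG 1" is not.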

Unmatched ` 
===========

This message from the C shell csh(1) indicates that a user typed
a command containing a backquote symbol (`) without a closing
backquote. Similar messages result from an unmatched single quote
(') or an unmatched double quote ("). Other shells generally give
a continuation prompt when a command line contains an unmatched
quote symbol.

Correct the command line and try again. To continue typing on
another line, give the C shell a backslash right before the
newline.

UNREF FILE I=i OWNER=o MODE=m SIZE=s MTIME=t CLEAR?
===================================================

During phase 4, fsck(1M) discovered that the specified file was
orphaned because the inode had no record of its pathname. In
other words, the file was not connected into any directory.

Answer yes to reconnect the file into the lost+found directory.
Then contact the file's owner to ask whether they want it back,
and where they want you to place it.

For more information, see the chapter on checking filesystem
integrity in the System Administration Guide, Volume I.

Use "logout" to logout. 
=======================

This C shell message might come as a surprise to Bourne or Korn
shell users accustomed to logging out with a Control-d.

When ignoreeof is set, the C shell requires users to logout by
typing logout or exit.  Write any modified files to disk before
exiting.

/usr/openwin/bin/xinit: connection to X server lost 
===================================================

This means that the xinit(1) program, which sets up X11 resources
and starts a window manager, failed to locate the X server
process. Perhaps the user interrupted window system startup, or
exited abnormally from OpenWindows (for example, by killing
processes or by rebooting). It is possible that the X server
crashed. Data loss is possible in some cases. Depending on
process timing, this message might be normal when OpenWindows
exits during a system reboot.

The only solution is to exit and restart OpenWindows. You do not
need to reboot the system unless it hangs and fails to give you a
console prompt. To exit OpenWindows, select Workspace->Exit. To
restart OpenWindows, type openwin at the system prompt.

Value too large for defined data type 
=====================================

The user ID or group ID of an IPC object or file system object
was too large to be stored in an appropriate member of the
caller-provided structure.

Run the application on a newer system, or ask the program's
author to fix this condition.

This error occurs only on systems that support a larger range of
user or group ID values than a declared member structure can
support. This condition usually occurs because the IPC or file
system object resides on a remote machine with a larger value of
type uid_t, off_t, or gid_t than that of the local system.

The symbolic name for this error is EOVERFLOW, errno=79.

WARNING: Clock gained N days-- CHECK AND RESET THE DATE! 
========================================================

Each workstation contains an internal clock powered by a
rechargeable battery. After the system is halted and turned off,
the internal clock continues to keep time. When the system is
powered on and reboots, the system notices that the internal
clock has gained time since the workstation was halted.

In most cases, especially if the power has been off for less than
a month, the internal clock keeps the correct time, and you do
not have to reset the date. Use the date(1) command to check the
date and time on your system. If the date or time is wrong,
become superuser and use the date(1) command to reset them.

WARNING: No network locking on variable: 
 contact admin to install server change 
======================================

The Solaris 2.x mount(1M) command issues this message whenever it
mounts a filesystem that doesn't have NFS locking, such as a
standard SunOS 4.1.x exported filesystem. Data loss is possible in
applications that depend on locking.

On the remote SunOS 4.1.x system, install the appropriate
rpc.lockd jumbo patch to implement NFS locking. For SunOS 4.1.4,
install patch #102264; for SunOS 4.1.3, install patch #100075;
for earlier 4.1 releases, install patch #101817.

WARNING: processor level 4 interrupt not serviced
=================================================

This message is basically a diagnostic from the SCSI driver.
Especially on machines with the sun4c architecture, it can appear
on the console every 10 minutes or so.

To reduce the frequency of this message, add this line near the
bottom of the /etc/system file and reboot:

set esp:esp_use_poll_loop=0

You might also see this message repeatedly after manually
removing a CD when it was busy. Don't do this! To get the system
back to normal, reboot the system with the -r (reconfigure)
option.

WARNING: /tmp: File system full, swap space limit exceeded 
==========================================================

The system swap area (virtual memory) has filled up. You need to
reduce swap space consumption by killing some processes or
possibly by rebooting the system.

See the message "Not enough space" for information about
increasing swap space.

WARNING: TOD clock not initialized-- CHECK AND RESET THE DATE! 
==============================================================

This message indicates that the Time Of Day (TOD) clock reads
zero, so its time is the beginning of the UNIX epoch: 00:00:00
UTC on 1 January 1970, which time zones west of Greenwich display
as 31 December 1969. On a brand-new system, the manufacturer
might have neglected to initialize the system clock. On older
systems it is more likely that the rechargeable battery has run
out and requires replacement.
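
To see how a clock value of zero is displayed in a given time
zone, the Python sketch below converts it using a fixed offset
from UTC. The function name tod_zero_as_seen_in is an assumption
made for this example.

```python
from datetime import datetime, timedelta, timezone


def tod_zero_as_seen_in(utc_offset_hours):
    """Return clock value zero as a datetime in the given UTC offset."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(0, tz=zone)
```

With an offset of 0 the result is 1 January 1970 at 00:00; with
an offset of -8 it is 31 December 1969 at 16:00.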

First replace the battery according to the manufacturer's
instructions. Then become superuser and use the date(1) command
to set the time and date. On SPARC systems the clock is powered
by the same battery as the NVRAM, so a dead battery also causes
loss of the machine's Ethernet address and host ID, which are
more serious problems for networked systems.

WARNING: Unable to repair the / filesystem. Run fsck
====================================================

This message comes at boot time from the /etc/rcS script whenever
it gets a bad return code from fsck(1) after checking a
filesystem. The message recommends an fsck command line, and
instructs you to exit the shell when done to continue booting.
Then the script places the system in single-user mode so fsck can
be run effectively.

See "/dev/rdsk/variable: UNEXPECTED INCONSISTENCY" for
information about repairing UFS filesystems.

See "THE FOLLOWING FILE SYSTEM(S) HAD AN UNEXPECTED
INCONSISTENCY" for information about repairing non-UFS
filesystems.

Watchdog Reset 
==============

This fatal error usually indicates some kind of hardware problem.
Data corruption on the system is possible.

Look for some other message that might help diagnose the problem.
By itself, a watchdog reset doesn't provide enough information;
because traps are disabled, all information has been lost. If all
that appears on the console is an ok prompt, issue the PROM
command below to view the final messages that occurred just
before system failure:

ok f8002010 wector p

Yes, that word is wector, not vector.

The result is a display of messages similar to those produced by
the dmesg(1M) command. These messages can be useful in finding
the cause of system failure.

This message doesn't come from the kernel, but from the OpenBoot
PROM monitor, a piece of Forth software that gives you the ok
prompt before you boot UNIX. If the CPU detects a trap when traps
are disabled (an unrecoverable error), it signals a watchdog. The
OpenBoot PROM monitor detects the watchdog, issues this message,
and brings down the system.

Watchdog Reset, Rebooting. 
==========================

See the message "Watchdog Reset" for details. This rebooting
message occurs under the same conditions, but when the EEPROM's
watchdog-reboot? variable is set to true, causing the machine to
automatically reboot itself. Data corruption on the system is
possible.

Who are you? 
============

Many networking programs can print this message, including
from(1B), lpr(1B), lprm(1B), mailx(1), rdist(1), sendmail(1M),
talk(1), and rsh(1). The command prints this message when it
cannot locate a password file entry for the current user.  This
might occur if a user logged in just before the superuser deleted
that user's password entry, or if the network naming service
fails for a user who has no entry in the local password file.

If a user's password file entry was accidentally deleted, restore
it from backups or from another password file. If a user's login
name or user ID was changed, ask that user to logout and login
again. If the network naming service failed, check the NIS
server(s) and repair or reboot as necessary.

There is a known problem (bug 1138025) with starting hundreds of
rsh processes on another machine. This message appears because
rsh hangs while binding to a reserved port, and responds too
slowly to interact with the network naming service.

Window Underflow 
================

This message often occurs at boot time, sometimes along with a
"Watchdog Reset" error. It comes from the OpenBoot PROM monitor,
which was passed a processor trap from the hardware. This error
indicates that some program tried to access a SPARC register
window that wasn't accessible from the processor.

On some system architectures, specifically sun4c, the problem
could be that different capacity memory chips are mixed together.
Someone might have placed 1MB SIMMs in the same bank with 4MB
SIMMs. If this is so, rearrange the memory chips. Make sure to
put higher-capacity SIMMs in the first bank(s), and
lower-capacity SIMMs in the remaining bank(s); never mix
different capacity SIMMs in the same bank.

The problem could also be that cache memory on the motherboard
has gone bad and needs replacement. If main memory is installed
correctly, try swapping the motherboard.

The best way to isolate the problem is to look at the %pc
register to see where it got its arguments from, and why the
arguments were bad. If you can reproduce the condition causing
this message, your system vendor might be able to help diagnose
the problem.

X connection to variable:0.0 broken (explicit kill or 
 server shutdown). 
=================

This means that the client has lost its connection to the X
server. The "0.0" represents the display device, which is usually
the console. This message can appear when a user is running an X
application on a remote system with the DISPLAY set back to the
original system and the remote system's X server disappears,
perhaps because someone exited X windows or rebooted the machine.
It sometimes appears locally when a user exits the window system.
Data loss is possible if applications were killed before saving
files.
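
In a display name such as "variable:0.0", the part before the
colon is the host, the first number is the display number, and
the second is the screen number. The Python sketch below splits a
DISPLAY value into those parts; the function name parse_display
is an assumption made for this example, and it handles only the
plain host:display.screen form.

```python
import re

DISPLAY_NAME = re.compile(
    r"^(?P<host>[^:]*):(?P<display>\d+)(?:\.(?P<screen>\d+))?$"
)


def parse_display(value):
    """Split host:display.screen into (host, display, screen)."""
    match = DISPLAY_NAME.match(value)
    if match is None:
        raise ValueError("not a host:display.screen name: %r" % value)
    # The screen number is optional and defaults to 0.
    screen = int(match.group("screen") or 0)
    return (match.group("host"), int(match.group("display")), screen)
```

An empty host, as in ":0", refers to the local machine.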

Try to run the application again in a few minutes after the
system has rebooted and the window system is running.

xinit: not found 
================

OpenWindows was probably not installed properly, and the
openwin(1) program could not find xinit(1) to start up the X
windows system. If the user is running another version of X
windows, such as the MIT X11 distribution, the startx program
serves the same function as xinit.

Check the PATH environment variable to make sure it contains the
appropriate X windows install directory. Verify that xinit is in
this directory as an executable program.

XIO: fatal IO error 32 (Broken pipe) on X server "variable:0.0" 
===============================================================

This means that I/O with the X server has been broken. The "0.0"
represents the display device, which is usually the console. This
message can appear when a user is running Display PostScript
applications and the X server disappears or the client is shut
down. Data loss is possible if applications disappeared before
saving files.

Try to run the application again in a few minutes after the
system has rebooted and the window system is running.

Xlib: Client is not authorized to connect to Server 
===================================================

See the message "Xlib: connection to ... refused by server" for
details.

Xlib: connection to "variable:0.0" refused by server 
====================================================

This message is immediately followed by the "Xlib: Client is not
authorized to connect to Server" message. These messages indicate
that an X windows application tried to run on the X server
specified inside double quotes, which did not allow the request.
The "0.0" represents the display device, which is usually the
console. If no server name appears, the superuser probably tried
to run an X application on the current machine in an X session
that was owned by somebody else.

To allow this client to connect to the X server, run xhost
+clientname on the X server system. Only the owner of the current
X session (who is not necessarily the superuser) is allowed to
run the xhost command. If somebody else is running X windows on
the server, ask them to log out and then start your own X session
on that server; remote X connections are usually allowed for the
same user ID.

xterm: fatal IO error 32 (Broken Pipe) or KillClient on X server
"variable:0.0" 
==============

This means that xterm(1) has lost its connection to the X server.
The "0.0" represents the display device, which is usually the
console. This message can appear when a user is running xterm and
the X server disappears or the client gets shut down. Data loss
is possible if applications were killed before saving files.

Try to run the terminal emulator again in a few minutes after the
system has rebooted and the window system is running.

XView warning: Cannot load font set 'variable' (Font Package) 
=============================================================

This message from the XView library warns that a requested font
is not installed on the X server. Often multiple warnings appear
about the same font. The set of available fonts can vary from
release to release.

To see which fonts are available on the X server, run the
xlsfonts(1) program. Then specify another font name that you see
in the output of xlsfonts. Sometimes it is possible to locate a
similar font from a different vendor.

There are two packages of X windows fonts: the common but not
required fonts (SUNWxwcft), and the optional fonts (SUNWxwoft).
Run pkginfo(1) to see if both these packages are installed, and
add them to the system as you wish.
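
The font check described above could be scripted roughly as follows.
This is only a sketch: has_font is a hypothetical helper name, and the
pattern is treated as a basic regular expression matched without regard
to case.

```shell
# has_font PATTERN: read font names on standard input (one per line, the
# way xlsfonts prints them) and succeed if any name matches PATTERN,
# ignoring case. Hypothetical helper for illustration.
has_font() {
    grep -i "$1" >/dev/null
}
```

A possible use is "xlsfonts | has_font courier && echo present". To
check the two font packages named above, run
"pkginfo SUNWxwcft SUNWxwoft".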

ypbind[N]: NIS server for domain "variable" OK 
==============================================

This message appears after an "NIS server not responding" message
to indicate that ypbind(1M) is able to communicate with an NIS
server again.

Proceed with your work. This message is purely informational.

ypbind[N]: NIS server not responding for domain "variable"; still trying
========================================================================

This means that the NIS client daemon ypbind(1M) cannot
communicate with an NIS server for the specified domain. This
message appears when a workstation running the NIS naming service
has become disconnected from the network, or when NIS servers are
down or extremely slow to respond.

If other NIS clients are behaving normally, check the Ethernet
cabling on the workstation that is getting this message. On SPARC
machines, disconnected network cabling also produces a series of
"no carrier" messages. On x86 machines, the above message might
be your only indication that network cabling is disconnected.

If many NIS clients on the network are giving this message, go to
the NIS server in question and reboot or repair as necessary. To
locate the NIS server for a domain, run the ypwhich(1) command.
When the server machine comes back in operation, NIS clients give
an "NIS server for domain OK" message.

For more information about ypbind, see the section on
administering secure NFS in the NFS Administration Guide.

ypwhich: can't communicate with ypbind 
======================================

This message from the ypwhich(1) command indicates that the NIS
binder process ypbind(1M) is not running on the local machine.

If the system is not configured to use NIS, this message is
normal and expected.  Configure the system to use NIS if
necessary.

If the system is configured to use NIS, but the ypbind process is
not running, invoke the following command to start it up:

# /usr/lib/netsvc/yp/ypbind -broadcast
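
Before starting the daemon by hand, it can be worth confirming that it
really is absent from the process list. A minimal sketch, assuming
"ps -e" output in which the command name is the last field on each line
(ypbind_listed is a hypothetical helper name):

```shell
# ypbind_listed: read "ps -e" style output on standard input and succeed
# if the last field (the command name) of any line is exactly "ypbind".
# Hypothetical helper for illustration.
ypbind_listed() {
    awk '$NF == "ypbind" { found = 1 }
         END { if (found) exit 0; exit 1 }'
}
```

As root, something like
"ps -e | ypbind_listed || /usr/lib/netsvc/yp/ypbind -broadcast" would
then start ypbind only when it is not already running.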

zsN: silo overflow 
==================

This message means that the Zilog Z8530 character input silo (or
serial port FIFO) overflowed before it could be serviced. The
zs(4S) driver, which talks to a Zilog Z8530 chip, is reporting
that the FIFO (holding about two characters) has been overrun.
The number after zs shows which serial port experienced an
overflow:

zs0 - tty serial port 0 (/dev/ttya)
zs1 - tty serial port 1 (/dev/ttyb)
zs2 - keyboard port (/dev/kbd)
zs3 - mouse port (/dev/mouse)

Silo overflows indicate that data in the respective serial port
FIFO has been lost.  However, consequences of silo overflows
might be negligible if the overflows occur infrequently, if data
loss is not catastrophic, or if data can be recovered or
reproduced.  For example, although a silo overflow on the mouse
driver (zs3) indicates that the system could not process mouse
events quickly enough, the user can perform mouse motions again.
Similarly, lost data from a silo overflow on a serial port with a
modem connection transferring data using uucp(1C) will be
recovered when uucp discovers the loss of data and requests
retransmission of the corrupted packet.

Frequent silo overflow messages can indicate a zs hardware FIFO
problem, a serial driver software problem, or abnormal data or
system activity. For example, the system ignores interrupts
during system panics, so mouse and keyboard activity result in
silo overflows.

If the serial ports experiencing silo overflows are not being
used, a silo overflow could indicate the onset of a hardware
problem.

Another type of silo overflow is one that occurs during reboot
when an HDLC line is connected to any of the terminal ports. For
example, an X.25 network could be sending frames before the
kernel has been told to expect them. Such overflow messages can
be ignored.
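
To judge whether overflows are occasional or frequent, the messages can
be counted per port. The following is a rough sketch only: silo_counts
is a hypothetical helper name, and it assumes each logged line contains
the text "zsN: silo overflow" in the form shown in the title above.

```shell
# silo_counts: read system log text on standard input and print one line
# per zs port with the number of "silo overflow" messages seen for it.
# Hypothetical helper for illustration.
silo_counts() {
    sed -n 's/.*\(zs[0-9][0-9]*\): silo overflow.*/\1/p' |
        sort | uniq -c |
        awk '{ print $2, $1 }'    # print "port count"
}
```

For example, "silo_counts < /var/adm/messages" summarizes the current
system log. A port with only a handful of entries is probably harmless,
whereas a steadily growing count on an unused port points to the
hardware problem described above.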