TECHNOLOGY
Interview
In Conversation with
Oracle Expert Joel Goodman
Technical Team Leader, Oracle Core Delivery
Europe, Middle East, Africa.
OS: Joel, let’s start by
explaining your role within
Oracle?
Joel: Core Delivery is the
part of Oracle University that
delivers courses on the core
technology stack - from the
middle tier application server
to the database, storage and
operating systems layer. It
doesn’t include applications. It
includes languages and tools,
and everything that uses those
in the Middle Tier.
I don’t have expertise in
all those areas but, from a
technical management point of
view, I’m the Technical Team
Leader advising management
on technical strategy. I spend
about a quarter of my time
doing seminars and courses for
Oracle University. About a third
of my time is spent working
with the Oracle Certification
Team, leading the development
of certification examinations
for Oracle Database
Administrators. This includes
Oracle Certified Associate,
Professional, Expert and Master
Exams.
I also work fairly closely
with the Curriculum Group
on helping to design and
develop the database and
Linux systems course content,
specialising in Data Guard,
RAC, Grid Infrastructure, and in
the past two years, the Oracle
Exadata Database Machine.
OS: Could you give a brief
overview of Grid Infrastructure
installations, for example, for
clusters or standalone servers?
Joel: Sure. Firstly, there are two
flavours of Grid Infrastructure:
for a cluster and for a
standalone server. In order to
understand this we should
go back to the origins of this
software component.
Grid Infrastructure consists
of two separate components
in each of those flavours. The
origins of Grid Infrastructure
start with the Clusterware
from previous releases.
Clusterware belongs to a
family of products known as
high availability (HA) software,
which Oracle has provided
since Oracle 10g Release 1.
Other operating system
vendors have their own versions of
HA software - for example, Sun
Cluster on Sun, HACMP on
IBM AIX, and Serviceguard on HP.
Clusterware allows multiple nodes to be managed as a cluster and also provides for isolation of I/O from failing nodes. It also provides tools for the management of resources that run on particular nodes in the cluster - including restarting a resource in place, or failing it over to a surviving node, in the event of node failure.
Oracle Clusterware has been a prerequisite for running RAC databases since Oracle 10g. Traditionally the
OS Administrator managed
High Availability software,
although Oracle Clusterware
is sometimes administered by
DBAs.
When Oracle Clusterware
was originally released, it
had dependencies on other
technologies.
One of the requirements for
running a cluster is some form
of shared storage. You can
have shared storage provided
using Network Attached
Storage (NAS) on filers. There
is also the possibility of using shared storage from Storage Area Networks (SANs). Traditionally on RAC databases, the Clusterware shared storage was provided by one of these methods, usually by a SAN.
Oracle 10g Release 1 also contained Oracle's own storage technology called ASM - a shared storage solution for Oracle databases.
The problem was that the
Clusterware itself has certain
files that are required for it
to operate. It has the OCR - the Oracle Cluster Registry - a small repository describing the Clusterware, the nodes of the cluster, the resources that run on the cluster, which nodes they're meant to run on, and so on. These must be shared
files but couldn’t be stored in
ASM. There are also voting files
which need shared storage
but these couldn’t use ASM in
Oracle 10g or in Oracle 11g
Release 1. This meant that
customers running Clusterware
required some other shared
storage solution, such as a SAN or filers, in addition to ASM if they were using ASM for their database shared
storage. One of the design
requirements for 11g Release
2 was to have ASM become
a shared storage solution for
every possible requirement
including the Clusterware. To
do this required that ASM be
available during the installation
of the Clusterware to allow
the Clusterware files to be defined as residing inside ASM. That required ASM to be installed at the same time as the Clusterware was being installed - otherwise you wouldn't have ASM there to begin with! The consequence of this was that the installation of the Clusterware would have to include the installation of all the executables and other files needed to support ASM as well. So, this couldn't be called 'Clusterware' anymore, because it combined technology from two different areas - one was the ASM technology, which originated as a database-oriented technology, and one was the Clusterware, which originated as an operating system administrator-oriented technology. A new name was needed to describe this mix, and the result was Grid Infrastructure. It provides the infrastructure for grid computing including RAC, which is the database layer for grid computing.

That explains why there was a name change, and it also explains one of the very important new features in Oracle 11g Release 2, which is that ASM can be used not only for RAC database storage but for shared cluster storage as well. And that's not the only shared storage requirement. There are others, which I'll get to when we discuss some of the ASM features of 11g Release 2.

Inside the Clusterware there's a component called the Cluster Ready Services daemon (CRSD), whose job is to start, stop and monitor the resources that are meant to run on the various nodes of the cluster. It can make decisions about restarting failed components in place, failing these resources over to other nodes and maintaining metadata dependency relationships. A resource which depends on another resource being active in order to work properly will have that resource automatically started, and the Clusterware will not start something before its dependent resources have started as well.

In Oracle 10g and 11g Release 1, if a customer wished to use
ASM, they would install another
Oracle Home from which they
would typically run their ASM
instances. ASM, in a non-clustered environment, still
requires a very small subset
of Clusterware to do specific
work involving resource locks
and a few other types of
caching mechanisms so that
the ASM environment can
coordinate its activities with the
database instances running
on the same machine. This is
done for releases prior to 11g
Release 2, by having a small
subset of Clusterware running
called Group Services even
when running on a standalone
server. Grid Infrastructure
for a standalone server, still
includes this functionality, but
also contains the resource
management functionality
of the Clusterware with no
failover capability. But it can
automatically start resources
in the correct order, monitor
their health and restart them
in place; all nice extras for a
standalone server that can help
customers to avoid having to
do their own implementations.
Even though there are no other
nodes on a standalone server,
there’s still the possibility of:
A) Having resources start up
automatically;
B) Having them shut down
cleanly;
C) Monitoring them and then
restarting them in place as
appropriate.
Prior to 11g Release 2 DBAs
would need to script this
themselves. There were some
tools from Oracle to automate
start-up and shut down of
databases but there was no
monitoring capability built in.
Oracle Restart is the name of
the Clusterware subset that’s
used for managing resources
on a standalone server. It
not only manages start-up in
the correct way but provides
a comprehensive set of
administrative tools. Of course,
many customers work in both
environments - a combination
of standalone Oracle servers
as well as clusters. Now that
Oracle Restart exists, certain
new features that have been
developed in 11g Release 2
require that the customers
use Oracle Restart. I predict
that, as time goes by, more
and more features will depend
on customers having Oracle
Restart installed because it’s
there now and if it’s there now
it’s there to be used. So it’s
an ease of use solution, it’s a
way of making administration
consistent and it’s very much
the case that most people
can benefit strongly from the
automated management and
monitoring rather than having
to write their own scripts to
provide the solution.
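To make that concrete, here is a minimal sketch of the kind of thing Oracle Restart automates, assuming a single instance database has already been registered with it; the database name orcl is hypothetical:

    # Check whether Oracle Restart is managing the database, then start it
    srvctl status database -d orcl
    srvctl start database -d orcl

    # Show the stored configuration Oracle Restart uses at start-up
    srvctl config database -d orcl

The same srvctl syntax works against a cluster, which is part of the administrative consistency Joel describes.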
OS: Given that the install
process and the set-up of
Grid Infrastructure are well documented, what would you say are some good tips and pointers for troubleshooting cluster problems?
Joel: It’s all about preparation.
Installing Grid Infrastructure is
more complex than a Database
Administrator might expect.
Grid Infrastructure has many
pre-requisite steps to make
certain that the cluster is in
a proper condition for the
Grid Infrastructure install to
be successful. It's more the sort of thing an Operating System Administrator would be familiar with, but then they wouldn't know some of the Oracle-specific requirements.
We spend quite a bit of time on
preparation and configuration
during our Grid Infrastructure
administration classes, to help
customers appreciate the need
for planning.
A hefty chunk of time is spent
doing installs of the Grid
Infrastructure, so students learn
not only to do the install, but
to recognise the kind of set-up
that’s required. Fortunately,
Oracle provides two tools
that might help administrators
handle this. We have a Linux
package - an RPM - called
Oracle_Validated. Installing this
will make the operating system
'Oracle ready' to some degree,
meaning it will create certain
users and set certain OS kernel
parameters to appropriate
values for an Oracle install.
It is really designed primarily
for single instance installs
not for RAC but can still save
some time when installing Grid
Infrastructure for a standalone
server.
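As a hedged sketch, on Oracle Enterprise Linux 5 with a suitably configured yum repository (the repository set-up is assumed here, and the package is named oracle-validated on that platform), installing it is a one-liner:

    # Install the validated configuration RPM; it creates the oracle user,
    # sets kernel parameters and pulls in required OS packages
    yum install oracle-validated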
For clusters, there is a tool
called the Cluster Verification
Utility (Cluvfy). It can be used
in a variety of contexts. It can
be used at different stages
of the proceedings to check
whether or not the environment
is correct. It can also be
used, after everything has
been installed successfully, to
check different components
of the Clusterware. There are
stage checks and component
checks in the tool. So, from
an installation point of view,
my advice is to always follow
the requirements in the
documentation.
Installation will:
• Require a certain amount of
shared storage;
• Require network adaptors
such as a public network
adapter for each node;
• Require another network
adapter for an interconnect
between the nodes;
• Check for certain OS users;
• Check for the presence of
certain packages on the
operating system; and,
• Conduct a post-HWOS stage check (post hardware and operating system) to confirm you are ready to start installing your Grid Infrastructure (see the cluvfy sketch below).
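As an illustration of those stage checks, a DBA might run cluvfy from the unzipped installation media (runcluvfy.sh) before and after preparing the nodes; the node names node1 and node2 are hypothetical:

    # Post hardware and operating system (HWOS) stage check
    ./runcluvfy.sh stage -post hwos -n node1,node2 -verbose

    # Pre Clusterware installation stage check
    ./runcluvfy.sh stage -pre crsinst -n node1,node2 -verbose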
There are other stages along
the way, which students cover
in class, but this should be your
first approach - be prepared
and think it through. The Grid
Infrastructure installation for
a cluster has certain optional components, which may be installed using the advanced install and which have other set-up requirements. But it's beyond the scope of this little discussion to cover those in any great detail. One of the
features, called Grid Naming
Service (GNS), is in effect a Domain Name Server, and knowledge of the administration of DNS servers, DHCP servers and networks is required to use some of these extra advanced
features.
This makes GNS in 11g R2 very
interesting but it also creates
certain challenges because
the scope of the technology
stack is quite extensive. It is
doing some things which are
OS Administrator oriented. It’s
interfacing with Network Admin
tools. It’s also doing other
things involving shared storage
which might involve Storage
Administrators. It requires a
broad multidisciplinary skill set
to the point where, at the very
least, the people responsible
for the install can hold an
intelligent discussion with their
colleagues.
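By way of a hedged example (not something Joel spells out here), once a cluster has been built through the advanced install with GNS, the configuration can be inspected from any node, assuming an 11g Release 2 Grid Infrastructure home on the PATH:

    # Show the GNS configuration for the cluster
    srvctl config gns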
OS: Is the Grid Infrastructure functionality better than the preceding Oracle Clusterware and Automatic Storage Management functionality?
Joel: You can run Clusterware
without using ASM if you don’t
want to. You can use filers for
example. The new Oracle 11g
Release 2 Clusterware has
several new features. One of
them is a new approach to
managing the Clusterware
itself. In previous versions,
there were certain platform
dependent implementations for
getting the Clusterware stack
initialised when the operating
system itself was booted. So,
if you were running on Linux
then it might be done in a
certain way. If you were running
on Solaris it would be slightly
different and so on. Oracle
made a decision that it was
easier to have a consistent way
of managing the start-up, shut
down and monitoring of the
Grid Infrastructure components
themselves, so what they’ve
done is create a new layer called the Oracle High Availability Services daemon (OHASD).
OHASD’s responsibility is to
get the Grid Infrastructure
stack started up on a node.
OHASD is generic so when
an operating system reboots,
OHASD is started and it looks to
see whether the Clusterware
is meant to run enabled or
disabled. If the administrator
runs it disabled, it means the
Clusterware is only started
if you start it manually. If it's
running enabled then OHASD
will start it automatically. Either way, OHASD
starts up the components of
the Clusterware in the correct
sequence using the same
kind of logic used by CRSD to
manage the resources that are
themselves managed by the
Clusterware.
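As a brief sketch of what that looks like in practice (assuming root access on a cluster node running 11g Release 2):

    # Control whether OHASD brings the stack up automatically at boot
    crsctl enable crs
    crsctl disable crs

    # Start the stack manually on this node, then check every node at once
    crsctl start crs
    crsctl check cluster -all

The cluster-wide check is the cross-node capability Joel mentions below, in contrast to running a local check on each node in turn.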
CRSD manages resources that
can be cluster-wide resources,
meaning resources that can
either run on multiple nodes
or resources that may fail over
from one node to another.
Interestingly, when you run Grid
Infrastructure for a standalone
server there is no cluster,
there is only the local node
and therefore OHASD does
everything. There is no CRSD
when you use a standalone
server because there’s no
cluster. The code for OHASD
and the code for CRSD are
almost identical, since they’re
more or less doing the same
kinds of things; they’re just
doing them to different types
of resources. In previous releases, for example, if I wanted to check the state of the Clusterware itself on an 8 node cluster, I'd have to connect to each node and run the crsctl command. Now I have an OHASD daemon that talks to its counterparts on other nodes. I like that feature particularly, but there are others.

The number of different resources managed by the Clusterware within the Grid Infrastructure, for example, is far more granular than it used to be. For example, in older releases, we knew that database instances existed and we knew that they could depend on the ASM instance located on the same node. If the ASM instance was not running, then the database instance wouldn't be started because it would not be able to access its data. However, if the ASM instance was started but one or more dependent disk groups were not mounted, then errors would occur, because the real granularity of dependency was at the level of disk groups and not simply the ASM instance. ASM disk groups are equivalent in some ways to a logical volume, and if you have an ASM instance that's running with five disk groups, but three of them are mounted and two of them are not, then any database instance that depends on one of the disk groups that wasn't mounted could get an error if it tried to start up. The trouble was that, in previous releases, the granularity of resource management was restricted to the ASM instance as a whole and not to the individual disk groups that it managed. That's now been changed in Oracle 11g Release 2. Each disk group is a separate resource, and the dependency relationships that are defined in the OCR are down at the level of the disk groups upon which a database instance depends. When the Clusterware wants to start a database instance on a particular node, it needs to ensure the ASM instance is up on that node and that the specific disk groups needed by the database instance are mounted. So the granularity of resource dependency is another change in the Clusterware that I quite like.

Another new feature that I like is called the Single Client Access Name (SCAN). The SCANs and SCAN listeners have made the architecture of the Oracle listeners very different. They provide a separation of responsibility for listeners, between those which handle connection requests and those which make the load balancing decisions. It presents the possibility that the two different listener types will end up diverging based on their differing responsibilities. Right now they're still running the same executable.

One of the new features for ASM that I would point to is the ability to support the voting files and the OCR in ASM. ASM now provides a shared storage solution for every conceivable type of shared storage needed on an Oracle cluster.

The Grid Infrastructure also supports the ability to manage third party non-Oracle applications such as Apache-based web applications. They may be monitored using the Oracle Clusterware and are provided with the same kind of high availability and fail-over as are provided for Oracle RAC database instances, listeners etc. These applications may also require shared storage on conventional file systems. If a web application runs on node 1 of a cluster and it accesses data, but then fails over to another node, it still requires access to the same data as it was using on the original node. ASM did not support that in previous releases.

Oracle database homes are stored in a flat file system too, but older versions of Oracle did not provide general purpose cluster file systems stored inside ASM. Grid Infrastructure permits this, allowing an Oracle database home to be stored inside ASM. This can help when single instance Oracle databases are supported under the Clusterware for cold failover, and the home directory
is only mounted on one node at
any one time. The file system is
stored inside ASM, and is also
defined as a cluster resource
on which the database instance
depends. This guarantees
that the file system is mounted
by the Clusterware before
attempting to use any files in
the home directory.
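A hedged illustration of that dependency wiring: the Clusterware resource profiles can be listed and inspected with crsctl, and 11g Release 2 represents each disk group as its own ora.<name>.dg resource. The database name orcl below is hypothetical:

    # List every resource the Clusterware manages, in tabular form
    crsctl stat res -t

    # Print a database resource's profile; the START_DEPENDENCIES
    # attribute names the disk groups and other resources it needs
    crsctl stat res ora.orcl.db -p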
ASM was extended in 11g
Release 2 to include the ASM
Cluster File System (ACFS).
The space for files in the file
system is allocated from ASM
disk groups rather than being
provided from device files
or LUNs. You can use ACFS for a shared Oracle home directory; you can use it for application files, or for DBA-oriented files such as
the directory specified in the
diagnostic_dest parameter.
ASM now provides a complete
shared storage solution so that
it’s possible to run the Grid
Infrastructure without using any
other shared storage solution.
There are some other very nice
features in ACFS as well. ACFS
was available in Oracle Release
11.2.0.1 for Linux and Windows
only. From Oracle Release
11.2.0.2, it’s also available
on Solaris. It supports copy-on-write (COW) snapshots, something that neither the traditional Linux ext3 file system nor the Windows file system had. Furthermore, in Oracle Release 11.2.0.2, ACFS has been extended to support replication as well, where an ACFS file system on one node is replicated automatically to another node. It has its
own logs and its own method
of transport to the remote
location providing a protection
mechanism in case of failure at
the original location.
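As a rough sketch of how an ACFS file system is put together on Linux, assuming an existing disk group called DATA and a pre-created mount point; the volume name, device name suffix and sizes are illustrative:

    # Create an ASM dynamic volume inside the DATA disk group
    asmcmd volcreate -G DATA -s 10G testvol

    # Make an ACFS file system on the volume device and mount it
    mkfs -t acfs /dev/asm/testvol-123
    mount -t acfs /dev/asm/testvol-123 /u01/app/acfs

    # Take a copy-on-write snapshot of the mounted file system
    acfsutil snap create snap1 /u01/app/acfs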
OS: Which Oracle University courses covering this kind of content would you recommend?
Joel: There is a 4 day
course on Grid Infrastructure
Administration that covers the
install of the Grid Infrastructure
for a cluster. It includes monitoring, administration of the Clusterware and administration of ASM, plus a bit of diagnostics and tuning. The course is
aimed at OS Administrators,
Storage Administrators and
Database Administrators. In
my experience, the attendees
have been primarily OS
Administrators with a few DBAs
attending too. That skill set is a prerequisite for the standalone, three day 11g Release 2 RAC Administration course.
Administering RAC databases assumes some knowledge of Grid Infrastructure. So if people already have RAC administration skills, then they only require Grid Infrastructure administration skills. They should attend the 4 day Grid Infrastructure course only and not bother with the 3 day RAC admin course, because 11g Release 2 RAC isn't very different from 11g Release 1.
There are one or two changes
involving services that really
make a difference. Someone
with minimal or no RAC skills
would probably need to attend
TECHNOLOGY
both courses.
Oracle has another option,
but it’s challenging. There’s
a course called the Grid
Infrastructure and RAC
Administration accelerated
course which is a five day
event packing in seven days
of content. The first day tends
to run till 6:30 or 7 o’clock at
night. The second day is also long, and in the first two days we probably cover nearly three days' worth of work by having an extra four or five hours. I also assign a bit of
homework in the middle of the
week. The ‘boot camp’ is a
popular choice. In the Americas
where travel distances are
considerable, they don't run the four and three day courses separately very often, if at
all; they tend to run only the
accelerated version.
Joel Goodman is Oracle’s
Senior Principal Technical
Team Leader.
For more information about
Oracle University contact:
T: 0845 777 7 711
T:+44 11 89 726 500
E:[email protected]
Application Server & Middleware
by Simon Haslam

March saw an innovation for
the UKOUG Application Server
and Middleware SIG: a new
“dual-venue” format. Working
to a single agenda we had speakers at each location, with
each presentation being relayed
live to the other site.
The venues
were
Oracle’s
City Office
in London
and Fujitsu’s
office in
Warrington
(thanks to
both for
providing such excellent facilities)
and our theme for the day was
“Oracle Fusion Middleware 11g
Upgrade” – a subject close to many
people’s hearts in light of the end of
10g Premier Support later this year.
The plan was to transmit the slides
via a live webcast and broadcast
the speech over a conference
phone line (not wanting to put all
eggs in one internet basket!). We
also distributed the slide decks
to both sites in advance so our
fallback would have been “next slide
please,” or perhaps “beeeep” like
the audio-visual presentations in
French lessons (for those educated
in the 1980’s!).
On the day, the technology worked
almost perfectly, with overhead
loudspeakers in Oracle’s office
meaning that even people at the
back of the room could hear
clearly. Pushing the technology we
attempted a networking session,
though without roving microphones
this wasn’t quite so successful – I’ve
a few ideas up my sleeve for the
next time!
One of the reasons many people
had attended the SIG was to hear
John Stegeman’s Forms 11g Upgrade
presentation. Unfortunately, family circumstances the previous day meant that John wasn't able to
appear in person but, thanks to
the technology, he still delivered
his presentation, much to the
appreciation of the delegates.
The post-event feedback was generally positive. Only one delegate (in Warrington) felt the new format was "much worse" than a normal, single-venue SIG; 23% felt it was "slightly worse"; and the rest thought it was the same or better. In addition, nearly 60% said that the dual SIG format was "slightly better" or "much better" than webcasts delivered to people's desktops. Notably, feedback from London (where 5 of the 8 sessions were delivered from) was generally more positive than Warrington, suggesting that we need to improve the experience at the secondary site.

A few other comments from people there on the day: "all the sessions were geared perfectly around the issues we have", "we wouldn't get 1:1 or 1:few discussions at coffee/lunch and subsequent swapping of email addresses with people if done as a webinar", "the SIG format still has the networking advantage", and "I was surprised and pleased at how well this worked".

Thank you to all of those who took the time to complete the survey.

Given that we made the SIG much easier for some members to reach, and increased our attendance numbers by more than 25 per cent (which I suspect could be higher for other locations), we considered this a successful experiment. By the time you read this, we will have had our second dual-venue SIG (Birmingham with London as a secondary, and Virtualisation as its theme). Hopefully we will have been able to fine-tune the format and the event will have had a bumper turnout!

This is just one of the ways the SIG committees are trying to make the events more accessible and valuable to members. If you have other suggestions please send them to [email protected]
UKOUG
AUTHOR
PROFILE
Simon Haslam
is in his 4th year
as Chair of the UKOUG Application
Server and Middleware SIG, and is a
regular speaker at Oracle events. He’s
one of only two Oracle ACE Directors
(Middleware & SOA) in the UK and
an enthusiast of all things to do with
Oracle infrastructure. His time is split
between middleware consultancy and
troubleshooting, so he can be usually
found in his native habitat of ‘the
customer meeting room’ (scrawling
on a whiteboard) or else frantically
tapping at a keyboard! Simon’s Fusion
Middleware blog can be found via:
http://simonhaslam.co.uk/