Effective Planning and Use of IBM Tivoli Storage Manager V6 Deduplication

Date: 08/17/2012
Version: 1.0

Authors:
Jason Basler
Dan Wolfe
Document Location
This is a snapshot of an on-line document. Paper copies are valid only on the day they are printed. The
document is stored at the following location:
https://www.ibm.com/developerworks/wikis/display/tivolistoragemanager/deduplication
Revision History

| Revision Number | Revision Date | Summary of Changes  |
|-----------------|---------------|---------------------|
| 1.0             | 08/17/12      | Initial publication |
Disclaimer
The information contained in this document is distributed on an "as is" basis without any warranty either
expressed or implied.
This document has been made available as part of IBM developerWorks WIKI, and is hereby governed by the
terms of use of the WIKI as defined at the following location:
http://www.ibm.com/developerworks/tivoli/community/disclaimer.html
Contents
Document Location
Revision History
Disclaimer
Contents
1 Introduction
1.1 Overview
1.1.1 Description of deduplication technology
1.1.2 Data reduction and data deduplication
1.1.3 Server-side and client-side deduplication
1.1.4 Pre-requisites for configuring TSM deduplication
1.1.5 Choosing between TSM deduplication and appliance deduplication
1.2 Conditions for effective use of TSM deduplication
1.2.1 Traditional TSM architectures compared with deduplication architectures
1.2.2 Examples of appropriate use of TSM deduplication
1.2.3 Data characteristics for effective deduplication
1.3 When is it not appropriate to use TSM deduplication?
1.3.1 Primary storage of backup data is on VTL or physical tape
1.3.2 No flexibility with the backup processing window
1.3.3 Restore performance considerations
2 Resource requirements for TSM deduplication
2.1 Database and log size requirements
2.1.1 TSM database capacity estimation
2.1.2 TSM database log size estimation
2.2 Estimating capacity for deduplicated storage pools
2.2.1 Estimating storage pool capacity requirements
2.3 Hardware recommendations and requirements
2.3.2 Hardware requirements for TSM client deduplication
3 Implementation guidelines
3.1 Deciding between client and server deduplication
3.2 TSM Deduplication configuration recommendations
3.2.1 Recommendations for deduplicated storage pools
3.2.2 Recommended options for deduplication
3.2.3 Best practices for ordering backup ingestion and data maintenance tasks
4 Estimating deduplication savings
4.1 Factors that influence the effectiveness of deduplication
4.1.1 Characteristics of the data
4.1.2 Impacts from backup strategy decisions
4.2 Interaction of compression and deduplication
4.2.1 How deduplication and compression interact with TSM
4.2.2 Considerations related to compression when choosing between client-side and server-side deduplication
4.3 Understanding the TSM deduplication tiering implementation
4.3.1 Controls for deduplication tiering
4.3.2 The impact of tiering to deduplication storage reduction
4.3.3 Client controls that optimize deduplication efficiency
4.4 What kinds of savings can I expect for different application types
4.4.1 IBM DB2
4.4.2 Microsoft SQL
4.4.3 Oracle
4.4.4 VMware
5 How to determine deduplication results
5.1 Simple TSM Server Queries
5.1.1 QUERY STGPOOL
5.1.2 Other server queries affected by deduplication
5.2 TSM client reports
5.3 TSM deduplication report script
1 Introduction
Data deduplication is a technology that removes redundant data to reduce the storage capacity requirement
for retaining the data. When deduplication technology is applied to data protection it can provide a highly
effective means for reducing overall cost of a data protection solution. Tivoli Storage Manager introduced
deduplication technology beginning with TSM V6.1. This document describes the benefits of deduplication
and provides guidance on how to make effective use of the TSM deduplication feature as part of a well-designed data protection solution.
Following are key points regarding TSM deduplication:

• TSM deduplication is an effective tool for reducing the overall cost of a backup solution.
• Additional resources (database capacity, CPU, and memory) must be configured for a TSM server that uses deduplication. However, when properly configured, the resulting reduction in storage pool capacity provides a significant cost benefit.
• Cost reduction is the result of data reduction. Deduplication is just one of several methods that TSM provides for data reduction (such as progressive incremental backup). The goal is overall data reduction when all of the techniques are combined, rather than the deduplication ratio alone.
• TSM deduplication is an appropriate data reduction method for many situations, but some environments may benefit more from other technologies such as appliance/hardware deduplication.
This document is intended to provide guidance specific to the use of TSM deduplication. The document does
not provide comprehensive instruction and guidance for the administration of TSM, and should be used in
addition to the TSM product documentation.
1.1 Overview
1.1.1 Description of deduplication technology
Deduplication technology uses a computational technique to detect patterns within data that appear multiple
times within the scope of a collection of data. For the purposes of this document, the collection of data
consists of TSM backup, archive, and HSM data (all of these types of data will be referred to as “backup
data” throughout this document). The patterns that are detected are represented as a hash value that is
much smaller than the original pattern, specifically 20 bytes. Except for the original instance, subsequent instances of the pattern are referenced by the hash value. As a result, for a pattern that appears many times throughout a given collection of data, a significant reduction in storage can be achieved.
Unlike compression, deduplication can take advantage of a pattern that occurs multiple times within a
collection of data. With compression, a single instance of a pattern is represented by a smaller amount of
data that is used to algorithmically recreate the original data pattern. Compression cannot take advantage of
data redundancy for patterns that reoccur throughout the collection of data, and this significantly reduces the
potential reduction capability. However, compression can be combined with deduplication to take advantage
of both techniques and further reduce the required amount of data storage beyond what would be required by
using just one technique or the other.
1.1.1.1 How does TSM perform deduplication?
TSM uses a proprietary algorithm to analyze variable-sized, contiguous segments of data, called “chunks”, for patterns that are likely to be duplicated within the same TSM storage pool. This process is explained in more detail in a later section of this document.
The TSM implementation of deduplication applies only to FILE device class (sequential-access disk) storage pools, and can be used with primary, copy, or active-data pools.
1.1.2 Data reduction and data deduplication
Data deduplication creates substantial opportunity for reduction of storage capacity requirements for backup
data. However, it is important to consider deduplication within the context of other data reduction techniques
that are available. When considering the effectiveness of deduplication, the deduplication ratio, or percentage of reduction, is often treated as the ultimate measure of effectiveness. However, it is more important to consider the overall effectiveness of data reduction, including deduplication and the other available techniques, rather than focus exclusively on deduplication effectiveness.
Unlike other backup products, TSM provides a substantial advantage in data reduction through its
incremental-forever technology. Combined with deduplication, compression, exclusion of specified objects,
and appropriate retention policies, TSM provides highly effective data reduction. Therefore, the business
objectives should be clearly defined and understood when considering how to measure data reduction
effectiveness. If reduction of storage and infrastructure costs is the ultimate goal, the focus will be on overall
data reduction effectiveness, with data deduplication effectiveness as one component. The following table
provides a summary of the data reduction technologies that TSM offers:
|                                | Client compression | Incremental forever | Subfile backup | Deduplication |
|--------------------------------|--------------------|---------------------|----------------|---------------|
| How data reduction is achieved | Client compresses files | Client only sends changed files | Client only sends changed regions of a file | Eliminates redundant data chunks |
| Conserves network bandwidth? | Yes | Yes | Yes | When client-side deduplication is used |
| Data supported | Backup, archive, HSM, API | Backup | Backup (Windows only) | Backup, archive, HSM, API (HSM supported only for server-side deduplication) |
| Scope of data reduction | Redundant data within the same file on a client node | Files that do not change between backups | Unchanged regions within previously backed up files | Redundant data from any data in the storage pool |
| Avoids storing identical files renamed, copied, or relocated on the client node? | No | No | No | Yes |
| Removes redundant data for files from different client nodes? | No | No | No | Yes |
| Can be used with any type of storage pool configuration? | Yes | Yes | Yes | No |
1.1.3 Server-side and client-side deduplication
TSM provides two options for performing deduplication: client-side and server-side deduplication. Both methods use the same algorithm to identify redundant data; however, the “when” and “where” of the deduplication processing differ.
1.1.3.1 Server-side deduplication
With server-side deduplication, all of the processing of redundant data occurs on the TSM server, after the
data has been backed up. Server-side deduplication is also called “target-side” deduplication. The key
characteristics of server-side deduplication are:

• Duplicate data is identified after backup data has been transferred to the storage pool volume.
• The duplicate identification processing must run regularly on the server, and will consume TSM server CPU and TSM database resources.
• Storage pool data reduction is not realized until data from the deduplicated storage pool is moved to another storage pool volume, usually through a reclamation process; this movement can also occur during a TSM “MOVE DATA” process.
1.1.3.2 Client-side deduplication
Client-side deduplication processes the redundant data during the backup process on the host system where the source data is located. The net results of deduplication are virtually the same as with server-side deduplication, except that the storage savings are realized immediately, since only the unique data needs to be sent to the server in its entirety. Duplicate data requires only a small signature to be sent to the TSM server. Client-side deduplication is especially effective when it is important to conserve bandwidth between the TSM client and server.
1.1.3.2.1 Client deduplication cache
Although it is necessary for the backup client to “check in” with the server to determine whether a chunk is unique or a duplicate, the amount of data transfer is small. The client must query the server for each chunk of data that is processed. The overhead associated with this query process can be reduced substantially by configuring a cache on the client, which allows chunks previously discovered during the backup session to be identified without a query to the TSM server. For the backup-archive client (including VMware backup), it is recommended to always configure a cache when using client-side deduplication. For applications that use the TSM API, the deduplication cache should not be used, due to the potential for backup failures caused by the cache being out of sync with the TSM server. If multiple concurrent TSM client sessions are configured (such as with a TSM for VMware vStorage backup server), a separate cache must be configured for each session.
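As an illustrative sketch, the following backup-archive client options (in dsm.opt on Windows, or the dsm.sys stanza on UNIX) show one way to enable client-side deduplication with a local cache. The cache path and size shown are hypothetical example values, not recommendations from this document:

DEDUPLICATION      YES
ENABLEDEDUPCACHE   YES
DEDUPCACHEPATH     /tsm/dedupcache
DEDUPCACHESIZE     256

As noted above, omit the cache options (or set ENABLEDEDUPCACHE NO) for applications that back up through the TSM API.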
1.1.4 Pre-requisites for configuring TSM deduplication
This section provides a general description of the pre-requisites for using TSM deduplication. For a complete list of pre-requisites, refer to the TSM administrator documentation.

1.1.4.1 Pre-requisites common to client and server-side deduplication
• The destination storage pool must be of type “FILE” (sequential disk).

1.1.4.2 Pre-requisites specific to client-side deduplication
When configuring client-side TSM deduplication, the following requirements must be met (a sample configuration follows this list):

• The client and server must be at version 6.2.0 or later. The latest maintenance version should always be used.
• The client must have the client-side deduplication option enabled (DEDUPLICATION YES).
• The server must enable the node for client-side deduplication with the DEDUP=CLIENTORSERVER parameter using either the REGISTER NODE or UPDATE NODE command.
• The target storage pool must be a deduplication-enabled storage pool.
• Files must be bound to a management class whose destination is a deduplication-enabled storage pool.
• Files must not be excluded from client-side deduplication processing (by default, all files are included). See the “exclude.dedup” client option for details.
• Files must be larger than 2 KB, and transactions must be below the value that is specified by the clientdeduptxnlimit option.
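As a minimal sketch of these prerequisites in practice (the node name NODE1 is hypothetical), the node is enabled from the server's administrative command line and the option is set in the client options file:

Server:
> update node NODE1 dedup=clientorserver

Client options file:
DEDUPLICATION YES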
The following TSM features are incompatible with TSM client-side deduplication:

• Client encryption
• LAN-free/storage agent
• UNIX HSM client
• Subfile backup
• Simultaneous storage pool write
1.1.5 Choosing between TSM deduplication and appliance deduplication
Deduplication of backup data can also be accomplished by using a deduplicating storage device in the TSM
storage pool hierarchy. Virtual Tape Libraries (VTLs) such as IBM’s ProtecTIER and EMC’s Data Domain
provide deduplication capability at the storage device level. NAS devices are also available that provide NFS
or CIFS mounted storage that removes redundant data through deduplication.
A choice should be made between TSM deduplication and storage appliance deduplication. Although it is
possible to use both deduplication techniques together, it would result in inefficient use of resources. For a
deduplicating VTL, the TSM storage pool data would need to be “rehydrated” before moving to the VTL (as
with any tape device), and there would be no data reduction as a result of the TSM deduplication. For a
deduplicating NAS device, a FILE device type could be created on the NAS. However, since the data is
already deduplicated by TSM there would be little to no additional data reduction possible by the NAS device.
1.1.5.1 Factors to consider when deciding between TSM and appliance deduplication
There are three major factors to consider when deciding which deduplication technology to use:

• Scale
• Scope
• Cost
1.1.5.1.1 Scale
The software implementation of TSM deduplication makes heavy use of TSM database transactions and also
has an impact on daily server processes such as reclamation and storage pool backup. For a specific TSM
server hardware configuration (for example, TSM database disk speed, processor and memory capability,
and storage pool device speeds), there is a practical limit to the amount of data that can be backed up using
deduplication. Deduplication appliances have dedicated resources for deduplication processing and do not
have a direct impact on TSM server performance and scalability. Therefore, if the scale of data to back up
exceeds the recommended maximum of 300TB of source data, then appliance deduplication should be
considered. “Source data” refers to the original non-deduplicated backup data and all retained versions.
In addition to the scale of data stored, the scale of the daily amount of data backed up will also have a
practical limit with TSM, currently 3-4TB of backup data per day (per TSM instance). Although more data can
be backed up, post-processing such as reclamation (for server-side deduplication) and other operations such
as storage pool backup to tape will be a limiting factor. Deduplicating appliances have far greater throughput capability because of their dedicated deduplication resources, and are limited only by their rated throughput capacity.
1.1.5.1.2 Scope
The scope of TSM deduplication is limited to a single TSM server instance, and more precisely to a single TSM storage pool. A single, shared deduplication appliance can provide deduplication across multiple TSM
servers.
1.1.5.1.3 Cost
TSM deduplication functionality is embedded in the product without an additional software license cost. It is
important to consider that hardware resources must be appropriately sized and configured. Additional
expense should be anticipated when planning a TSM server configuration that will be used with
deduplication. However, these additional costs can easily be offset by the savings in disk storage. Also, the
software license costs are reduced when capacity-based pricing is in effect.
Deduplication appliances are priced for the performance and capability that they provide, and generally are
considered more expensive per GB than the hardware requirements for TSM native deduplication. A detailed
cost comparison should be done to determine the most cost-effective solution.
1.2 Conditions for effective use of TSM deduplication
Although TSM deduplication provides a cost-effective and convenient method for reducing the amount of disk
storage required for backups, there are specific conditions that can provide the most benefit when using TSM
deduplication. Conversely, there are conditions where TSM deduplication will not be effective and in fact may
reduce the efficiency of a backup operation.
Conditions that lead to effective use of TSM deduplication include the following:

• A need to reduce the disk space required for backup storage.
• A need to perform remote backups over limited bandwidth connections.
• Use of TSM node replication for disaster recovery across geographically dispersed locations.
• The total amount of backup data and the data backed up per day are within the recommended limits of less than 300TB total and 3-4TB per day.
• Either a disk-to-disk backup should be configured (where the final destination of backup data is on a deduplicating disk storage pool), or data should reside in the FILE storage pool for a significant time (e.g., 30 days), or until expiration. The deduplication storage pools should not be used as a temporary staging pool before moving data to tape or another non-deduplicating storage pool, since this can be highly inefficient.
• Backup data should be a good candidate for data reduction through deduplication. This topic is covered in greater detail in later sections.
• High-performance disk must be used for the TSM database to provide acceptable TSM deduplication performance.
1.2.1 Traditional TSM architectures compared with deduplication
architectures
A traditional TSM architecture ingests data into disk storage pools, and moves this data to tape on a frequent
basis to maintain adequate free space on disk for continued ingestion. An architecture that includes
deduplication changes this model to store the primary copy of data in a sequential file storage pool for its
entire life cycle. Deduplication provides enough storage savings to make keeping the primary copy on disk
an affordable possibility.
Tape storage pools still have a place in this architecture for maintaining a secondary storage pool backup
copy for disaster recovery purposes. Other architectures are possible where data remains in deduplicated
storage pools for only a portion of its life cycle, but this requires reconstructing the deduplicated objects and
can defeat the purpose of spending the processing resources that are required to deduplicate the data.
Tip: Avoid architectures where data is moved from a deduplicated storage pool to a non-deduplicated
storage pool, which will force the deduplicated data to be reconstructed and lose the storage savings that
were previously gained.
1.2.2 Examples of appropriate use of TSM deduplication
This section contains examples of TSM architectures that can make the most effective use of TSM
deduplication.
1.2.2.1 Deduplication with a secondary storage pool backup architecture
In this example the primary storage pool is a file-sequential disk storage pool configured for TSM
deduplication. The deduplication storage pool is backed up to a tape library copy storage pool.
1.2.2.2 Deduplication with node replication copy
The TSM 6.3 release provides a node replication capability, which allows for an alternative architecture where
deduplicated data is replicated to a second server in an incremental fashion that takes advantage of
deduplication, and avoids reconstructing the data.
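As an illustrative sketch (the server and node names are hypothetical), setting up and running replication for a node on a TSM 6.3 server might look like the following:

> set replserver DRSERVER
> update node NODE1 replstate=enabled
> replicate node NODE1

The REPLICATE NODE command would typically be run on a schedule as part of the daily maintenance sequence described later in this document.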
1.2.2.3 Disk-to-disk backup
“Disk-to-disk backup” refers to the scenario where the preferred backup storage device is disk-based, as
opposed to tape or a virtual tape library (VTL). Disk-based backup has become more popular as the unit cost
of disk storage has fallen. It has also become more common as companies distinguish between backup
data, which is kept for a relatively short amount of time, and archive data, which has long-term retention.
Disk-to-disk backup still requires a backup of the storage pool data, and the backup or copy destination may
be tape or disk. However, with disk-to-disk backup, the primary storage pool data remains on disk until it
expires. A significant reduction of disk storage can be achieved if the primary storage pool is configured for
deduplication.
1.2.3 Data characteristics for effective deduplication
When considering the use of TSM deduplication, you should assess whether the characteristics of the
backup data are appropriate for deduplication. A more detailed description of data characteristics for
deduplication is provided in the section on estimating deduplication efficiency. General types of structured
and unstructured data are good candidates for deduplication, but if your backup data consists mostly of
unique binary images or encrypted data, you may wish to exclude these data types from a management class
that uses a deduplicated storage pool.
1.3 When is it not appropriate to use TSM deduplication?
TSM deduplication can provide significant benefits and cost savings, but it does not apply to all situations.
The following situations are not appropriate for using TSM deduplication:
1.3.1 Primary storage of backup data is on VTL or physical tape
Movement to tape requires “rehydration” of the deduplicated data. This takes extra time and requires
processing resources. If regular migration to tape is required, the benefits of using TSM deduplication may
be reduced, since the goal is to reduce disk storage as the primary location of the backup data.
1.3.2 No flexibility with the backup processing window
TSM deduplication processing requires additional resources, which can extend backup windows or server
processing times for daily backup activities. For example, a duplicate identification process must run for
server-side deduplication. Additional reclamation activity is required to remove the duplicate data from a
storage pool after the duplicate identification processing completes. For client-side deduplication, the client
backup speed will generally be reduced for local clients (remote clients may not be impacted if there is a
bandwidth constraint).
If the backup window has already reached the limit for service level agreements, TSM deduplication could
possibly impact the backup window further unless careful planning is done.
1.3.3 Restore performance considerations
Restore performance from deduplicated storage pools is slower than from a comparable disk storage pool
that does not use deduplication. However, restore from a deduplicated storage pool can compare favorably to
restore from tape devices for certain workloads.
If fastest restore performance from disk is a high priority, then restore performance benchmarking should be
done to determine whether the effects of deduplication can be accommodated. The following table compares
the restore performance of small and large object workloads across several storage scenarios.
| Storage pool type     | Small object workload | Large object workload |
|-----------------------|-----------------------|-----------------------|
| Tape                  | Typically slower due to tape mounts and seeks | Typically faster due to streaming capabilities of modern tape drives |
| Non-deduplicated disk | Typically faster due to absence of tape mounts and quick seek times | Comparable to or slightly slower than tape |
| Deduplicated disk     | Faster than tape, slower than non-deduplicated disk | Slowest, since data must be rehydrated; by comparison, tape is fast at streaming large objects that are not spread across many tapes |
2 Resource requirements for TSM deduplication
TSM deduplication provides significant benefits as a result of its data reduction technology, particularly when
combined with other data reduction techniques available with TSM. However, the use of deduplication in
TSM adds requirements for hardware and database/log storage that must be met for a
successful implementation. When configuring TSM to use deduplication, you must ensure that proper
resources have been allocated to support the use of the technology. The resources include hardware
requirements necessary to meet the additional processing performed during deduplication, additional storage
requirements for handling the TSM database records used to store the deduplication catalog, and additional
storage requirements for the TSM server database logs.
The TSM internal database plays a central role in enabling the deduplication technology. Deduplication
requires additional database capacity to be available. In addition, there is a significant increase in the
frequency of references to records in the database during many TSM operations including backup, restore,
duplicate identification, and reclamation. These demands on the database require that the database disk
storage be capable of sustaining higher rates of I/O operations than would be required without the use of
deduplication.
As a result, planning for resources used by the TSM database is critical for a successful deduplication
deployment. This section guides you through the estimation of resource requirements to support TSM
deduplication.
2.1 Database and log size requirements
2.1.1 TSM database capacity estimation
Use of TSM deduplication significantly increases the capacity requirements of the TSM database. This
section provides some guidelines for estimating the capacity requirements of the database. It is important to
plan ahead for the database capacity so an adequate amount of higher-performing disk can be reserved for
the database (refer to the next section for performance requirements).
The estimation guidelines are approximate, since actual requirements will depend on many factors including
ones that cannot be predicted ahead of time (for example, a change in the data backup rate, the exact
amount of backup data, and other factors).
2.1.1.1 Planning database space requirements
The use of deduplication in TSM requires more storage space in the TSM server database than without the
use of deduplication. One important point to note is that when using deduplication, the TSM database grows
proportionally to the amount of data that is stored in deduplicated storage pools. This is because each
“chunk” of data that is stored in a deduplicated storage pool is referenced by an entry in the database.
Without deduplication, each backed-up object (typically a file) is referenced by a database entry, and the
database grows proportionally to the number of objects that are stored. With deduplication, the database
grows proportionally to the total amount of data backed up.
The document Determining the impact of deduplication on TSM server database and storage pools provides
detailed information for estimating the amount of disk storage that will be required for your TSM database.
The document provides formulas for estimating database size based on the volume of data to be stored.
As a simplified rule-of-thumb for taking a rough estimate, you can plan for 150GB of database storage for
every 10TB of data that will be protected in deduplicated storage pools.
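For example, applying this rule of thumb to the 64TB of retained source data used in the storage pool sizing example later in this document gives a rough database estimate of 64/10 × 150GB ≈ 960GB. This is only a planning approximation; use the detailed formulas in the referenced document for an actual sizing.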
2.1.2 TSM database log size estimation
The use of deduplication adds additional requirements for the TSM server database, active log, and archive
log storage. Properly sizing the storage capacity for these components is essential for a successful
implementation of deduplication.
2.1.2.1 Planning active log space requirements
The database active log stores information about database transactions that are in progress. With
deduplication, transactions can run longer, requiring more space to store the active transactions.
Tip: Use the maximum allowed size for the active log, which is 128GB.
2.1.2.2 Planning archive log space requirements
The archive log stores older log files for completed transactions until they are cleaned up as part of the TSM
server database backup processing. The file system holding the archive log must be given sufficient capacity
to avoid running out of space, which can cause the TSM server to halt. Space is freed in the archive log every time a full backup of the TSM server's database is performed.
See the document on Sizing the TSM archive log for detailed information on how to carefully calculate the
space requirements for the TSM server archive log.
Tip: A file system with 500GB of free space has proven to be more than adequate for a large-scale TSM
server that ingests several terabytes a day of new data into deduplicated storage pools and performs a full
TSM database backup once a day.
2.2 Estimating capacity for deduplicated storage pools
TSM deduplication ratios typically range from 2:1 (50% reduction) to 15:1 (93% reduction), and are data dependent. Lower ratios are associated with backups of unique data (such as progressive incremental data), and higher ratios are associated with backups that are repeated, such as repeated full backups of databases or virtual machine images. Mixtures of unique and repeated data will result in ratios within that range. If you are not sure what type of data you have and how well it will reduce, use 3:1 for planning purposes when comparing with non-deduplicated TSM storage pool occupancy. This ratio corresponds to an overall data reduction ratio of over 15:1 when factoring in the data reduction benefits of progressive incremental backups.
2.2.1 Estimating storage pool capacity requirements
2.2.1.1 Delayed release of storage pool data
Because there is latency in deleting data chunks that have multiple references, some “transient” storage is needed for chunks that must remain in a storage pool volume even though their associated file or object has been deleted or expired. As a result, storage pool capacity sizing must account for some percentage of data that is retained because of references by other objects. This latency also delays the deletion of a storage pool volume if it contains even a single chunk that is still being referenced.
2.2.1.2 Delayed effect of post-identification processing
Storage reduction does not always occur immediately with TSM deduplication. In the case of server-side
deduplication, sufficient storage pool capacity is required to ingest the full amount of daily backup data. With
server-side deduplication, removal of redundant data does not occur until after storage pool reclamation
completes, which in turn may not complete until after a storage pool backup is done. If client-side
deduplication is used, this delay will not apply. Sufficient storage pool free capacity must be maintained to
accommodate continued backup ingestion.
2.2.1.3 Estimating storage pool capacity requirements
You can roughly estimate storage pool capacity requirements for a deduplicated storage pool using the
following technique:

• Estimate the base size of the source data.
• Estimate the daily backup size, using an estimated change and growth rate.
• Determine retention requirements.
• Estimate the total amount of source data by factoring in the base size, daily backup size, and retention requirements.
• Apply the deduplication ratio factor.
• Uplift the estimate to account for transient storage pool usage.
The following example illustrates the estimation method:

| Parameter | Value | Notes |
|-----------|-------|-------|
| Base size of the source data | 40TB | Data from all clients that will be backed up to the deduplicated storage pool |
| Estimated daily change rate | 2% | Includes new and changed data |
| Retention requirement | 30 days | |
| Estimated deduplication ratio | 3:1 | 3:1 assumes compression is used with client-side deduplication |
| Uplift for “transient” storage pool volumes | 30% | |

Computed values:

| Parameter | Computation | Result |
|-----------|-------------|--------|
| Base source data | 40TB | 40TB |
| Estimated daily backup amount | 40TB × 0.02 change rate | 0.8TB |
| Total changed data retained | 30 × 0.8TB daily backup | 24TB |
| Total data retained | 40TB base data + 24TB retained | 64TB |
| Retained data after deduplication (3:1 ratio) | 64TB / 3 | 21.3TB |
| Uplift for delays in chunk deletion (30%) | 21.3TB × 1.3 | 27.69TB |
| Add full daily backup amount | 27.69TB + 0.8TB | 28.49TB |
| Round up: storage pool capacity requirement | | 29TB |
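The computation above can be condensed into a single rough planning formula (an approximation, not an exact sizing method):

capacity ≈ ((base + daily × retention_days) ÷ dedup_ratio) × transient_uplift + daily

With the example values: ((40TB + 0.8TB × 30) ÷ 3) × 1.3 + 0.8TB ≈ 28.5TB, rounded up to 29TB.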
2.3 Hardware recommendations and requirements
The use of deduplication requires additional processing, which increases the TSM server hardware
requirements beyond what is required without the use of deduplication. The most critical hardware
requirement when using deduplication is the I/O capability of the disk system that is used for the TSM
database.
You should begin by understanding the base hardware recommendations for the TSM server, which are
described in the following documents: AIX, HP-UX, Linux x86, Linux on Power, Linux on System z, Solaris, Windows.
Additional hardware recommendations are made in the TSM Version 6 deployment guide: TSM V6
Deployment Recommendations
2.3.1.1 Database I/O requirements
For optimal performance, fast disk storage is always recommended for the TSM database as measured in
terms of Input/Output Operations Per Second (IOPS). Due to the random access I/O patterns of the TSM
database, minimizing the latency of operations that access the database volumes is critical for optimizing the
performance of the TSM server. The large tables used for storing deduplication information in the TSM
database bring about an even more significant demand for disk storage that can handle a large number of
IOPS.
In general, systems based on solid-state disk technology and SAS/FC provide the best capabilities in terms
of increased IOPS. Because the claims of disk manufacturers are not always reliable, we recommend
measuring actual IOPS of a disk system before implementing a new TSM database.
Details about how to configure high performing disk storage are beyond the scope of this document. The
following key points should be considered when configuring disk storage for the TSM database:
• The disk used for the TSM database should be configured according to best practices for a transactional database.
• Low-latency, enterprise-class disk devices or storage subsystems should be used for the TSM database.
• Disk devices or storage systems capable of a minimum of approximately 3000 IOPS are suggested for the TSM database disk device. An additional 1000 IOPS per TB of daily ingested data (pre-deduplication) should be considered. Lower-performing disk devices can be used, but performance may not be optimal. Refer to the Deduplication FAQs for an example configuration.
• Disk I/O should be distributed over as many disk devices and controllers as possible.
• The TSM database and logs should be configured on separate disk volumes (LUNs), and should not share disk volumes with the TSM storage pool or any other application or file system.
2.3.1.2 CPU
The use of deduplication requires additional CPU resources on the TSM server, particularly for performing the task of duplicate identification. You should consider using a minimum of eight (2.2 GHz or equivalent) processor cores in any TSM server that is configured for deduplication.
2.3.1.3 Memory
For the highest performance of a large-scale TSM server using deduplication, additional memory is
recommended. The memory is used to optimize the frequent lookup of deduplication chunk information
stored in the TSM database.
A minimum of 64GB of system memory should be considered for TSM servers using deduplication. If the
retained capacity of backup data grows, the memory requirement may need to be as high as 128GB. It is
beneficial to monitor memory utilization on a regular basis to determine if additional memory is required.
2.3.2 Hardware requirements for TSM client deduplication
Client-side deduplication (and compression, if used with deduplication) requires resources on the client system for processing. Before deciding to use client-side deduplication, you should verify that client systems have adequate resources available during the backup window to perform the deduplication processing. A suggested minimum CPU requirement is the equivalent of one 2.2 GHz CPU core per backup process with client-side deduplication. As an example, a system with a single-socket, quad-core, 2.2 GHz processor that is utilized 75% or less during the backup window would be a good candidate for client-side deduplication.
3 Implementation guidelines
A successful implementation of TSM deduplication requires careful planning in the following areas:

• Implementing an appropriate architecture suitable for using deduplication
• Properly sizing your TSM server hardware and storage
• Configuring TSM following best practices for separating data ingestion and data maintenance tasks
3.1 Deciding between client and server deduplication
After you decide on an architecture using deduplication for your TSM server, you need to decide whether you
will perform deduplication on the TSM clients, the TSM server, or using a combination of the two. The TSM
deduplication implementation allows storage pools to manage deduplication performed by both clients and
the TSM server. The server is optimized to only perform deduplication on data that has not been
deduplicated by the TSM clients. Furthermore, duplicate data can be identified across objects regardless of
whether the deduplication is performed on the client or server. These benefits allow for hybrid configurations
that efficiently apply client-side deduplication to a subset of clients, and use server-side deduplication for the
remaining clients.
Typically a combination of both client-side and server-side data deduplication is the most appropriate. Here
are some further points to consider:
• Server-side deduplication is a two-step process of duplicate data identification followed by reclamation to remove the duplicate data. Client-side deduplication stores the data directly in a deduplicated format, reducing the need for the extra reclamation processing.
• Deduplication on the client can be combined with compression to provide the largest possible storage savings.
• Client-side deduplication processing can increase backup durations. Where network bandwidth is not restrictive, a doubling of backup durations is a reasonable estimate when using client-side deduplication.
• Client-side deduplication can place a significant load on the TSM server when a large number of clients are simultaneously driving deduplication processing. The load results from the TSM server processing duplicate chunk queries from the clients. Server-side deduplication, on the other hand, typically has a relatively small number of identification processes running in a controlled fashion.
• Client-side deduplication cannot be combined with LAN-free data movement using the Tivoli Storage Manager for SAN feature. If you are implementing one of TSM's supported LAN-free to disk solutions, you can still consider using server-side deduplication.
Tips:
Perform deduplication at the client in combination with compression in the following circumstances:
1. Your backup network speed is a bottleneck.
2. Increased backup durations can be tolerated, and the maximum storage savings is more important
than having the fastest possible backup elapsed times.
3. The client does not typically send objects larger than 500GB in size, or client configuration options
can be used to break up large objects into smaller objects. These options are discussed in a later
section.
3.2 TSM Deduplication configuration recommendations
3.2.1 Recommendations for deduplicated storage pools
The TSM deduplication feature is turned on at the storage pool level. The TSM server can be configured with
more than one deduplicated storage pool, but duplicate data will not be identified across different storage
pools. In most cases, using a single large deduplicated storage pool is recommended.
The following commands provide an example of setting up a deduplicated storage pool on the TSM server.
Some parameters are explained in further detail to give the rationale behind the values used, and later
sections build upon those settings.
3.2.1.1 Device class
A device class is used to define the storage that will be used for sequential file volumes by the deduplicated
storage pool. Each of the directories specified should be backed by a separate file system, which
corresponds to a distinct logical volume on the disk storage subsystem. By using multiple directories backed
by different storage elements on the subsystem, the TSM round-robin implementation for volume allocation is
able to achieve more throughput by spreading I/O across a large pool of physical disks.
Here are some considerations for parameters with the DEFINE DEVCLASS command:
• The mountlimit parameter limits the number of volumes that can be simultaneously mounted by all storage pools that use this device class. Typically, client sessions sending data to the server use the most mount points, so you will want to set this parameter high enough to handle the expected number of simultaneous client sessions. This parameter needs to be set very high for deduplicated storage pools to avoid having client sessions and server processes wait for available mount points. The setting is influenced by the numopenvolsallowed option, which is discussed in a later section. To estimate the setting of this option, use the following formula, where numprocs is the largest number of processes used for a data copy/movement task such as reclamation and migration (a worked example follows the command below):

  mountlimit = (numprocs * numopenvolsallowed) + max_backup_sessions +
               (restore_sessions * numopenvolsallowed) + buffer

• The maxcapacity parameter controls the size of each file volume that will be created for your storage pool. This parameter takes some planning. The goal is to avoid volume sizes that are too small, which result in frequent end-of-volume processing and the spanning of larger objects across multiple volumes, and also to avoid volume sizes that are too large, to ensure that enough writeable volumes are available to handle your expected number of client backup sessions. The following example shows a volume size of 100GB, which has proven to be optimal in many environments.
> define devclass dedupfile devtype=file mountlimit=150 maxcapacity=102400m
  directory=/tsmdedup1,/tsmdedup2,/tsmdedup3,/tsmdedup4,/tsmdedup5,/tsmdedup6,/tsmdedup7,/tsmdedup8
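As a worked example of the mountlimit formula (the workload figures here are hypothetical): with 8 reclamation/migration processes, numopenvolsallowed=10, up to 50 concurrent client backup sessions, one concurrent restore session, and a buffer of 10, mountlimit = (8 × 10) + 50 + (1 × 10) + 10 = 150, which is the value used in the define devclass command above.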
3.2.1.2 Storage pools
The storage pool is the repository for deduplicated storage and uses the device class previously defined. An
example command for defining a deduplicated storage pool is given below, with explanations for parameters
that vary from defaults. There are two methods for allocating volumes in a file-based storage pool. With the
first method, volumes are pre-allocated and remain assigned to the same storage pool after they are
reclaimed. The second method uses scratch volumes, which are allocated as needed, and return to the
scratch pool once they are reclaimed. The examples below set up a storage pool using scratch volumes, as this approach is more convenient and has been shown in testing to distribute the load more efficiently across multiple storage containers within a disk subsystem.
• The deduplicate parameter is required to enable deduplication for the storage pool.
• The maxscratch parameter defines the maximum number of volumes that can be created for the storage pool. This parameter applies when using the scratch method of volume allocation, and should otherwise be set to a value of 0 when using pre-allocated volumes. Each volume will have a size determined by the maxcapacity parameter for the device class. In our example, 100 volumes multiplied by 100GB per volume requires that 10TB of free space be available across the eight file systems used by the device class.
• The identifyprocess parameter is set to 0 to prevent duplicate identification processes from starting automatically. This supports scheduling when duplicate identification runs, which is described in more detail in a later section.
• The reclaim parameter is set to 100 to prevent automatic storage pool reclamation from running. This supports the best practice of scheduling when reclamation runs, which is described in more detail in a later section. The actual threshold used for reclamation is defined as part of the scheduled reclamation command, which is described in a later section.
• The reclaimprocess parameter is set to a value higher than the default of 1, since a deduplicated storage pool requires a large volume of reclamation processing to keep up with the daily ingestion of new backups. The suggested value of 8 is likely to be sufficient for large-scale implementations, but you may need to further increase this setting.
> define stgpool deduppool dedupfile maxscratch=100 deduplicate=yes
identifyprocess=0 reclaim=100 reclaimprocess=8
3.2.1.3 Policy settings
The final configuration step involves defining policy settings on the TSM server that allow data to be ingested directly into the deduplicated storage pool that has been created. Policy requirements vary for each customer, but the following example shows a policy that retains extra backup versions for 30 days.
> define domain DEDUPDISK
> define policy DEDUPDISK POLICY1
> define mgmtclass DEDUPDISK POLICY1 STANDARD
> assign defmgmtclass DEDUPDISK POLICY1 STANDARD
> define copygroup DEDUPDISK POLICY1 STANDARD type=backup destination=DEDUPPOOL
VEREXISTS=nolimit VERDELETED=10 RETEXTRA=30 RETONLY=80
> define copygroup DEDUPDISK POLICY1 STANDARD type=archive destination=DEDUPPOOL
RETVER=30
> activate policyset DEDUPDISK POLICY1
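To direct a client's backups to this policy (and thus to the deduplicated storage pool), register the node in the new domain. The node name and password here are placeholders, and enabling the node for client-side deduplication is optional, as discussed earlier:

> register node NODE1 password domain=DEDUPDISK dedup=clientorserver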
3.2.2 Recommended options for deduplication
The server has several tuning options that control deduplication processing. The following table summarizes
these options, and provides an explanation for those options for which we recommend overriding the default
values.
Option: DedupRequiresBackup
Allowed values: Yes | No (default: Yes)
Recommended value: Default. Set this option to No when there will be no secondary copy storage pool, or when node replication will be used for the secondary copy.
Explanation: This option delays the completion of server-side deduplication processing until after a secondary copy of the data has been made with storage pool backup. The TSM server offers many levels of protection, including the ability to create a secondary copy of your data. Creating a secondary copy is optional, but is always a best practice for any storage pool regardless of whether it is deduplicated.

Option: ClientDedupTxnLimit
Allowed values: Min: 32, Max: 1024 (default: 300)
Recommended value: Default
Explanation: Specifies the largest object size in gigabytes that can be processed using client-side deduplication. This can be increased up to 1TB, but this does not guarantee that the TSM server will be able to process objects up to this size in all environments.

Option: ServerDedupTxnLimit
Allowed values: Min: 32, Max: 2048 (default: 300)
Recommended value: Default
Explanation: Specifies the largest object size in gigabytes that can be processed using server-side deduplication. This can be increased up to 2TB, but this does not guarantee that the TSM server will be able to process objects up to this size in all environments.

Option: DedupTier2FileSize
Allowed values: Min: 20, Max: 9999 (default: 100)
Recommended value: Default
Explanation: Changing the default tier settings is not recommended. Small changes may be tolerated, but avoid frequent changes to these settings, as changes will prevent matches between previously ingested backups and future backups.

Option: DedupTier3FileSize
Allowed values: Min: 90, Max: 9999 (default: 400)
Recommended value: Default
Explanation: As with DedupTier2FileSize, changing the default tier settings is not recommended.
Option: NumOpenVolsAllowed
Allowed values: Min: 3, Max: 999 (default: 10)
Recommended value: 20
Explanation: This option controls the number of volumes that a process such as reclamation, or a client session, can hold open at the same time. A small increase to this option is recommended, and some trial and error may be needed. Note: the device class mount limit parameter may need to be increased if this option is increased.

Option: EnableNasDedup
Allowed values: Yes | No (default: No)
Recommended value: Default. If you are using NDMP backup of NetApp file servers in your environment, change this option to Yes.
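The options above are server options, set in the server options file (dsmserv.opt) and read at server start; some can also be changed at run time with the SETOPT command (consult the product documentation for which options SETOPT accepts). The following fragment is a sketch only, assuming an environment with no copy storage pool and with NDMP backups of NetApp file servers:

* dsmserv.opt fragment -- example values, adjust for your environment
* No secondary copy storage pool will be used:
DEDUPREQUIRESBACKUP NO
* Small increase over the default of 10, per the table above:
NUMOPENVOLSALLOWED 20
* NDMP backups of NetApp file servers are used:
ENABLENASDEDUP YES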
3.2.3 Best practices for ordering backup ingestion and data
maintenance tasks
A successful implementation of deduplication with TSM requires separating client data ingestion and server data maintenance into distinct time windows. Furthermore, the server data maintenance tasks have an optimal ordering, and in some cases must run without overlap to avoid resource contention problems.
TSM has the ability to schedule all of these activities to follow these best practices. The recommended ordering is explained below, along with sample commands to implement these tasks through scheduling.
Consider using the following task sequence. Note that the list focuses on tasks pertinent to deduplication; consult the product documentation for additional commands that you may also need to include in the daily maintenance tasks.
1. Client data ingestion.
2. Create the secondary disaster recovery (DR) copy using the BACKUP STGPOOL command
(optional).
3. The following tasks can run in parallel:
a. Perform server-side duplicate identification by running the IDENTIFY DUPLICATES
command. This processes data that was not already deduplicated on the clients.
b. Create a DR copy of the TSM database by running the BACKUP DATABASE command.
Following the completion of the database backup, the DELETE VOLHISTORY command can
be used to remove older versions of database backups which are no longer required.
4. Perform node replication to create a secondary copy of the ingested data to another TSM server
using the REPLICATE NODE command (optional).
5. Reclaim space that has been released from storage pool volumes through deduplication and
inventory expiration using the RECLAIM STGPOOL command.
6. Back up the volume history and device configuration using the BACKUP VOLHISTORY and BACKUP
DEVCONFIG commands.
7. Remove objects that have exceeded their allowed retention using the EXPIRE INVENTORY
command.
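While tuning this sequence, you can confirm that each task starts and completes within its window from the administrative command line. The following queries are a sketch; the exact output columns vary by server level:

> query process
> query event * type=administrative begindate=today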
3.2.3.1 Define scripts that run each required maintenance task
The following scripts, once defined, can be called by scheduled administrative commands. Here are a few
points to note regarding these scripts:
• The storage pool backup script assumes you have already defined a copy storage pool named copypool, which uses tape storage.
• The database backup script requires a device class that typically also uses tape storage.
• The script for reclamation gives an example of how the parallel command can be used to process more than one storage pool simultaneously.
• The number of processes to use for identifying duplicates should not exceed the number of CPU cores available on your TSM server. The IDENTIFY DUPLICATES command does not have a wait=yes parameter, so it is necessary to define a duration limit.
• If you have a large TSM database, you can further optimize the BACKUP DATABASE command by using multiple streams with TSM 6.3 and later.
• A deduplicated storage pool is typically reclaimed to a threshold lower than the default of 60 to allow more of the identified duplicate chunks to be removed. Some experimentation will be needed to find a value at which reclamation can complete within the available time. Tip: a reclamation threshold of 40 or less is usually sufficient.
define script STGBACKUP "/* Run stg pool backups */"
update script STGBACKUP "backup stgpool DEDUPPOOL copypool maxprocess=10
wait=yes" line=020
define script DEDUP "/* Run identify duplicate processes */"
update script DEDUP "identify duplicates DEDUPPOOL numprocess=6 duration=660"
line=010
set dbrecovery TAPEDEVC
define script DBBACKUP "/* Run DB backups */"
update script DBBACKUP "backup db devclass=TAPEDEVC type=full wait=yes" line=010
update script DBBACKUP "backup volhistory" line=020
update script DBBACKUP "backup devconfig" line=030
update script DBBACKUP "delete volhistory type=dbbackup todate=today-7
totime=now" line=040
define script RECLAIM "/* Run stg pool reclamation */"
update script RECLAIM "parallel" line=010
update script RECLAIM "reclaim stgpool DEDUPPOOL threshold=40 wait=yes" line=020
update script RECLAIM "reclaim stgpool COPYPOOL threshold=60 wait=yes" line=030
define script EXPIRE "/* Run expiration processes. */"
update script EXPIRE "expire inventory resources=8 wait=yes" line=010
3.2.3.2 Define schedules to run the data maintenance tasks
The TSM server can schedule commands to run, where the scheduled action is to run one of the scripts defined in the previous section. The examples below give specific start times that have proven successful in environments where backups run from midnight until 7:00 AM on the same day. You will need to change the start times to values appropriate for your environment.
define schedule STGBACKUP type=admin cmd="run STGBACKUP" active=yes \
desc="Run all stg pool backups." startdate=today starttime=08:00:00 \
duration=15 durunits=minutes period=1 perunits=day
define schedule DEDUP type=admin cmd="run DEDUP" active=no \
desc="Run indentify duplicates." startdate=today starttime=11:00:00 \
duration=15 durunits=minutes period=1 perunits=day
define schedule DBBACKUP type=admin cmd="run DBBACKUP" active=yes \
desc="Run database backup." startdate=today starttime=12:00:00 \
duration=15 durunits=minutes period=1 perunits=day
define schedule RECLAIM type=admin cmd="run RECLAIM" active=yes \
desc="Reclaim space from storage pools." startdate=today starttime=14:00 \
duration=15 durunits=minutes period=1 perunits=day
define schedule EXPIRATION type=admin cmd="run expire" active=yes \
desc="Run expiration." startdate=today starttime=18:00:00 \
duration=15 durunits=minutes period=1 perunits=day
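Note that the DEDUP schedule above is defined with active=no; activate it once server-side duplicate identification is needed and the rest of the sequence has been verified. A brief sketch:

> query schedule type=admin
> update schedule DEDUP type=admin active=yes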
4 Estimating deduplication savings
If you ask someone in the data deduplication business to give you an estimate of the amount of savings to
expect for your specific data, the answer will often be “it depends.” The reality is that TSM, like every other
data protection product, cannot guarantee a certain level of deduplication because there are a variety of
factors unique to your data that influence the results.
Since deduplication requires computational resources, it is important to consider which environments and
circumstances can benefit most from deduplication, and when other data reduction techniques may be more
appropriate. What we can do is provide an understanding of the factors that influence deduplication
effectiveness when using TSM, and provide some examples of observed behaviors for specific types of data,
which can be used as a reference for planning purposes.
4.1 Factors that influence the effectiveness of deduplication
The following are factors that have an influence on how effectively TSM reduces the amount of data to be
stored using deduplication.
4.1.1 Characteristics of the data
4.1.1.1 Uniqueness of the data
The first factor to consider is the uniqueness of the data. Much of the savings from deduplication comes from repeated backups of the same objects. Some savings, however, result from having data in common with backups of other objects, or even from duplicate regions within the same object. The uniqueness of the data is the portion of an object that has
never been stored by a previous backup. Duplicate data can be found within the same object, across
different objects stored by the same client, and from objects stored by different clients.
4.1.1.2 Response to fingerprinting
The next factor is how data responds to the deduplication fingerprinting processing used by TSM. During
deduplication, TSM breaks objects into chunks, which are examined to determine whether they have been
previously stored. These chunks are variable in size and are identified using a process called fingerprinting.
The purpose of fingerprinting is to ensure that the same chunk will always be identified regardless of whether
it shifts to different positions within the object between successive backups.
The TSM fingerprinting implementation uses a probability-based algorithm to identify chunk boundaries within an object. The algorithm strives to make the chunks created for an object average out to a target chunk size. The actual size of each chunk varies within the constraints that it must be larger than the minimum chunk size and cannot be larger than the object itself. The fingerprinting
implementation results in average chunk sizes that vary for different kinds of data. For data that fingerprints
to average chunk sizes significantly larger than the target average, the deduplication efficiency is more
sensitive to changes. More details are given in the later section that discusses tiering.
4.1.1.3 Volatility of the data
The final factor is the volatility of the data. A significant amount of deduplication savings is a result of the fact
that similar objects are backed up repeatedly over time. Objects that undergo only minor changes between
backups will end up having a significant percentage of chunks that are unchanged since the last backup and
hence do not need to be stored again. Conversely, an object can undergo a pattern of change that alters a large percentage of the chunks in the object. In these cases, very little savings is realized by deduplication.
It is important to note that this effect does not necessarily relate to the amount of data being written to an
object. Instead, it is a factor of how pervasively the changes are scattered throughout the object. Some
change patterns, such as appending new data at the end of an object, have a very favorable response with
deduplication.
4.1.1.4 Examples of workloads that respond well to deduplication
The following are general examples of backup workloads that respond well to deduplication:
• Backup of workstations with multiple copies or versions of the same file.
• Backup of objects with regions that repeat the same chunks of data (for example, regions with zeros).
• Multiple full backups of different versions of the same database.
• Operating system files across multiple systems. For example, Windows systemstate backup is a common source of duplicate data. Another example is virtual machine image backups with TSM for Virtual Environments.
• Backup of workstations with versions or copies of the same application data (for example, documents, presentations, or images).
• Periodic full backups taken of systems using a new nodename for the purpose of creating an out-of-cycle backup with special retention criteria.
4.1.1.5 Deduplication efficiency of some data types
The following table shows some common data types along with their expected deduplication efficiency.
Data type                                                       Deduplication efficiency
Audio (mp3, wma), Video (mp4), Images (jpeg)                    Poor
Human generated/consumer data: text documents, source code     Good
Office documents – spreadsheets, presentations                  Poor
Common operating system files                                   Good
Large repeated backups of databases (Oracle, DB2, etc.)         Good
Objects with embedded control structures                        Poor
TSM data stored in non-native storage pools                     None
(for example, NDMP data)
4.1.2 Impacts from backup strategy decisions
The gains realized from deduplication are also influenced by two different implementation choices in how
backups are taken and managed.
4.1.2.1 Backup model
For TSM, a very common backup model is the use of incremental-forever backups. In this case, each subsequent backup achieves significant storage savings by not having to send unchanged objects. Objects that are not re-sent also do not need to go through deduplication processing, which makes this a very efficient method of reducing data. Other data types use a backup model that always runs a full backup, or a periodic full backup. In these cases, there is typically a significant reduction in the data to be stored, resulting from the extensive duplication across subsequent backups of similar objects. The following table illustrates some examples of deduplication savings between full and incremental backup models:
Does deduplication offer savings in the following cases?

Case: Periodic full backups are taken for a system. This is occasionally performed using a different node name for the purpose of establishing a different retention scheme.
• Full backup: Yes, when file-level backups are taken using the backup-archive client, or when there is data in common from other nodes, such as operating system files.
• Incremental backup: Yes, for files that are re-sent due to changes (depends on volatility). No, for new files that are sent for the first time (depends on uniqueness).

Case: Database backups are taken using a data protection client.
• Full backup: Yes, when subsequent full backups are taken (depends on volatility). No, when the first backup is taken, because databases are typically unique.
• Incremental backup: Typically no. The database incremental mechanism sends only changed regions of the object, which typically have not been stored before.

Case: Virtual machine backups are taken using the Data Protection for VMware product.
• Yes. VMware full backups often experience savings from matches with the backups of other virtual machines, as well as from regions of the same virtual disk that are in common.
4.1.2.2 Retention settings
In general, the more versions you set TSM policy to retain, the more savings you will realize from TSM
deduplication as a percentage of the total you would have needed to store without deduplication. Users who
desire to retain more versions of objects in TSM storage find this to be more cost effective when using
deduplication. Consider the example below, which shows the accumulated storage used over a series of
backups using the Data Protection for Oracle product. You can see that ten backup versions are stored with
deduplication using less capacity than three backup versions require without deduplication.
[Chart: Cumulative Data Stored (120 GB Oracle backups) across backups b1–b10, comparing Dedup and No Dedup – 75% reduction with deduplication]
4.2 Interaction of compression and deduplication
The TSM client provides the ability to compress data with the potential to provide additional storage savings
by combining both compression and deduplication. With TSM deduplication, you will need to decide whether to perform deduplication at the client, the server, or some combination of the two. This section guides you through that decision, taking into consideration that combining deduplication and compression is only possible on the clients.
4.2.1 How deduplication and compression interact with TSM
In general, deduplication technologies are not very effective when applied to data that is previously
compressed. However, by compressing data after it is already deduplicated, additional savings can be
gained. When deduplication and compression are both performed by the TSM client, the operations are
sequenced in the desirable order of first applying deduplication, followed by compression. The following list
summarizes key points of the TSM implementation, which will help explain other information to follow in this
section:
• The TSM client can perform deduplication combined with compression.
• The TSM server can perform deduplication, but cannot perform compression.
• If data is compressed before it is passed to the TSM client, it is not possible to perform deduplication prior to compression. For example, certain databases can compress a backup stream before passing it to a Tivoli Data Protection client. In these cases, the data is compressed before TSM performs deduplication.
The most significant reduction in data size is typically a result of performing the combination of client-side
deduplication and compression. The additional savings provided by compression will vary depending on how
well the specific data responds to the TSM client compression mechanism.
4.2.2 Considerations related to compression when choosing between
client-side and server-side deduplication
Typically, the decision of whether to use data reduction technologies on the TSM client depends on your
backup window requirements, and whether your environment is network-constrained. With constrained
networks, using data reduction technologies on the client may actually improve backup elapsed times.
Without a constrained network, the use of client-side data reduction technologies will typically result in longer
backup elapsed times. The following questions are important to consider when choosing whether to
implement client-side data reduction technologies:
1. Is the speed of your backup network limiting backup elapsed times?
2. What is more important to your business: the amount of storage savings you achieve through data
reduction technologies, or how quickly backups complete?
If the answer to the first question is yes, using data reduction technologies on the client may result in both
faster backups and increased storage savings on the TSM server. More often, the answer to this question is
no, in which case you need to weigh the trade-offs between having the fastest possible backup elapsed
times, and gaining the maximum amount of storage pool savings.
[Chart: Client dedup with compression vs server dedup with compression (20GB object, 1% change rate): reduction savings across backups b1–b10 for CliDedup+comp, Dedup only, ServDedup+comp, and Comp only]
Totals after ten backups:

Technique                           Stored     Saved      %reduced   Elapsed Time (seconds)
No Dedup, no comp                   200 GB     --         --         402
Comp only                           105.8 GB   94.2 GB    47.1%      665 (1.7x)
Client compression + server dedup   83.5 GB    116.5 GB   58.3%      -
Dedup only                          50.0 GB    150.0 GB   75.0%      -
Client dedup + compression          27.5 GB    172.5 GB   86.2%      618 (1.5x)
The graph above shows a 20GB object going through a series of ten backups. For each of the ten backups,
the object in the same state was run through different data reduction mechanisms in TSM to allow comparing
the behavior of each. The table summarizes the cumulative totals stored and saved for each of the
techniques, along with elapsed times in some cases. The following observations can be made from these
results:
• The most significant storage savings of 86% is seen with the combination of client-side deduplication and compression. There is a cost of a 1.5 times increase in the backup elapsed time versus a backup with no client-side data reduction. The addition of compression provides the additional 11% savings beyond the 75% that is possible using deduplication alone.
• With compression alone, there is a savings of 47%. This is a fairly typical savings seen with TSM compression.
• With deduplication alone (either client-side or server-side), there is a savings of 75%. There was no savings for the first backup with deduplication alone, which is typical with unique objects such as databases. The first backup is one area in which compression provides substantial savings beyond what deduplication provides.
• Applying server-side deduplication to data that is already compressed by the client results in a savings of 58%, lower than the 75% that can be achieved using server-side deduplication alone. Caution: your application may compress data before it is passed to the TSM client, which will result in similarly less-efficient deduplication savings. In these cases, it is best to either disable the application compression or send this data to a storage pool that does not use deduplication.
The bottom line: for the fastest backups on a fast network, choose server-side deduplication. For the largest storage savings, choose client-side deduplication combined with compression. Avoid client compression in combination with server-side deduplication.
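For reference, the recommended combination is enabled through the backup-archive client options file (the dsm.sys stanza on UNIX and Linux, dsm.opt on Windows). This is a minimal sketch; it assumes the node was registered with deduplication=clientorserver, and the deduplication cache option is shown only as a common companion setting:

* client options file fragment (sketch)
deduplication yes
compression yes
enablededupcache yes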
4.3 Understanding the TSM deduplication tiering implementation
The deduplication implementation in TSM uses a tiered model where larger objects are processed with larger
average chunk sizes with the goal of limiting the number of chunks that an object will be split into. The tiering
model is used to avoid operational problems that arise when the TSM server needs to operate on objects
consisting of very large numbers of chunks, and also to limit the growth of the TSM database. The use of
larger average chunk sizes has the trade-off of limiting the amount of savings achieved by deduplication. The
TSM server provides three different tiers that are used for different ranges of object sizes.
4.3.1 Controls for deduplication tiering
There are two options on the TSM server that control the object size thresholds at which objects are
processed in tier2 or tier3. All objects with sizes smaller than the tier2 threshold are processed in tier1. By
default, objects under 100GB in size are processed at tier1. Objects in the range of 100GB to under 400GB
are processed in tier2, and all objects 400GB and larger are processed in tier3.
Avoid making adjustments to the options controlling the deduplication tier thresholds. Changes to the
thresholds after data has been stored can prevent newly stored data from matching data stored in previous
backups, and can also cause operational problems if the changes cause larger objects to be processed in the
lower tiers.
Very large objects can be excluded from deduplication using the options clientdeduptxnlimit and
serverdeduptxnlimit. The storage pool parameter maxsize can also be used to prevent large objects from
being stored in a deduplicated storage pool.
Option: DedupTier2FileSize
Allowed values (GB): Minimum: 20, Maximum: 9999, Default: 100
Implications of the default: Objects smaller than 100GB are processed in tier1. Objects from 100GB up to the tier3 setting are processed in tier2.

Option: DedupTier3FileSize
Allowed values (GB): Minimum: 90, Maximum: 9999, Default: 400
Implications of the default: Objects 400GB and larger are processed in tier3. Objects smaller than 400GB down to the tier2 threshold are processed in tier2; below the tier2 threshold they are processed in tier1.
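Rather than adjusting the tier thresholds, the exclusion controls described above can keep very large objects out of the deduplicated pool entirely. The following sketch uses example values; OVERFLOWPOOL is a hypothetical, previously defined non-deduplicated pool that receives objects exceeding maxsize:

* dsmserv.opt: objects larger than these sizes (in GB) are not
* eligible for client-side or server-side deduplication
CLIENTDEDUPTXNLIMIT 300
SERVERDEDUPTXNLIMIT 300

> update stgpool DEDUPPOOL maxsize=100G nextstgpool=OVERFLOWPOOL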
4.3.2 The impact of tiering on deduplication storage reduction
The chart below gives an example of the impact that tiering has on deduplication savings. For the test below,
the same DB2 database was processed through a series of ten sets of backups with a varying change
pattern applied after each set of backups. For each set of backups, the object in the same state was tested
using the three different deduplication tiers, each being stored in its own storage pool. The table below gives
the cumulative savings for each tier across the ten backups. The following observations can be made:
• Deduplication is always more effective at reducing data in the lower tiers.
• The amount of difference in data reduction between the tiers depends on how the objects change between backups. For data with low volatility, there is less impact to savings from tiering.
• As a general rule of thumb, you can estimate that there will be approximately a 17% loss of deduplication savings as you move through each tier.
[Chart: Dedup savings (120GB DB2) across backups b1–b10, comparing Tier 1, Tier 2, and Tier 3]
Totals after ten backups:

           Stored       Saved      %reduced
No Dedup   1,236.1 GB   --         --
Tier1      272.81 GB    963.3 GB   77.9%
Tier2      479.03 GB    757.1 GB   61.2%
Tier3      689.33 GB    546.8 GB   44.2%

Average chunk size after 10 backups: tier1 70K, tier2 300K, tier3 860K.
4.3.3 Client controls that optimize deduplication efficiency
Controls are available on some TSM client types that prevent objects from becoming too large, allowing large objects to be processed as multiple smaller objects that fall into the tier1 range. There is no method to accomplish this for every client type, but the following strategies have proven effective at keeping objects within the tier1 threshold:
• For Oracle database backups, use the RMAN MAXPIECESIZE option to prevent any individual object from crossing the tier2 size threshold. More recommendations on this topic follow in a later section.
• For Microsoft SQL database backups that use the legacy backup API, the database can be split across multiple streams. Each stream results in a separate object being stored on the TSM server. A 200GB database, for example, can be backed up with four streams, which results in approximately four 50GB objects that all fit within the default tier1 size threshold.
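As an illustration of the multiple-streams approach, Data Protection for SQL can stripe a backup from its command-line interface. The database name below is hypothetical, and the appropriate stripe count depends on the database size relative to the tier1 threshold:

tdpsqlc backup PRODDB full /stripes=4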
4.4 What kinds of savings can I expect for different application types?
No specific guarantee of TSM deduplication data reduction can be made for specific application types. It is
possible to construct an implementation of any of the applications discussed in this section with initial data
and apply changes to that data in such a way that any deduplication system would show poor results. What
we can do, and what is covered in this section, is to provide some examples of how specific implementations
of these applications that undergo reasonable patterns of change respond to TSM deduplication. This
information can be considered to be a likely outcome of using TSM deduplication in your environment. More
specific results for your environment can only be obtained by testing your real data with TSM over a period of
time.
In the sections that follow, sample deduplication savings are given for specific applications that result from
taking a series of backups with TSM. Each of these examples shows results from only using deduplication,
so improved results are possible by combining deduplication and compression. Comparisons across the
three different deduplication tiers are given except for applications where using the higher tiers can be
avoided. Client-side deduplication was used for all of the tests.
There are tables in the following sections that include elapsed times. These are given so that you can make
relative comparisons and should not be considered indicators of the performance you will see. There are
many factors that will influence actual backup elapsed times, including network performance.
4.4.1 IBM DB2
[Chart: Cumulative Data Stored (120 GB DB2 backups) across ten backups, comparing No Dedup, Tier 1, Tier 2, and Tier 3]
Totals after ten backups:

           Stored       Saved      %reduced   Elapsed Time (seconds)
No Dedup   1,236.1 GB   --         --         1446
Tier1      272.81 GB    963.3 GB   77.9%      3541 (2.4x)
Tier2      479.03 GB    757.1 GB   61.2%      2955 (2x)
Tier3      689.33 GB    546.8 GB   44.2%      2712 (1.9x)
4.4.2 Microsoft SQL
[Chart: Cumulative Data Stored (93 GB MS SQL backups) across backups b1–b10, comparing No Dedup, Tier 1, Tier 2, and Tier 3]
Totals after ten backups:

           Stored     Saved      %reduced   Elapsed time (seconds)
No Dedup   934.7 GB   --         --         1222
Tier1      199.4 GB   735.3 GB   78.7%      6132 (5x)
Tier2      356.8 GB   577.9 GB   61.8%      3286 (2.7x)
Tier3      463.0 GB   471.7 GB   50.5%      2944 (2.4x)
4.4.3 Oracle
Backups using the Data Protection for Oracle product can achieve similar deduplication storage savings with
the proper configuration. The test results summarized in the following charts only give values for tier 1. The
other tiers were not tested because the RMAN MAXPIECESIZE option can be used to prevent objects from
reaching sizes that require the higher tiers.
The following RMAN settings are recommended when performing deduplicated backups of Oracle databases
with TSM:
 Use the maxpiecesize RMAN parameter to keep the objects sent to TSM within the tier 1 size range. Oracle backups can be broken into multiple objects of a specified size. This allows databases of larger sizes to be processed safely with tier1 deduplication processing. The parameter must be set to a value that is less than the TSM server DedupTier2FileSize parameter (defaults to 100GB).
– Recommended value: A maxpiecesize setting of 10GB provides a good balance between keeping each piece at an optimal size for handling by the TSM server and avoiding too many resulting objects.
 Oracle RMAN provides the capability to multiplex the backups of database filesets across multiple channels. Using this feature will typically result in less effective TSM deduplication data reduction. Use the filesperset RMAN parameter to avoid splitting a fileset across multiple channels.
– Recommended value: A filesperset setting of 1 should be used for optimal deduplication data reduction.
Following is a sample RMAN script, which includes the recommended values for use with TSM
deduplication:
run
{
allocate channel ch1 type 'SBT_TAPE' maxopenfiles=1 maxpiecesize 10G
parms 'ENV=(TDPO_OPTFILE=/home/orc11/tdpo_10g.opt)';
backup filesperset 1 (tablespace tbsp_dd);
release channel ch1;
}
[Chart: Cumulative Data Stored (120 GB Oracle backups) across backups b1–b10, comparing Dedup and No Dedup – 75% reduction with deduplication]
Totals after ten backups:

           Stored      Saved      %reduced   Elapsed time (seconds)
No Dedup   1195.0 GB   --         --         2704
Tier1      304.6 GB    890.5 GB   74.5%      11135 (4.1x)
4.4.4 VMware
VMware backup using TSM for Virtual Environments is one area where TSM deduplication is commonly deployed. VMware backups typically show very substantial savings when the combination of client-side deduplication and compression is used. The following factors contribute to the substantial savings that are seen:
• There is often significant data in common across virtual machines, partly because the same operating systems are installed and cloned across multiple virtual machines.
• The TSM for Virtual Environments requirement to periodically repeat full backups results in significant duplication across backup versions.
• Some duplicate data exists within the same virtual machine on the initial backup.
• An example savings achieved with VMware backup using the combination of client-side deduplication and compression is 8 to 1 (87.5%).
5 How to determine deduplication results
It is useful to evaluate the actual data reduction results from TSM deduplication to determine if the expected
storage savings have been achieved. In addition to evaluating the data reduction results, other key
operational factors should be checked, such as database utilization, to ensure that they are consistent with
expectations.
Deduplication results can be determined by various queries to the TSM server from the administrative
command line or Administration Center interface. It is important to recognize the dynamic nature of
deduplication and that the benefits of deduplication are not always realized immediately after data is backed
up. Also, since the scope of deduplication includes multiple backups across multiple hosts, it will take time to
accumulate sufficient data in the TSM storage pool to be effective at eliminating duplicates. Therefore, it is
important to sample results at regular intervals, such as weekly, to obtain a valid report of the results.
In addition to checking data reduction results, TSM provides queries that can show pending activity for
deduplication processing. These queries can be issued to determine an overall assessment of deduplication
processing in the server. A script has been developed to assist administrators with monitoring of
deduplication-related processing. The script source is provided in the appendix of this document.
5.1 Simple TSM Server Queries
5.1.1 QUERY STGPOOL
The QUERY STGPOOL command provides a basic and quick method for evaluating deduplication results.
However, if the query is run prior to reclamation of the storage pool, the "Duplicate Data Not Stored" value will be inaccurate and will not reflect the most recent data reduction.
Example command:
Query stgpool format=detailed
Example output:
Estimated Capacity:                      9,848 G
Space Trigger Util:                      60.7
Pct Util:                                60.7
Pct Migr:                                60.7
Pct Logical:                             98.7
< ... >
Deduplicate Data?:                       Yes
Processes For Identifying Duplicates:    0
Duplicate Data Not Stored:               28,387 G (87%)
Auto-copy Mode:                          Client
Contains Data Deduplicated by Client?:   Yes
The displayed value of "Duplicate Data Not Stored" shows the actual reduction of data in megabytes or gigabytes, and the percentage reduction of the storage pool. If reclamation has not yet occurred, the pending amount of data that will be removed can also be displayed for the deduplicating storage pool (in this example, a pool named "backuppool-file").
5.1.2 Other server queries affected by deduplication
5.1.2.1 QUERY OCCUPANCY
When a filespace is backed up to a deduplicated storage pool, the QUERY OCCUPANCY command shows the logical amount of storage per filespace. The physical space is displayed as "0.00" because this value cannot be determined on an individual filespace basis.
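The following sketch shows the shape of the output using a hypothetical node and filespace with representative values; the exact column layout varies by server level:

> query occupancy NODE1

Node Name  Type  Filespace  FSID  Storage    Number    Physical   Logical
                 Name             Pool Name  of Files  Space      Space
                                                       Occupied   Occupied
                                                       (MB)       (MB)
---------  ----  ---------  ----  ---------  --------  ---------  ----------
NODE1      Bkup  /data         1  DEDUPPOOL    10,000      0.00   51,200.00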
Early versions of the TSM V6 server incorrectly maintained occupancy records in certain cases, which can
result in an incorrect report of the amount of stored data. The following technote provides information on how
to repair the occupancy information if necessary:
http://www.ibm.com/support/docview.wss?uid=swg21579500
5.2 TSM client reports
When using client-side deduplication, the client summary report will show the data reduction associated with
deduplication as well as compression. An example is shown here:
Total number of objects inspected:          380,194
Total number of objects backed up:              573
Total number of objects updated:                  0
Total number of objects rebound:                  0
Total number of objects deleted:                  0
Total number of objects expired:                 72
Total number of objects failed:                   0
Total objects deduplicated:                     324
Total number of bytes inspected:            1.19 TB
Total number of bytes processed:          132.24 MB
Total bytes before deduplication:           1.01 GB
Total bytes after deduplication:          131.95 MB
Total number of bytes transferred:        132.24 MB
Data transfer time:                       22.11 sec
Network data transfer rate:         6,122.55 KB/sec
Aggregate data transfer rate:         164.95 KB/sec
Objects compressed by:                           0%
Deduplication reduction:                     87.30%
Total data reduction ratio:                  99.99%
Elapsed processing time:                   00:13:40
5.3 TSM deduplication report script
A script has been developed to provide detailed information on deduplication results for a TSM server. In
addition to providing summary information on the effectiveness of TSM deduplication, it can also be used to
gather diagnostics if deduplication results are not consistent with expectations. The script and usage
instructions can be obtained from the TSM support site:
http://www.ibm.com/support/docview.wss?uid=swg21596944
The report provides summary data on the effectiveness of deduplication, as well as details of deduplication-related utilization of the TSM database.
< End of Document>