Turning Object Storage into Virtual Machine Storage

White Papers
Open vStorage is the world’s fastest Distributed Block Store that spans multiple datacenters.
It combines ultra-high performance and low-latency connections with unmatched data integrity.
Data is distributed across datacenters using both Replication and Erasure Coding.
Joining Performance and Integrity is not a simple bolt-on solution and requires a
from-the-ground-up approach. Disk Failures, Node Failures and even Datacenter Failures do not
cause data loss and hence do not threaten your Data Integrity.
You have been led to believe that in order to have 100% Data Loss Protection you have to
compromise on Performance. While this might sound logical and acceptable, it is time to step
out of the box and demand a no-compromise Storage Platform. With Open vStorage you can have
your cake and eat it too!
Object Storage is today the standard for building scale-out storage. But due to technical
hurdles it is impossible to run Virtual Machines directly from an Object Store. Open
vStorage is the layer between the hypervisor and the Object Store and turns the Object
Store into a high-performance, distributed, VM-centric storage platform.
Antwerpse Steenweg 19, 9080 Lochristi
Belgium
Phone: +32 9 324 25 74
Mail: Info@openvstorage.com
Introduction
Object Storage became mainstream over the last year. Amazon S3 started the Object Storage
momentum, but today other players such as Scality, Ceph, OpenStack (Swift) and many other on-site
Object Storage Solutions are taking over. The adoption of these scale-out, cost-effective storage
systems for general file storage can no longer be stopped. Using Object Storage as primary storage
for Virtual Machines, on the other hand, has not taken off due to many technical hurdles. With Open
vStorage these hurdles are lifted and any Object Store can be turned into high-performance,
VM-centric Virtual Machine storage.
The Rise of Object Storage
Object Storage, a storage architecture that stores data as objects identified by a unique key, is fast
becoming the standard way to store data. IDC[1] estimates that the market for File- and Object-Based
Storage will experience an annual growth rate of 27% through 2017, reaching $21.7 billion. This
estimate might even be modest considering the amount of funding Object Storage companies have
received over the last few years[2].
[1] http://amplidata.com/wp-content/uploads/2013/11/Amplidata-IDC-MarketScape-2013.pdf
[2] http://blog.oxygencloud.com/2013/09/16/after-10-years-object-storage-investment-continues-and-begins-tobear-significant-fruit/
The benefits of Object Storage are immense:
• It allows Service Providers to build scale-out storage solutions that offer the flexibility to
  scale as you grow by adding more disks and standard x86 servers to the storage repository.
• Reliability is offered by duplicating data across multiple hosts or by even more advanced erasure
  coding algorithms. This makes it virtually impossible to lose data.
• Ease of management by taking away low-level administrative functions such as managing logical
  volumes and RAIDs.
• Standardized APIs, as almost all Object Storage Solutions offer support for the Amazon Simple
  Storage Service (S3) API while traditional storage solutions each have their own proprietary API.
  This standardized Object Storage API significantly reduces vendor lock-in and makes migration
  between different Object Storage Solutions easy.
• Cost-effective, as different storage tiers can easily be created by mixing fast storage with
  large-capacity slow storage.
Let’s have a look at the traditional way of setting up Virtual Machine environments. Virtual Machines
require block storage. But block-level storage such as a SAN is hard to manage, hard to scale and
expensive. What is needed is a technology whereby Virtual Machines can use Object Stores instead of a
SAN and get the benefits of the low cost and scale-out capabilities of Object Stores. However, there are
a number of challenges in doing this, which are described below. Open vStorage is a "Grid Storage
Router" that connects to the hypervisor on one side and to an Object Store on the other to create a
high-performance, ultra-reliable, VM-centric and scale-out storage system.
The Object Storage Challenges
Eventual Consistency
Object Storage solutions are designed to scale out by simply adding more x86 servers, or nodes, to the
Object Store. All these nodes work together to form a distributed, highly available storage repository.
Due to this distributed nature, Object Storage is subject to Brewer’s CAP Theorem[3].
This theorem states that it is impossible for a distributed system to simultaneously provide Consistency
(all nodes see the same data at the same time), Availability (a guarantee that every request receives a
response about whether it was successful or failed) and Partition Tolerance (the system continues to
operate despite failure of part of the system). Object Stores can offer two but never all three, so a
trade-off has to be made.
For Object Storage the trade-off is eventual consistency. Eventual consistency means that if data
objects are stored and receive no new updates, eventually all nodes with access to these data
objects will return the last updated value. Eventual consistency was introduced so that Object Stores
can offer acceptable performance. Introducing eventual consistency has a big impact on the correctness
of data. If you retrieve data from an Object Store, you are never sure that you actually received the
latest data. By introducing eventual consistency we have allowed possible ‘data corruption’ in order to
have acceptable performance.
[3] http://ksat.me/a-plain-english-introduction-to-cap-theorem/
But rest assured, this doesn’t mean that your data on the Object Store is possibly corrupt. It means that
applications accessing data on the Object Store need to be aware that data might be outdated and need
to detect this. Upon detection, the application can retrieve the data again, and in many cases the
subsequent call will return the correct data.
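The detect-and-retry behaviour described above can be sketched as follows. This is a minimal illustration, not any Object Store's actual API: the `fetch` callable, the version number attached to each reply and the `read_latest` helper are all assumptions made for the example.

```python
import time

def read_latest(fetch, expected_version, retries=5, delay=0.0):
    """Re-read until the reply is at least as new as expected_version.

    `fetch` is a hypothetical callable returning (version, data); a real
    application would compare an ETag, timestamp or similar marker.
    """
    for _ in range(retries):
        version, data = fetch()
        if version >= expected_version:
            return data          # fresh enough, hand it to the caller
        time.sleep(delay)        # outdated reply detected, try again
    raise TimeoutError("object store still returned outdated data")

# Simulated eventually consistent store: two stale replies, then the fresh one.
replies = iter([(1, b"old"), (1, b"old"), (2, b"new")])
result = read_latest(lambda: next(replies), expected_version=2)
```

The application has to know which version it expects, which is exactly the awareness the text asks for.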
Latency and Performance
Virtual Machines, and especially IOPS-devouring applications, require low storage latency and
high-performance storage. Each Virtual Machine requires almost immediate access to the
underlying storage for its disks. As latency and IOPS issues became a flood tide in larger Virtual
Machine environments, faster and more expensive hardware was developed to bring the latency down. SAS
disks, Fibre Channel, InfiniBand and All-Flash Arrays were introduced to offer the necessary bandwidth
and an acceptable latency.
Object Storage is developed and optimized to contain a massive amount of data. To maximize the
storage capacity per node in the Object Storage Cluster, large SATA disks are selected as
these provide the best price per GB. With these large disks, you can’t achieve the IOPS and
storage performance needed by Virtual Machines. This fact isn’t jaw-dropping, as for years SANs have
been fitted with fast but small SAS drives. One could of course choose not to maximize the amount of
storage per node and select smaller, more expensive SSDs, but this makes the price per GB of stored
data skyrocket.
Having expensive, fast disks also does not remove the additional latency introduced by having the
hypervisor connect over the local LAN to the Object Store. Fetching data across the network will never
be faster than fetching it locally, even with InfiniBand or 40 GbE technologies. With converged and
hyperconverged infrastructure, the trend towards bringing storage closer to the application layer has
irreversibly started.
Different Management Paradigms
Object Stores understand objects, while hypervisors understand Virtual Machines. What is needed is a
software layer that plugs into the hypervisor such that the system administrator doesn't need to
understand LUNs, RAID groups, etc. but can just manage Virtual Machines. This software layer has to
translate a VM paradigm into an Object Store paradigm.
Why a (distributed) file system does not work for Virtual Machines
Virtual Machines need block-level storage, a block of storage they can control like a hard drive. File
systems have over time been adjusted to emulate block storage behavior. For example, copy-on-write
file systems, where every write requires a read and 2 write actions, were developed to support
block-level snapshots. It is clear that when multiple Virtual Machines are writing at the same time,
these write actions, which are very expensive I/O actions, become a limitation for the performance.
Virtualized environments demand the same file system to be available on all Hosts in the virtualization
cluster. This requires a distributed file system or dedicated, expensive hardware like SANs. These
distributed file systems are not designed for Virtual Machines: they need to balance Consistency,
Availability and Partition Tolerance, which means their performance is fundamentally limited, and hence
they are not suited for virtualized environments.
To conclude, none of the file systems today have been designed to link Virtual Machines and Object
Storage. For example, copy-on-write file systems struggle with eventual consistency, as for every write
they first need a read to safeguard the latest data, and with eventual consistency you never know for
sure that you have the latest data.
Turning Object Storage into Virtual Machine Storage
To turn Object Storage into primary storage for hypervisors, the solution must be especially designed for
Virtual Machines and take a radically different approach compared to existing storage technologies. Open
vStorage takes this different approach and is designed from the ground up with Virtual Machines and
their performance requirements in mind. It uses a well-considered architecture, which allows Object
Storage to be turned into block storage for Virtual Machines and avoids pitfalls such as those seen with
distributed file systems linked to Object Storage.
Open vStorage creates a unified namespace for the Virtual Machines across multiple Hosts. But in that
namespace not all data gets treated the same way. The actual data of the Virtual Machine, the bits
which make up the volume, is separated from all other files. Each created volume will be stored as a
separate block device in a different bucket on the Object Store.
Every new write on the Virtual Machine volume will result in a new 4k block that will be added to a
Storage Container Object (SCO). Once a SCO is full, typically when it contains 4MB, it is pushed at a
slower pace to the back-end, an Object Store like OpenStack Swift. As this second layer of storage is also
a Time Based storage implementation, eventual consistency is no longer an issue. Let’s say a Storage
Container Object is pushed to the Object Store and later retrieved. Because data is always
appended and never overwritten, the Object Store can, under the eventual consistency rule, give only 2
answers: the actual data or no answer at all. Under no circumstances will the hypervisor receive
outdated, incorrect data.
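The packing of 4k writes into append-only SCOs can be sketched as below. This is a simplified illustration under stated assumptions, not the Open vStorage implementation: the `ScoWriter` class, the `sco_00000000`-style object names and the dictionary standing in for the bucket are all hypothetical.

```python
BLOCK_SIZE = 4 * 1024            # 4k block, as in the text
SCO_SIZE = 4 * 1024 * 1024       # a SCO is full at 4MB, as in the text

class ScoWriter:
    """Appends 4k blocks to the current SCO and pushes full SCOs (hypothetical)."""

    def __init__(self, backend):
        self.backend = backend   # dict standing in for the volume's bucket
        self.sco_number = 0
        self.current = bytearray()

    def write_block(self, block):
        if len(block) != BLOCK_SIZE:
            raise ValueError("expected a 4k block")
        location = (self.sco_number, len(self.current))  # (SCO number, offset)
        self.current += block                            # always append
        if len(self.current) == SCO_SIZE:
            self._push()
        return location

    def _push(self):
        name = f"sco_{self.sco_number:08d}"              # assumed naming scheme
        if name in self.backend:
            raise RuntimeError("a SCO name is written once and never reused")
        self.backend[name] = bytes(self.current)
        self.sco_number += 1
        self.current = bytearray()

def read_sco(backend, name):
    """Returns the SCO, or None if the store cannot see it yet."""
    return backend.get(name)

bucket = {}
writer = ScoWriter(bucket)
locations = [writer.write_block(bytes(BLOCK_SIZE)) for _ in range(1025)]
```

Because each object name is written exactly once, a read has only two possible outcomes: the complete SCO or nothing. There is no older version of the same object that could be returned instead.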
Another big difference compared with traditional distributed file systems is that a volume of a Virtual
Machine is only available on one Host and not on all Hosts. Each Virtual Machine with the Open
vStorage software has its own NFS server and exports a different file system instance. Nevertheless,
each Host is tricked into believing that it accesses a single unified namespace shared across all these
Virtual Machines running the Open vStorage software.
The non-volume files are treated completely differently. Depending on their size and role, they are stored
in a distributed database or a Virtual File Server. For example, VMware Virtual Machine configuration files
(vmx files) need to be available on all Hosts, so they are stored in the distributed database. By having
the mission-critical files stored in a distributed database, Open vStorage supports VMware vMotion, as
the storage presented to each Host looks like shared storage. ISO files, on the other hand, are
not mission-critical and are routed to a Virtual File Server stored on the Object Store. In case the Virtual
File Server is down, it can easily be restarted on another Host.
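The routing of files by size and role could look roughly like the sketch below. Only the vmx and ISO destinations come from the text; the `route` function, the `-flat.vmdk` check for volume data and the 1 MiB size threshold are assumptions made purely for illustration.

```python
from pathlib import PurePosixPath

SMALL_FILE_LIMIT = 1024 * 1024   # hypothetical threshold, not from the source

def route(path, size_bytes):
    """Picks a storage destination for a file in the unified namespace."""
    name = PurePosixPath(path).name.lower()
    if name.endswith("-flat.vmdk"):      # assumed marker for volume data
        return "volume_bucket"           # own bucket on the Object Store
    if name.endswith(".vmx"):            # must be visible on all Hosts
        return "distributed_database"
    if name.endswith(".iso"):            # not mission-critical
        return "virtual_file_server"
    # Remaining files: decide on size (assumed rule).
    if size_bytes <= SMALL_FILE_LIMIT:
        return "distributed_database"
    return "virtual_file_server"

destinations = {
    "vm1/vm1.vmx": route("vm1/vm1.vmx", 3_000),
    "vm1/vm1-flat.vmdk": route("vm1/vm1-flat.vmdk", 40 * 1024**3),
    "iso/install.iso": route("iso/install.iso", 700 * 1024**2),
}
```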
Open vStorage Features
Open vStorage, as the only solution in the market, turns Object Storage into usable, high-performance
storage for Virtual Machines with the following features:
Scale-out
Open vStorage offers scalability both in performance and storage. Adding more Virtual Machines
running the Open vStorage software will linearly scale the performance. This guarantees that storage
performance will never be a bottleneck in the virtualized environment.
Open vStorage allows adding multiple Object Stores to a single virtualized environment. Start with one
Object Store and, when available storage space becomes an issue, make a decision: buy new hardware to
enlarge the existing storage pool or invest in a new Object Store. Mixing and matching Object Stores
as primary storage for Virtual Machines is something only Open vStorage offers.
VM-Centric
The flexibility of Open vStorage doesn’t only appear in the possibility to mix different Object Stores;
Open vStorage also allows you to carefully design your storage tiers. On one and the same Object Storage
Solution you could for example have a test tier and a highly redundant tier. The highly redundant tier
could for example make 3 copies of the data while the test tier saves the data only once. Both these
storage tiers can be made available in Open vStorage as primary storage for Virtual Machines.
Splitting up the Virtual Machine volumes into separately manageable buckets and objects turns Open
vStorage into a VM-centric storage platform, which allows for storage actions like snapshotting, cloning
or replication at the Virtual Machine level. Gone are the days of selecting a single retention policy across
all Virtual Machines on the LUN. With Open vStorage, administrators can easily select only the most
important Virtual Machines for replication[4]. On top of that, Open vStorage supports thin provisioning,
as only data that has been written to the Virtual Machine disk will be stored.
Having VM-centric functionality lowers the management overhead, as for example bulk provisioning of
hundreds of Virtual Machines comes out of the box. These Virtual Machines are provisioned nearly
instantly, as only metadata needs to be copied for each Virtual Machine. A snapshot is merely a
reference to the correct metadata. Taking snapshots thus imposes no overhead, as no data needs to be
copied.
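The idea that snapshots and clones touch only metadata can be illustrated with a small sketch. The `Volume` class and its block map (block address to SCO number and offset) are hypothetical names chosen for this example; the stored block data itself is never copied.

```python
class Volume:
    """Toy volume whose state is only a block map (hypothetical sketch)."""

    def __init__(self, block_map=None):
        # block address -> (SCO number, offset); the data stays in the SCOs
        self.block_map = dict(block_map or {})
        self.snapshots = {}

    def write(self, address, location):
        self.block_map[address] = location

    def snapshot(self, name):
        # A snapshot freezes the current metadata; no block data is copied.
        self.snapshots[name] = dict(self.block_map)

    def clone(self, snapshot_name):
        # A clone starts from the snapshot's metadata and diverges on write.
        return Volume(self.snapshots[snapshot_name])

template = Volume()
template.write(0, (0, 0))
template.write(1, (0, 4096))
template.snapshot("golden")
template.write(1, (1, 0))            # later write lands in a new SCO

clones = [template.clone("golden") for _ in range(100)]   # bulk provisioning
clones[0].write(1, (2, 0))           # only this clone sees the change
```

Provisioning a hundred clones here costs a hundred small dictionary copies, regardless of how much data the template volume holds.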
[4] Planned for Q1 2015
High Performing
To eliminate the typical VM I/O blender effect, circumvent the eventual consistency issue of Object
Storage and boost storage performance, Open vStorage uses a write cache, which works as a transaction
log, on fast Flash or SSD in the Host. This write cache consists of Storage Container Objects (SCOs),
which are sequentially filled with each new 4k block that is written by a Virtual Machine. This basically
turns any random write I/O behavior into a sequential write operation. The transaction log immediately
confirms the write to the hypervisor for fast response times. During each write, the address of the 4k
block, the hash, the SCO number and the offset are stored as metadata. Open vStorage uses a Paxos
distributed database to provide redundancy and immediate access to the metadata in case the volume is
moved or failed over to another Host. To provide redundancy all writes are mirrored to a Fail-Over Cache
(FOC) on a second Host. The size of this Fail-Over Cache can be very small (a couple of MB per volume)
because there is only a need to protect data that is not yet stored on the Object Store.
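The write path described above can be sketched as follows. This is a simplified model under stated assumptions: the `WritePath` class, the use of SHA-1 as the hash and the tiny SCO size of four blocks (instead of 4MB) are choices made for the example, and plain Python containers stand in for the SSD write cache, the Fail-Over Cache and the Object Store.

```python
import hashlib
from dataclasses import dataclass

BLOCK_SIZE = 4096

@dataclass(frozen=True)
class BlockMeta:
    address: int        # block address written by the Virtual Machine
    digest: str         # hash of the block content (SHA-1 is an assumption)
    sco_number: int
    offset: int

class WritePath:
    def __init__(self, blocks_per_sco=4):
        self.blocks_per_sco = blocks_per_sco
        self.metadata = {}          # address -> BlockMeta
        self.sco_number = 0
        self.open_sco = []          # blocks in the local write cache
        self.failover_cache = []    # mirror kept on a second Host
        self.backend = {}           # stands in for the Object Store

    def write(self, address, block):
        offset = len(self.open_sco) * BLOCK_SIZE   # next sequential slot
        self.open_sco.append(block)
        self.failover_cache.append((address, block))
        digest = hashlib.sha1(block).hexdigest()
        self.metadata[address] = BlockMeta(address, digest, self.sco_number, offset)
        if len(self.open_sco) == self.blocks_per_sco:
            self._push()
        return "ack"                # confirmed before the back-end sees it

    def _push(self):
        self.backend[self.sco_number] = b"".join(self.open_sco)
        self.open_sco = []
        self.failover_cache.clear() # data is now safe on the Object Store
        self.sco_number += 1

wp = WritePath(blocks_per_sco=4)
for address in (900, 3, 512, 77, 41):       # random addresses from the VM
    wp.write(address, bytes([address % 256]) * BLOCK_SIZE)
```

The random addresses 900, 3, 512 and 77 land at consecutive offsets inside the first SCO, and after the push the Fail-Over Cache only holds the one block that has not yet reached the back-end.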
To improve the read performance, Open vStorage uses a deduplicated read cache across all volumes
hosted on the same hypervisor. When a read request is made, Open vStorage looks up the hash in the
metadata and, if the block exists in the cache, serves the data directly from SSD or flash storage,
resulting in very fast read operations. When thin clones are made, for example when multiple Virtual
Machines are cloned from a master template, the same 4k blocks will have the same hash and will be stored
only once in the read cache (dedupe), while the hypervisor will see them all as individual and independent
volumes.
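A content-addressed read cache of this kind can be sketched in a few lines. The `DedupReadCache` class, the SHA-1 hash and the `fetch` callback standing in for a back-end read are assumptions for illustration only.

```python
import hashlib

class DedupReadCache:
    """Read cache keyed by block hash, shared by all volumes on a Host."""

    def __init__(self):
        self.blocks = {}    # digest -> block content, stored once
        self.hits = 0
        self.misses = 0

    def read(self, digest, fetch):
        if digest in self.blocks:
            self.hits += 1
            return self.blocks[digest]
        self.misses += 1
        block = fetch()                 # slow path: go to the back-end
        self.blocks[digest] = block
        return block

template_block = b"\x42" * 4096
digest = hashlib.sha1(template_block).hexdigest()

# Two thin clones: separate metadata, identical hash for block address 0.
clone_a = {0: digest}
clone_b = {0: digest}

cache = DedupReadCache()
first = cache.read(clone_a[0], lambda: template_block)   # miss, fills cache
second = cache.read(clone_b[0], lambda: template_block)  # hit, same entry
```

Both clones read the same cached entry, yet each keeps its own metadata, which is why the hypervisor still sees independent volumes.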
Conclusion
In the past year on-site Object Stores (Ceph, OpenStack Swift, …) have left their niche status and are
fast becoming the de facto standard for scale-out, redundant storage. Running Virtual Machines on this
type of storage does not come out of the box due to issues such as eventual consistency, high latency
and limited bandwidth. Open vStorage is the solution to run Virtual Machines on top of these Object
Stores. By using an architecture with caching on fast Flash or SSD drives close to the hypervisor,
transaction logs and the isolation of Virtual Machine volumes from other Virtual Machine files, Open
vStorage turns an Object Store into a high-performance, distributed, VM-centric storage platform which
lowers the management overhead and offers features such as zero-copy snapshots, thin provisioning, bulk
provisioning and quick restores.