White Papers

Turning Object Storage into Virtual Machine Storage

Open vStorage is the world’s fastest distributed block store that spans multiple datacenters. It combines ultra-high performance and low-latency connections with unmatched data integrity. Data is distributed across datacenters using both replication and erasure coding. Joining performance and integrity is not a simple bolt-on solution; it requires a from-the-ground-up approach. Disk failures, node failures and even datacenter failures do not result in data loss and hence do not threaten your data integrity.

You have been led to believe that in order to have 100% data loss protection you have to compromise on performance. While this might sound logical and acceptable, it is time to step out of the box and demand a no-compromise storage platform. With Open vStorage you can have your cake and eat it too!

Object Storage is today the standard way to build scale-out storage. But due to technical hurdles it is impossible to run Virtual Machines directly from an Object Store. Open vStorage is the layer between the hypervisor and the Object Store and turns the Object Store into a high performance, distributed, VM-centric storage platform.

Antwerpse Steenweg 19, 9080 Lochristi, Belgium
Phone: +32 9 324 25 74
Mail: Info@openvstorage.com

Introduction

Object Storage became mainstream over the last year. Amazon S3 started the Object Storage momentum, but today other players such as Scality, Ceph, OpenStack (Swift) and many other on-site Object Storage solutions are taking over. The adoption of these scale-out, cost-effective storage systems for general file storage can no longer be stopped. Using Object Storage as primary storage for Virtual Machines, on the other hand, has not taken off due to many technical hurdles. With Open vStorage these hurdles are lifted and any Object Store can be turned into high performance, VM-centric Virtual Machine storage.
The Rise of Object Storage

Object Storage, a storage architecture that stores data as objects identified by a unique key, is fast becoming the standard way to store data. IDC [1] estimates that the market for file- and object-based storage will experience an annual growth rate of 27% through 2017, reaching $21.7 billion. This estimate might even be modest considering the amount of funding Object Storage companies have received over the last few years [2].

[1] http://amplidata.com/wp-content/uploads/2013/11/Amplidata-IDC-MarketScape-2013.pdf
[2] http://blog.oxygencloud.com/2013/09/16/after-10-years-object-storage-investment-continues-and-begins-tobear-significant-fruit/

The benefits of Object Storage are immense:

• Scale-out: Service Providers can build scale-out storage solutions that offer the flexibility to scale as you grow by adding more disks and standard x86 servers to the storage repository.
• Reliability: data is duplicated across multiple hosts or protected by even more advanced erasure coding algorithms. This makes it virtually impossible to lose data.
• Ease of management: low-level administrative functions such as managing logical volumes and RAID arrays are taken away.
• Standardized APIs: almost all Object Storage solutions support the Amazon Simple Storage Service (S3) API, while traditional storage solutions each have their own proprietary API. This standardized Object Storage API significantly reduces vendor lock-in and makes migration between different Object Storage solutions easy.
• Cost-effectiveness: different storage tiers can easily be created by mixing fast storage with large-capacity, slow storage.

Let’s have a look at the traditional way of setting up Virtual Machine environments. Virtual Machines require block storage. But block-level storage such as a SAN is hard to manage, hard to scale and expensive. What is needed is a technology whereby Virtual Machines can use Object Stores instead of a SAN and get the benefits of the low cost and scale-out capabilities of Object Stores. However, there are a number of challenges in doing this, which are described below.
Open vStorage is a “Grid Storage Router” that on one side connects to the hypervisor and on the other side to an Object Store to create a high performance, ultra-reliable, VM-centric and scale-out storage system.

The Object Storage Challenges

Eventual Consistency

Object Storage solutions are designed to scale out by simply adding more x86 servers (nodes) to the Object Store. All these nodes work together to form a distributed, highly available storage repository. Due to this distributed nature, Object Storage is subject to Brewer’s CAP Theorem [3].

This theorem states that it is impossible for a distributed system to simultaneously provide Consistency (all nodes see the same data at the same time), Availability (a guarantee that every request receives a response about whether it succeeded or failed) and Partition Tolerance (the system continues to operate despite failure of part of the system). Object Stores can offer two, but never all three, so a trade-off has to be made.

For Object Storage the trade-off is eventual consistency. Eventual consistency means that if data objects are stored and receive no new updates, all nodes with access to these objects will eventually return the last updated value. Eventual consistency has been introduced so that Object Stores can offer acceptable performance. It has a big impact on the correctness of data: when you retrieve data from an Object Store, you are never sure that you actually received the latest version. By introducing eventual consistency we have allowed possible ‘data corruption’ in order to have acceptable performance.

[3] http://ksat.me/a-plain-english-introduction-to-cap-theorem/

But rest assured, this doesn’t mean that your data on the Object Store is possibly corrupt. It means that applications accessing data on the Object Store need to be aware that data might be outdated and must detect it. Upon detection, the application can retrieve the data again, and in many cases the subsequent call will return the correct data.
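To make the detect-and-retry idea concrete, the following Python sketch simulates a toy eventually consistent store in which a write lands on one replica and reaches the others later. It assumes the application remembers the version it last wrote (the `min_version` argument) so that it can recognize a stale read; the class and function names are hypothetical and do not correspond to any particular Object Store API.

```python
import random


class EventuallyConsistentStore:
    """Toy object store (hypothetical): a write lands on one replica first."""

    def __init__(self, n_replicas=3, seed=0):
        # Each replica maps key -> (version, value).
        self.replicas = [{} for _ in range(n_replicas)]
        # Replication steps not yet applied: (replica_index, key, version, value).
        self.pending = []
        self.rng = random.Random(seed)

    def put(self, key, version, value):
        self.replicas[0][key] = (version, value)
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, version, value))

    def sync_step(self):
        # Apply one pending replication step (simulated background sync).
        if self.pending:
            i, key, version, value = self.pending.pop(0)
            self.replicas[i][key] = (version, value)

    def get(self, key):
        # Any replica may answer, so the result can be stale or missing.
        return self.rng.choice(self.replicas).get(key)


def read_at_least(store, key, min_version, max_attempts=10):
    """Retry until a replica returns at least `min_version` of the object."""
    for _ in range(max_attempts):
        result = store.get(key)
        if result is not None and result[0] >= min_version:
            return result[1]
        # Stale or missing: let replication catch up, then retry.
        store.sync_step()
    raise TimeoutError(f"{key!r} not yet at version {min_version}")
```

A real client would wait and back off instead of driving `sync_step()` itself; the point of the sketch is only that a stale answer is detected and retried rather than silently used.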
Latency and Performance

Virtual Machines, and especially IOPS-hungry applications, require low-latency, high performance storage. Each Virtual Machine needs almost immediate access to the underlying storage for its disks. As latency and IOPS issues grew in larger Virtual Machine environments, faster and more expensive hardware was developed to bring latency down. SAS disks, Fibre Channel, InfiniBand and All-Flash Arrays were introduced to offer the necessary bandwidth and acceptable latency.

Object Storage is developed and optimized to hold massive amounts of data. To maximize the storage capacity per node in the Object Storage cluster, large SATA disks are selected, as these offer the best price per GB. With these large disks you can’t achieve the IOPS and storage performance that Virtual Machines need. This isn't surprising: for years SANs have been fitted with fast but small SAS drives. One could of course give up on maximizing storage per node and select smaller, more expensive SSDs, but this makes the price per GB of stored data skyrocket.

Expensive, fast disks also do not remove the additional latency introduced by having the hypervisor connect over the local LAN to the Object Store. Fetching data across the network will never be faster than fetching it locally, even with InfiniBand or 40 GbE. With converged and hyperconverged infrastructure, the trend towards bringing storage closer to the application layer has become irreversible.

Different Management Paradigms

Object Stores understand objects, while hypervisors understand Virtual Machines. What is needed is a software layer that plugs into the hypervisor so that the system administrator doesn't need to understand LUNs, RAID groups, etc. but can simply manage Virtual Machines. This software layer has to translate a VM paradigm into an Object Store paradigm.
Why a (Distributed) File System Does Not Work for Virtual Machines

Virtual Machines need block-level storage, a block of storage they can control like a hard drive. File systems have over time been adjusted to emulate block storage behavior. For example, copy-on-write file systems, where every write requires a read and two write actions, were developed to support block-level snapshots. When multiple Virtual Machines write at the same time, these write actions, which are very expensive I/O operations, become a performance bottleneck.

Virtualized environments require the same file system to be available on all Hosts in the virtualization cluster. This requires a distributed file system or dedicated, expensive hardware such as a SAN. These distributed file systems are not designed for Virtual Machines: they need to balance Consistency, Availability and Partition tolerance, which means their performance is fundamentally limited, and hence they are not suited for virtualized environments.

To conclude, none of today's file systems have been designed to link Virtual Machines and Object Storage. Copy-on-write file systems, for example, struggle with eventual consistency, because for every write they first need a read to safeguard the latest data, and with eventual consistency you never know for sure that you have the latest data.

Turning Object Storage into Virtual Machine Storage

To turn Object Storage into primary storage for hypervisors, the solution must be specifically designed for Virtual Machines and take a radically different approach compared to existing storage technologies. Open vStorage takes this different approach and is designed from the ground up with Virtual Machines and their performance requirements in mind. Its well-considered architecture allows Object Storage to be turned into block storage for Virtual Machines and avoids the pitfalls seen with distributed file systems linked to Object Storage.

Open vStorage creates a unified namespace for the Virtual Machines across multiple Hosts. But in that namespace not all data gets treated the same way.
The actual data of the Virtual Machine, the bits which make up the volume, is separated from all other files. Each volume is stored as a separate block device in its own bucket on the Object Store.

Every new write on a Virtual Machine volume results in a new 4k block being appended to a Storage Container Object (SCO). Once a SCO is full, typically when it contains 4 MB, it is pushed at a slower pace to the back-end, an Object Store such as OpenStack Swift. As this second layer of storage is also a time-based storage implementation, eventual consistency is no longer an issue. Say a Storage Container Object is pushed to the Object Store and later retrieved. Because data is always appended and never overwritten, the eventually consistent Object Store can give only two answers: the actual data or no answer at all. Under no circumstances will the hypervisor receive outdated, incorrect data.

Another big difference compared with traditional distributed file systems is that a Virtual Machine volume is only available on one Host, not on all Hosts. Each Virtual Machine running the Open vStorage software has its own NFS server and exports a different file system instance. Nevertheless, each Host is tricked into believing that it accesses a single unified namespace shared across all these Virtual Machines running the Open vStorage software.

The non-volume files are treated completely differently. Depending on their size and role, they are stored in a distributed database or on a Virtual File Server. For example, VMware Virtual Machine configuration files (vmx files) need to be available on all Hosts, so they are stored in the distributed database. By storing these mission-critical files in a distributed database, Open vStorage supports VMware vMotion, as the storage presented to each Host looks like shared storage. ISO files, on the other hand, are not mission-critical and are routed to a Virtual File Server stored on the Object Store. If the Virtual File Server is down, it can easily be restarted on another Host.
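The append-only write path described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the Open vStorage implementation: the `Volume` class, the dict standing in for an Object Store bucket and the `sco_00000000` naming scheme are hypothetical, and the "typically 4 MB" SCO size is taken here as 4 MiB, i.e. 1024 blocks of 4 KiB. Because every SCO name is written exactly once, a read from the back-end returns either the complete object or nothing, never an older version.

```python
BLOCK_SIZE = 4 * 1024            # 4k blocks, as in the text
SCO_SIZE = 4 * 1024 * 1024       # "typically 4 MB" per SCO (assumed 4 MiB)


class Volume:
    """Hypothetical model of one volume appending 4k blocks into SCOs."""

    def __init__(self, backend):
        self.backend = backend   # dict standing in for an Object Store bucket
        self.sco_number = 0      # SCO currently being filled
        self.current = bytearray()  # contents of the open SCO (local write cache)
        self.metadata = {}       # block address -> (sco_number, offset)

    @staticmethod
    def sco_name(number):
        return f"sco_{number:08d}"

    def write(self, address, block):
        if len(block) != BLOCK_SIZE:
            raise ValueError("writes are whole 4k blocks")
        # Append, never overwrite: rewriting `address` just points it elsewhere.
        self.metadata[address] = (self.sco_number, len(self.current))
        self.current += block
        if len(self.current) == SCO_SIZE:
            self._push_full_sco()

    def _push_full_sco(self):
        name = self.sco_name(self.sco_number)
        if name in self.backend:
            raise RuntimeError("SCOs are write-once")
        self.backend[name] = bytes(self.current)
        self.sco_number += 1
        self.current = bytearray()

    def read(self, address):
        sco, offset = self.metadata[address]
        if sco == self.sco_number:
            data = self.current  # block is still in the open SCO
        else:
            data = self.backend.get(self.sco_name(sco))
            if data is None:
                return None      # not visible yet: retry later, never stale
        return bytes(data[offset:offset + BLOCK_SIZE])
```

Overwriting a block address only updates the metadata to point into a newer SCO; the old SCO on the back-end is left untouched, which is what makes the "actual data or no answer at all" property hold.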
Open vStorage Features

Open vStorage, as the only solution in the market, turns Object Storage into usable, high performance storage for Virtual Machines with the following features:

Scale-out

Open vStorage offers scalability in both performance and storage. Adding more Virtual Machines running the Open vStorage software scales performance linearly. This guarantees that storage performance will never be a bottleneck in the virtualized environment.

Open vStorage allows adding multiple Object Stores to a single virtualized environment. Start with one Object Store and, when available storage space becomes an issue, make a decision: buy new hardware to enlarge the existing storage pool or invest in a new Object Store. Mixing and matching Object Stores as primary storage for Virtual Machines is something only Open vStorage offers.

VM-Centric

The flexibility of Open vStorage doesn’t only show in the possibility to mix different Object Stores; Open vStorage also allows you to carefully design your storage tiers. On one and the same Object Storage solution you could, for example, have a test tier and a highly redundant tier. The highly redundant tier could make 3 copies of the data while the test tier stores the data only once. Both storage tiers can be made available in Open vStorage as primary storage for Virtual Machines.

Splitting up the Virtual Machine volumes into separately manageable buckets and objects turns Open vStorage into a VM-centric storage platform, which allows storage actions like snapshotting, cloning or replication at the Virtual Machine level. Gone are the days of selecting a single retention policy across all Virtual Machines on the LUN. With Open vStorage, administrators can easily select only the most important Virtual Machines for replication [4]. On top of that, Open vStorage supports thin provisioning, as only data that has been written to the Virtual Machine disk is stored.

Having VM-centric functionality lowers the management overhead; for example, bulk provisioning of hundreds of Virtual Machines comes out of the box.
These Virtual Machines are provisioned nearly instantly, as only metadata needs to be copied for each Virtual Machine. A snapshot is merely a reference to the correct metadata. Taking snapshots thus imposes no overhead, as no data needs to be copied.

[4] Planned for Q1 2015

High Performing

To eliminate the typical VM I/O blender effect, circumvent the eventual consistency issue of Object Storage and boost storage performance, Open vStorage uses a write cache, which works as a transaction log, on fast Flash or SSD in the Host. The Storage Container Objects (SCOs) in this cache are sequentially filled with each new 4k block written by a Virtual Machine. This basically turns any random write I/O behavior into a sequential write operation. The transaction log immediately confirms the write to the hypervisor for fast response times. During each write, the address of the 4k block, its hash, the SCO number and the offset are stored as metadata. Open vStorage uses a Paxos distributed database to provide redundancy and immediate access to the metadata in case the volume is moved or failed over to another Host. To provide redundancy, all writes are mirrored to a Fail-Over Cache (FOC) on a second Host. This Fail-Over Cache can be very small (a couple of MB per volume) because only data that is not yet stored on the Object Store needs to be protected.

To improve read performance, Open vStorage uses a deduplicated read cache across all volumes hosted on the same hypervisor. When a read request comes in, Open vStorage looks up the hash in the metadata and, if the block exists in the cache, serves it directly from SSD or Flash storage, resulting in very fast read operations. When thin clones are made, for example when multiple Virtual Machines are cloned from a master template, identical 4k blocks have the same hash and are stored only once in the read cache (dedupe), while the hypervisor sees them all as individual, independent volumes.
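A minimal Python sketch of the per-write metadata and the deduplicated read cache follows. It assumes an in-memory dict for each volume's metadata and a caller-supplied `fetch(sco_number, offset)` function that reads a block from the back-end; the class and function names are hypothetical, and SHA-256 is an arbitrary choice because the text does not name the hash. It illustrates why thin clones and snapshots are nearly instant (only metadata is copied) and why blocks shared by clones of one template occupy the read cache only once.

```python
import hashlib

BLOCK_SIZE = 4 * 1024


def block_hash(block):
    # The paper does not name the hash function; SHA-256 is an assumption.
    return hashlib.sha256(block).hexdigest()


class VolumeMetadata:
    """Hypothetical per-volume map: block address -> (hash, sco_number, offset)."""

    def __init__(self, entries=None):
        self.entries = dict(entries or {})

    def record_write(self, address, block, sco_number, offset):
        self.entries[address] = (block_hash(block), sco_number, offset)

    def clone(self):
        # Thin clone or snapshot: copy the metadata, not the data blocks.
        return VolumeMetadata(self.entries)


class DedupReadCache:
    """Hypothetical host-wide cache keyed by block hash, shared by all volumes."""

    def __init__(self):
        self.blocks = {}         # hash -> block contents
        self.misses = 0

    def read(self, meta, address, fetch):
        digest, sco_number, offset = meta.entries[address]
        block = self.blocks.get(digest)
        if block is None:
            # Cache miss: fetch from the back-end using the SCO number and offset.
            self.misses += 1
            block = fetch(sco_number, offset)
            self.blocks[digest] = block
        return block
```

In this model, cloning a template three times and reading every block of every clone fetches each distinct block from the back-end once; the clones keep independent metadata, so a later write to one clone does not affect the template.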
Conclusion

In the past year, on-site Object Stores (Ceph, OpenStack Swift, …) have left their niche status and are fast becoming the de facto standard for scale-out, redundant storage. Running Virtual Machines on this type of storage does not work out of the box due to issues such as eventual consistency, high latency and limited bandwidth. Open vStorage is the solution for running Virtual Machines on top of these Object Stores. By using an architecture with caching on fast Flash or SSD drives close to the hypervisor, transaction logs, and isolation of Virtual Machine volumes from other Virtual Machine files, Open vStorage turns an Object Store into a high performance, distributed, VM-centric storage platform which lowers the management overhead and offers features such as zero-copy snapshots, thin provisioning, bulk provisioning and quick restores.