Thesis no: MSEE-2016:45
Monitoring and Analysis of Disk
throughput and latency in servers running
Cassandra database
An Experimental Investigation
Rajeev Varma Kalidindi
Faculty of Computing
Blekinge Institute of Technology
SE-371 79 Karlskrona Sweden
This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial
fulfillment of the requirements for the degree of Master of Science in Electrical Engineering with
Emphasis on Telecommunication Systems. The thesis is equivalent to 20 weeks of full time
studies.
Contact Information:
Author(s):
Rajeev Varma Kalidindi
Email: [email protected] || [email protected]
External advisor:
Jim Håkansson
Chief Architect,
Ericsson R&D,
Karlskrona, Sweden
University advisor:
Prof. Kurt Tutschku
Department of Communication Systems
Faculty of Computing
Blekinge Institute of Technology
SE-371 79 Karlskrona, Sweden
Internet: www.bth.se
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57
ABSTRACT
Context. Lightweight process virtualization has been used in the past, e.g., Solaris Zones, FreeBSD jails, and Linux Containers (LXC). But only since 2013 has there been kernel support for user namespaces and process grouping control, which makes lightweight virtualization attractive for creating virtual environments comparable to virtual machines.
Telecom providers have to handle the massive growth of information due to the growing number of customers and devices. Traditional databases are not designed to handle such massive data growth; NoSQL databases were developed for this purpose. Cassandra, with its high read and write throughput, is a popular NoSQL database for handling this kind of data.
Running the database using operating system virtualization, or containerization, offers a significant performance gain compared to virtual machines, and also gives the benefits of migration, fast boot-up and shutdown times, lower latency, and less use of the servers' physical resources.
Objectives. This thesis aims to investigate the trade-off in performance while loading a Cassandra cluster in bare-metal and containerized environments. The effect of loading the cluster is analyzed in detail for each individual node in terms of latency, CPU utilization, and disk throughput.
Methods. We implement the physical model of the Cassandra cluster based on realistic and commonly used scenarios of database analysis for our experiment. We generate different load cases on the cluster for the bare-metal and Cassandra-in-Docker scenarios and measure CPU utilization, disk throughput and latency using standard tools like sar and iostat. Statistical analysis (mean value analysis, higher-moment analysis and confidence intervals) is performed on the measurements on specific interfaces in order to increase the reliability of the results.
Results. The experimental results provide a quantitative analysis of measurements of latency, CPU utilization and disk throughput while running a Cassandra cluster in bare-metal and container environments. A statistical analysis summarizing the performance of the Cassandra cluster is presented.
Conclusions. The detailed analysis shows that the resource utilization of the database was similar in the bare-metal and container scenarios. Disk throughput is similar in the case of mixed load, and containers have a slight overhead in the case of write loads, both at maximum load and at 66% of maximum load. The latency values inside the container are slightly higher for all cases. The mean value analysis and higher-moment analysis enable a finer analysis of the results. The calculated confidence intervals show considerable variation in disk performance, which might be due to compactions happening at random times. Future work in the area can be done on compaction strategies.
Keywords: Cassandra-stress, no-SQL, Docker, VM,
Virtualization, CQL, Bare-Metal, Linux
ACKNOWLEDGEMENTS
I would like to express my gratitude to my supervisor Prof. Kurt Tutschku for his guidance
and support during my thesis work. His comments have helped me in every step of my thesis
and have been instrumental in improving the quality of the thesis work.
I would like to extend my heartfelt thanks to my mentors Jim Håkansson and Christian Andersson at Ericsson for their invaluable guidance, support and clarity. I would also like to acknowledge Marcus Olsson and Jan Karlsson from Ericsson for their advice and support.
I would like to thank Associate Prof. Emiliano Casalicchio for proposing this thesis and
Sogand Shirinbab for providing the servers and for guiding us during this project.
A special note of thanks to my thesis partner Avinash Goud Chekkila for his cooperation
during the thesis work.
Finally, I would like to take this opportunity to thank and acknowledge my family for their
love, support and the values that they have taught me over the years. All that is good in me is
because of them. Thank you.
CONTENTS
Abstract ................ 1
Acknowledgements ................ 2
Contents ................ 3
List of Figures ................ 5
List of Tables ................ 7
1 Introduction ................ 8
1.1 Motivation ................ 9
1.2 Problem Statement ................ 9
1.3 Aim of the thesis ................ 9
1.4 Research questions ................ 10
1.5 Split of work ................ 10
1.6 Contribution ................ 11
2 Related Work ................ 12
3 Methodology ................ 14
3.1 No-SQL ................ 14
3.1.1 Cassandra ................ 15
3.2 Virtualization and container based virtualization ................ 21
3.2.1 Virtualization ................ 21
3.2.2 Container based virtualization ................ 21
3.2.3 Container based virtualization vs traditional virtualization ................ 22
3.2.4 Docker ................ 22
4 Methodology ................ 24
4.1 Ways to study a system ................ 24
4.2 Methodology for analyzing a system ................ 24
4.2.1 SAR Tool ................ 25
4.2.2 Iostat tool ................ 25
4.2.3 Cassandra-Stress tool ................ 25
4.2.4 Test-bed 1: Cassandra on Native Bare-metal Server ................ 26
4.2.5 Test-bed 2: Cassandra on Docker ................ 27
4.3 Statistical and measurement based system analysis ................ 29
4.4 Scenarios ................ 30
4.5 Metrics ................ 31
5 Results ................ 33
5.1 Individual results ................ 33
5.1.1 Experiment description for disk utilization ................ 33
5.1.2 Experiment description for latency ................ 39
5.2 Common results ................ 41
5.2.1 CPU utilization ................ 41
5.3 Discussion ................ 44
6 Conclusion and Future Work ................ 46
6.1 Answer to research questions ................ 46
References ................ 48
Appendix ................ 49
LIST OF FIGURES
Figure 1: Cassandra Architecture [6] ................ 16
Figure 2: Cassandra Write Path [7] ................ 17
Figure 3: Cassandra Compaction Process [7] ................ 18
Figure 4: Cassandra Read Path [8] ................ 19
Figure 5: Cassandra Hinted Handoff [9] ................ 20
Figure 6: Virtual Machines vs Containers ................ 22
Figure 7: Ways to Study a System ................ 24
Figure 8: Cassandra in Native Bare-metal Server ................ 26
Figure 9: Cassandra in Docker ................ 28
Figure 10: Scenarios ................ 30
Figure 11: Disk Utilization for 100% Mixed Load ................ 34
Figure 12: 95% Confidence Intervals for Disk Throughput_100%_Mixed ................ 34
Figure 13: Disk Throughput for 66% Mixed Load ................ 35
Figure 14: 95% Confidence Intervals for Disk Throughput_66% Mixed Load ................ 35
Figure 15: Average Disk Throughput ................ 36
Figure 16: Average Disk Utilization (8 Intervals) ................ 36
Figure 17: Disk Throughput for 100% Write Load ................ 37
Figure 18: 95% Confidence Interval for Disk Throughput_100% Write Load ................ 37
Figure 19: Disk Throughput for 66% Write Load ................ 38
Figure 20: 95% Confidence Interval for Disk Throughput_66% Write ................ 38
Figure 21: Disk Throughput for 100% Read Load ................ 39
Figure 22: Max Mixed Load Latency (in ms) ................ 39
Figure 23: 66% of Max Load Latency (in ms) ................ 40
Figure 24: Max Load Write Operations Latency ................ 40
Figure 25: 66% of Max Load Write Operations Latency ................ 41
Figure 26: CPU Utilization for 100% Mixed Load ................ 42
Figure 27: 95% Confidence Intervals for CPU Utilization_100%_Mixed Load ................ 42
Figure 28: Average CPU Utilization ................ 43
Figure 29: Average Value of CPU Utilization ................ 43
Figure 30: Max Load Read Operations Latency ................ 44
Figure 31: 66% of Max Load Read Operation ................ 44
Figure 32: Seeds IP Address ................ 49
Figure 33: Listen Address ................ 49
Figure 34: Broadcast Address ................ 50
Figure 35: RPC Address ................ 50
Figure 36: Latency Operations ................ 51
Figure 37: Nodetool Status ................ 52
LIST OF TABLES
Table 1: Virtual Machines vs Containers ................ 22
Table 2: Test-bed Details, Bare-metal ................ 26
Table 3: Docker Testbed Details ................ 28
1 INTRODUCTION
This chapter begins with a brief description of the need for NoSQL databases and cloud technologies in telecom. It then introduces containers, which are seen as a viable alternative to virtual machines for handling these enormous amounts of data in cloud environments.
The amount of information in digital form has grown massively since the 2000s because of the digital transformation of media such as voice, TV, radio, and print, which marks the shift from the analog to the digital world.
In order to handle such data, NoSQL databases are used. Cassandra is superior to other databases with respect to scalability, while at the same time offering continuous availability, data location independence, fault tolerance, decentralization, and elasticity.
Also, to keep up with this expansion of data and the consequent demand to increase the capacity of data centers, run cloud services, consolidate server infrastructure, and provide simpler and more affordable solutions for high availability, IT organizations are utilizing virtualization technologies, which provide hardware-independent, isolated, and secure user environments.
Virtual Machines are widely used in cloud computing, specifically IaaS. Cloud
Platforms like Amazon make VMs accessible and also execute services like databases
inside VMs. PaaS and SaaS are built on IaaS with all their workloads executing on VMs.
As virtually all cloud workloads presently run in VMs, VM performance has
been a key element of overall cloud performance. Once an overhead is added by the
hypervisor, no higher layer can remove it. Such overheads have been an inescapable tax
on cloud workload performance.
Although virtualization technology is mature, there are quite a few performance challenges due to the overhead created by the guest OS. Containers, with less overhead and fast startup and shutdown times, are seen as a viable alternative in Big Data applications that use NoSQL distributed storage systems. Containers run as well-isolated applications within the host operating system. They play a vital role when speed, flexibility, new workloads and quick deployment are major considerations.
Docker containers wrap up a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, and system libraries, which guarantees that it runs the same regardless of the environment it is running in.
This thesis mainly focuses on the implementation and performance evaluation of Cassandra, a NoSQL database, in bare-metal and container environments. The performance of the database with the necessary configurations on a bare-metal server is evaluated. Using the same configuration for the containers, we then test the performance of the database in containers. Finally, the trade-off in performance while running the database in bare metal and containers is observed and analyzed.
1.1 Motivation
The growth of vast amounts of data, especially because of the digital transformation of the past few years, created a need for NoSQL database technologies, which were intended to handle the demands of modern applications that could not be met by traditional RDBMSs, which are not as dynamic and flexible. Running these databases in the cloud makes them cost effective because of the advantages of reduced overhead, rapid provisioning, flexibility and scalability. As virtually all workloads run inside VMs, VM performance affects the overall cloud performance.
IT organizations face a problem with the significant growth of data and the methods to handle it. Cloud computing, which uses virtualization, can be seen as a solution, but the guest OS introduces overhead in the performance of the VM.
Containers, which are lightweight VMs, can be seen as a viable alternative because they avoid the overhead created by the guest OS when running the cloud.
1.2 Problem Statement
Virtualization technology is an effective way to optimize cloud infrastructure. However, there is an inherent problem in the overall performance of the cloud while running applications that handle Big Data inside VMs, because of the overhead induced by the guest OS. Container-based virtualization provides a different level of abstraction in terms of virtualization and isolation. While hypervisors abstract the hardware and need a full OS instance in each VM, which results in overhead from virtualizing the hardware and from virtual device drivers, containers implement process isolation at the operating-system level, thereby avoiding this overhead. Containers run on top of the kernel of the underlying host machine. The shared kernel lets containers achieve higher density than hypervisors in terms of disk images and virtualized instances. By identifying and studying this overhead in both bare-metal and container environments in terms of CPU, disk utilization and latency, a deeper understanding of the resource sharing in both environments, as well as better optimizations, can be achieved. This can pave the way for using containers, hypervisor-based virtualization, or both to optimize cloud infrastructure for handling Big Data.
1.3 Aim of the thesis
The aim of this thesis is to investigate the impact of using containers on performance while running NoSQL systems for telecommunication applications that process large amounts of data. Initially, a NoSQL database system is implemented by forming a Cassandra cluster on bare metal and then in a container environment. After that, a comparison between bare metal and containers, a lightweight process-level virtualization, is made by stressing the Cassandra cluster with load and measuring performance metrics such as CPU utilization, disk utilization and latency on the individual nodes in the cluster.
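The per-node metric samples mentioned above are later summarized with mean values and confidence intervals. The following is a minimal sketch of that summary step, assuming the samples have already been parsed from sar or iostat output; the throughput values and the normal-approximation 95% interval are illustrative assumptions, not measured results.

```python
import math

def mean_and_ci(samples, z=1.96):
    """Return the sample mean and a 95% confidence interval half-width.

    Uses the normal approximation (z = 1.96), which is reasonable for
    the long measurement runs described in this thesis.
    """
    n = len(samples)
    mean = sum(samples) / n
    # Unbiased sample variance (n - 1 in the denominator).
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    half_width = z * math.sqrt(var / n)
    return mean, half_width

# Hypothetical disk-throughput samples in MB/s (not measured values).
samples = [41.2, 39.8, 44.5, 40.1, 43.0, 38.7, 42.9, 41.5]
m, hw = mean_and_ci(samples)
print(f"mean = {m:.2f} MB/s, 95% CI = [{m - hw:.2f}, {m + hw:.2f}]")
```

A wide interval relative to the mean, as seen later for disk throughput, signals high run-to-run variation.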
1.4 Research questions
Some of the research questions pertaining to this thesis are as follows:
• What is the methodology for analyzing the performance of databases?
• What is the disk utilization of the server while running the database in the bare-metal case and in containers?
• What is the latency of operations while running the database in the bare-metal case and in containers?
• How does this performance vary with different load scenarios?
1.5 Split of work
This thesis is part of a project at Ericsson that analyzes the feasibility of using containers to deploy the Cassandra database. The project is a joint thesis by two individuals, Rajeev Varma and Avinash Goud, split along different parameters. Disk utilization is analyzed by Rajeev Varma and CPU utilization by Avinash Goud. Latency of operations is shared by both Rajeev and Avinash. Experiments are conducted individually for the respective workloads:
• Monitoring and analysis of disk utilization of the server while running the database are performed by Rajeev.
• Monitoring and analysis of CPU utilization of the server while running the database are performed by Avinash.
• Monitoring and analysis of latency of operations are performed by both Rajeev and Avinash.
• Both of us take the same load cases of maximum load and 66% of maximum load, and evaluate the database on a mixed load operation of 3 reads and 1 write.
• Introduction to Cassandra in telecom is written by Avinash. The motivation,
problem statement and research questions are written by both Avinash and
Rajeev.
• Both Rajeev and Avinash have together worked on the related work section.
• Avinash has worked on the technological overview of the No-SQL systems
and Cassandra database. Rajeev has worked on the Virtualization section.
• In the system analysis methodologies section, the different ways to study a system and the methodology for analyzing a system are written by Rajeev. Statistical analysis and ways to study a database system are done by Avinash.
• In the results section, the analysis of disk throughput and latency values is done by Rajeev, and CPU utilization is analyzed by Avinash.
• Individual conclusions and the general conclusion of the thesis are presented in the conclusion section.
1.6 Contribution
The primary contribution of this thesis is the insight it provides into the workings of the Linux environment, lightweight virtualization, Cassandra, and Docker containers.
Secondly, the thesis gives an understanding of the performance of each load type in terms of latency, CPU and disk utilization in bare-metal and Docker container environments when an external load is applied to the Cassandra cluster.
Finally, a quantitative analysis of resource sharing in bare-metal and Docker environments is presented graphically from the obtained results. This helps in understanding resource sharing in bare-metal and lightweight-virtualization (container) environments in greater detail, which can lead to optimizing cloud performance.
2 RELATED WORK
This chapter presents the major ideas from previous performance measurements in the fields of NoSQL databases, bare metal, and virtualization technologies.
The DataStax paper on Why NoSQL [7] discusses the present-day need for handling big data given its enormous growth. The paper discusses six reasons for the world to move from traditional RDBMS databases towards NoSQL databases like Cassandra. It highlights how Cassandra enables continuous availability, location independence, and modern transactional capabilities, with a better architecture and a flexible data model.
The DataStax white paper introducing Cassandra [8] provides a brief overview of the NoSQL database and its underlying architecture, and discusses distribution and replication strategies, read and write operations, cloud support, performance, and data consistency. The paper highlights the linear scalability of Cassandra, which keeps it at the top among databases, and the easy-to-use data model that enables developers to build applications.
Avinash Lakshman and Prashant Malik in their paper [9] present their model of the NoSQL database Cassandra, which was developed to handle the Facebook inbox-search problem involving high write throughput. The paper presents an overview of the client API, the system design, and the various distributed algorithms that make the database run. The authors discuss performance-improvement strategies and give a use case of Cassandra in an application.
Vishal Dilipbhai Jogi and Ashay Sinha in their work [10] focused on evaluating the performance of the traditional RDBMS MySQL and the NoSQL databases Cassandra and HBase for heavy write operations from a web-based REST application. The authors measured that Cassandra scaled up enormously, with write operations about ten times faster than the traditional database, while HBase was almost twice as fast as the traditional database.
Vladimir Jurenka in his paper [11] presents an overview of virtualization technologies and discusses container virtualization using Docker. The author presents an overview of Docker, its architecture, Docker images, and orchestration tools. He highlights the Docker API, presents a comparative study with other container technologies, and finally ends with a real implementation of file transfer between Docker containers running on different hosts and the development of a new docker update command that simplifies the detection of outdated images.
Wes Felter, Alexander Ferreira, Ram Rajamony, and Juan Rubio in their paper [4] outline the usage of traditional virtual machines and containers in cloud computing applications. The authors compare native, container, and virtual machine environments using hardware and software across a cross-section of benchmarks and workloads relevant to the cloud. They identify the performance impact of using virtualized environments and highlight issues that affect performance. Finally, they find that Docker's overhead is sometimes negligible, and that Docker sometimes outperforms virtual machines, across various test scenarios.
P. E. N, F. J. P. Mulerickal, B. Paul, and Y. Sastri in their paper [12] evaluated container technology against bare-metal and hypervisor technologies based on hardware resource utilization. The authors use benchmarking tools like Bonnie++ and the benchmarking library psutil to evaluate the performance of the file system and of CPU and memory utilization. Their research also covered CPU count, CPU times, disk partitions, and network I/O counters in Docker and the host OS. It found Docker promising, with near-native performance.
R. Morabito, J. Kjällman, and M. Komu in their paper [10] investigated the performance of traditional hypervisors versus lightweight virtualization solutions using various benchmarking tools, to better understand the platforms in terms of processing, storage, memory, and network. Their results showed that the overhead introduced by containers is almost negligible for LXC and Docker containers, which enables dense deployment of services.
3 METHODOLOGY
This chapter gives an explanatory overview of the concepts of virtualization, Apache Cassandra, and Docker containers that are used in the experimentation. The objective of this chapter is to give insight into the technologies involved.
3.1 No-SQL
Not only SQL (NoSQL) refers to progressive data management engines that were developed in response to the demands presented by modern business applications, such as scalability, constant availability, and speed. NoSQL databases use a very flexible data model that is horizontally scalable with distributed architectures.
NoSQL databases provide superior performance, are more scalable, and address problems that could not be addressed by relational databases by:
• Handling large volumes of new, rapidly changing structured, semi-structured, and unstructured data.
• Working in Agile sprints, iterating quickly, and pushing code every week or sometimes multiple times a day.
• Using a geographically distributed scale-out architecture instead of a monolithic, expensive one.
• Using easy and flexible object-oriented programming.
NoSQL provides this performance because it incorporates the following features:
1. Dynamic schemas
NoSQL databases allow data insertion without any predefined schema, which makes it easy to make significant application changes in real working environments. This makes for faster, more reliable code integration.
2. Auto-sharding
NoSQL databases natively and automatically spread data across an arbitrary number of servers, without the application needing to know the composition of the server pool. Data and query load are balanced automatically. When a server goes down, it can be transparently and quickly replaced with no disruption to the application.
3. Replication
NoSQL databases support automatic database replication to maintain availability in the case of failures or planned maintenance events. Some NoSQL databases offer automated failover and recovery and self-healing features, as well as the ability to distribute the database geographically across various locations in order to withstand regional failures and enable data localization.
4. Integrated caching
Many NoSQL databases have an integrated caching feature, keeping frequently used data in system memory as much as possible and thereby removing the need for a separate caching layer. Some databases offer a fully managed, integrated in-memory database management layer for workloads requiring high throughput and low latency.
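The automatic sharding idea in point 2 above can be illustrated with a toy hash-based placement function. This is only a sketch: real systems such as Cassandra use consistent hashing on a token ring rather than simple modulo placement, and the node names here are invented.

```python
import hashlib

NODES = ["node1", "node2", "node3"]  # hypothetical server pool

def shard_for(key: str, nodes=NODES) -> str:
    """Deterministically map a key to a node by hashing it, so data
    spreads over the pool without the application tracking placement."""
    digest = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    return nodes[digest % len(nodes)]

# Every key always lands on the same node, and keys spread across nodes.
placement = {k: shard_for(k) for k in ("alice", "bob", "carol", "dave")}
print(placement)
```

Replacing a failed node in this naive scheme would remap most keys; consistent hashing exists precisely to limit that remapping.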
Typically, NoSQL databases can be classified according to one of four data models:
a. Document model
They pair each key with a complex data structure known as a document, whose structure is JSON (JavaScript Object Notation). The schema is dynamic, which allows documents to have different fields and makes it easy to add new fields during application development. Documents can contain many key-value or key-array pairs, or even nested documents. This model has the broadest applications due to its flexibility, the ability to query on any field, and the natural mapping of the data model to objects in modern programming languages.
E.g.: MongoDB and CouchDB.
b. Graph Model
It uses graph structures with nodes, edges, and properties to represent data. Data is modeled as a network of relationships between specific elements. It is useful for cases in which traversing relationships is the core of the application, like navigating networks, social network connections, or supply chains.
Examples: Neo4j.
c. Key-value model
Key-value stores are one of the most basic types of non-relational database. Every single item is stored in the database as an attribute name, or key, together with its value. Data can only be queried using the key. It is a very useful model for representing polymorphic and unstructured data, as the database doesn't enforce a set schema across key-value pairs. Some key-value stores allow each value to have a type, like 'integer', which adds functionality.
Examples: Riak, Redis, and Berkeley DB.
d. Wide column model
Wide column stores use a sparse, distributed, multi-dimensional sorted map for data storage. Every record can vary in the number of columns stored. Columns can be grouped to form column families, or can be spread across multiple column families. Data is retrieved using a primary key per column family.
Examples: HBase and Cassandra
3.1.1 Cassandra
Cassandra is a distributed NoSQL database for managing large amounts of structured, semi-structured, and unstructured data across multiple data centers and the cloud. Cassandra delivers continuous availability, linear scalability, and simplicity of operation across many servers with no single point of failure, along with a powerful dynamic data model designed for flexibility and the low latency that enables fast response times.
Cassandra's built-for-scale architecture enables it to handle massive volumes of data and high numbers of concurrent users and operations per second, to achieve fast write and read performance, and to deliver true linear scale performance in a masterless, scale-out design, compared to other NoSQL databases.
Figure 1: Cassandra Architecture [6]
3.1.1.1 Cassandra Architecture
The data flow of a Cassandra operation is detailed in this section to give an overview of how Cassandra works. Its design is based on the understanding that system hardware failures can and do occur. Cassandra addresses these failures by maintaining a peer-to-peer distributed system across nodes among which data is distributed in the cluster. Every node contains either all or some part of the data present in the Cassandra system. Each node exchanges information across the cluster, or with any other node in the cluster, every second, depending on the consistency level configured. When data is entered into the cluster (a write operation), it goes to a local persistent store called the commit log, which maintains all the logs regarding the data in Cassandra.
3.1.1.2 Cassandra data insertion
Cassandra processes data at several stages on the write path, starting with the immediate logging of a write operation and ending with its compaction:
a) Logging writes and memtables storage:
When data enters Cassandra (a write operation), it stores the data in an in-memory structure, the memtable, which is a write-back cache of the data partitions that Cassandra looks up by key, and also appends the write to the commit log on disk, enabling configurable durability. The commit log is updated for every write operation made to the node, and these durable writes survive even power failures.
b) Flushing data from memTable
When the contents of a memtable, including its indexes, exceed a configurable threshold, the memtable is put in a queue to be flushed to disk. The length of the queue can be configured using the memtable_flush_queue_size option in the cassandra.yaml file. If the data to be flushed exceeds the queue size, Cassandra blocks write operations until the next flush succeeds. Memtables are sorted by token and then written to disk.
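A minimal sketch of the corresponding cassandra.yaml entry (the option name is as referenced above; the value shown is only illustrative):

```yaml
# cassandra.yaml (fragment): number of full memtables allowed to
# queue for flushing before writes are blocked
memtable_flush_queue_size: 4
```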
c) Storing data on disk in SStables
When a memtable is full, its data is flushed in sorted order into SSTables (sorted string tables). Data in the commit log is purged after its corresponding data in the memtable is flushed to an SSTable. All writes are automatically partitioned and replicated throughout the cluster.
Cassandra creates the following structure for each SStable:
• Partition Index, which contains a list of partition keys and positions
of starting points of rows in the data file.
• Partition summary, which is a sample of the partition index.
• Bloom filter, which is used to determine which SSTable most likely contains the key.
Figure 2 Cassandra write path [7]
d) Compaction
Using compaction, Cassandra periodically consolidates SSTables, discarding obsolete data and tombstones. A tombstone is a marker in a row that indicates a column was deleted; it exists for a configured time defined by the gc_grace_seconds value set on the table. During compaction, marked columns are deleted. Periodic compaction is essential because Cassandra doesn't insert or update in place; it creates a new timestamped version of the inserted or updated data in another SSTable.
Figure 3 Cassandra Compaction process [7]
Compaction merges the data in each SSTable by partition key, selecting the latest version of each value for storage by its timestamp. Because the partition key within each SSTable keeps rows sorted, Cassandra can merge the data without any random I/O. After evicting tombstones and removing deleted data, columns, and rows, the SSTables are merged together into a single file in this compaction process.
Old SSTables are deleted once the last reads using those files finish, which makes new disk space available for use. With its newly built SSTable, Cassandra can handle read requests more efficiently than before compaction.
Even though no random I/O occurs, compaction can be considered a heavyweight operation. While the old and new SSTables coexist, there is a spike in disk space usage during compaction. To minimize the impact on read speeds, compaction runs in the background.
To reduce the impact of compaction on application requests, Cassandra does
the following operations:
• Throttles compaction I/O using compaction_throughput_mb_per_sec (default 16 MB/s).
• Requests the OS to pull newly compacted partitions into the page cache.
Compaction strategies that can be configured to run periodically are:
• Size Tiered Compaction Strategy (STCS) for write-intensive workloads
• Date Tiered Compaction Strategy (DTCS) for time-series and expiring data
• Leveled Compaction Strategy (LCS) for read-intensive workloads
3.1.1.3 Cassandra read operations
To serve a read, Cassandra combines results from potentially multiple SSTables and the active memtables. When a read request is issued, Cassandra first checks the Bloom filter. Every SSTable has a Bloom filter associated with it that estimates the probability of the SSTable containing any data for the requested partition before any disk I/O is performed.
Figure 4 Cassandra read path [8]
If the Bloom filter doesn't rule out the SSTable, Cassandra reviews the partition key cache, which contains the partition index for a Cassandra table, and then takes one of the following actions depending on whether the index entry is found in the cache:
If the index entry is found in the cache:
• Cassandra finds the compressed block containing the data by going to the compression offset map.
• It returns the result set by fetching the compressed data.
If the index entry is not in the cache:
• Cassandra determines the approximate location of the index entry on disk by searching the partition summary.
• Then, to find the index entry, Cassandra hits the disk and performs a single seek followed by sequential column reads in the SSTable if the columns are contiguous.
• Cassandra finds the compressed block containing the data by going to the compression offset map, as in the previous case.
• Finally, it returns the result set by fetching the compressed data.
3.1.1.4 Data deletion
A Cassandra column can have a TTL (time to live), an optional expiration period that can be set using CQL. Once the requested amount of time expires, the TTL data is marked with a tombstone, whose lifetime is set by gc_grace_seconds. Tombstones are automatically deleted during the compaction process.
Running a node repair is essential if any node is down during this period, as the downed node will not have received the delete information sent by Cassandra and, after gc_grace_seconds, deleted data may reappear.
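As an illustration, a TTL can be attached to a write in CQL; the table and values below are hypothetical:

```cql
-- Hypothetical table; USING TTL sets the expiration period in seconds.
INSERT INTO users (user_id, session_token)
VALUES (42, 'abc123')
USING TTL 86400;  -- this row's data expires after one day
```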
3.1.1.5 Hinted handoff
Hinted handoff is a feature of Cassandra that helps optimize the consistency process in a cluster when a replica-owning node is unavailable to accept a successful write request.
When hinted handoff is enabled during a write operation, the coordinator stores a hint, indicating that a write needs to be replayed to one or more unavailable replica nodes, in its local system hints table in either of these cases:
• A replica node for the row is already known to be down.
• A replica node doesn't respond to the write request.
A hint consists of:
• Location of the replica that is down
• Actual Data written
• Version metadata
Once a node discovers that a node it holds hints for is back up, it sends the data row corresponding to each hint to the target node.
Figure 5 Cassandra hinted handoff [9]
Consider a three-node cluster with nodes A, B and C, where each row is stored on two nodes in a keyspace with a replication factor of 2. When node C is down and a write request for row K is issued by the client, the coordinator writes a hint for node C and replicates the data to node B.
When the client-specified consistency level cannot be met, Cassandra doesn't store a hint.
3.1.1.6 Consistency
Consistency refers to the process of updating and synchronizing Cassandra data on all its replicas in the cluster. Cassandra offers eventual consistency: if a data item receives no new updates, eventually all accesses to that item will return the last updated value. This is achieved through the concept of tunable consistency.
Cassandra can choose between strong and eventual consistency based on need. The write consistency level determines the number of replicas that must acknowledge a write request before success is returned to the client application, and the read consistency level specifies the number of replicas that must respond to a read request before data is returned to the client application. Consistency levels can be set globally or on a per-operation basis. Some of the most used consistency levels are stated below:
• ONE
A response from one replica node is sufficient.
• QUORUM
A response from a quorum of replicas from any data center. The quorum value is derived from the replication factor using the formula: quorum = (replication factor / 2) + 1, rounded down. For example, with a replication factor of 3, the quorum is 2.
• ALL
A response from all replica nodes is required.
Cassandra consists of a cluster of nodes, where each node is an independent data store (a node can be a physical server or a VM in the cloud). Each node in Cassandra is responsible for a part of the overall database. Cassandra writes copies of data to different nodes so as to avoid any single point of failure. The replication factor sets the number of copies of the data in the cluster, and a replication strategy determines how data is replicated across multiple servers and data centers. In Cassandra, all nodes play equal roles, with nodes communicating with each other as peers. There is no master node, so there is no single point of failure, and all data has copies on other nodes, which secures the stored data. Cassandra is capable of handling large amounts of data and thousands of concurrent users or operations per second across multiple data centers.
3.2 Virtualization and container based virtualization
3.2.1 Virtualization
Virtualization refers to the act of creating a virtual version of something. It is used as a technique for partitioning the resources of a single server into multiple separated and isolated environments. The result is the creation of logical units called virtual machines. The virtual machines are given access to the hardware resources and are controlled by software called a Virtual Machine Monitor (VMM) or hypervisor.
Virtualization aims to make the most of the server's hardware resources by creating multiple virtual machines (VMs) on a single physical machine. The server is the host, and the created virtual machines are known as guests on the server. Each virtual machine has a share of the host server's resources (such as CPU, disk, memory, I/O, etc.), which are managed by the hypervisor.
3.2.2 Container based virtualization
Container-based virtualization differs from virtualization in the traditional sense in that a container does not have an operating system of its own but uses kernel features such as namespaces and cgroups to provide an isolated layer. The operating system is virtualized while the kernel is shared among the instances. A separate guest OS is not needed in the container case, as the instances isolated by the container engine share the same kernel as the host OS.
Figure 6 Virtual machines vs containers
3.2.3 Container based virtualization vs traditional virtualization
Virtual Machines                                Containers
Represent hardware-level virtualization         Represent operating-system-level virtualization
Can run any guest OS                            Work only on Linux-based systems
Heavyweight, as each runs a full OS             Lightweight
Slow provisioning                               Fast provisioning and scalability
Fully isolated and hence more secure            Process-level isolation and hence less secure
Table 1 Virtual machines vs containers
There are multiple container technologies. They have been enumerated as follows:
• LXC (Linux containers): This is one of the most popular containerization
technologies. It represents running multiple isolated Linux systems
(containers) on a single Linux machine.
• OpenVZ: It is also based on the Linux kernel and operating system.
• Solaris Containers: A Solaris Container is a combination of the system
resource controls and boundary separation provided by zones.
3.2.4 Docker
Docker is an open containerization platform. It uses the same principles as LXC containers but adds a number of features and has greater usability than other container management software. It has its own public registry, from which images can be pulled to run containers, and to which images can be pushed for others to use. It is designed for using container technology in production environments.
Why Docker?
Docker has several features which make it a leading containerization technology.
3.2.4.1 Faster delivery of applications
Docker aids the development lifecycle by allowing developers to develop in local containers that contain their applications and services, and later integrate everything into a continuous integration and deployment workflow.
3.2.4.2 Deploying and scaling
Highly portable workloads can be handled because of Docker's container-based platform, which also enables workloads to run on a local host, on physical or virtual machines in a data center, or in the cloud.
Dynamic management of workloads is possible because of Docker's portability and lightweight nature. Scaling applications and services up and tearing them down is quick in Docker, which makes scaling nearly real time.
3.2.4.3 Higher density and multiple work load applications
Because of its lightweight and fast nature, Docker provides a cost-effective, viable alternative to hypervisor-based virtualization, which favors its usage in high-density environments: for example, in building a Platform-as-a-Service (PaaS) or one's own cloud. It is also effective for small and medium-size deployments where one wants to get more out of limited resources.
3.2.4.4 Underlying technology
Docker is built on the following underlying technologies.
Namespaces
Docker uses namespaces to provide isolated workspaces. When a container is run, Docker creates a set of namespaces for that container. This provides an isolation layer: each aspect of a container runs in its own namespace and has no access outside it. A few commonly used namespaces on Linux are as follows:
• pid namespace: process isolation (PID: process ID)
• net namespace: managing network interfaces (NET: networking)
• ipc namespace: managing access to IPC resources (IPC: Inter-Process Communication)
• mnt namespace: managing mount points (MNT: mount)
• uts namespace: isolating kernel and version identifiers (UTS: Unix Timesharing System)
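On a Linux host, the namespaces a process belongs to can be listed under /proc, which illustrates the namespace types above (a sketch, assuming a Linux system with procfs mounted):

```shell
# Each entry is a symbolic link identifying one namespace the
# current shell process belongs to (pid, net, ipc, mnt, uts, ...).
ls /proc/self/ns
```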
Control Groups
The Docker engine uses cgroups, or control groups, to share the available hardware resources among containers and, if required, to set up limits and constraints. Running applications in isolation requires containers to use only the resources they are intended to, which makes them good multi-tenant citizens on a host. Example: limiting the disk usage of a specific container.
Union file systems
Union file systems, or UnionFS, are file systems that operate by creating layers, making Docker a very fast and lightweight environment. UnionFS is used by the Docker engine and acts as a building block for containers. The Docker engine uses several union file system variants, including AUFS, btrfs, vfs, and DeviceMapper.
Container format
Container format is a wrapper created by Docker Engine which combines all these
components. Current default container format is libcontainer.
4 METHODOLOGY
This section describes the research methodology and the experimental test-beds used in our thesis work.
4.1 Ways to study a system
There are a few different ways to study a system and understand its operation and its relationship with resources, and to measure and analyze its performance.
Figure 7 Ways to study a system
Experimenting with the actual system and experimenting with a model of the system are two ways to do this. Experimenting on a physical model of the system was chosen for this thesis: since our goal is to evaluate the performance of the Cassandra database on a physical server and compare it with Cassandra implemented in Docker containers, a physical model will show the performance overhead without the need for production servers. Having a model of the system also allows us to make alterations to the data model and schema, enabling a detailed analysis of the performance overhead in the two scenarios.
Furthermore, we prefer the physical model over a mathematical model, since a mathematical model is an abstract version of the system and shows only the logical relationships between components. Our primary goal is to evaluate the performance of the database under realistic load cases and to compare it with the performance in Docker containers. For this purpose, we find that a physical model of the system is best suited for our analysis.
4.2 Methodology for analyzing a system.
There are two experimental test-beds in our thesis because we aim to investigate the
performance of the database in a native bare-metal server and Cassandra running in
docker containers. Both test-beds run on the same servers and hence use the same physical resources.
4.2.1 SAR Tool
The sysstat package provides the sar and iostat system performance utilities. Sar collects, reports, and saves system activity information; we use it to collect the CPU usage of the system. Sar takes a snapshot of the system at periodic intervals, gathering performance data such as CPU utilization, memory usage, interrupt rate, context switches, and process creation. Sar reports CPU utilization in several fields. %user gives the CPU utilization that occurred while executing at the application level; this field includes the time spent running virtual processors. %usr also gives the CPU utilization at the application level, but does not include time spent running virtual processors. %nice gives the utilization while executing at the user level with nice priority. %idle gives the percentage of time the CPU or CPUs were idle and the system did not have an outstanding disk I/O request. Subtracting %idle from 100 therefore gives the total CPU utilization.
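For illustration, a run such as sar -u 30 40 reports these fields every 30 seconds for 20 minutes, and total utilization can then be derived from %idle. A minimal sketch, using an invented sample line in sysstat's layout:

```shell
# One 'sar -u' report line (values invented):
#   time      CPU  %user  %nice  %system  %iowait  %steal  %idle
sample="12:00:30 all 25.10 0.00 10.40 2.50 0.00 62.00"

# %idle is the last field; total CPU utilization = 100 - %idle
echo "$sample" | awk '{printf "CPU utilization: %.2f%%\n", 100 - $NF}'
```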
4.2.2 Iostat tool
Iostat reports CPU statistics and input/output statistics for devices and partitions. The iostat tool is used for monitoring system input/output device load by observing the time the devices are active in relation to their average transfer rates. The iostat reports can be used to change the system configuration to better balance the input/output load between physical disks.
The iostat tool generates two kinds of reports: the CPU utilization report and the device utilization report. We are primarily concerned with the device utilization report, which provides statistics per physical device or per partition. It provides the following fields:
tps – the number of transfers per second issued to the device. A transfer is an I/O request to the device; multiple logical requests can be combined into a single I/O request to the device. A transfer is of indeterminate size.
kB_read/s – the amount of data read from the device, expressed in kilobytes per second.
kB_wrtn/s – the amount of data written to the device, expressed in kilobytes per second.
kB_read – the total number of kilobytes read.
kB_wrtn – the total number of kilobytes written.
(The displayed data are valid only with kernels 2.4 and newer.)
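As an illustration, the kB_wrtn/s value can be extracted from an iostat device report line; the sample below uses invented values in the standard iostat -d column order:

```shell
# One 'iostat -d' device line (values invented):
#   Device  tps   kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
sample="sda 120.50 512.00 2048.00 1048576 4194304"

# kB_wrtn/s is the fourth field
echo "$sample" | awk '{printf "kB_wrtn/s: %.2f\n", $4}'
```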
4.2.3 Cassandra-Stress tool
The Cassandra-stress tool is a Java-based stress utility for basic benchmarking and load testing of a Cassandra cluster. We use the stress tool to create a keyspace called keyspace1, which contains the table standard1, in each of the servers, and we observe the CPU and disk utilization of the respective servers using the sar and iostat tools. Creating the best data model requires significant load testing and multiple iterations. The Cassandra-stress tool helps us in this endeavor by populating our cluster and supporting stress testing of arbitrary CQL tables and arbitrary queries on tables.
4.2.4 Test-bed 1: Cassandra on Native Bare-metal Server
This is our first test-bed, which runs the Cassandra database on a native bare-metal server. The physical test-bed topology is shown below.
Figure 8 Cassandra in native bare-metal server
4.2.4.1 Test-bed details
The Test bed details are as follows
Operating system: Ubuntu 14.04 LTS (GNU/Linux 3.19.0-49-generic x86_64)
RAM: 23 GB
Hard disk: 279.4 GB
Processor: 12 cores, 2 threads per core (24 logical cores)
Cassandra: 3.0.8
Cassandra-stress tool: 2.1
Table 2 Test-bed details, bare-metal
4.2.4.2 Performance evaluation of Cassandra on native bare-metal servers
In our experimental test-bed, we have four hosts (one source and three destination hosts) with Cassandra 3.0.8 installed on each of them. The Cassandra package comes with a command-line stress tool (Cassandra-stress) to generate load on the cluster of servers; the cqlsh utility, a Python-based command-line client for executing Cassandra Query Language (CQL) commands; and the nodetool utility for managing a cluster. These tools are used to stress the servers from the client and to manage the data in the servers.
In order to perform a performance analysis of the Cassandra database, we consider three modes of operation of the database: read, write, and mixed load. Each of the servers has the same hardware, software, and network configuration. We ensure that the RAM and hard-disk state of each server is equivalent after each iteration of the experiment to ensure the integrity of the results.
The Cassandra database has access to the full hardware resources of the servers, as there is no other application competing for them. The Cassandra-stress tool creates a keyspace called keyspace1 and, within that, tables named standard1 or counter1 in each of the nodes. These are created automatically the first time the stress test is run and are reused on subsequent runs unless the keyspace is dropped using CQL. A write operation inserts data into the database and is performed prior to the load testing of the database. We use the standard tools sar and iostat to measure the load on each server.
To form the cluster across the three nodes, we install the database on each of the servers and set the listen address, RPC address, and broadcast address in the cassandra.yaml file to the IP address of the node. The IP address of one of the nodes is set as the seed IP address of the Cassandra cluster. Setting the seeds in the cassandra.yaml file allows the nodes to communicate with each other and form the cluster.
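A sketch of the cassandra.yaml entries described above, shown for the seed node itself (the address is the seed IP used in the commands later in this chapter; each node would use its own IP for the first three entries):

```yaml
# cassandra.yaml (fragment)
listen_address: 194.47.131.212
rpc_address: 194.47.131.212
broadcast_address: 194.47.131.212
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "194.47.131.212"
```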
As described in Sections 4.2.1 and 4.2.2, the %idle value reported by sar gives the percentage of time the CPUs were idle, from which we obtain the total CPU usage, while the amount of disk resources utilized is collected with iostat. We are primarily interested in the kB_wrtn/s value of iostat, which indicates the amount of data written to disk per second.
Using the stress tool, we generate write data on the cluster so that all nodes are preloaded with data. On this data set we run three cases (mixed, read, and write operations) for a duration of 20 minutes and record the CPU and disk utilization. Average values of CPU and disk utilization over 30-second intervals are taken for the entire 20-minute duration for the servers in the cluster. Latency is measured at the stress server and is the total time from when a write or read request leaves the stress server until an acknowledgement is received by that server.
4.2.5 Test-bed 2: Cassandra on docker
This is our second test-bed, which runs the Cassandra database in Docker containers. The physical test-bed topology is shown below.
Figure 9 Cassandra in Docker
4.2.5.1 Test-bed details
The configuration is as follows
Operating system: Ubuntu 14.04 LTS (GNU/Linux 3.19.0-49-generic x86_64)
RAM: 23 GB
Hard disk: 279.4 GB
Processor: 12 cores, 2 threads per core (24 logical cores)
Cassandra: 3.0.8
Docker: 1.11.2
Cassandra-stress tool: 2.1
Table 3 Test-bed details, Docker
4.2.5.2 Cassandra in docker
We use the Cassandra image from Docker Hub. The image is built from a Dockerfile and is available on the Docker Hub page for Cassandra. The image comes with a default script that runs when the Cassandra container is started; this script allows us to set, through environment variables, values in the cassandra.yaml file directly. To run Cassandra on the seed node we use the command:
$ docker run --name Cassandra -d -p 9042:9042 -p 7199:7199 -p 7000:7000 -e CASSANDRA_BROADCAST_ADDRESS=194.47.131.212 cassandra:3.0.8
This starts the container with the name Cassandra. The -d option runs the container in the background, the -e option sets an environment variable used to configure the cassandra.yaml file, and the -p option forwards traffic from the respective host port to the container. Finally, the Cassandra version we use is 3.0.8, the same as on the bare-metal server.
On the other two nodes we start Cassandra with:
$ docker run --name Cassandra -d -p 9042:9042 -p 7199:7199 -p 7000:7000 -e CASSANDRA_BROADCAST_ADDRESS=194.47.131.211 -e CASSANDRA_SEEDS=194.47.131.212 cassandra:3.0.8
$ docker run --name Cassandra -d -p 9042:9042 -p 7199:7199 -p 7000:7000 -e CASSANDRA_BROADCAST_ADDRESS=194.47.131.211 -e CASSANDRA_SEEDS=194.47.131.212 cassandra:3.0.8
The CASSANDRA_SEEDS value points to the IP address of the seed node, through which the cluster is formed and the nodes communicate with each other.
4.2.5.3 Performance evaluation of Cassandra in Docker
Once the Cassandra cluster is formed in Docker containers on the host OS, we consider the same three modes of operation of the database: read, write, and mixed load. Cassandra in Docker again has access to all the hardware resources of the servers, as there is no application competing for them.
We use the sar and iostat tools on the host operating system while running the database in the container. This gives us the CPU and disk resource overhead of deploying the database in a container. We consider the same parameters: CPU utilization from the %idle value of sar and disk utilization from the kB_wrtn/s value of iostat. We follow the same procedure as for the bare-metal evaluation.
4.3 Statistical and measurement based system analysis
Taking a random sample from a population and computing its mean gives an approximation of the population mean. How well the sample mean estimates the underlying population mean can be assessed by building confidence intervals. A confidence interval gives the range of values within which there is a specified probability that the value of a parameter lies.
A 95% confidence interval means that if we repeated the sampling on various occasions and computed an interval estimate on each occasion, we could expect the true population mean to fall within the interval 95% of the time.
Confidence intervals are calculated based on the standard deviation (deviation from the mean value), the sample size, and the confidence level.
The standard deviation is calculated using the formula:
σ = √( (1/N) · Σᵢ₌₁ᴺ (xᵢ − μ)² )
where μ = mean,
xᵢ = value of the i-th element in the sample,
N = total number of elements in the sample,
σ = standard deviation.
Then the z value is taken from the z table; for a 95% confidence interval the value is 1.96.
The population mean is estimated by the following formula:
Population mean (μ) = sample mean ± sample error
where sample error = (z · σ)/√n.
The confidence interval is the difference between the upper and lower bounds of the population mean.
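The calculation above can be sketched as a small script over an invented sample (z = 1.96 for 95% confidence; the values are illustrative only):

```shell
# Compute mean, standard deviation, and 95% confidence interval
# for a small sample using awk.
printf '%s\n' 10 12 11 13 14 | awk '
{ sum += $1; sumsq += $1 * $1; n++ }
END {
  mean = sum / n
  sd = sqrt(sumsq / n - mean * mean)   # population standard deviation
  err = 1.96 * sd / sqrt(n)            # sample error at 95% confidence
  printf "mean=%.2f sd=%.2f ci=[%.2f, %.2f]\n", mean, sd, mean - err, mean + err
}'
```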
4.4 Scenarios
There are three scenarios for evaluating the performance of the database, in both the Cassandra-in-Docker and Cassandra-on-bare-metal cases: the mixed load scenario, the read scenario, and the write scenario. Each scenario has two cases: the maximum load case and the 66%-of-maximum-load case. In all scenarios the database holds 11 GB of data, on which we perform the read, write, and mixed load (3 reads to 1 write) operations.
Figure 10 Scenarios
Scenario 1: Mixed load
From the stress server, a mixed load of 3 reads to 1 write is generated for a duration of 20 minutes on the cluster. The maximum load (the maximum number of operations per second on the cluster) is observed in the case of 450 threads, and this value is fixed to generate the load on the cluster. The mixed load is generated using the command:
$ cassandra-stress mixed ratio\(write=1,read=3\) duration=20m cl=ONE -pop dist=UNIFORM\(1..50000000\) -rate threads\=450 -node 194.47.131.212;
duration specifies the time to run the load, cl indicates the consistency level, -rate threads indicates the number of threads, and -node indicates the node to generate the load on. 66% of the maximum load is generated by changing the number of threads from 450 to 150. The command to generate it is:
$ cassandra-stress mixed ratio\(write=1,read=3\) duration=20m cl=ONE -pop dist=UNIFORM\(1..50000000\) -rate threads\=150 -node 194.47.131.212;
Scenario 2: Write load
To analyze the performance of write operations, a write load is generated on the cluster for a duration of 20 minutes. The maximum write load occurs in the case of 450 threads. The write load is generated using the command:
$ cassandra-stress write duration=20m cl=ONE -pop dist=UNIFORM\(1..50000000\) -rate threads\=450 -node 194.47.131.212;
66% of the maximum load is observed in the case of 150 threads, and it is generated using the command:
$ cassandra-stress write duration=20m cl=ONE -pop dist=UNIFORM\(1..50000000\) -rate threads\=150 -node 194.47.131.212;
Scenario 3: Read load
To analyze the performance of read operations, a read load is generated on the cluster for a duration of 20 minutes. The maximum read load occurs with 450 threads. The read load is generated using the command:
$ cassandra-stress read duration=20m cl=ONE -pop
dist=UNIFORM\(1..50000000\) -rate threads\=450 -node 194.47.131.212;
66% of the maximum load is observed with 150 threads and is generated using the command:
$ cassandra-stress read duration=20m cl=ONE -pop
dist=UNIFORM\(1..50000000\) -rate threads\=150 -node 194.47.131.212;
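The six invocations above can be collected into one driver script. The following is a minimal sketch as a dry run: it only prints the commands so they can be checked before committing to a 20-minute run, and it quotes the parentheses instead of backslash-escaping them. The node IP and thread counts are the ones used in this test bed; remove the leading echo to actually launch the load.

```shell
# Dry run: print the six cassandra-stress invocations for the three
# scenarios, at maximum load (450 threads) and 66% load (150 threads).
NODE=194.47.131.212
for THREADS in 450 150; do
  for WORKLOAD in "mixed ratio(write=1,read=3)" write read; do
    echo cassandra-stress $WORKLOAD duration=20m cl=ONE \
      -pop "dist=UNIFORM(1..50000000)" -rate threads=$THREADS -node $NODE
  done
done
```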
4.5 Metrics
The following metrics must be understood to monitor the system performance while running the Cassandra database.
Operations per second: the total number of write and/or read requests generated by the client per second.
Write request latency: the time taken for a write request to be executed and acknowledged by the cluster, measured at the client.
Read request latency: the time taken for a read request to be executed and a response returned by the cluster, measured at the client.
Cpu utilization: the percentage of processor time the server spends doing work, derived from the %idle value reported by sar.
Disk throughput: the amount of data written to or read from the disk per second, taken from the iostat output in KB/s.
5 RESULTS
5.1 Individual results
5.1.1 Experiment description for disk utilization
Disk utilization of a server for a process is the amount of disk space occupied by the
process during the run time of the process. In our experiment to evaluate the disk
utilization of the servers we generate the data-set from load generator into the cluster
having three Cassandra nodes. Each node is an individual server running Cassandra. The
Cassandra cluster is created by configuring the cassandra.yaml file in each of the
servers. The disk utilization is measured on the servers running the Cassandra database.
The disk usage of each of the servers while the load is generated on the cluster for
read, write and mixed operations is considered for this thesis work. The amount of disk
space used depends on the data stored in the physical memory before it is flushed to the
disk. If for some reason the memTables are not flushed to the disk, then there will be no disk utilization.
The data on disk is the product of memTable flushes and compactions together. Disk throughput depends on the rate of requests sent to the cluster: as the number of requests increases the load increases, as observed when comparing the maximum load case against the 66% of maximum load case. This is because more data is flushed to the disk and compactions occur more frequently.
Cassandra 3.0.8 is installed on four servers (one as load generator and three forming the Cassandra cluster). The load generator runs the different load cases of read, write and mixed load, and the write KB/s column of the iostat output gives the disk throughput. While the load is generated at the client, we run the iostat tool on the servers running the database. The values of disk throughput are collected using the bash command:
$ iostat -d 30 40 | grep sda | awk '{print $4}' > disk
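Once the samples are in the file disk (one write-KB/s value per line, as produced by the command above), the average throughput of a run can be computed directly in the shell; this is a small sketch assuming that file layout.

```shell
# Average the write-KB/s samples collected by iostat (one value per line
# in the file "disk").
awk '{ sum += $1; n++ } END { if (n) printf "avg write KB/s: %.2f\n", sum/n }' disk
```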
The same experiment is repeated for Docker by installing Cassandra inside a Docker container, forming a cluster by configuring cassandra.yaml, running the load generator for the read, write and mixed load cases and collecting the disk throughput values on the servers.
Scenario 1: mixed load operations
Mixed load of three reads and one write are taken for a duration of 20 minutes to
observe the server’s disk utilization.
Disk throughput for 100% load: The total number of operations per second pushed to the cluster is around 150,000. We use the default Cassandra configuration for memory and other parameters. The stress is generated over a cluster that holds 11 GB of data, on which we perform the mixed load of 3 reads and 1 write. The maximum disk throughput is 21795.61 KB/s for Cassandra in Docker and 21051.96 KB/s for Cassandra on bare metal.
[Time-series plot: Disk_throughput (Write Ops/sec) vs. Time (sec), series Baremetal_Disk_Usage and Docker_Disk_Usage.]
Figure 11 Disk Utilization for 100% Mixed load
• The values indicate that compactions are happening on the server's disk because memTables are being flushed.
• The disk throughput is around 5000-22000 op/s in both the Docker and bare-metal cases.
[Time-series plot: 95% confidence intervals for disk throughput (Write Ops/sec) vs. Time (sec), series Baremetal_Disk_Usage and Docker_Disk_Usage.]
Figure 12 95 % Confidence Intervals for disk throughput_100%_mixed
• This is the same graph as above with the confidence intervals. There is high standard deviation in the results because compaction can occur at any time, and this holds both for bare-metal servers and for Docker. Although compactions introduce ambiguity into the results, the overall average indicates similar performance for Docker and bare-metal servers.
Disk utilization for 66% load: At 66% of the maximum load, the total operations per second pushed to the cluster is around 100,000. The maximum disk throughput is 19020 KB/s for Docker and 19646 KB/s for bare metal.
[Time-series plot: Disk throughput (Write Ops/sec) vs. Time (sec), series Baremetal_Disk_Usage and Docker_Disk_Usage.]
Figure 13 Disk Throughput for 66% Mixed load
[Time-series plot: 95% confidence intervals for disk throughput (Write Ops/sec) vs. Time (sec), series Baremetal_Disk_Usage and Docker_Disk_Usage.]
Figure 14 95% Confidence Intervals for Disk throughput_66% Mixed Load
Average values for disk utilization: The average value for each iteration is calculated by summing all the values and dividing by the total number of values. Since we have done 10 iterations, the averages of the 10 iterations are then averaged to get the overall average disk utilization. We also divide each iteration into 2, 4 and 8 equal intervals (i.e., with 2 intervals the first covers 0-600 s and the second 600-1200 s, and so on), perform the same analysis as for the total average, and note the average value of each interval.
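The interval analysis described above can be scripted. The sketch below splits the samples in the file disk into N equal intervals (N = 2, 4 or 8) and prints the average of each, assuming one sample per line as collected earlier.

```shell
# Split the samples in "disk" into N equal intervals and print the
# average of each interval (N = 2, 4 or 8 as in the analysis above).
N=4
awk -v n="$N" '{ v[NR] = $1 } END {
  per = int(NR / n)
  for (i = 0; i < n; i++) {
    s = 0
    for (j = 1; j <= per; j++) s += v[i * per + j]
    printf "interval %d avg: %.2f\n", i + 1, s / per
  }
}' disk
```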
[Bar chart: average disk throughput (Write op/s) for BareMetal and Docker.]
Figure 15 Average disk throughput
[Bar chart: average disk throughput (Write op/s) over 8 intervals for BareMetal and Docker.]
Figure 16 Average disk utilization (8 intervals)
• The average value over 10 iterations is taken for both the bare-metal and Docker cases.
• Also, the entire duration is divided into 2, 4 and 8 intervals. Cassandra in Docker shows better disk throughput in the earlier intervals, while bare metal shows better disk throughput later in the run.
Scenario 2: Write operations
We generate a write load on the cluster for a duration of 20 minutes per experiment. The experiment has been run for 10 iterations for the 95% confidence interval analysis, to deal with stochastic variations caused by the kernel as well as measurement errors.
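The mean and 95% confidence interval across the 10 per-iteration averages can be computed in the shell as well. A minimal sketch, assuming the averages sit one per line in a file named iters (a name chosen here for illustration), and using the Student t value 2.262 for 9 degrees of freedom, since with only 10 samples the normal value 1.96 would understate the interval.

```shell
# Mean and 95% confidence interval half-width over the per-iteration
# averages in "iters" (one value per line); t = 2.262 for 9 d.o.f.
awk '{ s += $1; ss += $1 * $1; n++ } END {
  m = s / n
  sd = sqrt((ss - n * m * m) / (n - 1))   # sample standard deviation
  ci = 2.262 * sd / sqrt(n)               # half-width of the 95% CI
  printf "mean=%.2f ci=%.2f\n", m, ci
}' iters
```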
Disk throughput for 100% load:
[Time-series plot: Disk throughput (Write Ops/sec) vs. Time (sec), series Baremetal_Disk_Usage and Docker_Disk_Usage.]
Figure 17 Disk throughput for 100% Write Load
• The disk throughput of Cassandra on bare metal slightly outperforms that of Cassandra in Docker for the 100% write load, though the difference between the two is quite small.
[Time-series plot: 95% confidence interval for disk throughput (Write Ops/sec) vs. Time (sec), series Baremetal_Disk_Usage and Docker_Disk_Usage.]
Figure 18 95% Confidence Interval for disk throughput_100% write load
[Time-series plot: Disk throughput (Write Ops/sec) vs. Time (sec), series Baremetal_Disk_Usage and Docker_Disk_Usage.]
Figure 19 Disk throughput for 66% Write load
[Time-series plot: 95% confidence interval for disk throughput (Write Ops/sec) vs. Time (sec), series Baremetal_Disk_Usage and Docker_Disk_Usage.]
Figure 20 95% Confidence Interval for disk throughput_66% write
• The disk throughput in Docker is slightly lower than on the bare-metal server. The overhead, though present, is low compared to virtual machines.
• The standard deviation from the mean is quite high for disk throughput because of compactions.
Scenario 3: Read operations
The Cassandra read path is different from the write path. A read is first served from the data held in memory; if the data is found in memory, the disk is not searched. In our case the entire data set fits in memory, so there is almost no disk throughput for read operations.
[Time-series plot: Disk throughput (Write Ops/sec) vs. Time (sec), series Baremetal_Disk_Usage and Docker_Disk_Usage.]
Figure 21 Disk throughput for 100% Read load
5.1.2 Experiment description for latency:
The latency of the operations is the time taken for the operations (either read or
write) to be executed and generate a response from the cluster running Cassandra. The
latency values are noted at the load generator server.
The latency of operations is noted while the load is generated on the cluster for read, write and mixed operations. The results for the different load cases and operations are shown below. As the load increases, the latency of the operations increases because more requests per second must be served and responded to.
The overhead in the case of Docker can be due to the added port forwarding: we connect ports 7000 and 9042 of the physical host to the container, which could add a network overhead.
Scenario 1: Mixed load operations: We generate a mixed load of 3 reads and 1 write on a cluster holding 11 GB of data for a duration of 20 minutes. The average latency of the operations is taken for one iteration; we perform 10 such iterations and take the average latency across them as the result.
[Bar chart: mean latency at maximum mixed load: Bare Metal 3.13 ms, Docker 3.13 ms.]
Figure 22 Max mixed load latency (in ms)
[Bar chart: mean latency at 66% of maximum mixed load: Bare Metal 1.4 ms, Docker 1.45 ms.]
Figure 23 66% of Max load latency (in ms)
• The mean value of the latency is in milliseconds.
Write operations: We generate a write load on the cluster holding 11 GB of data, run for a duration of 20 minutes. We take 10 such iterations and perform mean value analysis.
[Bar chart: mean latency at maximum write load: Bare Metal 3.07 ms, Docker 3.29 ms.]
Figure 24 Max load write operations latency
[Bar chart: mean latency at 66% of maximum write load: Bare Metal 1.38 ms, Docker 1.49 ms.]
Figure 25 66% of max load write operations latency
• The average latency over 10 iterations for maximum load write operations is 3.07 ms for bare metal and 3.29 ms for Docker; at 66% load it is 1.38 ms for bare metal and 1.49 ms for Docker. This means that Docker adds about 6-7% overhead in the case of write operations.
5.2 Common results:
These are the results of CPU utilization experiment performed by Avinash. The
methodology is the same as in the case of Disk throughput.
5.2.1 Cpu utilization:
It is the share of processor time utilized by the process. In our experiment we run only one process inside the servers to evaluate the performance of the database. All the experiments are on the cluster of three nodes with 11 GB of data.
Cassandra 3.0.8 is installed on the servers and the cluster is formed with three nodes (servers running the database). The load generator runs the different load cases of read, write and mixed load, and the CPU utilization of the servers is derived from the %idle value of the sar tool, which gives the percentage of time the CPU is idle; 100 - %idle gives the CPU utilization. It is obtained by running the following command in the bash terminal:
$ sar -u 30 40 | awk '{print $8 "\t" $9}' > cpu
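Since sar reports the time the CPU is idle, the utilization of each sample is 100 - %idle. A small sketch of that conversion, assuming the %idle value ends up in the second whitespace-separated column of the collected cpu file:

```shell
# Convert %idle samples into CPU utilization (100 - %idle). Assumes the
# %idle value is the second whitespace-separated field in "cpu".
awk 'NF >= 2 && $2 ~ /^[0-9.]+$/ { printf "%.2f\n", 100 - $2 }' cpu
```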
The CPU utilization values are averaged over 30-second windows for 40 intervals, making the total run time 1200 seconds, or 20 minutes. This experiment is repeated for Cassandra in Docker and the values are noted. The graphs below show these results for the various workloads.
These results show the essence of the CPU utilization experiment performed by Avinash. A more detailed analysis and results are presented in his thesis document.
[Time-series plot: CPU_Utilization (%) vs. Time (sec), series Baremetal_CPU_Usage and Docker_CPU_Usage.]
Figure 26 CPU Utilization for 100% Mixed load
[Time-series plot: 95% confidence intervals for CPU utilization (%) vs. Time (sec), series Baremetal_CPU_Usage and Docker_CPU_Usage.]
Figure 27: 95% Confidence Intervals for CPU utilization_100%_mixed load
• The CPU utilization is given in terms of total CPUs utilized. The results are similar for both Docker and bare metal.
The confidence interval values for CPU utilization are quite small, showing that the standard deviation is low.
Average values for mixed load:
[Bar chart: average CPU utilization (%) for BareMetal and Docker.]
Figure 28 Average CPU utilization
[Bar chart: average CPU utilization (%) over 8 intervals for BareMetal and Docker.]
Figure 29 Average value of CPU utilization
Latency: The average values below show the latency of operations in the read scenario.
Read operations:
[Bar chart: mean latency at maximum read load: Bare Metal 3 ms, Docker 2.82 ms.]
Figure 30: Max Load Read operations latency
[Bar chart: mean latency at 66% of maximum read load: Bare Metal 1.36 ms, Docker 1.3 ms.]
Figure 31: 66% of max load read operation
5.3 Discussion
Disk utilization: In the mixed load case, the average disk utilization of the servers running Cassandra in Docker was higher: the mean value is 9675.87 for bare metal and 10234.487 for Docker. Though this is an anomaly, it is possible that running the experiment for a longer time would show equivalent disk utilization for Cassandra in Docker and on bare metal, since the averages over 2, 4 and 8 intervals show that bare-metal performance is better later in the run.
From the mean value analysis, we speculate that the performance will be the same for Cassandra in Docker and on bare metal. There is high standard deviation in the results because compaction can occur at any time, and this is true for both bare-metal servers and Docker.
In the case of the write workload, disk utilization is greater on bare metal, and the time-series analysis shows that it is consistently better in both the maximum load and 66% load cases. The standard deviation is nevertheless quite high for both scenarios, so it is not possible to predict the exact overhead of running the database in containers. Though some overhead may be present, it appears the overall overhead would be small.
Latency: The average latency in the mixed load case is the same for Docker and bare metal at maximum load. In the 66% case, the average latency over 10 iterations is 1.4 ms for bare metal and 1.45 ms for Docker.
For write operations, the average latency over 10 iterations at maximum load is 3.07 ms for bare metal and 3.29 ms for Docker; at 66% load it is 1.38 ms for bare metal and 1.49 ms for Docker. This means that Docker adds about 6-7% overhead in the case of write operations.
For reads, the latency at maximum load is 3 ms for bare metal and 2.82 ms for Docker; at 66% load it is 1.36 ms for bare metal and 1.3 ms for Docker.
Cpu utilization: In the mixed load case, the CPU utilization of the bare-metal server running Cassandra is 63.43% and of Docker running Cassandra 63.66%. The CPU utilization is slightly higher for the write and read workloads, but the confidence intervals largely overlap, showing that the values are almost the same.
6 CONCLUSION AND FUTURE WORK
The purpose of this thesis was to monitor the disk performance and latency of the
Cassandra database in the case of bare-metal servers and compare its performance
with the case of containers. Standard tools for monitoring the performance of servers
– sar for cpu utilization and iostat for disk utilization were used. The overhead in the
case of containers would decide the feasibility of using containers for running the
database.
Firstly, from the results, server disk utilization in the bare-metal and Docker cases showed equivalent values for the mixed load, with Cassandra on bare metal slightly outperforming. For the latency of operations there is an overhead for Cassandra in Docker in the write and mixed load cases. Different database operations and load cases are used to analyze the performance of the database, and time-series values for each case are graphed. This shows the variability of the database's performance over time, and the average values are compared for the Docker and bare-metal cases. The limitation of this methodology is that our results by themselves do not show how compactions affect the disk utilization. One way to analyze the database more deeply would be to check the nodetool compactionhistory values, see the number of compactions happening, and check whether they correspond to the disk utilization results, so as to show that the disk utilization at a particular instant is due to compaction and to explain the anomalies where Cassandra in Docker outperformed Cassandra on bare metal. By combining the system analysis results with observations of Cassandra-based metrics like compaction history, we can create a more complete methodology for performance analysis of the database.
The research questions for the thesis are answered by the results:
6.1 Answer to research questions:
• Research question 1: What is the methodology for analyzing the
performance of databases in virtual machines?
The answer to this question is provided in the methodology section [see
section 4]. The various operations (read, write and mixed) for the database
are considered and for each operation of the database the disk utilization of
the database is monitored using sar and iostat tools for both Cassandra in
bare-metal and Cassandra in docker.
• Research question 2: What is the Disk utilization of the server while running
the database in bare-metal case and in containers?
This research question can be answered using the results from the
experiment for disk utilization given in section 5.1. The disk utilization of
the servers is collected by the iostat command tool and statistical averages
are performed on the results to provide the mean value analysis and
confidence intervals. The results show that the disk utilization for the mixed
load scenario is equivalent in both the cases of docker and bare-metal, write
disk utilization for bare-metal is slightly higher when compared to docker
containers and read disk utilization is almost zero because all the data is
read from the memory.
• Research question 3: What is the latency of operations while running the database in the bare-metal case and in containers?
The experiment for latency of operations [see section 5.2] is designed to
answer this research question. The latency of operations is collected at the
stress generator, the values for mixed load indicate a slight overhead in the
case of containers, values in the read and write load cases indicate a 6-8%
overhead in the case of containers.
• Research question 4: How does this performance vary with the different
load scenarios?
The answer to this question can be found in section 5. Two different load
cases were considered for bare metal and docker. In both the load cases the
performance of the database was similar.
Future Work:
Future work can be done on performance tuning the Cassandra database depending on the scenarios, on its performance under different compaction strategies and compression, and on the performance of the database while running other applications on the servers.
APPENDIX
This appendix shows the status and configuration of the Cassandra database used for the monitoring and analysis of disk throughput and latency in servers running the Cassandra database.
Network configuration (cassandra.yaml file):
Figure 32: Seeds ip address
Figure 33: Listen address
Figure 34 Broadcast address
Figure 35 Rpc address
Figure 36 Latency operations
Figure 37: Nodetool status