BaseSpace User Guide

Supporting the NextSeq, MiSeq, and HiSeq Sequencing Systems
FOR RESEARCH USE ONLY
Introduction 3
How Do I Start 8
BaseSpace User Interface 13
How To Use BaseSpace 25
Workflow Reference 60
Data Reference 63
Technical Assistance
ILLUMINA PROPRIETARY
Part # 15044182 Rev. D
June 2014
This document and its contents are proprietary to Illumina, Inc. and its affiliates ("Illumina"), and are intended solely for the
contractual use of its customer in connection with the use of the product(s) described herein and for no other purpose. This
document and its contents shall not be used or distributed for any other purpose and/or otherwise communicated, disclosed,
or reproduced in any way whatsoever without the prior written consent of Illumina. Illumina does not convey any license
under its patent, trademark, copyright, or common-law rights nor similar rights of any third parties by this document.
The instructions in this document must be strictly and explicitly followed by qualified and properly trained personnel in order
to ensure the proper and safe use of the product(s) described herein. All of the contents of this document must be fully read
and understood prior to using such product(s).
FAILURE TO COMPLETELY READ AND EXPLICITLY FOLLOW ALL OF THE INSTRUCTIONS CONTAINED HEREIN
MAY RESULT IN DAMAGE TO THE PRODUCT(S), INJURY TO PERSONS, INCLUDING TO USERS OR OTHERS, AND
DAMAGE TO OTHER PROPERTY.
ILLUMINA DOES NOT ASSUME ANY LIABILITY ARISING OUT OF THE IMPROPER USE OF THE PRODUCT(S)
DESCRIBED HEREIN (INCLUDING PARTS THEREOF OR SOFTWARE) OR ANY USE OF SUCH PRODUCT(S) OUTSIDE
THE SCOPE OF THE EXPRESS WRITTEN LICENSES OR PERMISSIONS GRANTED BY ILLUMINA IN CONNECTION
WITH CUSTOMER'S ACQUISITION OF SUCH PRODUCT(S).
© 2011-2014 Illumina, Inc. All rights reserved.
Illumina, 24sure, BaseSpace, BeadArray, BlueFish, BlueFuse, BlueGnome, cBot, CSPro, CytoChip, DesignStudio,
Epicentre, GAIIx, Genetic Energy, Genome Analyzer, GenomeStudio, GoldenGate, HiScan, HiSeq, HiSeq X, Infinium,
iScan, iSelect, ForenSeq, MiSeq, MiSeqDx, MiSeq FGx, NeoPrep, Nextera, NextBio, NextSeq, Powered by Illumina,
SeqMonitor, SureMDA, TruGenome, TruSeq, TruSight, Understand Your Genome, UYG, VeraCode, verifi, VeriSeq, the
pumpkin orange color, and the streaming bases design are trademarks of Illumina, Inc. and/or its affiliate(s) in the U.S. and/or
other countries. All other names, logos, and other trademarks are the property of their respective owners.
Introduction
BaseSpace is a genomics analysis platform that is directly integrated into the NextSeq,
MiSeq, and HiSeq sequencing platforms. When setting up runs on your sequencing
instrument, you can send the run to BaseSpace. The instrument then sends the base call
(*.bcl) files, as well as associated files, to your dedicated space on the cloud. On the
HiSeq, you can also choose Run Monitoring Only, which sends only the files needed for
remote monitoring of the run to BaseSpace.
NOTE
This user guide supports data analysis for the NextSeq, MiSeq, and HiSeq sequencing
systems, and contains information about the Prep tab, which is used to set up a NextSeq
sequencing run.
This user guide is specific for BaseSpace running in the cloud, and is not intended for the
on-premise implementation, BaseSpace Onsite.
The instrument seamlessly pushes the data to BaseSpace for automatic analysis and
storage, with the option of retaining data for local analysis and hosting. There is no need
for a manual and time-consuming data-transfer step: the data are already up in the
cloud, for you and your collaborators to access anywhere, anytime.
BaseSpace can automatically run analysis jobs using the Illumina MiSeq workflow apps.
BaseSpace also allows you to use third-party apps to analyze your data. In addition,
BaseSpace provides a mechanism to share data with others and easily scale storage and
computing needs.
For more information about BaseSpace, see the BaseSpace Data Sheet.
Workflow Model
Prep Run on NextSeq
BaseSpace enables you to prep runs for NextSeq sequencing.
The prep workflow in BaseSpace consists of four steps:
} Biological Samples: Define the samples that are going to be sequenced.
} Libraries: Define the libraries, which consist of biological samples that are prepped
and contain adapters. Each library usually derives from a single biological sample,
though biological samples can be used in multiple libraries.
} Pools: Group libraries into pools that share analysis parameters. Pools can consist of
one or multiple libraries.
} Planned Runs: Define run parameters for the pool, then send the planned run to the
NextSeq.
You can now start the run from the instrument.
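The four prep steps above can be pictured as a small object model. The following is a minimal sketch, not BaseSpace code: the class names, fields, and the send_to_instrument method are hypothetical, and only the state names ("Planning", "Ready to Sequence") come from this guide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BiologicalSample:
    sample_id: str


@dataclass
class Library:
    library_id: str
    sample: BiologicalSample  # each library usually derives from one sample


@dataclass
class Pool:
    name: str
    libraries: List[Library] = field(default_factory=list)


@dataclass
class PlannedRun:
    experiment_name: str
    pools: List[Pool] = field(default_factory=list)
    state: str = "Planning"

    def send_to_instrument(self) -> None:
        """Mark the run as ready; hypothetical stand-in for sending it to the NextSeq."""
        if not any(pool.libraries for pool in self.pools):
            raise ValueError("a planned run needs at least one pooled library")
        self.state = "Ready to Sequence"


sample = BiologicalSample("S1")
# One biological sample can be used in more than one library.
libraries = [Library("L1", sample), Library("L2", sample)]
run = PlannedRun("demo", pools=[Pool("P1", libraries)])
run.send_to_instrument()
print(run.state)
```

The sketch only illustrates how samples, libraries, pools, and planned runs nest; the actual setup is done in the Prep tab.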
Figure 1 Prep Workflow
Data Processing
Processing a flow cell on a sequencing instrument produces various files, collectively
referred to as a run. A run contains log files, instrument health data, run metrics, sample
sheet, and base call information (*.bcl files). The base call information is demultiplexed
in BaseSpace to create the samples used in secondary analysis.
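As a rough illustration of what demultiplexing means, the sketch below groups reads by their index sequence. It is a simplified assumption for explanation only: it works on already-decoded (index, sequence) pairs with exact matching, whereas BaseSpace operates on the *.bcl files and the sample sheet. The function name and the "Undetermined" label are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def demultiplex(reads: Iterable[Tuple[str, str]],
                index_to_sample: Dict[str, str]) -> Dict[str, List[str]]:
    """Group (index, sequence) reads by the sample that owns each index."""
    by_sample: Dict[str, List[str]] = defaultdict(list)
    for index, sequence in reads:
        # Reads whose index matches no sample are collected separately.
        sample = index_to_sample.get(index, "Undetermined")
        by_sample[sample].append(sequence)
    return dict(by_sample)


reads = [("ATCACG", "GATTACA"), ("CGATGT", "TTGACCA"), ("NNNNNN", "ACGT")]
samples = demultiplex(reads, {"ATCACG": "Sample1", "CGATGT": "Sample2"})
print(samples)
```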
Samples are analyzed automatically using the Illumina workflow apps as specified in
the sample sheet, or by launching custom BaseSpace apps. BaseSpace apps are
processing software and routines that interact with BaseSpace data through the API.
User-level authentication and in-flight data encryption are enforced for every app that
requests access to BaseSpace data.
The result files from an app session are stored in an analysis. Analyses are created to
record every time an app is launched. For example, when a resequencing app executes
alignment and variant calling, an analysis is created that contains the app results for
each sample. App results generally contain BAM and VCF files, but they can also
contain other file types. App results can also be used as inputs to apps.
Finally, projects are simple containers that store samples and analyses.
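The relationships described above (a project holds samples and analyses; each app launch creates one analysis; an analysis holds app results per sample) can be sketched as follows. All class and method names here are hypothetical and are not part of the BaseSpace API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AppResult:
    sample_name: str
    files: List[str]


@dataclass
class Analysis:
    app_name: str
    app_results: List[AppResult] = field(default_factory=list)


@dataclass
class Project:
    name: str
    samples: List[str] = field(default_factory=list)
    analyses: List[Analysis] = field(default_factory=list)

    def record_launch(self, app_name: str,
                      files_by_sample: Dict[str, List[str]]) -> Analysis:
        """Create one analysis per app launch, with one app result per sample."""
        unknown = set(files_by_sample) - set(self.samples)
        if unknown:
            raise KeyError(f"samples not in project: {sorted(unknown)}")
        analysis = Analysis(app_name, [AppResult(sample, files)
                                       for sample, files in files_by_sample.items()])
        self.analyses.append(analysis)
        return analysis


project = Project("Demo", samples=["S1", "S2"])
project.record_launch("Resequencing", {"S1": ["S1.bam", "S1.vcf"],
                                       "S2": ["S2.bam", "S2.vcf"]})
print(len(project.analyses), len(project.analyses[0].app_results))
```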
Figure 2 BaseSpace Data Model
BaseSpace Security Model
Data security is a key concern in deciding to move to cloud-based genomic storage and
analysis. Illumina BaseSpace is hosted on Amazon Web Services (AWS) and provides a
combination of Amazon’s comprehensive and well-tested approach to platform security,
overlaid with Illumina’s own security testing and procedures. These procedures include
reviews and tests by independent security professionals. This cloud genomics solution
meets or exceeds the security provided by many institutional IT infrastructures.
Amazon Web Services
Illumina works with AWS, the leader in cloud-based infrastructure. AWS hosts customer-facing services and critical operations for both private industry and U.S. government
departments including Treasury, DOE, and State. Amazon security processes and
standards are publicly available for review. AWS standards and accreditation include:
} SOC 1/SSAE 16/ISAE 3402 (auditing)
} FISMA moderate (U.S. Federal Government; for reference, the NIH data centers are
rated FISMA moderate)
} PCI DSS Level 1 (electronic payments)
} ISO 27001 (international security standard)
} FIPS 140-2 (encryption)
Additionally, security staff and controlled access procedures protect AWS data centers.
Staff with system access undergo background checks, and all hardware is located
behind firewalls that are configured by default to block all traffic. Operating security
patches are automatically applied to AWS servers, including BaseSpace servers. AWS
actively monitors its firewalls to check for vulnerabilities, a service beyond the resources
of most institutions. BaseSpace encrypts all data, something else that is rarely done in
the institutional IT setting.
BaseSpace Data Stream Software
Illumina sequencing instruments have on-board control and workflow software. This
software includes a robust data-streaming component, which acts as a software broker
with the BaseSpace API. The broker allows individual base call (*.bcl) files to be sent over
an encrypted connection, verified, and assembled into samples for analysis in real time
as the sequencing run is conducted. Real-time monitoring of data generated by one
sequencing instrument or a federation of instruments is possible through the BaseSpace
interface.
The instrument control software does not allow publicly addressable inbound
communications. All communication is made through standard https requests initiated
by the user at the instrument. Each data-upload transaction is linked to an authenticated
user account.
BaseSpace Apps
There are two different types of apps in BaseSpace:
} Sample Sheet Driven Workflow Apps (for MiSeq only): launching these Illumina
workflow apps is specified in the sample sheet, and BaseSpace starts them
automatically. The sample sheet driven workflow apps primarily perform secondary
analysis or simple file manipulations. They consist of the following:
• Resequencing
• Amplicon Analysis
• Library QC
• SmallRNA
• Metagenomics
• De Novo Assembly
• Generate FASTQ
See the Workflow Reference on page 60 for descriptions of the workflow apps.
} Custom BaseSpace apps: these apps are actively launched to analyze data. In
general, these apps perform tertiary analysis, visualization, or annotation of data.
Both third-party vendors and Illumina can generate them. There could be additional
costs associated with running a BaseSpace app from a third-party vendor. BaseSpace
apps can require the AppResults from an Illumina workflow app as input.
BaseSpace API
Sequencing data has traditionally been stored in non-centralized locations, which offer
little uniformity of data management across locations. BaseSpace transforms data
management by creating an environment with large stores of sequencing data, which
can be easily accessed and analyzed online with a store of applications. Third-party
vendors can develop their own apps for BaseSpace. BaseSpace offers the following
benefits as a development platform:
} Easy access to data: sequencing data are automatically uploaded from instrument to
BaseSpace
} Consistent data retrieval: with a few lines of code, access data with the BaseSpace
API and SDKs
} Write once, execute often: when an app is written and published, all users can
launch it
} Flexible billing: apps can bill customers as little or as much as they wish
} Highly scalable: data storage and analysis scales because BaseSpace is built on
Amazon Web Services (AWS)
} Flexible app hosting: apps can be hosted on any website, desktop, or mobile
application, or inside BaseSpace as a Native App.
} Easy sharing: users can easily share their data and results
Developers can create apps using the BaseSpace Application Programming Interface
(API) or the Software Development Kits (SDKs) available for Java, R, Ruby, and Python.
Both approaches offer safe and easy access to BaseSpace data for Apps to analyze,
visualize, monitor, etc.
Apps access data using the BaseSpace RESTful API. The API can be accessed via simple
HTTPS requests using any programming language and is organized to allow you to get
to the data you need quickly.
The SDKs are available for several programming languages and make it even easier for
developers to write applications or to integrate existing ones. The SDKs work by
exposing what the API has to offer natively without the developer needing to worry
about building their own HTTPS requests. This tool allows rapid development and
integration with BaseSpace data and a simple mechanism of discovering what the API
has to offer.
For more information, see the BaseSpace API documentation.
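As a minimal sketch of what "simple HTTPS requests" can look like, the following builds (but does not send) an authenticated GET request using only the Python standard library. The base URL, the users/current resource, and the x-access-token header are assumptions that do not appear in this guide; confirm them against the BaseSpace API documentation before use. The token is a placeholder.

```python
import urllib.request

# Assumed values, not taken from this guide: verify in the BaseSpace API docs.
API_BASE = "https://api.basespace.illumina.com/v1pre3"


def build_request(resource: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request for a resource."""
    url = f"{API_BASE}/{resource.lstrip('/')}"
    return urllib.request.Request(
        url, headers={"x-access-token": access_token}, method="GET")


request = build_request("users/current", "YOUR_ACCESS_TOKEN")
print(request.get_method(), request.full_url)
# To send the request, pass it to urllib.request.urlopen().
```

The SDKs wrap this kind of request construction so that app code does not have to assemble URLs and headers itself.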
How Do I Start
You can reach BaseSpace from basespace.illumina.com/home/index. In this section, we
discuss the different ways to get started with BaseSpace.
Use your MyIllumina account to log on; the first time you visit you are asked to accept
the BaseSpace agreement. After that, you are ready to run BaseSpace.
Exploring BaseSpace on page 8
Not Uploading Yet on page 8
Using BaseSpace with MiSeq on page 9
Using BaseSpace with HiSeq on page 10
Using BaseSpace with NextSeq on page 11
Getting Shared Data on page 12
Exploring BaseSpace
If you want to explore BaseSpace, go to basespace.illumina.com/home/index and click the
Get Started link. Register to set up a new account: fill out the form, and indicate whether
you want access to product and support resources.
We send you a confirmation email to the email account you entered. Open that email,
and click the confirmation link. Now you are ready to start testing BaseSpace with the
test data we uploaded for you. For information about how to run certain tasks, see How
To Use BaseSpace on page 25.
When you feel comfortable with BaseSpace, and you have a BaseSpace-equipped HiSeq
or MiSeq, you can start uploading data and running analyses from your sequencing
instrument. Raw data from the run is also stored on the instrument, or in the location of
the output folder that you specified in Run Options.
Not Uploading Yet
If you have a sequencing instrument but you are not uploading data yet, you can start by
exploring BaseSpace. Go to basespace.illumina.com and log on; there are two ways to do
that:
} Use your MyIllumina account to log on; the first time you visit you are asked to
accept the BaseSpace agreement. After that, you are ready to run BaseSpace.
} Explore BaseSpace by taking a test drive; register to set up a new account. See
Exploring BaseSpace on page 8.
Now you are ready to start testing BaseSpace with the test data we uploaded for you. For
information about how to run tasks, see How To Use BaseSpace on page 25.
When you feel comfortable with BaseSpace, you can start uploading data and running
analyses from your sequencing instrument. Raw data from the run is also stored on the
instrument, or in the location of the output folder that you specified in Run Options.
Alternatively, you can elect to upload only health data to BaseSpace. Health data helps
Illumina improve the sequencing instruments and BaseSpace; for more information,
see Health Runs on page 89.
See the MiSeq System User Guide, NextSeq System User Guide, or HiSeq User Guide for
instructions for setting up your sequencing instrument.
Using BaseSpace with MiSeq
BaseSpace is the Illumina analysis cloud environment. Using BaseSpace to store and
analyze your run data provides the following benefits:
} Eliminates the need for onsite storage and computing
} Enables web-based data management and analysis
} Provides tools for global collaboration and sharing
In this section, we discuss the different ways to get started with BaseSpace when
uploading data and analysis from the MiSeq.
You can reach BaseSpace by going to basespace.illumina.com. Use your MyIllumina
account to log on; the first time you visit you are asked to accept the BaseSpace
agreement. After that, you are ready to run BaseSpace.
When you set up the run on the MiSeq, select the option to log in to BaseSpace. If you
have a problem with the data upload between MiSeq and BaseSpace, see MiSeq
Connection on page 9.
NOTE
Raw data from the run is also stored on the instrument, or in the location of the output
folder that you specified in Run Options.
BaseSpace automatically disconnects from the MiSeq at the end of the run or as soon as
all primary analysis files have finished uploading. If the internet connection is
interrupted, analysis files resume uploading from the point of interruption after the
connection is restored.
As soon as the last base call file is uploaded to BaseSpace, secondary analysis of your
data begins. The same analysis workflows are supported on BaseSpace as with on-instrument
analysis using MiSeq Reporter. For information about how to run tasks, see
How To Use BaseSpace on page 25.
MiSeq Connection
If the MiSeq data are not uploaded to BaseSpace, check the following things.
1 Make sure that you have a stable internet connection of at least 10 Mbps upload
speed from the MiSeq.
2 When setting up runs on the MiSeq, you can log in to BaseSpace, and use
BaseSpace for storage and analysis. Make sure that option is checked.
When you begin your sequencing run on the MiSeq, the BaseSpace icon changes to
indicate that the MiSeq is connected to BaseSpace and data files are being transferred.
Figure 3 Connected to BaseSpace Icon on the MiSeq
For more information, see the MiSeq System User Guide.
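To put the 10 Mbps minimum upload speed in perspective, the following sketch estimates the transfer time for a run of a given size. The 5 GB run size is a hypothetical figure chosen for illustration, decimal units are assumed (1 GB = 1000 MB), and protocol overhead is ignored, so real uploads take somewhat longer.

```python
def upload_hours(run_size_gb: float, upload_mbps: float = 10.0) -> float:
    """Estimate hours needed to upload run_size_gb at upload_mbps (megabits/s)."""
    if upload_mbps <= 0:
        raise ValueError("upload speed must be positive")
    megabits = run_size_gb * 1000 * 8  # 1 GB = 1000 MB = 8000 megabits
    return megabits / upload_mbps / 3600


# Hypothetical 5 GB run at the minimum recommended speed.
print(f"{upload_hours(5.0):.2f} h")  # prints "1.11 h"
```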
When your sequencing instrument is uploading data, go to Using BaseSpace with MiSeq
on page 9, Using BaseSpace with NextSeq on page 11, or Using BaseSpace with HiSeq on
page 10 to start the analysis of your run.
Using BaseSpace with HiSeq
BaseSpace Connectivity—The HiSeq features an option to send instrument health and
sequencing data to BaseSpace in real time to streamline both instrument quality control
and analysis. Real-time monitoring of runs enables fast troubleshooting. BaseSpace
facilitates collaboration by enabling you to share results instantly with anyone anywhere
in the world. Free alignment and variant calling and the soon-to-be-launched BaseSpace
app store provide many easy-to-use workflows that tailor analysis for diverse biological
applications.
BaseSpace is the Illumina analysis cloud environment. Using BaseSpace to store and
analyze your run data provides the following benefits:
} Eliminates the need for onsite storage and computing
} Enables web-based data management and analysis
} Provides tools for global collaboration and sharing
In this section, we discuss the different ways to get started with BaseSpace when
uploading data and analysis from the HiSeq.
You can reach BaseSpace by going to basespace.illumina.com. Use your MyIllumina
account to log on; the first time you visit you are asked to accept the BaseSpace
agreement. After that, you are ready to run BaseSpace.
When you set up the run on the HiSeq, select the option to log in to BaseSpace. If you
have a problem with the data upload between HiSeq and BaseSpace, see HiSeq Connection
on page 10.
NOTE
Raw data from the run is also stored on the instrument, or in the location of the output
folder that you specified in the Storage screen.
BaseSpace automatically disconnects from the HiSeq at the end of the run or as soon as
all primary analysis files have finished uploading. If the internet connection is
interrupted, analysis files resume uploading from the point of interruption after the
connection is restored.
As soon as the last base call file is uploaded to BaseSpace, secondary analysis of your
data begins. For information about how to run tasks, see How To Use BaseSpace on page
25.
HiSeq Connection
If the HiSeq data are not uploaded to BaseSpace, check the following things.
1 Make sure that you have a stable internet connection of at least 10 Mbps upload
speed from the HiSeq.
2 The Storage screen during run configuration on the HiSeq enables you to define
where your run data are output and stored. Select the options:
• Connect to BaseSpace: When you select this option, you are prompted to enter
your MyIllumina account information. Zip BCL files is selected by default.
Illumina recommends that you also save files locally. To save files locally, select
Save to an output folder and enter a path, usually to a local network folder.
• Storage and analysis: This option enables the HiSeq to send run data as well as
system health information to BaseSpace.
Figure 4 Storage Screen
3 If BaseSpace is not available, open Windows Services and start or restart Illumina
BaseSpace Broker:
a Click the Windows Start button.
b Right-click Computer, and select Manage.
c On the left, under Services and Applications, choose Services.
d Scroll down the list to find Illumina BaseSpace Broker.
e Right-click Illumina BaseSpace Broker and do one of the following:
- Click Start if this option is not grayed out.
- If the Start option is grayed out, click Restart.
The service starts, or stops and then restarts.
f Close the Computer Management window.
NOTE
To use BaseSpace, you have to load a sample sheet at the start of your run.
For more information, see the HiSeq User Guide.
When you begin your sequencing run on the HiSeq, the BaseSpace icon changes to
indicate that the HiSeq is connected to BaseSpace and data files are being transferred.
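The start-or-restart decision for the Illumina BaseSpace Broker service in the HiSeq Connection steps can also be expressed as commands. The sketch below only builds the command lines and does not run them. It assumes the standard Windows net start and net stop commands, an administrator prompt, and that the service's display name is exactly the name shown in the Services list; none of this is prescribed by this guide.

```python
from typing import List

# Service name as shown in the Windows Services list in this guide.
SERVICE = "Illumina BaseSpace Broker"


def broker_commands(is_running: bool) -> List[List[str]]:
    """Return the commands to start the broker, or to restart it if running."""
    start = ["net", "start", SERVICE]
    if is_running:
        # Equivalent of Restart: stop the service, then start it again.
        return [["net", "stop", SERVICE], start]
    return [start]


for command in broker_commands(is_running=True):
    print(" ".join(command))
```

Each returned list could be passed to subprocess.run on the instrument computer, but the Services window described above remains the documented route.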
Using BaseSpace with NextSeq
BaseSpace is the Illumina analysis cloud environment. BaseSpace facilitates your
experiments on the NextSeq system in two different ways:
} BaseSpace helps to organize your samples and experiments, and preps runs for
NextSeq.
} BaseSpace stores and analyzes your run data, providing the following benefits:
• Eliminates the need for on-site storage and computing
• Enables web-based data management and analysis
• Provides tools for global collaboration and sharing
If your NextSeq sequencing system and BaseSpace do not connect properly, check the
following:
} Make sure that you have a stable connection of at least 10 Mbps upload speed from
the NextSeq.
} From the Manage Instrument screen, select System Configuration to access a series
of screens that configure the connection to BaseSpace.
} Log in to BaseSpace when setting up the run on the NextSeq sequencing system.
Getting Shared Data
If you receive a link to shared data in BaseSpace, click the link.
Use your MyIllumina account to log on; the first time you visit you are asked to accept
the BaseSpace agreement. After that, you are ready to run BaseSpace. If you don't have an
account, fill out the form, and indicate whether you want access to product and support
resources.
If someone shared data with you, you see a notification stating so.
The shared data show up in your project list. Now you can use the BaseSpace tools to
look at and download the data. For information about how to run tasks, see How To Use
BaseSpace on page 25.
NOTE
The owner of the data can disable the sharing feature at any time.
BaseSpace User Interface
The BaseSpace user interface (UI) has several tabs that allow you to access and use your
data. In addition, there are a number of common interface elements that enable general
tasks. This section describes the various aspects of the BaseSpace UI.
Common Elements on page 13
Dashboard Tab on page 15
Prep Tab on page 16
Runs Tab on page 20
Projects Tab on page 21
Apps Tab on page 24
Public Data Tab on page 24
Common Elements
There are a number of common UI elements that are shared between all BaseSpace
pages, and which enable general tasks:
} Toolbar
} Contact us button
} Bottom links
Toolbar
The BaseSpace toolbar elements are listed in the table.
Element            Description
Dashboard Tab      See Dashboard Tab on page 15.
Runs Tab           See Runs Tab on page 20.
Projects Tab       See Projects Tab on page 21.
Prep Tab           See Prep Tab on page 16. This tab is used to set up a NextSeq run.
Apps Tab           See Apps Tab on page 24.
Public Data Tab    See Public Data Tab on page 24.
Support Page       The BaseSpace Support page provides access to the BaseSpace
                   Knowledge Base, User Guide, and Illumina Technical Support.
Search             The Search box allows you to find runs, projects, or samples.
                   For more information, see Search for Runs, Projects, and Samples
                   on page 58.
Account            The Account drop-down list provides access to:
                   • iCredits. See Access Your Wallet on page 56.
                   • MyAccount. See MyAccount on page 15.
                   • MyIllumina Dashboard.
                   • FAQ: leads to a number of frequently asked questions and
                     Illumina-provided answers.
                   • Terms: leads to the User Agreement.
                   • Blog: leads to the blog. Check it for the latest news,
                     developments, and updates.
                   • Sign out.
Contact Us Button
The Contact us button opens a new screen that allows you to:
• Browse the knowledge base
• Provide feedback for Illumina, suggest ideas to the user community, or browse, read,
and vote for other ideas.
• Contact Support
Figure 5 Knowledge base, feedback, and contact screen.
Bottom Links
The bottom links provide access to more information:
} Help: online help.
} FAQ: leads to a number of frequently asked questions and Illumina-provided
answers.
} Developers: leads to the developers portal, set up to help you generate custom apps.
} Terms: leads to the User Agreement.
MyAccount
MyAccount provides access to the Settings, Wallet, Purchase History, Transfer History,
and Genomes pages.
Settings
On the Settings page you can edit your notifications settings, edit your profile, or update
your profile picture.
Wallet
The Wallet page allows you to manage iCredits and credit cards. See Access Your Wallet
on page 56 for more information.
Purchase History
The Purchase History page contains detailed information about purchases, adjustments,
and balance for your account. See View Purchase History on page 58 for more information.
Transfer History
The Transfer History page allows you to review projects or runs that have been
transferred. See Transfer Ownership on page 53 for more information.
Genomes
The Genomes page lists the genomes that are associated with your BaseSpace account.
Dashboard Tab
After login, the first tab you see is the dashboard. The dashboard provides access to
notifications, your latest runs, projects, and app results. The dashboard is always
accessible in BaseSpace from the top ribbon selector.
NOTE
If a run or project is not showing on BaseSpace, it is possible your data has
not been sent to BaseSpace. Set the BaseSpace option on your sequencing
instrument; see the instrument user guide.
Notifications
Shows notifications, most recent first. There are multiple types of notifications:
} Runs
• Run in progress
• Run completed
• Run error
} Collaborators
• Collaborator joined a project/run of which you are a member
• Collaborator invited you to a project/run
• (optionally) collaborator has included a personal message
• Collaborator recommended an App
• Collaborator accepted your offer to transfer ownership
• Collaborator offered to transfer ownership to you.
} Analyses by you
• Analysis in progress
• Analysis completed
• Analysis error
} Analyses by collaborators
• Analysis in progress
• Analysis completed
• Analysis error
} Uploads, additions, or deletions to/from a project of which you are a member
• By you
• By a collaborator
} Messages from Illumina
• New Demo data set
• Announcement of a new feature
} Blog: leads to the blog (blog.basespace.illumina.com). Check it for the latest news,
updates, and developments, and subscribe to updates.
Runs Pane
The bottom left pane of the BaseSpace dashboard shows the three most recent runs, and
is updated automatically.
Clicking the Runs pane opens the Runs tab. Clicking a run opens the Runs tab with the
run loaded. For more information, see Runs Tab on page 20.
Projects Pane
The bottom middle pane of the BaseSpace dashboard shows the three most recent
projects. The folder icon indicates the sharing status of the project: if it shows several
people, the project is shared.
Clicking the Projects pane opens the Projects tab. Clicking a project opens the Projects tab
with the project loaded. For more information, see Projects Tab on page 21.
App Results Pane
The right bottom pane of the BaseSpace dashboard shows the most recent app results.
Clicking an app result provides charts relevant for the app used in the Projects tab. For
more information, see App Results Page on page 23.
Prep Tab
The Prep tab enables you to set up a sequencing run on the NextSeq sequencing system.
The Prep Tab sets up a run in four steps:
} Biological Samples: Contains information about the samples that are going to be
sequenced. See Biological Samples on page 17
} Libraries: Consists of biological samples that are prepped and contain adapters.
Each library usually derives from a single biological sample, though biological
samples can be used in multiple libraries. See Libraries on page 18.
} Pools: Consists of groups of libraries that share analysis parameters. Pools can
consist of one or multiple libraries. See Pools on page 18.
} Planned Runs: Contains pools that run with the same analysis parameters, on the
same machine, at the same time. Planned runs can consist of one or multiple pools.
See Planned Runs on page 19.
Biological Samples
When you click the Biological Samples tab you see the Biological Samples list, which
shows all available samples you have created on your account.
Figure 6 Biological Samples List
If you want information about the samples, you can perform the following:
} Sort the list by clicking the column headers.
} Click a sample to go to the sample page.
This page provides the following actions to prepare your analysis:
} Create a sample.
} Import new samples.
} Select a sample and edit its properties.
} Select one or more samples and continue with Prep Libraries.
This tab is only available for NextSeq sequencing systems. Other sequencing instruments
use a sample sheet to provide sample information to BaseSpace.
NOTE
You can select multiple samples by using one of the following methods:
• Select multiple checkboxes.
• Click anywhere on a sample row while holding the Ctrl key to add to a selection.
• Click anywhere on a sample row while holding the Shift key to select all samples in
between.
• Click the checkbox next to Plate/Tube ID to select all samples on the current page.
The box next to the Biological Samples header tracks the total number of samples, and
how many are selected. Click X next to the selection count to clear the current selection.
For more information about these actions, see Create New Biological Samples on page 26,
Import Biological Samples on page 26, and Use Existing Biological Samples on page 27.
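The selection behavior described in the note above (Ctrl adds a row, Shift selects a range, the header checkbox selects the page, X clears) can be sketched as a small model. This is an illustrative assumption about how such a list behaves, not BaseSpace code; the class and method names are hypothetical.

```python
from typing import Optional, Set


class RowSelection:
    """Toy model of multi-row selection in a list with row_count rows."""

    def __init__(self, row_count: int) -> None:
        self.row_count = row_count
        self.selected: Set[int] = set()
        self._anchor: Optional[int] = None  # last row clicked with Ctrl

    def ctrl_click(self, row: int) -> None:
        """Add a single row to the selection."""
        self.selected.add(row)
        self._anchor = row

    def shift_click(self, row: int) -> None:
        """Select every row between the last clicked row and this one."""
        start = self._anchor if self._anchor is not None else row
        low, high = sorted((start, row))
        self.selected.update(range(low, high + 1))

    def select_page(self) -> None:
        """Header checkbox: select all rows on the current page."""
        self.selected = set(range(self.row_count))

    def clear(self) -> None:
        """X next to the selection count: clear the selection."""
        self.selected.clear()
        self._anchor = None


selection = RowSelection(row_count=10)
selection.ctrl_click(2)
selection.shift_click(5)
print(sorted(selection.selected), f"{len(selection.selected)} of {selection.row_count}")
```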
Libraries
When you click the Libraries tab you see the Libraries list, which shows all available
plates or tubes with libraries you have created on your account. You can sort the list by
clicking the column headers, or click a plate to see its properties and associated libraries.
Figure 7 Libraries List
This page provides the following actions to prepare your analysis:
} Click a plate, then click the Edit button to edit its properties or libraries.
} Select one or more plates or tubes and move to Pool Libraries.
} Import libraries and associate them to new biological samples at the same time.
NOTE
If you want to select multiple libraries:
• Select multiple checkboxes.
• Click anywhere on a library row while holding the Ctrl key to add to a selection.
• Click anywhere on a library row while holding the Shift key to select all libraries in
between.
• Click the checkbox next to Plate/Tube ID to select all samples on the current page.
• Use the import function.
The box next to the Libraries header tracks the total number of libraries, and how many
are selected. Click X next to the selection count to clear the current selection.
For more information about these actions, see Prep Libraries on page 27 or Import Samples
and Libraries on page 30. When prepping a library, you can also create a custom library
kit; see Set Up Custom Library Prep Kit on page 29.
Pools
When you click the Pools tab you see the Pools list, which shows all available pools of
libraries you have created on your account. You can sort the list by clicking the column
headers, or click a pool to see its properties and associated libraries.
This page provides the following actions to prepare your analysis:
} Click a pool, then click the Edit button to edit the notes.
} Select a pool and move to Plan Run.
NOTE
You can also merge pools the following way:
• Click Save & Continue Later, which takes you to the Pools list, with the recently
created plate at the top of the list.
• Select the checkboxes in the Pools list.
• Click the Merge Pools button in the top navigation bar.
The box next to the Pools header tracks the total number of pools, and how many are
selected.
For more information about these actions, see Pool Libraries on page 31.
Planned Runs
When you click the Planned Runs tab you see the Planned Runs list, which shows all
planned runs you have created on your account.
Figure 9 Planned Runs List
You can sort the list by clicking the column headers, or click a run to see or edit its
properties. For more information about these actions, see Plan Runs on page 32.
The runs can have the following states:
} Ready to Sequence: the run can be started from the NextSeq sequencing system.
} Planning: the run does not show up on the NextSeq sequencing system, because it is
still in the planning stage.
NOTE
If you want to select multiple runs:
• Select multiple checkboxes.
• Click anywhere on a planned run row while holding the Ctrl key to add it to the selection.
• Click anywhere on a planned run row while holding the Shift key to select all runs in
between.
• Click the checkbox next to Experiment Name to select all planned runs on the current
page.
Figure 8 Pools List
The box next to the Planned Runs header tracks the total number of runs, and how many
are selected. Click X next to the selection count to clear the current selection.
When sequencing on a run starts, the run is removed automatically from the Planned
Runs list.
Runs Tab
The Runs button leads to the runs list, which allows you to sort your runs based on
experiment name, state, workflow, created date, machine, and owner.
The following run states are possible (blue boxes indicate final states):
To look at a run in detail, click its name to view the run metrics. For
more information, see Run Overview Page on page 20.
When you click the gear wheel next to the run name, you see options for sharing,
transferring, and downloading a run. For more information, see Share Data on page 46,
Transfer Ownership on page 53, or Download Files on page 44.
Run Overview Page
The Run Overview page provides five panes:
} The Run Details pane gives a summary of the run with links to view files and
download and share options. For more information, see Share Data on page 46, View
Files and Results on page 34, or Download Files on page 44.
} The Samples pane gives a list of all the app results in the run, the associated
projects, and the number of samples in that analysis. This pane provides access to
the following pages:
• Samples list, see Run Samples List on page 21
• Sample Details page, see Sample Overview Page on page 23
• App Results page, see App Results Page on page 23
• Project Overview page, see Project Overview Page on page 22
} The Charts pane shows an intensity by cycle chart. Clicking the header takes you to
the Charts page, which contains five charts with run metrics. See Charts on page 71.
} The Run Summary pane shows tables with basic data quality metrics. Clicking the
header takes you to the Run Summary page. See Run Summary on page 68.
} The Indexing QC pane lists count information for indexes used in the run. Clicking
the header takes you to the Indexing QC page. See Indexing QC on page 70.
In addition, there is a Side Navigation ribbon, which provides easy navigation in the
Run Details area.
It contains links to the Overview, Run Samples List, Charts, Run Summary, Indexing QC,
Sample Sheet or Run Settings, and Files pages.
Run Samples List
The samples list allows you to sort the samples in your run based on sample ID, app,
date created, and project. If you want to look at a sample, app result, or project in detail,
click the links to get to the following pages:
} Sample Overview Page on page 23.
} App Results Page on page 23.
} Project Overview Page on page 22.
In addition, there is a Side Navigation ribbon, which provides easy navigation in the
Run Details area.
It contains links to the Overview, Run Samples List, Charts, Run Summary, Indexing QC,
Sample Sheet or Run Settings, and Files pages.
Projects Tab
The Projects button opens a list of your projects. You can sort the list by name, last
update, or owner. Clicking a project provides access to the app results and samples
within that project.
You generate a new project by clicking the New Project button at the top of the list.
When you hover over a project that you own, you see the Settings wheel.
The Settings wheel provides the following options for sharing a project and editing the
project details:
} Edit project: edit the name and description of the project. See also Edit Project Details
on page 50.
} Share: manage sharing a project with a particular collaborator. See also Share a Project
Using the Email Option on page 47.
} Get link: forward the sharing link to any number of collaborators. See also Share a
Project with Get Link on page 47.
} Transfer ownership: hand control of data over to a collaborator or customer. See also
Transfer Ownership on page 53.
NOTE
Runs and projects have separate permissions. If you share a project, you do
not share the runs contained within the project.
Project Overview Page
The Project Overview page provides access to three panes with information about the
project:
} The About tab gives you summary information about the project: owner, shared
status, date created, and collaborators.
} The Analyses tab gives a list of all the App Sessions in the project. This tab can be
sorted based on analysis name, last modified date created, status, or application
used to generate the analysis. Clicking an analysis links to the app results for that
sample; see App Results Page on page 23 for more information.
} The Samples tab gives a list of all the samples in the project. Clicking a sample links
to the page for that sample; see Sample Overview Page on page 23 for more
information. Selecting samples allows you to launch them in an app, copy them to a
different project, or combine them with another result.
NOTE
You can access these panes through the left navigation bar.
Project Toolbar
The Project Toolbar provides the following actions:
} Launch app: run apps on your sample. Clicking the app name leads to a page with
more information about launching that app, including access permissions.
See also Analyze Samples Further on page 41. Running apps can incur a charge.
} Share project: manage sharing a project with a particular collaborator. See also Share
a Project Using the Email Option on page 47.
} Get link: forward the sharing link to any number of collaborators. See also Share a
Project with Get Link on page 47.
} Edit project: edit the name and description of the project. See also Edit Project Details
on page 50.
} Transfer owner: hand control of data over to a collaborator or customer. See also
Transfer Ownership on page 53.
Options that are not available for the particular analysis or sample are grayed out.
If you have selected samples in the Samples pane, you can perform additional actions:
} Copy to...: copy samples from this project to another. See also Copy Samples on page
52
} Combine: combine samples. See also Combine Samples on page 51
NOTE
The app session states are defined as follows:
State            Description
Running          The app is processing or uploading data.
Complete         Processing and file upload have finished, and the data are now available to use.
Aborted          This AppResult or Sample has been aborted and cannot be resumed.
Needs Attention  The app session cannot continue without user intervention.
App Results Page
The App Results page provides details about the results for that app session. There is a
general information pane to the left, and up to four graphs:
} Low Percentage Graph on page 63
} High Percentage Graph on page 64
} Clusters Graph on page 65
} Mismatch Graph on page 67
See the following topics for more information about the various apps:
} Custom/PCR Amplicon on page 60
} Resequencing on page 60
} Library QC on page 61
} Small RNA Analysis on page 61
} Metagenomics Analysis on page 62
} De Novo Assembly on page 62
} Generate FASTQ on page 62
Sample Overview Page
The Sample Overview page provides 2 panes:
} The Sample Details pane gives a summary of the run with links to launch a
custom BaseSpace app on your sample. Clicking the app name leads to a page with
more information about that app, including access permissions.
Running apps can incur a charge.
} The Files pane gives a list of files associated with that sample. You can either look at
all FASTQ files, or look at files specific for an app session.
See also View Files and Results on page 34. You can also download selected files; see
Download Multiple FASTQ Files on page 44.
Apps Tab
The Apps button leads to the Apps page, which provides an overview of the custom
BaseSpace apps that you can run.
} Clicking the app name leads to a page with more information about that app,
including a link to the developer and their app support contact details.
} Clicking the Launch button leads you through the launch pages, which
allow you to set up the app session. Specify parameters like the project, sample, or
output folder used by the app, depending on the app, as well as accept access
permissions.
Running apps can incur a charge.
} You can search for apps using the Search Apps box, or filter by app category on the
right.
Public Data Tab
The Public Data page provides an overview of the publicly available data sets
that you can use. Clicking a data set provides more information for that set, and
allows you to import the run or project. You can search for data sets using the Search
Public Data box, or filter by the research areas and categories listed on the right.
How To Use BaseSpace
The following topics describe how to run different functions in BaseSpace.
Prepare a NextSeq Run on page 25
View Files and Results on page 34
Analyze Samples Further on page 41
Download Files on page 44
Share Data on page 46
Project and Sample Management on page 50
Purchasing on page 56
Search for Runs, Projects, and Samples on page 58
Prepare a NextSeq Run
What is it
You can prepare NextSeq runs through the BaseSpace Prep tab, which organizes
samples, libraries, pools, and runs in a single environment.
When to use it
Use this option if you want to prepare a sequencing run on a NextSeq instrument, and
have the data stream seamlessly to BaseSpace.
Do not use it to prepare sequencing runs for other instruments. If you do have a NextSeq
sequencing system but do not want to use BaseSpace, you can also start a run straight on
the instrument.
Why to use it
Preparing a run in the Prep tab moves the data and analysis seamlessly to BaseSpace.
Using the Prep tab means BaseSpace is your single-stop solution for sequencing
management, storage and analysis.
How to use it
1 Log in to BaseSpace. If it is your first time logging in, accept the user agreement.
2 Click the Prep icon.
3 Set up a NextSeq run on the Prep tab in four consecutive steps:
a Biological Samples: Contains information about the samples that are going to be
sequenced. You can create new samples, import samples, or use existing
samples; for instructions, see one of the following topics:
— Create New Biological Samples on page 26
— Import Biological Samples on page 26
— Use Existing Biological Samples on page 27
b Libraries: Consists of biological samples that are prepped and contain adapters.
Each library usually derives from a single biological sample, though biological
samples can be used in multiple libraries. See Libraries on page 18.
You can also import biological samples and libraries in one step; see Import
Samples and Libraries on page 30.
c Pools: Consists of groups of libraries that share analysis parameters. Pools can
consist of one or multiple libraries. See Pools on page 18.
d Planned Runs: Contains pools that run with the same analysis parameters, on
the same machine, at the same time. Planned runs can consist of one or multiple
pools. See Planned Runs on page 19.
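The four Prep steps nest inside each other: samples go into libraries, libraries into pools, and pools into planned runs. The following Python sketch models that nesting to make the relationships concrete. It is an illustration only, not BaseSpace's data model or API; all class names, field names, and index sequences are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BiologicalSample:
    sample_id: str                 # must be unique
    name: str                      # human-readable identifier
    nucleic_acid: str              # "DNA" or "RNA"
    project: Optional[str] = None  # optional at first, required later


@dataclass
class Library:
    sample: BiologicalSample       # a library usually derives from one sample
    index: str                     # index assigned during library prep


@dataclass
class Pool:
    pool_id: str
    libraries: List[Library] = field(default_factory=list)


@dataclass
class PlannedRun:
    name: str
    pools: List[Pool] = field(default_factory=list)

    def sample_ids(self):
        """Return the distinct sample IDs covered by this run, in first-seen order."""
        ids = []
        for pool in self.pools:
            for library in pool.libraries:
                if library.sample.sample_id not in ids:
                    ids.append(library.sample.sample_id)
        return ids
```

In this sketch one biological sample can appear in several libraries, which matches the statement that biological samples can be used in multiple libraries.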
Create New Biological Samples
If you want to create a new biological sample, do the following:
NOTE
To create several new samples at once, use the import function; see Import Biological
Samples on page 26.
1 Click the Prep icon.
2 Click Biological Samples.
3 Click the + Create button.
4 Fill out the required fields: Sample ID, Name, and Nucleic Acid type.
NOTE
Sample ID and sample name can consist only of alphanumeric characters, dashes, and
underscores. The sample ID must be unique and short; the sample name can be more
descriptive to provide a human-readable identifier.
5 [Optional] Fill out the Organism (species) field.
6 [Optional] Fill out the Project fields. You can also generate a new project. A project is
optional, but if you do not specify it here, you must set it later, because the output
data is stored in the project.
7 When finished, do one of the following:
• If you only want to select the newly created sample, click the Next: Prep
Libraries button. Continue with Prep Libraries on page 27.
• If you want to select multiple samples, click the Save & Continue Later button. This
selection takes you back to the Biological Samples list, with the recently created
sample at the top of the list. Continue with Use Existing Biological Samples on
page 27.
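The naming rule from the note under step 4 (alphanumeric characters, dash, or underscore, with a unique sample ID) can be checked before you type or import values. The following Python sketch is a hypothetical helper, not part of BaseSpace. It assumes that "alphanumeric" means ASCII letters and digits, which this guide does not state.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
_ALLOWED = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_identifier(value: str) -> bool:
    """Return True if value uses only letters, digits, dash, or underscore."""
    return _ALLOWED.fullmatch(value) is not None


def find_duplicate_ids(sample_ids):
    """Return sample IDs that occur more than once, in first-seen order."""
    seen, duplicates = set(), []
    for sample_id in sample_ids:
        if sample_id in seen and sample_id not in duplicates:
            duplicates.append(sample_id)
        seen.add(sample_id)
    return duplicates
```

For example, `is_valid_identifier("Sample 01")` is false because of the space, while `"Sample_01-A"` passes.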
Import Biological Samples
If you want to import new biological samples, do the following:
1 Click the Prep icon.
2 Click Biological Samples.
3 Click the Import button.
4 If you have not generated an import file yet, click the template link and fill out the
samples. Be aware of the following when filling out the template:
• User Sample ID and sample name can consist only of alphanumeric characters,
dashes, and underscores. The sample ID must be unique and short; the sample name can be
more descriptive to provide a human-readable identifier.
• The Organism (species) field is optional.
• The Project field is optional, but if you do not specify it here, you must set it later,
because the output data is stored in the project.
• Fill out the Nucleic Acid column with DNA or RNA.
Figure 10 Import Sample Template
5 Click the Choose File button.
6 Browse to the import file and click Open.
7 Click Import.
8 When finished, do one of the following:
• If you only want to select the newly created samples, click the Next: Prep
Libraries button. Continue with Prep Libraries on page 27.
• If you want to select multiple samples, click the Save & Continue Later button. This
selection takes you back to the Biological Samples list, with the recently created
samples at the top of the list. Continue with Use Existing Biological Samples on
page 27.
Use Existing Biological Samples
The Biological Samples list shows all available samples that you have created on your
account.
1 To select existing samples, do one of the following in the Biological Samples list:
} Select the checkboxes.
} Click the sample. If you want to select multiple samples, hold the Ctrl key.
} Select all samples by selecting the checkbox next to the SampleID header.
2 Click the Prep Libraries button in the top navigation bar.
Prep Libraries
On the Prep Libraries page, you assign indexes to biological samples, based on the
indexes available in the library preparation chosen. Every used well or tube contains a
separate library. Best practice is to set up the libraries in BaseSpace first, then export
a file of your library settings, and use that file to pipet the biological samples into the
proper wells or tubes.
NOTE
If you do not want to use indexed sequencing, you still have to assign your biological
sample to an index. You specify that the index is not sequenced only when you set up
your sequencing run.
1 Select the library prep type. BaseSpace now automatically assigns indexes to wells or
tubes, depending on the format of the library prep type.
Figure 11 Tube Set Up for Single Index Library Preparation Kit
Figure 12 Plate Set Up for Dual Index Library Preparation Kit
2 Enter the plate ID. The ID has to be unique.
3 Click the Auto Prep button to fill the plate or tubes automatically with all samples
listed.
NOTE
You can also manually drag the samples to wells or tubes:
1. Select one or more samples. To multiselect, hold Shift. To multiselect on Firefox or
Internet Explorer 9, click the well twice.
2. Drag selected samples to a position.
3. Check whether the indexes have been assigned to the proper samples. Hovering over
a position reveals the sample that is assigned to that position. You can drag samples
from position to position.
4 Save a file of your library settings by clicking the Download CSV button. Use this
file in the lab to indicate which biological samples get pipetted into specific wells.
5 When finished, do one of the following:
} If you want to select the new plate or tubes, click the Pool Libraries button. Continue
with Pool Libraries on page 31.
} If you want to select multiple library preps or plates, do the following:
a Click the Save & Continue Later button. This selection takes you to the Libraries list,
with the recently created set-up at the top of the list.
b Select the checkboxes in the Libraries list.
c Click the Pool Libraries button in the top navigation bar.
NOTE
If one of your samples is not assigned to a project, you cannot continue. Select the
sample, click the Set Project button, and assign it to a project. You can also
generate a new project.
Nextera Rapid Capture Considerations
If you are performing Nextera Rapid Capture, do the following:
} Choose Nextera Enrichment as library prep.
} Put biological samples belonging to the same enrichment on a row next to each
other.
} Change the index in the drop-down menu to the left of the rows to the proper index,
probably the same indexes for the different rows (enrichments).
} Name your plate in a way that makes clear that multiple enrichments are on the
plate, or add a note to that effect in the Note field.
Set Up Custom Library Prep Kit
You can set up a custom library prep kit the following way:
1 When prepping a library, select + Custom Library Prep Kit in the Library Prep Kit
drop-down menu.
The Custom Library Prep Kit Definition page opens.
2 Fill out the name of the custom prep.
3 Fill out the supported run parameters.
4 Click template to download the index definition file template.
5 Click the Choose .csv File button to select and upload your custom index file.
6 Click Create New Kit to complete the process.
Your custom library prep has been added to the library kit drop-down.
Import Samples and Libraries
You can import libraries and associate them to new biological samples at the same time
the following way:
1 Click the Prep icon.
2 Click Libraries.
3 Click the Import button.
4 If you have not generated an import file yet, click the template link and fill out the
samples. Be aware of the following when filling out the template:
• User Sample ID and sample name can consist only of alphanumeric characters,
dashes, and underscores. The sample ID must be unique and short; the sample name can be
more descriptive to provide a human-readable identifier.
• The Species field is optional.
• The Project field is optional, but if you do not specify it here, you must set it later,
because the output data is stored in the project.
• Fill out the Nucleic Acid column with DNA or RNA.
Figure 13 Import Sample Template
5 Click the Choose File button.
6 Browse to the import file and click Open.
7 Click Import.
8 When finished, do one of the following:
• If you only want to select the newly created samples, click the Pool Libraries
button. Continue with Pool Libraries on page 31.
• If you want to select other libraries, click the Save & Continue Later button. This
selection takes you back to the Libraries list, with the recently created sample at
the top of the list.
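Before uploading, it can help to check an import file against the template rules listed in step 4. The following Python sketch shows one way to do that. It is not an Illumina tool, and the column headers `SampleID`, `Name`, `Project`, and `NucleicAcid` are assumptions; use the headers from the template you downloaded.

```python
import csv
import io
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def check_import_rows(csv_text):
    """Return a list of (row_number, message) problems found in the import file."""
    problems = []
    seen_ids = set()
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        sample_id = (row.get("SampleID") or "").strip()
        name = (row.get("Name") or "").strip()
        nucleic_acid = (row.get("NucleicAcid") or "").strip()
        if not ID_PATTERN.fullmatch(sample_id):
            problems.append((row_number, "invalid Sample ID"))
        elif sample_id in seen_ids:
            problems.append((row_number, "duplicate Sample ID"))
        seen_ids.add(sample_id)
        if not ID_PATTERN.fullmatch(name):
            problems.append((row_number, "invalid sample name"))
        if nucleic_acid not in ("DNA", "RNA"):
            problems.append((row_number, "Nucleic Acid must be DNA or RNA"))
        if not (row.get("Project") or "").strip():
            # The Project field is optional here but must be set later.
            problems.append((row_number, "no project yet; set one later"))
    return problems
```

An empty result means that no rule violations were found under these assumptions. It does not guarantee that BaseSpace accepts the file.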
Pool Libraries
The Pool Libraries page allows you to pool samples and sequence them in the same run,
using the same analysis parameters.
1 Fill out the first pool ID. The pool ID has to be unique.
2 If needed, you can create additional pools on the right by clicking the + Add Pool
button and filling out the pool IDs.
• Colors of the wells correspond to the colors of the pools.
• You can hover over the wells to see the library IDs.
3 Drag and drop individual samples from their well on the plate to a pool.
You can multiselect by holding Shift. To multiselect on Firefox or Internet Explorer 9,
click the well twice.
4 If you want to pool libraries from multiple plates, use the Plate drop-down menu to
specify the plate.
NOTE
You can also merge pools the following way:
• Click the Save & Continue Later button. This selection takes you to the Pools list, with the
recently created plate at the top of the list.
• Select the checkboxes in the Pools list.
• Click the Merge Pools button in the top navigation bar.
5 Click the Plan Run button.
Nextera Rapid Capture Considerations
If you are performing Nextera Rapid Capture, make sure to assign only samples from the
same enrichment to one pool, and note it in the pool name.
Plan Runs
On the Planned Runs page, you can set up the parameters for the sequencing run on
your NextSeq instrument.
1 Enter a name for your planned run.
2 [Optional] Enter the reagent barcode you plan to use, which links a reagent kit to this
run.
3 Select the rehyb checkbox if you are rehybridizing.
4 Fill out the Enter Cycles section:
} Single- vs. paired-end
} Number of cycles per read
5 Verify the indexing strategy in the Review Indexes section. By default, it is set
according to the index/library prep type chosen previously. If you choose to override
this default indexing scheme, you are required to select the Index type (Single, Dual,
or No Index). Make sure that you enter the number of index cycles accordingly. If
you have selected multiple libraries, you cannot specify No Index.
BaseSpace automatically checks if the indexes chosen all start with two Gs; if so, it
warns you that you should change your index strategy.
6 Verify the pool that is included in the planned run.
7 When your settings are complete, choose one of these options to continue:
} Click the Sequence button, which opens the Planned Runs list, and sets the state of
the recently planned run to Ready to Sequence.
} Click the Save & Continue Later button, which opens the Planned Runs list, and
sets the state of the recently planned run to Planning.
NOTE
A planned run must be in the Ready to Sequence state in order for it to show up in the
Planned Runs list in the control software on the instrument.
8 If you want to change a planned run to the Ready to Sequence state, select the planned
run from the list. Click the Sequence arrow link in the top navigation bar on the
Planned Runs list page.
Your run now shows up in the Planned Runs list in the control software on your
NextSeq sequencing system. Complete the run from your sequencing instrument. A
sample sheet is not required. BaseSpace automatically generates FASTQ files when the
sequencing run is complete.
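The index check described in step 5 (a warning when all chosen indexes start with two Gs) can be reproduced offline when you are drafting an index strategy. The following Python sketch is a hypothetical illustration of that rule and is not BaseSpace code. The function name and the example sequences are made up.

```python
def all_indexes_start_with_gg(index_sequences):
    """Return True if there is at least one index and every index starts with 'GG'."""
    sequences = [seq.strip().upper() for seq in index_sequences]
    return bool(sequences) and all(seq.startswith("GG") for seq in sequences)


# Example: one index that does not start with GG is enough to avoid the warning.
if all_indexes_start_with_gg(["GGACTC", "ATCACG"]):
    print("Warning: all indexes start with GG; consider changing the index strategy.")
```

The sketch treats an empty index list as "no warning", which is an assumption about a case that the guide does not describe.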
NOTE
You can connect as many instruments as you have BaseSpace nodes installed, up to a
maximum of six.
View Files and Results
The following topics describe how to view files and results in BaseSpace.
View Files from a Run
What is it
BaseSpace gives you an option to view your run files or download them individually.
When to use it
Use this option if you want to view files such as BCL files or images. You can also
download these files locally.
Why to use it
Use this option if you want to view files such as BCL files or images. You can also
download these files locally.
How to use it
1 Click the Runs icon.
2 Click the desired run.
3 From the Run Overview page, select the Files icon from the left navigation menu.
4 Select the desired file to view.
View Indexing QC Page
What is it
The Indexing QC page lists count information for indexes used in the run. The Indexing
QC is only available if the run is an index run.
For more information, see Indexing QC on page 70.
When to use it
Use this option when you want to access indexing QC results.
Why to use it
You might see unexpected results for a sample with a particular index and need to
troubleshoot what happened. You can also use this page to confirm that all indexed
samples were represented properly.
How to use it
1 Click the Runs icon.
2 Click the desired run.
3 There are two methods to go to the Indexing QC page:
• From the Run Overview page, click the Indexing QC link.
• From the Run Overview page, click the Indexing QC icon from the left
navigation menu.
You can select the displayed lane through the drop-down list.
The first table provides an overall summary of the indexing performance for that lane,
including:
Total Reads
The total number of reads for this lane.
PF Reads
The total number of passing filter reads for this lane.
% Reads Identified (PF)
The total fraction of passing filter reads assigned to an index.
CV
The coefficient of variation for the number of counts across
all indexes.
Min
The lowest representation for any index.
Max
The highest representation for any index.
Further information about the frequency of individual indexes is provided in both
table and graph form. The table contains several columns, including:
Index Number
A unique number assigned to each index by BaseSpace for
display purposes.
Sample ID
The sample ID assigned to an index in the sample sheet.
Project
The project assigned to an index in the sample sheet.
Index 1 (I7)
The sequence for the first index read.
Index 2 (I5)
The sequence for the second index read.
% Reads Identified (PF)
The number of reads (only includes Passing Filter reads)
mapped to this index.
This information is also displayed in graphical form. In the graphical display, indexes
are ordered according to the unique Index Number assigned by BaseSpace.
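To make the summary metrics concrete, the following Python sketch computes % Reads Identified (PF), CV, Min, and Max from per-index read counts. It is not the BaseSpace calculation. It assumes that CV is the population standard deviation divided by the mean of the per-index counts, and that Min and Max are percentages of PF reads. The guide does not spell out either convention.

```python
from statistics import mean, pstdev


def summarize_index_counts(pf_reads, reads_per_index):
    """Summarize how evenly passing-filter (PF) reads are spread across indexes.

    pf_reads: total PF reads in the lane.
    reads_per_index: mapping of index label -> PF reads assigned to that index.
    """
    counts = list(reads_per_index.values())
    if pf_reads <= 0 or not counts:
        raise ValueError("need a positive PF read total and at least one index")
    percentages = [100.0 * count / pf_reads for count in counts]
    average = mean(counts)
    return {
        "pct_reads_identified": sum(percentages),
        # Assumption: population standard deviation divided by the mean.
        "cv": pstdev(counts) / average if average else 0.0,
        "min_pct": min(percentages),
        "max_pct": max(percentages),
    }
```

A CV near zero means that the indexes are evenly represented. A large gap between the minimum and maximum percentages points to the under- or over-represented samples worth troubleshooting.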
View Run Charts
What is it
The Charts page shows charts with run metrics.
When to use it
Use this option when you want to view charts such as Flow Cell, Data By Cycle, Data By
Lane, QScore Distribution, and QScore Heatmap.
For more information, see Charts on page 71.
Why to use it
Use this option if you want access to these various charts.
How to use it
1 Click the Runs icon.
2 Click the desired run.
3 There are two methods to go to the Charts page:
• From the Run Overview page, click the Charts link.
• From the Run Overview page, click the Charts icon from the left navigation
menu.
View Run Samples List
What is it
The Run Samples List contains a list of all the samples in the run.
When to use it
} Use this option when you want to see a list of all the samples in the run
} Use this option when you want to navigate to details regarding a specific sample.
Why to use it
Use this option if you want a quick way to view all the samples in a Run.
Use this option when you want to see more detail regarding your samples such as
genome name, sample, or FASTQ files.
How to use it
1 Click the Runs icon.
2 Click the desired run.
3 There are two methods to go to the Run Samples List:
• From the Run Overview page, click the Samples link.
• From the Run Overview page, click the Samples icon from the left navigation
menu.
You can now click a sample to see the sample overview; for more information, see
Sample Overview Page on page 23.
View Run Summary
What is it
The Run Summary page has the overall statistics about the run.
When to use it
Use this option when you want to view information about the run such as percent
alignment, cycles, and densities.
Why to use it
Use this option if you want a quick breakdown of the statistics for a particular run.
How to use it
1
Click the Runs icon.
2
Click the desired run.
3
There are two methods to go to the Run Summary:
• From the Run Overview Page, select Run Summary button.
• From the Run Overview Page, select the Run Summary icon from the left
navigation menu.
The following metrics are displayed in the top table, split out by read and total:
Level
The sequencing read number.
Cycles
The number of cycles in the read.
Yield Total
The number of bases sequenced, which is updated as the run
progresses
Projected Total Yield
The projected number of bases expected to be sequenced at
the end of the run.
Yield Perfect
The number of bases in reads that align perfectly, as
determined by alignment to PhiX of reads derived from a
spiked in PhiX control sample. If no PhiX control sample is
run in the lane, this chart is not available.
Yield <=3 errors
The number of bases in reads that align with three errors or
less, as determined by a spiked in PhiX control sample. If no
PhiX control sample is run in the lane, this chart is not
available, and shows a zero value. This value is not calculated
for NextSeq two-channel sequencing, and the value shown is
always zero.
Aligned
The percentage of the sample that aligned to the PhiX
genome, which is determined for each level or read
independently.
% Perfect [Num Usable Cycles]
The percentage of bases in reads that align perfectly, as
determined by a spiked in PhiX control sample, at the cycle
indicated in the brackets. If no PhiX control sample is run in
the lane, this chart shows 0% and the number of cycles used.
This value is not calculated for NextSeq two-channel
sequencing, and the value shown is always zero.
% <=3 errors [Num Usable
Cycles]
The percentage of bases in reads that align with three errors
or less, as determined by a spiked in PhiX control sample, at
the indicated cycle. If no PhiX control sample is run in the
lane, this chart shows 0% and the number of cycles used. This
value is not calculated for NextSeq two-channel sequencing,
and the value shown is always zero.
Error Rate
The calculated error rate of the reads that aligned to PhiX.
Intensity Cycle 1
The average of the A channel intensity measured at the first
cycle averaged over filtered clusters.
% Intensity Cycle 20
The corresponding intensity statistic at cycle 20 as a
percentage of that value at the first cycle. 100%x(Intensity at
cycle 20)/(Intensity at cycle 1).
%Q>=30
The percentage of bases with a quality score of 30 or higher.
This chart is generated after the 25th cycle, and
the values represent the current cycle.
The following metrics are available in the Read tables, split out by lane:
Tiles
The number of tiles per lane.
Density
The density of clusters (in thousands per mm²) detected by
image analysis, +/- one standard deviation.
Clusters PF
The percentage of clusters passing filtering, +/- one standard
deviation.
Phas./Prephas.
The value used by RTA for the percentage of molecules in a
cluster for which sequencing falls behind (phasing) or jumps
ahead (prephasing) the current cycle within a read.
For MiSeq and NextSeq, RTA generates phasing and
prephasing estimates empirically for every cycle. The value
displayed here is therefore not used in the actual
phasing/prephasing calculations, but is an aggregate value
determined from the first 25 cycles. For most applications,
the value reported is very close to the value that is applied.
For low diversity samples or samples with unbalanced base
composition, the reported value can diverge from the values
being applied because the value changes from cycle to cycle.
Reads
The number of clusters (in millions).
Reads PF
The number of clusters (in millions) passing filtering.
%Q>=30
The percentage of bases with a quality score of 30 or higher.
This chart is generated after the 25th cycle, and
the values represent the current cycle.
Yield
The number of bases sequenced which passed filter.
Cycles Err Rated
The number of cycles that have been error rated using PhiX,
starting at cycle 1.
Aligned
The percentage that aligned to the PhiX genome.
Error Rate
The calculated error rate, as determined by the PhiX
alignment. Subsequent columns display the error rate for
cycles 1–35, 1–75, and 1–100.
Intensity Cycle 1
The average of the A channel intensity measured at the first
cycle averaged over filtered clusters.
%Intensity Cycle 20
The corresponding intensity statistic at cycle 20 as a
percentage of that value at the first cycle: 100% × (Intensity at
cycle 20) / (Intensity at cycle 1).
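As a worked illustration of the % Intensity Cycle 20 definition, the following minimal Python sketch applies the formula to two intensity values. The function name and the sample numbers are assumptions made for this example; they are not values or code produced by BaseSpace or RTA.

```python
def percent_intensity_cycle_20(intensity_cycle_1: float, intensity_cycle_20: float) -> float:
    """Return 100% x (intensity at cycle 20) / (intensity at cycle 1)."""
    if intensity_cycle_1 <= 0:
        raise ValueError("Intensity at cycle 1 must be positive.")
    return 100.0 * intensity_cycle_20 / intensity_cycle_1


if __name__ == "__main__":
    # Hypothetical intensities, chosen only to show the arithmetic.
    print(percent_intensity_cycle_20(250.0, 200.0))  # prints 80.0
```

A value below 100 means that the intensity measured at cycle 20 is lower than the intensity measured at the first cycle.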
View Sample Sheet from a Run
What is it
This option allows you to view the sample sheet that is tied to this run.
When to use it
Use this option when you want to view the associated sample sheet for this Run.
Why to use it
You want to check whether the sample sheet was set up properly.
How to use it
1
Click the Runs icon.
2
Click the desired run.
3
From the Run Overview Page, select the Sample Sheet icon from the left navigation
menu.
View the Project Sample List
What is it
The Project Sample List contains the list of samples in a project.
When to use it
} Use this option when you want to see a list of all the samples in the project
} Use this option when you want to navigate to details regarding a specific sample.
Why to use it
This option is an easy way to get to the details page of a sample.
How to use it
1
Click the Projects icon.
2
Click the desired project.
3
Click the Samples link from the left navigation menu.
You can now click a sample to see the sample overview; for more information, see
Sample Overview Page on page 23.
View the Analyses List
What is it
The Analyses List contains a list of app sessions in a project.
When to use it
Use this option when you want to navigate to details regarding a specific app session.
Why to use it
This option is an easy way to get to the details of a particular app session.
How to use it
1
Click the Projects icon.
2
Click the desired project.
You can now click an Analysis to see the results; for more information, see App Results
Page on page 23.
Analyze Samples Further
The following topics describe how to analyze samples further in BaseSpace, starting with
FASTQ files (HiSeq and MiSeq) or the results from sample-sheet driven workflows
(MiSeq).
} Launch Apps on page 41
• Launch the IGV App on page 41
• Run the VariantStudio App on page 43
} Run Sample Sheet Driven Workflow Apps on page 44
Launch Apps
What is it
You can launch apps that perform additional tertiary analysis, visualization, or
annotation of data.
When to use it
When you want to run a third-party or Illumina-provided app on your samples that is
not a samplesheet-driven app. Running apps can incur a charge.
How to use it
1
There are two ways to start an app:
• Navigate to the project, sample, or analysis that you want to run the app on,
click the Launch Apps button, and select the desired app from the drop-down
list.
• Go to the Apps button, select the desired app from the list and click Launch.
2
Read the End-User License Agreement and permissions, and click Accept if you
agree to them.
NOTE
These instructions do not apply to sample sheet-driven apps (from MiSeq), which are
launched automatically. See Run Sample Sheet Driven Workflow Apps on page 44.
The app now guides you through the start-up process.
NOTE
For support for third-party apps, contact the vendor.
Launch the IGV App
What is it
The Integrative Genomics Viewer (IGV) of the Broad Institute is a fully featured genome
browser that allows you to visualize your sequence data in great detail. Illumina has
modified IGV to display alignment and variant data from BaseSpace (BAM and VCF
files).
When to use it
IGV enables you to perform variant analysis after launching Resequencing or Amplicon
workflows in BaseSpace. IGV is run on a project, which is the highest level directory and
contains one or more AppResults. IGV retains all of its native functions, including
loading data from your local computer.
Why to use it
To visualize your sequence data in greater detail.
How to use it
NOTE
Make sure that the Java Runtime Environment is installed on the computer in order for
IGV to work properly. Download Java here: java.com/en/.
Run the IGV App the following way:
1
Click the Projects icon.
2
Click the desired project.
3
Click the Launch Apps button and select the IGV application from the drop-down
list.
4
Select the Accept button.
5
Depending on your browser, you are prompted to open or save the *.jnlp file.
• For Internet Explorer, click the Open button.
• For Chrome, click the Keep button and then click the file to open it.
• For Firefox, select the Open with Java(TM) Web Start Launcher (default) option.
The IGV App opens on your desktop with the requested project loaded.
BaseSpace Data in IGV
The BaseSpace file browser shows data in BaseSpace that is available for viewing in
IGV. The directory structure shown reflects how data are organized in BaseSpace.
A project is the highest level directory and it contains one or more AppResults. If an
AppResult was the result of analyzing a single sample, then the sample name is
appended to the AppResult name. Each AppResult contains zero or more files.
Only alignment (BAM) and variant (VCF) files are shown in the file browser. Double-click a BAM or VCF file to load it as an IGV track. Load VCF files before BAM files
because read tracks can take up an entire IGV screen, which requires scrolling to see
variants.
Additional Reference Genomes
IGV contains a number of installed reference genomes:
} Homo sapiens: Human hg19
} Mus musculus: Mouse mm9
} Saccharomyces cerevisiae: S. cerevisiae (sacCer2)
} Arabidopsis thaliana: A. thaliana (TAIR10)
In addition, you can download the following additional reference genomes from
Illumina:
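} PhiX: ftp://igenome:[email protected]/PhiX/Illumina/RTA/PhiX_Illumina_RTA.tar.gz
} Staphylococcus aureus (strain NCTC 8325): ftp://igenome:[email protected]/Staphylococcus_aureus_NCTC_8325/NCBI/2006-02-13/Staphylococcus_aureus_NCTC_8325_NCBI_2006-02-13.tar.gz
} E. coli (strain DH10B): ftp://igenome:[email protected]/Escherichia_coli_K_12_DH10B/NCBI/2008-03-17/Escherichia_coli_K_12_DH10B_NCBI_2008-03-17.tar.gz
For E. coli, rename the first line in genome.fa from >chr to >ecoli.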
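The E. coli reference requires the first line of genome.fa to be renamed from >chr to >ecoli. You can make this change in any text editor; the following is a minimal Python sketch of the same edit. The function name and its parameters are assumptions made for this example and are not part of BaseSpace or IGV. The sketch only rewrites the file when the first line is exactly >chr.

```python
from pathlib import Path


def rename_first_header(fasta_path, old=">chr", new=">ecoli"):
    """Rename the first line of a FASTA file from `old` to `new`.

    Returns True if the file was rewritten, and False if the first line
    did not match `old` exactly (the file is then left untouched).
    """
    path = Path(fasta_path)
    lines = path.read_text().splitlines(keepends=True)
    if not lines:
        return False
    header = lines[0].rstrip("\r\n")
    if header != old:
        return False
    line_ending = lines[0][len(header):]  # keep the original line ending
    lines[0] = new + line_ending
    path.write_text("".join(lines))
    return True
```

For example, calling rename_first_header("genome.fa") in the folder where the downloaded archive was unpacked updates the header in place. The location of genome.fa inside the archive is not stated here, so check the unpacked folder first.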
Run the VariantStudio App
What is it
The Illumina VariantStudio data analysis software application enables researchers to
identify and classify disease-relevant variants quickly, and then communicate significant
findings in concise and actionable reports.
This application provides an intuitive framework for non-expert users. It offers:
} Flexible filtering options
} Streamlined variant classification
} Rapid and rich annotations
} Customizable reporting
When to use it
Use VariantStudio if you want to:
} Explore and isolate key variants
} Categorize variants and determine biological impact
How to use it
Launch the VariantStudio app as follows:
1
Click the Apps button.
2
Select the VariantStudio app from the list and click Launch.
3
Select the project you want to run the app on. You need to be the owner of the project
you select.
4
Click Continue.
5
If you use the app for the first time, install VariantStudio:
a Click the Install VariantStudio button.
b Run the setup.exe file. Your web browser may ask you to save the file first. After
the download has completed, double-click the setup file.
c You may be prompted with a security warning. Click Install.
6
Click Launch VariantStudio.
The VariantStudio application opens on your desktop with the requested project loaded.
For instructions on how to run VariantStudio, see the VariantStudio User Guide.
Run Sample Sheet Driven Workflow Apps
Sample sheet driven workflow apps are started automatically, based on the workflow
that is specified in the sample sheet.
You can resubmit the sample sheet and requeue the run with new analysis parameters
one time. See Fix Sample Sheet / Re-Run Workflow on page 53.
Download Files
The following topics describe how to download files in BaseSpace.
Download Individual Files
What is it
BaseSpace allows you to download data as a package, individually, or as a group of
FASTQ files. This topic describes how to download individual files with the file browser.
When to use it
Use this option when you want to download individual files, and do not need all files
for a run, sample, project, or analysis.
Why to use it
If you only want to download individual files, it saves you time, because you are not
downloading all the other files.
How to use it
1
Click the Runs icon or Projects icon.
2
Navigate to the file you want to download.
3
Click the file.
4
Click the Download button.
BaseSpace now downloads the files to the desired location.
Download Multiple FASTQ Files
What is it
BaseSpace allows you to download data as a package, individually, or as a group of
FASTQ files. This topic describes how to download a group of FASTQ files with the
downloader.
When to use it
Use this option when you want to download FASTQ files per sample.
Why to use it
If you only want to download a number of FASTQ files for a sample, it saves you time,
because you are not downloading all the other files.
How to use it
1
Click the Runs icon or Projects icon.
2
Click the desired run or project.
3
Click the desired sample in the Samples pane.
4
In the Files pane, select the checkboxes for the desired FASTQ files.
5
Click the Download Selected button.
The BaseSpace Downloader guides you through the download process, and starts the
download of the files to the desired location.
Download Run File Package
What is it
BaseSpace allows you to download data as a package, individually, or as a group of
FASTQ files. This topic describes how to download a package of files in a run.
The packages available depend on your workflow; packages that are grayed out are not
available for download. There are four types of data packages:
} Variant Data, containing VCF files with variant calls.
} Aligned Data, containing BAM files with aligned reads.
} Unaligned Data, containing FASTQ files with unaligned reads.
} SAV Data, containing files describing the set-up of the run and InterOp files.
For more information about file types, see BaseSpace Files on page 78.
When to use it
} Use this option when you want to download a packaged (zipped) file for Variant,
Aligned, Unaligned, or SAV data.
} Do not use if you only want individual files.
Why to use it
Use this option if you want to download Variant, Aligned, Unaligned, or SAV data in a neatly
packaged file instead of downloading the files one by one.
How to use it
1
Click the Runs icon.
2
Click the desired run.
3
Click the Download button.
4
Select the desired data option.
Download Project or Analysis Package
What is it
BaseSpace allows you to download data as a package, individually, or as a group of
FASTQ files. This topic describes how to download all files in a project or analysis with
the BaseSpace downloader.
When to use it
Use this option when you want to download the files in a project or analysis.
Why to use it
Use this option if you want all files in a project or analysis, for example to archive them
or to perform analysis outside of BaseSpace.
How to use it
1
Click the Projects icon.
2
Download the package:
• If you want the project files, click Download Project.
• If you want the analysis files, click the desired analysis, and click Download
Analysis.
The BaseSpace Downloader guides you through the download process, and starts the
download of the files to the desired location.
Share Data
Data in BaseSpace can be shared with collaborators at the run or project level, either via
an email invitation or through a hyperlink. With the email invitation option, only the
accounts with the specified email addresses can view the shared data. Sharing via a hyperlink
allows anyone with access to the hyperlink to view the shared data, as long as the
hyperlink is still active.
Sharing is for read-only access. If you want a collaborator to have write access, see
Transfer Ownership on page 53.
NOTE
Runs and projects have separate permissions. If you share a run, the project
associated with that run is not shared automatically, meaning samples and
app results are not accessible to collaborators of the run.
The following topics describe how to share.
Share a Project with Get Link
What is it
Sharing using the Get Link option allows you to share a project or a run with any
collaborator who has access to the link. The hyperlink can be turned on or off by setting
the activate or deactivate option. Anyone can access the project or run when the link is
activated. Furthermore, anyone who previously accepted the link still has access to the
project or run even after the link is deactivated.
NOTE
If you want more control, use the email share option where you can specify who can view
the project (Share a Project Using the Email Option on page 47).
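The access rule for Get Link sharing can be summarized in a small model. The following Python sketch is only an illustration of the behavior described in this topic: the class and method names are assumptions made for this example and do not correspond to any BaseSpace interface.

```python
class GetLinkShare:
    """Toy model of Get Link sharing for a single project or run."""

    def __init__(self):
        self.active = False    # state of the Activate/Deactivate option
        self.accepted = set()  # users who opened the link while it was active

    def activate(self):
        self.active = True

    def deactivate(self):
        self.active = False

    def open_link(self, user):
        """Simulate a user following the link; return True if the user has access."""
        if self.active:
            self.accepted.add(user)
        return user in self.accepted

    def can_view(self, user):
        """Access depends on having accepted the link, not on its current state."""
        return user in self.accepted


if __name__ == "__main__":
    share = GetLinkShare()
    share.activate()
    print(share.open_link("collaborator_a"))  # True: link is active
    share.deactivate()
    print(share.can_view("collaborator_a"))   # True: accepted before deactivation
    print(share.open_link("collaborator_b"))  # False: link is no longer active
```

The model shows why deactivating a link does not revoke access for collaborators who already accepted it.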
When to use it
} Use this option when you do not want to assign the project to a specific person.
} This share link can be forwarded to many other collaborators while the link is still
active.
} Do not use this option if you want to confine the list of who has access to this
project.
Why to use it
Use this option if you want an easy way to share a link without the hassle of adding specific people by
email and setting permissions.
How to use it
1
Click the Projects icon.
2
Click the desired project.
3
Click the Get Link button.
4
Click the Activate button.
5
Copy the URL to share with collaborators.
The link is active until the Deactivate option is selected. The path to deactivate a sharing
link is similar:
1
Navigate to the shared item.
2
Click the Get Link button.
3
Click the Deactivate button.
Share a Project Using the Email Option
What is it
Sharing using the Share option allows you to share a project or run with specified
collaborators via an email link. The specified collaborators receive an email with a link to
the project or run, and only those collaborators can view the corresponding data.
NOTE
The email option allows greater control over who can view your data. Sharing using the
Get Link option gives anyone access to your data, as long as the link is left activated. For
more information, see Share a Project with Get Link on page 47.
When to use it
} Use this option if you want to share your project easily with collaborators.
} Use this option if you want to be able to control who has access to the projects.
Why to use it
Use this option if you want to be able to control who has access to the project.
How to use it
1
Click the Projects icon.
2
Click the desired project.
3
Click Share Project.
4
In the Share Settings dialog box, enter the collaborator's email address and click the
Invite button.
NOTE
The invitation email address must match your collaborator's BaseSpace login email
address; otherwise, your collaborator is not able to view the project.
5
Click Save Settings.
Share a Run with Get Link
What is it
Sharing using the Get Link option allows you to share a run with any collaborator who
has access to the link. The hyperlink can be turned on or off by setting the activate or
deactivate option. Anyone can access the project or run when the link is activated.
Furthermore, anyone who previously accepted the link still has access to the run even
though the link is deactivated.
Sharing runs with the Get Link option is similar to sharing projects with the Get Link
option.
NOTE
If you want more control, use the email share option where you can specify who can view
the run (Share a Run Using the Email Option on page 49).
When to use it
} Use this option when you do not want to assign the run to a specific person.
} This share link can be forwarded to many other collaborators while the link is still
active.
} Do not use this option if you want to confine the list of who has access to this run.
Why to use it
This option is an easy way to share a link without having to specify email addresses and set
permissions.
How to use it
1
Click the Runs icon.
2
Click the desired run.
3
Click the More button and select the Get Link option.
4
Click the Activate button.
5
Copy the URL to share with collaborators.
The link is active until the Deactivate option is selected. The path to deactivate a sharing
link is similar:
1
Navigate to the run.
2
Click the Get Link button.
3
Click the Deactivate button.
NOTE
Runs and projects have separate permissions. If you share a run, the project
associated with that run is not shared automatically, meaning samples and
app results are not accessible to collaborators of the run.
Share a Run Using the Email Option
What is it
Sharing using the Share option allows you to share a project or run with specified
collaborators via an email link. The specified collaborators receive an email with a link to the
project or run, and only those collaborators can view the corresponding data.
NOTE
The email option allows greater control over who can view your data. The Get Link
option gives anyone access to your data, as long as the link is left activated. See Share a
Run with Get Link on page 48 for more information.
When to use it
} Use this option if you want to share your run easily with collaborators.
} Use this option if you want to be able to control who has access to the run.
Why to use it
Use this option if you want to be able to control who has access to the run.
How to use it
1
Click the Runs icon.
2
Click the desired run.
3
Click the Share button.
4
In the Share Settings dialog box, enter the collaborator's email address and click the
Invite button.
NOTE
The invitation email address must match your collaborator's BaseSpace login email
address; otherwise, your collaborator is not able to view the run.
5
Click the Save Settings button.
NOTE
Runs and projects have separate permissions. If you share a run, the project
associated with that run is not shared automatically, meaning samples and
app results are not accessible to collaborators of the run.
Project and Sample Management
The following topics describe how to manage projects and samples in BaseSpace.
Edit Project Details
What is it
The way to edit project details.
When to use it
Use this option when you want to change details regarding the project such as the
description or project name.
Why to use it
Use this option if you have to edit the project name or description.
How to use it
1
Click the Projects icon.
2
Click the desired project.
3
Click the Edit Project button.
4
Change project details in the Edit Project dialog box.
5
Click Save.
Set Up a New Project
What is it
A method to set up a new project.
When to use it
} When you want to analyze a sample in the context of two different projects
} When you want to transfer ownership of samples to a collaborator, but still keep a
copy yourself
} When you want to split a project into multiple projects
How to use it
1
Click the Projects icon.
2
Click the New Project link in the top left corner.
3
Enter a new name and description.
4
Click the Create button.
To copy samples into the new project, see Copy Samples on page 52.
Combine Samples
What is it
A method to combine (merge) samples.
When to use it
To merge the data from two or more different sequencing runs on the same sample. The
samples need to have the same read lengths.
When not to use it
You cannot combine samples that do not have the same read lengths.
How to use it
1
Click the Projects icon.
2
Click the desired project.
3
Click the Samples link from the left navigation menu.
4
Select the checkboxes of the samples you want to combine.
5
Click the Combine button.
6
Click the Combine button in the pop-up screen.
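The read-length requirement for combining samples can be expressed as a simple check. The following Python sketch is a hypothetical illustration only: the Sample class, its fields, and the can_combine function are assumptions made for this example and are not BaseSpace data structures.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Sample:
    name: str
    read_lengths: Tuple[int, ...]  # for example (151, 151) for a paired-end run


def can_combine(samples: Sequence[Sample]) -> bool:
    """Return True if two or more samples all share the same read lengths."""
    if len(samples) < 2:
        return False  # combining needs at least two samples
    first = samples[0].read_lengths
    return all(sample.read_lengths == first for sample in samples[1:])


if __name__ == "__main__":
    run_1 = Sample("NA12878_run1", (151, 151))
    run_2 = Sample("NA12878_run2", (151, 151))
    run_3 = Sample("NA12878_run3", (101, 101))
    print(can_combine([run_1, run_2]))  # True: identical read lengths
    print(can_combine([run_1, run_3]))  # False: read lengths differ
```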
Copy Samples
What is it
A method to copy samples from one project to another.
When to use it
} When you want to analyze a sample in the context of two different projects
} When you want to transfer ownership of a sample to a collaborator, but still keep a
copy yourself
} When you have assigned a sample to the wrong project
How to use it
1
Click the Projects icon.
2
Click the desired project.
3
Click the Samples link from the left navigation menu.
4
Select the checkboxes of the samples you want to copy.
5
Click the Copy button.
6
Select the new project in the drop-down list.
7
Click the Copy button.
Upload Files
What is it
A method to upload files, such as VCF or manifest files, to BaseSpace.
When to use it
} Some apps need additional files generated outside of BaseSpace.
} Some apps provide downstream analysis for results generated outside of BaseSpace.
How to use it
1
Click the Projects icon.
2
Go to the project you want to add the file to.
NOTE
You have to be the owner of the project.
3
Click Import.
4
Select the import file type.
5
Select the file to upload in one of two ways:
• Drag and drop the file in the Drag and Drop box
• Browse to the file through the select file link and click Open
6
BaseSpace now uploads the file to your project.
7
Type in the analysis name in the Name of analysis box.
8
[Optional]: Associate the file you are importing with the samples used as inputs. By
making this association, the analysis is listed on the sample detail page. You can
also locate these uploaded files later by navigating to one of the samples.
9
Click Complete Import.
Transfer Ownership
What is it
A method to hand control of data over to a collaborator or customer.
When to use it
} If you want to give control of your data to a collaborator
} If you sequenced samples for a customer, for example, if you are a core lab or service
provider.
How to use it
1
Select the project or run you want to transfer:
• Project:
a
Click the Projects icon.
b
Click the desired project.
c
Click the Transfer Owner button.
• Run:
a
Click the Runs icon.
b
Click the desired run.
c
Click the More button, and then select the Transfer Ownership option.
2
Enter the new owner's email address and an optional message in the Transfer Ownership dialog
box.
3
Click Continue.
BaseSpace sends the new owner an email asking to accept the ownership of the run or
project. The ownership transfer of the project or run completes when the new owner
accepts. After the transfer, you no longer have control over that run or project, and you
cannot see it unless the new owner shares it with you; see Share Data on
page 46 for more information.
Fix Sample Sheet / Re-Run Workflow
What is it
The Fix Sample Sheet page lets you correct errors in your sample sheet, or set up a new
analysis to requeue.
When to use it
} To fix errors in the sample sheet.
} To change analysis parameters.
} To change indexing details.
Why to use it
} Errors in the sample sheet can prevent BaseSpace from processing a run. This option
allows BaseSpace to finish the analysis.
} The first analysis was suboptimal. You can resubmit the sample sheet and requeue
the run with new analysis parameters one time.
} The index settings for samples were wrong. This option allows you to correct the
settings.
NOTE
You can only submit a corrected sample sheet and requeue the run one time.
How to use it
1
You can reach the Fix Sample Sheet page two ways:
• A run can have a Needs Attention state. Open the run, and click the Fix Sample
Sheet link.
• Go to a run, select the More drop-down list, and then select Fix Sample Sheet.
The Fix Sample Sheet page opens. If BaseSpace has detected an error, it shows the
issue above the black sample sheet editor.
2
Depending on the complexity of the change, you have two options:
• Easy fix: edit the sample sheet in the sample sheet editor. BaseSpace keeps
validating the sample sheet as you edit; any remaining issues are displayed
above the sample sheet editor.
• More complex change: use Illumina Experiment Manager (IEM).
a
If you have not installed IEM yet, click the Illumina Experiment Manager
(IEM) link, and install IEM.
b
Open IEM.
c
Import the original sample sheet from your system in IEM and edit it, or
generate a new sample sheet. See the Illumina Experiment Manager User
Guide for instructions.
d
Copy and paste the sample sheet into the Sample Sheet Editor in BaseSpace.
BaseSpace validates the sample sheet; any issues are displayed above the
sample sheet editor.
3
When you are done editing and the sample sheet is valid, click the Queue Analysis
button, and BaseSpace starts analyzing the run using the new sample sheet. You can
only resubmit a sample sheet and requeue the run one time.
NOTE
If your edits result in an invalid sample sheet, the Queue Analysis button is not available.
You can return to the original using the Load Original button.
Common Sample Sheet Fixes
If a sample sheet is invalid, it could be because the genome path is not set up correctly.
This situation is indicated through the Genome Path Unknown Genome warning (as in the
example). The paths of the standard BaseSpace genomes have to conform to the
following relative paths:
Arabidopsis_thaliana\NCBI\build9.1\Sequence\WholeGenomeFASTA
Bos_taurus\Ensembl\UMD3.1\Sequence\WholeGenomeFASTA
Escherichia_coli_K_12_DH10B\NCBI\2008-03-17\Sequence\WholeGenomeFASTA
Homo_sapiens\UCSC\hg19\Sequence\WholeGenomeFASTA
Mus_musculus\UCSC\mm9\Sequence\WholeGenomeFASTA
PhiX\Illumina\RTA\Sequence\WholeGenomeFASTA
Rattus_norvegicus\UCSC\rn4\Sequence\WholeGenomeFASTA
Saccharomyces_cerevisiae\UCSC\sacCer2\Sequence\WholeGenomeFASTA
Staphylococcus_aureus_NCTC_8325\NCBI\2006-02-13\Sequence\WholeGenomeFASTA
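Before pasting a sample sheet into the editor, you can compare its genome path against the standard relative paths. The following Python sketch is one way to do this outside of BaseSpace. The function name and the normalization rules (accepting forward slashes and a trailing separator) are assumptions made for this example; BaseSpace itself may validate paths differently. The dated folder names for E. coli and Staphylococcus aureus are written as they appear in the reference genome download URLs.

```python
# Standard BaseSpace genome paths, relative to the genome root folder.
STANDARD_GENOME_PATHS = {
    r"Arabidopsis_thaliana\NCBI\build9.1\Sequence\WholeGenomeFASTA",
    r"Bos_taurus\Ensembl\UMD3.1\Sequence\WholeGenomeFASTA",
    r"Escherichia_coli_K_12_DH10B\NCBI\2008-03-17\Sequence\WholeGenomeFASTA",
    r"Homo_sapiens\UCSC\hg19\Sequence\WholeGenomeFASTA",
    r"Mus_musculus\UCSC\mm9\Sequence\WholeGenomeFASTA",
    r"PhiX\Illumina\RTA\Sequence\WholeGenomeFASTA",
    r"Rattus_norvegicus\UCSC\rn4\Sequence\WholeGenomeFASTA",
    r"Saccharomyces_cerevisiae\UCSC\sacCer2\Sequence\WholeGenomeFASTA",
    r"Staphylococcus_aureus_NCTC_8325\NCBI\2006-02-13\Sequence\WholeGenomeFASTA",
}


def is_standard_genome_path(genome_path: str) -> bool:
    """Return True if genome_path matches one of the standard relative paths."""
    # Assumption: treat "/" like "\" and ignore surrounding spaces
    # and a trailing separator before comparing.
    normalized = genome_path.strip().replace("/", "\\").rstrip("\\")
    return normalized in STANDARD_GENOME_PATHS


if __name__ == "__main__":
    print(is_standard_genome_path(r"Homo_sapiens\UCSC\hg19\Sequence\WholeGenomeFASTA"))  # True
    print(is_standard_genome_path(r"Homo_sapiens\UCSC\hg18\Sequence\WholeGenomeFASTA"))  # False
```

The comparison is case-sensitive and checks relative paths only, so a path with a drive letter or other prefix is reported as non-standard.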
Fix Indexes in the Prep Tab
What is it
You can correct errors in your indexes through the Prep tab and regenerate the FASTQ
files one time.
When to use it
} To change indexes and regenerate FASTQ files for samples that have already been
sequenced.
When not to use it
} For runs that are not set up with the Prep tab.
} For runs where the wrong library prep kit was selected in the Prep tab.
How to use it
1
Go to the run with the wrong indexes.
2
Click the Run Settings button in the navigation task pane.
3
Go to the bottom of the page, and click the pool.
4
Click the Plate ID of the plate.
5
Click Edit.
6
Correct the index in the dropdown menu.
7
Go to the affected run.
8
In the More dropdown menu, select Generate FASTQ Files.
BaseSpace now starts regenerating the FASTQ files with the corrected indexes. The new
FASTQ files get added to the sample list and you can identify the new files by date.
TIP
If you do not want to identify the samples by date, you can also rename the sample_ID, or
assign the new FASTQ files to a new project in the Prep tab.
NOTE
You can only submit a corrected sample sheet and requeue the run one time.
Purchasing
To buy app sessions from third-party vendors, you need iCredits. This chapter describes
how to purchase iCredits and manage your wallet and purchases.
Access Your Wallet
What is it
The wallet contains your iCredits and credit card information.
When to use it
Use it to update credit cards or add iCredits.
Why to use it
You need iCredits if you want to purchase app sessions. The wallet allows you to
manage your iCredits.
How to use it
1
Go to your account.
2
Click the Wallet button.
When you are on the Wallet screen, you can add iCredits or credit cards.
Adding iCredits
What is it
iCredits allow you to purchase app sessions.
When to use it
Use Adding iCredits when you are running low on iCredits, and you want to buy app
sessions.
Why to use it
Third-party apps can provide functionality needed for your analysis, but cost iCredits.
How to use it
You can either add iCredits directly, or create a purchase order.
} Add iCredits directly:
1
Go to the Wallet screen.
2
Click the Add More button.
3
Enter the amount and select the desired credit card from the drop-down list.
4
Click the Continue button.
5
Click the Purchase button.
A message appears stating how many credits have been added.
6
Click the OK button.
} Create a purchase order:
1
Go to the Wallet screen.
2
Click the Create a Quote link.
3
Enter the amount and the desired account.
4
Click the Create Quote button.
The purchase order appears, and when processed, Illumina credits the account
with the iCredits.
5
You can generate a paper copy using the Print button.
Adding Credit Card
What is it
You can use a credit card to purchase iCredits, but you have to add it first to BaseSpace.
When to use it
When you want to buy iCredits with a new credit card.
Why to use it
Third-party apps can provide functionality you need for your analysis, but cost iCredits.
How to use it
1
Go to the Wallet screen.
2
Click the Add Credit Card button.
3
Fill in the credit card info and click Submit.
View Purchase History
What is it
The Purchase History page contains detailed information about purchases, adjustments,
and balance for your account.
When to use it
When you want to review your purchases.
Why to use it
You want to track where you spend your iCredits, or to see if a refund has been
processed.
How to use it
1
Go to your account.
2
Click the Purchase History button.
Now you can review your purchases, filter on type of transaction, and sort by order
number, vendor, date, or total iCredits used.
Search for Runs, Projects, and Samples
What is it
The Search box allows you to find runs, projects, and samples.
When to use it
When you want to do a quick search for a run, project, or sample.
Why to use it
Use this option if there are many runs and you want a quick way to find one.
How to use it
1
Type the run, project, or sample name in the search field and press Enter or click the
magnifying glass icon.
2
Select the desired run, project, or sample in Search Results. You can also filter the
search results by these categories using the drop-down list at the left of the Search
Results page.
Workflow Reference
This section describes the Illumina workflow apps.
Resequencing on page 60
Custom/PCR Amplicon on page 60
Library QC on page 61
Small RNA Analysis on page 61
Metagenomics Analysis on page 62
De Novo Assembly on page 62
Generate FASTQ on page 62
Resequencing
The sample sheet driven Resequencing app compares the DNA sequence in the samples
against a reference genome and identifies any variants (SNPs or indels) relative to the
reference sequence. The main output files generated by the Resequencing workflow are
BAM files (containing the alignment results) and VCF files (containing the variant calls).
The Resequencing workflow can only be used to analyze MiSeq sequencing results.
The Resequencing App Results Page provides four graphs:
} Low Percentage Graph on page 63
} High Percentage Graph on page 64
} Clusters Graph on page 65
} Mismatch Graph on page 67
The Resequencing Sample Details Page provides five panes:
} Samples Table on page 89
} Coverage Graph on page 91
} Q-Score Graph on page 91
} Variant Score Graph on page 91
} Variants Table on page 92
The graphs and variants table display data for the chromosome that is selected in the
drop-down list.
Custom/PCR Amplicon
The Custom/PCR Amplicon workflow evaluates short regions of amplified DNA
(amplicons) for variants. The focused sequencing of amplicons enables high-coverage
sequencing of particular regions across many samples. The main output files generated
by the Custom/PCR Amplicon workflow are BAM files (containing the aligned reads)
and VCF files (containing the variant calls). The Custom/PCR Amplicon workflow
supports multiple manifests (containing the probe regions) and consensus sequence
reporting for multi-manifest runs.
The Custom/PCR Amplicon workflow can only be used to analyze MiSeq sequencing
results
The Custom/PCR Amplicon App Results Page provides a four graphs:
} Low Percentage Graph on page 63
} High Percentage Graph on page 64
} Clusters Graph on page 65
} Mismatch Graph on page 67
The PCR amplicon Sample Details page provides six panes:
} Samples Table on page 89
} Amplicons Table on page 91
} Coverage Graph on page 91
} Q-Score Graph on page 91
} Variant Score Graph on page 91
} Variants Table on page 92
The graphs and variants table display data for the amplicon that is selected in the
Amplicons Table.
Library QC
The Library QC workflow is intended for evaluating the abundance, fragment length,
and sample quality of libraries. The analysis performed in the Library QC workflow is
similar to the Resequencing workflow. The Library QC workflow does not perform
variant calling; instead, it provides a report of the characteristics of each sample.
The Library QC workflow can only be used to analyze MiSeq sequencing results.
The Library QC App Results page provides four graphs:
} Low Percentage Graph on page 63
} High Percentage Graph on page 64
} Clusters Graph on page 65
} Mismatch Graph on page 67
The Library QC Sample Details Page provides four panes:
} Samples Table on page 89
} Coverage Graph on page 91
} Q-Score Graph on page 91
} Sample QC Table on page 93
The graphs display data for the chromosome that is selected in the drop-down list.
Small RNA Analysis
The Small RNA workflow measures the abundance of various types of short RNA
sequences, particularly miRNA. It is suitable for identifying and quantifying miRNA
expression and for comparing abundance across samples.
The Small RNA workflow can only be used to analyze MiSeq sequencing results.
The Small RNA Analysis App Results page provides access to two graphs:
} Clusters Graph on page 65
} Trimmed Lengths on page 68
The Small RNA Sample Details Page provides three panes:
} Small RNA Samples Table on page 90
} Small RNA Pie Chart on page 92
} Small RNA Graph on page 93
Metagenomics Analysis
The Metagenomics workflow enables the analysis of 16S ribosomal RNA, a component
of the 30S subunit of prokaryotic ribosomes. The 16S ribosomal sequences from an
environmental sample can be analyzed to determine which organisms are present.
MiSeq Reporter implements a naïve Bayesian classifier (based on Wang et al., Appl. Environ.
Microbiol. (2007) Aug; 73(16):5261-7) that has been optimized for
Illumina paired-end reads. The 16S rRNA data store contains sequences from the May
2011 release of the Greengenes 16S rRNA database. The main output of this workflow is
a classification of reads at several taxonomic levels (kingdom, phylum, class, order,
family, genus).
The Metagenomics workflow can only be used to analyze MiSeq sequencing results.
The Metagenomics App Results page provides one graph:
} Clusters Graph on page 65
The Metagenomics Sample Details Page provides two panes:
} Samples Table on page 89
} Metagenomics Pie Chart on page 93
De Novo Assembly
The Assembly workflow enables de novo assembly of a draft genome directly from the
sequencing reads. Because assembly relies upon significant coverage of the genome, this
workflow is best suited for the assembly of small genomes (up to 5 to 10 MB). The
assembly process uses the Velvet software (Velvet: algorithms for de novo short read
assembly using de Bruijn graphs (2008) D.R. Zerbino and E. Birney. Genome Research
18:821–829).
The Assembly workflow can only be used to analyze MiSeq sequencing results.
The De Novo Assembly App Results page provides access to three graphs:
} Low Percentage Graph on page 63
} High Percentage Graph on page 64
} Clusters Graph on page 65
The De Novo Assembly Sample Details Page provides two panes:
} De Novo Assembly Samples Table on page 90
} Samples Graph on page 93
Generate FASTQ
The Generate FASTQ app does not perform any analysis, but generates FASTQ files for
download and shows basic summary data. The Generate FASTQ app can be used with
all sequencing instruments that BaseSpace supports. For more information, see FASTQ
Files on page 86.
Generate FASTQ is also used to analyze RNA-Seq samples from MiSeq.
Data Reference
This section provides the data references, and describes the files, charts, graphs, and
tables.
Workflow Graphs on page 63
Run Summary on page 68
Indexing QC on page 70
Charts on page 71
BaseSpace Files on page 78
Sample Details Page Components on page 89
Isaac App Results Page on page 94
Workflow Graphs
The workflow graphs provide metrics that allow you to judge the success of the
sequencing run for that sample. The following topics provide information about these
graphs.
Low Percentage Graph
What is it?
The Low Percentage Graph represents statistics of the run that are generally near zero in
an ideal run. These graphs are a subset of all metrics of the sequencing run itself.
When to use it.
Use the Low Percentage Graph to judge sequencing metrics for a sample. This graph can
also be used when troubleshooting unexpected results.
When not to use it.
This graph is not a good predictor of yields or quality of final results.
How to use it
Metric
Description
Phasing 1
The percentage of molecules in a cluster that fall behind the current cycle within
Read 1.
Phasing 2
The percentage of molecules in a cluster that fall behind the current cycle within
Read 2.
PrePhasing 1
The percentage of molecules in a cluster that run ahead of the current cycle
within Read 1.
PrePhasing 2
The percentage of molecules in a cluster that run ahead of the current cycle
within Read 2.
Mismatch 1
The average percentage of mismatches for Read 1 over all cycles.
Mismatch 2
The average percentage of mismatches for Read 2 over all cycles.
You can expand a chart by clicking the expand button.
High Percentage Graph
What is it?
The High Percentage Graph represents run statistics that are generally near 100% in an
ideal run. These graphs are metrics of the sequencing run or the analysis step.
When to use it.
Use the High Percentage Graph to judge sequencing metrics for a sample. This graph can
also be used when troubleshooting unexpected results.
When not to use it.
Do not use the High Percentage Graph to look at tertiary analysis metrics.
How to use it
Metric
Description
I20/I1 1
The ratio of intensities at cycle 20 to the intensities at cycle 1 for Read 1.
I20/I1 2
The ratio of intensities at cycle 20 to the intensities at cycle 1 for Read 2.
Align 1
The percentage of clusters that aligned to the reference in Read 1.
Align 2
The percentage of clusters that aligned to the reference in Read 2.
PE Orientation
The percentage of paired-end alignments with the expected orientation.
PE Resynthesis
The ratio of first cycle intensities for Read 1 to first cycle intensities for Read 2.
PF
The percentage of clusters passing filters.
You can expand a chart by clicking the expand button.
Clusters Graph
What is it?
The Clusters graph provides information about the number of clusters that are detected
during sequencing, split out by the following groups:
} Total
} Passing filter
} Unaligned
} Unindexed
} Duplicates
When to use it.
Use the Clusters Graph to judge clustering success and relative cluster density between
lanes (for flow cells with multiple lanes), and as a snapshot of the overall run. The graph
can also help identify overclustering issues.
When not to use it.
Do not use the Clusters Graph to look at tertiary analysis metrics.
How to use it
A cluster represents a clonal spot on the flow cell that contains the amplified DNA
strands to be sequenced.
X-axis
Description
Raw
The total number of clusters detected in the run.
PF
The total number of clusters passing filter in the run.
Unaligned
The total number of clusters passing filter that did not align to the reference
genome, if applicable. Clusters that are unindexed are not included in the
unaligned count.
Unindexed
The total number of clusters passing filter that were not associated with any
index sequence in the run.
Duplicate
The total number of clusters for a paired-end sequencing run that are
considered to be PCR duplicates. PCR duplicates are defined as two clusters
from a paired-end run where both clusters have the exact same alignment
positions for each read.
You can expand a chart by clicking the expand button.
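The duplicate definition in the table above can be illustrated with a minimal sketch. It assumes each cluster is reduced to a pair of alignment positions (one per read) and that every cluster beyond the first one sharing the same pair is counted as a duplicate. The function name, the input layout, and that counting convention are illustrative assumptions, not the method BaseSpace uses.

```python
from collections import Counter


def count_pcr_duplicates(pairs):
    """Count clusters whose Read 1 and Read 2 positions both repeat.

    pairs: iterable of (read1_position, read2_position) tuples. Each
    position can be any hashable value, for example ("chr1", 1000).
    Assumption: the first cluster at a given pair of positions is kept
    and every additional cluster at the same pair is a duplicate.
    """
    seen = Counter(pairs)
    return sum(count - 1 for count in seen.values())


clusters = [
    (("chr1", 100), ("chr1", 300)),
    (("chr1", 100), ("chr1", 300)),  # same positions for both reads
    (("chr1", 100), ("chr1", 350)),  # Read 2 differs, so not a duplicate
    (("chr2", 5), ("chr2", 200)),
]
print(count_pcr_duplicates(clusters))
```

Both reads must match for a cluster to count, which is why the third cluster in the example is not treated as a duplicate.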
Mismatch Graph
What is it?
The Mismatch Graph plots the mismatches between a sequence read and a reference
genome after alignment.
When to use it.
Use the Mismatch Graph to judge the quality of the sequencing run. Poor sequencing runs
usually lead to high numbers of mismatches.
When not to use it.
} When you are using a reference genome that has many errors or low confidence
stretches.
} When sample and reference differ too much.
} In de novo applications.
} In Methyl-Seq applications.
How to use it
Mismatch refers to any mismatch between a sequence read and a reference genome after
alignment.
} Cycle: Plots the % mismatches for all clusters in a run versus cycle.
Mismatches can be due to two main causes:
} Sequencing errors (non-specific, random)
} Differences between your sample and the reference genome
Keep these causes in mind when interpreting the mismatch rates.
You can expand a chart by clicking the expand button.
Trimmed Lengths
Y Axis
Clusters
X Axis
Trimmed Lengths
Description
Histogram of read lengths after trimming, for reads that were trimmed because they
reached the adapter.
Run Summary
What is it?
The Run Summary page shows tables with basic data quality metrics that are
summarized per lane and per read. All the statistics are given as means and standard
deviations over the tiles used in the lane.
When to use it.
When looking at basic data quality metrics for a run from primary analysis.
When not to use it.
The tables do not contain information about samples or projects. The tables also do not
contain app-generated information (secondary or tertiary analysis).
How to use it.
The following metrics are displayed in the top table, split out by read and total:
Level
The sequencing read number.
Cycles
The number of cycles in the read.
Yield Total
The number of bases sequenced, which is updated as the run
progresses.
Projected Total Yield
The projected number of bases expected to be sequenced at
the end of the run.
Yield Perfect
The number of bases in reads that align perfectly, as
determined by alignment to PhiX of reads derived from a
spiked-in PhiX control sample. If no PhiX control sample is
run in the lane, this chart is not available.
Yield <=3 errors
The number of bases in reads that align with three errors or
fewer, as determined by a spiked-in PhiX control sample. If no
PhiX control sample is run in the lane, this chart is not
available, and shows a zero value. This value is not calculated
for NextSeq two-channel sequencing, and the value shown is
always zero.
Aligned
The percentage of the sample that aligned to the PhiX
genome, which is determined for each level or read
independently.
% Perfect [Num Usable Cycles]
The percentage of bases in reads that align perfectly, as
determined by a spiked-in PhiX control sample, at the cycle
indicated in the brackets. If no PhiX control sample is run in
the lane, this chart shows 0% and the number of cycles used.
This value is not calculated for NextSeq two-channel
sequencing, and the value shown is always zero.
% <=3 errors [Num Usable Cycles]
The percentage of bases in reads that align with three errors
or fewer, as determined by a spiked-in PhiX control sample, at
the indicated cycle. If no PhiX control sample is run in the
lane, this chart shows 0% and the number of cycles used. This
value is not calculated for NextSeq two-channel sequencing,
and the value shown is always zero.
Error Rate
The calculated error rate of the reads that aligned to PhiX.
Intensity Cycle 1
The average of the A channel intensity measured at the first
cycle averaged over filtered clusters.
% Intensity Cycle 20
The corresponding intensity statistic at cycle 20 as a
percentage of that value at the first cycle: 100% x (Intensity at
cycle 20)/(Intensity at cycle 1).
%Q>=30
The percentage of bases with a quality score of 30 or higher.
This chart is generated after the 25th cycle, and the values
represent the current cycle.
The following metrics are available in the Read tables, split out by lane:
Tiles
The number of tiles per lane.
Density
The density of clusters (in thousands per mm²) detected by
image analysis, +/- one standard deviation.
Clusters PF
The percentage of clusters passing filtering, +/- one standard
deviation.
Phas./Prephas.
The value used by RTA for the percentage of molecules in a
cluster for which sequencing falls behind (phasing) or jumps
ahead (prephasing) the current cycle within a read.
For MiSeq and NextSeq, RTA generates phasing and
prephasing estimates empirically for every cycle. The value
displayed here is therefore not used in the actual
phasing/prephasing calculations, but is an aggregate value
determined from the first 25 cycles. For most applications,
the value reported is very close to the value that is applied.
For low diversity samples or samples with unbalanced base
composition, the reported value can diverge from the values
being applied because the value changes from cycle to cycle.
Reads
The number of clusters (in millions).
Reads PF
The number of clusters (in millions) passing filtering.
%Q>=30
The percentage of bases with a quality score of 30 or higher.
This chart is generated after the 25th cycle, and the values
represent the current cycle.
Yield
The number of bases sequenced which passed filter.
Cycles Err Rated
The number of cycles that have been error rated using PhiX,
starting at cycle 1.
Aligned
The percentage that aligned to the PhiX genome.
Error Rate
The calculated error rate, as determined by the PhiX
alignment. Subsequent columns display the error rate for
cycles 1–35, 1–75, and 1–100.
Intensity Cycle 1
The average of the A channel intensity measured at the first
cycle averaged over filtered clusters.
%Intensity Cycle 20
The corresponding intensity statistic at cycle 20 as a
percentage of that value at the first cycle. 100%x(Intensity at
cycle 20)/(Intensity at cycle 1).
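As a worked illustration of the % Intensity Cycle 20 formula in the table above, the following sketch computes the percentage from two intensity values. The function name and the sample numbers are hypothetical; only the formula itself comes from the table.

```python
def percent_intensity_cycle_20(intensity_cycle_1, intensity_cycle_20):
    """Return 100 x (intensity at cycle 20) / (intensity at cycle 1)."""
    if intensity_cycle_1 <= 0:
        # Guard added for this sketch: the ratio is undefined without a
        # positive first-cycle intensity.
        raise ValueError("intensity at cycle 1 must be positive")
    return 100.0 * intensity_cycle_20 / intensity_cycle_1


# Hypothetical values: intensity drops from 200 to 150 by cycle 20.
print(percent_intensity_cycle_20(200, 150))
```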
Indexing QC
What is it?
The Indexing QC page lists count information for indexes used in the run as designated
in the sample sheet. The Indexing QC is only available if the run is an index run.
When to use it.
Look at this page when you want to see indexing information for a lane after the index
read is completed.
When not to use it.
This page only provides indexing information. Do not use it for runs that were not
indexed, or to look at other primary, secondary, or tertiary analysis metrics. This
information is a quick estimation and can vary slightly from final output.
How to use it.
You can select the displayed lane through the drop-down list.
The first table provides an overall summary of the indexing performance for that lane,
including:
Total Reads
The total number of reads for this lane.
PF Reads
The total number of passing filter reads for this lane.
% Reads Identified (PF)
The total fraction of passing filter reads assigned to an index.
CV
The coefficient of variation for the number of counts across
all indexes.
Min
The lowest representation for any index.
Max
The highest representation for any index.
Further information is provided regarding the frequency of individual indexes in both
table and graph form. The table contains several columns, including:
Index Number
A unique number assigned to each index by BaseSpace for
display purposes.
Sample ID
The sample ID assigned to an index in the sample sheet.
Project
The project assigned to an index in the sample sheet.
Index 1 (I7)
The sequence for the first index read.
Index 2 (I5)
The sequence for the second index read.
% Reads Identified (PF)
The number of reads (only includes Passing Filter reads)
mapped to this index.
This information is also displayed in graphical form. In the graphical display, indexes
are ordered according to the unique Index Number assigned by BaseSpace.
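The summary metrics described above can be approximated from per-index read counts, as in the following sketch. Several details are assumptions made for illustration and are not confirmed by this guide: the coefficient of variation is taken as the population standard deviation divided by the mean of the per-index counts, and Min and Max are expressed as percentages of passing filter reads. The function and key names are hypothetical.

```python
from statistics import mean, pstdev


def indexing_summary(pf_reads, pf_counts_by_index):
    """Summarize index representation for one lane.

    pf_reads: total number of passing filter (PF) reads in the lane.
    pf_counts_by_index: dict mapping an index label to its PF read count.
    """
    counts = list(pf_counts_by_index.values())
    shares = [100.0 * count / pf_reads for count in counts]
    return {
        "pct_reads_identified": 100.0 * sum(counts) / pf_reads,
        # Assumed definition: population standard deviation / mean.
        "cv": pstdev(counts) / mean(counts),
        # Assumed unit: percentage of PF reads per index.
        "min": min(shares),
        "max": max(shares),
    }


summary = indexing_summary(500, {"index_1": 100, "index_2": 300})
print(summary)
```

A low CV indicates that the indexes are evenly represented; a large gap between Min and Max points to unbalanced pooling.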
Charts
The Charts page shows five charts with run metrics. You can expand a chart by clicking
the expand button.
The following topics provide information about these charts.
Flow Cell Chart
What is it?
The Flow Cell Chart shows color-coded graphical quality metrics per tile for the entire
flow cell.
When to use it.
Use the Flow Cell Chart to judge local differences per cycle, per lane, or per read in
sequencing metrics on a flow cell. It is also an easy way to see the %Q30 metric, which is
an excellent single metric to judge a run.
When not to use it.
Do not use the Flow Cell Chart to look at secondary or tertiary analysis metrics.
How to use it.
The Flow Cell Chart has the following features:
} You can select the displayed metric, surface, cycle, and base through the drop-down
lists.
} The color bar to the right of the chart indicates the values that the colors represent.
} The chart is displayed with tailored scaling by default.
} Tiles that have not been measured or are not monitored are gray.
You can monitor the following quality metrics with this chart:
Intensity
This chart shows the intensity by color and cycle of the 90th
percentile of the data for each tile.
FWHM
The average full width of clusters at half maximum (in
pixels). Used to display focus quality.
% Base
The percentage of clusters for which the selected base (A, C,
T, or G) has been called.
%Q>20, %Q>30
The percentage of bases with a quality score of > 20 or > 30,
respectively. These charts are generated after the 25th cycle,
and the values represent the current scored cycle.
Median Q-Score
The median Q-Score for each tile over all bases for the
current cycle. These charts are generated after the 25th cycle.
This plot is best used to examine the Q-scores of your run as
it progresses. Bear in mind that the %Q30 plot can give an
oversimplified view due to its reliance on a single threshold.
Density
The density of clusters for each tile (in thousands per mm²).
Density PF
The density of clusters passing filter for each tile (in
thousands per mm²).
Clusters
The number of clusters for each tile (in millions).
Clusters PF
The number of clusters passing filter for each tile (in
millions).
Error Rate
The calculated error rate, as determined by a spiked-in PhiX
control sample. If no PhiX control sample is run in the lane,
this chart is not available.
% Phasing, % Prephasing
The estimated percentage of molecules in a cluster for which
sequencing falls behind (phasing) or jumps ahead
(prephasing) the current cycle within a read.
% Aligned
The percentage of reads from clusters in each tile that aligned
to the PhiX genome.
Perfect Reads
The percentage of reads that align perfectly, as determined
by a spiked-in PhiX control sample. If no PhiX control sample
is run in the lane, this chart is all gray.
Corrected Intensity
The intensity corrected for cross talk between the color
channels by the matrix estimation and phasing and
prephasing.
Called Intensity
The intensity for the called base.
Signal to Noise
The signal to noise ratio is calculated as mean called intensity
divided by standard deviation of non-called intensities.
Note the variable scales used on these different parameters.
Data By Cycle Plot
What is it?
The Data by Cycle plot shows the progression of quality metrics during a run as a line
graph.
When to use it.
Use the Data By Cycle Plot to judge the progression of quality metrics during a run on a
cycle by cycle basis.
When not to use it.
Do not use the Data By Cycle Plot to look at secondary or tertiary analysis metrics, or
aggregate analysis for a whole lane regardless of cycle.
How to use it.
The Data by Cycle plots allow you to follow the progression of quality metrics during a
run. These plots have the following features:
} You can select the displayed metric and base through the drop-down lists.
} The symbol in the top right-hand corner toggles the plot between pane view and full
screen view.
You can monitor the following quality metrics with this plot:
Intensity
This chart shows the intensity by color and cycle of the 90th
percentile of the data for each tile.
FWHM
The average full width of clusters at half maximum (in
pixels). Used to display focus quality.
% Base
The percentage of clusters for which the selected base (A, C,
T, or G) has been called.
%Q>20, %Q>30
The percentage of bases with a quality score of > 20 or > 30,
respectively. These charts are generated after the 25th cycle,
and the values represent the current scored cycle.
Median Q-Score
The median Q-Score for each tile over all bases for the
current cycle. These charts are generated after the 25th cycle.
This plot is best used to examine the Q-scores of your run as
it progresses. Bear in mind that the %Q30 plot can give an
over simplified view due to its reliance on a single threshold.
Error Rate
The calculated error rate, as determined by a spiked in PhiX
control sample. If no PhiX control sample is run in the lane,
this chart is not available.
Perfect Reads
The percentage of reads that align perfectly, as determined
by a spiked in PhiX control sample. If no PhiX control sample
is run in the lane, this chart is all gray.
Corrected Intensity
The intensity corrected for cross talk between the color
channels by the matrix estimation and phasing and
prephasing.
Called Intensity
The intensity for the called base.
Signal to Noise
The signal to noise ratio is calculated as mean called intensity
divided by standard deviation of non-called intensities.
You can expand a chart by clicking the expand button.
QScore Distribution
What is it?
The QScore Distribution Plot shows a bar graph that allows you to view the number of
bases by quality score. The quality score is cumulative for current cycle and previous
cycles, and only bases from reads that pass the quality filter are included.
When to use it.
Use it to judge the QScore distribution for a run, which is an excellent indicator for run
performance.
When not to use it.
Do not use the QScore Distribution Plot to look at secondary or tertiary analysis metrics,
or metrics other than quality scores.
How to use it.
The QScore Distribution pane shows plots that allow you to view the number of bases by
quality score. The quality score is cumulative for current cycle and previous cycles, and
only bases from reads that pass the quality filter are included.
These plots have the following features:
} You can select the displayed read and cycle through the drop-down lists.
} The symbol in the top right-hand corner toggles the plot between pane view and full
screen view.
The QScore is based on the Phred scale. The following list shows Q-scores and the
corresponding chance that the base call is wrong:
} Q10: 10% chance of wrong base call
} Q20: 1% chance of wrong base call
} Q30: 0.1% chance of wrong base call
} Q40: 0.01% chance of wrong base call
You can slide the threshold (set at >=Q30 by default) to examine the proportion of bases
at or above any particular Q-score. When using Q-score binning, this plot reflects the
subset of Q-scores used.
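The Phred relationship behind the Q-score list in this topic is P = 10^(-Q/10), where P is the chance that a base call is wrong. The sketch below applies that formula and also mimics the threshold slider by computing the proportion of bases at or above a chosen Q-score. The function names and the sample histogram are hypothetical and are not part of BaseSpace.

```python
def error_probability(q_score):
    """Chance that a base call is wrong for a Phred-scaled Q-score."""
    return 10 ** (-q_score / 10)


def fraction_at_or_above(base_counts_by_q, threshold=30):
    """Proportion of bases whose Q-score is at or above the threshold.

    base_counts_by_q: dict mapping a Q-score to the number of bases
    called with that score (a hypothetical histogram for this sketch).
    """
    total = sum(base_counts_by_q.values())
    if total == 0:
        return 0.0
    passing = sum(n for q, n in base_counts_by_q.items() if q >= threshold)
    return passing / total


for q in (10, 20, 30, 40):
    print(f"Q{q}: {error_probability(q):.2%} chance of wrong base call")

histogram = {10: 5, 20: 15, 30: 60, 40: 20}
print(fraction_at_or_above(histogram))
```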
Data by Lane Plot
What is it?
The Data by Lane plots allow you to view quality metrics per lane.
When to use it.
Use the Data By Lane Plot to judge the difference in quality metrics between lanes.
When not to use it.
Do not use the Data By Lane Plot to look at secondary or tertiary analysis metrics.
How to use it.
The Data by Lane plots have the following features:
} You can select the displayed metric through the drop-down lists.
} The symbol in the top right-hand corner toggles the plot between pane view and full
screen view.
The plots share a number of characteristics.
} The plots show the distribution of mean values for a given parameter across all tiles
in a given lane.
} The red line indicates the median tile value for the parameter displayed.
} Blue boxes are for raw clusters, green boxes for clusters passing filter.
} The box outlines the interquartile range (the middle 50% of the data) for the tiles
analyzed for the data point.
} The error bars delineate the minimum and maximum without outliers.
} The outliers are the values that are more than 1.5 times the interquartile range below
the 25th percentile, or more than 1.5 times the interquartile range above the 75th
percentile. Outliers are indicated as dots.
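The box, error bar, and outlier rules listed above can be reproduced with a short sketch. It assumes the quartiles are computed with the "inclusive" method of Python's statistics.quantiles. Other quartile conventions give slightly different boxes, and this guide does not state which one BaseSpace uses. The function name and the tile values are hypothetical.

```python
from statistics import quantiles


def box_plot_summary(tile_values):
    """Return box, whisker, and outlier values for the tiles of one lane."""
    data = sorted(tile_values)
    # Assumption: inclusive quartiles; requires at least two values.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inliers = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Error bars: minimum and maximum without outliers.
        "whiskers": (inliers[0], inliers[-1]),
        "outliers": outliers,
    }


tiles = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(box_plot_summary(tiles))
```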
You can monitor the following quality metrics with this plot:
} The density of clusters for each tile (in thousands per mm²).
} The number of clusters for each tile (in millions).
} The estimated percentage of molecules in a cluster for which sequencing falls behind
(phasing) or jumps ahead (prephasing) the current cycle within a read.
} The percentage of reads from clusters in each tile that aligned to the PhiX genome.
You can expand a chart by clicking the expand button.
QScore Heatmap
What is it?
A heatmap of the Q-scores.
When to use it.
For a quick overview of the Q-scores over the cycles.
When not to use it.
Do not use the QScore Heatmap to look at secondary or tertiary analysis metrics,
or metrics other than quality scores.
How to use it.
The QScore Heatmap shows plots that allow you to view the QScore by cycle. These
plots have the following features:
} The color bars to the right of each chart indicate the values that the colors represent.
The charts are displayed with tailored scaling; the scale is always 0 to 100% of
maximum value.
} The symbol in the top right-hand corner toggles the plot between pane view and full
screen view.
You can expand a chart by clicking the expand button.
BaseSpace Files
BaseSpace uses and produces various files. See the topics in this section for details.
Sample Sheet
What is it?
The sample sheet is a comma-delimited file (SampleSheet.csv) that stores the information
to set up and analyze a sequencing experiment. The file includes a list of samples and
their index sequences, as well as the workflow to be employed.
When to use it.
Every run in BaseSpace requires an associated sample sheet to define projects and
samples, assign indexes, and run sample sheet driven workflow apps.
When not to use it.
Not applicable; you always employ a sample sheet with BaseSpace.
How to use it
The following table is for reference purposes only. For details about creating or modifying
a sample sheet, see the MiSeq Reporter User Guide, MiSeq Sample Sheet Quick Reference
Guide, or HiSeq User Guide. You can create a sample sheet using the Illumina Experiment
Manager Software.
Table 1 Sample Sheet Fields
Row
Description
Investigator Name
(Optional) The name of the investigator.
Project Name
(Optional) A descriptive name of the run.
Experiment Name
(Optional) A descriptive name of the experiment.
Date
The date the sequencing run was performed.
Workflow
The analysis workflow for the run.
Manifests
This section is only for the Amplicon workflow and is the
name of the file (provided by Illumina or created by IEM)
used in the Amplicon Workflow. It is required for the
Amplicon workflow and ignored by other workflows. The
file specifies the alignments to a reference and the targeted
reference regions used in the Amplicon workflow.
Site Reports
This section is optional and only for the Resequencing and
Custom Amplicon workflows. Each line below the
SiteReports section header is the name of a SiteReport
Input File. This file designates positions on a given
chromosome to report the genotype found at that position.
Data
• Contaminants – The path to the folder containing FASTA
files of contaminants (used only for SmallRNA)
• GenomePath – The reference genome folder containing
the FASTA files to be used in the alignment step
• Index – Represents the sequence string of the first index.
Valid characters in this string are A, C, G, T, and N. 'N'
matches any base.
• Index2 – Represents the sequence string of the second
index. Valid characters in this string are A,C,G,T, and N.
'N' matches any base.
• MiRNA – The path to the folder containing FASTA files
of mature miRNAs (used only for SmallRNA)
• RNA – The path to the folder containing FASTA files of
small RNAs (used only for SmallRNA)
• SampleID – A string identifier for the sample, which is
usually a bar code but can have any value. Letters and
numbers only; some special characters can be detrimental
for file creation.
• Manifest – The manifest file letter as designated by the
manifest field.
• Name – A string identifier for the sample, which is used
in the reporting web page.
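The Index and Index2 rules in the table above (valid characters A, C, G, T, and N, where 'N' matches any base) can be checked with a minimal sketch. The function names are hypothetical. Treating lowercase input as equivalent to uppercase and requiring equal lengths are assumptions made for this example, and the sketch does not reflect how BaseSpace or MiSeq Reporter performs demultiplexing.

```python
VALID_INDEX_CHARS = set("ACGTN")


def is_valid_index(sequence):
    """True if the sample sheet index uses only A, C, G, T, and N."""
    return len(sequence) > 0 and set(sequence.upper()) <= VALID_INDEX_CHARS


def index_matches(sheet_index, read_index):
    """True if a read's index agrees with the sample sheet index.

    An 'N' in the sample sheet index matches any base in the read.
    Assumption: both sequences must have the same length.
    """
    sheet, read = sheet_index.upper(), read_index.upper()
    if len(sheet) != len(read):
        return False
    return all(s == "N" or s == r for s, r in zip(sheet, read))


print(is_valid_index("ACGTN"))
print(index_matches("ANGT", "ACGT"))
print(index_matches("ACGT", "ACGA"))
```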
BAM Files
What is it?
The Sequence Alignment/Map (SAM) format is a generic alignment format for storing
read alignments against reference sequences, supporting short and long reads (up to 128
Mb) produced by different sequencing platforms. SAM is a human-readable text format.
The Binary Alignment/Map (BAM) format keeps the same information as SAM, but in
a compressed, binary format that is only machine readable.
When to use it.
BAM files allow you to see alignments. Use them for direct interpretation or as a starting point for
tertiary analysis with downstream analysis tools that are compatible with BAM. BAM
files are suitable for viewing with an external viewer such as IGV or the UCSC Genome
Browser.
When not to use it.
Do not use it with tools that are not compatible with the BAM format. Do not use with
applications that cannot handle large files, as BAM files can get large, depending on the
application and data.
How to use it
If you use an app in BaseSpace that uses BAM files as input, the app locates the file
when launched. If using BAM files in other tools, download the file to use it in the
external tool.
Detailed Description
Go to samtools.sourceforge.net/SAM1.pdf to see the exact SAM specification.
gVCF Files
What is it?
This application also produces the Genome Variant Call Format file (gVCF). gVCF was
developed to store sequencing information for both variant and non-variant positions,
which is required for human clinical applications. gVCF is a set of conventions applied
to the standard variant call format (VCF) 4.1 as documented by the 1000 Genomes
Project. These conventions allow representation of genotype, annotation, and other
information across all sites in the genome in a compact format. Typical human
whole-genome sequencing results expressed in gVCF with annotation are less than 1 Gbyte, or
about 1/100 the size of the BAM file used for variant calling. If you are performing
targeted sequencing, gVCF is also an appropriate choice to represent and compress the
results.
gVCF is a text file format, stored as a gzip compressed file (*.genome.vcf.gz).
Compression is further achieved by joining contiguous non-variant regions with similar
properties into single ‘block’ VCF records. To maximize the utility of gVCF, especially for
high stringency applications, the properties of the compressed blocks are conservative.
Block properties like depth and genotype quality reflect the minimum of any site in the
block. The gVCF file can be indexed (creating a *.tbi file) and used with existing VCF
tools such as tabix and IGV, making it convenient both for direct interpretation and as a
starting point for tertiary analysis.
For more information, see sites.google.com/site/gvcftools/home/about-gvcf.
When to use it.
Use it for direct interpretation or as a starting point for tertiary analysis with
downstream analysis that is compatible with gVCF, such as tabix and IGV.
When not to use it.
Do not use it with tools that are not compatible with the gVCF format.
How to use it
Apps that use gVCF files locate the file when launched and directed to the sample. If using
gVCF files in other tools, download the file to use it in the external tool.
Detailed Description
The following conventions are used in the variant caller gVCF files.
Samples per File
There is only one sample per gVCF file.
Non-Variant Blocks Using END Key
Contiguous non-variant segments of the genome can be represented as single records in
gVCF. These records use the standard 'END' INFO key to indicate the extent of the
record. Even though the record can span multiple bases, only the first base is provided
in the REF field to reduce file size.
The following is a simplified segment of a gVCF file, describing a segment of non-variant
calls (starting with an A) on chromosome 1 from position 51845 to 51862.
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA19238
chr1 51845 . A . . PASS END=51862
Any field provided for a block of sites, such as read depth (using the DP key), shows the
minimum value that is observed among all sites encompassed by the block. Each
sample value shown for the block, such as the depth (DP), is restricted to a range where
the maximum value is within 30% or 3 of the minimum. For example, for sample value
range [x,y], y <= x+max(3,x*0.3). This range restriction applies to each of the sample
values printed in the final block record.
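The range restriction above can be checked with a few lines of Python. This is a minimal sketch of the stated inequality only; the function name `fits_block_range` is hypothetical and is not part of any Illumina tool.

```python
def fits_block_range(minimum, maximum):
    """Return True if the sample value range [minimum, maximum] satisfies
    y <= x + max(3, x * 0.3), the block range restriction described above."""
    return maximum <= minimum + max(3, minimum * 0.3)


# Depth 10: the allowed spread is max(3, 3.0) = 3, so 13 fits and 14 does not.
print(fits_block_range(10, 13))  # True
print(fits_block_range(10, 14))  # False
# Depth 40: the allowed spread is max(3, 12.0) = 12, so 50 fits and 53 does not.
print(fits_block_range(40, 50))  # True
print(fits_block_range(40, 53))  # False
```

For low values the absolute margin of 3 applies, and for higher values the 30% margin takes over.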
Indel Regions
Sites that are "filled in" inside of deletions have additional changes:
All deletions:
} Sites inside of any deletion are marked with the deletion filters, in addition to any
filters that have already been applied to the site.
} Sites inside of deletions cannot have a genotype or alternate allele quality score
higher than the corresponding value from the enclosing indel.
Heterozygous deletions:
} Sites inside of heterozygous deletions are altered to have haploid genotype entries
(e.g. "0" instead of "0/0", "1" instead of "1/1").
} Heterozygous SNV calls inside of heterozygous deletions are marked with the
"SiteConflict" filter and their genotype is unchanged.
Homozygous deletions:
} Homozygous reference and no-call sites inside of homozygous deletions have
genotype "."
} Sites inside of homozygous deletions that have a non-reference genotype are marked
with a “SiteConflict” filter, and their genotype is unchanged.
} Site and genotype quality are set to "."
The described modifications reflect the notion that the site confidence is bound within
the enclosing indel confidence.
On occasion, the variant caller produces multiple overlapping indel calls that cannot be
resolved into two haplotypes. In this case, all indels and sites in the region of the overlap
are marked with the IndelConflict filter.
Genotype Quality for Variant and Non-variant Sites
The gVCF file uses an adapted version of genotype quality for variant and non-variant
site filtration. This value is associated with the key GQX. The GQX value is intended to
represent the minimum of {Phred genotype quality assuming the site is variant, Phred
genotype quality assuming the site is non-variant}. The reason for using this value is to
allow a single value to be used as the primary quality filter for both variant and non-variant sites. Filtering on this value corresponds to a conservative assumption
appropriate for applications where reference genotype calls must be determined at the
same stringency as variant genotypes, i.e.:
} An assertion that a site is homozygous reference at GQX >= 30 is made assuming the
site is variant.
} An assertion that a site is a non-reference genotype at GQX >= 30 is made assuming
the site is non-variant.
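As an illustration of the GQX definition, the following Python sketch takes the minimum of the two Phred genotype qualities and applies the threshold of 30 that the LowGQX filter uses. The function names are hypothetical, and the sketch does not reproduce how the variant caller computes the two input qualities.

```python
def gqx(gq_assuming_variant, gq_assuming_nonvariant):
    """GQX is the minimum of the two Phred genotype qualities."""
    return min(gq_assuming_variant, gq_assuming_nonvariant)


def passes_gqx(value, threshold=30):
    """A site fails when GQX is not present (None) or is below the threshold."""
    return value is not None and value >= threshold


site_gqx = gqx(gq_assuming_variant=42, gq_assuming_nonvariant=35)
print(site_gqx)              # 35
print(passes_gqx(site_gqx))  # True
print(passes_gqx(None))      # False
```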
Section Descriptions
The gVCF file contains the following sections:
} Meta-information lines start with ## and contain metadata, config information, and
define the values that the INFO, FILTER, and FORMAT fields can have.
} The header line starts with # and names the fields that the data lines use. These
fields are #CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, followed
by one or more sample columns.
} Data lines that contain information about one or more positions in the genome.
If you extract the variant lines from a gVCF file, you produce a conventional variant VCF
file.
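The three sections can be told apart by their leading characters. The Python sketch below splits the lines of an uncompressed gVCF into those sections and then keeps only the variant data lines. It assumes that a variant line is one whose ALT column is not the missing value "."; that criterion and the two data records are assumptions made for illustration, not output from a real run.

```python
def split_sections(lines):
    """Split gVCF lines into meta-information lines, the header line, and data lines."""
    meta, header, data = [], None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            meta.append(line)
        elif line.startswith("#"):
            header = line
        elif line:
            data.append(line)
    return meta, header, data


def variant_lines(data):
    """Keep data lines whose ALT column (fifth field) is not '.' (assumed criterion)."""
    return [line for line in data if line.split("\t")[4] != "."]


example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA19238",
    "chr1\t51845\t.\tA\t.\t.\tPASS\tEND=51862\tGT:GQX\t0/0:30",  # invented non-variant block
    "chr1\t51863\t.\tT\tC\t50\tPASS\t.\tGT:GQX\t0/1:45",         # invented variant record
]
meta, header, data = split_sections(example)
print(len(meta), len(data))      # 1 2
print(len(variant_lines(data)))  # 1
```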
Field Descriptions
The fixed fields #CHROM, POS, ID, REF, ALT, QUAL are defined in the VCF 4.1
standard provided by the 1000 Genomes Project. The fields ID, INFO, FORMAT, and
sample are described in the meta-information.
} CHROM: Chromosome: an identifier from the reference genome or an angle-bracketed ID String ("<ID>") pointing to a contig.
} POS: Position: The reference position, with the first base having position 1. Positions
are sorted numerically, in increasing order, within each reference sequence CHROM.
There can be multiple records with the same POS. Telomeres are indicated by using
positions 0 or N+1, where N is the length of the corresponding chromosome or
contig.
} ID: Semi-colon separated list of unique identifiers where available. If this ID is a
dbSNP variant, it is encouraged to use the rs number. No identifier is present in
more than one data record. If there is no identifier available, then the missing value
is used.
} REF: Reference bases: A,C,G,T,N; there can be multiple bases. The value in the POS
field refers to the position of the first base in the string. For simple insertions and
deletions in which either the REF or one of the ALT alleles would otherwise be
null/empty, the REF and ALT strings include the base before the event. This
modification is reflected in the POS field. The exception is when the event occurs at
position 1 on the contig, in which case they include the base after the event. If any of
the ALT alleles is a symbolic allele (an angle-bracketed ID String "<ID>"), the
padding base is required. In that case, POS denotes the coordinate of the base
preceding the polymorphism.
} ALT: Comma-separated list of alternate non-reference alleles called on at least one of
the samples. Options are:
• Base strings made up of the bases A,C,G,T,N
• Angle-bracketed ID String ("<ID>")
• Break-end replacement string as described in the section on break-ends.
If there are no alternative alleles, then the missing value is used.
} QUAL: Phred-scaled quality score for the assertion made in ALT, i.e. -10 log10
P(call in ALT is wrong). If ALT is "." (no variant), this score is -10 log10
P(variant). If ALT is not ".", this score is -10 log10 P(no variant). High QUAL scores
indicate high confidence calls. Although Phred scores are traditionally integers,
this field is permitted to be a floating point number to enable higher resolution for low
confidence calls if desired. If unknown, the missing value is specified. (Numeric)
} FILTER: PASS if this position has passed all filters, i.e. a call is made at this
position. Otherwise, a semicolon-separated list of codes for the filters that
failed. gVCF files use the following values:
• PASS: position has passed all filters.
• IndelConflict: Locus is in region with conflicting indel calls.
• SiteConflict: Site genotype conflicts with proximal indel call, which is typically a
heterozygous SNV call made inside of a heterozygous deletion.
• LowGQX: Locus GQX (minimum of {Genotype quality assuming variant
position,Genotype quality assuming non-variant position}) is less than 30 or not
present.
• HighDPFRatio: The fraction of base calls filtered out at a site is greater than 0.3.
• HighSNVSB: SNV strand bias value (SNVSB) exceeds 10. High strand bias
indicates a potential high false-positive rate for SNVs.
• HighSNVHPOL: SNV contextual homopolymer length (SNVHPOL) exceeds 6.
• HighREFREP: Indel contains an allele that occurs in a homopolymer or
dinucleotide tract with a reference repeat greater than 8.
• HighDepth: Locus depth is greater than 3x the mean chromosome depth.
} INFO: Additional information. INFO fields are encoded as a semicolon-separated
series of short keys with optional values in the format: <key>=<data>[,data]. gVCF
files use the following values:
• END: End position of the region described in this record.
• BLOCKAVG_min30p3a: Non-variant site block. All sites in a block are
constrained to be non-variant, have the same filter value, and have all sample
values in range [x,y], y <= max(x+3,(x*1.3)). All printed site block sample values
are the minimum observed in the region spanned by the block.
• SNVSB: SNV site strand bias.
• SNVHPOL: SNV contextual homopolymer length.
• CIGAR: CIGAR alignment for each alternate indel allele.
• RU: Smallest repeating sequence unit extended or contracted in the indel allele
relative to the reference. If longer than 20 bases, RUs are not reported.
• REFREP: Number of times RU is repeated in reference.
• IDREP: Number of times RU is repeated in indel allele.
} FORMAT: Format of the sample field. FORMAT specifies the data types and order of
the subfields. gVCF files use the following values:
• GT: Genotype.
• GQ: Genotype Quality.
• GQX: Minimum of {Genotype quality assuming variant position, Genotype
quality assuming non-variant position}.
• DP: Filtered base call depth used for site genotyping.
• DPF: Base calls filtered from input before site genotyping.
• AD: Allelic depths for the ref and alt alleles in the order listed. For indels, this
value only includes reads that confidently support each allele (posterior
probability 0.999 or higher that read contains indicated allele vs all other
intersecting indel alleles).
• DPI: Read depth associated with indel, taken from the site preceding the indel.
} SAMPLE: Sample fields as defined by the header.
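To make the field layout concrete, the following Python sketch parses the INFO column of the non-variant block record shown earlier and derives the number of bases the block covers from POS and END. The helper names `parse_info` and `block_span` are hypothetical, and treating keys without a value as flags set to True is an assumption of this sketch.

```python
def parse_info(info):
    """Turn a semicolon-separated INFO string into a dict.
    Keys without '=<data>' are stored as True (assumed flag handling)."""
    result = {}
    if info == ".":
        return result
    for entry in info.split(";"):
        key, separator, value = entry.partition("=")
        result[key] = value if separator else True
    return result


def block_span(pos, info):
    """Number of reference bases covered by a record; END defaults to POS."""
    end = int(info.get("END", pos))
    return end - pos + 1


record = "chr1 51845 . A . . PASS END=51862"
fields = record.split()
info = parse_info(fields[7])
print(info)                              # {'END': '51862'}
print(block_span(int(fields[1]), info))  # 18
```

A record without an END key covers a single position, so the same function returns 1 for an ordinary site.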
VCF Files
What is it?
VCF is a text file format that contains information about variants found at specific
positions in a reference genome. The file format consists of meta-information lines, a
header line, and then data lines. Each data line contains information about a single
variant.
When to use it.
Use it for direct interpretation or as a starting point for tertiary analysis with
downstream analysis tools that are compatible with VCF, such as IGV or the UCSC Genome
Browser.
When not to use it.
Do not use it with tools that are not compatible with the VCF format.
NOTE
Windows recognizes VCF files as an Outlook contact file. Do not open VCF files in
Outlook.
How to use it
If you use an app in BaseSpace that uses VCF files as input, the app locates the file when
launched. If using VCF files in other tools, download the file to use it in the external tool.
Detailed Description
The file naming convention for VCF files is as follows: SampleName_S#.vcf (where # is
the sample number determined by ordering in the sample sheet).
The header of the VCF file describes the tags used in the remainder of the file and has
the column header:
##fileformat=VCFv4.1
##fileDate=20120317
##source=SequenceAnalysisReport.vshost.exe
##reference=
##phasing=none
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=TI,Number=.,Type=String,Description="Transcript ID">
##INFO=<ID=GI,Number=.,Type=String,Description="Gene ID">
##INFO=<ID=CD,Number=0,Type=Flag,Description="Coding Region">
##FILTER=<ID=q20,Description="Quality below 20">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
A sample line of the VCF file, with the data that is used to populate each column
described:
chr22 16285888 rs76548004 T C 17 d15;q20 DP=11;TI=NM_001136213;GI=POTEH;CD GT:GQ 1/0:17
Setting
Description
ALT
The alleles that differ from the reference read. For example, an insertion of a
single T could show reference A and alternate AT.
CHROM
The chromosome of the reference genome. Chromosomes appear in the same
order as the reference FASTA file (generally karyotype order).
FILTER
If all filters are passed, 'PASS' is written. The possible filters are as follows:
• q20 – The variant score is less than 20. (Configurable using the
VariantFilterQualityCutoff setting in the config file)
• r8 – For an indel, the number of repeats in the reference (of a one- or two-base
repeat) is greater than 8. (Configurable using the IndelRepeatFilterCutoff
setting in the config file)
FORMAT
The format column lists fields (separated by colons), for example, "GT:GQ". The
list of fields provided depends on the variant caller used. The available fields are
as follows:
• AD – Entry of the form X,Y where X is the number of reference calls, Y the
number of alternate calls
• GQ – Genotype quality
• GT – Genotype. 0 corresponds to the reference base, 1 corresponds to the first
entry in the ALT column, 2 corresponds to the second entry in the ALT column,
etc. The '/' indicates that there is no phasing information.
• NL – Noise level; an estimate of base calling noise at this position
• SB – Strand bias at this position. Larger negative values indicate more bias;
values near zero indicate little strand bias.
• VF – Variant frequency. The percentage of reads supporting the alternate allele.
ID
The rs number for the SNP obtained from dbSNP. If there are multiple rs
numbers at this location, the list is semi-colon delimited. If no dbSNP entry exists
at this position, the missing value ('.') is used.
INFO
The possible entries in the INFO column:
• AD – Entry of the form X,Y where X is the number of reference calls, Y the
number of alternate calls.
• CD – A flag indicating that the SNP occurs within the coding region of at least
one RefGene entry
• DP – The depth (number of base calls aligned to this position)
• GI – A comma-separated list of gene IDs read from RefGene
• NL – Noise level; an estimate of base calling noise at this position.
• TI – A comma-separated list of transcript IDs read from RefGene
• SB – Strand bias at this position.
• VF – Variant frequency. The number of reads supporting the alternate allele.
POS
The one-based position of this variant in the reference chromosome. The
convention for VCF files is that, for SNPs, this base is the reference base with the
variant. For insertions or deletions, this base is the reference base immediately
before the variant. Variants are in order of position.
QUAL
A Phred-scaled quality score assigned by the variant caller. Higher scores
indicate higher confidence in the variant (and lower probability of errors). For a
quality score of Q, the estimated probability of an error is 10^(-Q/10). For
example, the set of Q30 calls has a 0.1% error rate. Many variant callers assign
quality scores (based on their statistical models) which are high relative to the
error rate observed in practice.
REF
The reference genotype. For example, a deletion of a single T can read reference
TT and alternate T.
SAMPLE
The sample column gives the values specified in the FORMAT column. One
MAXGT sample column is provided for the normal genotyping (assuming the
reference). For reference, a second column is provided for genotyping assuming
the site is polymorphic. See the Starling documentation for more details.
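As a worked illustration of the columns described in this table, the Python sketch below splits the sample line shown earlier, pairs the FORMAT keys with the sample values, and translates the GT indexes into bases. The function name `genotype_alleles` is hypothetical. Accepting '|' as a separator for phased genotypes follows the VCF specification rather than this guide.

```python
import re


def genotype_alleles(ref, alt, gt):
    """Map GT indexes to bases: 0 is REF, 1 is the first ALT entry, and so on.
    A '.' index (no call) is returned as None."""
    alleles = [ref] + ([] if alt == "." else alt.split(","))
    return [None if index == "." else alleles[int(index)]
            for index in re.split(r"[/|]", gt)]


line = ("chr22 16285888 rs76548004 T C 17 d15;q20 "
        "DP=11;TI=NM_001136213;GI=POTEH;CD GT:GQ 1/0:17")
chrom, pos, rs_id, ref, alt, qual, flt, info, fmt, sample = line.split()

sample_values = dict(zip(fmt.split(":"), sample.split(":")))
print(sample_values)                                    # {'GT': '1/0', 'GQ': '17'}
print(genotype_alleles(ref, alt, sample_values["GT"]))  # ['C', 'T']
print(flt.split(";"))                                   # ['d15', 'q20']
```

Real VCF files separate columns with tabs; `split()` without arguments handles both tabs and the spaces used in the printed example.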
NOTE
Variant files for Isaac also contain off-target variant calls, with filter.
FASTQ Files
What is it?
BaseSpace converts *.bcl files into FASTQ files, which contain base call and quality
information for all reads passing filtering.
When to use it.
FASTQ files can be used as sequence input for alignment and other secondary analysis
software.
When not to use it.
Do not use it with tools that are not compatible with the FASTQ format.
How to use it
BaseSpace automatically generates FASTQ files in sample sheet-driven workflow apps.
Other apps that perform alignment and variant calling also automatically use FASTQ
files.
Naming
FASTQ files are named with the sample name and the sample number, which is a
numeric assignment based on the order that the sample is listed in the sample sheet.
Example:
Data\Intensities\BaseCalls\samplename_S1_L001_R1_001.fastq.gz
• samplename—The sample name provided in the sample sheet. If a sample
name is not provided, the file name includes the sample ID, which is a required
field in the sample sheet and must be unique.
• S1—The sample number based on the order that samples are listed in the
sample sheet starting with 1. In this example, S1 indicates that this sample is the
first sample listed in the sample sheet.
NOTE
Reads that cannot be assigned to any sample are written to a FASTQ file for sample
number 0, and excluded from downstream analysis.
• L001—The lane number.
• R1—The read. In this example, R1 means Read 1. For a paired-end run, there is
at least one file with R2 in the file name for Read 2.
• 001—The last segment is always 001.
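The naming convention above can be expressed as a regular expression. The following Python sketch splits a FASTQ file name into its parts; the pattern and the function name `parse_fastq_name` are assumptions that cover only the elements described in this section.

```python
import re

# Pattern for <samplename>_S<n>_L<lane>_R<read>_001.fastq.gz as described above.
FASTQ_NAME = re.compile(
    r"^(?P<sample_name>.+)_S(?P<sample_number>\d+)"
    r"_L(?P<lane>\d{3})_R(?P<read>\d)_001\.fastq\.gz$"
)


def parse_fastq_name(file_name):
    """Return the name parts as a dict, or None if the name does not match."""
    match = FASTQ_NAME.match(file_name)
    if match is None:
        return None
    parts = match.groupdict()
    return {
        "sample_name": parts["sample_name"],
        "sample_number": int(parts["sample_number"]),
        "lane": int(parts["lane"]),
        "read": int(parts["read"]),
    }


print(parse_fastq_name("samplename_S1_L001_R1_001.fastq.gz"))
# {'sample_name': 'samplename', 'sample_number': 1, 'lane': 1, 'read': 1}
```

A sample number of 0 identifies the file that holds reads that could not be assigned to any sample.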
Compression
FASTQ files are saved compressed in the GNU zip format (an open source file
compression program), indicated by the .gz file extension.
Format
Each entry in a FASTQ file consists of four lines:
} Sequence identifier
} Sequence
} Quality score identifier line (consisting only of a +)
} Quality score
NOTE
For the Undetermined FASTQ files only, the sequence observed in the index read is
written to the FASTQ header in place of the sample number. This information can be
useful for troubleshooting demultiplexing.
@<instrument>:<run number>:<flowcell ID>:<lane>:<tile>:<x_pos>:<y_pos> <read>:<is filtered>:<control number>:<sample number>
The following table describes the elements:
Element            Requirements                                         Description
@                  @                                                    Each sequence identifier line starts with @
<instrument>       Characters allowed: a–z, A–Z, 0–9 and underscore     Instrument ID
<run number>       Numerical                                            Run number on instrument
<flowcell ID>      Characters allowed: a–z, A–Z, 0–9
<lane>             Numerical                                            Lane number
<tile>             Numerical                                            Tile number
<x_pos>            Numerical                                            X coordinate of cluster
<y_pos>            Numerical                                            Y coordinate of cluster
<read>             Numerical                                            Read number. 1 can be single read or Read 2 of paired-end
<is filtered>      Y or N                                               Y if the read is filtered (did not pass), N otherwise
<control number>   Numerical                                            0 when none of the control bits are on, otherwise it is an even number. On HiSeq X systems, control specification is not performed and this number is always 0.
<sample number>    Numerical                                            Sample number from sample sheet
An example of a valid entry is as follows; note the space preceding the read number
element:
@SIM:1:FCX:1:15:6329:1045 1:N:0:2
TCGCACTCAACGCCCTGCATATGACAAGACAGAATC
+
<>;##=><9=AAAAAAAAAA9#:<#<;<<<????#=
Control Values
If the read is not identified as a control, then the 10th column (<control number>) is
zero. If the read is identified as a control, the number is greater than zero, and the value
specifies what type of control it is. The value is the decimal representation of a bit-wise
encoding scheme. In that scheme bit 0 has a decimal value of 1, bit 1 a value of 2, bit 2 a
value of 4, and so on.
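The identifier layout and the bit-wise control number can be explored with a short Python sketch. It parses the example identifier shown above and lists which bits are set in a control number. The function names are hypothetical, and because this guide does not state what each control bit means, the sketch reports only the bit positions.

```python
def parse_identifier(line):
    """Split a FASTQ sequence identifier line into its named elements."""
    if not line.startswith("@"):
        raise ValueError("a sequence identifier line must start with @")
    left, right = line[1:].split(" ", 1)
    instrument, run, flowcell, lane, tile, x_pos, y_pos = left.split(":")
    read, is_filtered, control, sample = right.split(":")
    return {
        "instrument": instrument,
        "run_number": int(run),
        "flowcell_id": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x_pos": int(x_pos),
        "y_pos": int(y_pos),
        "read": int(read),
        "is_filtered": is_filtered == "Y",
        "control_number": int(control),
        # Kept as text: Undetermined files carry the index sequence here instead.
        "sample_number": sample,
    }


def control_bits(control_number):
    """Return the positions of the bits that are set (bit 0 has value 1)."""
    return [bit for bit in range(control_number.bit_length())
            if (control_number >> bit) & 1]


entry = parse_identifier("@SIM:1:FCX:1:15:6329:1045 1:N:0:2")
print(entry["tile"], entry["is_filtered"], entry["sample_number"])  # 15 False 2
print(control_bits(entry["control_number"]))                        # []
print(control_bits(6))                                              # [1, 2]
```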
Quality Scores
A quality score (or Q-score) expresses an error probability. In particular, it serves as a
convenient and compact way to communicate very small error probabilities.
Given an assertion, A, the quality score, Q(A), expresses the probability that A is not true,
P(~A), according to the relationship:
Q(A) = -10 log10(P(~A))
where P(~A) is the estimated probability of an assertion A being wrong.
The relationship between the quality score and error probability is demonstrated with the
following table:
Quality score, Q(A)    Error probability, P(~A)
10                     0.1
20                     0.01
30                     0.001
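The relationship can be evaluated in both directions with a minimal Python sketch. The function names are hypothetical.

```python
import math


def error_probability(q):
    """P(~A) = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def quality_score(p):
    """Q(A) = -10 log10(P(~A))."""
    return -10 * math.log10(p)


for q in (10, 20, 30):
    print(q, error_probability(q))
print(round(quality_score(0.001)))  # 30
```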
Quality Scores Encoding
In FASTQ files, quality scores are encoded into a compact form, which uses only 1 byte
per quality value. In this encoding, the quality score is represented as the character with
an ASCII code equal to its value + 33. The following table demonstrates the relationship
between the encoding character, its ASCII code, and the quality score represented.
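Before the table, a minimal Python sketch of the same rule may help: each quality character is converted to a Q-score by subtracting 33 from its ASCII code. The function names are hypothetical.

```python
def decode_qualities(quality_line, offset=33):
    """Convert a FASTQ quality line into a list of Q-scores."""
    return [ord(character) - offset for character in quality_line]


def encode_qualities(scores, offset=33):
    """Convert a list of Q-scores back into a FASTQ quality line."""
    return "".join(chr(score + offset) for score in scores)


print(decode_qualities("!+5?I"))         # [0, 10, 20, 30, 40]
print(encode_qualities([27, 29, 26, 2])) # <>;#
```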
Table 2 ASCII Characters Encoding Q-scores 0–40
Symbol  ASCII Code  Q-Score    Symbol  ASCII Code  Q-Score    Symbol  ASCII Code  Q-Score
!       33          0          /       47          14         =       61          28
"       34          1          0       48          15         >       62          29
#       35          2          1       49          16         ?       63          30
$       36          3          2       50          17         @       64          31
%       37          4          3       51          18         A       65          32
&       38          5          4       52          19         B       66          33
'       39          6          5       53          20         C       67          34
(       40          7          6       54          21         D       68          35
)       41          8          7       55          22         E       69          36
*       42          9          8       56          23         F       70          37
+       43          10         9       57          24         G       71          38
,       44          11         :       58          25         H       72          39
-       45          12         ;       59          26         I       73          40
.       46          13         <       60          27
Health Runs
What is it?
A user can choose whether to send anonymous system health information to Illumina.
Health runs help Illumina diagnose issues and improve our products. The information
consists of InterOp files and log files, and is not tied to any user account. This option is
on by default.
Sample Details Page Components
The Sample Details Page shows metrics for a sample that the app that ran the analysis
generated. Different panes are displayed on this page depending on the app.
Samples Table
The samples table contains general analysis information for the sample. Depending on
the workflow, the following metrics can be shown:
Column
Description
Sample Name
The sample name from the sample sheet.
Sample ID
The sample ID from the sample sheet. Sample ID must always be a unique
value.
Genome
The name of the reference genome.
Chr
The reference target or chromosome name.
Cluster PF
The number of clusters passing filter for the sample that aligned to the
reference genome.
Mismatch
The percentage mismatch to reference averaged over cycles per read
(Read 1/Read 2).
No Call
The percentage of bases that could not be called (no-call) for the sample
averaged over cycles per read (Read 1/Read 2).
Coverage
Median coverage (number of bases aligned to a given reference position)
averaged over all positions.
Het SNPs
The number of heterozygous SNPs detected for the sample.
Hom SNPs
The number of homozygous SNPs detected for the sample.
Insertions
The number of insertions detected for the sample.
Deletions
The number of deletions detected for the sample.
The workflow apps Small RNA and De Novo Assembly have custom samples tables:
} Small RNA Samples Table
} De Novo Assembly Samples Table
Small RNA Samples Table
Column
Description
Sample Name
The sample name from the sample sheet.
Cluster Raw
The number of raw clusters detected for the sample.
Cluster PF
The number of clusters passing filter for the sample.
Cluster Align Contam
The number of clusters that match records in the Contaminants
database.
Cluster Align miRNA
The number of clusters that exactly match records in the Mature miRNA
database.
Cluster Align RNA
The number of clusters that match records in the RNA database.
Cluster Align Genome
The number of clusters that match records in the genomic database.
Cluster Unaligned
The number of clusters that did not align against any reference
database.
De Novo Assembly Samples Table
Column
Description
Sample Name
The sample name from the sample sheet.
Num Contigs
The number of contigs assembled for this sample.
Mean Contig Length
The average contig length for this sample.
Median Contig Length
The median contig length for this sample.
Min Contig Length
The minimum contig length for this sample.
Max Contig Length
The maximum contig length for this sample.
Base Count
The total length of the resulting assembly.
N50
N50 length is the length of the shortest contig, such that the sum of
contigs of equal length or longer is at least 50% of the total length of all
contigs.
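The N50 definition in the table can be illustrated with a short Python sketch: sort the contigs from longest to shortest and add up their lengths until at least half of the total is reached. The function name `n50` and the contig lengths are made up for illustration.

```python
def n50(contig_lengths):
    """Length of the shortest contig such that contigs of equal or greater
    length sum to at least 50% of the total length of all contigs."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(contig_lengths) / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half_total:
            return length


# Total length is 300, so half is 150: 100 + 80 = 180 reaches it at the 80-base contig.
print(n50([100, 80, 60, 40, 20]))  # 80
```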
Amplicons Table
Column
Description
#
An ordinal identification number in the table.
Amplicons
The amplicon name.
Location
The position at which the variant was found.
Variants #
The number of variants for this amplicon.
Coverage Graph
Y Axis: Coverage
X Axis: Position
Description: The green curve is the number of aligned reads that cover each
position in the reference.
The red curve is the number of aligned reads that have a miscall at
this position in the reference. SNPs and other variants show up as
spikes in the red curve.
Q-Score Graph
Y Axis: QScore
X Axis: Position
Description: The average quality score of bases at the given position of the reference.
Variant Score Graph
Y Axis: Score
X Axis: Position
Description: Graphically depicts quality score and the position of SNPs and indels.
Variants Table
The variants table shows variants for your sample per chromosome or amplicon.
Column
Description
#
An ordinal identification number in the table.
Location
The position at which the variant was found.
Score
The quality score for this variant.
Type
The variant type, which can be either SNP or indel.
Call
A string representing how the base or bases changed at this location in the
reference.
dbSNP
The dbSNP name of the variant, if applicable.
RefGene
The gene according to RefGene in which this variant appears.
Frequency
The fraction of reads for the sample that includes the variant. For example,
if the reference base is A, and sample 1 has 60 A reads and 40 T reads, then
the SNP has a variant frequency of 0.4.
Depth
The number of reads for a sample covering a particular position. The
GATK variant caller subsamples data in regions of high coverage.
Filter
The criteria for a filtered variant.
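The Frequency example in the table (60 A reads and 40 T reads) works out as follows. This is a small Python sketch with a hypothetical function name; it counts only reference and alternate reads, as in that example.

```python
def variant_frequency(reference_reads, alternate_reads):
    """Fraction of reads that include the variant."""
    total = reference_reads + alternate_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    return alternate_reads / total


print(variant_frequency(reference_reads=60, alternate_reads=40))  # 0.4
```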
Small RNA Pie Chart
The Small RNA pie chart provides a visualization of clusters identified as mature
miRNA, other forms of RNA, genomic sequence, or contaminants.
Figure 14 Small RNA Pie Chart
Common categories for the Small RNA pie chart are as follows:
} Unaligned – clusters that did not align against any reference
} Genome – clusters that aligned to the reference genome
} miRNA – clusters that aligned to the mature miRNA database
Hits to the mature miRNA database are counted only if the cluster aligned to the correct
strand and position for the mature miRNA.
The remaining category names in the Small RNA pie chart are taken from the FASTA file
names in the databases. For example, if the RNA database contains a file named
rRNA.fa, then matches to this file are reported as the category rRNA.
Small RNA Graph
The Small RNA graph provides a plot of the common mature miRNA sequences for a
sample and their abundances. The most common miRNA sequences for the selected
sample (up to 10 records) are shown in proportion to the number of clusters matched.
Metagenomics Pie Chart
The Metagenomics pie chart provides a visualization of how many clusters from each
sample were assigned to a category in each taxonomic level.
Click another row in the taxonomy table to change the pie chart to that sample or
taxonomic level.
Samples Graph
Contigs are arranged end-to-end along the X axis and the reference chromosomes are
arranged bottom-to-top along the Y axis. Each pixel of the plot is colored according to
how many short sequences of the corresponding contig have a match in the
corresponding portion of the reference genome.
An identical assembly results in a diagonal line. A vertical gap in the plot might indicate
a portion of the reference that is absent in the assembly, such as a plasmid, which is
found in some bacteria populations.
Y Axis: Reference
X Axis: Assembly Position
Description: A syntenic plot of assembled contigs compared to a reference. A
reference genome must be specified in the sample sheet.
Sample QC Table
Column
Description
Sample Name
The sample name from the sample sheet.
Clusters Count
The number of clusters sequenced for this sample.
Clusters Percentage
The percentage of the total cluster number matching the index for this
sample.
Pass Filter
The percentage of clusters passing filter for this sample.
Alignment R1/R2
The percentage of clusters successfully aligned in Read 1/Read 2.
Length Median
The median fragment length for the sample.
Length Min
The low percentile of fragment lengths for this sample as they correspond
to three standard deviations from the median.
Length Max
The high percentile of fragment lengths for this sample as they
correspond to three standard deviations from the median.
Mismatch R1/R2
The percentage mismatch to reference averaged over cycles per read
(Read 1/Read 2).
Estimated Diversity
An estimate of the total library diversity derived from the observed
diversity and the number of apparent PCR duplicates. This calculation is
available for paired-end runs unless PCR duplicate flagging was disabled
in the sample sheet.
Observed Diversity
Number of distinct aligned positions. Reads with the same aligned
positions are assumed to be PCR duplicates. PCR duplicates are defined as
sequences with identical Read 1 and Read 2 start sites.
Isaac App Results Page
The Isaac App Results Page consists of three panes.
Isaac Alignment Statistics
Isaac Alignment Statistics display alignment information for the sample.
Column
Description
Number of Reads
The number of reads sequenced for this sample.
Coverage
Median coverage (number of bases aligned to a given reference position)
averaged over all positions.
Fragment Length Median
The median fragment length for the sample.
Fragment Length Standard Deviation
The standard deviation of the fragment length for the sample.
Aligned %
The percentage of PF clusters that aligned for the sample (Read 1/Read 2).
Mismatch
The percentage mismatch to reference averaged over cycles per read
(Read 1/Read 2).
Isaac Variants Statistics
The Isaac Variants Statistics pane shows three tables of variant statistics: one each for
single nucleotide variants (SNVs), insertions, and deletions. The rows contain the following information.
Column
Description
Total number
Total numbers of the specific variant.
Het/Hom Ratio
The ratio between heterozygote and homozygote variants.
% in dbSNP 131
The percentage of the specific variants found in dbSNP 131.
Transitions / Transversions
The ratio of transitions (A-G or C-T changes) to transversions (other
changes).
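The transition and transversion classes in the last row can be illustrated with a short Python sketch. It classifies each SNV by its reference and alternate base and reports the ratio. The function names and the list of SNVs are made up for illustration, and the sketch does not reproduce how the app computes its statistics.

```python
# Transitions are A-G or C-T changes in either direction; all other changes are transversions.
TRANSITION_PAIRS = {frozenset("AG"), frozenset("CT")}


def is_transition(ref, alt):
    return frozenset((ref, alt)) in TRANSITION_PAIRS


def ti_tv_ratio(snvs):
    """Ratio of transitions to transversions for (ref, alt) base pairs."""
    transitions = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    transversions = len(snvs) - transitions
    if transversions == 0:
        raise ValueError("ratio is undefined without transversions")
    return transitions / transversions


snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
print(ti_tv_ratio(snvs))  # 3.0
```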
Isaac Coverage Graph
Y Axis: # Reference Bases
X Axis: Read Depth
Description: The coverage graph shows the number of bases that are covered
at each read depth.
Technical Assistance
For technical assistance, contact Illumina Technical Support.
Table 3 Illumina General Contact Information
Illumina Website: www.illumina.com
Email: [email protected]
Table 4 Illumina Customer Support Telephone Numbers
Region            Contact Number
North America     1.800.809.4566
Austria           0800.296575
Belgium           0800.81102
Denmark           80882346
Finland           0800.918363
France            0800.911850
Germany           0800.180.8994
Ireland           1.800.812949
Italy             800.874909
Netherlands       0800.0223859
Norway            800.16836
Spain             900.812168
Sweden            020790181
Switzerland       0800.563118
United Kingdom    0800.917.0041
Other countries   +44.1799.534000
Safety Data Sheets
Safety data sheets (SDSs) are available on the Illumina website at
support.illumina.com/sds.ilmn.
Product Documentation
Product documentation in PDF is available for download from the Illumina website. Go
to support.illumina.com, select a product, then click Documentation & Literature.
Illumina
San Diego, California 92122 U.S.A.
+1.800.809.ILMN (4566)
+1.858.202.4566 (outside North America)
[email protected]
www.illumina.com