MiSeq Reporter Software Guide (15042295 Rev. E)

MiSeq Reporter
Software Guide
FOR RESEARCH USE ONLY
ILLUMINA PROPRIETARY
Part # 15042295 Rev. E
December 2014
Customize a short end-to-end workflow guide with the Custom Protocol Selector
support.illumina.com/custom-protocol-selector.html
This document and its contents are proprietary to Illumina, Inc. and its affiliates ("Illumina"), and are intended solely for the
contractual use of its customer in connection with the use of the product(s) described herein and for no other purpose. This
document and its contents shall not be used or distributed for any other purpose and/or otherwise communicated, disclosed,
or reproduced in any way whatsoever without the prior written consent of Illumina. Illumina does not convey any license
under its patent, trademark, copyright, or common-law rights nor similar rights of any third parties by this document.
The instructions in this document must be strictly and explicitly followed by qualified and properly trained personnel in order
to ensure the proper and safe use of the product(s) described herein. All of the contents of this document must be fully read
and understood prior to using such product(s).
FAILURE TO COMPLETELY READ AND EXPLICITLY FOLLOW ALL OF THE INSTRUCTIONS CONTAINED HEREIN
MAY RESULT IN DAMAGE TO THE PRODUCT(S), INJURY TO PERSONS, INCLUDING TO USERS OR OTHERS, AND
DAMAGE TO OTHER PROPERTY.
ILLUMINA DOES NOT ASSUME ANY LIABILITY ARISING OUT OF THE IMPROPER USE OF THE PRODUCT(S)
DESCRIBED HEREIN (INCLUDING PARTS THEREOF OR SOFTWARE) OR ANY USE OF SUCH PRODUCT(S) OUTSIDE
THE SCOPE OF THE EXPRESS WRITTEN LICENSES OR PERMISSIONS GRANTED BY ILLUMINA IN CONNECTION
WITH CUSTOMER'S ACQUISITION OF SUCH PRODUCT(S).
FOR RESEARCH USE ONLY
© 2011–2014 Illumina, Inc. All rights reserved.
Illumina, 24sure, BaseSpace, BeadArray, BlueFish, BlueFuse, BlueGnome, cBot, CSPro, CytoChip, DesignStudio,
Epicentre, GAIIx, Genetic Energy, Genome Analyzer, GenomeStudio, GoldenGate, HiScan, HiSeq, HiSeq X, Infinium,
iScan, iSelect, ForenSeq, MiSeq, MiSeqDx, MiSeq FGx, NeoPrep, Nextera, NextBio, NextSeq, Powered by Illumina,
SeqMonitor, SureMDA, TruGenome, TruSeq, TruSight, Understand Your Genome, UYG, VeraCode, verifi, VeriSeq, the
pumpkin orange color, and the streaming bases design are trademarks of Illumina, Inc. and/or its affiliate(s) in the U.S. and/or
other countries. All other names, logos, and other trademarks are the property of their respective owners.
Read Before Using this Product
This Product, and its use and disposition, is subject to the following terms and conditions. If Purchaser does not agree to these
terms and conditions then Purchaser is not authorized by Illumina to use this Product and Purchaser must not use this Product.
1
Definitions. "Application Specific IP" means Illumina owned or controlled intellectual property rights that pertain to
this Product (and use thereof) only with regard to specific field(s) or specific application(s). Application Specific IP
excludes all Illumina owned or controlled intellectual property that cover aspects or features of this Product (or use
thereof) that are common to this Product in all possible applications and all possible fields of use (the "Core IP").
Application Specific IP and Core IP are separate, non-overlapping, subsets of all Illumina owned or controlled intellectual
property. By way of non-limiting example, Illumina intellectual property rights for specific diagnostic methods, for
specific forensic methods, or for specific nucleic acid biomarkers, sequences, or combinations of biomarkers or
sequences are examples of Application Specific IP. "Consumable(s)" means Illumina branded reagents and consumable
items that are intended by Illumina for use with, and are to be consumed through the use of, Hardware.
"Documentation" means Illumina's user manual for this Product, including without limitation, package inserts, and any
other documentation that accompany this Product or that are referenced by the Product or in the packaging for the Product
in effect on the date of shipment from Illumina. Documentation includes this document. "Hardware" means Illumina
branded instruments, accessories or peripherals. "Illumina" means Illumina, Inc. or an Illumina affiliate, as applicable.
"Product" means the product that this document accompanies (e.g., Hardware, Consumables, or Software). "Purchaser"
is the person or entity that rightfully and legally acquires this Product from Illumina or an Illumina authorized dealer.
"Software" means Illumina branded software (e.g., Hardware operating software, data analysis software). All Software is
licensed and not sold and may be subject to additional terms found in the Software's end user license agreement.
"Specifications" means Illumina's written specifications for this Product in effect on the date that the Product ships from
Illumina.
2
Research Use Only Rights. Subject to these terms and conditions and unless otherwise agreed upon in writing by an
officer of Illumina, Purchaser is granted only a non-exclusive, non-transferable, personal, non-sublicensable right under
Illumina's Core IP, in existence on the date that this Product ships from Illumina, solely to use this Product in Purchaser's
facility for Purchaser's internal research purposes (which includes research services provided to third parties) and solely
in accordance with this Product's Documentation, but specifically excluding any use that (a) would require rights or a
license from Illumina to Application Specific IP, (b) is a re-use of a previously used Consumable, (c) is the disassembling,
reverse-engineering, reverse-compiling, or reverse-assembling of this Product, (d) is the separation, extraction, or
isolation of components of this Product or other unauthorized analysis of this Product, (e) gains access to or determines
the methods of operation of this Product, (f) is the use of non-Illumina reagent/consumables with Illumina's Hardware
(does not apply if the Specifications or Documentation state otherwise), or (g) is the transfer to a third-party of, or sublicensing of, Software or any third-party software. All Software, whether provided separately, installed on, or embedded
in a Product, is licensed to Purchaser and not sold. Except as expressly stated in this Section, no right or license under
any of Illumina's intellectual property rights is or are granted expressly, by implication, or by estoppel.
Purchaser is solely responsible for determining whether Purchaser has all intellectual property rights that are
necessary for Purchaser's intended uses of this Product, including without limitation, any rights from third
parties or rights to Application Specific IP. Illumina makes no guarantee or warranty that purchaser's specific
intended uses will not infringe the intellectual property rights of a third party or Application Specific IP.
3
Regulatory. This Product has not been approved, cleared, or licensed by the United States Food and Drug
Administration or any other regulatory entity whether foreign or domestic for any specific intended use, whether
research, commercial, diagnostic, or otherwise. This Product is labeled For Research Use Only. Purchaser must ensure it
has any regulatory approvals that are necessary for Purchaser's intended uses of this Product.
4
Unauthorized Uses. Purchaser agrees: (a) to use each Consumable only one time, and (b) to use only Illumina
consumables/reagents with Illumina Hardware. The limitations in (a)-(b) do not apply if the Documentation or
Specifications for this Product state otherwise. Purchaser agrees not to, nor authorize any third party to, engage in any of
the following activities: (i) disassemble, reverse-engineer, reverse-compile, or reverse-assemble the Product, (ii) separate,
extract, or isolate components of this Product or subject this Product or components thereof to any analysis not expressly
authorized in this Product's Documentation, (iii) gain access to or attempt to determine the methods of operation of this
Product, or (iv) transfer to a third-party, or grant a sublicense, to any Software or any third-party software. Purchaser
further agrees that the contents of and methods of operation of this Product are proprietary to Illumina and this Product
contains or embodies trade secrets of Illumina. The conditions and restrictions found in these terms and conditions are
bargained for conditions of sale and therefore control the sale of and use of this Product by Purchaser.
5
Limited Liability. TO THE EXTENT PERMITTED BY LAW, IN NO EVENT SHALL ILLUMINA OR ITS
SUPPLIERS BE LIABLE TO PURCHASER OR ANY THIRD PARTY FOR COSTS OF PROCUREMENT OF
SUBSTITUTE PRODUCTS OR SERVICES, LOST PROFITS, DATA OR BUSINESS, OR FOR ANY INDIRECT,
SPECIAL, INCIDENTAL, EXEMPLARY, CONSEQUENTIAL, OR PUNITIVE DAMAGES OF ANY KIND ARISING
OUT OF OR IN CONNECTION WITH, WITHOUT LIMITATION, THE SALE OF THIS PRODUCT, ITS USE,
ILLUMINA'S PERFORMANCE HEREUNDER OR ANY OF THESE TERMS AND CONDITIONS, HOWEVER
ARISING OR CAUSED AND ON ANY THEORY OF LIABILITY (WHETHER IN CONTRACT, TORT
(INCLUDING NEGLIGENCE), STRICT LIABILITY OR OTHERWISE).
6
ILLUMINA'S TOTAL AND CUMULATIVE LIABILITY TO PURCHASER OR ANY THIRD PARTY ARISING OUT
OF OR IN CONNECTION WITH THESE TERMS AND CONDITIONS, INCLUDING WITHOUT LIMITATION,
THIS PRODUCT (INCLUDING USE THEREOF) AND ILLUMINA'S PERFORMANCE HEREUNDER, WHETHER
IN CONTRACT, TORT (INCLUDING NEGLIGENCE), STRICT LIABILITY OR OTHERWISE, SHALL IN NO
EVENT EXCEED THE AMOUNT PAID TO ILLUMINA FOR THIS PRODUCT.
7
Limitations on Illumina Provided Warranties. TO THE EXTENT PERMITTED BY LAW AND SUBJECT TO THE
EXPRESS PRODUCT WARRANTY MADE HEREIN ILLUMINA MAKES NO (AND EXPRESSLY DISCLAIMS
ALL) WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, WITH RESPECT TO THIS PRODUCT,
INCLUDING WITHOUT LIMITATION, ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE, NONINFRINGEMENT, OR ARISING FROM COURSE OF PERFORMANCE,
DEALING, USAGE OR TRADE. WITHOUT LIMITING THE GENERALITY OF THE FOREGOING, ILLUMINA
MAKES NO CLAIM, REPRESENTATION, OR WARRANTY OF ANY KIND AS TO THE UTILITY OF THIS
PRODUCT FOR PURCHASER'S INTENDED USES.
8
Product Warranty. All warranties are personal to the Purchaser and may not be transferred or assigned to a third-party,
including an affiliate of Purchaser. All warranties are facility specific and do not transfer if the Product is moved to
another facility of Purchaser, unless Illumina conducts such move.
a
Warranty for Consumables. Illumina warrants that Consumables, other than custom Consumables, will conform to
their Specifications until the later of (i) 3 months from the date of shipment from Illumina, and (ii) any expiration
date or the end of the shelf-life pre-printed on such Consumable by Illumina, but in no event later than 12 months
from the date of shipment. With respect to custom Consumables (i.e., Consumables made to specifications or
designs made by Purchaser or provided to Illumina by, or on behalf of, Purchaser), Illumina only warrants that the
custom Consumables will be made and tested in accordance with Illumina's standard manufacturing and quality
control processes. Illumina makes no warranty that custom Consumables will work as intended by Purchaser or for
Purchaser's intended uses.
b
Warranty for Hardware. Illumina warrants that Hardware, other than Upgraded Components, will conform to its
Specifications for a period of 12 months after its shipment date from Illumina unless the Hardware includes Illumina
provided installation in which case the warranty period begins on the date of installation or 30 days after the date it
was delivered, whichever occurs first ("Base Hardware Warranty"). "Upgraded Components" means Illumina
provided components, modifications, or enhancements to Hardware that was previously acquired by Purchaser.
Illumina warrants that Upgraded Components will conform to their Specifications for a period of 90 days from the
date the Upgraded Components are installed. Upgraded Components do not extend the warranty for the Hardware
unless the upgrade was conducted by Illumina at Illumina's facilities in which case the upgraded Hardware shipped
to Purchaser comes with a Base Hardware Warranty.
c
Exclusions from Warranty Coverage. The foregoing warranties do not apply to the extent a non-conformance is
due to (i) abuse, misuse, neglect, negligence, accident, improper storage, or use contrary to the Documentation or
Specifications, (ii) improper handling, installation, maintenance, or repair (other than if performed by Illumina's
personnel), (iii) unauthorized alterations, (iv) Force Majeure events, or (v) use with a third party's good not provided
by Illumina (unless the Product's Documentation or Specifications expressly state such third party's good is for use
with the Product).
d
Procedure for Warranty Coverage. In order to be eligible for repair or replacement under this warranty Purchaser
must (i) promptly contact Illumina's support department to report the non-conformance, (ii) cooperate with Illumina
in confirming or diagnosing the non-conformance, and (iii) return this Product, transportation charges prepaid to
Illumina following Illumina's instructions or, if agreed by Illumina and Purchaser, grant Illumina's authorized repair
personnel access to this Product in order to confirm the non-conformance and make repairs.
e
Sole Remedy under Warranty. Illumina will, at its option, repair or replace non-conforming Product that it
confirms is covered by this warranty. Repaired or replaced Consumables come with a 30-day warranty. Hardware
may be repaired or replaced with functionally equivalent, reconditioned, or new Hardware or components (if only a
component of Hardware is non-conforming). If the Hardware is replaced in its entirety, the warranty period for the
replacement is 90 days from the date of shipment or the remaining period on the original Hardware warranty,
whichever is shorter. If only a component is being repaired or replaced, the warranty period for such component is
90 days from the date of shipment or the remaining period on the original Hardware warranty, whichever ends later.
The preceding states Purchaser's sole remedy and Illumina's sole obligations under the warranty provided
hereunder.
f
Third-Party Goods and Warranty. Illumina has no warranty obligations with respect to any goods originating
from a third party and supplied to Purchaser hereunder. Third-party goods are those that are labeled or branded
with a third-party's name. The warranty for third-party goods, if any, is provided by the original manufacturer.
Upon written request Illumina will attempt to pass through any such warranty to Purchaser.
9
Indemnification.
a
Infringement Indemnification by Illumina. Subject to these terms and conditions, including without limitation,
the Exclusions to Illumina's Indemnification Obligations (Section 9(b) below), the Conditions to Indemnification
Obligations (Section 9(d) below), Illumina shall (i) defend, indemnify and hold harmless Purchaser against any
third-party claim or action alleging that this Product when used for research use purposes, in accordance with these
terms and conditions, and in accordance with this Product's Documentation and Specifications infringes the valid
and enforceable intellectual property rights of a third party, and (ii) pay all settlements entered into, and all final
judgments and costs (including reasonable attorneys' fees) awarded against Purchaser in connection with such
infringement claim. If this Product or any part thereof, becomes, or in Illumina's opinion may become, the subject of
an infringement claim, Illumina shall have the right, at its option, to (A) procure for Purchaser the right to continue
using this Product, (B) modify or replace this Product with a substantially equivalent non-infringing substitute, or
(C) require the return of this Product and terminate the rights, license, and any other permissions provided to
Purchaser with respect to this Product and refund to Purchaser the depreciated value (as shown in Purchaser's official
records) of the returned Product at the time of such return; provided that, no refund will be given for used-up or
expired Consumables. This Section states the entire liability of Illumina for any infringement of third party
intellectual property rights.
b
Exclusions to Illumina Indemnification Obligations. Illumina has no obligation to defend, indemnify or hold
harmless Purchaser for any Illumina Infringement Claim to the extent such infringement arises from: (i) the use of
this Product in any manner or for any purpose outside the scope of research use purposes, (ii) the use of this Product
in any manner not in accordance with its Specifications, its Documentation, the rights expressly granted to Purchaser
hereunder, or any breach by Purchaser of these terms and conditions, (iii) the use of this Product in combination
with any other products, materials, or services not supplied by Illumina, (iv) the use of this Product to perform any
assay or other process not supplied by Illumina, or (v) Illumina's compliance with specifications or instructions for
this Product furnished by, or on behalf of, Purchaser (each of (i) – (v), is referred to as an "Excluded Claim").
c
Indemnification by Purchaser. Purchaser shall defend, indemnify and hold harmless Illumina, its affiliates, their
non-affiliate collaborators and development partners that contributed to the development of this Product, and their
respective officers, directors, representatives and employees against any claims, liabilities, damages, fines, penalties,
causes of action, and losses of any and every kind, including without limitation, personal injury or death claims, and
infringement of a third party's intellectual property rights, resulting from, relating to, or arising out of (i) Purchaser's
breach of any of these terms and conditions, (ii) Purchaser's use of this Product outside of the scope of research use
purposes, (iii) any use of this Product not in accordance with this Product's Specifications or Documentation, or (iv)
any Excluded Claim.
d
Conditions to Indemnification Obligations. The parties' indemnification obligations are conditioned upon the
party seeking indemnification (i) promptly notifying the other party in writing of such claim or action, (ii) giving the
other party exclusive control and authority over the defense and settlement of such claim or action, (iii) not admitting
infringement of any intellectual property right without prior written consent of the other party, (iv) not entering into
any settlement or compromise of any such claim or action without the other party's prior written consent, and (v)
providing reasonable assistance to the other party in the defense of the claim or action; provided that, the party
reimburses the indemnified party for its reasonable out-of-pocket expenses incurred in providing such assistance.
e
Third-Party Goods and Indemnification. Illumina has no indemnification obligations with respect to any goods
originating from a third party and supplied to Purchaser. Third-party goods are those that are labeled or branded
with a third-party's name. Purchaser's indemnification rights, if any, with respect to third party goods shall be
pursuant to the original manufacturer's or licensor's indemnity. Upon written request Illumina will attempt to pass
through such indemnity, if any, to Purchaser.
Revision History
Part # 15042295, Revision E, December 2014
    Description of Change:
    Added a note in the Demultiplexing section about the default
    index recognition for index pairs that differ by < 3 bases.
Part # 15042295, Revision D, September 2014
    Description of Change:
    Updated computing requirements for installing MiSeq
    Reporter on an off-instrument computer.
    Updated information on the ConvertMissingBclsToNoCalls to
    clarify the default setting.
    Updated the reference for a network Linux storage tech note
    to Configuring MiSeq Reporter to Work with Samba Shares on a
    Linux Server (part # 970-2014-027).
Part # 15042295, Revision C, February 2014
    Description of Change:
    Updated to changes introduced in MiSeq Reporter v2.4:
    • Added the alignment method to the description of the BAM
      file header.
    • Added the command line and annotation algorithm to the
      description of VCF file header.
    • Added information on configuring the
      FileCopyWaitFinishTimeInSeconds parameter.
    Updated information on the Starling variant caller.
    Removed the section on gVCF files. See the reference guide
    for your workflow for gVCF output information.
    Removed information on the ELAND alignment algorithm,
    which was deprecated in MiSeq Reporter v2.2. For more
    information, see the MiSeq Sample Sheet Quick Reference Guide
    (part # 15028392).
Part # 15042295, Revision B, August 2013
    Description of Change:
    Updated to changes introduced in MiSeq Reporter v2.3:
    • Increased default for configuration setting
      MaximumHoursPerProcess from 1.5 to 72.
    • Changed letter designator for the TruSeq Amplicon
      workflow from C to TA.
    • Added description of genome VCF file, a file format
      optionally generated for the Enrichment, PCR Amplicon,
      and TruSeq Amplicon workflows.
Part # 15042295, Revision A, May 2013
    Description of Change:
    Initial release.
    This guide provides information about the MiSeq Reporter
    web interface, how to view run results, how to requeue a run,
    and how to install and configure the software.
    For information about analysis workflows performed by
    MiSeq Reporter, see the workflow-specific reference guide. A
    reference guide for each analysis workflow is available for
    download from the Illumina website.
Table of Contents
Revision History
Table of Contents
Chapter 1 Getting Started
    Introduction
    Viewing MiSeq Reporter
    MiSeq Reporter Concepts
    MiSeq Reporter Interface
    Requeue Analysis
    Input File Requirements
    Pre-Installed Databases and Genomes
Chapter 2 Analysis Metrics and Procedures
    Introduction
    Analysis Metrics
    Demultiplexing
    FASTQ File Generation
    Alignment
    Variant Calling
Chapter 3 Folders, File Formats, and Settings
    MiSeqAnalysis Folder
    Folder Structure
    Analysis File Formats
    MiSeq Reporter Configurable Settings
    Restarting the Service
Chapter 4 Installation and Troubleshooting
    MiSeq Reporter Off-Instrument Requirements
    Installing MiSeq Reporter Off-Instrument
    Using MiSeq Reporter Off-Instrument
    Troubleshooting MiSeq Reporter
Index
Technical Assistance
Chapter 1 Getting Started
Introduction
The MiSeq® system provides on-instrument secondary analysis using the MiSeq Reporter
software. MiSeq Reporter performs secondary analysis on the base calls and quality scores
generated by real-time analysis (RTA) during the sequencing run.
MiSeq Reporter performs analysis based on the analysis workflow specified in the sample
sheet. The analysis workflow is a series of steps specific to a type of analysis. Upon
completion of analysis, MiSeq Reporter generates various types of information specific to
the workflow. For most workflows, results appear on the MiSeq Reporter web interface in
the form of graphs and tables for each run.
MiSeq Reporter runs as a Windows service and is viewed through a web browser.
About Windows Service Applications
Windows service applications perform specific functions without user intervention and
continue to run in the background as long as Windows is running. Because MiSeq Reporter
runs as a Windows service, it automatically begins secondary analysis when base calling
is complete.
Sequencing During Analysis
The MiSeq system computing resources are dedicated to either sequencing or analysis. If a
new sequencing run is started on the MiSeq before secondary analysis of an earlier run is
complete, secondary analysis is stopped automatically.
To restart secondary analysis, use the Requeue feature on the MiSeq Reporter interface after
the new sequencing run is complete. At that point, secondary analysis starts from the
beginning.
Viewing MiSeq Reporter
The MiSeq Reporter interface can only be viewed through a web browser. To view the
MiSeq Reporter interface during analysis, open any web browser on a computer with
access to the same network as the MiSeq system. Connect to the HTTP service on port 8042
using one of the following methods:
• Connect using the instrument IP address followed by a colon and the port number, 8042.
    IP Address: 10.10.10.10, for example
    HTTP Service Port: 8042
    HTTP Address: 10.10.10.10:8042
• Connect using the network name for the MiSeq followed by a colon and the port number, 8042.
    Network Name: MiSeq01, for example
    HTTP Service Port: 8042
    HTTP Address: MiSeq01:8042
For off-instrument installations of MiSeq Reporter, connect using the method for locally
installed service applications: localhost followed by a colon and the port number, 8042.
    Off-Instrument: localhost
    HTTP Service Port: 8042
    HTTP Address: localhost:8042
For more information, see Installing MiSeq Reporter Off-Instrument on page 35.
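The connection methods above all come down to a host (IP address, network name, or localhost) plus port 8042. The following Python sketch builds the browser address from those two parts. The function name reporter_url is hypothetical, and the http:// prefix is an assumption based on the guide calling this an HTTP service; the guide itself shows addresses without a scheme.

```python
MISEQ_REPORTER_PORT = 8042  # HTTP service port stated in this guide


def reporter_url(host="localhost", port=MISEQ_REPORTER_PORT):
    """Return a browser address for a MiSeq Reporter host.

    host may be an instrument IP address (for example 10.10.10.10),
    a network name (for example MiSeq01), or localhost for an
    off-instrument installation. The "http://" scheme is an assumption.
    """
    host = host.strip()
    if not host:
        raise ValueError("host must not be empty")
    return f"http://{host}:{port}"


if __name__ == "__main__":
    for name in ("10.10.10.10", "MiSeq01", "localhost"):
        print(reporter_url(name))
```

Paste the resulting address into any browser on a computer that can reach the instrument on the network.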
MiSeq Reporter Concepts
The following concepts and terms are common to MiSeq Reporter.
Concept
    Description
Analysis Workflow
    A secondary analysis procedure performed by MiSeq Reporter. The
    workflow for each run is specified in the sample sheet.
Manifest
    The file that specifies a reference genome and targeted reference
    regions to be used in the alignment step.
    Manifests are not required for all workflows. For more information,
    see the workflow-specific reference guide.
Reference Genome
    A FASTA format file that contains the genome sequences used during
    analysis.
    For some workflows, the reference genome is for alignment. For
    other workflows, the reference genome is used to generate
    supplementary data.
    The FASTA files can use the extension *.fa or *.fasta. They are
    contained in subfolders of the Genome Repository, which is specified
    in the MiSeq Reporter.config file.
    For more information, see MiSeq Reporter Configurable Settings on
    page 29 and Pre-Installed Databases and Genomes on page 12.
Repository
    A folder that holds the data generated during sequencing runs. Each
    run folder is a subfolder in the repository.
Run Folder
    The folder structure populated by Real-Time Analysis software
    (MiSeqOutput folder) or the folder populated by MiSeq Reporter
    (MiSeqAnalysis). For more information, see MiSeqAnalysis Folder on
    page 22.
Sample Sheet
    A comma-separated values file (*.csv) that contains information
    required to set up and analyze a sequencing run, including a list of
    samples and their index sequences.
    The sample sheet must be provided during the run setup steps on the
    MiSeq. After the run begins, the sample sheet is renamed to
    SampleSheet.csv and copied to the run folders: MiSeqTemp,
    MiSeqOutput, and MiSeqAnalysis.
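The Reference Genome entry above says that FASTA files use the extension *.fa or *.fasta and sit in subfolders of the Genome Repository. As a rough illustration, the Python sketch below lists such files under a given folder. The function name find_reference_fastas is hypothetical, the repository path must be supplied by the caller (this guide does not state it here), and matching extensions without regard to case is an assumption.

```python
from pathlib import Path

# Extensions named in this guide; case-insensitive matching is an assumption.
FASTA_EXTENSIONS = {".fa", ".fasta"}


def find_reference_fastas(genome_repository):
    """Return sorted paths of FASTA files found anywhere under the folder."""
    root = Path(genome_repository)
    return sorted(
        path
        for path in root.rglob("*")  # walks the folder and all subfolders
        if path.is_file() and path.suffix.lower() in FASTA_EXTENSIONS
    )
```

A listing like this can help confirm that a genome folder referenced in a sample sheet actually contains a FASTA file before a run is requeued.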
MiSeq Reporter Interface
When MiSeq Reporter opens in the browser, the main screen appears with an image of the
instrument in the center. The Settings icon and Help icon are in the upper-right corner, and
the Analyses tab is in the upper-left corner.
• MiSeq Reporter Help: Select the Help icon to open MiSeq Reporter documentation in
the browser window.
• Settings: Select the Settings icon to change the server URL and Repository path.
• Analyses Tab: Select Analyses to expand the tab. The Analyses tab shows a list of
analysis runs that are either completed, queued for analysis, or currently processing.
Figure 1 MiSeq Reporter Main Screen
Server URL or Repository Settings
Select the Settings icon. The Settings dialog box opens. Set the server URL and the
repository path:
• Server URL: The server on which MiSeq Reporter is running.
• Repository path: Location of the analysis folder where output files are written.
Figure 2 Settings for Server URL and Repository
Typically, it is not necessary to change these settings unless MiSeq Reporter is running
off-instrument. In this case, set the repository path to the network location of the MiSeqOutput
folder. For more information, see Using MiSeq Reporter Off-Instrument on page 37.
Analyses Tab
The Analyses tab lists the sequencing runs located in the specified repository. From this
tab, you can open the results from any runs listed, or requeue a selected run for analysis.
To refresh the list, select the Refresh Analysis List icon in the upper-right corner.
Figure 3 Analyses Tab Expanded
The Analyses tab columns are State, Type, Run, Completed On, and Requeue:
} State—Shows the current state of the analysis using one of three status icons.
Table 1 State of Analysis Icons
Icon
Description
Indicates that secondary analysis completed successfully.
Indicates that secondary analysis is in progress.
Indicates that secondary analysis was not completed successfully.
} Type—Lists the analysis workflow associated with each run using a single letter
designation. Letter designators for each workflow are standard in the MiSeq Reporter
interface.
Table 2 Letter Designators for Analysis Workflows
6
Letter
Workflow
A
Assembly
E
Enrichment
G
GenerateFASTQ
L
Library QC
M
Metagenomics
P
PCR Amplicon
R
Resequencing
S
Small RNA
T
Targeted RNA
TA
TruSeq Amplicon
U
Unknown
This designator is used to represent a
plug-in workflow
Part # 15042295 Rev. E
Analysis Information and Results Tabs
After selecting a run from the Analyses tab, information and results for that run appear in
a series of tabs on the MiSeq Reporter interface.
Analysis results that appear on the Summary and Details tabs vary by workflow. For more
information, see the workflow-specific reference guide. A reference guide for each workflow
is available from the Illumina website.
Information on the Analysis tab, Sample Sheet tab, Logs tab, and Errors tab are similar for
each workflow. All tabs are populated when analysis is complete.
Tab Name
Description
Summary Tab
Contains a summary of analysis results in graphs for mismatches,
phasing and prephasing, alignment, and clusters passing filter, for
example.
Details Tab
Contains details of analysis results in tables and graphs for samples,
coverage, Q-scores, variants, and targets, for example.
Analysis Tab
Contains logistical information about the run.
Sample Sheet Tab
Contains run parameters specified in the sample sheet, and provides
tools to edit the sample sheet and requeue the run.
Logs Tab
Lists every step performed during analysis. These steps are recorded in
log files located in the Logs folder. A summary is written to
AnalysisLog.txt, which is an important file for troubleshooting purposes.
Errors Tab
Lists any errors that occurred during analysis. A summary is written to
AnalysisError.txt, which is an important file for troubleshooting
purposes.
Analysis Info Tab
Row
Description
Investigator
(Optional) The name of the investigator.
Read Cycles
Represents the number of cycles in each read, including notation for any
index reads. For example, 151, 8(I), 8(I), 151, indicates a first read of 151
cycles, 2 reads of 8 cycles, and a final read of 151 cycles.
MiSeq Reporter Software Guide
7
MiSeq Reporter Interface
} Run—The name of the run as it is listed in the Experiment Name field of the sample
sheet. If an experiment name was not included in the sample sheet before the
sequencing run, this field lists the run folder name.
Alternatively, you can specify a different name for the run by editing the Experiment
Name field in the sample sheet. For more information, see Editing the Sample Sheet in
MiSeq Reporter on page 8.
} Completed On—The date that secondary analysis completed.
} Requeue—Select the checkbox to requeue a specific job for analysis. The Requeue
button appears.
When analysis is queued, the run appears at the bottom of the Analyses tab and
is indicated as in progress by an icon.
Row
Description
Start Time
The clock time that secondary analysis was started.
Completion Time
The clock time that secondary analysis was completed.
Data Folder
The root level of the output folder produced by Real-Time Analysis
software (MiSeqOutput), which contains all primary and secondary
analysis output for the run.
Analysis Folder
The full path to the Alignment folder in the MiSeqAnalysis folder
(Data\Intensities\BaseCalls\Alignment).
Copy Folder
The full path to the Queued subfolder in the MiSeqAnalysis folder.
Sample Sheet Tab
Row
Description
Investigator Name
(Optional) The name of the investigator.
Project Name
(Optional) A descriptive name of the run.
Experiment Name
(Optional) A descriptive name of the experiment.
Date
The date the sequencing run was performed.
Workflow
The analysis workflow for the run.
Assay
The name of the assay used to prepare your samples.
Chemistry
The chemistry name identifies recipe fragments used to build the run-specific
recipe. For runs using the TruSeq Amplicon workflow or PCR
Amplicon workflow, the name is amplicon. For all other workflows, the
name is default or the field can be blank.
Manifests
The name of the manifest file that specifies alignments to a reference and
targeted reference regions. This section is used with the TruSeq Amplicon
workflow, Enrichment workflow, and PCR Amplicon workflow.
Reads
The number of cycles performed in Read 1 and Read 2.
Index reads are not included in this section.
Settings
Optional run parameters used for modifying analysis results.
Data
The sample ID, sample name, index sequences, and path to the genome
folder. Requirements vary by workflow.
For information about sample sheets and sample sheet settings, see the MiSeq Sample Sheet
Quick Reference Guide (part # 15028392).
Editing the Sample Sheet in MiSeq Reporter
You can edit the sample sheet for a specific run from the Sample Sheet tab on the MiSeq
Reporter web interface. A mouse and keyboard are required to edit the sample sheet.
} To edit a row in the sample sheet, click any field in the row and make required
changes.
} To delete a row from the sample sheet, click anywhere in the row and select Delete
Row.
} After editing the sample sheet, select Save and Requeue to save changes and initiate
secondary analysis with the edited sample sheet.
} If a change to the sample sheet was made in error, click an adjacent tab before saving
any changes. A warning appears that states changes were not saved. Click Discard to
undo any changes or Save to save and requeue analysis.
Saving Graphs as Images
MiSeq Reporter provides the option to save an image of graphs shown on the Summary or
Details tabs. Right-click any location on the Summary tab or the graphs location on the
Details tab, and then left-click Save Image As. When prompted, name the file and browse
to a location to save the file.
All images are saved in a JPG (*.jpg) format. Graphs are exported as a single graphic for all
graphs shown on the tab. A mouse is required to use this option.
} To add a row to the sample sheet, click the row above the intended location of the new
row and select Add Row.
Requeue Analysis
To requeue a run for analysis, use the Requeue feature from the MiSeq Reporter Analyses
tab. Make sure that a sequencing run on the MiSeq is not currently in progress.
Each time analysis is requeued, the following folders and files are created:
} A new Alignment folder is created with a sequential number appended to the folder
name, such as Alignment2.
MiSeqAnalysis\<RunFolderName>\Data\Intensities\BaseCalls\Alignment2
} Existing intermediate analysis files written in FASTQ file format are overwritten with
new analysis files. FASTQ files are written to the BaseCalls folder.
MiSeqAnalysis\<RunFolderName>\Data\Intensities\BaseCalls.
NOTE
If changes were made to the sample sheet, make sure that the file is named SampleSheet.csv
and saved to the root level of the analysis folder.
1
From the MiSeq Reporter web interface, click Analyses.
2
Locate the run from the list of available runs on the Analyses tab, and click the
Requeue checkbox next to the run name.
If the run is not listed, confirm that the correct repository is specified using the Settings
icon. For more information, see Server URL or Repository Settings on page 5.
Figure 4 Requeue Button
3
Click Requeue. The State icon to the left of the run name changes to show that analysis
is in progress.
} If analysis does not start, make sure that the following input files are present in the
analysis run folder: SampleSheet.csv, RTAComplete.txt, and RunInfo.xml.
} During analysis, a status bar with elapsed time appears on the Analysis Info tab.
To stop analysis, select the stop analysis icon next to the status bar on the
Analysis Info tab.
Input File Requirements
MiSeq Reporter requires the following files generated during the sequencing run to perform
secondary analysis or to requeue analysis. Files, such as *.bcl, *.filter, and *.locs, are
required to perform analysis.
There is no need to move or copy files to another location before analysis begins. Required
files are copied automatically to the MiSeqAnalysis folder during the sequencing process.
File Name
Description
RTAComplete.txt
A marker file that indicates RTA processing is complete. The presence of
this file triggers MiSeq Reporter to queue analysis.
SampleSheet.csv
Provides parameters for the run and subsequent analysis. At the start of
the run, the sample sheet is copied to the root level of the run folder and
renamed SampleSheet.csv.
RunInfo.xml
Contains high-level run information, such as the number of reads and
cycles in the sequencing run, and whether a read is indexed.
Required Files
MiSeq Reporter requires the following files generated during the sequencing run to perform
secondary analysis.
File Type
Path and File Name Example
Description
*.bcl files
Data\Intensities\BaseCalls\L001\C1.1\s_1_3.bcl
Base calls for lane
1, cycle 1, tile 3
*.filter files
Data\Intensities\BaseCalls\L001\s_1_0003.filter
Filter results file
for lane 1, tile 3
*.locs files
Data\Intensities\L001\s_1_3.locs
Location file for
lane 1, tile 3
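The check for input files in the analysis run folder can be scripted. The following Python sketch looks for the three files named in this guide (SampleSheet.csv, RTAComplete.txt, and RunInfo.xml) at the root level of a run folder. The function name missing_run_files is hypothetical and is not part of MiSeq Reporter. The sketch does not check for *.bcl, *.filter, or *.locs files.

```python
from pathlib import Path

# File names are taken from this guide; the helper itself is illustrative only.
REQUIRED_RUN_FILES = ("SampleSheet.csv", "RTAComplete.txt", "RunInfo.xml")


def missing_run_files(run_folder):
    """Return the required file names that are absent from the run folder root."""
    root = Path(run_folder)
    return [name for name in REQUIRED_RUN_FILES if not (root / name).is_file()]
```

An empty list means that all three files are present. Any names returned are candidates to investigate when analysis does not start.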
Pre-Installed Databases and Genomes
For most workflows, a reference is required to perform alignment. The MiSeq includes
several pre-installed databases and genomes.
Pre-Installed
Description
Databases
• miRbase for human
• dbSNP for human
• RefGene for human
Genomes
• Arabidopsis thaliana
• cow (Bos taurus)
• E. coli strain DH10b
• human (Homo sapiens) build hg19
• mouse (Mus musculus)
• rat (Rattus norvegicus)
• yeast (Saccharomyces cerevisiae)
• Staphylococcus aureus
The reference genome used for analysis by MiSeq Reporter is specified for each sample in
the sample sheet (SampleSheet.csv). The full path to the folder containing the whole
genome FASTA file must be specified in the sample sheet.
NOTE
Enter the full path (UNC path) to the GenomeFolder in the sample sheet. Do not enter the
path using a mapped drive.
NOTE
Introduced in MiSeq Reporter v2.1, you can specify genome references for multiple species in
the same sample sheet for all workflows except the Small RNA workflow.
Available Genomes
In addition to the pre-installed genomes, genome sequence files and reference annotation
for other commonly used model organisms are available from the Illumina iGenomes page.
Go to my.illumina.com/Message/iGenome. A MyIllumina login is required.
The sequence and annotation files for each iGenome are provided in a compressed file
format, *.tar.gz. Refer to the iGenomes Overview for installation instructions.
Custom Genomes
You can upload your own reference in FASTA format to the MiSeq computer. The reference
must have a *.fa or *.fasta extension and be stored in a single folder.
You can upload several single FASTA files or a single multi-FASTA file (recommended),
but not a combination of both. To upload files, use the Manage Files feature in CS.
NOTE
The chromosome name, which is the section of the > line up to any white space, must not
contain the following characters:
# - ? ( ) [ ] / \ = + < > : ; " ' , * ^ | &
For best results, use only alpha-numeric characters as chromosome names.
Illumina recommends the use of a simple text editor, such as Notepad to make sure that no
illegal or invisible characters are added to the file.
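Chromosome names in a custom FASTA file can be checked before upload. The following Python sketch extracts the chromosome name from a > line (the text up to the first white space) and reports any characters from the list above. The function names are hypothetical and are not part of MiSeq Reporter.

```python
# Characters that this guide lists as not allowed in chromosome names.
FORBIDDEN = set('#-?()[]/\\=+<>:;"\',*^|&')


def chromosome_name(header_line):
    """Return the section of a FASTA '>' line up to the first white space."""
    if not header_line.startswith(">"):
        raise ValueError("not a FASTA header line")
    parts = header_line[1:].split(None, 1)
    return parts[0] if parts else ""


def invalid_characters(name):
    """Return the forbidden characters found in a chromosome name, sorted."""
    return sorted(set(name) & FORBIDDEN)
```

For example, invalid_characters(chromosome_name(">chr1 Homo sapiens")) returns an empty list, which indicates an acceptable name.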
Chapter 2 Analysis Metrics and Procedures
Introduction
Analysis Metrics
Demultiplexing
FASTQ File Generation
Alignment
Variant Calling
Introduction
During the sequencing run, Real-Time Analysis (RTA) generates data files that include
analysis metrics used by MiSeq Reporter for secondary analysis. The following metrics
appear in secondary analysis reports:
} Clusters passing filter
} Base call quality scores
} Phasing and prephasing values
MiSeq Reporter performs secondary analysis using a series of analysis procedures, which
include demultiplexing, FASTQ file generation, alignment, and variant calling.
Table 3 Analysis Procedures
Analysis Procedure
Description
Demultiplexing
Performed for all workflows if the run has index reads and the
sample sheet lists multiple samples.
For indexed libraries containing either one or two indexes,
demultiplexing separates data from pooled samples based on short
index sequences from different libraries.
FASTQ File Generation
Performed for all workflows.
FASTQ files are the primary input for the alignment step. FASTQ
files contain non-indexed reads for each sample, excluding reads
identified as in-line controls and reads that did not pass filter.
Alignment
Performed for workflows that require alignment against a
reference.
Alignment compares sequences against the reference specified in
the sample sheet and assigns a score based on regions of similarity.
MiSeq Reporter uses an alignment method best-suited for the
workflow.
Aligned reads are written to files in BAM file format.
Variant Calling
Performed for workflows that require variant identification as a
final output.
Variant calling records SNPs and other structural variants in a
standardized and parsable text file. MiSeq Reporter uses variant
calling algorithms best-suited for the workflow.
Variant calls are written to files in VCF file format.
Analysis Metrics
During primary analysis, filters and statistical estimates measure data quality and later
include these metrics with secondary analysis results. Metrics that appear in secondary
analysis reports are clusters passing filter, base call quality scores, and phasing and
prephasing values.
Clusters Passing Filter
During primary analysis, RTA filters raw data to remove any reads that do not meet the
overall quality as measured by the Illumina chastity filter. The chastity of a base call is
calculated as the ratio of the brightest intensity divided by the sum of the brightest and
second brightest intensities.
Clusters pass filter (PF) when no more than one base call in the first 25 cycles has a
chastity of < 0.6.
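The chastity calculation and the passing filter rule can be sketched in Python as follows. This is an illustration of the rule as stated above, not the RTA implementation. It assumes that each cycle provides a list of positive channel intensities for one cluster. The function names and parameters are hypothetical.

```python
def chastity(intensities):
    """Brightest intensity divided by the sum of the two brightest intensities."""
    top, second = sorted(intensities, reverse=True)[:2]
    return top / (top + second)


def passes_filter(cycle_intensities, threshold=0.6, cycles=25, max_failures=1):
    """Apply the stated rule: at most one base call in the first 25 cycles
    has a chastity below 0.6."""
    failures = sum(
        1 for channels in cycle_intensities[:cycles] if chastity(channels) < threshold
    )
    return failures <= max_failures
```

A cluster with intensities of 100 and 20 in its two brightest channels has a chastity of 100 / 120, or about 0.83, for that cycle.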
Quality Scores
A quality score, or Q-score, is a prediction of the probability of an incorrect base call. A
higher Q-score implies that a base call is more reliable and less likely to be incorrect.
Based on the Phred scale, the Q-score serves as a compact way to communicate small error
probabilities. Given a base call, X, the probability that X is not true, P(~X), results in a
quality score, Q(X), according to the relationship:
Q(X) = -10 log10(P(~X))
where P(~X) is the estimated probability of the base call being wrong.
The following table shows the relationship between the quality score and error probability.
Quality Score Q(X)
Error Probability P(~X)
Q40
0.0001 (1 in 10,000)
Q30
0.001 (1 in 1,000)
Q20
0.01 (1 in 100)
Q10
0.1 (1 in 10)
For more information on the Phred quality score, see en.wikipedia.org/wiki/Phred_quality_
score.
During the sequencing run, base call quality scores are calculated after cycle 25 and results
are recorded in base call (*.bcl) files, which contain the base call and quality score per cycle.
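The relationship between the quality score and the error probability can be checked with a short Python sketch. The function names are chosen for illustration only.

```python
import math


def error_probability(q_score):
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-q_score / 10)


def quality_score(probability):
    """Convert an error probability to a Phred-scaled quality score."""
    return -10 * math.log10(probability)
```

For example, error_probability(30) is approximately 0.001, and quality_score(0.01) is approximately 20.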
ASCII Format for Quality Scores
During analysis, base call quality scores are written to FASTQ files in an encoded ASCII
format (the value + 33). The ASCII format is illustrated in the following table.
Table 4 ASCII Codes for Q-Scores 0–40
Symbol    ASCII Code    Q-score
!         33            0
"         34            1
#         35            2
$         36            3
%         37            4
&         38            5
'         39            6
(         40            7
)         41            8
*         42            9
+         43            10
,         44            11
-         45            12
.         46            13
/         47            14
0         48            15
1         49            16
2         50            17
3         51            18
4         52            19
5         53            20
6         54            21
7         55            22
8         56            23
9         57            24
:         58            25
;         59            26
<         60            27
=         61            28
>         62            29
?         63            30
@         64            31
A         65            32
B         66            33
C         67            34
D         68            35
E         69            36
F         70            37
G         71            38
H         72            39
I         73            40
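The ASCII encoding (the value + 33) can be reversed with a few lines of Python. The following sketch converts a FASTQ quality string to Q-scores and back. The function names are illustrative.

```python
def decode_qualities(quality_string, offset=33):
    """Convert an ASCII-encoded quality string to a list of Q-scores."""
    return [ord(symbol) - offset for symbol in quality_string]


def encode_qualities(q_scores, offset=33):
    """Convert a list of Q-scores to an ASCII-encoded quality string."""
    return "".join(chr(q + offset) for q in q_scores)
```

For example, decode_qualities("!5I") returns [0, 20, 40].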
Phasing and Prephasing
During the sequencing reaction, each DNA strand in a cluster extends by one base per
cycle. A small portion of strands can become out of phase with the current incorporation
cycle. Phasing occurs when a base falls behind. Prephasing occurs when a base jumps
ahead. Phasing and prephasing rates indicate an estimate of the fraction of molecules that
became phased or prephased in each cycle.
Figure 5 Phasing and Prephasing
A
B
Read with a base that is phasing
Read with a base that is prephasing
The number of cycles performed in a read is one more cycle than the number of cycles
analyzed. For example, a paired-end 150-cycle run performs two 151-cycle reads (2 x 151)
for a total of 302 cycles. At the end of the run, 2 x 150 cycles are analyzed. The one extra
cycle for each of Read 1 and Read 2 is required for prephasing calculations. Phasing and prephasing
results are recorded in the file named phasing.xml, which is located in the folder
Data\Intensities\BaseCalls\Phasing.
Phasing and prephasing calculations use statistical averaging over many clusters and
sequences to estimate the correlation of signal between different cycles. Therefore, phasing
estimates tend to be more accurate for tiles with larger numbers of clusters and a mixture of
different sequences. Samples containing only a few different sequences do not produce
reliable estimates. Sequencing into adapters, or sequencing other highly homogeneous samples, is
expected to result in poor phasing estimates.
Demultiplexing
For runs with multiple samples and index reads, demultiplexing compares each Index
Read sequence to the index sequences specified in the sample sheet. No quality values are
considered in this step.
Demultiplexing separates data from pooled samples based on short index sequences that
tag samples from different libraries. Index reads are identified using the following steps:
} Samples are numbered starting from 1 based on the order they are listed in the sample
sheet.
} Sample number 0 is reserved for clusters that were not successfully assigned to a
sample.
} Clusters are assigned to a sample when the index sequence matches exactly or there is
up to a single mismatch per Index Read.
NOTE
Illumina indexes are designed so that any index pair differs by ≥ 3 bases, allowing for a single
mismatch in index recognition. Index sets that are not from Illumina can include pairs of
indexes that differ by < 3 bases. In such cases, the software detects the insufficient difference
and modifies the default index recognition (mismatch=1). Instead, the software performs
demultiplexing using only perfect index matches (mismatch=0).
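The assignment rule and the fallback to perfect matching can be sketched in Python as follows. This is a simplified illustration for a single Index Read, not the MiSeq Reporter implementation. It assumes that all index sequences have the same length as the Index Read. The function names are hypothetical.

```python
def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def allowed_mismatches(indexes):
    """Return 1 when every index pair differs by at least 3 bases, else 0."""
    for i, first in enumerate(indexes):
        for second in indexes[i + 1:]:
            if hamming(first, second) < 3:
                return 0
    return 1


def assign_sample(index_read, indexes):
    """Return the 1-based sample number, or 0 when the read is unassigned."""
    limit = allowed_mismatches(indexes)
    for number, index in enumerate(indexes, start=1):
        if hamming(index_read, index) <= limit:
            return number
    return 0
```

Samples are numbered in sample sheet order, so the position of an index in the list determines the sample number that is returned.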
When demultiplexing is complete, one demultiplexing file named
DemultiplexSummaryF1L1.txt is written to the Alignment folder, and summarizes the
following information:
} In the file name, F1 represents the flow cell number.
} In the file name, L1 represents the lane number, which is always L1 for MiSeq.
} Reports demultiplexing results in a table with one row per tile and one column per
sample, including sample 0.
} Reports the most commonly occurring sequences for the index reads.
Other demultiplexing files are generated for each tile of the flow cell. For more information,
see Demultiplexing File Format on page 24.
FASTQ File Generation
MiSeq Reporter generates intermediate analysis files in the FASTQ format, which is a text
format used to represent sequences. FASTQ files contain reads for each sample and their
quality scores, excluding reads identified as in-line controls and clusters that did not pass
filter.
FASTQ files are the primary input for alignment. The files are written to the BaseCalls
folder (Data\Intensities\BaseCalls) in the MiSeqAnalysis folder, and then copied to the
BaseCalls folder in the MiSeqOutput folder. Each FASTQ file contains reads for only one
sample, and the name of that sample is included in the FASTQ file name. For more
information, see FASTQ File Naming on page 25.
FASTQ Config Settings
Some default settings for FASTQ file generation can be changed by editing the following
settings in the MiSeq Reporter configuration file (C:\Illumina\MiSeq Reporter\MiSeq
Reporter.exe.config):
} ConvertMissingBclsToNoCalls—By default, FASTQ files include all tiles. During
FASTQ file generation, MiSeq Reporter treats *.bcl files that are missing or corrupt as
no-calls (Ns), and logs a warning in the AnalysisError.txt file for the affected cycle and
tile. You can override this default setting by changing the value to 0 (false), so that the
software logs a fatal error and aborts analysis when encountering a missing or invalid
base call.
} CreateFastqForIndexReads—By default, FASTQ files are not generated for index reads.
You can override this setting by changing the value to 1 (true).
} FilterNonPFReads—By default, FASTQ files only include clusters passing filter. You
can override this setting by changing the value to 0 (false).
For more information, see MiSeq Reporter Configurable Settings on page 29.
Quality Trimming
FASTQ file generation optionally performs quality trimming of the 3' portion of non-index
reads with low quality scores. This step is performed by default during alignment using
BWA. For workflows that do not use BWA, use the QualityScoreTrim sample sheet setting
to include trimming during FASTQ file generation. For more information, see the MiSeq
Sample Sheet Quick Reference Guide (part # 15028392).
Alignment
Alignment is a way of identifying optimal matches between read sequences and the
sequence of a reference genome. Aligned sequences are assigned a score based on their
similarity to the reference.
Alignment results are written to Binary Alignment/Map (BAM) files. BAM files are the
primary input for variant calling. For more information, see BAM File Format on page 25.
Alignment Methods
For workflows that include alignment, reads are aligned against the reference specified in
the sample sheet or in a manifest file. MiSeq Reporter uses one of the following alignment
methods best-suited for the workflow: banded Smith-Waterman, BWA, or Bowtie.
Smith-Waterman Algorithm
The banded Smith-Waterman algorithm performs local sequence alignments to determine
similar regions between two sequences. Instead of looking at the total sequence, the
Smith-Waterman algorithm compares segments of all possible lengths. Local alignments are
useful for dissimilar sequences that are suspected to contain regions of similarity within
the larger sequence.
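The idea of local alignment can be illustrated with a minimal Python sketch of the standard Smith-Waterman scoring recurrence. This sketch is not banded and is not the MiSeq Reporter implementation. The scoring values (match, mismatch, and gap) are assumptions chosen for illustration, and only the best local score is returned.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, which lets an alignment start anywhere.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best
```

With these scoring values, two otherwise dissimilar sequences that share the segment ACG score 6, which reflects the region of similarity within the larger sequences.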
BWA
The Burrows-Wheeler Aligner (BWA) aligns relatively short nucleotide sequences against a
long reference sequence. BWA automatically adjusts parameters based on read lengths and
error rates, and then estimates insert size distribution.
When using BWA for alignment, GATK is used for variant calling, by default.
Bowtie
Bowtie is a short-read aligner that quickly aligns large sets of short sequences. For more
information, see bowtie-bio.sourceforge.net.
Variant Calling
Variant calling records single nucleotide polymorphisms (SNPs), insertions/deletions
(indels), and other structural variants in a standardized variant call format (VCF). For more
information, see VCF File Format on page 26.
For each SNP or indel call, the probability of an error is provided as a variant quality score.
Reads are realigned around candidate indels to improve the quality of the calls and site
coverage summaries.
Variant Callers
For workflows that include variant calling, variants are detected using one of the following
variant callers best-suited for the workflow: GATK, the somatic variant caller, or Starling.
GATK
The Genome Analysis Toolkit (GATK) calls raw variants for each sample, analyzes
variants against known variants, and then calculates a false discovery rate for each
variant. Variants are flagged as homozygous (1/1) or heterozygous (0/1) in the VCF file
sample column. For more information, see www.broadinstitute.org/gatk.
Somatic Variant Caller
Developed by Illumina, the somatic variant caller identifies variants present at low
frequency in the DNA sample and minimizes false positives.
The somatic variant caller identifies SNPs in three steps:
} Considers each position in the reference genome separately
} Counts bases at the given position for aligned reads that overlap the position
} Computes a variant score that measures the quality of the call. Variant scores are
computed using a Poisson model that excludes variants with a quality score below
Q20.
For indels, the somatic variant caller analyzes how many alignments covering a given
position include a particular indel compared to the overall coverage at that position. The
somatic variant caller does not perform an indel realignment step included in other variant
callers, such as GATK.
For more information, see the Somatic Variant Caller Tech Note available on the Illumina
website.
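The following Python sketch illustrates how a Poisson model can turn base counts at a position into a Phred-scaled score. It is an assumption-based illustration only. The actual model used by the somatic variant caller is described in the tech note and may differ. The function names, the default error rate of 0.01, and the score cap are hypothetical.

```python
import math
from collections import Counter


def base_counts(bases):
    """Count the bases observed at one position across overlapping reads."""
    return Counter(bases)


def poisson_variant_q(variant_count, depth, error_rate=0.01, max_q=100.0):
    """Phred-scale the chance of seeing at least variant_count errors by noise alone."""
    if variant_count <= 0:
        return 0.0
    expected = depth * error_rate  # expected number of erroneous calls
    term = math.exp(-expected)     # P(X = 0)
    cumulative = 0.0
    for i in range(variant_count):
        cumulative += term         # adds P(X = i)
        term *= expected / (i + 1)
    tail = 1.0 - cumulative        # P(X >= variant_count)
    if tail <= 0.0:
        return max_q
    return min(max_q, -10 * math.log10(tail))
```

In this sketch, more variant observations at the same depth produce a higher score, because noise alone becomes a less likely explanation.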
Starling
Starling calls both SNPs and small indels, and summarizes depth and probabilities for
every site in the genome. The output files that Starling produces include a .vcf file for each
sample that contains variants.
Starling treats each insertion or deletion as a single mismatch. Base calls with more than
two mismatches to the reference sequence within 20 bases of the call are ignored. If the call
occurs within the first or last 20 bases of a read, the mismatch limit is increased to 41
bases.
Starling can be used as an optional alternative variant caller to GATK.
Chapter 3 Folders, File Formats, and Settings
MiSeqAnalysis Folder
Folder Structure
Analysis File Formats
MiSeq Reporter Configurable Settings
Restarting the Service
MiSeqAnalysis Folder
The MiSeqAnalysis folder is the main run folder for MiSeq Reporter. The relationship
between the MiSeqOutput and MiSeqAnalysis run folders is summarized as follows:
} During sequencing, real-time analysis (RTA) populates the MiSeqOutput folder with
files generated during primary analysis.
} Except for focus images and thumbnail images, RTA copies files to the MiSeqAnalysis
folder in real time. When primary analysis is complete, RTA writes the file
RTAComplete.txt to both run folders.
} MiSeq Reporter monitors the MiSeqAnalysis folder and begins secondary analysis
when the file RTAComplete.txt appears.
} As secondary analysis continues, MiSeq Reporter writes analysis output files to the
MiSeqAnalysis folder, and then copies the files to the MiSeqOutput folder.
Folder Structure
Data
    Intensities
        BaseCalls
            Alignment—Contains *.bam and *.vcf files, if applicable.
            L001—Contains one subfolder per cycle, each containing *.bcl files.
            Sample1_S1_L001_R1_001.fastq.gz
            Sample2_S2_L001_R1_001.fastq.gz
            Undetermined_S0_L001_R1_001.fastq.gz
        L001—Contains *.locs files, one for each tile.
    RTA Logs—Contains log files from primary analysis.
InterOp—Contains binary files used by Sequencing Analysis Viewer (SAV).
Logs—Contains log files describing steps performed during sequencing.
Queued—A working folder for MiSeq Reporter; also called the copy folder.
AnalysisError.txt
AnalysisLog.txt
CompletedJobInfo.xml
QueuedForAnalysis.txt
[Workflow]RunStatistics
RTAComplete.txt
RunInfo.xml
runParameters.xml
SampleSheet.csv
When using BaseSpace for secondary analysis without replicating analysis locally, the local
MiSeqAnalysis folder is empty.
Alignment Folder Contents
Most secondary analysis files are written to the Alignment folder. Each time that analysis
is requeued, MiSeq Reporter creates an Alignment folder named AlignmentN, where N is a
sequential number.
Log files from analysis algorithms, such as BWA or GATK, are written to
Data\Intensities\BaseCalls\Alignment\Logging.
Analysis File Formats
Analysis results are written to file formats specific to their function and purpose.
Analysis Step
Format
Purpose
Demultiplexing
*.demux
Intermediate files containing demultiplexing results.
FASTQ
*.fastq.gz
Intermediate files containing quality scored base calls. FASTQ
files are the primary input for the alignment step.
Alignment
*.bam
Compressed binary files containing sequence alignment data.
BAM files are the primary input for the variant calling step.
Variant Calling
*.vcf
Text files containing SNPs, indels, and other structural variants.
Other file formats used in analysis results are *.txt, *.xml, *.htm, and *.png. Many of these
files contain information that appears in tables, graphs, and charts on the MiSeq Reporter
web interface.
Demultiplexing File Format
For multiple sample indexed runs, the process of demultiplexing reads the index sequence
attached to each cluster to determine from which sample the cluster originated. The
mapping between clusters and sample number are written to one demultiplexing (*.demux)
file for each tile of the flow cell.
Demultiplexing files are binary files written to the L001 folder in
Data\Intensities\BaseCalls\L001. The file naming format is s_1_X.demux, where X is the
tile number.
Demultiplexing files start with a header:
Version (4-byte integer), currently 1
Cluster count (4-byte integer)
The remainder of the file consists of sample numbers for each cluster from the tile.
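The header layout described above can be read with the Python struct module. The following sketch parses only the two header fields and returns the remaining bytes unparsed, because this guide does not state the size of each sample number. Little-endian byte order is an assumption, and the function name is hypothetical.

```python
import struct


def read_demux_header(data):
    """Return (version, cluster_count, payload) from the bytes of a *.demux file.

    Assumes two little-endian 4-byte integers; the payload is left unparsed.
    """
    if len(data) < 8:
        raise ValueError("file is too short to contain the 8-byte header")
    version, cluster_count = struct.unpack_from("<ii", data, 0)
    return version, cluster_count, data[8:]
```

To use the sketch, read the file in binary mode, for example with open(path, "rb"), and pass the bytes to the function.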
FASTQ File Format
FASTQ is a text-based file format that contains base calls and quality values per read.
Each record contains four lines:
} The identifier
} The sequence
} A plus sign (+)
} The quality scores in an ASCII encoded format
The identifier is formatted as @Instrument:RunID:FlowCellID:Lane:Tile:X:Y
ReadNum:FilterFlag:0:SampleNumber as shown in the following example:
@SIM:1:FCX:1:15:6329:1045 1:N:0:2
TCGCACTCAACGCCCTGCATATGACAAGACAGAATC
+
<>;##=><9=AAAAAAAAAA9#:<#<;<<<????#=
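The identifier line in the example above can be split into its fields with a short Python sketch. The function name and the dictionary keys are chosen for illustration. The sketch assumes exactly the field layout shown in this guide.

```python
def parse_fastq_identifier(line):
    """Split a FASTQ identifier line into the fields described in this guide."""
    if not line.startswith("@"):
        raise ValueError("identifier line must start with '@'")
    location, _, description = line[1:].strip().partition(" ")
    instrument, run_id, flow_cell, lane, tile, x, y = location.split(":")
    # The third field of the second group is always 0 in the documented format.
    read_num, filter_flag, _reserved, sample_number = description.split(":")
    return {
        "instrument": instrument,
        "run_id": run_id,
        "flow_cell": flow_cell,
        "lane": int(lane),
        "tile": int(tile),
        "x": int(x),
        "y": int(y),
        "read": int(read_num),
        "filter_flag": filter_flag,
        "sample_number": int(sample_number),
    }
```

For the example record, the sketch reports tile 15, Read 1, and sample number 2.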
FASTQ File Naming
FASTQ files are named with the sample name and the sample number. The sample
number is a numeric assignment based on the order that the sample is listed in the sample
sheet. For example:
Data\Intensities\BaseCalls\samplename_S1_L001_R1_001.fastq.gz
• samplename—The sample name provided in the sample sheet. If a sample name
is not provided, the file name includes the sample ID.
• S1—The sample number based on the order that samples are listed in the sample
sheet starting with 1. In this example, S1 indicates that this sample is the first
sample listed in the sample sheet.
NOTE
Reads that cannot be assigned to any sample are written to a FASTQ file for sample
number 0, and excluded from downstream analysis.
• L001—The lane number. This segment is always L001 with the single-lane flow
cell.
• R1—The read. In this example, R1 means Read 1. For a paired-end run, a file from
Read 2 includes R2 in the file name.
• 001—The last segment is always 001.
FASTQ files are compressed in the GNU zip format, as indicated by *.gz in the file name.
FASTQ files can be uncompressed using tools such as gzip (command-line) or 7-zip (GUI).
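The file naming convention can be parsed with a regular expression. The following Python sketch covers only the R1 and R2 file names described above. The pattern and the function name are illustrative and are not part of MiSeq Reporter.

```python
import re

# samplename_S<number>_L<lane>_R<read>_001.fastq.gz, as described in this guide.
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<number>\d+)_L(?P<lane>\d{3})_R(?P<read>\d)_001\.fastq\.gz$"
)


def parse_fastq_name(file_name):
    """Return the sample name, sample number, lane, and read from a FASTQ file name."""
    match = FASTQ_NAME.match(file_name)
    if match is None:
        raise ValueError("unexpected FASTQ file name: " + file_name)
    return {
        "sample": match["sample"],
        "sample_number": int(match["number"]),
        "lane": int(match["lane"]),
        "read": int(match["read"]),
    }
```

A sample number of 0 in the result identifies the file that holds reads not assigned to any sample. The compressed files can also be read directly in Python with gzip.open(path, "rt") from the standard library.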
BAM File Format
A BAM file (*.bam) is the compressed binary version of a SAM file that is used to represent
aligned sequences. SAM and BAM formats are described in detail on the SAM Tools
website: samtools.sourceforge.net.
BAM files are written to the alignment folder in Data\Intensities\BaseCalls\Alignment.
BAM files use the file naming format of SampleName_S#.bam, where # is the sample
number determined by the order that samples are listed in the sample sheet.
BAM files contain a header section and an alignments section:
} Header—Contains information about the entire file, such as sample name, sample
length, and alignment method. Alignments in the alignments section are associated
with specific information in the header section.
Alignment methods include banded Smith-Waterman, Burrows-Wheeler Aligner
(BWA), and Bowtie. The term Isis indicates that an Illumina alignment method is in
use, which is the banded Smith-Waterman method.
} Alignments—Contains read name, read sequence, read quality, and custom tags.
GA23_40:8:1:10271:11781 64 chr22 17552189 8 35M * 0 0
TACAGACATCCACCACCACACCCAGCTAATTTTTG
IIIII>FA?C::B=:GGGB>GGGEGIIIHI3EEE#
BC:Z:ATCACG XD:Z:55 SM:I:8
The alignment record begins with the read name GA23_40:8:1:10271:11781, followed by the
flag 64, the chromosome and start coordinate chr22 17552189, the alignment quality 8, and
the match descriptor 35M * 0 0.
BAM files are suitable for viewing with an external viewer such as IGV or the UCSC
Genome Browser.
BAM index files (*.bam.bai) provide an index of the corresponding BAM file.
VCF File Format
VCF is a widely used file format developed by the genomics scientific community that
contains information about variants found at specific positions in a reference genome.
VCF files use the file naming format SampleName_S#.vcf, where # is the sample number
determined by the order that samples are listed in the sample sheet.
VCF File Header—Includes the VCF file format version and the variant caller version. The
header lists the annotations used in the remainder of the file. If MARS is listed as the
annotator, the Illumina internal annotation algorithm is in use to annotate the VCF file. The
VCF header also contains the command line call used by MiSeq Reporter to run the variant
caller. The command line call specifies all parameters used by the variant caller, including
the reference genome file and .bam file. The last line in the header is column headings for
the data lines. For more information, see VCF File Annotations on page 27.
##fileformat=VCFv4.1
##FORMAT=<ID=GQX,Number=1,Type=Integer>
##FORMAT=<ID=AD,Number=.,Type=Integer>
##FORMAT=<ID=DP,Number=1,Type=Integer>
##FORMAT=<ID=GQ,Number=1,Type=Float>
##FORMAT=<ID=GT,Number=1,Type=String>
##FORMAT=<ID=PL,Number=G,Type=Integer>
##FORMAT=<ID=VF,Number=1,Type=Float>
##INFO=<ID=TI,Number=.,Type=String>
##INFO=<ID=GI,Number=.,Type=String>
##INFO=<ID=EXON,Number=0,Type=Flag>
##INFO=<ID=FC,Number=.,Type=String>
##INFO=<ID=IndelRepeatLength,Number=1,Type=Integer>
##INFO=<ID=AC,Number=A,Type=Integer>
##INFO=<ID=AF,Number=A,Type=Float>
##INFO=<ID=AN,Number=1,Type=Integer>
##INFO=<ID=DP,Number=1,Type=Integer>
##INFO=<ID=QD,Number=1,Type=Float>
##FILTER=<ID=LowQual>
##FILTER=<ID=R8>
##annotator=MARS
##CallSomaticVariants_cmdline=" -B D:\Amplicon_DS_Soma2\121017_
M00948_0054_000000000A2676_Binf02\Data\Intensities\BaseCalls\Alignment3_Tamsen_
SomaWorker -g [D:\Genomes\Homo_sapiens
\UCSC\hg19\Sequence\WholeGenomeFASTA,] -f 0.01 -fo False -b 20 -q
100 -c 300 -s 0.5 -a 20 -F 20 -gVCF
True -i true -PhaseSNPs true -MaxPhaseSNPLength 100 -r D:
\Amplicon_DS_Soma2\121017_M00948_0054_000000000-A2676_Binf02"
##reference=file://d:\Genomes\Homo_
sapiens\UCSC\hg19\Sequence\WholeGenomeFASTA\genome.fa
##source=GATK 1.6
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 10002 - R1
VCF File Data Lines—Each data line contains information about a single variant. Data lines
are listed under the column headings included in the header.
VCF File Headings
The VCF file format is flexible and extensible, so not all VCF files contain the same fields.
The following tables describe VCF files generated by MiSeq Reporter.
Heading
Description
CHROM
The chromosome of the reference genome. Chromosomes appear in the same order
as the reference FASTA file.
POS
The single-base position of the variant in the reference chromosome.
For SNPs, this position is the reference base with the variant; for insertions or deletions,
this position is the reference base immediately before the variant.
ID
The rs number for the SNP obtained from dbSNP.txt, if applicable.
If there are multiple rs numbers at this location, the list is semi-colon delimited. If no
dbSNP entry exists at this position, a missing value marker ('.') is used.
REF
The reference genotype. For example, a deletion of a single T is represented as
reference TT and alternate T.
ALT
The alleles that differ from the reference read.
For example, an insertion of a single T is represented as reference A and alternate AT.
QUAL
A Phred-scaled quality score assigned by the variant caller.
Higher scores indicate higher confidence in the variant and lower probability of
errors. For a quality score of Q, the estimated probability of an error is 10^(-Q/10). For
example, the set of Q30 calls has a 0.1% error rate. Many variant callers assign quality
scores based on their statistical models, which are high relative to the error rate
observed.
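The relationship between a Phred-scaled score and the estimated error probability can be sketched in Python as follows. The function names are illustrative only and are not part of MiSeq Reporter.

```python
import math


def phred_to_error_probability(q):
    """Estimated probability of an error for a Phred-scaled score Q."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred-scaled score for an estimated error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


for q in (10, 20, 30):
    probability = phred_to_error_probability(q)
    print(f"Q{q}: error probability {probability:.4f} ({probability:.2%})")
```

For Q30, the conversion gives an error probability of 0.001, which is the 0.1% error rate quoted above.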
VCF File Annotations
Heading
Description
FILTER
If all filters are passed, PASS is written in the filter column.
• LowDP—Applied to sites with depth of coverage below a cutoff. Configure
cutoff using the MinimumCoverageDepth sample sheet setting.
• LowGQ—The genotyping quality (GQ) is below a cutoff. Configure cutoff using
the VariantMinimumGQCutoff sample sheet setting.
• LowQual—The variant quality (QUAL) is below a cutoff. Configure using the
VariantMinimumQualCutoff sample sheet setting.
• LowVariantFreq—The variant frequency is less than the given threshold.
Configure using the VariantFrequencyFilterCutoff sample sheet setting.
• R8—For an indel, the number of adjacent repeats (1-base or 2-base) in the
reference is greater than 8. This filter is configurable using the
IndelRepeatFilterCutoff setting in the config file or the sample sheet.
• SB—The strand bias is more than the given threshold. This filter is configurable
using the StrandBiasFilter sample sheet setting; available only for the somatic
variant caller and GATK.
For more information about sample sheet settings, see MiSeq Sample Sheet Quick
Reference Guide (part # 15028392).
INFO
Possible entries in the INFO column include:
• AC—Allele count in genotypes for each ALT allele, in the same order as listed.
• AF—Allele Frequency for each ALT allele, in the same order as listed.
• AN—The total number of alleles in called genotypes.
• CD—A flag indicating that the SNP occurs within the coding region of at least
one RefGene entry.
• DP—The depth (number of base calls aligned to a position and used in variant
calling). In regions of high coverage, GATK down-samples the available reads.
• Exon—A comma-separated list of exon regions read from RefGene.
• FC—Functional Consequence.
• GI—A comma-separated list of gene IDs read from RefGene.
• QD—Variant Confidence/Quality by Depth.
• TI—A comma-separated list of transcript IDs read from RefGene.
FORMAT
The format column lists fields separated by colons. For example, GT:GQ. The list of
fields provided depends on the variant caller used. Available fields include:
• AD—Entry of the form X,Y, where X is the number of reference calls, and Y is the
number of alternate calls.
• DP—Approximate read depth; reads with MQ=255 or with bad mates are filtered.
• GQ—Genotype quality.
• GQX—Genotype quality. GQX is the minimum of the GQ value and the QUAL
column. In general, these values are similar; taking the minimum makes GQX the
more conservative measure of genotype quality.
• GT—Genotype. 0 corresponds to the reference base, 1 corresponds to the first
entry in the ALT column, and so on. The forward slash (/) indicates that no
phasing information is available.
• NL—Noise level; an estimate of base calling noise at this position.
• PL—Normalized, Phred-scaled likelihoods for genotypes.
• SB—Strand bias at this position. Larger negative values indicate less bias; values
near zero indicate more bias.
• VF—Variant frequency; the percentage of reads supporting the alternate allele.
SAMPLE
The sample column gives the values specified in the FORMAT column.
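To illustrate how the columns described above fit together, here is a minimal Python sketch that splits one data line into named fields and pairs the FORMAT keys with the SAMPLE values. It assumes tab-delimited columns and semicolon-separated INFO entries, as in the general VCF format. The function name and the example line are hypothetical and are not taken from a real MiSeq Reporter output file.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT",
               "QUAL", "FILTER", "INFO", "FORMAT"]


def parse_data_line(line):
    """Split one VCF data line into a dictionary of named fields."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:9]))
    record["POS"] = int(record["POS"])

    # INFO entries are either KEY=VALUE pairs or bare flags such as EXON.
    info = {}
    for entry in record["INFO"].split(";"):
        if entry == ".":  # missing value marker
            continue
        key, separator, value = entry.partition("=")
        info[key] = value if separator else True
    record["INFO"] = info

    # The sample column holds one value per colon-separated FORMAT key.
    keys = record["FORMAT"].split(":")
    values = fields[9].split(":")
    record["SAMPLE"] = dict(zip(keys, values))
    return record


# Hypothetical data line for illustration only.
example = "chr1\t12345\t.\tA\tAT\t100\tPASS\tDP=500;EXON\tGT:GQ:AD:VF\t0/1:100:400,100:0.2"
record = parse_data_line(example)
print(record["CHROM"], record["POS"], record["FILTER"])
print(record["INFO"])
print(record["SAMPLE"])
```

Values are kept as strings except POS, so a caller can convert fields such as GQ or VF to numbers as needed.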
MiSeq Reporter Configurable Settings
Typically, you do not need to change configurable settings. However, if you want to
customize analysis results, you can edit settings in MiSeq Reporter.exe.config located in the
MiSeq Reporter installation folder, C:\Illumina\MiSeqReporter, by default. Always restart
the service after modifying the config file.
The editable portion of this file is contained between the <appSettings> tags, which show
key/value pairs for the parameter settings applied.
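As a rough illustration of this layout, the following Python sketch reads the key/value pairs from an <appSettings> section. The sample XML is a hypothetical fragment that assumes the usual .NET layout, with <appSettings> directly under the root element; the real file contains additional content.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment for illustration; not a copy of the shipped file.
SAMPLE_CONFIG = r"""<configuration>
  <appSettings>
    <add key="Repository" value="E:\Data\Repository" />
    <add key="CopyToRTAOutputPath" value="1" />
  </appSettings>
</configuration>"""


def read_app_settings(config_xml):
    """Return the <appSettings> key/value pairs as a dictionary."""
    root = ET.fromstring(config_xml)
    return {
        add.get("key"): add.get("value")
        for add in root.findall("./appSettings/add")
    }


for key, value in read_app_settings(SAMPLE_CONFIG).items():
    print(f"{key} = {value}")
```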
Available Configurable Settings
The following configurable settings are used in MiSeq Reporter.exe.config.
Setting Name
Values and Description
AdapterTrimmingStringency
0.9 (default)
The minimum match rate allowed in adapter trimming. The
default setting trims sequences with > 90% sequence identity
with the adapter.
ConvertMissingBclsToNoCalls
1 (true; default)
0 (false)
If set to true, any missing or invalid *.bcl files cause MiSeq
Reporter to log an error and flag the tile as having no-calls
(Ns) for the affected cycle.
If set to false, any missing or truncated *.bcl files cause MiSeq
Reporter to log an error and abort analysis.
CopyToRTAOutputPath
1 (true; default)
0 (false)
If set to true, copy all alignment data to the <OutputDirectory>
specified in the RTAConfiguration.xml file, which is located in
Data\Intensities.
CreateFastqForIndexReads
0 (false; default)
1 (true)
If set to false, FASTQ files are not generated for index reads.
If set to true, FASTQ files are generated for index reads.
EnableHTTPService
1 (true; default)
0 (false)
Determines whether MiSeq Reporter provides the web
interface.
FilterNonPFReads
1 (true; default)
0 (false)
Determines whether those clusters that fail the chastity filter
are filtered from all FASTQ files.
GATKDownsampleDepth
5000 (default)
When using GATK for variant calling, reads in regions of high
depth are (optionally) randomly down-sampled.
• Set to a higher value to retain more reads.
• Set to 0 to disable down-sampling. Disabling down-sampling
can lead to increased run time and memory use on high-coverage
runs.
IndelRepeatFilterCutoff
8 (default)
By default, indels are flagged as filtered if the reference has a 1- or 2-base motif
repeated eight or more times next to the variant.
MaximumGigabytesPerProcess
Varies
The maximum gigabytes of memory allowed for a child
process. By default, this threshold is adjusted automatically
based on the memory available on the system.
MaximumHoursPerProcess
72 (default)
The maximum number of hours to allow a child process to
run.
MaximumMegabasesAssembly
550 (default)
The maximum number of megabases to assemble. Larger
values require more RAM.
Assembly of reads from longer runs requires more memory
than assembly of reads from shorter runs. If the process
terminates due to memory requirements, consider lowering
the MaximumMegabasesAssembly value.
MinimumAlignReadLength
21 (maximum; default)
8 (min)
The minimum length of a non-indexed read to align using
BWA or ELAND (deprecated in v2.2).
NMaskShortAdapterReads
10-base (default)
The number of bases from the start of the adapter that triggers
N-masking of the entire read.
RetainTempFiles
0 (false; default)
1 (true)
If set to true, temporary files are retained. Retaining
temporary files requires large amounts of disk space. Use this
setting for troubleshooting only.
VariantFilterQualityCutoff
30 (default) for GATK and somatic variant caller
20 (default) for Starling
SNPs with variant quality scores below this threshold are
flagged as filtered in the *.vcf files.
Restarting the Service
After updating MiSeq Reporter.exe.config, restart the service to enable changes.
1
From the Control Panel, select Administrative Tools | Services.
2
Select MiSeq Reporter service, and then click the Restart Service icon.
Chapter 4 Installation and Troubleshooting
MiSeq Reporter Off-Instrument Requirements
Installing MiSeq Reporter Off-Instrument
Using MiSeq Reporter Off-Instrument
Troubleshooting MiSeq Reporter
MiSeq Reporter Off-Instrument Requirements
Installing a copy of MiSeq Reporter on an off-instrument Windows computer allows
secondary analysis of sequencing data while the MiSeq performs a subsequent sequencing
run.
For more information, see Installing MiSeq Reporter Off-Instrument on page 35.
Computing Requirements
MiSeq Reporter software requires the following computing components:
• 64-bit Windows OS (Vista, Windows 7, Windows Server 2008 64-bit, English-US)
• ≥ 8 GB RAM minimum; ≥ 16 GB RAM recommended
• ≥ 1 TB disk space
• Quad core processor (2.8 GHz or higher)
• Microsoft .NET 4
Supported Browsers
MiSeq Reporter can be viewed with the following web browsers:
• Safari 5.1.7, or later
• Chrome 20.0, or later
• Firefox 13.0.1, or later
• Internet Explorer 8, or later
Downloading and Licensing
1
Download a second copy of the MiSeq Reporter software from the Illumina website. A
MyIllumina login is required.
2
Accept the end-user licensing agreement (EULA) when prompted during installation.
No license key is required as this additional copy is free of charge.
Installing MiSeq Reporter Off-Instrument
To install MiSeq Reporter on an off-instrument Windows computer, first set up Log on as a
service permission, and then run the installation wizard. Then, configure the software to
point to the appropriate Repository and GenomePath.
Uninstall Previous Versions of MiSeq Reporter
If MiSeq Reporter v1.0.27, or earlier, is installed on the computer, uninstall it before
running the installation wizard.
NOTE
If a later version is installed, skip to Set Up User or Group Accounts on Windows 7.
1
[Optional] Save a copy of the folder where the FASTA files for the reference genomes
are stored.
2
From the Windows Start menu, select Control Panel, and then click Programs.
3
Click Programs and Features.
4
Right-click MiSeq Reporter, and then click Uninstall.
5
Click OK through any prompts.
Set Up User or Group Accounts on Windows 7
To configure user or group accounts to enable Log on as a service permission, you must have
administrator rights to the computer. If you do not have administrator rights or need assistance
setting up a user or group account, contact your local facility administrator.
1
From the Windows Start menu, select Control Panel, and then click System and
Security.
2
Click Administrative Tools, and then double-click Local Security Policy.
3
From the Security Settings tree on the left, double-click Local Policies and then click
User Rights Assignment.
4
In the details pane on the right, double-click Log on as a service.
5
In the Properties dialog box, click Add User or Group.
6
Enter the name of the user or group account for this computer. Click Check Names to
validate the account.
7
Click OK through any open dialog boxes and then close the Control Panel.
For more information, see technet.microsoft.com/en-us/library/cc739424(WS.10).aspx on the
Microsoft website.
Run the MiSeq Reporter Installation Wizard
1
Download and unzip the MiSeq Reporter installation package from the Illumina
website.
2
Double-click the setup.exe file.
3
Click Next through the prompts in the installation wizard.
4
When prompted, specify the user name and password for an account with Log on as a
service permission, as set up in the previous step.
5
Continue through any remaining prompts.
Configure MiSeq Reporter
To configure MiSeq Reporter to locate the run folder and reference genome folder, edit the
configuration file in a text editor, such as Notepad.
1
Navigate to the installation folder (C:\Illumina\MiSeq Reporter, by default) and open
the file MiSeq Reporter.exe.config in a text editor.
2
Locate the Repository tag and change the value to the default data location on the off-instrument computer.
<add key="Repository" value="E:\Data\Repository" />
Alternatively, this location can be a network location accessible from the off-instrument
computer.
3
Locate the GenomePath tag and change the value to the location of the folder
containing reference genome files in FASTA format.
<add key="GenomePath" value="E:\MyGenomes\FASTA" />
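The same edits can be scripted instead of made by hand. The following Python sketch updates existing keys in the <appSettings> section of a configuration string. The function name and sample XML are hypothetical, and the sketch assumes that <appSettings> sits directly under the root element. The standard library parser does not preserve XML comments, so keep a backup of the original file before writing any result back.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment for illustration; not a copy of the shipped file.
SAMPLE_CONFIG = r"""<configuration>
  <appSettings>
    <add key="Repository" value="D:\Old\Repository" />
    <add key="GenomePath" value="D:\Old\Genomes" />
  </appSettings>
</configuration>"""


def set_app_settings(config_xml, updates):
    """Return config XML with the values of existing keys replaced."""
    root = ET.fromstring(config_xml)
    for add in root.findall("./appSettings/add"):
        key = add.get("key")
        if key in updates:
            add.set("value", updates[key])
    return ET.tostring(root, encoding="unicode")


updated = set_app_settings(SAMPLE_CONFIG, {
    "Repository": r"E:\Data\Repository",
    "GenomePath": r"E:\MyGenomes\FASTA",
})
print(updated)
```

Keys that are not already present are left untouched by this sketch, which avoids silently adding misspelled setting names.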
Start the MiSeq Reporter Service
After completing the installation, the MiSeq Reporter service starts automatically. If the
service does not start, start it manually using the following instructions, or reboot the
computer.
1
From the Windows Start menu, right-click Computer and select Manage.
2
From the Computer Management tree on the left, double-click Services and
Applications and then click Services.
3
Right-click MiSeq Reporter and select Properties.
4
On the General tab, make sure that the Startup Type is set to Automatic, and then click
Start.
5
On the Log On tab, set the user name and password for a Services account that has
permissions to write to the server. Illumina recommends the Local System account for
most users. For assistance or site-specific network requirements, contact the local
facility administrator.
6
Click OK through any open dialog boxes and then close the Computer Management
window.
7
After starting the MiSeq Reporter service, connect to the software locally using
localhost:8042 in a web browser.
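A quick way to confirm that the web interface responds is sketched below in Python. The function names are hypothetical, and the plain http scheme is an assumption; only the host and port, localhost:8042, come from this guide.

```python
import http.client
import urllib.request


def reporter_url(host="localhost", port=8042):
    """Build the address of the web interface (http scheme is assumed)."""
    return f"http://{host}:{port}/"


def is_reachable(url, timeout=2.0):
    """Return True if the URL answers with a successful HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or malformed reply.
        return False


if __name__ == "__main__":
    url = reporter_url()
    state = "reachable" if is_reachable(url) else "not reachable"
    print(f"{url} is {state}")
```

If the check fails while the service is listed as started, confirm the Log On account and the EnableHTTPService setting before investigating the network.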
Using MiSeq Reporter Off-Instrument
To use MiSeq Reporter off-instrument, make sure that folders containing run data and
reference genomes are accessible.
1
If you are not using a network location for sequencing data and reference genomes,
copy the following folders to your local computer:
• Copy run data from the MiSeq computer in D:\MiSeqOutput\<RunFolder>.
• Copy reference genomes from the MiSeq computer in C:\Illumina\MiSeq
Reporter\Genomes.
2
Open a web browser to localhost:8042, which opens the MiSeq Reporter web interface.
3
If the location of the run data differs from the location specified in MiSeq
Reporter.exe.config, change the path using the Settings icon.
NOTE
Specifying the repository path in Settings is temporary. The next time you restart your
computer, the path defaults to the Repository location specified in MiSeq
Reporter.exe.config.
4
Select Analyses on the left side of the web interface to view the runs available in the
specified Repository location.
5
Before you requeue analysis using an off-instrument installation of MiSeq Reporter,
update the path of the GenomeFolder in the sample sheet to the new location. After
updating the GenomeFolder path, click Save and Requeue. For more information, see
Editing the Sample Sheet in MiSeq Reporter on page 8.
Troubleshooting MiSeq Reporter
MiSeq Reporter runs as a Windows service application. User accounts must be configured to
enable Log on as a service permission before installing MiSeq Reporter. For more
information, see Set Up User or Group Accounts on Windows 7 on page 35 and
msdn.microsoft.com/en-us/library/ms189964.aspx.
Service Fails to Start
If the service fails to start, check the Windows Event Log and view the details of the error
message.
1
Open the Control Panel and select Administrative Tools.
2
Select Event Viewer.
3
In the Event Viewer window, select Windows Logs | Application. The error listed in
the event log describes any syntax errors in MiSeq Reporter.exe.config. Incorrect syntax
in the MiSeq Reporter.exe.config file can cause the service to fail.
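Before restarting the service, it can help to confirm that the edited file is still well-formed XML. The Python sketch below reports the position of the first syntax error it finds. The function name is hypothetical, and the check covers XML syntax only, not whether the setting names or values are valid for MiSeq Reporter.

```python
import xml.etree.ElementTree as ET


def find_config_syntax_error(config_xml):
    """Return a description of the first XML syntax error, or None."""
    try:
        ET.fromstring(config_xml)
    except ET.ParseError as error:
        line, column = error.position
        return f"line {line}, column {column}: {error}"
    return None


# Hypothetical fragments: the second one is missing the "/" that closes <add>.
good = '<configuration><appSettings><add key="A" value="1" /></appSettings></configuration>'
bad = '<configuration><appSettings><add key="A" value="1"></appSettings></configuration>'

print(find_config_syntax_error(good))
print(find_config_syntax_error(bad))
```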
Files Failed to Copy
If files fail to copy to the intended location, check the following settings:
1
Check the path to the specified repository folder or MiSeqOutput folder:
• If you are using MiSeq Reporter off-instrument, check the repository location using
Settings on the MiSeq Reporter web interface.
• If you are using MiSeq Reporter on-instrument, check the MiSeqOutput folder
location on the MCS Run Options screen, Folder Settings tab.
Use the full UNC path, such as \\server1\Runs. Because MiSeq Reporter runs as a
Windows service, it does not recognize user-mapped drives, such as Z:\Runs.
2
Confirm that you have write-access to the output folder location. If you need assistance,
contact your facility administrator.
3
If you use a network Linux storage location, and MiSeq Reporter analysis files fail to
transfer there, see the technote Configuring MiSeq Reporter to Work with Samba Shares on a
Linux Server (part # 970-2014-027) for assistance. The technote is on the Documentation
and Literature page of support.illumina.com.
4
Make sure that copying is not disabled in the <appSettings> section of the MiSeq
Reporter.exe.config file. Make sure that the value is set to 1.
<add key="CopyToRTAOutputPath" value="1"/>
5
Check if the files failed to copy because of a timeout error.
• Open the AnalysisError.txt file, located in the root level of the MiSeqAnalysis
folder.
• If there is a timeout error, the file contains the message
Copy thread has taken too long (over 1800 seconds) -aborting.
Use the procedure Configuring File Copy Timeout to increase the file copy timeout
value.
If you continue to receive timeout errors after adjusting the parameter value, a network
problem can be the cause of file copy delays. Consult your IT department.
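Two of the checks above lend themselves to a small script. The following Python sketch looks for the timeout message in the text of AnalysisError.txt and tests whether a path is written in UNC form. The function names are hypothetical, and the pattern assumes the message wording shown above.

```python
import re

# Matches the timeout message quoted in this guide and captures the seconds.
TIMEOUT_PATTERN = re.compile(
    r"Copy thread has taken too long \(over (\d+) seconds\)"
)


def copy_timeout_seconds(error_log_text):
    """Return the timeout in seconds reported in the log text, or None."""
    match = TIMEOUT_PATTERN.search(error_log_text)
    return int(match.group(1)) if match else None


def is_unc_path(path):
    """Return True for UNC paths such as \\\\server1\\Runs."""
    return path.startswith("\\\\")


log_text = "Copy thread has taken too long (over 1800 seconds) -aborting."
print(copy_timeout_seconds(log_text))
print(is_unc_path(r"\\server1\Runs"), is_unc_path(r"Z:\Runs"))
```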
Configuring File Copy Timeout
File copy timeout length is determined by the FileCopyWaitFinishTimeInSeconds parameter
setting in the MiSeq Reporter.exe.config file.
1
Open the MiSeq Reporter.exe.config file and check that the file contains the string
<add key="FileCopyWaitFinishTimeInSeconds" value="1800"/>.
For more information on the MiSeq Reporter.exe.config file, see MiSeq Reporter
Configurable Settings on page 29.
2
If the string is not in the MiSeq Reporter.exe.config file, add it under <appSettings>.
3
Configure the FileCopyWaitFinishTimeInSeconds parameter value according to the
recommendation of your IT department.
The FileCopyWaitFinishTimeInSeconds value is in seconds. The default value is 1800,
which is equivalent to 30 minutes.
4
Restart the service to enable changes.
For more information, see Restarting the Service on page 31.
NOTE
Setting the FileCopyWaitFinishTimeInSeconds value too high can delay MiSeq Reporter
analysis.
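The procedure above can be sketched in Python as follows: convert a timeout in minutes to seconds, update the FileCopyWaitFinishTimeInSeconds entry if it exists, and add it under <appSettings> if it does not. The function name and sample XML are hypothetical, and the sketch assumes that <appSettings> sits directly under the root element.

```python
import xml.etree.ElementTree as ET

TIMEOUT_KEY = "FileCopyWaitFinishTimeInSeconds"


def set_copy_timeout(config_xml, minutes):
    """Return config XML with the file copy timeout set to the given minutes."""
    root = ET.fromstring(config_xml)
    app_settings = root.find("./appSettings")
    if app_settings is None:
        raise ValueError("no <appSettings> section found")

    seconds = str(int(minutes * 60))  # the setting is expressed in seconds
    for add in app_settings.findall("add"):
        if add.get("key") == TIMEOUT_KEY:
            add.set("value", seconds)
            break
    else:
        # The key is absent, so add it under <appSettings>.
        ET.SubElement(app_settings, "add", {"key": TIMEOUT_KEY, "value": seconds})
    return ET.tostring(root, encoding="unicode")


# Hypothetical fragment without the timeout key.
sample = '<configuration><appSettings><add key="CopyToRTAOutputPath" value="1" /></appSettings></configuration>'
print(set_copy_timeout(sample, 45))
```

With 45 minutes as input, the value written is 2700 seconds; the default of 1800 seconds corresponds to 30 minutes.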
Viewing Log Files for a Failed Run
Viewing log files can help identify specific errors for troubleshooting purposes.
1
To view the log files using the MiSeq Reporter web browser interface, select the run in
the Analyses tab.
2
Select the Logs tab to view a list of every step that occurred during analysis. Log
information is recorded in AnalysisLog.txt, which is located in the root level of the
MiSeqAnalysis folder.
3
Select the Errors tab to view a list of errors that occurred during analysis. Error
information is recorded in AnalysisError.txt, which is located in the root level of the
MiSeqAnalysis folder.
Technical Assistance
For technical assistance, contact Illumina Technical Support.
Table 5 Illumina General Contact Information
Website    www.illumina.com
Email      [email protected]

Table 6 Illumina Customer Support Telephone Numbers
Region            Contact Number
North America     1.800.809.4566
Australia         1.800.775.688
Austria           0800.296575
Belgium           0800.81102
Denmark           80882346
Finland           0800.918363
France            0800.911850
Germany           0800.180.8994
Ireland           1.800.812949
Italy             800.874909
Netherlands       0800.0223859
New Zealand       0800.451.650
Norway            800.16836
Spain             900.812168
Sweden            020790181
Switzerland       0800.563118
United Kingdom    0800.917.0041
Other countries   +44.1799.534000
Safety Data Sheets
Safety data sheets (SDSs) are available on the Illumina website at
support.illumina.com/sds.html.
Product Documentation
Product documentation in PDF is available for download from the Illumina website. Go
to support.illumina.com, select a product, then click Documentation & Literature.
Illumina
5200 Illumina Way
San Diego, California 92122 U.S.A.
+1.800.809.ILMN (4566)
+1.858.202.4566 (outside North America)
[email protected]
www.illumina.com