Local Run Manager Generate FASTQ Analysis Module Workflow Guide (1000000003344 v00)

For Research Use Only. Not for use in diagnostic procedures.

Overview

Set Parameters

Analysis Methods

View Analysis Results

Analysis Report

Analysis Output Files

Custom Analysis Settings

Technical Assistance

ILLUMINA PROPRIETARY

Document # 1000000003344 v00

January 2016

This document and its contents are proprietary to Illumina, Inc. and its affiliates ("Illumina"), and are intended solely for the contractual use of its customer in connection with the use of the product(s) described herein and for no other purpose. This document and its contents shall not be used or distributed for any other purpose and/or otherwise communicated, disclosed, or reproduced in any way whatsoever without the prior written consent of Illumina. Illumina does not convey any license under its patent, trademark, copyright, or common-law rights nor similar rights of any third parties by this document.

The instructions in this document must be strictly and explicitly followed by qualified and properly trained personnel in order to ensure the proper and safe use of the product(s) described herein. All of the contents of this document must be fully read and understood prior to using such product(s).

FAILURE TO COMPLETELY READ AND EXPLICITLY FOLLOW ALL OF THE INSTRUCTIONS CONTAINED HEREIN MAY RESULT IN DAMAGE TO THE PRODUCT(S), INJURY TO PERSONS, INCLUDING TO USERS OR OTHERS, AND DAMAGE TO OTHER PROPERTY.

ILLUMINA DOES NOT ASSUME ANY LIABILITY ARISING OUT OF THE IMPROPER USE OF THE PRODUCT(S) DESCRIBED HEREIN (INCLUDING PARTS THEREOF OR SOFTWARE).

© 2016 Illumina, Inc. All rights reserved.

Illumina, 24sure, BaseSpace, BeadArray, BlueFish, BlueFuse, BlueGnome, cBot, CSPro, CytoChip, DesignStudio, Epicentre, ForenSeq, Genetic Energy, GenomeStudio, GoldenGate, HiScan, HiSeq, HiSeq X, Infinium, iScan, iSelect, MiSeq, MiSeqDx, MiSeq FGx, NeoPrep, NextBio, Nextera, NextSeq, Powered by Illumina, SureMDA, TruGenome, TruSeq, TruSight, Understand Your Genome, UYG, VeraCode, verifi, VeriSeq, the pumpkin orange color, and the streaming bases design are trademarks of Illumina, Inc. and/or its affiliate(s) in the U.S. and/or other countries. All other names, logos, and other trademarks are the property of their respective owners.

Overview

The Local Run Manager Generate FASTQ analysis module first demultiplexes indexed reads, if present, generates intermediate analysis files in the FASTQ file format, and then exits the workflow. No alignment or further analysis is performed. FASTQ files are required input for analysis with third-party analysis tools.

Compatible Library Types

The Generate FASTQ analysis module is compatible with specific library types represented by library kit categories on the Create Run screen. For a current list of compatible library kits, see the Local Run Manager support page on the Illumina website.

Input Requirements

The Generate FASTQ analysis module requires the base call files (*.bcl) and the run summary files generated during the sequencing run. Because the workflow ends after FASTQ file generation, no other input files are required.

About This Guide

This guide provides instructions for setting up run parameters for sequencing and analysis parameters for the Generate FASTQ analysis module. For information about the Local Run Manager dashboard and system settings, see the Local Run Manager Software Guide (document # 1000000002702).

Set Parameters

1 Click Create Run, and select Generate FASTQ.

2 Enter a run name that identifies the run from sequencing through analysis.

Use alphanumeric characters, spaces, underscores, or dashes.

3 [Optional] Enter a run description to help identify the run.

Use alphanumeric characters.

Specify Run Settings

1 From the Library Kit drop-down list, select an appropriate library kit category for the run.

Library Kit Category | Read Type | Number of Index Reads
Nextera | Paired End or Single Read | None, 1, or 2 reads of 8 cycles each
Nextera XT | Paired End or Single Read | None, 1, or 2 reads of 8 cycles each
Nextera XT V2 | Paired End or Single Read | None, 1, or 2 reads of 8 cycles each
Nextera Rapid Capture | Paired End or Single Read | None, 1, or 2 reads of 8 cycles each
Nextera Mate Pair | Paired End only | None or 1 read of 6 cycles
TruSeq HT | Paired End or Single Read | None, 1, or 2 reads of 8 cycles each
TruSeq LT | Paired End or Single Read | None or 1 read of 6 cycles
TruSeq Amplicon | Paired End only | 2 reads of 8 cycles each
TruSeq Small RNA | Single Read only | None or 1 read of 6 cycles
TruSight Amplicon Panels | Paired End only | 2 reads of 8 cycles each
TruSight Enrichment Panels | Paired End or Single Read | None, 1, or 2 reads of 8 cycles each

2 Specify the number of index reads, if a change is possible.

• 0 for a run with no indexing

• 1 for a single-indexed run

• 2 for a dual-indexed run

3 Specify a read type: Single Read or Paired End, if a change is possible.

4 Enter the number of cycles for the run.

5 [Optional] Specify any custom primers to be used for the run.
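The read type and index read constraints for each library kit category can be treated as a lookup. The following Python sketch is illustrative only and is not part of Local Run Manager: the names KIT_RULES and check_run_settings are hypothetical, and the entries are transcribed from the library kit table in this section, so confirm them against the current list of compatible library kits before relying on them.

```python
# Hypothetical lookup transcribed from the library kit table in this guide.
# Each entry: (allowed read types, allowed index read counts, cycles per index read).
BOTH = {"Paired End", "Single Read"}
KIT_RULES = {
    "Nextera": (BOTH, {0, 1, 2}, 8),
    "Nextera XT": (BOTH, {0, 1, 2}, 8),
    "Nextera XT V2": (BOTH, {0, 1, 2}, 8),
    "Nextera Rapid Capture": (BOTH, {0, 1, 2}, 8),
    "Nextera Mate Pair": ({"Paired End"}, {0, 1}, 6),
    "TruSeq HT": (BOTH, {0, 1, 2}, 8),
    "TruSeq LT": (BOTH, {0, 1}, 6),
    "TruSeq Amplicon": ({"Paired End"}, {2}, 8),
    "TruSeq Small RNA": ({"Single Read"}, {0, 1}, 6),
    "TruSight Amplicon Panels": ({"Paired End"}, {2}, 8),
    "TruSight Enrichment Panels": (BOTH, {0, 1, 2}, 8),
}


def check_run_settings(kit, read_type, index_reads):
    """Return a list of problems; an empty list means the settings fit the table."""
    if kit not in KIT_RULES:
        return [f"unknown library kit category: {kit}"]
    read_types, index_counts, _cycles = KIT_RULES[kit]
    problems = []
    if read_type not in read_types:
        problems.append(f"{kit} does not allow read type {read_type}")
    if index_reads not in index_counts:
        problems.append(f"{kit} does not allow {index_reads} index read(s)")
    return problems
```

For example, check_run_settings("TruSeq Amplicon", "Paired End", 2) returns an empty list, while a Paired End, dual-indexed TruSeq Small RNA run is reported with two problems.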

Specify Samples for the Run

Specify samples for the run using the following options:

• Enter samples manually: Use the blank table on the Create Run screen.

• Import samples: Navigate to an external file in a comma-separated values (*.csv) format. A template is available for download on the Create Run screen.

After you have populated the samples table, you can export the sample information to an external file, and use the file as a reference when preparing libraries or import the file for another run.

Enter Samples Manually

1 Adjust the samples table to an appropriate number of rows.

• Click the + icon to add a row.

• To add multiple rows, use the up/down arrows to set the number of rows, and then click the + icon.

• Click the x icon to delete a row.

• Right-click a row in the table and use the commands in the drop-down menu.

2 Enter a unique sample ID in the Sample ID field.

Use alphanumeric characters, dashes, or underscores.

3 [Optional] Enter a sample description in the Sample Description field.

Use alphanumeric characters, dashes, underscores, or spaces.

4 Expand the Index 1 (i7) drop-down list and select an Index 1 adapter.

5 Expand the Index 2 (i5) drop-down list and select an Index 2 adapter.

6 [Optional] Click the Export icon to export sample information in *.csv format.

7 When finished, click Save Run.
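The character rules for sample IDs and descriptions in the steps above can be checked before entering or importing samples. The following Python sketch is an illustration under stated assumptions, not the validation that Local Run Manager performs: the function name validate_samples is hypothetical, and "alphanumeric" is assumed to mean ASCII letters and digits.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits.
SAMPLE_ID_RE = re.compile(r"[A-Za-z0-9_-]+")      # dashes and underscores allowed
DESCRIPTION_RE = re.compile(r"[A-Za-z0-9 _-]*")   # spaces also allowed, may be empty


def validate_samples(samples):
    """Check (sample_id, description) pairs and return a list of error strings."""
    errors = []
    seen = set()
    for row, (sample_id, description) in enumerate(samples, start=1):
        if not SAMPLE_ID_RE.fullmatch(sample_id):
            errors.append(f"row {row}: invalid sample ID {sample_id!r}")
        elif sample_id in seen:
            errors.append(f"row {row}: duplicate sample ID {sample_id!r}")
        seen.add(sample_id)
        if not DESCRIPTION_RE.fullmatch(description):
            errors.append(f"row {row}: invalid description {description!r}")
    return errors
```

An empty list means that every sample ID is unique and every field uses only the permitted characters.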

Import Samples

1 Click Template. The template file contains the correct column headings for import.

2 Enter the sample information in each column for the samples in the run, and then save the file.

3 Click Import Samples and browse to the location of the sample information file.

4 When finished, click Save Run.

Analysis Methods

The Generate FASTQ analysis module performs the following analysis steps and then writes analysis output files to the analysis folder.

• Demultiplexes indexed reads

• Generates FASTQ files

Demultiplexing

Demultiplexing compares each Index Read sequence to the index sequences specified for the run. No quality values are considered in this step.

Index reads are identified using the following steps:

• Samples are numbered starting from 1 based on the order they are listed for the run.

• Sample number 0 is reserved for clusters that were not assigned to a sample.

• Clusters are assigned to a sample when the index sequence matches exactly or when there is up to a single mismatch per Index Read.
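The assignment rules above can be sketched in a few lines of Python. This is a minimal illustration, not the software's implementation: the function names are hypothetical, and returning sample number 0 when more than one sample matches is an assumption, because this guide does not describe how ambiguous matches are handled.

```python
def mismatches(a, b):
    """Count positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_reads, sample_indexes, max_mismatches=1):
    """Return the 1-based sample number for a cluster, or 0 if unassigned.

    index_reads: tuple of observed Index Read sequences for one cluster.
    sample_indexes: list of tuples of expected index sequences, in run order.
    """
    matches = []
    for number, expected in enumerate(sample_indexes, start=1):
        # Every Index Read must match with at most max_mismatches mismatches.
        if all(
            len(obs) == len(exp) and mismatches(obs, exp) <= max_mismatches
            for obs, exp in zip(index_reads, expected)
        ):
            matches.append(number)
    # Assumption: an ambiguous cluster (several matching samples) stays unassigned.
    return matches[0] if len(matches) == 1 else 0
```

Quality values play no part in the comparison, which mirrors the statement that no quality values are considered in this step.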

FASTQ File Generation

After demultiplexing, the software generates intermediate analysis files in the FASTQ format, which is a text format used to represent sequences. FASTQ files contain reads for each sample and the associated quality scores. Any controls used for the run and clusters that did not pass filter are excluded.

Each FASTQ file contains reads for only 1 sample, and the name of that sample is included in the FASTQ file name. FASTQ files are the primary input for alignment.

View Analysis Results

1 From the Local Run Manager dashboard, click the run name.

2 From the Run Overview tab, review the sequencing run metrics.

3 [Optional] Click the Copy to Clipboard icon for access to the output run folder.

4 Click the Sequencing Information tab to review run parameters and consumables information.

5 Click the Samples and Results tab to view the analysis report.

• If analysis was repeated, expand the Select Analysis drop-down list and select the appropriate analysis.

• From the left navigation bar, select a sample name to view the report for another sample.

6 [Optional] Click the Copy to Clipboard icon for access to the Analysis folder.

Analysis Report

Analysis results are provided on the Samples and Results tab. Results include the 100 most popular index sequences and the counts for each sequence. This list is useful in troubleshooting indexing issues.

Indexing

Table 1 Indexing Table

Column Heading | Description
Index Number | An assigned ID based on the order that samples are listed in the sample table.
Sample Name | The sample name provided when the run was created.
Index 1 (i7) | The Index 1 adapter used with the sample.
Index 2 (i5) | The Index 2 adapter used with the sample.
% Reads Identified (PF) | The percentage of reads identified from the reads that passed filter.

Most Popular Index Sequences

For runs with an Index 1 Read, the most popular Index 1 sequences are listed. For runs with an Index 2 Read, the most popular Index 2 sequences are also listed.

Table 2 Most Popular Index Sequences Table

Column Heading | Description
Sequence | Shows the sequence of the 100 most popular index sequences.
Reverse Complement | Shows the reverse complement of the original sequences.
Hit Counts | Lists the number of times the sequence was counted.
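The three columns of the Most Popular Index Sequences table can be reproduced from a list of observed index sequences. The following Python sketch is a hedged illustration of the idea, not the report's actual code; the function names are hypothetical, and the input is assumed to be a plain list of index read strings.

```python
from collections import Counter

# Complement map for the bases that appear in index reads (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def most_popular(index_sequences, limit=100):
    """Return (sequence, reverse complement, hit count) rows, most frequent first."""
    counts = Counter(index_sequences)
    return [
        (seq, reverse_complement(seq), hits)
        for seq, hits in counts.most_common(limit)
    ]
```

Comparing the Reverse Complement column with the index sequences expected for the run is one way such a table can help when troubleshooting indexing issues.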

Analysis Output Files

The following analysis output files are generated for the Generate FASTQ analysis module.

File Name | Description
Demultiplexing (*.demux) | Intermediate files containing demultiplexing results.
FASTQ (*.fastq.gz) | Intermediate files containing quality scored base calls. FASTQ files are the primary input for the alignment step.

Demultiplexing File Format

The process of demultiplexing reads the index sequence attached to each cluster to determine from which sample the cluster originated. The mapping between clusters and sample numbers is written to 1 demultiplexing (*.demux) file for each tile of the flow cell.

The demultiplexing file naming format is s_1_X.demux, where X is the tile number.

Demultiplexing files start with a header:

• Version (4-byte integer), currently 1

• Cluster count (4-byte integer)

The remainder of the file consists of sample numbers for each cluster from the tile.
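The layout described above (a two-field header followed by one sample number per cluster) can be read with Python's struct module. This sketch rests on assumptions that this guide does not state: values are taken to be little-endian, and the width of each sample number is inferred by dividing the remaining bytes by the cluster count rather than being known in advance. The function name parse_demux is hypothetical.

```python
import struct


def parse_demux(data):
    """Parse the bytes of a *.demux file into (version, sample_numbers).

    Assumptions (not stated in the guide): little-endian byte order, and a
    fixed-width unsigned sample number per cluster whose width is inferred
    from the size of the file body.
    """
    version, cluster_count = struct.unpack_from("<ii", data, 0)
    body = data[8:]
    if cluster_count == 0:
        return version, []
    width, remainder = divmod(len(body), cluster_count)
    if remainder or width not in (1, 2, 4):
        raise ValueError("body size does not fit the cluster count")
    code = {1: "B", 2: "H", 4: "I"}[width]
    return version, list(struct.unpack(f"<{cluster_count}{code}", body))
```

A returned sample number of 0 marks a cluster that was not assigned to any sample.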

When the demultiplexing step is complete, the software generates a demultiplexing summary file named DemultiplexSummaryF1L1.txt.

• In the file name, F1 represents the flow cell number.

• In the file name, L1 represents the lane number.

• The file reports demultiplexing results in a table with 1 row per tile and 1 column per sample, including sample 0.

• The file also lists the most commonly occurring sequences in the index reads.

FASTQ File Format

FASTQ is a text-based file format that contains base calls and quality values per read.

Each record contains 4 lines:

• The identifier

• The sequence

• A plus sign (+)

• The quality scores in an ASCII encoded format

The identifier is formatted as:

@Instrument:RunID:FlowCellID:Lane:Tile:X:Y ReadNum:FilterFlag:0:SampleNumber

Example:

@SIM:1:FCX:1:15:6329:1045 1:N:0:2
TCGCACTCAACGCCCTGCATATGACAAGACAGAATC
+
<>;##=><9=AAAAAAAAAA9#:<#<;<<<????#=
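The 4-line record structure and the identifier format can be parsed as follows. This Python sketch is a minimal illustration for third-party scripting, not an Illumina tool; the names ReadId, parse_identifier, and parse_records are hypothetical, and the field names simply follow the identifier format shown above.

```python
from typing import NamedTuple


class ReadId(NamedTuple):
    instrument: str
    run_id: str
    flow_cell_id: str
    lane: int
    tile: int
    x: int
    y: int
    read_num: int
    filter_flag: str
    sample_number: int


def parse_identifier(line):
    """Split an identifier line into the fields named in the format above."""
    if not line.startswith("@"):
        raise ValueError("identifier must start with '@'")
    location, description = line[1:].split(" ", 1)
    instrument, run_id, flow_cell_id, lane, tile, x, y = location.split(":")
    read_num, filter_flag, _zero, sample_number = description.split(":")
    return ReadId(instrument, run_id, flow_cell_id, int(lane), int(tile),
                  int(x), int(y), int(read_num), filter_flag, int(sample_number))


def parse_records(lines):
    """Yield (ReadId, sequence, quality) for each 4-line record."""
    stripped = (line.rstrip("\r\n") for line in lines)
    # Taking 4 items at a time from one iterator groups lines into records.
    for header, sequence, plus, quality in zip(*[stripped] * 4):
        if not plus.startswith("+"):
            raise ValueError("third line of a record must start with '+'")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield parse_identifier(header), sequence, quality
```

For a compressed file, the lines can come from Python's gzip.open(path, "rt"), which returns a text file object that parse_records can iterate over directly.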

FASTQ File Names

FASTQ files are named with the sample name and the sample number. The sample number is a numeric assignment based on the order that the sample is listed for the run.

For example:

Data\Intensities\BaseCalls\samplename_S1_L001_R1_001.fastq.gz

• samplename: The sample name listed for the sample. If a sample name is not provided, the file name includes the sample ID.

• S1: The sample number based on the order that samples are listed for the run, starting with 1. In this example, S1 indicates that this sample is the first sample listed for the run.

NOTE

Reads that cannot be assigned to any sample are written to a FASTQ file for sample number 0, and excluded from downstream analysis.

• L001: The lane number.

• R1: The read. In this example, R1 means Read 1. For a paired-end run, a file from Read 2 includes R2 in the file name.

• 001: The last segment is always 001.

FASTQ files are compressed in the GNU zip format, as indicated by *.gz in the file name. FASTQ files can be uncompressed using tools such as gzip (command-line) or 7-zip (GUI).
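The naming convention above lends itself to a regular expression. The following Python sketch is illustrative only: the pattern and the function name parse_fastq_name are hypothetical, and the 3-digit lane field is an assumption based on the L001 example.

```python
import re

# Pattern derived from the example name samplename_S1_L001_R1_001.fastq.gz.
# Assumption: the lane field is always 3 digits, as in L001.
FASTQ_NAME_RE = re.compile(
    r"^(?P<sample_name>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d{3})"
    r"_R(?P<read>\d+)_001\.fastq\.gz$"
)


def parse_fastq_name(file_name):
    """Return the sample name, sample number, lane, and read from a file name."""
    match = FASTQ_NAME_RE.match(file_name)
    if match is None:
        raise ValueError(f"not a recognized FASTQ file name: {file_name!r}")
    return {
        "sample_name": match["sample_name"],
        "sample_number": int(match["sample_number"]),
        "lane": int(match["lane"]),
        "read": int(match["read"]),
    }
```

Because the sample name group is greedy, a sample name that itself contains underscores is still captured whole, and a sample number of 0 identifies the file of unassigned reads.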

Supplementary Output Files

The following output files provide supplementary information, or summarize run results and analysis errors. Although these files are not required for assessing analysis results, they can be used for troubleshooting purposes. All files are located in the Alignment folder unless otherwise specified.

File Name | Description
AdapterTrimming.txt | Lists the number of trimmed bases and percentage of bases for each tile. This file is present only if adapter trimming was specified for the run.
AnalysisLog.txt | Processing log that describes every step that occurred during analysis of the current run folder. This file does not contain error messages. Located in the root level of the run folder.
AnalysisError.txt | Processing log that lists any errors that occurred during analysis. This file is present only if errors occurred. Located in the root level of the run folder.
CompletedJobInfo.xml | Written after analysis is complete, contains information about the run, such as date, flow cell ID, software version, and other parameters. Located in the root level of the run folder.
DemultiplexSummaryF1L1.txt | Reports demultiplexing results in a table with 1 row per tile and 1 column per sample.
GenerateFASTQRunStatistics.xml | Contains summary statistics specific to the run. Located in the root level of the run folder.

Analysis Folder

The analysis folder holds the files generated by the Local Run Manager software.

The relationship between the output folder and analysis folder is summarized as follows:

• During sequencing, Real-Time Analysis (RTA) populates the output folder with files generated during image analysis, base calling, and quality scoring.

• RTA copies files to the analysis folder in real time. After RTA assigns a quality score to each base for each cycle, the software writes the file RTAComplete.xml to both folders.

• When the file RTAComplete.xml is present, analysis begins.

• As analysis continues, Local Run Manager writes output files to the analysis folder, and then copies the files back to the output folder.
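A script that works with these folders can use the same marker file to decide whether a run has finished primary analysis. The following Python sketch only checks for the presence of RTAComplete.xml and makes no claim about how Local Run Manager itself monitors the folder; the function name ready_for_analysis is hypothetical.

```python
from pathlib import Path


def ready_for_analysis(run_folder):
    """Return True when RTAComplete.xml exists in the given folder."""
    return (Path(run_folder) / "RTAComplete.xml").is_file()
```

A caller might poll this function at intervals and start downstream processing only after it returns True.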

Folder Structure

Data
    Intensities
        BaseCalls
            Sample1_S1_L001_R1_001.fastq.gz
            Sample2_S2_L001_R1_001.fastq.gz
            Undetermined_S0_L001_R1_001.fastq.gz
        L001: Contains *.locs files, 1 for each tile.
    RTA Logs: Contains log files from RTA software analysis.
InterOp: Contains binary files used by Sequencing Analysis Viewer (SAV).
Logs: Contains log files describing steps performed during sequencing.
Queued: A working folder for software; also called the copy folder.
AnalysisError.txt
AnalysisLog.txt
CompletedJobInfo.xml
QueuedForAnalysis.txt
[Workflow]RunStatistics
RTAComplete.xml
RunInfo.xml
runParameters.xml

Custom Analysis Settings

Custom analysis settings are intended for technically advanced users. If settings are applied incorrectly, serious problems can occur.

Add a Custom Analysis Setting

1 From the Module-Specific Settings section of the Create Run screen, click Show advanced module settings.

2 Click Add custom setting.

3 In the custom setting field, enter the setting name as listed in the Available Analysis Settings section.

4 In the setting value field, enter the setting value.

5 To remove a setting, click the x icon.

Available Analysis Settings

• Adapter Trimming: By default, adapter trimming is enabled in the Generate FASTQ analysis module. To specify a different adapter, use the Adapter setting. The same adapter sequence is trimmed for Read 1 and Read 2.

    • To specify 2 adapter sequences, separate the sequences with a plus (+) sign.

    • To specify a different adapter sequence for Read 2, use the AdapterRead2 setting.

Setting Name | Setting Value
Adapter | Enter the sequence of the adapter to be trimmed.
AdapterRead2 | Enter the sequence of the adapter to be trimmed.
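Because an incorrectly entered setting can cause problems, it can help to check a setting value before entering it. The following Python sketch splits a value on the plus sign as described above. It is an illustration under stated assumptions, not the module's own parser: the function name parse_adapter_setting is hypothetical, the sequences in the usage note are placeholders rather than real adapters, and restricting sequences to the letters A, C, G, T, and N is an assumption.

```python
def parse_adapter_setting(value):
    """Split an Adapter or AdapterRead2 setting value into adapter sequences."""
    sequences = [part.strip().upper() for part in value.split("+")]
    for seq in sequences:
        # Assumption: adapter sequences contain only A, C, G, T, or N.
        if not seq or set(seq) - set("ACGTN"):
            raise ValueError(f"not a nucleotide sequence: {seq!r}")
    return sequences
```

For example, the placeholder value ACGTACGT+TTGGCCAA yields two sequences, while a value with a trailing plus sign is rejected.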

Technical Assistance

For technical assistance, contact Illumina Technical Support.

Table 3 Illumina General Contact Information

Website | www.illumina.com
Email | [email protected]

Table 4 Illumina Customer Support Telephone Numbers

Region | Contact Number
North America | 1.800.809.4566
Australia | 1.800.775.688
Austria | 0800.296575
Belgium | 0800.81102
China | 400.635.9898
Denmark | 80882346
Finland | 0800.918363
France | 0800.911850
Germany | 0800.180.8994
Hong Kong | 800960230
Ireland | 1.800.812949
Italy | 800.874909
Japan | 0800.111.5011
Netherlands | 0800.0223859
New Zealand | 0800.451.650
Norway | 800.16836
Singapore | 1.800.579.2745
Spain | 900.812168
Sweden | 020790181
Switzerland | 0800.563118
Taiwan | 00806651752
United Kingdom | 0800.917.0041
Other countries | +44.1799.534000

Safety data sheets (SDSs): Available on the Illumina website at support.illumina.com/sds.html.

Product documentation: Available for download in PDF from the Illumina website. Go to support.illumina.com, select a product, then select Documentation & Literature.

Illumina

5200 Illumina Way

San Diego, California 92122 U.S.A.

+1.800.809.ILMN (4566)

+1.858.202.4566 (outside North America) [email protected]umina.com

www.illumina.com
