Quartus II Handbook Version 10.1 Volume 2: Design Implementation

Section III. Area, Timing, Power, and
Compilation Time Optimization
This section introduces features in the Quartus® II software that you can use to
optimize area, timing, power, and compilation time when you design for
programmable logic devices (PLDs).
This section includes the following chapters:
■ Chapter 11, Design Optimization Overview
  This chapter summarizes features in the Quartus II software that you can use to
  achieve the highest design performance when you design for PLDs, especially
  high density FPGAs.
■ Chapter 12, Reducing Compilation Time
  This chapter describes techniques for reducing the amount of time it takes to
  compile and recompile your design, accelerating your design process.
■ Chapter 13, Area and Timing Optimization
  This chapter describes a broad spectrum of Quartus II software features and
  design techniques to reduce resource usage and improve timing performance
  when designing for Altera® devices. This chapter also explains how and when to
  use some of the features described in other chapters of the Quartus II Handbook.
■ Chapter 14, Power Optimization
  This chapter describes the power-driven compilation feature and flow in detail, as
  well as low power design techniques that can further reduce power consumption
  in your design.
■ Chapter 15, Analyzing and Optimizing the Design Floorplan with the Chip Planner
  You can use the Chip Planner to perform design analysis and create a design
  floorplan. This chapter discusses how to analyze and optimize the design
  floorplan with the Chip Planner.
■ Chapter 16, Netlist Optimizations and Physical Synthesis
  This chapter explains how the physical synthesis optimizations in the Quartus II
  software can improve your quality of results. This chapter also provides
  information about preserving and writing out a new netlist, and provides
  guidelines for applying the various options.
May 2011
Altera Corporation
Quartus II Handbook Version 11.0 Volume 2: Design Implementation and Optimization
11. Design Optimization Overview
December 2010
QII52021-10.0.2
This chapter introduces features in Altera’s Quartus® II software that you can use to
achieve the highest design performance when you design for programmable logic
devices (PLDs), especially high density FPGAs.
Introduction
Physical implementation can be an intimidating and challenging phase of the design
process. The Quartus II software provides a comprehensive environment for FPGA
designs, delivering unmatched performance, efficiency, and ease-of-use.
In a typical design flow, you must synthesize your design with Quartus II integrated
synthesis or a third-party tool, place and route your design with the Fitter, and use the
TimeQuest timing analyzer to ensure your design meets the timing requirements.
With the PowerPlay Power Analyzer, you can ensure that the design’s power consumption is
within limits.
Physical Implementation
Most optimization issues involve preserving previous results, reducing area, reducing
critical path delay, reducing power consumption, and reducing runtime. The
Quartus II software includes advisors to address each of these issues and helps you
optimize your design. Run these advisors during physical implementation for advice
about your specific design.
You can reduce the time spent on design iterations by following the recommended
design practices for designing with Altera® devices. Design planning is critical for
successful design timing implementation and closure.
For more information, refer to the Design Planning with the Quartus II Software chapter
in volume 1 of the Quartus II Handbook.
Trade-Offs and Limitations
Many optimization goals can conflict with one another, so you might need to make
trade-offs between different goals. For example, one major trade-off during physical
implementation is between resource usage and critical path timing, because certain
techniques (such as logic duplication) can improve timing performance at the cost of
increased area. Similarly, a change in power requirements can result in area and
timing trade-offs, such as if you reduce the number of high-speed tiles available, or if
you attempt to shorten high-power nets at the expense of critical path nets.
In addition, system cost and time-to-market considerations can affect the choice of
device. For example, a device with a higher speed grade or more clock networks can
facilitate timing closure at the expense of higher power consumption and system cost.
© 2010 Altera Corporation. All rights reserved. ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS and STRATIX are Reg. U.S. Pat. & Tm. Off.
and/or trademarks of Altera Corporation in the U.S. and other countries. All other trademarks and service marks are the property of their respective holders as described at
www.altera.com/common/legal.html. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera’s standard warranty, but
reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any
information, product, or service described herein except as expressly agreed to in writing by Altera. Altera customers are advised to obtain the latest version of device
specifications before relying on any published information and before placing orders for products or services.
Finally, not all designs can be realized in a hardware circuit with limited resources and
given constraints. If you encounter resource limitations, timing constraints, or power
constraints that cannot be resolved by the Fitter, consider rewriting parts of the HDL
code.
For more information, refer to the Area and Timing Optimization chapter in volume 2 of
the Quartus II Handbook.
Preserving Results and Enabling Teamwork
For some Quartus II Fitter algorithms, small changes to the design can have a large
impact on the final result. For example, a critical path delay can change by 10% or
more because of seemingly insignificant changes. If you are close to meeting your
timing objectives, you can use the Fitter algorithm to your advantage by changing the
fitter seed, which changes the pseudo-random result of the Fitter.
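A seed sweep can be scripted. The following is a minimal sketch, assuming a project named my_design (hypothetical), an arbitrary list of seed values, and timing reports written to the project directory as my_design.sta.rpt. It uses the SEED assignment and the flow package; run it with quartus_sh -t and compare the saved reports afterward.

```tcl
# Sketch only: recompile the same project under several Fitter seeds.
# "my_design", the seed list, and the report path are assumptions.
load_package flow

project_open my_design
foreach seed {1 2 3 4 5} {
    # SEED selects the Fitter's initial placement seed.
    set_global_assignment -name SEED $seed
    execute_flow -compile
    # Keep this seed's timing report before the next run overwrites it.
    file copy -force my_design.sta.rpt my_design.seed$seed.sta.rpt
}
project_close
```

Each full compilation takes as long as a normal run, so a sweep like this is worthwhile mainly when the design is already close to meeting timing.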
Conversely, if you cannot meet timing on a portion of your design, you can partition
that portion and prevent it from recompiling if an unrelated part of the design is
changed. This feature, known as incremental compilation, can reduce the Fitter
runtimes by up to 70% if the design is partitioned such that only small portions
require recompilation at any one time.
When you use incremental compilation, you can apply design optimization options to
individual design partitions and preserve performance in other partitions by leaving
them untouched. Many optimization techniques often result in longer compilation
times, but by applying them only on specific partitions, you can reduce this impact
and complete iterations more quickly.
In addition, by physically floorplanning your partitions with LogicLock regions, you
can enable team-based flows and allow multiple people to work on different portions
of the design.
For more information, refer to Quartus II Incremental Compilation for Hierarchical and
Team-Based Designs in volume 1 of the Quartus II Handbook and About Incremental
Compilation in Quartus II Help.
Reducing Area
By default, the Quartus II Fitter might physically spread a design over the entire device
to meet the set timing constraints. If you prefer to optimize your design to use the
smallest area, you can change this behavior. If you require reduced area, you can
enable certain physical synthesis options to modify your netlist to create a more
area-efficient implementation, but at the cost of increased runtime and decreased
performance.
For more information, refer to the Area and Timing Optimization and Netlist
Optimizations and Physical Synthesis chapters in volume 2 and the Recommended HDL
Coding Styles chapter in volume 1 of the Quartus II Handbook.
Reducing Critical Path Delay
To meet complex timing requirements involving multiple clocks, routing resources,
and area constraints, the Quartus II software offers a close interaction between
synthesis, timing analysis, floorplan editing, and place-and-route processes.
By default, the Quartus II Fitter tries to meet the specified timing requirements and
stops trying when the requirements are met. Therefore, using realistic constraints is
important to successfully close timing. If you under-constrain your design, you may
get sub-optimal results. By contrast, if you over-constrain your design, the Fitter
might over-optimize non-critical paths at the expense of true critical paths. In
addition, you might incur an increased area penalty. Compilation time may also
increase because of excessively tight constraints.
If your resource usage is very high, the Quartus II Fitter might have trouble finding a
legal placement. In such circumstances, the Fitter automatically modifies some of its
settings to try to trade off performance for area.
The Quartus II Fitter offers a number of advanced options that can help you improve
the performance of your design when you properly set constraints. Use the Timing
Optimization Advisor to determine which options are best suited for your design.
If you use incremental compilation, you can help resolve inter-partition timing
requirements by locking down the results one partition at a time or by guiding the
placement of the partitions with LogicLock regions. You might be able to improve the
timing on such paths by placing the partitions optimally to reduce the length of
critical paths. Once your inter-partition timing requirements are met, use incremental
compilation to preserve the results and work on partitions that have not met timing
requirements.
In high-density FPGAs, routing accounts for a major part of critical path timing.
Because of this, duplicating or retiming logic can allow the Fitter to reduce delay on
critical paths. The Quartus II software offers push-button netlist optimizations and
physical synthesis options that can improve design performance at the expense of
considerable increases of compilation time and area. Turn on only those options that
help you keep reasonable compilation times and resource usage. Alternatively, you can
modify your HDL to manually duplicate or retime logic.
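As an illustration of turning on only selected options, the following sketch enables a few Fitter physical synthesis settings from a Tcl script. The assignment names are given as they are commonly written in a Quartus II Settings File; treat them as assumptions and confirm them against Quartus II Help for your software version and device family.

```tcl
# Sketch only: enable selected physical synthesis options for performance.
# Each option can add compilation time and area; enable only what helps.
set_global_assignment -name PHYSICAL_SYNTHESIS_COMBO_LOGIC ON
set_global_assignment -name PHYSICAL_SYNTHESIS_REGISTER_DUPLICATION ON
set_global_assignment -name PHYSICAL_SYNTHESIS_REGISTER_RETIMING ON
# Effort level trades compilation time against optimization (assumed value).
set_global_assignment -name PHYSICAL_SYNTHESIS_EFFORT NORMAL
```

Turning the options on one at a time and comparing timing and runtime after each compilation shows which ones are worth their cost for a given design.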
Reducing Power Consumption
The Quartus II software has features that help reduce design power consumption. The
PowerPlay power optimization options control the power-driven compilation settings
for Synthesis and the Fitter.
For more information, refer to the Power Optimization chapter in volume 2 of the
Quartus II Handbook.
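The PowerPlay power optimization options can also be set from a Tcl script. The following is a minimal sketch; the assignment names and values are written as they usually appear in a Quartus II Settings File and should be verified in Quartus II Help before use.

```tcl
# Sketch only: power-driven compilation settings for Synthesis and the Fitter.
# The chosen effort levels are illustrative, not recommendations.
set_global_assignment -name OPTIMIZE_POWER_DURING_SYNTHESIS "NORMAL COMPILATION"
set_global_assignment -name OPTIMIZE_POWER_DURING_FITTING "EXTRA EFFORT"
```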
Reducing Runtime
Many Fitter settings influence compilation time. Most of the default settings in the
Quartus II software are set for reduced compilation time. You can modify these
settings based on your project requirements.
The Quartus II software supports parallel compilation on computers with multiple
processors. This can reduce compilation times by up to 15% while producing the same
result as serial compilation.
You can also reduce compilation time for design iterations by using incremental
compilation. Use incremental compilation when you want to change parts of your
design while keeping most of the remaining logic unchanged.
Using Quartus II Tools
The following sections describe several Quartus II tools that you can use to help
optimize your design.
Design Analysis
The Quartus II software provides tools that give you a visual representation of your
design. You can use the RTL Viewer to see a schematic representation of your design
before synthesis and place-and-route. The Technology Map Viewer provides a
schematic representation of the design implementation in the selected device
architecture after synthesis and place-and-route. It can also include timing
information.
With incremental compilation, the Design Partition Planner and the Chip Planner
allow you to partition and lay out your design at a higher level. In addition, you can
perform many different tasks with the Chip Planner, including making floorplan
assignments, implementing engineering change orders (ECOs), and performing
power analysis. You can also analyze your design and achieve faster timing closure
with the Chip Planner. The Chip Planner provides physical timing estimates, critical
path display, and routing congestion view to help guide placement for optimal
performance.
For more information, refer to the Quartus II Incremental Compilation for Hierarchical
and Team-Based Designs and Best Practices for Incremental Compilation Partitions and
Floorplan Assignments chapters in volume 1 and the Engineering Change Management
with the Chip Planner chapter in volume 2 of the Quartus II Handbook.
Advisors
The Quartus II software includes several advisors to help you optimize your design
and reduce compilation time. You can complete your design faster by following the
recommendations in the Compilation Time Advisor, Incremental Compilation
Advisor, Timing Optimization Advisor, Area Optimization Advisor, Resource
Optimization Advisor, and Power Optimization Advisor. These advisors give
recommendations based on your project settings and your design constraints.
For more information about advisors, refer to Quartus II Help.
Design Space Explorer
Use the Design Space Explorer (DSE) to find optimal settings in the Quartus II
software. DSE automatically tries different combinations of netlist optimizations and
advanced Quartus II software compiler settings, and reports the best settings for your
design, based on your chosen primary optimization goal. If you are fairly close to
meeting your timing or area requirements, you can also use the DSE to try different
seeds and find one that meets those requirements. Finally, the DSE can run the
different compilations on multiple computers in parallel, which shortens the timing
closure process.
For more information, refer to About Design Space Explorer in Quartus II Help.
Conclusion
The Quartus II software includes a number of features and tools that you can use to
optimize area, timing, power, and compilation time when you design for
programmable logic devices (PLDs).
Document Revision History
Table 11–1 shows the revision history for this chapter.
Table 11–1. Document Revision History
Date            Version   Changes
December 2010   10.0.2    Changed to new document template. No change to content.
August 2010     10.0.1    Corrected link.
July 2010       10.0.0    Initial release. Chapter based on topics and text in Section III of volume 2.
For previous versions of the Quartus II Handbook, refer to the Quartus II Handbook
Archive.
12. Reducing Compilation Time
May 2011
QII52022-11.0.0
The Quartus® II software offers several features and techniques to help reduce
compilation time.
This chapter describes techniques to reduce compilation time when designing for
Altera® devices, and includes the following topics:
■ “Compilation Time Optimization Techniques”
■ “Compilation Time Advisor”
■ “Strategies to Reduce the Overall Compilation Time”
■ “Reducing Synthesis Time and Synthesis Netlist Optimization Time”
■ “Reducing Placement Time”
■ “Reducing Routing Time”
■ “Reducing Static Timing Analysis Time”
■ “Setting Process Priority”
Compilation Time Optimization Techniques
The Analysis and Synthesis and Fitter modules require a lot of time. The Analysis and
Synthesis module includes physical synthesis optimizations performed during
synthesis, if you have turned on physical synthesis optimizations. The Fitter includes
two steps, placement and routing, and also includes physical synthesis if you turned
on the physical synthesis option with Normal or Extra effort levels. The Flow Elapsed
Time section of the Compilation Report shows the duration of the Analysis and
Synthesis and Fitter modules. The Fitter Messages report in the Fitter section of the
Compilation Report shows the duration of placement and routing.
Placement is the process of finding optimum locations for the logic in your design.
Placement includes Quartus II pre-Fitter operations, which place dedicated logic such
as clocks, PLLs, and transceiver blocks. Routing is the process of connecting the nets
between the logic in your design. Finding better placements for the logic in a design
uses more compilation time. Good logic placement allows you to more easily meet
your timing requirements and makes your design easier to route.
Example 12–1 shows the applicable messages with each time component in two-digit
format, and days shown only if applicable:
Example 12–1.
Info: Fitter placement operations ending: elapsed time = <days:hours:minutes:seconds>
Info: Fitter routing operations ending: elapsed time = <days:hours:minutes:seconds>
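To compare placement and routing times across compilations, it can help to convert these elapsed-time fields to seconds. The following plain Tcl sketch assumes the format described above (two-digit fields, with the days field present only when applicable); the procedure name elapsed_to_seconds is hypothetical and not part of the Quartus II Tcl API.

```tcl
# Sketch only: convert "hh:mm:ss" or "dd:hh:mm:ss" to a number of seconds.
proc elapsed_to_seconds {stamp} {
    set fields [split $stamp ":"]
    # Days are printed only when applicable, so pad a missing days field.
    if {[llength $fields] == 3} {
        set fields [linsert $fields 0 "00"]
    }
    if {[llength $fields] != 4} {
        error "unexpected elapsed-time format: $stamp"
    }
    set total 0
    foreach field $fields weight {86400 3600 60 1} {
        # scan with %d reads "08" as decimal 8 rather than as invalid octal.
        if {[scan $field %d value] != 1} {
            error "non-numeric field in: $stamp"
        }
        incr total [expr {$value * $weight}]
    }
    return $total
}

puts [elapsed_to_seconds "00:12:34"]      ;# prints 754
puts [elapsed_to_seconds "01:02:03:04"]   ;# prints 93784
```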
Example 12–2 shows an info message while the Fitter is running (including Placement
and Routing). The Message window displays this message every hour to indicate
Fitter operations are progressing normally.
Example 12–2.
Info: Placement optimizations have been running for 4 hour(s)
Compilation Time Advisor
The Compilation Time Advisor in the Quartus II software helps you reduce
compilation time. To run it, on the Tools menu, point to Advisors and click
Compilation Time Advisor. The advisor also lists all the compilation time
optimization techniques described in this section.
Strategies to Reduce the Overall Compilation Time
This section discusses strategies to reduce overall compilation time, including the
following topics:
■ “Using Parallel Compilation with Multiple Processors”
■ “Using Incremental Compilation”
■ “Using the Smart Compilation Setting”
■ “Using Rapid Recompile”
Using Parallel Compilation with Multiple Processors
The Quartus II software can detect the number of processors available on a computer
and use available processors to reduce compilation time. You can also control the
number of processors used during a compilation on a per user basis. The Quartus II
software can use up to 16 processors to run some algorithms in parallel and reduce
compilation time. The Quartus II software turns on the parallel compilation by default
to enable the software to detect available multiple processors. You can specify the
maximum number of processors that the software can use if you want to reserve some
of the available processors for other tasks.
Note: Do not consider processors with Intel Hyper-Threading as more than one processor. If
you have a single processor with Intel Hyper-Threading enabled, you should set the
number of processors to one. Altera recommends that you do not use the Intel
Hyper-Threading feature for Quartus II compilations, because it can increase
runtimes.
The software does not necessarily use all the processors that you specify during a
given compilation. Additionally, the software never uses more than the specified
number of processors, enabling you to work on other tasks on your computer without
it becoming slow or less responsive.
If you have partitioned your design and enabled parallel compilation, the Quartus II
software can use different processors to compile those partitions simultaneously
during the Analysis and Synthesis stage, resulting in high peak memory usage during
Analysis and Synthesis.
By partitioning your design and allowing the Quartus II software to use multiple
processors, you can reduce the compilation time by up to 10% on systems with two
processing cores and by up to 20% on systems with four cores. With certain design
flows in which timing analysis runs alone, using multiple processors can reduce the
time required for timing analysis by an average of 10% when using two processors.
This reduction can reach an average of 15% when using four processors.
Note: You must partition your design to reduce compilation time successfully.
The actual reduction in compilation time depends on your design and on the specific
compilation settings. For example, compilations with multi-corner optimization
turned on benefit more from using multiple processors than do compilations that do
not use multi-corner optimization. The runtime requirement is not reduced for some
other compilation goals, such as Analysis and Synthesis. The Fitter (quartus_fit) and
the Quartus II TimeQuest Timing Analyzer (quartus_sta) stages in the compilation
can, in certain cases, benefit from the use of multiple processors. The Flow Elapsed
Time panel of the Compilation Report shows the average number of processors for
these stages. The Parallel Compilation panel of the appropriate report, such as the
Fitter report, shows a more detailed breakdown of processor usage. This panel is
displayed only if parallel compilation is enabled.
This feature is available for Arria® series, Cyclone®, HardCopy III, HardCopy IV,
MAX® II, MAX V (limited support), and Stratix® series devices.
For more information, refer to Processing Page (Options Dialog Box) in Quartus II Help.
For information about how to control the number of processors used during
compilation for a specific project, refer to Compilation Process Settings Page (Settings
Dialog Box) in Quartus II Help.
You can also set the number of processors available for Quartus II compilation using
the following Tcl command in your script:
set_global_assignment -name NUM_PARALLEL_PROCESSORS <value>
In this case, <value> is an integer from 1 to 16.
If you want the Quartus II software to detect the number of processors and use all the
processors for the compilation, use the following Tcl command in your script:
set_global_assignment -name NUM_PARALLEL_PROCESSORS ALL
Using multiple processors does not affect the quality of the fit. For a given Fitter seed
on a specific design, the fit is exactly the same, regardless of whether the Quartus II
software uses one processor or multiple processors. The only difference between
compilations using a different number of processors is the compilation time.
Using Incremental Compilation
The incremental compilation feature can speed up design iteration time by up to 70%
for small design changes, and helps you reach design timing closure more efficiently.
You can speed up design iterations by recompiling only a particular design partition
and merging results with previous compilation results from other partitions. You can
also use physical synthesis optimization techniques for specific design partitions
while leaving other parts of your design untouched to preserve performance.
If you are using a third-party synthesis tool, you can create separate atom netlist files
for parts of your design that you already have synthesized and optimized so that you
update only the parts of your design that change.
In the standard incremental compilation design flow, you can divide the top-level
design into partitions, which the software can compile and optimize in the top-level
Quartus II project. You can preserve fitting results and performance for completed
partitions while other parts of your design are changing, which reduces the
compilation time for each design iteration because the software does not synthesize or
fit the unchanged partitions in your design.
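Partition assignments can also be made from a Tcl script. The following is a rough sketch, assuming a lower-level instance at the hypothetical hierarchy path ctrl:u_ctrl and a hypothetical partition name ctrl_block. The assignment names follow the form used in Quartus II Settings Files; verify them and the allowed values in the incremental compilation documentation before relying on them.

```tcl
# Sketch only: define a partition and preserve its post-fit results.
# "ctrl_block" and "ctrl:u_ctrl" are hypothetical names for illustration.
set_instance_assignment -name PARTITION_HIERARCHY ctrl_block \
    -to "ctrl:u_ctrl" -section_id "ctrl_block"
# Reuse the post-fit netlist so an unchanged partition is not refit.
set_global_assignment -name PARTITION_NETLIST_TYPE POST_FIT \
    -section_id "ctrl_block"
# Preserve both placement and routing for the partition.
set_global_assignment -name PARTITION_FITTER_PRESERVATION_LEVEL \
    PLACEMENT_AND_ROUTING -section_id "ctrl_block"
```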
The incremental compilation feature also facilitates team-based design flows by
enabling designers to create and optimize design blocks independently, when
necessary, and support third-party IP integration.
For information about the full incremental compilation flow in the Quartus II
software, refer to the Quartus II Incremental Compilation for Hierarchical and Team-Based
Design chapter in volume 1 of the Quartus II Handbook. For information about creating
multiple netlist files in third-party tools for use with incremental compilation, refer to
the appropriate chapter in Section IV. Synthesis in volume 1 of the Quartus II Handbook.
For additional information about incremental compilation, refer to About Incremental
Compilation in Quartus II Help.
Using the Smart Compilation Setting
Smart compilation can reduce compilation time by skipping unnecessary Compiler
stages when you recompile your design. This setting is especially useful when you perform
multiple compilation iterations during the optimization phase of your design process.
However, smart compilation uses more disk space. To turn on smart compilation, on
the Assignments menu, click Settings. In the Category list, select Compilation
Process Settings and turn on Use smart compilation.
Note: Smart compilation skips unnecessary Compiler stages (such as Analysis and
Synthesis). This feature is different from incremental compilation, which you can use
to compile parts of your design while preserving results for unchanged parts.
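In a scripted flow, smart compilation can be turned on with a single assignment. This sketch assumes the SMART_RECOMPILE assignment name as it appears in Quartus II Settings Files; confirm it in Quartus II Help for your software version.

```tcl
# Sketch only: turn on smart compilation for the open project.
set_global_assignment -name SMART_RECOMPILE ON
```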
Using Rapid Recompile
The Rapid Recompile feature maximizes designer productivity when making small
engineering change order (ECO)-style design changes after a full compilation,
reducing compilation times by an average of 50%. Rapid Recompile also significantly
improves designer productivity during timing closure by preserving critical timing
during late design changes.
You can use the Rapid Recompile feature on its own or along with standard
incremental flow for compatible nodes in your design. A compatible node is a node
that you can match to a node from previous compilation results. Rapid Recompile
allows the Quartus II software to reuse placement and routing resources of
compatible nodes from previous results with a high degree of confidence.
If you enable the Rapid Recompile feature, you can view the compilation time
reduction after a full compilation. Turn on the Rapid Recompile feature in later
compilations to view further reductions. The Incremental Compilation Preservation
Summary section in the Fitter Report provides details about the placement and
routing preservation for your design.
The performance of Rapid Recompile is largely dependent on the nature of your
design change. If the Quartus II software determines that full optimization is
necessary for design performance, you may not see much compilation time reduction.
For example, if the total time taken by the Fitter is dominated by the time taken for
fitter preparation operations, using this feature may not save you a lot of compilation
time. When you apply extensive global optimizations, a small user change may be
required to obtain optimal performance. Be sure to select the right flow to achieve
your end goals.
Note: If you see the message Fitter has failed to locate previous placement
information during the compilation of your design, Rapid Recompile does not
provide any compile time reduction.
For more information about this feature, refer to Incremental Compilation Page (Settings
Dialog Box) in Quartus II Help.
Reducing Synthesis Time and Synthesis Netlist Optimization Time
You can reduce synthesis time by reducing your use of netlist optimizations and by
using incremental compilation (with Netlist Type set to Post-Synthesis) without
affecting the Fitter time. For tips for reducing synthesis time when using third-party
EDA synthesis tools, refer to your synthesis software’s documentation.
Settings to Reduce Synthesis Time and Synthesis Netlist Optimization Time
You can use Quartus II integrated synthesis to synthesize and optimize HDL designs,
and you can use synthesis netlist optimizations to optimize netlists that were
synthesized by third-party EDA software. When using Quartus II Integrated
Synthesis, you can also enable specific Physical Synthesis Optimizations during
Analysis and Synthesis. Using these netlist optimizations can cause the Analysis and
Synthesis module to take much longer to run. Read the Analysis and Synthesis
messages to find out how much time these optimizations take. The compilation time
spent in Analysis and Synthesis is usually small compared to the compilation time
spent in the Fitter.
If your design meets your performance requirements without synthesis netlist
optimizations, turn off the optimizations to save time. If you require synthesis netlist
optimizations to meet performance, you can optimize parts of your design hierarchy
separately to reduce the overall time spent in Analysis and Synthesis.
Turn off settings that are not useful. In general, if you carry over compilation settings
from a previous project, evaluate all settings and keep only those that you need.
Use Appropriate Coding Style to Reduce Synthesis Time
The method that you use to code your design in HDL can affect the synthesis time.
For example, if you want to infer RAM blocks from your code, you must follow the
guidelines for inferring RAMs. Otherwise, the software implements those blocks as
registers. For a large memory, this consumes a large amount of FPGA resources,
causes routing congestion, and drastically increases compilation time. If you see
high routing utilization in certain blocks, review the code for those blocks.
For more information about coding guidelines, refer to the Recommended HDL Coding
Styles chapter in volume 1 of the Quartus II Handbook.
Using Early Timing Estimation
The Quartus II software provides an Early Timing Estimation feature that estimates
your design’s timing results before the software performs full placement and routing.
On the Processing menu, point to Start, and click Start Early Timing Estimate to
generate initial compilation results after you have run Analysis and Synthesis. When
you want a quick estimate of a design’s performance before proceeding with further
design or synthesis tasks, this command can save significant compilation time. On
average, this feature provides a timing estimate 2.5× faster than running a full
compilation (8.5× faster in the best case). Because the fit is not fully optimized or
routed, the timing report is only an estimate. On average, the estimated delays are
within 15% of the final timing results achieved by a full compilation.
You can specify the type of delay estimates to use with Early Timing Estimation. On
the Assignments menu, click Settings. In the Category list, select Compilation
Process Settings, and select Early Timing Estimate. On the Early Timing Estimate
page, the following options are available:
■ The Realistic option, which is the default, generates delay estimates that are
similar to the results of a full compilation.
■ The Optimistic option uses delay estimates that are likely lower than those
achieved by a full compilation, which results in an optimistic performance
estimate.
■ The Pessimistic option uses delay estimates that are likely higher than those
achieved by a full compilation, which results in a pessimistic performance
estimate.
All three options offer the same reduction in compilation time.
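The GUI flow above can also be sketched as a script. This is a minimal example, assuming that the quartus_fit executable accepts an --early_timing_estimate option whose value selects the delay model, and using a hypothetical project name (my_design). Check both against the command-line help for your Quartus II version.

```tcl
# Sketch of a script run with: quartus_sh -t early_estimate.tcl
# "my_design" is a hypothetical project name.
set project my_design

# Analysis and Synthesis must complete before the estimate can run.
exec quartus_map $project >@ stdout 2>@ stderr

# Run the Fitter in early-timing-estimate mode (assumed option name).
# The value selects the delay model: realistic, optimistic, or pessimistic.
exec quartus_fit $project --early_timing_estimate=realistic >@ stdout 2>@ stderr
```

Redirecting the tool output to the console keeps Tcl's exec from treating messages written to standard error as a script failure.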
You can view the critical paths in your design by locating these paths in the Chip
Planner from the TimeQuest Timing Report panel. Then, if necessary, you can add or
modify floorplan constraints such as LogicLock regions, or make other changes to the
design. You can then rerun the Early Timing Estimate to quickly assess the impact of
any floorplan assignments or logic changes, enabling you to try different design
variations and find the best solution.
Reducing Placement Time
The time required to place a design depends on two factors: the number of ways the
logic in your design can be placed in the device and the settings that control how hard
the Placer works to find a good placement. You can reduce the placement time in two
ways:
■ Change the settings for the placement algorithm
■ Use incremental compilation to preserve the placement for parts of your design
Sometimes there is a trade-off between placement time and routing time. Routing
time can increase if the placer does not run long enough to find a good placement.
When you reduce placement time, make sure that it does not increase routing time
and negate the overall time reduction.
Fitter Effort Setting
The highest Fitter effort setting, Standard Fit, requires the most runtime, but does not
always yield a better result than using the default Auto Fit. For designs with very
tight timing requirements, both Auto Fit and Standard Fit use the maximum effort
during optimization. Altera recommends using Auto Fit for reducing compilation
time. If you are certain that your design has only easy-to-meet timing constraints, you
can select Fast Fit for even greater runtime savings.
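The Fitter effort level can also be set in the project's .qsf file. The following is a sketch only; the assignment name and value strings are assumptions drawn from the Quartus II scripting documentation and should be confirmed for your version.

```tcl
# .qsf sketch: select the Fitter effort level.
# Assumed values: "AUTO FIT" (default), "STANDARD FIT", or "FAST FIT".
set_global_assignment -name FITTER_EFFORT "AUTO FIT"
```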
Placement Effort Multiplier Settings
You can control the amount of time the Fitter spends in placement by reducing one
aspect of placement effort with the Placement Effort Multiplier option. On the
Assignments menu, click Settings. Select Fitter Settings, and click More Settings.
Under Existing Option Settings, select Placement Effort Multiplier. The default is
1.0. Legal values must be greater than 0 and can be non-integer values. Numbers
between 0 and 1 can reduce fitting time, but also can reduce placement quality and
design performance. Numbers higher than 1 increase placement time and placement
quality, but can reduce routing time for designs with routing congestion. For example,
a value of 4 increases placement time by approximately 2 to 4 times, but might result
in better placement, which can result in reduced routing time.
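A minimal .qsf sketch of the same setting follows. The assignment name is an assumption based on the Quartus II scripting documentation, and the values shown are illustrative only.

```tcl
# .qsf sketch: Placement Effort Multiplier (default 1.0; must be greater than 0).
# A value below 1.0 trades placement quality for shorter placement time.
set_global_assignment -name PLACEMENT_EFFORT_MULTIPLIER 0.5

# For a design with routing congestion, a larger value lengthens placement
# but might shorten routing. Use only one of the two assignments.
# set_global_assignment -name PLACEMENT_EFFORT_MULTIPLIER 4.0
```

Because the net effect depends on the design, compare the placement and routing times reported in the Fitter messages before and after changing the value.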
Final Placement Optimization Levels
The Final Placement Optimization Level option specifies whether the Fitter performs
final placement optimizations. You can set this option to Always, Never, or
Automatically. Performing optimizations can improve register-to-register timing and
fitting, but might require longer compilation times. You can use the default setting of
Automatically with the Auto Fit Fitter Effort Level (also the default) to enable the
Fitter to decide whether these optimizations should run based on the routability and
timing requirements of your design.
Setting the Final Placement Optimization Level to Never often reduces your
compilation time, but affects routability negatively and reduces timing performance.
To change the Final Placement Optimization Level, on the Assignments menu, click
Settings. The Settings dialog box appears. From the Category list, select Fitter
Settings, and then click the More Settings button. Select Final Placement
Optimization Level, and then from the list, select the required setting.
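The equivalent setting can be sketched as a .qsf entry. The assignment name below is an assumption based on the Quartus II scripting documentation; verify it for your software version.

```tcl
# .qsf sketch: Final Placement Optimization Level.
# Assumed values: ALWAYS, AUTOMATICALLY (default), or NEVER.
set_global_assignment -name FINAL_PLACEMENT_OPTIMIZATION AUTOMATICALLY
```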
Physical Synthesis Effort Settings
You can use the physical synthesis options to optimize your post-synthesis netlist and
improve your timing performance. These options, which affect placement, can
significantly increase compilation time.
If your design meets your performance requirements without physical synthesis
options, turn them off to save time. You also can use the Physical synthesis effort
setting on the Physical Synthesis Optimizations page under Compilation Process
Settings in the Category list to reduce the amount of extra compilation time that these
optimizations use. The Fast setting directs the Quartus II software to use a lower level
of physical synthesis optimization that, compared to the Normal physical synthesis
effort level, can cause a smaller increase in compilation time. However, the lower level
of optimization can result in a smaller increase in design performance.
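As a sketch, the physical synthesis effort level can be lowered with a .qsf entry such as the following. The assignment name and values are assumptions based on the Quartus II scripting documentation.

```tcl
# .qsf sketch: use the lower-effort physical synthesis mode.
# Assumed values: NORMAL (default), EXTRA, or FAST.
set_global_assignment -name PHYSICAL_SYNTHESIS_EFFORT FAST
```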
Limit to One Fitting Attempt
This option causes the software to quit after one fitting attempt, instead of repeating
placement and routing with increased effort. For hard-to-fit designs, consider
increasing the Placement Effort Multiplier setting and turning on the Limit to One
Fitting Attempt setting. This combination saves you time, because if your design is
hard to fit and the first attempt does not result in a valid fit, the compilation stops
instead of repeating the fit.
To turn on this option, on the Assignments menu, click Settings. On the Fitter
Settings page, turn on Limit to one fitting attempt.
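The combination described above can be sketched in the .qsf file as follows. Both assignment names are assumptions based on the Quartus II scripting documentation, and the multiplier value is illustrative.

```tcl
# .qsf sketch for a hard-to-fit design.
# Stop after the first fitting attempt instead of retrying with more effort.
set_global_assignment -name FIT_ONLY_ONE_ATTEMPT ON
# Give that single attempt more placement effort (illustrative value).
set_global_assignment -name PLACEMENT_EFFORT_MULTIPLIER 2.0
```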
For more details about this option, refer to “Limit to One Fitting Attempt” in the Area
and Timing Optimization chapter in volume 2 of the Quartus II Handbook.
Preserving Placement with Incremental Compilation
Preserving information about previous placements can make future placements faster.
The incremental compilation feature provides an easy-to-use methodology for
preserving placement results. For more information, refer to “Using Incremental
Compilation” on page 12–3.
Reducing Routing Time
The time required to route a design depends on three factors: the device architecture,
the placement of your design in the device, and the connectivity between different
parts of your design. The routing time is usually not a significant amount of the
compilation time. If your design requires a long time to route, perform one or more of
the following actions:
■ Check for routing congestion
■ Let the placer run longer to find a more routable placement
■ Use incremental compilation to preserve routing information for parts of your
design
Identifying Routing Congestion in the Chip Planner
To identify areas of routing congestion in your design, open the Chip Planner. On the
Tools menu, click Chip Planner. To view the routing congestion in the Chip Planner,
click the Layers icon located next to the Task menu. Under Background Color Map,
select Routing Utilization. Even if average congestion is not very high, your design
may have areas where congestion is very high in a specific type of routing. You can
use the Chip Planner to identify areas of high congestion for specific interconnect
types. You can change the connections in your design to reduce routing congestion. If
the area with routing congestion is in a LogicLock region or between LogicLock
regions, change or remove the LogicLock regions and recompile your design. If the
routing time remains the same, the time is a characteristic of your design and the
placement. If the routing time decreases, consider changing the size, location, or
contents of LogicLock regions to reduce congestion and decrease routing time.
Sometimes, routing congestion is a result of the HDL coding style used in your
design. After you identify congested areas using the Chip Planner, review the HDL
code for the blocks placed in those areas to determine whether you can reduce
interconnect usage by changing the code.
The Quartus II compilation messages contain information about average and peak
interconnect usage. Peak interconnect usage over 75%, or average interconnect usage
over 60%, can indicate that your design might be difficult to fit. Peak interconnect
usage over 90%, or average interconnect usage over 75%, indicates an increased
chance of not getting a valid fit.
For information about identifying areas of congested routing using the Chip Planner,
refer to the “Viewing Routing Congestion” subsection in the Analyzing and Optimizing
the Design Floorplan chapter in volume 2 of the Quartus II Handbook.
Placement Effort Multiplier Setting
Some designs might be time consuming and difficult to route because the placement
is not optimal. In such cases, you can increase the Placement Effort Multiplier to get a
better placement. Increasing the Placement Effort Multiplier might increase the
placement time, but sometimes it can reduce the routing time, and even overall
compilation time.
Preserving Routing with Incremental Compilation
Preserving the previous routing results for part of your design can reduce future
routing time. Incremental compilation provides an easy-to-use methodology that
preserves placement and routing results. For more information, refer to “Using
Incremental Compilation” on page 12–3 and the references listed in the section.
Reducing Static Timing Analysis Time
If you are performing timing-driven synthesis, the Quartus II software runs the
TimeQuest analyzer during Analysis and Synthesis. The Quartus II Fitter also runs
the TimeQuest analyzer during placement and routing. If the .sdc file contains
incorrect constraints, the Quartus II software may repeatedly spend time processing
them unnecessarily. If you do not specify false paths and
multicycle paths in your design, the TimeQuest analyzer may spend time analyzing
paths that are not relevant to your design. Also, if you redefine constraints in the .sdc
files, the TimeQuest analyzer may spend additional time processing them. In the
compilation messages, look for indications that Synopsys Design Constraints are
redefined, and update the .sdc file to avoid this situation. Also, ensure that you
provide the correct timing constraints to your design, because the software cannot
assume design intent, such as which paths to consider as false paths or multicycle
paths. When you specify these assignments correctly, the TimeQuest analyzer skips
analysis for those paths, and the Fitter does not spend additional time optimizing
those paths.
Setting Process Priority
You might need to reduce the computing resources allocated to the compilation, at
the expense of increased compilation time. Reducing the resource allocation can be
convenient on single-processor machines if you must run other tasks at the same
time.
For more information about setting process priority, refer to Processing Page (Options
Dialog Box) in Quartus II Help.
Conclusion
The Quartus II software provides many features to reduce compilation time and
achieve optimal results. Using the recommended techniques described in this chapter
can help you reduce compilation time.
Document Revision History
Table 12–1 shows the revision history for this chapter.
Table 12–1. Document Revision History
Date: May 2011. Version: 11.0.0. Changes:
■ Updated “Using Parallel Compilation with Multiple Processors” on page 12–2.
■ Updated “Identifying Routing Congestion in the Chip Planner” on page 12–9.
■ General editorial changes throughout the chapter.
Date: December 2010. Version: 10.1.0. Changes:
■ Template update.
■ Added details about peak and average interconnect usage.
■ Added new section “Reducing Static Timing Analysis Time” on page 12–9.
■ Minor changes throughout chapter.
Date: July 2010. Version: 10.0.0. Changes:
■ Initial release.
For previous versions of the Quartus II Handbook, refer to the Quartus II Handbook
Archive.
13. Area and Timing Optimization
May 2011
QII52005-11.0.0
This chapter describes techniques to reduce resource usage and improve timing
performance when designing for Altera® devices.
Good optimization techniques are essential for achieving the best results when
designing for programmable logic devices (PLDs). The optimization features
available in the Quartus® II software allow you to meet design requirements by
applying these techniques at multiple points in the design process.
This chapter also explains how and when to use some of the features described in
other chapters of the Quartus II Handbook.
This chapter includes the following topics:
■ “Optimizing Your Design”
■ “Design Analysis” on page 13–9
■ “Resource Utilization Optimization Techniques (LUT-Based Devices)” on
page 13–15
■ “Timing Optimization Techniques (LUT-Based Devices)” on page 13–26
■ “Resource Utilization Optimization Techniques (Macrocell-Based CPLDs)” on
page 13–42
■ “Timing Optimization Techniques (Macrocell-Based CPLDs)” on page 13–48
■ “Scripting Support” on page 13–53
The application of these techniques varies from design to design. Applying each
technique does not always improve results. Settings and options in the Quartus II
software have default values that generally provide the best trade-off between
compilation time, resource utilization, and timing performance. You can adjust these
settings to determine whether other settings provide better results for your design.
You can use the optimization flow described in this chapter to explore various
compiler settings and determine the techniques that provide the best results.
Optimizing Your Design
The first stage in the optimization process is to perform an initial compilation of your
design. “Initial Compilation: Required Settings” on page 13–2 provides guidelines for
some of the settings and assignments that are recommended for your initial
compilation. “Initial Compilation: Optional Fitter Settings” on page 13–5 describes
settings that you might turn on based on your design requirements. “Design
Analysis” on page 13–9 explains how to analyze the compilation results.
Note: You can use incremental compilation in the optimization process. Incremental
compilation can preserve timing results to aid timing closure and can reduce
compilation time; however, it can cause a slight increase in resource utilization.
© 2011 Altera Corporation. All rights reserved. ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS and STRATIX are Reg. U.S. Pat. & Tm. Off.
and/or trademarks of Altera Corporation in the U.S. and other countries. All other trademarks and service marks are the property of their respective holders as described at
www.altera.com/common/legal.html. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera’s standard warranty, but
reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any
information, product, or service described herein except as expressly agreed to in writing by Altera. Altera customers are advised to obtain the latest version of device
specifications before relying on any published information and before placing orders for products or services.
For more details about Quartus II incremental compilation flow, refer to the Quartus II
Incremental Compilation for Hierarchical and Team-Based Design chapter in volume 1 of
the Quartus II Handbook.
To view information about timing analysis results, refer to Viewing Timing Analysis
Results (TimeQuest Timing Analyzer) in Quartus II Help.
After you have analyzed the results from an initial compilation, perform the
optimization stages in the recommended order, as described in this chapter.
For LUT-based devices (FPGAs, MAX® II series devices), perform optimizations in
the following order:
1. If your design does not fit, refer to “Resource Utilization Optimization Techniques
(LUT-Based Devices)” on page 13–15 before trying to optimize I/O timing or
register-to-register timing.
2. If your design does not meet the required I/O timing performance, refer to “I/O
Timing Optimization Techniques (LUT-Based Devices)” on page 13–55 before
trying to optimize register-to-register timing.
3. If your design does not meet the required slack on any of the clock domains in the
design, refer to “Register-to-Register Timing Optimization Techniques (LUT-Based
Devices)” on page 13–55.
For macrocell-based devices (MAX 7000 and MAX 3000 CPLDs), perform
optimizations in the following order:
1. If your design does not fit, refer to “Resource Utilization Optimization Techniques
(Macrocell-Based CPLDs)” on page 13–42 before trying to optimize I/O timing or
register-to-register timing.
2. If your timing performance requirements are not met, refer to “Timing
Optimization Techniques (Macrocell-Based CPLDs)” on page 13–48.
For device-independent techniques to reduce compilation time, refer to the
“Compilation-Time Optimization Techniques” section in the Reducing Compilation
Time chapter in volume 2 of the Quartus II Handbook.
You can use these techniques in the GUI or with Tcl commands. For more information
about scripting techniques, refer to “Scripting Support” on page 13–53.
Initial Compilation: Required Settings
This section describes the basic assignments and settings for your initial compilation.
Check the following settings before compiling the design in the Quartus II software.
Compilation results can vary significantly depending on the assignments you set.
Verify the following settings:
■ “Device Settings” on page 13–3
■ “I/O Assignments”
■ “Timing Requirement Settings”
■ “Device Migration Settings” on page 13–5
■ “Partitions and Floorplan Assignments for Incremental Compilation” on
page 13–5
Device Settings
Specific device assignments determine the timing model that the Quartus II software
uses during compilation. Choose the correct speed grade to obtain accurate results
and the best optimization. The device size and the package determine the device
pin-out and the number of resources available in the device.
I/O Assignments
The I/O standards and drive strengths specified for a design affect I/O timing.
Specify I/O assignments so that the Quartus II software uses accurate I/O timing
delays in timing analysis and Fitter optimizations.
The Quartus II software can select pin locations automatically. If your pin locations
are not fixed due to PCB layout requirements, leave pin locations unconstrained. If
your pin locations are already fixed, make pin assignments to constrain the
compilation appropriately. “Resource Utilization Optimization Techniques
(Macrocell-Based CPLDs)” on page 13–42 includes recommendations for making pin
assignments that can have a large effect on your results in smaller macrocell-based
architectures.
Use the Assignment Editor and Pin Planner to assign I/O standards and pin locations.
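I/O assignments made in the Assignment Editor and Pin Planner are stored in the .qsf file. The following sketch shows what such entries might look like. The pin location and signal names (PIN_A12, clk, led) are hypothetical, and the value strings depend on the target device, so treat them as assumptions to check against your device handbook.

```tcl
# .qsf sketch with hypothetical pin and signal names.
# Fix the clock input to a specific pin when the PCB layout requires it.
set_location_assignment PIN_A12 -to clk
# Specify I/O standards so that timing analysis uses accurate I/O delays.
set_instance_assignment -name IO_STANDARD "2.5 V" -to clk
set_instance_assignment -name IO_STANDARD "2.5 V" -to led
# Specify the drive strength of an output (assumed value string).
set_instance_assignment -name CURRENT_STRENGTH_NEW "8MA" -to led
```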
For more information about I/O standards and pin constraints, refer to the
appropriate device handbook. For information about planning and checking I/O
assignments, refer to the I/O Management chapter in volume 2 of the Quartus II
Handbook.
For information about using the Assignment Editor, refer to About the Assignment
Editor in Quartus II Help.
Timing Requirement Settings
You must use comprehensive timing requirement settings to achieve the best results
for the following reasons:
■ Correct timing assignments allow the software to work hardest to optimize the
performance of the timing-critical parts of the design and make trade-offs for
performance. This optimization can also save area or power utilization in
non-critical parts of the design.
■ The Quartus II software performs physical synthesis optimizations based on
timing requirements (refer to “Physical Synthesis Optimizations” on page 13–35
for more information).
■ Depending on the Fitter Effort setting, the Quartus II Fitter can reduce runtime
considerably if your timing requirements are being met.
For a description of the different effort levels, refer to “Fitter Effort Setting” on
page 13–7.
Use your real requirements to get the best results. If you apply more demanding
timing requirements than you actually need, increased resource usage, higher power
utilization, increased compilation time, or all of these may result.
The Quartus II TimeQuest Timing Analyzer checks your design against the timing
constraints. The Compilation Report and timing analysis reporting commands show
whether timing requirements are met and provide detailed timing information about
paths that violate timing requirements.
To create timing constraints for the TimeQuest analyzer, create a Synopsys Design
Constraints File (.sdc). You can also enter constraints in the TimeQuest GUI. Use the
write_sdc command, or, on the Constraints menu in the TimeQuest analyzer, click
Write SDC File to write your constraints to an .sdc. You can add an .sdc to your
project on the Quartus II Settings page under Timing Analysis Settings.
Note: If you already have an .sdc in your project, using the write_sdc command from
the command line or the Write SDC File option in the TimeQuest GUI creates a new
.sdc that combines the constraints from your current .sdc with any new constraints
added through the GUI or command window, or overwrites the existing .sdc with
your newly applied constraints.
Ensure that every clock signal has an accurate clock setting constraint. If clocks arrive
from a common oscillator, they can be considered related. Ensure that all related or
derived clocks are set up correctly in the constraints. All I/O pins that require I/O
timing optimization must be constrained. Specify both minimum and maximum
timing constraints as applicable. If there is more than one clock or there are different
I/O requirements for different pins, make multiple clock settings and individual I/O
assignments instead of using a global constraint.
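The clock and I/O constraint guidelines above can be sketched in an .sdc file as follows. The port names (clk_a, clk_b, data_in, data_out), clock periods, and delay values are hypothetical placeholders chosen for illustration; they are not values from this handbook.

```tcl
# Hypothetical example .sdc; all names and numbers are placeholders.

# One clock constraint per clock input instead of a single global setting.
create_clock -name clk_a -period 10.000 [get_ports clk_a]
create_clock -name clk_b -period 8.000 [get_ports clk_b]

# Create the derived (PLL output) clocks so that related clocks are set up.
derive_pll_clocks

# Constrain inputs with both maximum and minimum delays.
set_input_delay -clock clk_a -max 4.0 [get_ports data_in*]
set_input_delay -clock clk_a -min 1.0 [get_ports data_in*]

# Constrain outputs with both maximum and minimum delays.
set_output_delay -clock clk_b -max 3.0 [get_ports data_out*]
set_output_delay -clock clk_b -min 0.5 [get_ports data_out*]
```

Defining each clock exactly once also avoids the redefined-constraint processing overhead that slows static timing analysis.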
Make any complex timing assignments required in the design, including false path
and multicycle path assignments. Common situations for these types of assignments
include reset or static control signals, cases in which it is not important how long it
takes a signal to reach a destination, and paths that can operate in more than one clock
cycle. These assignments allow the Quartus II software to make appropriate trade-offs
between timing paths and can enable the Compiler to improve timing performance in
other parts of the design.
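As a sketch of the complex assignments described above, the following .sdc lines mark a reset input and a static control register as false paths and give one register-to-register path two clock cycles. All port and register names (reset_n, mode_reg, slow_src, slow_dst) are hypothetical.

```tcl
# Hypothetical example .sdc; all names are placeholders.

# Reset input: the time the signal takes to arrive is not important.
set_false_path -from [get_ports reset_n]

# Static control register that does not change during normal operation.
set_false_path -from [get_registers mode_reg*]

# Path that can operate in two clock cycles: relax the setup check to the
# second capture edge, and move the hold check back by one cycle so that it
# stays at the launch edge.
set_multicycle_path -setup -end 2 -from [get_registers slow_src*] -to [get_registers slow_dst*]
set_multicycle_path -hold -end 1 -from [get_registers slow_src*] -to [get_registers slow_dst*]
```

Apply such exceptions only to paths whose design intent you have confirmed, because the Fitter stops optimizing the paths that they cover.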
For more information about timing assignments and timing analysis, refer to The
Quartus II TimeQuest Timing Analyzer chapter in volume 3 of the Quartus II Handbook
and the Quartus II TimeQuest Timing Analyzer Cookbook.
Note: To ensure that constraints or assignments have been applied to all design
nodes, you can report all unconstrained paths in your design. In the Quartus II
TimeQuest analyzer, use the Report Unconstrained Paths command in the Task
pane or the report_ucp Tcl command.
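A minimal scripted version of this check might look like the following, assuming the script is run in the TimeQuest command-line shell, that the project is named my_design (a hypothetical name), and that report_ucp accepts a -file option for writing its report. Verify the option against the TimeQuest command reference for your version.

```tcl
# Sketch of a script run with: quartus_sta -t report_unconstrained.tcl
# "my_design" and the report file name are hypothetical.
project_open my_design

# Build the timing netlist and apply the project's .sdc constraints.
create_timing_netlist
read_sdc
update_timing_netlist

# Write the unconstrained paths to a text report (assumed -file option).
report_ucp -file unconstrained_paths.rpt

delete_timing_netlist
project_close
```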
Device Migration Settings
If you anticipate a change to the target device later in the design cycle, either because
of changes in the design or other considerations, plan for it at the beginning of your
design cycle. Whenever you select a target device, you can also list any other
compatible devices you can migrate to by clicking on the Migration Devices button in
the Device dialog box. If you plan to move your design to a HardCopy® device, make
sure to select the device from the HardCopy list under Companion device in the
Device dialog box.
Selecting the migration device and companion device early in the design cycle helps
to minimize changes to the design at a later stage.
Partitions and Floorplan Assignments for Incremental Compilation
The Quartus II incremental compilation feature enables hierarchical and team-based
design flows in which you can compile parts of your design while other parts of the
design remain unchanged, or import parts of your design from separate Quartus II
projects.
Using incremental compilation with a good design partitioning methodology can
help you achieve timing closure. Creating design partitions on some of the major
blocks in your design, and assigning them to LogicLock™ regions that are not too
restrictive, generally reduces Fitter time and improves the quality and repeatability
of results. Incremental compilation lets you achieve timing closure block by block
and preserve the timing performance between iterations, which helps achieve timing
closure for the entire design. Incremental compilation may also help reduce
compilation times.
For more information, refer to the “Incremental Compilation” section in the Reducing
Compilation Time chapter in volume 2 of the Quartus II Handbook.
If you want to take advantage of incremental compilation for a team-based design
flow to reduce your compilation times, or to improve the timing performance of your
design during iterative compilation runs, make meaningful design partitions and
create a floorplan for your design partitions.
Note: If you plan to use incremental compilation, you must create a floorplan for your
design. If you are not using incremental compilation, this step is optional.
For guidelines about how to create partition and floorplan assignments for your
design, refer to the Best Practices for Incremental Compilation Partitions and Floorplan
Assignments chapter in volume 1 of the Quartus II Handbook.
Initial Compilation: Optional Fitter Settings
This section describes optional Fitter settings that can help optimize your design.
Turn on only the optional settings that improve performance for your design. The
best settings vary between designs, and no standard set applies to all designs.
Compilation results can differ significantly depending on the assignments you set.
The following settings are optional:
■ “Optimize Hold Timing”
■ “Optimize Multi-Corner Timing” on page 13–7
■ “Fitter Effort Setting” on page 13–7
■ “Limit to One Fitting Attempt” on page 13–9
■ “Design Assistant” on page 13–9
To turn on these settings, follow these steps:
1. On the Assignments menu, click Settings.
2. In the Category list, select Fitter Settings. The Fitter Settings page appears.
3. Turn on the appropriate options.
Optimize Hold Timing
The Optimize Hold Timing option directs the Quartus II software to optimize
minimum delay timing constraints. This option is available for all Altera device
families except MAX 3000 and MAX 7000 series devices. By default, the Quartus II
software optimizes hold timing for all paths for designs using devices newer than
Arria GX, Stratix III, and Cyclone III. By default, the Quartus II software optimizes
hold timing only for I/O paths and minimum TPD paths for older devices.
When you turn on Optimize Hold Timing, the Quartus II software adds delay to
paths to guarantee that the minimum delay requirements are satisfied. In the Fitter
Settings pane, if you select I/O Paths and Minimum TPD Paths (the default choice
for older devices such as Cyclone II and Stratix II devices if you turn on Optimize
Hold Timing), the Fitter works to meet the following criteria:
■ Hold times (tH) from device input pins to registers
■ Minimum delays from I/O pins to I/O registers or from I/O registers to I/O pins
■ Minimum clock-to-out time (tCO) from registers to output pins
If you select All Paths, the Fitter also works to meet hold requirements from registers
to registers, as in Figure 13–1, where a derived clock generated with logic causes a
hold time problem on another register. However, if your design has internal hold time
violations between registers, correct the problems by making changes to your design,
such as using a clock enable signal instead of a derived or gated clock.
Figure 13–1. Optimize Hold Timing Option Fixing an Internal Hold Time Violation
For design practices that can help eliminate internal hold time violations, refer to the
Recommended Design Practices chapter in volume 1 of the Quartus II Handbook.
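The Optimize Hold Timing choices can be sketched as a .qsf entry. The assignment name and value strings below are assumptions based on the Quartus II scripting documentation; confirm them for your software version.

```tcl
# .qsf sketch: Optimize Hold Timing.
# Assumed values: OFF, "IO PATHS AND MINIMUM TPD PATHS", or "ALL PATHS".
set_global_assignment -name OPTIMIZE_HOLD_TIMING "ALL PATHS"
```

With the "ALL PATHS" value the Fitter also adds delay to register-to-register paths, so correcting the clocking structure in the design remains the preferred fix for internal hold violations.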
Optimize Multi-Corner Timing
Historically, FPGA timing analysis has been performed using only delays from the
slow corner timing model. However, due to process variation and changes in the
operating conditions, delays on some paths can be significantly smaller than those in
the slow corner timing model. This can result in hold time violations on those paths,
and in rare cases, additional setup time violations.
Also, because of the small process geometries of the Cyclone III, Stratix III, and newer
device families, the slowest circuit performance of designs targeting these devices
does not necessarily occur at the highest operating temperature. The temperature at
which the circuit is slowest depends on the selected device, the design, and
compilation results. Therefore, the Quartus II software provides the Cyclone III series,
Stratix III, and newer device families with three different timing corners—Slow 85°C
corner, Slow 0°C corner, and Fast 0°C corner. For other device families, two timing
corners are available: the Fast 0°C corner and the Slow 85°C corner.
By default, the Fitter optimizes constraints using only the slow corner timing model.
You can turn on the Optimize multi-corner timing option to instruct the Fitter to also
optimize constraints considering all available timing corners, at the cost of a slight
increase in runtime. By optimizing for all timing corners, you can create a design
implementation that is more robust across process, temperature, and voltage
variations. While optimizing for multi-corner timing, the Fitter chooses one of the two
slow corners that is known to have more critical timing (depending on the chosen
device), along with the fast corner. This option is available only for Arria, Cyclone,
HardCopy, MAX II, MAX V, and Stratix series devices.
Using the different timing models can be important to account for process, voltage,
and temperature variations for each device. Turning this option on increases
compilation time by approximately 10%.
For designs with external memory interfaces such as DDR and QDR, Altera
recommends that you turn on the Optimize multi-corner timing setting.
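As a sketch of how this option might appear in a project's settings file, the line below turns on multi-corner optimization. The assignment name is taken to be OPTIMIZE_MULTI_CORNER_TIMING, which should be checked against your software version.

```tcl
# Let the Fitter optimize constraints across all available timing corners
# instead of the slow corner only.
set_global_assignment -name OPTIMIZE_MULTI_CORNER_TIMING ON
```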
Fitter Effort Setting
Fitter effort refers to the amount of effort the Quartus II software uses to fit your
design. To set the Fitter effort, on the Assignments menu, click Settings. In the
Category list, select Fitter Settings. The Fitter effort settings are Auto Fit, Standard
Fit, and Fast Fit. The default setting depends on the device family specified. Auto Fit
is the default Fitter effort setting for all devices for which this option is available.
Auto Fit
The Auto Fit option (available for Arria, Cyclone, HardCopy, MAX II, MAX V, and
Stratix series devices) focuses the full Fitter effort only on those aspects of the design
that require further optimization. Auto Fit can significantly reduce compilation time
relative to Standard Fit if your design has easy-to-meet timing requirements, low
routing resource utilization, or both. However, those designs that require full
optimization generally receive the same effort as is achieved by selecting Standard
Fit.
If you want the Fitter to attempt to exceed the timing requirements by a certain
margin instead of simply meeting them, specify a minimum slack in the Desired
worst case slack box.
Note: Specifying a minimum slack does not guarantee that the Fitter achieves the slack
requirement; it only guarantees that the Fitter applies full optimization unless the
target slack is exceeded.
In some designs with multiple clocks, it might be possible to improve the timing
performance on one clock domain while reducing the performance on other clock
domains by over-constraining the most important clock. If you use this technique,
perform a sweep over multiple seeds to ensure that any performance improvements
that you see are real gains. For more information, refer to “Fitter Seed” on page 13–39.
Over-constraining the clock for which you require maximum slack, while using the
Auto Fit option, increases the chances that the Fitter is able to meet this requirement.
The Auto Fit option also causes the Quartus II Fitter to optimize for shorter
compilation times instead of maximum possible performance if the design includes
easy-to-achieve timing requirements.
If your design has aggressive timing requirements or is hard to route, the placement
does not stop early and the compilation time is the same as using the Standard Fit
option.
It is possible for the Auto Fit option to increase routing utilization. This can lead to an
increase in dynamic power when compared to using the Standard Fit option, unless
the Extra effort option in the PowerPlay power optimization list is also enabled.
When you turn on Extra effort, Auto Fit continues to optimize for reduction of
routing usage even after meeting the register-to-register requirement, and there is no
adverse effect on the dynamic power consumption relative to using Standard Fit. If
dynamic power consumption is a concern, select Extra effort in both the Analysis &
Synthesis Settings and the Fitter Settings pages.
For more details, refer to the “Power Driven Compilation” section in the Power
Optimization chapter in volume 2 of the Quartus II Handbook.
Standard Fit
Use the Standard Fit option to exceed specified timing requirements and achieve the
best possible timing results and lowest routing resource utilization for your design.
The Standard Fit setting usually increases compilation time relative to Auto Fit,
because it applies full optimization, regardless of the design requirement. In designs
with no timing assignments, on average, using the Standard Fit option results in an
fMAX about 10% higher than that achieved using the Auto Fit option. In designs where
timing requirements can be easily met, using the Standard Fit option can result in
considerably longer compilation times than using the Auto Fit option.
Fast Fit
The Fast Fit option reduces the amount of optimization effort for each algorithm
employed during fitting. This option reduces the compilation time by about 50%,
resulting in a fit that has, on average, 10% lower fMAX than that achieved using the
Standard Fit setting.
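The three Fitter effort levels can also be selected with a settings file assignment rather than through the Settings dialog box. The sketch below assumes the FITTER_EFFORT assignment name and the value strings shown.

```tcl
# Choose exactly one Fitter effort level for the project.
set_global_assignment -name FITTER_EFFORT "AUTO FIT"

# Full optimization regardless of the design requirements:
# set_global_assignment -name FITTER_EFFORT "STANDARD FIT"

# Reduced effort for shorter compilation time:
# set_global_assignment -name FITTER_EFFORT "FAST FIT"
```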
Limit to One Fitting Attempt
A design might fail to fit for several reasons, such as logic overuse or illegal
assignments. For most failures, the Quartus II software informs you of the problem.
However, if the design uses too much routing, the Quartus II software makes up to
two additional attempts to fit your design, increasing the Placement Effort Multiplier
each time. Each of these fit attempts takes significantly longer than the previous
attempt.
For large designs, you might not want to wait for all three fitting attempts to be
completed. To have the Quartus II software issue an error message after the first failed
attempt, turn on Limit to one fitting attempt on the Fitter Settings page.
For instructions about how to lower the design’s routing utilization, so your design
can be made to fit into the target device if it fails to fit due to the lack of routing
resources, refer to “Routing” on page 13–23.
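If you manage settings from a script, the same option can be sketched as a settings file assignment. The assignment name FIT_ONLY_ONE_ATTEMPT is an assumption to verify against your software version.

```tcl
# Stop with an error after the first failed fitting attempt instead of
# retrying with a higher Placement Effort Multiplier.
set_global_assignment -name FIT_ONLY_ONE_ATTEMPT ON
```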
Design Assistant
You can run the Design Assistant to analyze the post-fitting results of your design
during a full compilation. The Design Assistant checks rules related to gated clocks,
reset signals, asynchronous design practices, and signal race conditions. This is
especially useful during the early stages of your design, so that you can work on any
areas of concern in your design before proceeding with design optimization.
For more information about the Design Assistant, refer to About the Design Assistant
and Analyzing Designs with the Design Assistant in Quartus II Help.
Design Analysis
The initial compilation establishes whether the design achieves a successful fit and
meets the specified timing requirements. This section describes how to analyze your
design results in the Quartus II software.
Error and Warning Messages
After compiling your design, evaluate all error and warning messages to see if any
design or setting changes are required. If changes are required, make these changes
and recompile the design before proceeding with design optimization.
To suppress messages that you have already evaluated and do not want to see again,
right-click on the message in the Messages window and click Suppress.
For more information about message suppression, refer to the “Message Suppression”
section in the Managing Quartus II Projects chapter in volume 2 of the Quartus II
Handbook.
Ignored Timing Constraints
The Quartus II software ignores illegal, obsolete, and conflicting constraints.
You can view a list of ignored constraints by clicking Report Ignored Constraints in
the Reports menu in the TimeQuest GUI or by typing the following command to
generate a list of ignored timing constraints:
report_sdc -ignored -panel_name "Ignored Constraints"
If any constraints were ignored, analyze why they were ignored. If necessary, correct
the constraints and recompile the design before proceeding with design optimization.
For more information about the report_sdc command and its options, refer to the
Quartus II TimeQuest Timing Analyzer chapter in volume 3 of the Quartus II Handbook.
Resource Utilization
Determining device utilization is important regardless of whether a successful fit is
achieved. If your compilation results in a no-fit error, resource utilization information
is important for analyzing the fitting problems in your design. If your fitting is
successful, review the resource utilization information to determine whether the
future addition of extra logic or other design changes might introduce fitting
difficulties. Also, review the resource utilization information to determine if it is
impacting timing performance.
To determine resource usage, refer to the Flow Summary section of the Compilation
Report. This section reports how many resources are used, including pins, memory
bits, digital signal processing (DSP) blocks, and phase-locked loops (PLLs). The Flow Summary
indicates whether the design exceeds the available device resources. More detailed
information is available by viewing the reports under Resource Section in the Fitter
section of the Compilation Report.
The Flow Summary shows the overall logic utilization, and also individual utilization
for combinational ALUTs, memory ALUTs, and registers. The overall logic utilization
can be higher than the individual combinational logic or register utilization numbers
indicate. This is because the Fitter can use adaptive look-up tables (ALUTs) in
different ALMs, even when the logic can be placed within one ALM, to achieve the
best timing and routing results. The Fitter can spread logic throughout the
device, which may lead to higher overall utilization.
As the device fills up, the Fitter automatically searches for logic functions with
common inputs to place in one ALM. The number of partnered ALUTs and packed
registers also increases. Therefore, a design that has high overall utilization might still
have space for extra logic if logic and registers can be packed together more
aggressively.
The reports under Resource Section in the Fitter section of the Compilation Report
provide more detailed resource information. The Fitter Resource Usage Summary
report breaks down the logic utilization information, indicates the number of fully
and partially used ALMs, and provides other resource information, including the
number of bits in each type of memory block. This panel also contains a summary of
the usage of global clocks, PLLs, DSP blocks, and other device-specific resources.
You can also view reports describing some of the optimizations that occurred during
compilation. For example, if you are using Quartus II integrated synthesis, the reports
in the Optimization Results folder in the Analysis & Synthesis section include
information about registers that were removed during synthesis. When you estimate
device resource utilization for a partial design, use this report to verify that registers
were not removed because of missing connections to other parts of the design.
If a specific resource usage is reported as less than 100% and a successful fit cannot be
achieved, either there are not enough routing resources, or some assignments are
illegal. In either case, a message appears in the Processing tab of the Messages
window describing the problem.
If the Fitter finishes unsuccessfully and runs much faster than on similar designs, a
resource might be over-utilized or there might be an illegal assignment. If the
Quartus II software seems to run for an excessively long time compared to runs on
similar designs, a legal placement or route probably cannot be found. In the
Compilation Report, look for errors and warnings that indicate these types of
problems.
For more information about how to get a quick error message on hard-to-fit designs,
refer to “Limit to One Fitting Attempt” on page 13–9.
You can use the Chip Planner to find areas of the device that have routing congestion
on specific types of routing resources. If you find areas with very high congestion,
analyze the cause of the congestion. Issues such as high fan-out nets not using global
resources, an improperly chosen optimization goal (speed versus area), very
restrictive floorplan assignments, or the coding style can cause routing congestion.
After you identify the cause, modify the source or settings to reduce routing
congestion.
For information about how to view routing congestion, refer to Displaying Resources
and Information in Quartus II Help.
For details about using the Chip Planner tool, refer to the Analyzing and Optimizing the
Design Floorplan chapter in volume 2 of the Quartus II Handbook and About the Chip
Planner in Quartus II Help.
I/O Timing (Including tPD)
The TimeQuest analyzer supports the Synopsys Design Constraints (SDC) format for
constraining your design. When using the TimeQuest analyzer for timing analysis,
use the set_input_delay constraint to specify the data arrival time at an input port
with respect to a given clock. For output ports, use the set_output_delay constraint
to specify the data required time at an output port with respect to a given
clock. You can use the report_timing Tcl command to generate the I/O timing
reports.
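The following SDC sketch shows how these constraints and reports might fit together. The port names (clk, data_in, data_out), the clock name sys_clk, and all delay values are hypothetical placeholders. The output delays assume negligible board delay, so that the maximum equals the receiver's setup time and the minimum equals the negative of its hold time.

```tcl
# Hypothetical 100 MHz clock on port clk.
create_clock -name sys_clk -period 10.0 [get_ports clk]

# The external device drives data_in between 1.0 ns and 4.0 ns
# after the clock edge.
set_input_delay -clock sys_clk -max 4.0 [get_ports data_in]
set_input_delay -clock sys_clk -min 1.0 [get_ports data_in]

# The external receiver needs 2.5 ns of setup and 0.5 ns of hold.
set_output_delay -clock sys_clk -max 2.5 [get_ports data_out]
set_output_delay -clock sys_clk -min -0.5 [get_ports data_out]

# Report the worst setup paths into and out of the device.
report_timing -setup -from [get_ports data_in] -npaths 10 \
    -detail full_path -panel_name "Input Setup"
report_timing -setup -to [get_ports data_out] -npaths 10 \
    -detail full_path -panel_name "Output Setup"
```

Paths that appear with negative slack in either panel are the I/O paths that do not meet the constraints.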
The I/O paths that do not meet the required timing performance are reported as
having negative slack and are highlighted in red in the TimeQuest analyzer Report
pane. In cases where you do not apply an explicit I/O timing constraint to an I/O pin,
the Quartus II timing analysis software still reports the Actual number, which is the
timing number that must be met for that timing parameter when the device runs in
your system.
For more information about how timing numbers are calculated, refer to the
Quartus II TimeQuest Timing Analyzer chapter in volume 3 of the Quartus II Handbook.
Register-to-Register Timing
This section contains the following sections:
■ “Timing Analysis with the TimeQuest Timing Analyzer”
■ “Tips for Analyzing Failing Paths” on page 13–14
■ “Tips for Analyzing Failing Clock Paths that Cross Clock Domains” on page 13–14
Timing Analysis with the TimeQuest Timing Analyzer
If you are using the TimeQuest analyzer, analyze all valid register-to-register paths by
using appropriate constraints. Use the report_timing command to generate the
required timing reports for any register-to-register path. Your design meets timing
requirements when you do not have negative slack on any register-to-register path on
any of the clock domains.
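As an illustration, the commands below request setup and hold reports restricted to register-to-register paths. The path count and the panel names are arbitrary choices, not values from this handbook.

```tcl
# Worst 20 register-to-register setup paths, with full path detail.
report_timing -setup -from [get_registers *] -to [get_registers *] \
    -npaths 20 -detail full_path -panel_name "Reg-to-Reg Setup"

# The same selection for hold analysis.
report_timing -hold -from [get_registers *] -to [get_registers *] \
    -npaths 20 -detail full_path -panel_name "Reg-to-Reg Hold"
```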
When you select a path listed in the TimeQuest Report Timing pane, the tabs in the
corresponding path detail pane show a path summary of source and destination
registers and their timing, statistics about the path delay, detailed information about
the complete data path with all nodes in the path and the waveforms of the relevant
signals (Figure 13–2). To locate a selected path in the Chip Planner or the Technology
Map Viewer by using the shortcut menu, right-click on a path, point to Locate, and
click Locate in Chip Planner. The Chip Planner appears with the path highlighted.
Similarly, if you know that a path is not a valid path, you can set it to be a false path
using the shortcut menu.
To see the path details of any selected path, click on the Data Path tab in the path
details pane. This displays the details of the Data Arrival Path, as well as the Data
Required Path. For a graphical view of the information, click on the Waveform tab.
You can locate critical paths in the Chip Planner from the TimeQuest timing analysis
report panel.
Figure 13–2. TimeQuest Analyzer GUI
For more information about how timing analysis results are calculated, refer to the
Quartus II TimeQuest Timing Analyzer chapter in volume 3 of the Quartus II Handbook.
You also can see the logic in a particular path by locating the logic in the RTL Viewer
or Technology Map Viewer. These viewers allow you to see a gate-level or
technology-mapped representation of your design netlist. To locate a timing path in
one of the viewers, right-click on a path in the report, point to Locate, and click Locate
in RTL Viewer or Locate in Technology Map Viewer. When you locate a timing path
in the Technology Map Viewer, the annotated schematic displays the same delay
information that is shown when you use the List Paths command.
For more information about netlist viewers, refer to the Analyzing Designs with
Quartus II Netlist Viewers chapter in volume 1 of the Quartus II Handbook.
Tips for Analyzing Failing Paths
When you are analyzing clock path failures, examine reports and waveforms to
determine if the correct constraints are being applied, and add multicycle or false
paths as appropriate.
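Multicycle and false path exceptions of the kind mentioned above are written in SDC. In this sketch the register names are hypothetical, and the multicycle value of 2 assumes a destination that captures data only every second clock cycle.

```tcl
# Data from slow_src to slow_dst is captured every second clock cycle,
# so relax the setup check to two cycles.
set_multicycle_path -setup -end 2 \
    -from [get_registers {slow_src[*]}] -to [get_registers {slow_dst[*]}]

# Move the hold check back by one cycle so that it stays at the launch edge.
set_multicycle_path -hold -end 1 \
    -from [get_registers {slow_src[*]}] -to [get_registers {slow_dst[*]}]

# A path that is never functionally exercised can be excluded entirely.
set_false_path -from [get_registers cfg_static_reg] \
    -to [get_registers status_reg]
```

Apply such exceptions only after confirming from the design's function that they are valid, because they remove the affected paths from the default analysis.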
Focus on improving the paths that show the worst slack. The Fitter works hardest on
paths with the worst slack. If you fix these paths, the Fitter might be able to improve
the other failing timing paths in the design.
Check for particular nodes that appear in many failing paths. Look for paths that have
common source registers, destination registers, or common intermediate
combinational nodes. In some cases, the registers might not be identical, but are part
of the same bus. In the timing analysis report panels, clicking on the From or To
column headings can be helpful to sort the paths by the source or destination
registers. Clicking first on From, then on To, uses the registers in the To column as the
primary sort and From as the secondary sort. If you see common nodes, these nodes
indicate areas of your design that might be improved through source code changes or
Quartus II optimization settings. Constraining the placement for just one of the paths
might decrease the timing performance for other paths by moving the common node
further away in the device.
Tips for Analyzing Failing Clock Paths that Cross Clock Domains
When analyzing clock path failures, check whether these paths cross between two
clock domains. This is the case if the From Clock and To Clock in the timing analysis
report are different. There can also be paths that involve a different clock in the
middle of the path, even if the source and destination register clock are the same. To
analyze these paths in more detail, right-click on the entry in the report and click List
Paths.
Expand the List Paths entry in the Messages window and analyze the largest
register-to-register requirement. Evaluate the setup relationship between the source
and destination (launch edge and latch edge) to determine if that is reducing the
available setup time. For example, the path can start at a rising edge and end at a
falling edge, which reduces the setup relationship by one half clock cycle.
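The half-cycle effect can be checked with simple arithmetic. The Tcl sketch below assumes a single clock with a 10 ns period and a 50% duty cycle, both of which are illustrative values.

```tcl
# Clock period in ns (placeholder value).
set period 10.0

# Launch on a rising edge at time 0.
set launch_edge 0.0

# Latch on the next rising edge, or on the next falling edge
# of a 50% duty-cycle clock.
set latch_rising  [expr {$launch_edge + $period}]
set latch_falling [expr {$launch_edge + $period / 2.0}]

puts "rise-to-rise setup relationship: [expr {$latch_rising - $launch_edge}] ns"
puts "rise-to-fall setup relationship: [expr {$latch_falling - $launch_edge}] ns"
```

With these values the rise-to-fall relationship is half of the rise-to-rise relationship, so the combinational logic on such a path has only half a period in which to settle.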
Check to see if the PLL phase shift is reducing the setup requirement. You might be
able to adjust this using PLL parameters and settings.
Paths that cross clock domains are generally protected with synchronization logic (for
example, FIFOs or double-data synchronization registers) to allow asynchronous
interaction between the two clock domains. In such cases, you can ignore the timing
paths between registers in the two clock domains while running timing analysis, even
if the clocks are related.
The Fitter attempts to optimize all failing timing paths. If there are paths that can be
ignored for optimization and timing analysis, but the paths do not have constraints
that instruct the Fitter to ignore them, the Fitter tries to optimize those paths as well.
In some cases, optimizing unnecessary paths can prevent the Fitter from meeting the
timing requirements on timing paths that are critical to the design. It is beneficial to
specify all paths that can be ignored, so that the Fitter can put more effort into the
paths that must meet their timing requirements instead of optimizing paths that can
be ignored.
For more details about how to ignore timing paths that cross clock domains, refer to
the Quartus II TimeQuest Timing Analyzer chapter in volume 3 of the Quartus II
Handbook.
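One common way to tell both the Fitter and the TimeQuest analyzer to ignore paths between two domains is an SDC clock group or false path. The clock names below are hypothetical, and the sketch assumes that every crossing between the two domains is protected by synchronization logic.

```tcl
# Cut all paths, in both directions, between the two clock domains.
set_clock_groups -asynchronous -group {clk_a} -group {clk_b}

# Alternative: cut one direction only.
# set_false_path -from [get_clocks clk_a] -to [get_clocks clk_b]
```

The clock group form is usually more compact when several clocks are involved, because each additional domain needs only one more -group argument.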
Evaluate the clock skew between the source clock and the destination clock to
determine if that is reducing the available setup time. You can check the shortest and
longest clock path reports to see what is causing the clock skew. Avoid using
combinational logic in clock paths because it contributes to clock skew. Differences in
the logic or in its routing between the source and destination can cause clock skew
problems and result in warnings during compilation.
Global Routing Resources
Global routing resources are designed to distribute high-fan-out, low-skew signals
(such as clocks) without consuming regular routing resources. Depending on the
device, these resources can span the entire chip, or some smaller portion, such as a
quadrant. The Quartus II software attempts to assign signals to global routing
resources automatically, but you might be able to make more suitable assignments
manually.
For details about the number and types of global routing resources available, refer to
the relevant device handbook.
Check the global signal utilization in your design to ensure that appropriate signals
have been placed on global routing resources. In the Compilation Report, open the
Fitter report and click the Resource Section. Analyze the Global & Other Fast Signals
and Non-Global High Fan-out Signals reports to determine whether any changes are
required.
You might be able to reduce clock skew for high fan-out signals by placing them on
global routing resources. Conversely, you can reduce the insertion delay of low
fan-out signals by removing them from global routing resources. Doing so can
improve clock enable timing and control signal recovery/removal timing, but
increases clock skew. Use the Global Signal setting in the Assignment Editor to
control global routing resources.
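The Global Signal setting made in the Assignment Editor is stored as an instance assignment. The sketch below uses hypothetical signal names, and the value strings are assumptions to confirm against the options offered for your device family.

```tcl
# Promote a high fan-out reset onto a global clock network.
set_instance_assignment -name GLOBAL_SIGNAL "GLOBAL CLOCK" -to rst_n

# Keep a low fan-out enable on regular routing to reduce insertion delay.
set_instance_assignment -name GLOBAL_SIGNAL OFF -to slow_enable
```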
Resource Utilization Optimization Techniques (LUT-Based Devices)
After design analysis, the next stage of design optimization is to improve resource
utilization. Complete this stage before proceeding to I/O timing optimization or
register-to-register timing optimization. Ensure that you have already set the basic
constraints described in “Initial Compilation: Required Settings” on page 13–2 before
proceeding with the resource utilization optimizations discussed in this section. If a
design does not fit into a specified device, use the techniques in this section to achieve
a successful fit. After you optimize resource utilization and your design fits in the
desired target device, optimize I/O timing as described in “I/O Timing Optimization
Techniques (LUT-Based Devices)” on page 13–55. These tips are valid for all FPGA
families and the MAX II family of CPLDs.
Using the Resource Optimization Advisor
The Resource Optimization Advisor provides guidance in determining settings that
optimize the resource usage. To run the Resource Optimization Advisor, on the Tools
menu, point to Advisors, and click Resource Optimization Advisor.
The Resource Optimization Advisor provides step-by-step advice about how to
optimize the resource usage (logic element, memory block, DSP block, I/O, and
routing) of your design. Some of the recommendations in these categories might
conflict with each other. Altera recommends evaluating the options and choosing the
settings that best suit your requirements.
Resolving Resource Utilization Issues Summary
Resource utilization issues can be divided into the following three categories:
■ Issues relating to I/O pin utilization or placement, including dedicated I/O blocks
such as PLLs or LVDS transceivers (refer to “I/O Pin Utilization or Placement”).
■ Issues relating to logic utilization or placement, including logic cells containing
registers and look-up tables as well as dedicated logic, such as memory blocks and
DSP blocks (refer to “Logic Utilization or Placement” on page 13–17).
■ Issues relating to routing (refer to “Routing” on page 13–23).
I/O Pin Utilization or Placement
Use the suggestions in the following sections to help you resolve I/O resource
problems.
Use I/O Assignment Analysis
On the Processing menu, point to Start and click Start I/O Assignment Analysis to
help with pin placement. The Start I/O Assignment Analysis command allows you to
check your I/O assignments early in the design process. You can use this command to
check the legality of pin assignments before, during, or after compilation of your
design. If design files are available, you can use this command to accomplish more
thorough legality checks on your design’s I/O pins and surrounding logic. These
checks include proper reference voltage pin usage, valid pin location assignments,
and acceptable mixed I/O standards.
Common issues with I/O placement relate to the fact that differential standards have
specific pin pairings, and certain I/O standards might be supported only on certain
I/O banks.
If your compilation or I/O assignment analysis results in specific errors relating to
I/O pins, follow the recommendations in the error message. Right-click on the
message in the Messages window and click Help to open the Quartus II Help topic for
this message.
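I/O assignment analysis can also be started from a Tcl script, which is convenient for checking pin assignments in batch mode. The sketch below assumes the execute_flow command of the Quartus II flow package with its -check_ios option. The project name my_project is hypothetical.

```tcl
# Run with: quartus_sh -t check_ios.tcl
load_package flow

# Open the (hypothetical) project and run I/O assignment analysis only.
project_open my_project
execute_flow -check_ios
project_close
```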
Modify Pin Assignments or Choose a Larger Package
If a design that has pin assignments fails to fit, compile the design without the pin
assignments to determine whether a fit is possible for the design in the specified
device and package. You can use this approach if a Quartus II error message indicates
fitting problems due to pin assignments.
If the design fits when all pin assignments are ignored or when several pin
assignments are ignored or moved, you might have to modify the pin assignments for
the design or select a larger package.
If the design fails to fit because insufficient I/Os are available, a successful fit can
often be obtained by using a larger device package (which can be the same device
density) that has more available user I/O pins.
For more information about I/O assignment analysis, refer to the I/O Management
chapter in volume 2 of the Quartus II Handbook.
Logic Utilization or Placement
Use the suggestions in the following subsections to help you resolve logic resource
problems, including logic cells containing registers and lookup tables (LUTs), as well
as dedicated logic such as memory blocks and DSP blocks.
Optimize Source Code
If your design does not fit because of logic utilization, evaluate whether you can modify
the design at the source to achieve the desired results. You can often improve logic
utilization significantly by making design-specific changes to your source code. This is
typically the most effective technique for improving the quality of your results.
If your design does not fit into available LEs or ALMs, but you have unused memory
or DSP blocks, check to see if you have code blocks in your design that describe
memory or DSP functions that are not being inferred and placed in dedicated logic.
You might be able to modify your source code to allow these functions to be placed
into dedicated memory or DSP resources in the target device.
Ensure that your state machines are recognized as state machine logic and optimized
appropriately in your synthesis tool. State machines that are recognized are generally
optimized better than if the synthesis tool treats them as generic logic. In the
Quartus II software, you can check for the State Machine report under Analysis &
Synthesis in the Compilation Report. This report provides details, including the state
encoding for each state machine that was recognized during compilation. If your state
machine is not being recognized, you might have to change your source code to
enable it to be recognized.
For coding style guidelines, including examples of HDL code for inferring memory
and DSP functions, refer to the “Instantiating Altera Megafunctions” and the
“Inferring Multiplier and DSP Functions from HDL Code” sections of the
Recommended HDL Coding Styles chapter in volume 1 of the Quartus II Handbook. For
guidelines and sample HDL code for state machines, refer to the “General Coding
Guidelines” section of the Recommended HDL Coding Styles chapter in volume 1 of the
Quartus II Handbook.
For additional HDL coding examples, refer to AN 584: Timing Closure Methodology for
Advanced FPGAs.
Optimize Synthesis for Area, Not Speed
If your design fails to fit because it uses too much logic, resynthesize the design to
improve the area utilization. First, ensure that you have set your device and timing
constraints correctly in your synthesis tool. Particularly when area utilization of the
design is a concern, ensure that you do not over-constrain the timing requirements for
the design. Synthesis tools generally try to meet the specified requirements, which can
result in higher device resource usage if the constraints are too aggressive.
If resource utilization is an important concern, some synthesis tools offer an easy way
to optimize for area instead of speed. If you are using Quartus II integrated synthesis,
select Balanced or Area for the Optimization Technique. You can also specify this
logic option for specific modules in your design with the Assignment Editor when you
want to reduce area with the Area setting (potentially at the expense of
register-to-register timing performance). In that case, you can leave the default
Optimization Technique setting at Balanced (for the best trade-off between area and
speed for certain device families) or Speed. You can also use the Speed Optimization
Technique for Clock Domains logic option to specify that all combinational logic in
or between the specified clock domain(s) is optimized for speed.
In some synthesis tools, not specifying an fMAX requirement can result in less resource
utilization.
Note: In the Quartus II software, the Balanced setting typically produces utilization results
that are very similar to those produced by the Area setting, with better performance
results. The Area setting can give better results in some cases.
For information about setting timing requirements and synthesis options in
Quartus II integrated synthesis and other synthesis tools, refer to the appropriate
chapter in Section III. Synthesis in volume 1 of the Quartus II Handbook, or your
synthesis software’s documentation.
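As an illustration, the Optimization Technique settings described above could be scripted roughly as follows. This is a sketch that assumes a project is already open in the Quartus II Tcl console (the same lines can also be placed in the project's .qsf file). The instance path dma_ctrl:u_dma is a hypothetical name, and the exact assignment name may differ for some device families, so confirm it for your software version before relying on it.

```tcl
# Project-wide default: let synthesis balance area against speed.
set_global_assignment -name OPTIMIZATION_TECHNIQUE BALANCED

# Hypothetical instance: optimize only this block for area,
# possibly at the expense of its register-to-register timing.
set_instance_assignment -name OPTIMIZATION_TECHNIQUE AREA -to "dma_ctrl:u_dma"
```

Limiting the Area setting to one block keeps the timing impact local while the rest of the design continues to use the project-wide setting.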
The Quartus II software provides additional attributes and options that can help
improve the quality of your synthesis results.
Restructure Multiplexers
Multiplexers form a large portion of the logic utilization in many FPGA designs. By
optimizing your multiplexed logic, you can achieve a more efficient implementation
in your Altera device.
For more information about this option, refer to Restructure Multiplexers logic option in
Quartus II Help.
For design guidelines to achieve optimal resource utilization for multiplexer designs,
refer to the Recommended HDL Coding Styles chapter in volume 1 of the Quartus II
Handbook.
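The Restructure Multiplexers option can also be set from a script. The following is a minimal sketch, assuming an open project and that the option is exposed under the MUX_RESTRUCTURE assignment name in your software version; the instance path is hypothetical.

```tcl
# Ask synthesis to restructure multiplexers for the whole project.
set_global_assignment -name MUX_RESTRUCTURE ON

# Hypothetical instance: leave the multiplexers in this block untouched,
# for example because the block is timing critical.
set_instance_assignment -name MUX_RESTRUCTURE OFF -to "crossbar:u_xbar"
```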
Perform WYSIWYG Primitive Resynthesis with Balanced or Area Setting
For information about this logic option, refer to Perform WYSIWYG Primitive
Resynthesis logic option in Quartus II Help.
Note: The Balanced setting typically produces utilization results that are very similar to the
Area setting with better performance results. The Area setting can give better results
in some cases. Performing WYSIWYG resynthesis for area in this way typically
reduces register-to-register timing performance.
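A scripted version of this technique might look like the following sketch. It assumes an open project and that Perform WYSIWYG Primitive Resynthesis corresponds to the ADV_NETLIST_OPT_SYNTH_WYSIWYG_REMAP assignment in your software version; verify both names before use.

```tcl
# Resynthesize WYSIWYG primitives from third-party netlists or megafunctions.
set_global_assignment -name ADV_NETLIST_OPT_SYNTH_WYSIWYG_REMAP ON

# The resynthesis follows the Optimization Technique setting, so select
# Area (or Balanced) to target lower resource usage.
set_global_assignment -name OPTIMIZATION_TECHNIQUE AREA
```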
Use Register Packing
The Auto Packed Registers option implements the functions of two cells into one
logic cell by combining the register of one cell in which only the register is used with
the LUT of another cell in which only the LUT is used. Figure 13–3 shows register
packing and the gain of one logic cell in the design.
Figure 13–3. Register Packing
Registers can also be packed into DSP blocks (Figure 13–4).
Figure 13–4. Register Packing in DSP Blocks
The following list shows the most common cases in which register packing helps to
optimize a design:
■ A LUT can be implemented in the same cell as an unrelated register with a single data input
■ A LUT can be implemented in the same cell as the register that is fed by the LUT
■ A LUT can be implemented in the same cell as the register that feeds the LUT
■ A register can be packed into a RAM block
■ A register can be packed into a DSP block
■ A register can be packed into an I/O Element (IOE)
For more information, refer to Auto Packed Registers logic option in Quartus II Help.
Remove Fitter Constraints
A design with conflicting constraints or constraints that are difficult to meet may not
fit in the targeted device. This can occur when the location or LogicLock assignments
are too strict and not enough routing resources are available on the device.
In this case, use the Routing Congestion task in the Chip Planner to locate routing
problems in the floorplan, then remove any location or LogicLock region assignments
in that area. If your design still does not fit, the design is over-constrained. To correct
the problem, remove all location and LogicLock assignments and run successive
compilations, incrementally constraining the design before each compilation. You can
delete specific location assignments in the Assignment Editor or the Chip Planner. You
can remove LogicLock assignments in the Chip Planner or in the LogicLock Regions
window. Alternatively, on the Assignments menu, click Remove Assignments, and then
turn on the assignment categories you want to remove from the design in the Available
assignment categories list.
For more information about the Routing Congestion task in the Chip Planner, refer to
the Analyzing and Optimizing the Design Floorplan chapter in volume 2 of the Quartus II
Handbook.
Change State Machine Encoding
State machines can be encoded using various techniques. Using binary or gray code
encoding typically results in fewer state registers than one-hot encoding, which
requires one register for every state. If your design contains state machines,
changing the state machine encoding to one that uses the minimal number of registers
may reduce resource utilization. The effect of state machine encoding varies
depending on the way your design is structured.
If your design does not manually encode the state bits, you can specify the state
machine encoding in your synthesis tool. When using Quartus II integrated synthesis,
select the Minimal Bits setting for the State Machine Processing option.
For more information, refer to State Machine Processing logic option in Quartus II Help.
You can also specify this logic option for specific modules or state machines in your
design with the Assignment Editor.
You can also use the following Tcl command in scripts to modify the state machine
encoding.
set_global_assignment -name state_machine_processing <value>
In this case, <value> can be AUTO, ONE-HOT, "MINIMAL BITS", or USER-ENCODED. Enclose a
value that contains a space in quotation marks.
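For example, the command might be used as in the following sketch, which assumes an open project. The instance path pkt_parser:u_parser is a hypothetical name used only for illustration.

```tcl
# Encode all recognized state machines with the minimum number of state bits.
# The value contains a space, so it must be quoted in Tcl.
set_global_assignment -name STATE_MACHINE_PROCESSING "MINIMAL BITS"

# Hypothetical instance: keep one-hot encoding for one speed-critical block.
set_instance_assignment -name STATE_MACHINE_PROCESSING "ONE-HOT" -to "pkt_parser:u_parser"
```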
Flatten the Hierarchy During Synthesis
Synthesis tools typically provide the option of preserving hierarchical boundaries,
which can be useful for verification or other purposes. However, optimizing across
hierarchical boundaries allows the synthesis tool to perform the most logic
minimization, which can reduce area. Therefore, to achieve the best results, flatten
your design hierarchy whenever possible.
If you are using Quartus II incremental compilation, you cannot flatten your design
across design partitions. Incremental compilation always preserves the hierarchical
boundaries between design partitions. Follow Altera’s recommendations for design
partitioning, such as registering partition boundaries, to reduce the effect of
losing cross-boundary optimizations.
For more information about using incremental compilation and recommendations for
design partitioning, refer to the Quartus II Incremental Compilation for Hierarchical and
Team-Based Design chapter in volume 1 of the Quartus II Handbook.
Retarget Memory Blocks
If your design fails to fit because it runs out of device memory resources, your design
may require a certain type of memory the device does not have. For example, a design
that requires two M-RAM blocks cannot be targeted to a Stratix EP1S10 device, which
has only one M-RAM block. You might be able to obtain a fit by building one of the
memories with a different size memory block, such as an M4K memory block.
If the memory block was created with the MegaWizard™ Plug-In Manager, open the
MegaWizard Plug-In Manager and edit the RAM block type so it targets a new
memory block size.
ROM and RAM memory blocks can also be inferred from your HDL code, and your
synthesis software can place large shift registers into memory blocks by inferring the
ALTSHIFT_TAPS megafunction. This inference can be turned off in your synthesis
tool to cause the memory or shift registers to be placed in logic instead of in memory
blocks. Also, for improved timing performance, you can turn this inference off to
prevent registers from being moved into RAM.
For more information, refer to Auto RAM Replacement logic option, Auto ROM
Replacement logic option, and Auto Shift Register Replacement logic option in Quartus II
Help.
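The inference options named above can be turned off from a script. The sketch below assumes an open project and that the three logic options map to the AUTO_RAM_RECOGNITION, AUTO_ROM_RECOGNITION, and AUTO_SHIFT_REGISTER_RECOGNITION assignments in your software version; the instance path is hypothetical.

```tcl
# Hypothetical instance: keep this block's memories and shift registers
# in logic cells instead of memory blocks.
set_instance_assignment -name AUTO_RAM_RECOGNITION OFF -to "tap_filter:u_fir"
set_instance_assignment -name AUTO_ROM_RECOGNITION OFF -to "tap_filter:u_fir"
set_instance_assignment -name AUTO_SHIFT_REGISTER_RECOGNITION OFF -to "tap_filter:u_fir"
```

Applying the assignments to one instance rather than globally avoids moving every inferred memory in the design into logic, which could sharply increase logic utilization.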
Depending on your synthesis tool, you can also set the RAM block type for inferred
memory blocks. In Quartus II integrated synthesis, set the ramstyle attribute to the
desired memory type for the inferred RAM blocks, or set the option to logic, to
implement the memory block in standard logic instead of a memory block.
Consider the resource utilization by hierarchy in the report file, and determine
whether there is an unusually high register count in any of the modules. Some coding
styles can prevent the Quartus II software from inferring RAM blocks from the source
code because of the architectural implementation of the RAM blocks, which forces the
software to implement the logic in flip-flops. For example, an asynchronous reset on a
register bank might make the bank incompatible with the RAM blocks in the device
architecture, so that the register bank is implemented in flip-flops. It is often
possible to move a large register bank into RAM by slightly modifying the associated
logic.
For more information about memory inference control in other synthesis tools, refer to
the appropriate chapter in Section III. Synthesis in volume 1 of the Quartus II Handbook,
or your synthesis software’s documentation. For more information about coding
styles and HDL examples that ensure memory inference, refer to the Recommended
HDL Coding Styles chapter in volume 1 of the Quartus II Handbook.
Use Physical Synthesis Options to Reduce Area
The physical synthesis options for fitting can help you decrease the resource usage.
When you enable these settings for physical synthesis for fitting, the Quartus II
software makes placement-specific changes to the netlist that reduce resource
utilization for a specific Altera device.
Note: The compilation time might increase considerably when you use physical synthesis
options.
With the Quartus II software, you can apply physical synthesis options to specific
instances, which can reduce the impact on compilation time. Physical synthesis
instance assignments allow you to enable physical synthesis algorithms for specific
portions of your design.
The following physical synthesis optimizations for fitting are available:
■ Physical synthesis for combinational logic
■ Map logic into memory
For more information, refer to Physical Synthesis Optimizations Page (Settings Dialog
Box) in Quartus II Help.
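The two physical synthesis optimizations for fitting listed above could be enabled from a script along these lines. This sketch assumes an open project and the assignment names shown, which you should confirm for your software version.

```tcl
# Physical synthesis for combinational logic (area-oriented).
set_global_assignment -name PHYSICAL_SYNTHESIS_COMBO_LOGIC_FOR_AREA ON

# Map logic into unused memory blocks.
set_global_assignment -name PHYSICAL_SYNTHESIS_MAP_LOGIC_TO_MEMORY_FOR_AREA ON
```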
Retarget or Balance DSP Blocks
A design might not fit because it requires too many DSP blocks. All DSP block
functions can be implemented with logic cells, so you can retarget some of the DSP
blocks to logic to obtain a fit.
If the DSP function was created with the MegaWizard Plug-In Manager, open the
MegaWizard Plug-In Manager and edit the function so it targets logic cells instead of
DSP blocks. The Quartus II software uses the DEDICATED_MULTIPLIER_CIRCUITRY
megafunction parameter to control the implementation.
DSP blocks also can be inferred from your HDL code for multipliers, multiply-adders,
and multiply-accumulators. This inference can be turned off in your synthesis tool.
When you are using Quartus II integrated synthesis, you can disable inference by
turning off the Auto DSP Block Replacement logic option for your entire project. On
the Assignments menu, click Settings. In the Category list, select Analysis &
Synthesis Settings, click More Settings, and turn off Auto DSP Block Replacement.
Alternatively, you can disable the option for a specific block with the Assignment
Editor.
For more information about disabling DSP block inference in other synthesis tools,
refer to the appropriate chapter in Section III. Synthesis in volume 1 of the Quartus II
Handbook, or your synthesis software’s documentation.
The Quartus II software also offers the DSP Block Balancing logic option, which
implements DSP block elements in logic cells or in different DSP block modes. The
default Auto setting allows DSP block balancing to convert the DSP block slices
automatically as appropriate to minimize the area and maximize the speed of the
design. You can use other settings for a specific node or entity, or on a project-wide
basis, to control how the Quartus II software converts DSP functions into logic cells
and DSP blocks. Using any value other than Auto or Off overrides the
DEDICATED_MULTIPLIER_CIRCUITRY parameter used in megafunction variations.
For more details about the Quartus II logic options described in this section, refer to
Auto DSP Block Replacement and DSP Block Balancing in Quartus II Help.
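As a rough scripted equivalent of the steps above, the following sketch assumes an open project and that Auto DSP Block Replacement and DSP Block Balancing correspond to the AUTO_DSP_RECOGNITION and DSP_BLOCK_BALANCING assignments. The instance path is hypothetical.

```tcl
# Hypothetical instance: build this block's inferred multipliers from
# logic cells to free DSP blocks for the rest of the design.
set_instance_assignment -name AUTO_DSP_RECOGNITION OFF -to "scaler:u_scaler"

# Keep the default balancing behavior for the project.
set_global_assignment -name DSP_BLOCK_BALANCING AUTO
```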
Use a Larger Device
If a successful fit cannot be achieved because of a shortage of LEs or ALMs, memory,
or DSP blocks, you might require a larger device.
Routing
Use the suggestions in the following subsections to help you resolve routing resource
problems.
Set Auto Packed Registers to Sparse or Sparse Auto
This option is useful for reducing LE or ALM count in a design. This option is
available for all Altera devices supported by the Quartus II software.
You can set this option in the Assignment Editor, or by clicking More Settings on the
Fitter Settings page of the Settings dialog box.
For more information, refer to Auto Packed Registers in Quartus II Help.
Set Fitter Aggressive Routability Optimizations to Always
Use this option if your design does not fit due to excessive routing wire utilization.
For more information, refer to Fitter Aggressive Routability Optimizations logic option in
Quartus II Help.
If there is a significant imbalance between placement and routing time (during the
first fitting attempt), it might be because of high wire utilization. By turning on this
option, you might be able to reduce your compilation time.
On average, this option can save up to 6% wire utilization, but can also reduce
performance by up to 4%, depending on the device.
These optimizations are used automatically when the Fitter performs more than one
fitting attempt, but turning the option on increases the optimization effort on the first
fitting attempt. This option also ensures that the Quartus II software uses maximum
optimization to improve routability, even if the Fitter Effort is set to Auto Fit.
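In a script, this option might be set as in the following sketch, which assumes an open project and the assignment name shown.

```tcl
# Apply the aggressive routability optimizations on the first fitting
# attempt instead of waiting for a failed attempt.
set_global_assignment -name FITTER_AGGRESSIVE_ROUTABILITY_OPTIMIZATION ALWAYS
```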
Increase Placement Effort Multiplier
Increasing the placement effort can improve the routability of the design, allowing the
software to route a design that otherwise requires too many routing resources.
For more information, refer to Placement Effort Multiplier logic option in Quartus II Help.
Increased effort is used automatically when the Fitter performs more than one fitting
attempt. Setting a multiplier higher than one (before compilation) increases the
optimization effort on the first fitting attempt. The second and third fitting loops
increase the Placement Effort Multiplier to 4 and then to 16. These loops result in
increased compilation times, with possible improvement in the quality of placement.
You can modify the Placement Effort Multiplier using the following Tcl command:
set_global_assignment -name PLACEMENT_EFFORT_MULTIPLIER <value>
<value> can be any positive number.
Increasing placement effort is likely to reduce congestion during routing, and help fit
hard-to-route designs. Increasing the Placement Effort Multiplier and limiting the
Fitter to one fitting attempt for hard-to-fit designs can produce better Fitter results
with lower overall compilation time.
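The combination described above (a higher multiplier with a single fitting attempt) could be scripted as in the following sketch. It assumes an open project, and it assumes that the single-attempt limit is exposed as the FIT_ONLY_ONE_ATTEMPT assignment in your software version. The value 4.0 is only an illustrative starting point, chosen because it matches the effort the Fitter would otherwise use on its second attempt.

```tcl
# Spend more placement effort up front on a hard-to-fit design.
set_global_assignment -name PLACEMENT_EFFORT_MULTIPLIER 4.0

# Stop after the first fitting attempt rather than escalating automatically.
set_global_assignment -name FIT_ONLY_ONE_ATTEMPT ON
```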
Increase Router Effort Multiplier
The Router Effort Multiplier controls how quickly the router tries to find a valid
solution. The default value is 1.0 and legal values must be greater than 0. Numbers
higher than 1 help designs that are difficult to route by increasing the routing effort.
Numbers closer to 0 (for example, 0.1) can reduce router runtime, but usually reduce
routing quality slightly. Experimental evidence shows that a multiplier of 3.0 reduces
overall wire usage by about 2%. Using a Router Effort Multiplier higher than the
default value could be beneficial for designs with complex datapaths with more than
five levels of logic. However, congestion in a design is primarily due to placement,
and increasing the Router Effort Multiplier does not necessarily reduce congestion.
For more information, refer to Router Effort Multiplier logic option in Quartus II Help.
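A minimal sketch of setting the multiplier from a script, assuming an open project. The value 3.0 is taken from the experiment mentioned above and is not a general recommendation.

```tcl
# Increase routing effort for a design that is difficult to route.
# Values greater than 1.0 increase effort; values between 0 and 1.0
# shorten router runtime at some cost in routing quality.
set_global_assignment -name ROUTER_EFFORT_MULTIPLIER 3.0
```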
Remove Fitter Constraints
A design with conflicting constraints or constraints that are difficult to meet may not
fit the targeted device. This can occur when location or LogicLock assignments are too
strict and there are not enough routing resources.
In this case, use the Routing Congestion task in the Chip Planner to locate routing
problems in the floorplan, then remove all location and LogicLock region assignments
from that area. If the local constraints are removed, and the design still does not fit,
the design is over-constrained. To correct the problem, remove all location and
LogicLock assignments and run successive compilations, incrementally constraining
the design before each compilation. You can delete specific location assignments in
the Assignment Editor or the Chip Planner. You can remove LogicLock assignments in the
Chip Planner or in the LogicLock Regions window. Alternatively, on the Assignments
menu, click Remove Assignments, and then turn on the assignment categories you want
to remove from the design in the Available assignment categories list.
For more information about the Routing Congestion task in the Chip Planner, refer to
the Analyzing and Optimizing the Design Floorplan chapter in volume 2 of the Quartus II
Handbook. You can also refer to About the Chip Planner in Quartus II Help.
Optimize Synthesis for Area, Not Speed
In some cases, resynthesizing the design to improve the area utilization can also
improve the routability of the design. First, ensure that you have set your device and
timing constraints correctly in your synthesis tool. Ensure that you do not
over-constrain the timing requirements for the design, particularly when the area
utilization of the design is a concern. Synthesis tools generally try to meet the
specified requirements, which can result in higher device resource usage if the
constraints are too aggressive.
If resource utilization is important to improving the routing results in your design,
some synthesis tools offer an easy way to optimize for area instead of speed. If you are
using Quartus II integrated synthesis, on the Assignments menu, click Settings. In the
Category list, select Analysis & Synthesis Settings, and select Balanced or Area
under Optimization Technique.
You can also specify this logic option for specific modules in your design with the
Assignment Editor in cases where you want to reduce area using the Area setting
(potentially at the expense of register-to-register timing performance). You can apply
the setting to specific modules while leaving the default Optimization Technique
setting at Balanced (for the best trade-off between area and speed for certain device
families) or Speed. You can also use the Speed Optimization Technique for Clock
Domains logic option to specify that all combinational logic in or between the
specified clock domain(s) is optimized for speed.
Note: In the Quartus II software, the Balanced setting typically produces utilization results
that are very similar to those obtained with the Area setting, with better performance
results. The Area setting can yield better results in some unusual cases.
In some synthesis tools, not specifying an fMAX requirement can result in less resource
utilization, which can improve routability.
For information about setting timing requirements and synthesis options in
Quartus II integrated synthesis and other synthesis tools, refer to the appropriate
chapter in Section III. Synthesis in volume 1 of the Quartus II Handbook, or your
synthesis software’s documentation.
Optimize Source Code
If your design does not fit because of routing problems and the methods described in
the preceding sections do not sufficiently improve the routability of the design,
modify the design at the source to achieve the desired results. You can often improve
results significantly by making design-specific changes to your source code, such as
duplicating logic or changing the connections between blocks that require significant
routing resources.
Use a Larger Device
If a successful fit cannot be achieved because of a shortage of routing resources, you
might require a larger device.
Timing Optimization Techniques (LUT-Based Devices)
This section contains guidelines that might help you if your design does not meet its
timing requirements.
Debugging Timing Failures in the TimeQuest Analyzer
Beginning with the Quartus II software version 10.1, a new Report Timing Closure
Recommendations task is available in the Custom Reports section of the Tasks pane
of the TimeQuest analyzer. Use this report to get more information and help on the
failing paths in your design. This feature is available for Arria II GX, Arria II GZ,
Cyclone III, Cyclone IV, Stratix III, Stratix IV, and Stratix V device families.
Selecting the Report Timing Closure Recommendations task opens the Report
Design Analysis dialog box (Figure 13–5).
Figure 13–5. Report Design Analysis Dialog Box
When you run the Report Timing Closure Recommendations task, you get specific
recommendations about failing paths in your design and changes that you can make
to potentially fix the failing paths.
From the dialog box (Figure 13–5), you can select paths based on the clock domain,
filter by nodes on path, and choose the number of paths to analyze.
After running this command in the TimeQuest analyzer, examine the reports in the
Report Timing Closure Recommendations folder in the Report pane of the TimeQuest
analyzer GUI. Each recommendation has star symbols (*) associated with it.
Recommendations with more stars are more likely to help you close timing on your
design.
Figure 13–6 shows an example report.
Figure 13–6. Example Report
The reports give you the most probable causes of failure for each path being analyzed.
The reports are organized into sections, depending on the type of issues found in the
design, such as large clock skew, restricted optimizations, unbalanced logic, skipped
optimizations, coding style that has too many levels of logic between registers, or
region or partition constraints specific to your project.
You will see recommendations that may help you fix the failing paths. For a detailed
analysis of the critical paths, run the report_timing command on specified paths. In
the Extra Fitter Information tab of the Path report panel, you will also see detailed
Fitter-related information that may help you visualize the issue and take appropriate
action if user constraints cause a specific placement.
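For instance, a detailed report on the worst setup paths of one clock domain might be requested as in the following sketch. It is intended for the TimeQuest Tcl console after the timing netlist has been created and updated; the clock name clk_sys and the panel name are hypothetical.

```tcl
# Report the ten worst setup paths that end in the hypothetical clk_sys
# domain, showing full path detail in a named report panel.
report_timing -setup -npaths 10 -detail full_path \
    -to_clock [get_clocks {clk_sys}] \
    -panel_name {Setup: worst paths to clk_sys}
```

Narrowing the report to one destination clock keeps the output focused on the domain that the recommendations flagged.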
Timing Optimization Advisor
The Timing Optimization Advisor guides you in making settings that optimize your
design to meet your timing requirements. To run the Timing Optimization Advisor,
on the Tools menu, point to Advisors, and click Timing Optimization Advisor.
This advisor describes many of the suggestions made in this section.
When you open the Timing Optimization Advisor after compilation, you can find
recommendations to improve the timing performance of your design. Some of the
recommendations in these advisors can contradict each other. Altera recommends
evaluating these options and choosing the settings that best suit the given
requirements.
The example in Figure 13–7 shows the Timing Optimization Advisor after compiling a
design that meets its frequency requirements, but requires setting changes to improve
the timing.
Figure 13–7. Timing Optimization Advisor
(Figure callouts: one button makes the recommended changes automatically. Other options
open the Settings dialog box or Assignment Editor so you can change the settings manually.)
When you expand one of the categories in the Advisor, such as Maximum Frequency
(fmax) or I/O Timing (tsu, tco, tpd), the recommendations are divided into stages.
The stages show the order in which to apply the recommended settings. The first
stage contains the options that are easiest to change, make the least drastic changes to
your design optimization, and have the least effect on compilation time. Icons indicate
whether each recommended setting has been made in the current project. In
Figure 13–7, the checkmark icons in the list of recommendations for Stage 1 indicate
recommendations that are already implemented. The warning icons indicate
recommendations that are not followed for this compilation. The information icons
indicate general suggestions. For these entries, the advisor does not report whether
these recommendations were followed, but instead explains how you can achieve
better performance. For a legend that provides more information for each icon, refer
to the “How to use” page in the Advisor.
There is a link from each recommendation to the appropriate location in the
Quartus II UI where you can change the settings. For example, consider the Synthesis
Netlist Optimizations page of the Settings dialog box or the Global Signals category
in the Assignment Editor. This approach provides the most control over which
settings are made and helps you learn about the settings in the software. In some
cases, you can also use the Correct the Settings button to automatically make the
suggested change to global settings.
For some entries in the advisor, a button appears that allows you to further analyze
your design and gives you more information. The advisor provides a table with the
clocks in the design and indicates whether they have been assigned a timing
constraint.
I/O Timing Optimization
The next stage of design optimization focuses on I/O timing. Ensure that you have
made the appropriate assignments as described in “Initial Compilation: Required
Settings” on page 13–2, and that the resource utilization is satisfactory before
proceeding with I/O timing optimization. The suggestions provided in this section
are applicable to all Altera FPGA families and to the MAX II family of CPLDs.
Because changes to the I/O paths affect the internal register-to-register timing,
complete this stage before proceeding to the register-to-register timing optimization
stage as described in the “Register-to-Register Timing Optimization Techniques
(LUT-Based Devices)” on page 13–33.
The options presented in this section address how to improve I/O timing, including
the setup delay (tSU), hold time (tH), and clock-to-output (tCO) parameters.
Improving Setup and Clock-to-Output Times Summary
Table 13–1 shows the recommended order in which to use techniques to reduce tSU
and tCO times. Checkmarks indicate which timing parameters are affected by each
technique. Reducing tSU times increases hold (tH) times.
Table 13–1. Improving Setup and Clock-to-Output Times (Note 1)
Technique | Affects tSU | Affects tCO
Ensure that the appropriate constraints are set for the failing I/Os (page 13–3) | ✓ | ✓
Use timing-driven compilation for I/O (page 13–30) | ✓ | ✓
Use fast input register (page 13–31) | ✓ | —
Use fast output register, fast output enable register, and fast OCT register (page 13–31) | — | ✓
Decrease the value of Input Delay from Pin to Input Register or set Decrease Input Delay to Input Register = ON | ✓ | —
Decrease the value of Input Delay from Pin to Internal Cells, or set Decrease Input Delay to Internal Cells = ON | ✓ | —
Decrease the value of Delay from Output Register to Output Pin, or set Increase Delay to Output Pin = OFF (page 13–32) | — | ✓
Increase the value of Input Delay from Dual-Purpose Clock Pin to Fan-Out Destinations (page 13–32) | ✓ | —
Use PLLs to shift clock edges (page 13–32) | ✓ | ✓
Use the Fast Regional Clock (page 13–33) | — | ✓
For MAX II series devices, set Guarantee I/O paths to zero, Hold Time at Fast Timing Corner to OFF, or when tSU and tPD constraints permit (page 13–33) | ✓ | —
Increase the value of Delay to output enable pin or set Increase delay to output enable pin (page 13–32) | — | ✓
Note to Table 13–1:
(1) These options may not apply to all device families.
Timing-Driven Compilation
This option moves registers into I/O elements if required to meet tSU or tCO
assignments, duplicating the register if necessary (as in the case in which a register
fans out to multiple output locations). This option is turned on by default and is a
global setting. The option does not apply to MAX II series devices because they do not
contain I/O registers.
The Optimize IOC Register Placement for Timing option affects only pins that have
a tSU or tCO requirement. Using the I/O register is possible only if the register directly
feeds a pin or is fed directly by a pin. This setting does not affect registers with any of
the following characteristics:
■ Have combinational logic between the register and the pin
■ Are part of a carry or cascade chain
■ Have an overriding location assignment
■ Use the asynchronous load port and the value is not 1 (in device families where the port is available)
Registers with the characteristics listed are optimized using the regular Quartus II
Fitter optimizations.
For more information, refer to Optimize IOC Register Placement for Timing logic option in
Quartus II Help.
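Although the option is on by default, it can be stated explicitly in a script so that the project does not depend on the default. A minimal sketch, assuming an open project and the assignment name shown:

```tcl
# Let the Fitter move registers into I/O elements when needed to meet
# tSU or tCO requirements (the default behavior).
set_global_assignment -name OPTIMIZE_IOC_REGISTER_PLACEMENT_FOR_TIMING ON
```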
Fast Input, Output, and Output Enable Registers
You can place individual registers in I/O cells manually by making fast I/O
assignments with the Assignment Editor. For an input register, use the Fast Input
Register option; for an output register, use the Fast Output Register option; and for
an output enable register, use the Fast Output Enable Register option. Stratix II
devices also support the Fast OCT (on-chip termination) Register option. In MAX II
series devices, which have no I/O registers, these assignments lock the register into
the LAB adjacent to the I/O pin if there is a pin location assignment for that I/O pin.
If the fast I/O setting is on, the register is always placed in the I/O element. If the fast
I/O setting is off, the register is never placed in the I/O element. This is true even if
the Optimize IOC Register Placement for Timing option is turned on. If there is no
fast I/O assignment, the Quartus II software determines whether to place registers in
I/O elements if the Optimize IOC Register Placement for Timing option is turned
on.
You can also use the four fast I/O options (Fast Input Register, Fast Output Register,
Fast Output Enable Register, and Fast OCT Register) to override the location of a
register that is in a LogicLock region, and force it into an I/O cell. If you apply this
assignment to a register that feeds multiple pins, the register is duplicated and placed
in all relevant I/O elements. In MAX II series devices, the register is duplicated and
placed in each distinct LAB location that is next to an I/O pin with a pin location
assignment.
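As a rough illustration, the fast I/O assignments described in this section might look like the following in a Tcl script or Quartus II Settings File (.qsf). The register names are hypothetical placeholders, and the assignment names are shown as they commonly appear in settings files; confirm them in Quartus II Help for your software version and device family.

```tcl
# Sketch only: all node names below are hypothetical placeholders.

# Force an input register into the I/O element.
set_instance_assignment -name FAST_INPUT_REGISTER ON -to data_in_reg

# Force an output register and an output enable register into the I/O element.
set_instance_assignment -name FAST_OUTPUT_REGISTER ON -to data_out_reg
set_instance_assignment -name FAST_OUTPUT_ENABLE_REGISTER ON -to data_oe_reg

# An explicit OFF keeps a register out of the I/O element, even when
# Optimize IOC Register Placement for Timing is turned on.
set_instance_assignment -name FAST_OUTPUT_REGISTER OFF -to status_reg
```

Because an explicit ON or OFF overrides the automatic decision, consider leaving registers unassigned unless timing analysis shows that a specific pin needs the override.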
Programmable Delays
You can use various programmable delay options to minimize the tSU and tCO times.
For Arria, Cyclone, MAX II, MAX V, and Stratix series devices, the Quartus II
software automatically adjusts the applicable programmable delays to help meet
timing requirements. Programmable delays are advanced options to use only after
you compile a project, check the I/O timing, and determine that the timing is
unsatisfactory. For detailed information about the effect of these options, refer to the
device family handbook or data sheet.
After you have made a programmable delay assignment and compiled the design,
you can view the implemented delay values for every delay chain for every I/O pin in
the Delay Chain Summary section of the Compilation Report.
You can assign programmable delay options to supported nodes with the Assignment
Editor. You can also view and modify the delay chain setting for the target device with
the Chip Planner and Resource Property Editor. When you use the Resource Property
Editor to make changes after performing a full compilation, recompiling the entire
design is not necessary; you can save changes directly to the netlist. Because these
changes are made directly to the netlist, the changes are not made again automatically
when you recompile the design. The change management features allow you to
reapply the changes on subsequent compilations.
Although the programmable delays in newer devices are user-controllable, Altera
recommends their use for advanced users only. However, the Quartus II software
might use the programmable delays internally during the Fitter phase.
f For more details about Stratix III programmable delays, refer to the Stratix III Device
Handbook and AN 474: Implementing Stratix III Programmable I/O Delay Settings in the
Quartus II Software. For more information about using the Chip Planner and Resource
Property Editor, refer to the Engineering Change Management with the Chip Planner
chapter in volume 2 of the Quartus II Handbook.
h For details about the programmable delay logic options available for Altera devices,
refer to the following Quartus II Help topics:
■ Decrease Input Delay to Input Register
■ Input Delay from Pin to Input Register
■ Decrease Input Delay to Internal Cells
■ Input Delay from Pin to Internal Cells
■ Decrease Input Delay to Output Register
■ Increase Delay to Output Enable Pin
■ Output Enable Pin Delay
■ Increase Delay to Output Pin
■ Delay from Output Register to Output Pin
■ Increase Input Clock Enable Delay
■ Input Delay from Dual-Purpose Clock Pin to Fan-Out Destinations
■ Increase Output Clock Enable Delay
■ Increase Output Enable Clock Enable Delay
■ Increase tzx Delay to Output Pin
Use PLLs to Shift Clock Edges
Using a PLL typically improves I/O timing automatically. If the timing requirements
are still not met, most devices allow the PLL output to be phase shifted to change the
I/O timing. Shifting the clock backwards gives a better tH at the expense of tSU, while
shifting it forward gives a better tSU at the expense of tH (refer to Figure 13–8). This
technique can be used only in devices that offer PLLs with the phase shift option.
Figure 13–8. Shift Clock Edges Forward to Improve tSU at the Expense of tH
You can achieve the same type of effect in certain devices by using the programmable
delay called Input Delay from Dual-Purpose Clock Pin to Fan-Out Destinations.
h For more information, refer to Input Delay from Dual-Purpose Clock Pin to Fan-Out
Destinations in Quartus II Help.
Use Fast Regional Clock Networks and Regional Clock Networks
Altera devices have a variety of hierarchical clock structures. These include dedicated
global clock networks (GCLKs), regional clock networks (RCLKs), fast regional clock
networks (FCLKs), and periphery clock networks (PCLKs). The available resources
differ between various Altera device families.
f For the number of various clocking resources available in your target device, refer to
the appropriate device handbook.
In general, fast regional clocks have less delay to I/O elements than regional and
global clocks, and are used for high fan-out control signals. Regional clocks provide
the lowest clock delay and skew for logic contained in a single quadrant. Placing
clocks on these low-skew and low-delay clock nets provides better tCO performance.
Change How Hold Times are Optimized for MAX II Devices
For MAX II series devices, you can use the Guarantee I/O paths have zero hold time
at Fast Timing Corner option to control how hold time is optimized by the Quartus II
software.
h For details, refer to Guarantee I/O Paths Have Zero Hold Time at Fast Corner logic option in
Quartus II Help.
Register-to-Register Timing Optimization Techniques (LUT-Based Devices)
The next stage of design optimization is to improve register-to-register (fMAX) timing.
The following sections provide available options if the performance requirements are
not achieved after compilation.
Coding style affects the performance of your design to a greater extent than
changes in settings. Always evaluate your code and make sure to use synchronous
design practices.
f For more details about synchronous design practices and coding styles, refer to the
Recommended Design Practices chapter in volume 1 of the Quartus II Handbook.
Note: When using the TimeQuest analyzer, register-to-register timing optimization is the
same as maximizing the slack on the clock domains in your design. You can use the
techniques described in this section to improve the slack on different timing paths in
your design.
Before optimizing your design, understand the structure of your design as well as the
type of logic affected by each optimization. An optimization can decrease
performance if the optimization does not benefit your logic structure.
Optimize Source Code
In many cases, optimizing the design’s source code can have a very significant effect
on your design performance. In fact, optimizing your source code is typically the most
effective technique for improving the quality of your results, and is often a better
choice than using LogicLock or location assignments.
Be aware of the number of logic levels needed to implement your logic while you are
coding. Too many levels of logic between registers could result in critical paths failing
timing. Try restructuring the design to use pipelining or more efficient coding
techniques. Also, try limiting high fan-out signals in the source code. When possible,
duplicate and pipeline control signals. Make sure the duplicate registers are protected
by a preserve attribute, to avoid merging during synthesis.
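The paragraph above refers to a preserve attribute in HDL. As a hedged sketch, a comparable effect can usually be requested from a Tcl script or settings file with the Preserve Registers logic option. The register names are hypothetical, and the assignment name is an assumption to confirm in Quartus II Help.

```tcl
# Sketch only: ctrl_en_dup_a and ctrl_en_dup_b are hypothetical names for
# two manually duplicated copies of the same control register.
# Ask synthesis to keep both copies instead of merging them.
set_instance_assignment -name PRESERVE_REGISTER ON -to ctrl_en_dup_a
set_instance_assignment -name PRESERVE_REGISTER ON -to ctrl_en_dup_b
```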
If the critical path in your design involves memory or DSP functions, check whether
your design contains code blocks that describe memory or DSP functions that are not
being inferred and placed in dedicated logic. You might be able to modify your source
code so that these functions are placed into high-performance dedicated memory
or DSP resources in the target device. When using RAM or DSP blocks, enable the optional
input and output registers.
Ensure that your state machines are recognized as state machine logic and optimized
appropriately in your synthesis tool. State machines that are recognized are generally
optimized better than if the synthesis tool treats them as generic logic. In the
Quartus II software, you can check for the State Machine report under Analysis &
Synthesis in the Compilation Report. This report provides details, including the state
encoding for each state machine that was recognized during compilation. If your state
machine is not being recognized, you might have to change your source code to
enable it to be recognized.
f For coding style guidelines, including examples of HDL code for inferring memory and
DSP functions, and guidelines and sample HDL code for state machines, refer to the
Recommended HDL Coding Styles chapter in volume 1 of the Quartus II Handbook.
f For additional HDL coding examples, refer to AN 584: Timing Closure Methodology for
Advanced FPGAs.
Improving Register-to-Register Timing Summary
The choice of options and settings to improve the timing margin (slack) or to improve
register-to-register timing depends on the failing paths in the design. To achieve the
results that best approximate your performance requirements, apply the following
techniques and compile the design after each step:
1. Ensure that your timing assignments are complete and correct. For details, refer to
“Timing Requirement Settings” on page 13–3.
2. Ensure that you have reviewed all warning messages from your initial
compilation and check for ignored timing assignments. Refer to “Design Analysis”
on page 13–9 for details and fix any of these problems before proceeding with
optimization.
3. Apply netlist synthesis optimization options.
Apply the following synthesis options to optimize for speed:
■ “Optimize Synthesis for Speed, Not Area” on page 13–36
■ “Flatten the Hierarchy During Synthesis” on page 13–37
■ “Set the Synthesis Effort to High” on page 13–37
■ “Change State Machine Encoding” on page 13–38
■ “Prevent Shift Register Inference” on page 13–38
■ “Use Other Synthesis Options Available in Your Synthesis Tool” on page 13–39
4. Apply the following options for physical synthesis optimization:
■ Perform physical synthesis for combinational logic
■ Perform automatic asynchronous signal pipelining
■ Perform register duplication
■ Perform register retiming
■ Perform logic to memory mapping
5. Try different Fitter seeds (page 13–39). You can omit this step if a large number of
critical paths are failing, or if the paths are failing badly.
6. Make LogicLock assignments (page 13–40) to control placement.
7. Make design source code modifications to fix areas of the design that are still
failing timing requirements by significant amounts (page 13–33).
8. Make location assignments, or as a last resort, perform manual placement by
back-annotating the design (page 13–41).
You can use the Design Space Explorer (DSE) to automate the process of running
several different compilations with different settings.
h For more information, refer to About Design Space Explorer in Quartus II Help.
If these techniques do not achieve performance requirements, additional design
source code modifications might be required (page 13–33).
Physical Synthesis Optimizations
The Quartus II software offers physical synthesis optimizations that can help improve
the performance of many designs regardless of the synthesis tool used. Physical
synthesis optimizations can be applied both during synthesis and during fitting.
Physical synthesis optimizations that occur during the synthesis stage of the
Quartus II compilation operate either on the output from another EDA synthesis tool
or as an intermediate step in Quartus II integrated synthesis. These optimizations
make changes to the synthesis netlist to improve either area or speed, depending on
your selected optimization technique and effort level.
To view and modify the synthesis netlist optimization options, on the Assignments
menu, click Settings. In the Category list, expand Compilation Process Settings and
select Physical Synthesis Optimizations.
If you use a third-party EDA synthesis tool and want to determine if the Quartus II
software can remap the circuit to improve performance, you can use the Perform
WYSIWYG Primitive Resynthesis option. This option directs the Quartus II software
to unmap the LEs in an atom netlist to logic gates and then map the gates back to
Altera-specific primitives. Using Altera-specific primitives enables the Fitter to remap
the circuits using architecture-specific techniques.
h For more information, refer to Perform WYSIWYG Primitive Resynthesis logic option in
Quartus II Help.
The Quartus II technology mapper optimizes the design for Speed, Area, or
Balanced, according to the setting of the Optimization Technique option. Set this
option to Speed or Balanced.
h For more information, refer to Optimization Technique logic option in Quartus II Help.
Other physical synthesis optimizations occur during the Fitter stage of the Quartus II
compilation. These optimizations make placement-specific changes to the
netlist that improve speed performance results for a specific Altera device.
The following physical synthesis optimizations are available during the Fitter stage
for improving performance:
■ Physical synthesis for combinational logic
■ Automatic asynchronous signal pipelining
■ Physical synthesis for registers
■ Register duplication
■ Register retiming
Note: You can apply physical synthesis options on specific instances if you want the
performance gain from physical synthesis only on parts of your design.
h For more information, refer to Physical Synthesis Optimizations Page (Settings Dialog
Box) in Quartus II Help.
To apply physical synthesis assignments for fitting on a per instance basis, use the
Quartus II Assignment Editor. The following assignments are available as instance
assignments:
■ Perform physical synthesis for combinational logic
■ Perform register duplication for performance
■ Perform register retiming for performance
■ Perform automatic asynchronous signal pipelining
Follow these steps:
1. In the To tab of the Assignment Editor, specify the module instance to which you
want to apply the physical synthesis setting.
2. Select the required physical synthesis assignment in the Assignment Name tab.
3. In the Value tab, select ON.
4. In the Enabled tab, select Yes.
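The Assignment Editor steps above can also be sketched as Tcl assignments. This is an illustrative fragment only: mod_A is a hypothetical instance path, and the assignment names are assumptions based on how these options commonly appear in settings files, so confirm them in Quartus II Help before use.

```tcl
# Sketch only: "mod_A" is a hypothetical instance path.
# Turn on the Fitter physical synthesis options for one instance only.
set_instance_assignment -name PHYSICAL_SYNTHESIS_COMBO_LOGIC ON -to "mod_A"
set_instance_assignment -name PHYSICAL_SYNTHESIS_REGISTER_DUPLICATION ON -to "mod_A"
set_instance_assignment -name PHYSICAL_SYNTHESIS_REGISTER_RETIMING ON -to "mod_A"
set_instance_assignment -name PHYSICAL_SYNTHESIS_ASYNCHRONOUS_SIGNAL_PIPELINING ON -to "mod_A"
```

Enabling only the options that match the failing paths (for example, retiming for unbalanced pipeline stages) may limit the added compilation time.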
Turn Off Extra-Effort Power Optimization Settings
If PowerPlay power optimization settings are set to Extra Effort, your design
performance can be affected. If improving timing performance is more important than
reducing power use, set the PowerPlay power optimization setting to Normal.
h For more information, refer to PowerPlay Power Optimization logic option in Quartus II
Help.
f For more information about reducing power use, refer to the Power Optimization
chapter in volume 2 of the Quartus II Handbook.
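A minimal sketch of returning the power optimization settings to normal effort from a Tcl script, assuming the assignment names and value strings below match your software version (verify them in Quartus II Help):

```tcl
# Sketch only: use normal power optimization effort rather than extra effort,
# so that power-driven changes are less likely to cost timing performance.
set_global_assignment -name OPTIMIZE_POWER_DURING_SYNTHESIS "NORMAL COMPILATION"
set_global_assignment -name OPTIMIZE_POWER_DURING_FITTING "NORMAL COMPILATION"
```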
Optimize Synthesis for Speed, Not Area
The manner in which the design is synthesized has a large impact on design
performance. Design performance varies depending on the way the design is coded,
the synthesis tool used, and the options specified when synthesizing. Change your
synthesis options if a large number of paths are failing, or if specific paths are failing
badly and have many levels of logic.
Set your device and timing constraints in your synthesis tool. Synthesis tools are
timing-driven and optimized to meet specified timing requirements. If you do not
specify a target frequency, some synthesis tools optimize for area.
Some synthesis tools offer an easy way to instruct the tool to focus on speed instead of
area.
h For more information, refer to Optimization Technique logic option in Quartus II Help.
You can also specify this logic option for specific modules in your design with the
Assignment Editor while leaving the default Optimization Technique setting at
Balanced (for the best trade-off between area and speed for certain device families) or
Area (if area is an important concern). You can also use the Speed Optimization
Technique for Clock Domains option in the Assignment Editor to specify that all
combinational logic in or between the specified clock domain(s) is optimized for
speed.
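As an illustration of the approach described above, the following sketch keeps the global Optimization Technique at Balanced and optimizes a single module for speed. The instance path mod_A is hypothetical, and the assignment name is an assumption to confirm in Quartus II Help.

```tcl
# Sketch only: "mod_A" is a hypothetical instance path.
# Leave the project-wide default at Balanced...
set_global_assignment -name OPTIMIZATION_TECHNIQUE BALANCED
# ...and optimize only the timing-critical module for speed.
set_instance_assignment -name OPTIMIZATION_TECHNIQUE SPEED -to "mod_A"
```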
To achieve best performance with push-button compilation, follow the
recommendations in the following sections for other synthesis settings. You can use
the DSE to experiment with different Quartus II synthesis options to optimize your
design for the best performance.
f For information about setting timing requirements and synthesis options in
Quartus II integrated synthesis and third-party synthesis tools, refer to the
appropriate chapter in Section III. Synthesis in volume 1 of the Quartus II Handbook, or
refer to your synthesis software documentation.
h For more information about the Design Space Explorer, refer to About Design Space
Explorer in Quartus II Help.
Flatten the Hierarchy During Synthesis
Synthesis tools typically let you preserve hierarchical boundaries, which can be useful
for verification or other purposes. However, the best optimization results generally
occur when the synthesis tool optimizes across hierarchical boundaries, because
doing so often allows the synthesis tool to perform the most logic minimization,
which can improve performance. Whenever possible, flatten your design hierarchy to
achieve the best results. If you are using Quartus II incremental compilation, you
cannot flatten your design across design partitions. Incremental compilation always
preserves the hierarchical boundaries between design partitions. Follow Altera’s
recommendations for design partitioning, such as registering partition boundaries to
reduce the effect of cross-boundary optimizations.
f For more information about using incremental compilation and recommendations for
design partitioning, refer to the Quartus II Incremental Compilation for Hierarchical and
Team-Based Design chapter in volume 1 of the Quartus II Handbook.
Set the Synthesis Effort to High
Some synthesis tools offer varying synthesis effort levels to trade off compilation time
against synthesis results. When your tool provides this option, set the synthesis effort
to high to achieve the best results.
Change State Machine Encoding
State machines can be encoded using various techniques. One-hot encoding, which
uses one register for every state bit, usually provides the best performance. If your
design contains state machines, changing the state machine encoding to one-hot can
improve performance at the cost of area.
h For more information, refer to State Machine Processing logic option in Quartus II Help.
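A minimal sketch of requesting one-hot encoding for recognized state machines in Quartus II integrated synthesis, assuming the assignment name and value string below (verify both in Quartus II Help):

```tcl
# Sketch only: encode all recognized state machines as one-hot.
# Expect higher register usage in exchange for simpler next-state logic.
set_global_assignment -name STATE_MACHINE_PROCESSING "ONE-HOT"
```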
Duplicate Logic for Fan-Out Control
Duplicating logic or registers can help improve timing in cases where moving a
register in a failing timing path to reduce routing delay creates other failing paths, or
where there are timing problems due to the fan-out of the registers. Most often, timing
failures occur not because of the fan-out of the registers itself, but because of the location
of those registers. Duplicating registers so that each source register is physically close to
its destination registers can help improve slack on critical paths.
Many synthesis tools support options or attributes that specify the maximum fan-out
of a register. When using Quartus II integrated synthesis, you can set the Maximum
Fan-Out logic option in the Assignment Editor to control the number of destinations
for a node so that the fan-out count does not exceed a specified value. You can also use
the maxfan attribute in your HDL code. The software duplicates the node as required
to achieve the specified maximum fan-out.
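For example, a Maximum Fan-Out assignment might be written as follows in a Tcl script. The register name and the limit of 32 are hypothetical values chosen for illustration, and the assignment name should be confirmed in Quartus II Help.

```tcl
# Sketch only: ctrl_enable_reg is a hypothetical high fan-out register.
# Duplicate the node as needed so that no copy drives more than 32 destinations.
set_instance_assignment -name MAX_FANOUT 32 -to ctrl_enable_reg
```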
Logic duplication using Maximum Fan-Out assignments normally increases resource
utilization and can potentially increase compilation time, depending on the placement
and the total resource usage within the selected device. The improvement in timing
performance that results because of Maximum Fan-Out assignments is very
design-specific. When you use the Maximum Fan-Out assignment, the Fitter
duplicates the source logic to limit the fan-out, but it might not be able to control
which destinations each of the duplicated sources drives. Because the
Maximum Fan-Out assignment does not specify which destinations each
duplicated source should drive, a duplicated source might still drive logic located
all around the device. To avoid this situation, you can use the Manual Logic
Duplication logic option.
If you are using Maximum Fan-Out assignments, Altera recommends benchmarking
your design with and without these assignments to evaluate whether they give the
expected improvement in timing performance. Use the assignments only when you
get improved results.
You can manually duplicate registers in the Quartus II software regardless of the
synthesis tool used. To duplicate a register, apply the Manual Logic Duplication logic
option to the register with the Assignment Editor.
h For more information, refer to Manual Logic Duplication logic option in Quartus II Help.
Prevent Shift Register Inference
In some cases, turning off the inference of shift registers increases performance. Doing
so forces the software to use logic cells to implement the shift register instead of
implementing the registers in memory blocks using the ALTSHIFT_TAPS
megafunction. If you implement shift registers in logic cells instead of memory, logic
utilization is increased.
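A minimal sketch of turning off shift register inference for the whole project, assuming the assignment name below is the one behind the Auto Shift Register Replacement option in your software version (verify in Quartus II Help):

```tcl
# Sketch only: keep shift registers in logic cells instead of letting
# synthesis map them to memory blocks through ALTSHIFT_TAPS.
set_global_assignment -name AUTO_SHIFT_REGISTER_RECOGNITION OFF
```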
Use Other Synthesis Options Available in Your Synthesis Tool
With your synthesis tool, experiment with the following options if they are available:
■ Turn on register balancing or retiming
■ Turn on register pipelining
■ Turn off resource sharing
These options can increase performance, but typically increase the resource utilization
of your design.
Fitter Seed
The Fitter seed affects the initial placement configuration of the design. Changing the
seed value changes the Fitter results, because the fitting results change whenever
there is a change in the initial conditions. Each seed value results in a somewhat
different fit, and you can experiment with several different seeds to attempt to obtain
better fitting results and timing performance.
When there are changes in your design, there is some random variation in
performance between compilations. This variation is inherent in placement and
routing algorithms—there are too many possibilities to try them all and get the
absolute best result, so the initial conditions change the compilation result.
Note: Any design change that directly or indirectly affects the Fitter has the same type of
random effect as changing the seed value. This includes any change in source files,
Analysis & Synthesis Settings, Fitter Settings, or Timing Analyzer Settings. The
same effect can appear if you use a different computer processor type or different
operating system, because different systems can change the way floating point
numbers are calculated in the Fitter.
If a change in optimization settings slightly affects the register-to-register timing or
number of failing paths, you cannot always be certain that your change caused the
improvement or degradation, or whether it could be due to random effects in the
Fitter. If your design is still changing, running a seed sweep (compiling your design
with multiple seeds) determines whether the average result has improved after an
optimization change and whether a setting that increases compilation time has
benefits worth the increased time (such as setting the Physical Synthesis Effort to
Extra). The sweep also shows the amount of random variation to expect for your
design.
If your design is finalized, you can compile your design with different seeds to obtain
one optimal result. However, if you subsequently make any changes to your design,
you might need to perform a seed sweep again.
On the Assignments menu, select Fitter Settings to control the initial placement with
the seed. You can use the DSE to perform a seed sweep easily.
You can use the following Tcl command from a script to specify a Fitter seed:
set_global_assignment -name SEED <value>
h For more information about compiling your design with different seeds using the
Design Space Explorer (DSE seed sweep), refer to About Design Space Explorer in
Quartus II Help.
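If you prefer a script to the DSE, a manual seed sweep can be sketched with the SEED assignment shown above. This is a rough outline under stated assumptions rather than a definitive flow: the project name, the seed list, and the timing report file name are hypothetical, and the script is assumed to run in the Quartus II Tcl shell (for example, quartus_sh -t seed_sweep.tcl).

```tcl
# Seed sweep sketch. Assumptions: project "my_project" exists in the
# current directory and its timing report is written next to the project.
load_package flow

set project_name my_project   ;# hypothetical project name
set seeds {1 2 3 4 5}         ;# hypothetical seed values

project_open $project_name
foreach seed $seeds {
    # Change only the Fitter seed between compilations.
    set_global_assignment -name SEED $seed
    export_assignments
    execute_flow -compile

    # Keep a copy of the timing report, because the next compilation
    # overwrites it. The report file name is an assumption.
    set rpt "${project_name}.sta.rpt"
    if {[file exists $rpt]} {
        file copy -force $rpt "seed_${seed}.sta.rpt"
    }
}
project_close
```

Comparing the saved reports shows both the best seed and the spread of results, which indicates how much random variation to expect for the design.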
Set Maximum Router Timing Optimization Level
To improve routability in designs where the router did not pick up the optimal
routing lines, set the Router Timing Optimization Level to Maximum. This setting
determines how aggressively the router tries to meet timing requirements. Setting this
option to Maximum can increase design speed slightly at the cost of increased
compilation time. Setting this option to Minimum can reduce compilation time at the
cost of slightly reduced design speed. The default value is Normal.
h For more information, refer to Router Timing Optimization Level logic option in
Quartus II Help.
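A minimal sketch of this setting in a Tcl script, assuming the assignment name below (confirm it in Quartus II Help):

```tcl
# Sketch only: make the router work harder to meet timing,
# at the cost of longer compilation time. The default is NORMAL.
set_global_assignment -name ROUTER_TIMING_OPTIMIZATION_LEVEL MAXIMUM
```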
LogicLock Assignments
Using LogicLock assignments to improve timing performance is only recommended
for older Altera devices, such as the MAX II family. For other device families,
especially for larger devices such as Arria and Stratix series devices, using LogicLock
assignments to improve timing performance is not recommended. For these devices,
the LogicLock feature is intended to be used for performance preservation and to
floorplan your design.
LogicLock assignments do not always improve the performance of the design. In
many cases, you cannot improve upon results from the Fitter by making location
assignments. If there are existing LogicLock assignments in your design, remove the
assignments if your design methodology permits it. Recompile the design to see if the
assignments are making the performance worse.
When making LogicLock assignments, it is important to consider how much
flexibility to give the Fitter. LogicLock assignments provide more flexibility than hard
location assignments. Assignments that are more flexible require higher Fitter effort,
but reduce the chance of design over-constraint. The following types of LogicLock
assignments are available, listed in the order of decreasing flexibility:
■ Auto size, floating location regions
■ Fixed size, floating location regions
■ Fixed size, locked location regions
f For more information about using LogicLock regions, refer to the Analyzing and
Optimizing the Design Floorplan chapter in volume 2 of the Quartus II Handbook.
To determine what to put into a LogicLock region, refer to the timing analysis results
and analyze the critical paths in the Chip Planner. The register-to-register timing
paths in the Timing Analyzer section of the Compilation Report help you recognize
patterns.
The following sections describe cases in which LogicLock regions can help to
optimize a design.
Hierarchy Assignments
For a design with the hierarchy shown in Figure 13–9, which has failing paths in the
timing analysis results similar to those shown in Table 13–2, mod_A is probably a
problem module. In this case, a good strategy to fix the failing paths is to place the
mod_A hierarchy block in a LogicLock region so that all the nodes are closer together in
the floorplan.
Figure 13–9. Design Hierarchy
Table 13–2 shows the failing paths connecting two regions together within mod_A
listed in the timing analysis report.
Table 13–2. Failing Paths in a Module Listed in Timing Analysis
From            To
|mod_A|reg1     |mod_A|reg9
|mod_A|reg3     |mod_A|reg5
|mod_A|reg4     |mod_A|reg6
|mod_A|reg7     |mod_A|reg10
|mod_A|reg0     |mod_A|reg2
Hierarchical LogicLock regions are also important if you are using an incremental
compilation flow. Place each design partition for incremental compilation in a
separate LogicLock region to reduce conflicts and ensure good results as the design
develops. You can use auto size and floating location regions to find a good design
floorplan, but fix the size and placement to achieve the best results in future
compilations.
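As an illustration, an auto size, floating location LogicLock region for the mod_A hierarchy block might be described as follows in a settings file. The region name mod_A_region and the instance path are hypothetical, and the LL_* assignment names are assumptions based on how LogicLock regions commonly appear in settings files; creating the region in the LogicLock Regions window or the Chip Planner is the safer route when in doubt.

```tcl
# Sketch only: region name and instance path are hypothetical.
# Create an enabled region that the Fitter may size and place freely.
set_global_assignment -name LL_ENABLED ON -section_id mod_A_region
set_global_assignment -name LL_AUTO_SIZE ON -section_id mod_A_region
set_global_assignment -name LL_STATE FLOATING -section_id mod_A_region

# Assign the whole mod_A hierarchy block to the region.
set_instance_assignment -name LL_MEMBER_OF mod_A_region -to "mod_A" -section_id mod_A_region
```

After a compilation produces a good floorplan, the size and location would then be fixed, in line with the recommendation above.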
f For more information about using incremental compilation and recommendations for
creating a design floorplan using LogicLock regions, refer to the Quartus II Incremental
Compilation for Hierarchical and Team-Based Design and Best Practices for Incremental
Compilation and Floorplan Assignments chapters in volume 1 of the Quartus II Handbook,
and the Analyzing and Optimizing the Design Floorplan chapter in volume 2 of the
Quartus II Handbook.
Location Assignments and Back-Annotation
If a small number of paths are failing to meet their timing requirements, you can use
hard location assignments to optimize placement. Location assignments are less
flexible for the Quartus II Fitter than LogicLock assignments. In some cases, when you
are familiar with your design, you can enter location constraints in a way that
produces better results.
Note: Improving fitting results, especially for larger devices, such as Arria and Stratix series
devices, can be difficult. Location assignments do not always improve the
performance of the design. In many cases, you cannot improve upon the results from
the Fitter by making location assignments.
Metastability Analysis and Optimization Techniques
Metastability problems can occur when a signal is transferred between circuitry in
unrelated or asynchronous clock domains, because the designer cannot guarantee that
the signal will meet its setup and hold time requirements. The mean time between
failure (MTBF) is an estimate of the average time between instances when
metastability could cause a design failure.
For more information about metastability and MTBF, refer to the Understanding
Metastability in FPGAs white paper.
You can use the Quartus II software to analyze the average MTBF due to metastability
when a design synchronizes asynchronous signals, and optimize the design to
improve the MTBF. These metastability features are supported only for designs
constrained with the TimeQuest analyzer, and for select device families.
If the MTBF of your design is low, refer to the Metastability Optimization section in
the Timing Optimization Advisor, which suggests various settings that can help
optimize your design in terms of metastability.
For details about the metastability features in the Quartus II software, refer to the
Managing Metastability with the Quartus II Software chapter in volume 1 of the
Quartus II Handbook. This chapter describes how to enable metastability analysis and
identify the register synchronization chains in your design, provides details about
metastability reports, and provides additional guidelines for managing metastability.
Resource Utilization Optimization Techniques (Macrocell-Based CPLDs)
The following recommendations help you take advantage of the macrocell-based
architecture in the MAX 7000 and MAX 3000 devices to yield maximum speed,
reliability, and device resource utilization while minimizing fitting difficulties.
After design analysis, the first stage of design optimization is to improve resource
utilization. Complete this stage before proceeding to timing optimization. First,
ensure that you have set the basic constraints described in “Initial Compilation:
Required Settings” on page 13–2. If your design is not fitting into a specified device,
use the techniques in this section to achieve a successful fit.
Use Dedicated Inputs for Global Control Signals
MAX 7000 and MAX 3000 devices have four dedicated inputs that can be used for
global register control. Because the global register control signals can bypass the logic
cell array and directly feed registers, product terms can be preserved for primary
logic. Because each signal also has a dedicated path into the LAB, global signals
can bypass logic and data path interconnect resources.
Because the dedicated input pins are designed for high fan-out control signals and
provide low skew, always assign global signals (such as clock, clear, and output
enable) to the dedicated input pins.
You can use logic-generated control signals for global control signals instead of
dedicated inputs. However, the following list shows the disadvantages of using
logic-generated control signals:
■
More resources are required (logic cells, interconnect).
■
More data skew is introduced.
■
If the logic-generated control signals have high fan-out, the design can be more
difficult to fit.
By default, the Quartus II software uses dedicated inputs for global control signals
automatically. You can assign control signals to dedicated input pins in one of the
following ways:
■
In the Assignment Editor, select one of the two following methods:
■
Assign pins to dedicated pin locations.
■
Assign a Global Signal setting to the pins.
■
On the Assignments menu, click Settings. On the Analysis & Synthesis Settings
page, click More Settings, and in the Existing Option settings section, select Auto
Global Register Control Signals.
■
Insert a GLOBAL primitive after the pins.
■
If you have already assigned pins for the design in the MAX+PLUS® II software,
on the Assignments menu, click Import Assignments.
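As a rough sketch, the two Assignment Editor methods above can also be written as Tcl assignments in the style described in "Scripting Support". The node name clk and the pin placeholder are hypothetical. GLOBAL_SIGNAL and its ON value are assumptions that are not listed in this chapter, so confirm the exact variable name and values in the Quartus II Settings File Manual before relying on them.

```tcl
# Hypothetical node name (clk) and pin placeholder; substitute values for your design.

# Method 1: assign the control signal to a dedicated input pin location.
set_location_assignment <dedicated input pin> -to clk

# Method 2: assign a Global Signal setting to the pin.
# GLOBAL_SIGNAL is assumed to be the .qsf variable behind the Global Signal option.
set_instance_assignment -name GLOBAL_SIGNAL ON -to clk
```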
Reserve Device Resources
Because pin and logic option assignments can be necessary for board layout and
performance requirements, and because full utilization of the device resources can
increase the difficulty of fitting the design, Altera recommends that you leave 10% of
the logic cells and 5% of the I/O pins unused to accommodate future design
modifications. Following the Altera-recommended device resource reservation
guidelines for macrocell-based CPLDs increases the chance that the Quartus II
software can fit the design during recompilation after changes or assignments have
been made.
Pin Assignment Guidelines and Procedures
Sometimes user-specified pin assignments are necessary for board layout. This section
discusses pin assignment guidelines and procedures.
To minimize fitting issues with pin assignments, follow these guidelines:
■
Assign speed-critical control signals to dedicated inputs.
■
Assign output enables to appropriate locations.
■
Estimate fan-in to assign output pins to the appropriate LAB.
■
Assign output pins that require parallel expanders to macrocells numbered 4 to 16.
Altera recommends that you allow the Quartus II software to select pin assignments
automatically when possible. You can use the Quartus II Pin Advisor feature
(accessible from the Tools menu) for pin connection guidelines.
For more information about the Pin Advisor, refer to Pin Advisor Command in
Quartus II Help.
Control Signal Pin Assignments
Assign speed-critical control signals to dedicated input pins. Every MAX 7000 and
MAX 3000 device has four dedicated input pins (GCLK1, OE2/GCLK2, OE1, and GCLRn).
You can assign clocks to global clock dedicated inputs (GCLK1 and OE2/GCLK2), clear to
the global clear dedicated input (GCLRn), and speed-critical output enable to global OE
dedicated inputs (OE1 and OE2/GCLK2).
Output Enable Pin Assignments
Occasionally, a design requires more output enable signals than there are dedicated
input pins, so some output enable signals must be assigned to I/O pins.
To minimize possible fitting errors when assigning the output enable pins for
MAX 7000 and MAX 3000 devices, refer to Pin-Out Files for Altera Devices on the Altera
website (www.altera.com).
Estimate Fan-In When Assigning Output Pins
Macrocells with high fan-in can cause more placement problems for the Quartus II
Fitter than those with low fan-in. The maximum fan-in per LAB should not exceed 36
in MAX 7000 and MAX 3000 devices. Therefore, estimate the fan-in of logic (such as
an x-input AND gate) that feeds each output pin. If the total fan-in of logic that feeds
each output pin in the same LAB exceeds 36, compilation can fail. To save resources
and prevent compilation errors, avoid assigning pins that have high fan-in.
Outputs Using Parallel Expander Pin Assignments
Figure 13–10 illustrates how parallel expanders are used within a LAB. MAX 7000 and
MAX 3000 devices contain chains that can lend or borrow parallel expanders. The
Quartus II Fitter places macrocells in a location that allows them to lend and borrow
parallel expanders appropriately.
As shown in Figure 13–10, only macrocells 2 through 16 can borrow parallel
expanders. Therefore, assign output pins that might require parallel expanders to pins
adjacent to macrocells 4 through 16. Altera recommends using macrocells 4 through
16 because they can borrow the largest number of parallel expanders.
Figure 13–10. LAB Macrocells and Parallel Expander Associations
[Figure: LAB A containing Macrocells 1 through 16, annotated as follows.]
Macrocell 1 cannot borrow any parallel expanders.
Macrocell 2 borrows up to five parallel expanders from Macrocell 1.
Macrocell 3 borrows up to ten parallel expanders from Macrocells 1 and 2.
Macrocells 4 through 16 borrow up to 15 parallel expanders from the three
immediately preceding macrocells.
Resolving Resource Utilization Problems
Two fitting issues commonly cause Quartus II compilation errors: excessive macrocell
usage and lack of routing resources. Macrocell usage errors occur when the total
number of macrocells in the design exceeds the available macrocells in the device.
Routing errors occur when the available routing resources are insufficient to
implement the design. Check the Messages window for the compilation results.
Messages in the Messages window are also copied in the Report Files. Right-click on a
message and click Help for more information.
Resolving Macrocell Usage Issues
Occasionally, a design requires more macrocell resources than are available in the
selected device, which results in the design not fitting. The following list provides tips
for resolving macrocell usage issues as well as tips to minimize the number of
macrocells used:
■
On the Assignments menu, click Settings. In the Category list, select Analysis &
Synthesis Settings, click More Settings, and turn off Auto Parallel Expanders. If
the design’s clock frequency (fMAX) is not an important design requirement, turn
off parallel expanders for all or part of the project. The design usually requires
more macrocells if parallel expanders are turned on.
■
Change Optimization Technique from Speed to Area. Selecting Area instructs the
compiler to give preference to area utilization rather than speed (fMAX). On the
Assignments menu, click Settings. In the Category list, change the Optimization
Technique option in the Analysis & Synthesis Settings page.
■
Use D-type flipflops instead of latches. Altera recommends that you always use
D-type flipflops instead of latches in your design because D-type flipflops can
reduce the macrocell fan-in, and thus reduce macrocell usage. The Quartus II
software uses extra logic to implement latches in MAX 7000 and MAX 3000
designs because MAX 7000 and MAX 3000 macrocells contain D-type flipflops
instead of latches.
■
Use asynchronous clear and preset instead of synchronous clear and preset. To
reduce the product term usage, use asynchronous clear and preset in your design
whenever possible. Using other control signals such as synchronous clear
produces macrocells and pins with higher fan-out.
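The first two tips can also be scripted as global assignments. This is a minimal sketch rather than a verified recipe: AUTO_PARALLEL_EXPANDERS is assumed to be the .qsf variable for the Auto Parallel Expanders option, and MAX7000 is assumed to be the device family prefix for the <device family name>_OPTIMIZATION_TECHNIQUE variable listed in Table 13–6.

```tcl
# Assumed .qsf variable name for the Auto Parallel Expanders option.
set_global_assignment -name AUTO_PARALLEL_EXPANDERS OFF

# MAX7000 is an assumed family prefix; AREA is one of the values listed in Table 13-6.
set_global_assignment -name MAX7000_OPTIMIZATION_TECHNIQUE AREA
```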
After following the suggestions in this section, if your project still does not fit the
targeted device, consider using a larger device. When upgrading to a different
density, the vertical package-migration feature of the MAX 7000 and MAX 3000
device families allows pin assignments to be maintained.
Resolving Routing Issues
Routing is another resource that can cause design fitting issues. For example, if the
total fan-in into a LAB exceeds the maximum allowed, a no-fit error can occur during
compilation. If your design does not fit the targeted device because of routing issues,
consider the following suggestions:
■
Use dedicated inputs/global signals for high fan-out signals. The dedicated inputs
in MAX 7000 and MAX 3000 devices are designed for speed-critical and high
fan-out signals. Always assign high fan-out signals to dedicated inputs/global
signals.
■
Change the Optimization Technique option from Speed to Area. This option can
resolve routing resource and macrocell usage issues. Refer to “Resolving Macrocell
Usage Issues” on page 13–46.
■
Reduce the fan-in per cell. If you are not limited by the number of macrocells used
in the design, you can use the Fan-in per cell (%) option to reduce the fan-in per
cell. The allowable values are 20–100%; the default value is 100%. Reducing the
fan-in can reduce localized routing congestion but increase the macrocell count.
You can set this logic option in the Assignment Editor or under More Settings in
the Analysis & Synthesis Settings page of the Settings dialog box.
■
On the Assignments menu, click Settings. In the Category list, select Analysis &
Synthesis Settings, click More Settings, and turn off Auto Parallel Expanders. By
turning off the parallel expanders, you give the Quartus II software more fitting
flexibility for each macrocell, allowing macrocells to be relocated. For example,
each macrocell (previously grouped together in the same LAB) can be moved to a
different LAB to reduce routing constraints.
■
Insert logic cells. Inserting logic cells reduces the fan-in and the number of shared
expanders used per macrocell, increasing routability. By default, the Quartus II
software automatically inserts logic cells when necessary. To disable automatic
insertion, on the Assignments menu, click Settings. In the Category list, select
Analysis & Synthesis Settings. Under More Settings, turn off Auto Logic Cell
Insertion. Refer to “Using LCELL Buffers to Reduce Required Resources” for more
information.
■
Change pin assignments. If you want to discard your pin assignments, you can let
the Quartus II Fitter ignore some or all of the assignments.
If you prefer reassigning pins to increase routing efficiency, refer to “Pin
Assignment Guidelines and Procedures” on page 13–43.
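Two of the routing suggestions above map onto global assignments. The following sketch assumes the .qsf variable names AUTO_LCELL_INSERTION and MAX7000_FANIN_PER_CELL, neither of which is listed in this chapter, and uses 80 only as an arbitrary value inside the 20–100% range stated above. Confirm both names in the Quartus II Settings File Manual.

```tcl
# Assumed .qsf variable for the Auto Logic Cell Insertion option.
# ON keeps the default behavior; OFF disables automatic insertion.
set_global_assignment -name AUTO_LCELL_INSERTION ON

# Assumed .qsf variable for the Fan-in per cell (%) option; 80 is an arbitrary
# example value. Lower values can ease routing but increase the macrocell count.
set_global_assignment -name MAX7000_FANIN_PER_CELL 80
```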
Using LCELL Buffers to Reduce Required Resources
Complex logic, such as multilevel XOR gates, is often implemented with more than
one macrocell. When this occurs, the Quartus II software automatically allocates
shareable expanders—or additional macrocells (called synthesized logic cells)—to
supplement the logic resources that are available in a single macrocell. You can also
break down complex logic by inserting logic cells in the project to reduce the average
fan-in and the total number of shareable expanders required. Manually inserting logic
cells can provide greater control over speed-critical paths.
Instead of using the Quartus II software’s Auto Logic Cell Insertion option, you can
manually insert logic cells. However, Altera recommends that you use the Auto Logic
Cell Insertion option unless you know which part of the design is causing the
congestion.
A good location to manually insert LCELL buffers is where a single complex logic
expression feeds multiple destinations in your design. You can insert an LCELL buffer
just after the complex expression; the Quartus II Fitter extracts this complex
expression and places it in a separate logic cell. Rather than duplicate all the logic for
each destination, the Quartus II software feeds the single output from the logic cell to
all destinations.
To reduce fan-in and prevent no-fit compilations caused by routing resource issues,
insert an LCELL buffer after a NOR gate (Figure 13–11). The design in Figure 13–11
was compiled for a MAX 7000AE device. Without the LCELL buffer, the design
requires two macrocells and eight shareable expanders, and the average fan-in is 14.5
macrocells. However, with the LCELL buffer, the design requires three macrocells and
eight shareable expanders, and the average fan-in is just 6.33 macrocells.
Figure 13–11. Reducing the Average Fan-In by Inserting LCELL Buffers
Timing Optimization Techniques (Macrocell-Based CPLDs)
After resource optimization, design optimization focuses on timing. Ensure that you
have made the appropriate assignments as described in “Initial Compilation:
Required Settings” on page 13–2, and that the resource utilization is satisfactory
before proceeding with timing optimization.
The following five timing parameters are primarily responsible for a design’s
performance:
■
Setup time (tSU)—the propagation time for input data signals
■
Hold time (tH)—the propagation time for input data signals
■
Clock-to-output time (tCO)—the propagation time for output signals
■
Pin-to-pin delays (tPD)—the time required for a signal from an input pin to
propagate through combinational logic and appear at an external output pin
■
Maximum clock frequency (fMAX)—the internal register-to-register performance
This section provides guidelines to improve the timing if the timing requirements are
not met. Figure 13–12 shows the parts of the design that determine the tSU, tH, tCO, tPD,
and fMAX timing parameters.
Figure 13–12. Main Timing Parameters that Determine the System’s Performance
[Figure: two DFFs (ports D, Q, PRN, CLRN) connected through logic blocks between an
input and an output, with regions labeled Setup and Hold Time, Clock Frequency, and
Clock-to-Output Time.]
When you are analyzing a design to improve performance, be sure to consider the two
major contributors to long delay paths:
■
Excessive levels of logic
■
Excessive loading (high fan-out)
When a MAX 7000 or MAX 3000 device signal drives more than one LAB, the
programmable interconnect array (PIA) delay increases by 0.1 ns per additional LAB
fan-out. Therefore, to minimize the added delay, concentrate the destination
macrocells into fewer LABs, minimizing the number of LABs that are driven. The
main cause of long delays in circuit design is excessive levels of logic.
Improving Setup Time
Sometimes the tSU timing reported by the Quartus II Fitter does not meet your timing
requirements. To improve the tSU timing, refer to the following guidelines:
■
Turn on the Fast Input Register option using the Assignment Editor. The Fast
Input Register option allows input pins to directly drive macrocell registers via
the fast-input path, thus minimizing the pin-to-register delay. This option is useful
when a pin drives a D-type flipflop and there is no combinational logic between
the pin and the register.
■
Reduce the amount of logic between the input and the register. Excessive logic
between the input pin and register causes more delays. To improve setup time,
Altera recommends that you reduce the amount of logic between the input pin
and the register whenever possible.
■
Reduce fan-out. The delay from input pins to macrocell registers increases when
the fan-out of the pins increases. To improve the setup time, minimize the fan-out.
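The Fast Input Register option can also be set from a script. The sketch below uses the FAST_INPUT_REGISTER variable that Table 13–7 lists as an instance setting; data_in is a hypothetical input pin name, and applying the same variable to MAX 7000 and MAX 3000 designs is an assumption.

```tcl
# data_in is a hypothetical input pin that drives a register directly,
# with no combinational logic between the pin and the register.
set_instance_assignment -name FAST_INPUT_REGISTER ON -to data_in
```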
Improving Clock-to-Output Time
To improve a design’s clock-to-output time, minimize the register-to-output-pin
delay. To improve the tCO timing, refer to the following guidelines:
■
Use the global clock. In addition to minimizing the delay from a register to an
output pin, minimizing the delay from the clock pin to the register can also
improve tCO timing. Always use the global clock for low-skew and speed-critical
signals.
■
Reduce the amount of logic between the register and output pin. Excessive logic
between the register and the output pin causes more delay. Always minimize the
amount of logic between the register and output pin for faster clock-to-output
time.
Table 13–3 shows the timing results for an EPM7064AETC100-4 device when a
combination of the Fast Input Register option, global clock, and minimal logic is
used. When the Fast Input Register option is turned on, the tSU timing is improved
(tSU decreases from 1.6 ns to 1.3 ns and from 2.8 ns to 2.5 ns). The tCO timing is
improved when the global clock is used for low-skew and speed-critical signals (tCO
decreases from 4.3 ns to 3.1 ns). However, if there is additional logic used between the
input pin and the register or the register and the output pin, the tSU and tCO delays
increase.
Table 13–3. EPM7064AETC100-4 Device Timing Results
Number of Registers | tSU (ns) | tH (ns) | tCO (ns) | Global Clock Used | Fast Input Register Option | D Input Location | Q Output Location | Additional Logic Between D Input Location and Register | Additional Logic Between Register and Q Output Location
1 | 1.3 | 1.2 | 4.3 | No | On | LAB A | LAB A | No | No
1 | 1.6 | 0.3 | 4.3 | No | Off | LAB A | LAB A | No | No
1 | 2.5 | 0 | 3.1 | Yes | On | LAB A | LAB A | No | No
1 | 2.8 | 0 | 3.1 | Yes | Off | LAB A | LAB A | No | No
1 | 3.6 | 0 | 3.1 | Yes | Off | LAB A | LAB A | Yes | No
1 | 2.8 | 0 | 7.0 | Yes | Off | LAB D | LAB A | No | Yes
16 with the same D and clock inputs | 2.8 | 0 | All 6.2 | Yes | Off | LAB D | LAB A, B | No | No
32 with the same D and clock inputs | 2.8 | 0 | All 6.4 | Yes | Off | LAB C | LAB A, B, C | No | No
Improving Propagation Delay (tPD)
Achieving fast propagation delay (tPD) timing is required in many system designs.
However, if there are long delay paths through complex logic, achieving fast
propagation delays can be difficult. To improve your design’s tPD, refer to the
following guidelines:
■
On the Assignments menu, click Settings. In the Category list, select Analysis &
Synthesis Settings, and turn on Auto Parallel Expanders. Turning on the parallel
expanders for individual nodes or sub-designs can increase the performance of
complex logic functions. However, if the project’s pin or logic cell assignments use
parallel expanders placed physically together with macrocells (which can reduce
routability), parallel expanders can cause the Quartus II Fitter to have difficulties
finding and optimizing a fit. Additionally, the number of macrocells required to
implement the design increases and can result in a no-fit error during compilation if
the device resources are limited. For more information about turning on the Auto
Parallel Expanders option, refer to “Resolving Macrocell Usage Issues” on
page 13–46.
■
Set the Optimization Technique to Speed. By default, the Quartus II software sets
the Optimization Technique option to Speed for MAX 7000 and MAX 3000
devices. Reset the Optimization Technique option to Speed only if you
previously set it to Area. On the Assignments menu, click Settings. In the
Category list, select Analysis & Synthesis Settings, and turn on Speed under
Optimization Technique.
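Because parallel expanders can be turned on for individual nodes or sub-designs, the two guidelines above might be scripted as follows. This is only a sketch: critical_block is a hypothetical sub-design name, AUTO_PARALLEL_EXPANDERS is an assumed .qsf variable name, and MAX7000 is an assumed family prefix for the <device family name>_OPTIMIZATION_TECHNIQUE variable listed in Table 13–6.

```tcl
# Turn on parallel expanders only for one hypothetical sub-design,
# leaving the rest of the project unchanged.
set_instance_assignment -name AUTO_PARALLEL_EXPANDERS ON -to critical_block

# Restore the Speed optimization technique if it was previously set to Area.
set_global_assignment -name MAX7000_OPTIMIZATION_TECHNIQUE SPEED
```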
Improving Maximum Frequency (fMAX)
Maintaining the system clock at or above a certain frequency is a major goal in circuit
design. For example, if you have a fully synchronous system that must run at
100 MHz, the longest delay path from the output of any register to the inputs of the
registers it feeds must be less than 10 ns. Maintaining the system clock speed can be
difficult if there are long delay paths through complex logic. Altera recommends that
you follow these guidelines to increase your design’s clock speed (fMAX):
■
On the Assignments menu, click Settings. In the Category list, select Analysis &
Synthesis Settings, click More Settings, and turn on Auto Parallel Expanders.
Turning on the parallel expanders for individual nodes or subdesigns can increase
the performance of complex logic functions. However, if the project’s pin or logic
cell assignments use parallel expanders placed physically together with macrocells
(which can reduce routability), parallel expanders can cause the Quartus II
compiler to have difficulties finding and optimizing a fit. Additionally, the number
of macrocells required to implement the design also increases and can result in a
no-fit error during compilation if the device’s resources are limited. For more
information about using the Auto Parallel Expanders option, refer to “Resolving
Macrocell Usage Issues” on page 13–46.
■
Use global signals or dedicated inputs. Altera MAX 7000 and MAX 3000 devices
have dedicated inputs that provide low skew and high speed for high fan-out
signals. Minimize the number of control signals in the design and use the
dedicated inputs to implement them.
■
Set the Optimization Technique to Speed. By default, the Quartus II software sets
the Optimization Technique option to Speed for MAX 7000 and MAX 3000
devices. Reset the Optimization Technique option to Speed only if you have
previously set it to Area. To reset the option, on the Assignments menu, click
Settings. In the Category list, select Analysis & Synthesis Settings, and turn on
Speed under Optimization Technique.
■
Pipeline the design. Pipelining, which increases clock frequency (fMAX), refers to
dividing large blocks of combinational logic by inserting registers. When using
RAM or DSP blocks, always enable the optional input and output registers.
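The 100 MHz example at the start of this section follows from the relationship between clock frequency and clock period. The worked equation below restates it; the symbols T_clk and t_path are introduced here for illustration and are not names used by the Quartus II software.

```latex
T_{\text{clk}} = \frac{1}{f_{\text{MAX}}} = \frac{1}{100\ \text{MHz}} = 10\ \text{ns},
\qquad
t_{\text{path}} < T_{\text{clk}} \ \text{for the longest register-to-register path}
```

Pipelining helps because each inserted register splits a long path into shorter ones, each of which must fit within the same clock period.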
Optimizing Source Code—Pipelining for Complex Register Logic
If the methods described in the preceding sections do not sufficiently improve your
results, modify the design at the source to achieve the desired results. Using
additional register stages (pipeline registers) consumes more device resources, but it
also lowers the propagation delay between registers, allowing you to maintain high
system clock speed.
Refer to the application note AN 584: Timing Closure Methodology for Advanced FPGA
Designs for more information about pipelining registers and other examples of
optimizing source code.
Other Optimization Resources
The Quartus II software has additional resources to help you optimize your design for
resource, performance, compilation time, and power.
Design Space Explorer
The Design Space Explorer (DSE) automates the process of running multiple
compilations with different settings to help find the best set of options for your
design. You can use the DSE to try the techniques described in this chapter. The DSE
explores the design space by applying various optimization techniques and
analyzing the results.
For more information, refer to About Design Space Explorer in Quartus II Help.
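The kind of sweep that the DSE automates can be approximated by hand. The following is only a sketch of a manual Fitter seed sweep, not the DSE itself. The project name my_project is hypothetical, the seed values are arbitrary, and SEED is the Fitter Seed variable from Table 13–8. The script is assumed to run in the Quartus II Tcl shell, for example with quartus_sh -t <script file>.

```tcl
# Manual seed sweep (not the DSE). Run in the Quartus II Tcl shell.

# The flow package provides execute_flow.
load_package flow

# my_project is a hypothetical project name.
project_open my_project

# The seed values are arbitrary examples.
foreach seed {1 2 3} {
    set_global_assignment -name SEED $seed
    # Write the assignment to the .qsf before compiling.
    export_assignments
    # Run a full compilation with this seed.
    execute_flow -compile
}

project_close
```

Each compilation overwrites the previous reports, so a practical sweep would need to save or parse the results between runs.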
Other Optimization Advisors
The Power Optimization Advisor provides guidance for reducing power
consumption. In addition, the Incremental Compilation Advisor provides suggestions
to improve your results when partitioning your design for a hierarchical or
team-based design flow using the Quartus II incremental compilation feature.
For more information about using the Power Optimization Advisor, refer to the Power
Optimization chapter in volume 2 of the Quartus II Handbook. For more information
about using the Incremental Compilation Advisor, refer to the Quartus II Incremental
Compilation for Hierarchical and Team-Based Design chapter in volume 1 of the Quartus II
Handbook.
Scripting Support
You can run procedures and make settings described in this chapter in a Tcl script.
You can also run some procedures at a command prompt. For detailed information
about scripting command options, refer to the Quartus II command-line and Tcl API
Help browser. To run the Help browser, type the following command at the command
prompt:
quartus_sh --qhelp
For more information about Tcl scripting, refer to the Tcl Scripting chapter in volume 2
of the Quartus II Handbook. For more information about all settings and constraints in
the Quartus II software, refer to the Quartus II Settings File Manual. For more
information about command-line scripting, refer to the Command-Line Scripting
chapter in volume 2 of the Quartus II Handbook.
You can specify many of the options described in this section at the instance level,
at the global level, or both.
Use the following Tcl command to make a global assignment:
set_global_assignment -name <.qsf variable name> <value>
Use the following Tcl command to make an instance assignment:
set_instance_assignment -name <.qsf variable name> <value> \
-to <instance name>
If the <value> field includes spaces (for example, “Standard Fit”), you must enclose
the value in straight double quotation marks.
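For illustration, the two command forms might be filled in as follows, using variable names and values from Table 13–4 and Table 13–8. The node name my_reg and the fan-out limit of 10 are hypothetical.

```tcl
# Global assignment; the value contains a space, so it is enclosed in
# straight double quotation marks.
set_global_assignment -name FITTER_EFFORT "STANDARD FIT"

# Global assignment with a single-word value; no quotation marks are needed.
set_global_assignment -name OPTIMIZE_IOC_REGISTER_PLACEMENT_FOR_TIMING ON

# Instance assignment; my_reg is a hypothetical node name and 10 an arbitrary limit.
set_instance_assignment -name MAX_FANOUT 10 \
    -to my_reg
```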
Initial Compilation Settings
The Quartus II Settings File (.qsf) variable name is used in the Tcl assignment to make
the setting along with the appropriate value. The Type column indicates whether the
setting is supported as a global setting, an instance setting, or both.
Table 13–4 lists the .qsf variable name and applicable values for the settings discussed
in “Initial Compilation: Required Settings” on page 13–2. Table 13–5 shows the list of
advanced compilation settings.
Table 13–4. Initial Compilation Settings
Setting Name | .qsf File Variable Name | Values | Type
Device Setting | DEVICE | <device part number> | Global
Use Smart Compilation | SPEED_DISK_USAGE_TRADEOFF | SMART, NORMAL | Global
Optimize IOC Register Placement For Timing | OPTIMIZE_IOC_REGISTER_PLACEMENT_FOR_TIMING | ON, OFF | Global
Optimize Hold Timing | OPTIMIZE_HOLD_TIMING | OFF, IO PATHS AND MINIMUM TPD PATHS, ALL PATHS | Global
Fitter Effort | FITTER_EFFORT | STANDARD FIT, FAST FIT, AUTO FIT | Global
Table 13–5. Advanced Compilation Settings
Setting Name | .qsf File Variable Name | Values | Type
Router Effort Multiplier | ROUTER_EFFORT_MULTIPLIER | Any positive, non-zero value | Global
Router Timing Optimization level | ROUTER_TIMING_OPTIMIZATION_LEVEL | NORMAL, MINIMUM, MAXIMUM | Global
Final Placement Optimization | FINAL_PLACEMENT_OPTIMIZATION | ALWAYS, AUTOMATICALLY, NEVER | Global
Resource Utilization Optimization Techniques (LUT-Based Devices)
Table 13–6 lists the .qsf file variable name and applicable values for the settings
discussed in “Resource Utilization Optimization Techniques (LUT-Based Devices)”
on page 13–15.
Table 13–6. Resource Utilization Optimization Settings
Setting Name | .qsf File Variable Name | Values | Type
Auto Packed Registers (1) | AUTO_PACKED_REGISTERS_<device family name> | OFF, NORMAL, MINIMIZE AREA, MINIMIZE AREA WITH CHAINS, AUTO | Global, Instance
Perform WYSIWYG Primitive Resynthesis | ADV_NETLIST_OPT_SYNTH_WYSIWYG_REMAP | ON, OFF | Global, Instance
Physical Synthesis for Combinational Logic for Reducing Area | PHYSICAL_SYNTHESIS_COMBO_LOGIC_FOR_AREA | ON, OFF | Global, Instance
Physical Synthesis for Mapping Logic to Memory | PHYSICAL_SYNTHESIS_MAP_LOGIC_TO_MEMORY_FOR_AREA | ON, OFF | Global, Instance
Optimization Technique | <device family name>_OPTIMIZATION_TECHNIQUE | AREA, SPEED, BALANCED | Global, Instance
Speed Optimization Technique for Clock Domains | SYNTH_CRITICAL_CLOCK | ON, OFF | Instance
State Machine Encoding | STATE_MACHINE_PROCESSING | AUTO, ONE-HOT, MINIMAL BITS, USER-ENCODE | Global, Instance
Auto RAM Replacement | AUTO_RAM_RECOGNITION | ON, OFF | Global, Instance
Auto ROM Replacement | AUTO_ROM_RECOGNITION | ON, OFF | Global, Instance
Auto Shift Register Replacement | AUTO_SHIFT_REGISTER_RECOGNITION | ON, OFF | Global, Instance
Auto Block Replacement | AUTO_DSP_RECOGNITION | ON, OFF | Global, Instance
Number of Processors for Parallel Compilation | NUM_PARALLEL_PROCESSORS | Integer between 1 and 4 inclusive, or ALL | Global
Note to Table 13–6:
(1) Allowed values for this setting depend on the device family that is selected.
I/O Timing Optimization Techniques (LUT-Based Devices)
Table 13–7 lists the .qsf file variable name and applicable values for the I/O timing
optimization settings.
Table 13–7. I/O Timing Optimization Settings
Setting Name | .qsf File Variable Name | Values | Type
Optimize IOC Register Placement For Timing | OPTIMIZE_IOC_REGISTER_PLACEMENT_FOR_TIMING | ON, OFF | Global
Fast Input Register | FAST_INPUT_REGISTER | ON, OFF | Instance
Fast Output Register | FAST_OUTPUT_REGISTER | ON, OFF | Instance
Fast Output Enable Register | FAST_OUTPUT_ENABLE_REGISTER | ON, OFF | Instance
Fast OCT Register | FAST_OCT_REGISTER | ON, OFF | Instance
Register-to-Register Timing Optimization Techniques (LUT-Based Devices)
Table 13–8 lists the .qsf file variable name and applicable values for the settings
discussed in “Register-to-Register Timing Optimization Techniques (LUT-Based
Devices)” on page 13–33.
Table 13–8. Register-to-Register Timing Optimization Settings
Setting Name | .qsf File Variable Name | Values | Type
Perform WYSIWYG Primitive Resynthesis | ADV_NETLIST_OPT_SYNTH_WYSIWYG_REMAP | ON, OFF | Global, Instance
Perform Physical Synthesis for Combinational Logic | PHYSICAL_SYNTHESIS_COMBO_LOGIC | ON, OFF | Global, Instance
Perform Register Duplication | PHYSICAL_SYNTHESIS_REGISTER_DUPLICATION | ON, OFF | Global, Instance
Perform Register Retiming | PHYSICAL_SYNTHESIS_REGISTER_RETIMING | ON, OFF | Global, Instance
Perform Automatic Asynchronous Signal Pipelining | PHYSICAL_SYNTHESIS_ASYNCHRONOUS_SIGNAL_PIPELINING | ON, OFF | Global, Instance
Physical Synthesis Effort | PHYSICAL_SYNTHESIS_EFFORT | NORMAL, EXTRA, FAST | Global
Fitter Seed | SEED | <integer> | Global
Maximum Fan-Out | MAX_FANOUT | <integer> | Instance
Manual Logic Duplication | DUPLICATE_ATOM | <node name> | Instance
Optimize Power during Synthesis | OPTIMIZE_POWER_DURING_SYNTHESIS | NORMAL, OFF, EXTRA_EFFORT | Global
Optimize Power during Fitting | OPTIMIZE_POWER_DURING_FITTING | NORMAL, OFF, EXTRA_EFFORT | Global
Conclusion
Using the recommended techniques described in this chapter can help you close
timing quickly on complex designs, reduce iterations by providing more intelligent
and better links between analysis and assignment tools, and balance multiple design
constraints including multiple clocks, routing resources, and area constraints.
The Quartus II software provides many features to achieve optimal results. Follow the
techniques presented in this chapter to efficiently optimize a design for area or timing
performance, or to reduce compilation time.
Document Revision History
Table 13–9 shows the revision history for this chapter.
Table 13–9. Document Revision History (Part 1 of 3)
Date | Version | Changes
May 2011 | 11.0.0 |
  ■ Reorganized sections in “Initial Compilation: Optional Fitter Settings” section
  ■ Added new information to “Resource Utilization” section
  ■ Added new information to “Duplicate Logic for Fan-Out Control” section
  ■ Added links to Help
  ■ Additional edits and updates throughout chapter
December 2010 | 10.1.0 |
  ■ Added links to Help
  ■ Updated device support
  ■ Added “Debugging Timing Failures in the TimeQuest Analyzer” section
  ■ Removed Classic Timing Analyzer references
  ■ Other updates throughout chapter
August 2010 | 10.0.1 |
  Corrected link
July 2010 | 10.0.0 |
  ■ Moved Compilation Time Optimization Techniques section to new Reducing Compilation Time chapter
  ■ Removed references to Timing Closure Floorplan
  ■ Moved Smart Compilation Setting and Early Timing Estimation sections to new Reducing Compilation Time chapter
  ■ Added Other Optimization Resources section
  ■ Removed outdated information
  ■ Changed references to DSE chapter to Help links
  ■ Linked to Help where appropriate
  ■ Removed Referenced Documents section
Table 13–9. Document Revision History (Part 2 of 3)
Date | Version | Changes
November 2009 | 9.1.0 |
  ■ Removed unsupported Timing Closure Floorplan references
  ■ Removed references to unsupported device families
  ■ Added several notes
  ■ Minor text edits
March 2009 | 9.0.0 |
  ■ Was chapter 8 in the 8.1.0 release.
  ■ Updated the following sections:
      ■ “Timing Analysis with the TimeQuest Timing Analyzer” on page 10–14
      ■ “Perform WYSIWYG Resynthesis with Balanced or Area Setting” on page 10–22
      ■ “Use Physical Synthesis Options to Reduce Area” on page 10–26
      ■ “Metastability Analysis and Optimization Techniques” on page 10–32
      ■ “Use Fast Regional Clock Networks and Regional Clocks Networks” on page 10–39
      ■ “Register-to-Register Timing Optimization Techniques (LUT-Based Devices)” on page 10–40
      ■ “Physical Synthesis Optimizations” on page 10–41
      ■ “Duplicate Logic for Fan-Out Control” on page 10–45
      ■ “LogicLock Assignments” on page 10–49
      ■ “Enable Beneficial Skew Optimization” on page 10–48
      ■ “Use Multiple Processors for Parallel Compilation” on page 10–65
  ■ Removed “Analyze Your Design for Metastability”
  ■ Updated Table 10–11 and Table 10–9
  ■ Removed Tables 8-1, 8-2, 8-3, 8-6, and 8-7 from version 8.1
Table 13–9. Document Revision History (Part 3 of 3)
Date | Version | Changes
November 2008 | 8.1.0 |
  ■ Changed document to 8½” × 11” page size.
  ■ Updated the following sections:
      ■ “Optimizing Your Design” on page 10–2
      ■ “Timing Requirement Settings” on page 10–4
      ■ “Optimize Hold Timing” on page 10–8
      ■ “Limit to One Fitting Attempt” on page 10–9
      ■ “Auto Fit” on page 10–10
      ■ “Fast Fit” on page 10–11
      ■ “Ignored Timing Assignments” on page 10–12
      ■ “I/O Timing (Including tPD)” on page 10–13
      ■ “Register-to-Register Timing” on page 10–14
      ■ “Timing Analysis with the TimeQuest Timing Analyzer” on page 10–14
      ■ “Use I/O Assignment Analysis” on page 10–20
      ■ “Flatten the Hierarchy During Synthesis” on page 10–25
      ■ “Retarget Memory Blocks” on page 10–25
      ■ “Use Physical Synthesis Options to Reduce Area” on page 10–26
      ■ “Increase Placement Effort Multiplier” on page 10–30
      ■ “Metastability Analysis and Optimization Techniques” on page 10–32
      ■ “Synthesis Netlist Optimizations and Physical Synthesis Optimizations” on page 10–43
      ■ “Incremental Compilation” on page 10–65
      ■ “Use Multiple Processors for Parallel Compilation” on page 10–66
  ■ Updated Table 10–9 on page 10–73 and Table 10–11 on page 10–75.
  ■ Updated links
May 2008 | 8.0.0 |
  ■ Updated the following sections:
      ■ Other Optimization Resources
      ■ Setting Process Priority
      ■ Location Assignment and Back-Annotation
      ■ Fitter Effort Setting
      ■ Synthesis Netlist Optimizations and Physical Synthesis Optimizations
      ■ Fast Fit
  ■ Added Metastability Analysis
  ■ Added Enable Beneficial Skew Optimization and Analyze Your Design for Metastability
  ■ Removed figures from “Optimizing Source Code—Pipelining for Complex Register Logic”
  ■ Updated Table 8-5
f For previous versions of the Quartus II Handbook, refer to the Quartus II Handbook
Archive.
f Take an online survey to provide feedback about this handbook chapter.
14. Power Optimization
December 2010
QII52016-10.0.1
The Quartus® II software offers power-driven compilation to fully optimize device
power consumption. Power-driven compilation focuses on reducing your design’s
total power consumption using power-driven synthesis and power-driven
place-and-route. This chapter describes the power-driven compilation feature and
flow in detail, as well as low power design techniques that can further reduce power
consumption in your design. The techniques primarily target Arria® GX, Stratix® and
Cyclone® series of devices, and HardCopy® II devices. These devices utilize a low-k
dielectric material that dramatically reduces dynamic power and improves
performance. Arria series, Stratix II, Stratix III, Stratix IV, and Stratix V device families
include efficient logic structures called adaptive logic modules (ALMs) that obtain
maximum performance while minimizing power consumption. Cyclone device
families offer the optimal blend of high performance and low power in a low-cost
FPGA.
f For more information about a device-specific architecture, refer to the device
handbook, available from the Literature and Technical Documentation page on the
Altera website.
Altera provides the Quartus II PowerPlay Power Analyzer to aid you during the
design process by delivering fast and accurate estimations of power consumption.
You can minimize power consumption, while taking advantage of the industry’s
leading FPGA performance, by using the tools and techniques described in this
chapter.
f For more information about the PowerPlay Power Analyzer, refer to the PowerPlay
Power Analysis chapter in volume 3 of the Quartus II Handbook.
Total FPGA power consumption consists of I/O power, core static power, and
core dynamic power. This chapter focuses on design optimization options and
techniques that help reduce core dynamic power and I/O power. In addition to these
techniques, there are additional power optimization techniques available for
Stratix III and Stratix IV devices. These techniques include:
■
Selectable Core Voltage (available only for Stratix III devices)
■
Programmable Power Technology
■
Device Speed Grade Selection
f For more information about power optimization techniques available for Stratix III
devices, refer to AN 437: Power Optimization in Stratix III FPGAs. For more information
about power optimization techniques available for Stratix IV devices, refer to AN 514:
Power Optimization in Stratix IV FPGAs.
© 2010 Altera Corporation. All rights reserved. ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS and STRATIX are Reg. U.S. Pat. & Tm. Off.
and/or trademarks of Altera Corporation in the U.S. and other countries. All other trademarks and service marks are the property of their respective holders as described at
www.altera.com/common/legal.html. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera’s standard warranty, but
reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any
information, product, or service described herein except as expressly agreed to in writing by Altera. Altera customers are advised to obtain the latest version of device
specifications before relying on any published information and before placing orders for products or services.
Power Dissipation
This section describes the sources of power dissipation in Stratix III and Cyclone III
devices. You can refine techniques that reduce power consumption in your design by
understanding the sources of power dissipation.
Figure 14–1 shows the power dissipation of Stratix III and Cyclone III devices in
different designs. All designs were analyzed at a fixed clock rate of 100 MHz and
exhibited varied logic resource utilization across available resources.
Figure 14–1. Average Core Dynamic Power Dissipation by Block Type at a 12.5% Toggle Rate
Block Type | Stratix III Devices (1) | Cyclone III Devices (2)
Global Clock Routing | 14% | 16%
Routing | 30% | 29%
Memory | 21% | 20%
DSP Blocks (Stratix III) or Multipliers (Cyclone III) | 1% (3) | 1% (3)
Combinational Logic | 16% | 11%
Registered Logic | 18% | 23%
Notes to Figure 14–1:
(1) 103 different designs were used to obtain these results.
(2) 96 different designs were used to obtain these results.
(3) In designs using DSP blocks, DSPs consumed 5% of core dynamic power.
As shown in Figure 14–1, a significant amount of the total power is dissipated in
routing for both Stratix III and Cyclone III devices, with the remaining power
dissipated in logic, clock, and RAM blocks.
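The breakdown in Figure 14–1 can be checked with a few lines of arithmetic. The following Python sketch tabulates the percentages from the figure and adds the two routing-related categories together. The assignment of the two logic percentages to each device family follows the order of the figure labels and should be treated as an assumption; the function name interconnect_share is hypothetical.

```python
# Percentages as labeled in Figure 14-1 (core dynamic power, 12.5% toggle rate).
# The pairing of the combinational and registered logic values with each
# device family follows the label order in the figure and is an assumption.
stratix_iii = {
    "Global Clock Routing": 14, "Routing": 30, "Memory": 21,
    "DSP Blocks": 1, "Combinational Logic": 16, "Registered Logic": 18,
}
cyclone_iii = {
    "Global Clock Routing": 16, "Routing": 29, "Memory": 20,
    "Multipliers": 1, "Combinational Logic": 11, "Registered Logic": 23,
}


def interconnect_share(breakdown):
    """Return the combined share of general routing and global clock routing."""
    return breakdown["Routing"] + breakdown["Global Clock Routing"]


for name, breakdown in (("Stratix III", stratix_iii), ("Cyclone III", cyclone_iii)):
    total = sum(breakdown.values())
    print(f"{name}: total {total}%, routing plus clock routing "
          f"{interconnect_share(breakdown)}%")
```

Under these figures, general routing and global clock routing together account for a little under half of the average core dynamic power in both families.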
In Stratix and Cyclone device families, a series of column and row interconnect wires
of varying lengths provide signal interconnections between logic array blocks (LABs),
memory block structures, and digital signal processing (DSP) blocks or multiplier
blocks. These interconnects dissipate the largest component of device power.
FPGA combinational logic is another source of power consumption. The basic
building block of logic in the latest Stratix series devices is the ALM, and in
Cyclone II, Cyclone III and Cyclone IV GX devices, it is the logic element (LE).
f For more information about ALMs and LEs in Cyclone II, Cyclone III, Cyclone IV GX,
Stratix II, Stratix III, Stratix IV, and Stratix V devices, refer to the respective device
handbook.
Memory and clock resources are other major consumers of power in FPGAs. Stratix II
devices feature the TriMatrix memory architecture. TriMatrix memory includes
512-bit M512 blocks, 4-Kbit M4K blocks, and 512-Kbit M-RAM blocks, which are
configurable to support many features. Stratix IV and Stratix III TriMatrix on-chip
memory is an enhancement based upon the Stratix II FPGA TriMatrix memory and
includes three sizes of memory blocks: MLAB blocks, M9K blocks, and M144K blocks.
Stratix III, Stratix IV, and Stratix V devices feature Programmable Power Technology,
an advanced architecture that enables a smooth trade-off between speed and power.
The core of each Stratix III, Stratix IV, and Stratix V device is divided into tiles, each of
which may be put into a high-speed or low-power mode. The primary benefit of
Programmable Power Technology is to reduce static power, with a secondary benefit
being a small reduction in dynamic power. Cyclone II devices have 4-Kbit M4K
memory blocks, and Cyclone III and Cyclone IV GX devices have 9-Kbit M9K
memory blocks.
Design Space Explorer
Design Space Explorer (DSE) is a simple, easy-to-use, design optimization utility that
is included in the Quartus II software. DSE explores and reports optimal Quartus II
software options for your design, targeting power optimization, design
performance, or area utilization improvements. You can use DSE to implement the
techniques described in this chapter.
Figure 14–2 shows the DSE user interface. The Settings tab is divided into Project
Settings and Exploration Settings.
Figure 14–2. Design Space Explorer User Interface
The Search for Lowest Power option, under Exploration Settings, uses a predefined
exploration space that targets overall design power improvements. This setting
focuses on applying different options that specifically reduce total design thermal
power.
By default, the Quartus II PowerPlay Power Analyzer is run for every exploration
performed by the DSE when the Search for Lowest Power option is selected. This
helps you debug your design and determine trade-offs between power requirements
and performance optimization.
h For more information about the DSE, refer to About Design Space Explorer in Quartus II
Help.
Power-Driven Compilation
The standard Quartus II compilation flow consists of Analysis and Synthesis,
placement and routing, Assembly, and Timing Analysis. Power-driven compilation
takes place at the Analysis and Synthesis and Place-and-Route stages.
Quartus II software settings that control power-driven compilation are located in the
PowerPlay power optimization list on the Analysis & Synthesis Settings page, and
the PowerPlay power optimization list on the Fitter Settings page. The following
sections describe these power optimization options at the Analysis and Synthesis
and Fitter levels.
Power-Driven Synthesis
Synthesis netlist optimization occurs during the synthesis stage of the compilation
flow. The optimization technique makes changes to the synthesis netlist to optimize
your design according to the selection of area, speed, or power optimization. This
section describes power optimization techniques at the synthesis level.
The Analysis & Synthesis Settings page allows you to specify logic synthesis
options. The PowerPlay power optimization option is available for all devices
supported by the Quartus II software except MAX® 3000 and MAX 7000 devices
(Figure 14–3).
Figure 14–3. Analysis & Synthesis Settings Page
Table 14–1 shows the settings in the PowerPlay power optimization list. You can
apply these settings on a project or entity level.
Table 14–1. Optimize Power During Synthesis Options
Settings | Description
Off | No netlist, placement, or routing optimizations are performed to minimize power.
Normal compilation (Default) | Low compute effort algorithms are applied to minimize power through netlist optimizations as long as they are not expected to reduce design performance.
Extra effort | High compute effort algorithms are applied to minimize power through netlist optimizations. Max performance might be impacted.
The Normal compilation setting is turned on by default. This setting performs
memory optimization and power-aware logic mapping during synthesis.
Memory blocks can represent a large fraction of total design dynamic power as
described in “Reducing Memory Power Consumption” on page 14–14. Minimizing
the number of memory blocks accessed during each clock cycle can significantly
reduce memory power. Memory optimization involves effective movement of
user-defined read/write enable signals to associated read-and-write clock enable
signals for all memory types (Figure 14–4).
Figure 14–4. Memory Transformation
(Figure: a simple dual-port memory block before and after the transformation. Before, the Wr Clk Enable and Rd Clk Enable inputs are tied to VCC, and Wren and Rden drive Write Enable and Read Enable. After, Wren and Rden drive Wr Clk Enable and Rd Clk Enable, and Write Enable and Read Enable are tied to VCC.)
Figure 14–4 shows a default implementation of a simple dual-port memory block in
which write-clock enable signals and read-clock enable signals are connected to VCC,
making both read and write memory ports active during each clock cycle. Memory
transformation effectively moves the read-enable and write-enable signals to the
respective read-clock enable and write-clock enable signals. By using this technique,
memory ports are shut down when they are not accessed. This significantly reduces
your design’s memory power consumption. For more information about clock enable
signals, refer to “Reducing Memory Power Consumption” on page 14–14. For
Stratix III, Stratix IV, and Stratix V devices, the memory transformation takes place at
the Fitter level by selecting the Normal compilation settings for the power
optimization option.
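The effect of the memory transformation can be illustrated with a simple cycle count. The following Python sketch is a behavioral model only, not a power estimate: it counts the clock cycles in which each port of a simple dual-port memory is clocked, first with the clock enables tied to VCC and then with the clock enables driven by the user write-enable and read-enable signals. The function name active_port_cycles and the example traces are hypothetical.

```python
def active_port_cycles(write_enable, read_enable, use_clock_enables):
    """Count cycles in which the write and read ports are clocked.

    write_enable and read_enable are per-cycle sequences of 0 or 1.
    With use_clock_enables False, both clock enables are tied to VCC,
    so each port is active on every cycle. With it True, each port is
    active only on cycles where its enable signal is asserted.
    """
    if use_clock_enables:
        write_active = sum(1 for w in write_enable if w)
        read_active = sum(1 for r in read_enable if r)
    else:
        write_active = len(write_enable)
        read_active = len(read_enable)
    return write_active, read_active


# Made-up eight-cycle activity traces.
wren = [1, 0, 0, 0, 1, 0, 0, 0]
rden = [0, 1, 1, 0, 0, 0, 1, 0]

default = active_port_cycles(wren, rden, use_clock_enables=False)
transformed = active_port_cycles(wren, rden, use_clock_enables=True)
print("default (write, read):", default)
print("transformed (write, read):", transformed)
```

In this model, the fewer cycles a port is clocked, the less often the memory core toggles, which is the source of the power saving described above.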
In Cyclone III, Cyclone IV GX, and Stratix III devices, the specified
read-during-write behavior can significantly impact the power of single-port and
bidirectional dual-port RAMs. It is best to set the read-during-write parameter to
“Don’t care” (at the HDL level), as it allows an optimization whereby the read-enable
signal can be set to the inversion of the existing write-enable signal (if one exists).
This allows the core of the RAM to shut down (that is, not toggle), which saves a
significant amount of power.
The other type of power optimization that takes place with the Normal compilation
setting is power-aware logic mapping. The power-aware logic mapping reduces
power by rearranging the logic during synthesis to eliminate nets with high toggle
rates.
The Extra effort setting performs the functions of the Normal compilation setting and
other memory optimizations to further reduce memory power by shutting down
memory blocks that are not accessed. This level of memory optimization can require
extra logic, which can reduce design performance.
The Extra effort setting also performs power-aware memory balancing. Power-aware
memory balancing automatically chooses the best memory configuration for your
memory implementation and provides optimal power saving by determining the
number of memory blocks, decoder, and multiplexer circuits required. If you have not
previously specified target-embedded memory blocks for your design’s memory
functions, the power-aware balancer automatically selects them during memory
implementation.
Figure 14–5 shows an example of a 4k × 4 (4k deep and 4 bits wide) memory
implementation in two different configurations using M4K memory blocks available
in Stratix II devices. The minimum logic area implementation uses M4K blocks
configured as 4k × 1. This implementation is the default in the Quartus II software
because it has the minimum logic area (0 logic cells) and the highest speed. However,
all four M4K blocks are active on each memory access in this implementation, which
increases RAM power. The minimum RAM power implementation is created by
selecting Extra effort in the PowerPlay power optimization list. This implementation
automatically uses four M4K blocks configured as 1k × 4 for optimal power saving.
An address decoder is implemented by the RAM megafunction to select which of the
four M4K blocks should be activated on a given cycle, based on the state of the top
two user address bits. The RAM megafunction automatically implements a
multiplexer to feed the downstream logic by choosing the appropriate M4K output.
This implementation reduces RAM power because only one M4K block is active on
any cycle, but it requires extra logic cells, costing logic area and potentially impacting
design performance.
There is a trade-off between power saved by accessing fewer memories and power
consumed by the extra decoder and multiplexer logic. The Quartus II software
automatically balances the power savings against the costs to choose the lowest
power configuration for each logical RAM. The benchmark data shows that the
power-driven synthesis can reduce memory power consumption by as much as 60%
in Stratix devices.
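The two configurations of the 4k × 4 memory described above can be sketched as a behavioral model. The following Python example is an illustration only; the class names and the convention of returning the number of activated blocks are assumptions made for this sketch and do not correspond to any megafunction interface.

```python
class DeepNarrowSlices:
    """Minimum logic area: four 4k x 1 blocks, one per data bit.

    Every access touches all four blocks.
    """

    def __init__(self):
        self.slices = [[0] * 4096 for _ in range(4)]

    def write(self, addr, data):
        for bit in range(4):
            self.slices[bit][addr] = (data >> bit) & 1
        return 4  # number of blocks activated

    def read(self, addr):
        data = sum(self.slices[bit][addr] << bit for bit in range(4))
        return data, 4


class WideShallowBanks:
    """Minimum RAM power: four 1k x 4 blocks selected by the top two address bits.

    The divmod models the address decoder and the output multiplexer.
    """

    def __init__(self):
        self.banks = [[0] * 1024 for _ in range(4)]

    def write(self, addr, data):
        bank, offset = divmod(addr, 1024)
        self.banks[bank][offset] = data & 0xF
        return 1  # number of blocks activated

    def read(self, addr):
        bank, offset = divmod(addr, 1024)
        return self.banks[bank][offset], 1


area_ram = DeepNarrowSlices()
power_ram = WideShallowBanks()
area_ram.write(3000, 0xA)
power_ram.write(3000, 0xA)
print(area_ram.read(3000))
print(power_ram.read(3000))
```

Both models return the same data for the same address, but the second activates one block per access instead of four, at the cost of the decode and multiplex step.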
Figure 14–5. 4K × 4 Memory Implementation Using Multiple M4K Blocks
(Figure: a memory 4K words deep and 4 bits wide in two configurations. Minimum Logic Area (Power Inefficient): 4K deep × 1 wide M4K RAMs addressed by Addr[0:11]. Minimum RAM Power (Power Efficient): 1K deep × 4 wide M4K RAMs addressed by Addr[0:9], with an address decoder and output multiplexer driven by Addr[10:11]. Both configurations produce Data[0:3].)
Memory optimization options can also be controlled by the Low_Power_Mode
parameter in the Default Parameters page of the Settings dialog box. The settings for
this parameter are None, Auto, and ALL. None corresponds to the Off setting in the
PowerPlay power optimization list, Auto corresponds to the Normal compilation
setting, and ALL corresponds to the Extra effort setting. You can apply
PowerPlay power optimization either project-wide or on individual entities.
The Low_Power_Mode parameter always takes precedence over the Optimize Power
for Synthesis option for power optimization on memory.
You can also set the MAXIMUM_DEPTH parameter manually to configure the memory for
low power optimization. This technique is the same as the power-aware memory
balancer, but it is manual rather than automatic like the Extra effort setting in the
PowerPlay power optimization list. You can set the MAXIMUM_DEPTH parameter for
memory modules manually in the megafunction instantiation or in the MegaWizard™
Plug-In Manager for power optimization as described in “Reducing Memory Power
Consumption” on page 14–14. The MAXIMUM_DEPTH parameter always takes
precedence over the Optimize Power for Synthesis options for memory power
optimization.
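As a rough guide to choosing a MAXIMUM_DEPTH value, the following Python sketch computes how many depth-wise slices a logical memory is divided into and how many address bits must be decoded to select a slice. The arithmetic assumes that the parameter simply caps the depth of each slice; the function name depth_slices is hypothetical, and the actual block count also depends on the data width and the target memory block type.

```python
def depth_slices(logical_depth, maximum_depth):
    """Return (number of depth slices, address bits needed to select a slice).

    Assumes each slice holds at most maximum_depth words.
    """
    slices = -(-logical_depth // maximum_depth)  # ceiling division
    select_bits = (slices - 1).bit_length()      # 0 when there is one slice
    return slices, select_bits


# A 4k-deep memory with each slice limited to 1k words, as in Figure 14-5.
print(depth_slices(4096, 1024))
# The same memory with no slicing.
print(depth_slices(4096, 4096))
```

A smaller maximum depth activates fewer memory bits per access but requires a wider decoder and output multiplexer, which is the same trade-off that the Extra effort setting balances automatically.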
h For step-by-step instructions on how to perform power-driven synthesis, refer to
Running a Power-Optimized Compilation in Quartus II Help.
Power-Driven Fitter
The Fitter Settings page enables you to specify options for fitting (Figure 14–6). The
PowerPlay power optimization option is available for Arria GX, Arria II GX,
Cyclone II, Cyclone III, Cyclone IV, HardCopy series, Stratix II, Stratix II GX,
Stratix III, Stratix IV, and Stratix V devices.
Figure 14–6. Fitter Settings Page
Table 14–2 lists the settings in the PowerPlay power optimization list. These settings
can only be applied on a project-wide basis. The Extra effort setting for the Fitter
requires extensive effort to optimize the design for power and can increase the
compilation time.
Table 14–2. Power-Driven Fitter Option
Settings | Description
Off | No netlist, placement, or routing optimizations are performed to minimize power.
Normal compilation (Default) | Low compute effort algorithms are applied to minimize power through placement and routing optimizations as long as they are not expected to reduce design performance.
Extra effort | High compute effort algorithms are applied to minimize power through placement and routing optimizations. Max performance might be impacted.
The Normal compilation setting is selected by default and performs DSP
optimization by creating power-efficient DSP block configurations for your DSP
functions. For Stratix III, Stratix IV, and Stratix V devices, this setting, which is based
on timing constraints entered for the design, enables the Programmable Power
Technology to configure tiles as high-speed mode or low-power mode. Programmable
Power Technology is always turned ON even when the OFF setting is selected for the
Fitter PowerPlay power optimization option. Tiles are the combination of LAB and
MLAB pairs (including the adjacent routing associated with LAB and MLAB), which
can be configured to operate in high-speed or low-power mode. This level of power
optimization does not have any effect on the fitting, timing results, or compile time.
Also, for Stratix III devices, this setting enables the memory transformation as
described in “Power-Driven Synthesis” on page 14–4.
f For more information about Stratix III power optimization, refer to AN 437: Power
Optimization in Stratix III FPGAs. For more information about Stratix IV power
optimization, refer to AN 514: Power Optimization in Stratix IV FPGAs.
The Extra effort setting performs the functions of the Normal compilation setting and
other place-and-route optimizations during fitting to fully optimize the design for
power. The Fitter applies an extra effort to minimize power even after timing
requirements have been met by effectively moving the logic closer during placement
to localize high-toggling nets, and using routes with low capacitance. However, this
effort can increase the compilation time.
The Extra effort setting uses a Value Change Dump File (.vcd) that guides the Fitter to
fully optimize the design for power, based on the signal activity of the design. The
best power optimization during fitting results from using the most accurate signal
activity information. Signal activities from full post-fit netlist (timing) simulation
provide the highest accuracy because all node activities reflect the actual design
behavior, provided that supplied input vectors are representative of typical design
operation. If you do not have a .vcd file, the Quartus II software uses assignments,
clock assignments, and vectorless estimation values (PowerPlay Power Analyzer Tool
settings) to estimate the signal activities. This information is used to optimize your
design for power during fitting. The benchmark data shows that the power-driven
Fitter technique can reduce power consumption by as much as 19% in Stratix devices.
On average, you can reduce core dynamic power by 16% with the Extra effort
synthesis and Extra effort fitting settings, as compared to the Off settings in both
synthesis and Fitter options for power-driven compilation.
Only the Extra effort setting in the PowerPlay power optimization list for the Fitter
option uses the signal activities (from .vcd files) during fitting. The settings made in
the PowerPlay Power Analyzer Settings page in the Settings dialog box are used to
calculate the signal activity of your design.
f For more information about .vcd files and how to create them, refer to the PowerPlay
Power Analysis chapter in volume 3 of the Quartus II Handbook.
h For step-by-step instructions on how to perform power-driven fitting, refer to
Running a Power-Optimized Compilation in Quartus II Help.
Area-Driven Synthesis
Using area optimization rather than timing or delay optimization during synthesis
saves power because you use fewer logic blocks. Using less logic usually means less
switching activity. The Quartus II integrated synthesis tool provides Speed, Balanced,
or Area for the Optimization Technique option. You can also specify this logic option
for specific modules in your design with the Assignment Editor in cases where you
want to reduce area using the Area setting (potentially at the expense of
register-to-register timing performance) while leaving the default Optimization Technique
setting at Balanced (for the best trade-off between area and speed for certain device
families). The Speed Optimization Technique can increase the resource usage of your
design if the constraints are too aggressive, and can also result in increased power
consumption.
The benchmark data shows that the area-driven technique can reduce power
consumption by as much as 31% in Stratix devices and as much as 15% in Cyclone
devices.
Gate-Level Register Retiming
You can also use gate-level register retiming to reduce circuit switching activity.
Retiming shuffles registers across combinational blocks without changing design
functionality. The Perform gate-level register retiming option in the Quartus II
software enables the movement of registers across combinational logic to balance
timing, allowing the software to trade off the delay between timing-critical and
noncritical paths.
Retiming uses fewer registers than pipelining. Figure 14–7 shows an example of
gate-level register retiming, where the 10 ns critical delay is reduced by moving the
register relative to the combinational logic, resulting in the reduction of data depth
and switching activity.
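The timing side of the retiming example can be worked through numerically. The following Python sketch assumes the stage delays shown in Figure 14–7 (10 ns and 5 ns before retiming, 7 ns and 8 ns after) and ignores register setup time, clock-to-output delay, and clock skew. The function names are hypothetical.

```python
def min_clock_period_ns(stage_delays_ns):
    """The slowest register-to-register stage sets the minimum clock period."""
    return max(stage_delays_ns)


def fmax_mhz(stage_delays_ns):
    """Maximum clock frequency in MHz for the given stage delays."""
    return 1000.0 / min_clock_period_ns(stage_delays_ns)


before = [10.0, 5.0]  # register-to-register delays before retiming
after = [7.0, 8.0]    # delays after moving the middle register

print("before:", min_clock_period_ns(before), "ns,", fmax_mhz(before), "MHz")
print("after: ", min_clock_period_ns(after), "ns,", fmax_mhz(after), "MHz")
```

The total combinational delay is unchanged at 15 ns; retiming only redistributes it across the register boundary so that the longest stage becomes shorter.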
Figure 14–7. Gate-Level Register Retiming
(Figure: a register chain before and after retiming. Before, the register-to-register delays are 10 ns and 5 ns. After, the delays are 7 ns and 8 ns.)
Gate-level register retiming makes changes at the gate level. If you are using an atom
netlist from a third-party synthesis tool, you must also select the Perform WYSIWYG
primitive resynthesis option, which unmaps the atom primitives back to gates (so that
register retiming can be performed) and then remaps the gates to Altera primitives.
When using Quartus II integrated synthesis, retiming occurs during synthesis before
the design is mapped to Altera primitives. The benchmark data shows that the
combination of WYSIWYG remapping and gate-level register retiming techniques can
reduce power consumption by as much as 6% in Stratix devices and as much as 21%
in Cyclone devices.
For more information about register retiming, refer to the Netlist Optimizations and
Physical Synthesis chapter in volume 2 of the Quartus II Handbook.
Design Guidelines
Several low-power design techniques can reduce power consumption when applied
during FPGA design implementation. This section provides detailed design
techniques for Cyclone II, Cyclone III, Cyclone IV GX, Stratix II, and Stratix III devices
that affect overall design power. The results of these techniques might be different
from design to design.
Clock Power Management
Clocks represent a significant portion of dynamic power consumption due to their
high switching activity and long paths. Figure 14–1 on page 14–2 shows a 14%
average contribution to power consumption for global clock routing in Stratix III
devices and 16% in Cyclone III devices. Actual clock-related power consumption is
higher than this because the power consumed by local clock distribution within logic,
memory, and DSP or multiplier blocks is included in the power consumption for the
respective blocks.
Clock routing power is automatically optimized by the Quartus II software, which
enables only those portions of the clock network that are required to feed downstream
registers. Power can be further reduced by gating clocks when they are not required.
It is possible to build clock-gating logic, but this approach is not recommended
because it is difficult to generate a glitch-free clock in FPGAs using ALMs or LEs.
Arria GX, Arria II GX, Cyclone III, Cyclone IV, Stratix II, Stratix III, Stratix IV, and
Stratix V devices use clock control blocks that include an enable signal. A clock
control block is a clock buffer that lets you dynamically enable or disable the clock
network and dynamically switch between multiple sources to drive the clock
network. You can use the Quartus II MegaWizard Plug-In Manager to create this clock
control block with the ALTCLKCTRL megafunction. Arria GX, Arria II GX,
Cyclone III, Cyclone IV, Stratix II, Stratix III, Stratix IV, and Stratix V devices provide
clock control blocks for global clock networks. In addition, Stratix II, Stratix III,
Stratix IV, and Stratix V devices have clock control blocks for regional clock networks.
The dynamic clock enable feature lets internal logic control the clock network. When a
clock network is powered down, all the logic fed by that clock network does not
toggle, thereby reducing the overall power consumption of the device. Figure 14–8
shows a 4-input clock control block diagram.
Figure 14–8. Clock Control Block Diagram
[Figure: clkselect[1..0] selects one of four clock inputs (inclk0x to inclk3x); the selected clock is gated by ena and drives outclk.]
The enable signal is applied to the clock signal before it is distributed to global
routing. Therefore, the enable signal must either have significant timing slack (at least
as large as the global routing delay), or it reduces the fMAX of the clock signal.
For more information about using clock control blocks, refer to the Clock Control Block
Megafunction User Guide (ALTCLKCTRL).
Another contributor to clock power consumption is the LAB clock that distributes a
clock to the registers within a LAB. LAB clock power can be the dominant contributor
to overall clock power. For example, in Cyclone III devices, each LAB can use two
clocks and two clock enable signals, as shown in Figure 14–9. Each LAB’s clock signal
and clock enable signal are linked. For example, an LE in a particular LAB using the
labclk1 signal also uses the labclkena1 signal.
Figure 14–9. LAB-Wide Control Signals
[Figure: six dedicated LAB row clocks and the local interconnect drive the LAB-wide control signals labclk1, labclkena1, labclk2, labclkena2, labclr1, labclr2, syncload, and synclr.]
To reduce LAB-wide clock power consumption without disabling the entire clock tree,
use the LAB-wide clock enable to gate the LAB-wide clock. The Quartus II software
automatically promotes register-level clock enable signals to the LAB-level. All
registers within a LAB that share a common clock and clock enable are controlled by
a shared gated clock. To take advantage of these clock enables, use a clock enable
construct in the relevant HDL code for the registered logic.
LAB-Wide Clock Enable Example
The VHDL code in Example 14–1 makes use of a LAB-wide clock enable. This
clock-gating logic is automatically turned into a LAB-level clock enable signal.
Example 14–1.
PROCESS (clk)
BEGIN
    IF clk'event AND clk = '1' THEN
        IF logic_is_enabled = '1' THEN
            reg <= value;
        ELSE
            reg <= reg;  -- hold the current value when disabled
        END IF;
    END IF;
END PROCESS;
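For reference, the fragment in Example 14–1 can be placed in a complete design unit as sketched below. The entity name enabled_reg, the WIDTH generic, and the reg_out port are hypothetical names chosen for illustration; only clk, logic_is_enabled, value, and reg come from the example above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity enabled_reg is
    generic (
        WIDTH : positive := 8  -- illustrative width, not from the handbook
    );
    port (
        clk              : in  std_logic;
        logic_is_enabled : in  std_logic;
        value            : in  std_logic_vector(WIDTH - 1 downto 0);
        reg_out          : out std_logic_vector(WIDTH - 1 downto 0)
    );
end entity enabled_reg;

architecture rtl of enabled_reg is
    signal reg : std_logic_vector(WIDTH - 1 downto 0) := (others => '0');
begin
    process (clk)
    begin
        if rising_edge(clk) then
            -- No else branch is needed: the register keeps its value when
            -- the enable is low, which is the behavior of a clock enable.
            if logic_is_enabled = '1' then
                reg <= value;
            end if;
        end if;
    end process;

    reg_out <= reg;
end architecture rtl;
```

Omitting the ELSE branch describes the same hold behavior as the explicit reg <= reg assignment; either form expresses a clock enable rather than a gated clock.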
For more information about LAB-wide control signals, refer to the Stratix II
Architecture, Cyclone III Device Family Overview, or Cyclone II Architecture chapters in
the respective device handbook.
Reducing Memory Power Consumption
The memory blocks in FPGA devices can represent a large fraction of typical core
dynamic power. Memory consumes approximately 20% of the core dynamic power in
typical Cyclone III and Stratix III device designs. Memory blocks are unlike most
other blocks in the device because most of their power is tied to the clock rate, and is
insensitive to the toggle rate on the data and address lines.
When a memory block is clocked, there is a sequence of timed events that occur
within the block to execute a read or write. The circuitry controlled by the clock
consumes the same amount of power regardless of whether or not the address or data
has changed from one cycle to the next. Thus, the toggle rate of input data and the
address bus have no impact on memory power consumption.
The key to reducing memory power consumption is to reduce the number of memory
clocking events. You can achieve this through clock network-wide gating described in
“Clock Power Management” on page 14–12, or on a per-memory basis through use of
the clock enable signals on the memory ports. Figure 14–10 shows the logical view of
the internal clock of the memory block. Use the appropriate enable signals on the
memory to make use of the clock enable signal instead of gating the clock.
Figure 14–10. Memory Clock Enable Signal
[Figure: logical view in which the Enable signal gates Clk to produce the internal memory clock.]
Using the clock enable signal enables the memory only when necessary and shuts it
down for the rest of the time, reducing the overall memory power consumption. You
can use the MegaWizard Plug-In Manager to create these enable signals by selecting
the Clock enable signal option for the appropriate port when generating the memory
block function (Figure 14–11).
Figure 14–11. MegaWizard Plug-In Manager RAM 2-Port Clock Enable Signal Selectable Option
For example, consider a design that contains a 32-bit-wide M4K memory block in
ROM mode that is running at 200 MHz. Clocked on every cycle, this memory block
consumes 8.45 mW of dynamic power. If the downstream logic requires the output of
this block only approximately every four cycles, adding a small amount of control
logic to generate a read clock enable signal for the memory block only on the
relevant cycles cuts the power by about 75%, to 2.15 mW.
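The following VHDL is a minimal sketch of a ROM whose output register is updated only when a read enable is asserted. The entity name rom_read_enable, the rden port, the 128-word depth, and the placeholder contents generated by init_rom are all assumptions made for illustration. Whether a synthesis tool maps this description to a memory block and uses the port clock enable depends on the tool and its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity rom_read_enable is
    port (
        clk  : in  std_logic;
        rden : in  std_logic;                      -- high only on cycles that need data
        addr : in  unsigned(6 downto 0);           -- 128 words (assumed depth)
        q    : out std_logic_vector(31 downto 0)   -- 32-bit-wide output
    );
end entity rom_read_enable;

architecture rtl of rom_read_enable is
    type rom_t is array (0 to 127) of std_logic_vector(31 downto 0);

    -- Placeholder contents: each word holds its own address.
    function init_rom return rom_t is
        variable r : rom_t;
    begin
        for i in r'range loop
            r(i) := std_logic_vector(to_unsigned(i, 32));
        end loop;
        return r;
    end function init_rom;

    constant rom : rom_t := init_rom;
begin
    process (clk)
    begin
        if rising_edge(clk) then
            -- The read is performed only when rden is high, so the
            -- memory is idle on the cycles where no data is required.
            if rden = '1' then
                q <= rom(to_integer(addr));
            end if;
        end if;
    end process;
end architecture rtl;
```

The control logic that drives rden (for example, a counter that asserts it on every fourth cycle) is design-specific and is left out of the sketch.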
You can also use the MAXIMUM_DEPTH parameter in your memory megafunction to save
power in Cyclone II, Cyclone III, Cyclone IV GX, Stratix II, Stratix III, Stratix IV, and
Stratix V devices; however, this approach might increase the number of LEs required
to implement the memory and affect design performance.
You can set the MAXIMUM_DEPTH parameter for memory modules manually in the
megafunction instantiation or in the MegaWizard Plug-In Manager (Figure 14–12).
The Quartus II software automatically chooses the best design memory configuration
for optimal power, as described in “Power-Driven Compilation” on page 14–4.
Figure 14–12. MegaWizard Plug-In Manager RAM 2-Port Maximum Depth Selectable Option
Memory Power Reduction Example
Table 14–3 shows power usage measurements for a 4K × 36 simple dual-port memory
implemented using multiple M4K blocks in a Stratix II EP2S15 device. For each
implementation, the M4K blocks are configured with a different memory depth.
Table 14–3. 4K × 36 Simple Dual-Port Memory Implemented Using Multiple M4K Blocks
M4K Configuration           Number of M4K Blocks    ALUTs
4K × 1 (Default setting)    36                      0
2K × 2                      36                      40
1K × 4                      36                      62
512 × 9                     32                      143
256 × 18                    32                      302
128 × 36                    32                      633
Figure 14–13 shows the amount of power saved using the MAXIMUM_DEPTH parameter.
For all implementations, a user-provided read enable signal is present to indicate
when read data is required. Using this power-saving technique can reduce power
consumption by as much as 60%.
Figure 14–13. Power Savings Using the MAXIMUM_DEPTH Parameter
[Figure: bar chart of power savings (0% to 70%) for each M4K configuration: 4K × 1, 2K × 2, 1K × 4, 512 × 9, 256 × 18, and 128 × 36.]
As the memory depth becomes shallower, memory dynamic power decreases
because unaddressed M4K blocks can be shut off using a decoded combination of
address bits and the read enable signal. For a 128-deep memory block, power used by
the extra LEs starts to outweigh the power gain achieved by using a shallower
memory block depth. The power consumption of the memory blocks and associated
LEs depends on the memory configuration.
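The block counts in Table 14–3 can be related to the number of blocks clocked per access with a short calculation. This is a sketch under the assumption that only the row of blocks selected by the decoded upper address bits is clocked on each access. The symbols d (words per block), w (bits per block word), N_blocks, and N_active are introduced here for illustration.

```latex
\[
N_{\text{blocks}} = \left\lceil \frac{36}{w} \right\rceil \times \frac{4096}{d},
\qquad
N_{\text{active}} = \left\lceil \frac{36}{w} \right\rceil
\]
\[
\begin{array}{lccc}
\text{Configuration } (d \times w) & N_{\text{active}} = \lceil 36/w \rceil & 4096/d & N_{\text{blocks}} \\ \hline
4\text{K} \times 1   & 36 & 1  & 36 \\
2\text{K} \times 2   & 18 & 2  & 36 \\
1\text{K} \times 4   & 9  & 4  & 36 \\
512 \times 9         & 4  & 8  & 32 \\
256 \times 18        & 2  & 16 & 32 \\
128 \times 36        & 1  & 32 & 32
\end{array}
\]
```

Under this assumption, the number of blocks clocked per access falls from 36 to 1 as the blocks become shallower, while the address decoding implemented in ALUTs grows, which is consistent with the ALUT column of Table 14–3.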
Pipelining and Retiming
Designs with many glitches consume more power because of increased switching activity.
Glitches cause unnecessary and unpredictable temporary logic switches at the output
of combinational logic. A glitch usually occurs when the inputs to a logic function
arrive at different times because of unequal propagation delays.
For example, consider a 2-input XOR gate on which one input changes from 1 to 0,
followed a few moments later by a change from 0 to 1 on the other input. For a
moment, both inputs are 0 (low) during the state transition, resulting in 0 (low)
at the output of the XOR gate. Subsequently, when the second input transition takes
place, the XOR gate output becomes 1 (high). During signal transition, a glitch is
produced before the output becomes stable, as shown in Figure 14–14. This glitch can
propagate to subsequent logic and create unnecessary switching activity, increasing
power consumption. Circuits with many XOR functions, such as arithmetic circuits or
cyclic redundancy check (CRC) circuits, tend to have many glitches if there are several
levels of combinational logic between registers.
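The XOR glitch described above can be reproduced in simulation with a small testbench. This is a sketch only: the entity name xor_glitch_tb, the 1 ns gate delay, and the 2 ns skew between the two input transitions are assumed values chosen to make the glitch visible.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity xor_glitch_tb is
end entity xor_glitch_tb;

architecture sim of xor_glitch_tb is
    signal a : std_logic := '1';
    signal b : std_logic := '0';
    signal q : std_logic;
begin
    -- 2-input XOR gate with an assumed 1 ns propagation delay.
    q <= a xor b after 1 ns;

    stimulus : process
    begin
        wait for 10 ns;
        assert q = '1'
            report "expected q = 1 before the transitions" severity error;

        a <= '0';            -- first input falls at 10 ns
        wait for 2 ns;
        assert q = '0'
            report "expected the glitch (q = 0) at 12 ns" severity error;

        b <= '1';            -- second input rises 2 ns later, at 12 ns
        wait for 2 ns;
        assert q = '1'
            report "expected q = 1 after both transitions" severity error;

        wait;                -- end of stimulus
    end process stimulus;
end architecture sim;
```

The output starts and ends at 1, so the low pulse between the two input transitions is pure extra switching activity that downstream logic may also propagate.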
Figure 14–14. XOR Gate Showing Glitch at the Output
[Figure: a 2-input XOR gate with inputs A and B and output Q, and a timing diagram for the gate showing a glitch on Q between the transitions of A and B.]
Pipelining can reduce design glitches by inserting flip-flops into long combinational
paths. Flip-flops do not allow glitches to propagate through combinational paths.
Therefore, a pipelined circuit tends to have less glitching. Pipelining has the
additional benefit of generally allowing higher clock speed operations, although it
does increase the latency of a circuit (in terms of the number of clock cycles to a first
result). Figure 14–15 shows an example where pipelining is applied to break up a long
combinational path.
Figure 14–15. Pipelining Example
[Figure: a non-pipelined circuit with one block of combinational logic (long logic depth) between two registers, and a pipelined circuit in which an added register splits the logic into two blocks of short logic depth.]
Pipelining is very effective for glitch-prone arithmetic systems because it reduces
switching activity, resulting in reduced power dissipation in combinational logic.
Additionally, pipelining allows higher-speed operation by reducing logic-level
numbers between registers. The disadvantage of this technique is that if there are not
many glitches in your design, pipelining can increase power consumption by adding
unnecessary registers. Pipelining can also increase resource utilization. The
benchmark data shows that pipelining can reduce dynamic power consumption by as
much as 30% in Cyclone and Stratix devices.
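As a minimal sketch of the technique, the following VHDL sums four operands in two register stages instead of one long adder chain. The entity name pipelined_sum4, the 16-bit operand width, and the choice of where to split the logic are illustrative assumptions, not a recommended implementation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity pipelined_sum4 is
    port (
        clk        : in  std_logic;
        a, b, c, d : in  unsigned(15 downto 0);
        sum        : out unsigned(17 downto 0)   -- two extra bits for carries
    );
end entity pipelined_sum4;

architecture rtl of pipelined_sum4 is
    -- Pipeline registers that hold the two partial sums.
    signal ab_reg, cd_reg : unsigned(16 downto 0);
begin
    process (clk)
    begin
        if rising_edge(clk) then
            -- Stage 1: two independent additions, one adder deep.
            ab_reg <= resize(a, 17) + resize(b, 17);
            cd_reg <= resize(c, 17) + resize(d, 17);
            -- Stage 2: add the registered partial sums.
            sum <= resize(ab_reg, 18) + resize(cd_reg, 18);
        end if;
    end process;
end architecture rtl;
```

The result appears two clock cycles after the operands are applied. Glitches generated in the first-stage adders stop at ab_reg and cd_reg instead of propagating into the second adder, at the cost of the extra registers.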
Architectural Optimization
You can use design-level architectural optimization by taking advantage of specific
device architecture features. These features include dedicated memory and DSP or
multiplier blocks available in FPGA devices to perform memory or arithmetic-related
functions. You can use these blocks in place of LUTs to reduce power consumption.
For example, you can build large shift registers from RAM-based FIFO buffers instead
of building the shift registers from the LE registers.
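The following VHDL sketches a wide, deep shift register written without a reset, a style that synthesis tools can often map to memory blocks rather than LE registers. The entity name delay_line and the DEPTH and WIDTH values are hypothetical, and whether the design is actually implemented in memory depends on the tool, its settings, and the target device.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity delay_line is
    generic (
        DEPTH : positive := 64;  -- number of stages (assumed to be at least 2)
        WIDTH : positive := 8    -- data width in bits
    );
    port (
        clk  : in  std_logic;
        din  : in  std_logic_vector(WIDTH - 1 downto 0);
        dout : out std_logic_vector(WIDTH - 1 downto 0)
    );
end entity delay_line;

architecture rtl of delay_line is
    type sr_t is array (0 to DEPTH - 1) of std_logic_vector(WIDTH - 1 downto 0);
    signal sr : sr_t;
begin
    process (clk)
    begin
        if rising_edge(clk) then
            -- Shift by one position: the new word enters at index 0 and
            -- the oldest word moves toward index DEPTH - 1.
            sr <= din & sr(0 to DEPTH - 2);
        end if;
    end process;

    dout <= sr(DEPTH - 1);  -- input delayed by DEPTH clock cycles
end architecture rtl;
```

Leaving out a reset and tapping only the final stage keeps the description compatible with a memory-based implementation, because memory contents cannot be cleared in a single cycle.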
The Stratix device family allows you to efficiently target small, medium, and large
memories with the TriMatrix memory architecture. Each TriMatrix memory block is
optimized for a specific function. The M512 memory blocks available in Stratix II
devices are useful for implementing small FIFO buffers, DSP, and clock domain
transfer applications. M512 memory blocks are more power-efficient than the
distributed memory structures in some competing FPGAs. The M4K memory blocks
are used to implement buffers for a wide variety of applications, including processor
code storage, large look-up table implementation, and large memory applications.
The M-RAM blocks are useful in applications where a large volume of data must be
stored on-chip. Effective utilization of these memory blocks can have a significant
impact on power reduction in your design.
The latest Stratix and Cyclone device families have configurable M9K memory blocks
that provide various memory functions such as RAM, FIFO buffers, and ROM.
For more information about using DSP and memory blocks efficiently, refer to the
Area and Timing Optimization chapter in volume 2 of the Quartus II Handbook.
I/O Power Guidelines
Nonterminated I/O standards such as LVTTL and LVCMOS have a rail-to-rail output
swing. The voltage difference between logic-high and logic-low signals at the output
pin is equal to the VCCIO supply voltage. If the capacitive loading at the output pin is
known, the dynamic power consumed in the I/O buffer can be calculated as shown in
Equation 14–1:
Equation 14–1. Capacitive loading at the output pin
\[ P = 0.5 \times F \times C \times V^{2} \]
In this equation, F is the output transition frequency and C is the total load
capacitance being switched. V is equal to VCCIO supply voltage. Because of the
quadratic dependence on VCCIO, lower voltage standards consume significantly less
dynamic power.
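As a rough illustration of the quadratic dependence, the following sketch applies Equation 14–1 to a single nonterminated output at two supply voltages. The values F = 100 MHz and C = 10 pF are assumptions chosen for round numbers; the results are not taken from the benchmark data in this chapter.

```latex
\[
\begin{aligned}
P_{3.3\,\text{V}} &= 0.5 \times (100 \times 10^{6}\ \text{Hz}) \times (10 \times 10^{-12}\ \text{F}) \times (3.3\ \text{V})^{2} \approx 5.4\ \text{mW} \\
P_{1.5\,\text{V}} &= 0.5 \times (100 \times 10^{6}\ \text{Hz}) \times (10 \times 10^{-12}\ \text{F}) \times (1.5\ \text{V})^{2} \approx 1.1\ \text{mW} \\
\frac{P_{3.3\,\text{V}}}{P_{1.5\,\text{V}}} &= \left( \frac{3.3}{1.5} \right)^{2} = 4.84
\end{aligned}
\]
```

For the same load and transition frequency, moving from a 3.3-V standard to a 1.5-V standard reduces this dynamic component by a factor of nearly five.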
Transistor-to-transistor logic (TTL) I/O buffers consume very little static power. As a
result, the total power consumed by a LVTTL or LVCMOS output is highly dependent
on load and switching frequency.
When using resistively terminated I/O standards like SSTL and HSTL, the output
load voltage swings by a small amount around some bias point. The same dynamic
power equation is used, where V is the actual load voltage swing. Because this is
much smaller than VCCIO, dynamic power is lower than for nonterminated I/O under
similar conditions. These resistively terminated I/O standards dissipate significant
static (frequency-independent) power, because the I/O buffer is constantly driving
current into the resistive termination network. However, the lower dynamic power of
these I/O standards means they often have lower total power than LVCMOS or
LVTTL for high-frequency applications. Use the lowest drive strength I/O setting that
meets your speed and waveform requirements to minimize I/O power when using
resistively terminated standards.
You can save a small amount of static power by connecting unused I/O banks to the
lowest possible VCCIO voltage of 1.2 V.
Table 14–4 shows the total supply and thermal power consumed by outputs using
different I/O standards for Stratix II devices. The numbers are for an I/O pin
transmitting random data clocked at 200 MHz with a 10 pF capacitive load.
For this configuration, nonterminated standards generally use less power, but this is
not always the case. If the frequency or the capacitive load is increased, the power
consumed by nonterminated outputs increases faster than the power of terminated
outputs.
Table 14–4. I/O Power for Different I/O Standards in Stratix II Devices
Standard            Total Supply Current Drawn    Total On-Chip Thermal
                    from VCCIO Supply (mA)        Power Dissipation (mW)
3.3-V LVTTL         2.42                          9.87
2.5-V LVCMOS        1.9                           6.69
1.8-V LVCMOS        1.34                          4.18
1.5-V LVCMOS        1.18                          3.58
3.3-V PCI           2.47                          10.23
SSTL-2 class I      6.07                          4.42
SSTL-2 class II     10.72                         5.1
SSTL-18 class I     5.33                          3.28
SSTL-18 class II    8.56                          4.06
HSTL-15 class I     6.06                          3.49
HSTL-15 class II    11.08                         4.87
HSTL-18 class I     6.87                          4.09
HSTL-18 class II    12.33                         5.82
For more information about I/O standards, refer to the Selectable I/O Standards in
Stratix II Devices and Stratix II GX Devices chapter in volume 2 of the Stratix II Device
Handbook, the Stratix III Device I/O Features chapter in volume 1 of the Stratix III Device
Handbook, the I/O Features in Stratix IV Devices chapter in volume 1 of the Stratix IV
Device Handbook, or the Selectable I/O Standards in Cyclone II Devices chapter in the
Cyclone II Device Handbook, the Cyclone III Device Handbook, or the Cyclone IV GX Handbook.
When calculating I/O power, the PowerPlay Power Analyzer uses the default
capacitive load set for the I/O standard in the Capacitive Loading page of the Device
and Pin Options dialog box. For Stratix II devices, if Enable Advanced I/O Timing is
turned on, I/O power is measured using an equivalent load calculated as the sum of
the near capacitance, the transmission line distributed capacitance, and the far-end
capacitance as defined in the Board Trace Model page of the Device and Pin Options
dialog box or the Board Trace Model view in the Pin Planner. Any other components
defined in the board trace model are not taken into account for the power
measurement.
For Cyclone III, Cyclone IV GX, Stratix III, Stratix IV, and Stratix V devices, Advanced
I/O Timing, which uses the full board trace model, is always used.
For information about using Advanced I/O Timing and configuring a board trace
model, refer to the I/O Management chapter in volume 2 of the Quartus II Handbook.
Dynamically Controlled On-Chip Terminations
Stratix V, Stratix IV, and Stratix III FPGAs offer dynamic on-chip termination (OCT).
Dynamic OCT enables series termination (RS) and parallel termination (RT) to be
turned on and off dynamically during data transfer. This feature is especially useful
when these FPGAs are used with external memory interfaces, such as DDR memories.
Compared to conventional termination, dynamic OCT reduces power consumption
significantly because it eliminates the constant DC power consumed by parallel
termination when transmitting data. Parallel termination is extremely useful for
applications that interface with external memories where I/O standards, such as HSTL
and SSTL, are used. Parallel termination supports dynamic OCT, which is useful for
bidirectional interfaces (see Figure 14–16).
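Figure 14–16. Stratix III On-Chip Parallel Termination
[Figure: a transmitter drives a Zo = 50 Ω line into a Stratix III receiver with VREF; the Stratix III OCT consists of a 100 Ω resistor to VCCIO and a 100 Ω resistor to GND.]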
The following is an example of power saving for a DDR3 interface using on-chip
parallel termination.
The static current consumed by parallel OCT is equal to the VCCIO voltage divided by
100 Ω. For DDR3 interfaces that use SSTL-15, the static current is 1.5 V/100 Ω = 15
mA per pin. Therefore, the static power is 1.5 V × 15 mA = 22.5 mW. For an interface
with 72 DQ and 18 DQS pins, the static power is 90 pins × 22.5 mW = 2.025 W.
Dynamic parallel OCT disables parallel termination during write operations, so if
writing occurs 50% of the time, the power saved by dynamic parallel OCT is 50% ×
2.025 W = 1.0125 W.
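The same calculation can be written in a general form so that other pin counts or write ratios can be substituted. The symbols are introduced here for illustration only: N is the number of terminated pins, R is the resistance used in the static-current calculation above, and α_write is the fraction of time the interface spends writing.

```latex
\[
P_{\text{saved}} = \alpha_{\text{write}} \times N \times \frac{V_{CCIO}^{2}}{R}
= 0.5 \times 90 \times \frac{(1.5\ \text{V})^{2}}{100\ \Omega}
= 1.0125\ \text{W}
\]
```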
For more information about dynamic OCT in Stratix III and Stratix IV devices, refer to
the Stratix III Device I/O Features chapter in the Stratix III Device Handbook and the
Stratix IV Device I/O Features chapter in the Stratix IV Device Handbook, respectively.
Power Optimization Advisor
The Quartus II software includes the Power Optimization Advisor, which provides
specific power optimization advice and recommendations based on the current
design project settings and assignments. The advisor covers many of the suggestions
listed in this chapter. The following example shows how to reduce your design power
with the Power Optimization Advisor.
Power Optimization Advisor Example
After compiling your design, run the PowerPlay Power Analyzer to determine your
design power and to see where power is dissipated in your design. Based on this
information, you can run the Power Optimization Advisor to implement
recommendations that can reduce design power. Figure 14–17 shows the Power
Optimization Advisor after compiling a design that is not fully optimized for power.
Figure 14–17. Power Optimization Advisor
The Power Optimization Advisor shows the recommendations that can reduce power
in your design. The recommendations are split into stages to show the order in which
you should apply the recommended settings. The first stage shows mostly CAD
setting options that are easy to implement and highly effective in reducing design
power. An icon indicates whether each recommended setting is made in the current
project. In Figure 14–17, the checkmark icons for Stage 1 show the recommendations
that are already implemented. The warning icons indicate recommendations that are
not followed for this compilation. The information icon shows the general
suggestions. Each recommendation includes the description, summary of the effect of
the recommendation, and the action required to make the appropriate setting.
There is a link from each recommendation to the appropriate location in the
Quartus II user interface where you can change the setting. You can change the
Power-Driven Synthesis setting by clicking Open Settings dialog box - Analysis &
Synthesis Settings page (Figure 14–18). The Settings dialog box is shown with the
Analysis & Synthesis Settings page selected, where you can change the PowerPlay
power optimization settings.
Figure 14–18. Analysis & Synthesis Settings Page
After making the recommended changes, recompile your design. The Power
Optimization Advisor indicates with green check marks that the recommendations
were implemented successfully (Figure 14–19). You can use the PowerPlay Power
Analyzer to verify your design power results.
Figure 14–19. Implementation of Power Optimization Advisor Recommendations
The recommendations listed in Stage 2 generally involve design changes, rather than
CAD settings changes as in Stage 1. You can use these recommendations to further
reduce your design power consumption. Altera recommends that you implement
Stage 1 recommendations first, then the Stage 2 recommendations.
Conclusion
The combination of a smaller process technology, the use of low-k dielectric material,
and reduced supply voltage significantly reduces dynamic power consumption in the
latest FPGAs. To further reduce your dynamic power, use the design
recommendations presented in this chapter to optimize resource utilization and
minimize power consumption.
Document Revision History
Table 14–5 shows the revision history for this chapter.
Table 14–5. Document Revision History
Date             Version    Changes
December 2010    10.0.1     Template update.
July 2010        10.0.0     ■ Was chapter 11 in the 9.1.0 release
                            ■ Updated Figures 14-2, 14-3, 14-6, 14-18, 14-19, and 14-20
                            ■ Updated device support
                            ■ Minor editorial updates
November 2009    9.1.0      ■ Updated Figure 11-1 and associated references
                            ■ Updated device support
                            ■ Minor editorial update
March 2009       9.0.0      ■ Was chapter 9 in the 8.1.0 release
                            ■ Updated for the Quartus II software release
                            ■ Added benchmark results
                            ■ Removed several sections
                            ■ Updated Figure 14–1, Figure 14–17, Figure 14–18, and Figure 14–19
November 2008    8.1.0      ■ Changed to 8½” × 11” page size
                            ■ Changed references to altsyncram to RAM
                            ■ Minor editorial updates
May 2008         8.0.0      ■ Added support for Stratix IV devices
                            ■ Updated Table 9–1 and 9–9
                            ■ Updated “Architectural Optimization” on page 9–22
                            ■ Added “Dynamically-Controlled On-Chip Terminations” on page 9–26
                            ■ Updated “Referenced Documents” on page 9–29
                            ■ Updated references
For previous versions of the Quartus II Handbook, refer to the Quartus II Handbook
Archive.
15. Analyzing and Optimizing the Design Floorplan with the Chip Planner
May 2011
QII52006-11.0.0
As FPGA designs grow larger in density, the ability to analyze the design for
performance, routing congestion, and logic placement to meet the design
requirements becomes critical. This chapter discusses how to analyze the design
floorplan with the Chip Planner.
You can perform design analysis and create and optimize the design floorplan with
the Chip Planner. To make I/O assignments, use the Pin Planner.
For information about the Pin Planner, refer to the I/O Management chapter in
volume 2 of the Quartus II Handbook.
You can use the Design Partition Planner with the Chip Planner to customize the
floorplan of your design. For more information, refer to the Quartus II Incremental
Compilation for Hierarchical and Team-Based Design and the Best Practices for Incremental
Compilation Partitions and Floorplan Assignments chapters in volume 1 of the Quartus II
Handbook.
This chapter includes the following topics:
■ “Chip Planner Overview”
■ “LogicLock Regions” on page 15–3
■ “Using LogicLock Regions in the Chip Planner” on page 15–10
■ “Design Floorplan Analysis Using the Chip Planner” on page 15–11
■ “Scripting Support” on page 15–20
For a list of devices supported by the Chip Planner, refer to About the Chip Planner in
Quartus II Help.
For more information about the Chip Planner, refer to the Altera Training page of the
Altera website.
Chip Planner Overview
The Chip Planner provides a visual display of chip resources. The Chip Planner can
show logic placement, LogicLock regions, relative resource usage, detailed routing
information, fan-in and fan-out connections between nodes, timing paths between
registers, delay estimates for paths, and routing congestion information.
© 2011 Altera Corporation. All rights reserved. ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS and STRATIX are Reg. U.S. Pat. & Tm. Off.
and/or trademarks of Altera Corporation in the U.S. and other countries. All other trademarks and service marks are the property of their respective holders as described at
www.altera.com/common/legal.html. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera’s standard warranty, but
reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any
information, product, or service described herein except as expressly agreed to in writing by Altera. Altera customers are advised to obtain the latest version of device
specifications before relying on any published information and before placing orders for products or services.
You can also make assignment changes with the Chip Planner, such as creating and
deleting resource assignments, and you can perform post-compilation changes such
as creating, moving, and deleting logic cells and I/O atoms. With the Chip Planner,
you can view and create assignments for a design floorplan, perform power and
design analyses, and implement ECOs. With the Chip Planner and Resource Property
Editor, you can change connections between resources and make post-compilation
changes to the properties of logic cells, I/O elements, PLLs, and RAM and digital
signal processing (DSP) blocks.
For details about how to implement ECOs in your design using the Chip Planner in
the Quartus II software, refer to the Engineering Change Management with the Chip
Planner chapter in volume 2 of the Quartus II Handbook.
Starting the Chip Planner
To start the Chip Planner, on the Tools menu, click Chip Planner (Floorplan & Chip
Editor). You can also start the Chip Planner by the following methods:
■ Click the Chip Planner icon on the Quartus II software toolbar.
■ On the Shortcut menu in the following tools, click Locate and then click Locate in
  Chip Planner (Floorplan and Chip Editor):
    ■ Design Partition Planner
    ■ Compilation Report
    ■ LogicLock Regions window
    ■ Technology Map Viewer
    ■ Project Navigator window
    ■ RTL source code
    ■ Node Finder
    ■ Simulation Report
    ■ RTL Viewer
    ■ Report Timing panel of the TimeQuest Timing Analyzer
Chip Planner Toolbar
The Chip Planner provides powerful tools for design analysis with a GUI. You can
access Chip Planner commands from the View menu and the Shortcut menu, or by
clicking the icons on the toolbar.
Chip Planner Tasks, Layers, and Editing Modes
The Chip Planner models types of resource objects as unique display layers, and uses
tasks, which are predefined sets of layer settings, to control the display of
resources. The Chip Planner provides a set of default tasks, and you can create custom
tasks to customize the display for your particular needs. The Basic, Detailed, and
Floorplan Editing tasks provided with the Chip Planner are useful for general ECO
and assignment-related activities, while the Partition Planner, Power, and Routing
Congestion tasks are optimized for specific activities.
The Chip Planner has two editing modes, which determine the types of operations
that you can perform. The Assignment editing mode allows you to make assignment
changes that are applied by the Fitter during the next place and route operation. The
ECO editing mode allows you to make post-compilation changes, commonly referred
to as engineering change orders (ECOs).
You should choose the editing mode appropriate for the work that you want to
perform, and a task that displays the resources that you want to view, in a level of
detail appropriate for your design.
Locate History Window
As you optimize your design floorplan, you might have to locate a path or node in the
Chip Planner many times. The Locate History window lists all the nodes and paths
you have displayed using a Locate in Chip Planner (Floorplan and Chip Editor)
command, providing easy access to the nodes and paths of interest to you. If you
locate a required path from the TimeQuest Timing Analyzer Report Timing pane, the
Locate History window displays the required clock path. If you locate an arrival path
from the TimeQuest Timing Analyzer Report Timing pane, the Locate History
window displays the path from the arrival clock to the arrival data. Double-clicking a
node or path in the Locate History window displays the selected node or path in the
Chip Planner.
For more information about the Chip Planner, refer to About the Chip Planner and
Layers Settings Dialog Box in Quartus II Help. For more information about the ECO
editing mode, refer to the Engineering Change Management with the Chip Planner
chapter in volume 2 of the Quartus II Handbook.
LogicLock Regions
LogicLock regions are floorplan location constraints that help you place logic on the
target device. When you assign entity instances or nodes to a LogicLock region, you
direct the Fitter to place those entity instances or nodes within the region during
fitting. Your floorplan can contain several LogicLock regions.
A LogicLock region is defined by its height, width, and location; you can specify the
size or location of a region, or both, or the Quartus II software can generate these
properties automatically. The Quartus II software bases the size and location of a
region on the contents of the region and the timing requirements of the module.
Table 15–1 describes the options for creating LogicLock regions.
Table 15–1. Types of LogicLock Regions

| Property | Value | Behavior |
|---|---|---|
| State | Floating (1), Locked | Floating allows the Quartus II software to determine the location of the region on the device. Floating regions are shown with a dashed boundary in the floorplan. Locked allows you to specify the location of the region. Locked regions are shown with a solid boundary in the floorplan. A locked region must have a fixed size. |
| Size | Auto (1), Fixed | Auto allows the Quartus II software to determine the appropriate size of a region given its contents. Fixed regions have a shape and size that you define. |
| Reserved | Off (1), On | Allows you to define whether the Fitter can use the resources within a region for entities that are not assigned to the region. If the reserved property is turned on, only items assigned to the region can be placed within its boundaries. |
| Origin | Any Floorplan Location | Specifies the location of the LogicLock region on the floorplan. For Arria series, Stratix series, Cyclone series, MAX II, and MAX V devices, the origin is located in the lower left corner of the LogicLock region. For other Altera® device families, the origin is located in the upper left corner of the LogicLock region. |

Note to Table 15–1:
(1) Default value.
Note: The Quartus II software cannot automatically define the size of a region if the
location is locked. Therefore, if you want to specify the exact location of the region,
you must also specify the size.
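The interaction between the State and Size properties can be sketched in Python. This is a minimal illustration only: the `Region` class and `check_region` function are hypothetical names, not part of any Quartus II API, and the defaults simply mirror the default values listed in Table 15–1.

```python
from dataclasses import dataclass

@dataclass
class Region:
    """Hypothetical model of the Table 15-1 properties (not a Quartus II API)."""
    name: str
    state: str = "Floating"   # "Floating" (default) or "Locked"
    size: str = "Auto"        # "Auto" (default) or "Fixed"
    reserved: bool = False    # Reserved is Off by default

def check_region(region):
    """Return a list of rule violations for one region."""
    problems = []
    if region.state not in ("Floating", "Locked"):
        problems.append(f"{region.name}: unknown state {region.state!r}")
    if region.size not in ("Auto", "Fixed"):
        problems.append(f"{region.name}: unknown size {region.size!r}")
    # A locked region must have a fixed size.
    if region.state == "Locked" and region.size == "Auto":
        problems.append(f"{region.name}: locked regions must be fixed-size")
    return problems
```

A check of this kind could be run over a list of planned regions before the constraints are entered in the LogicLock Regions window.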
You can use the Design Partition Planner in conjunction with LogicLock regions to
create a floorplan for your design. For more information about using the Design
Partition Planner, refer to the Quartus II Incremental Compilation for Hierarchical and
Team-Based Design and the Best Practices for Incremental Compilation Partitions and
Floorplan Assignments chapters in volume 1 of the Quartus II Handbook.
Creating LogicLock Regions
You can create LogicLock regions with the Project Navigator, the LogicLock Regions
window, the Design Partition Planner, the Chip Planner, or Tcl commands.
Creating LogicLock Regions with the Project Navigator
After you perform either a full compilation or analysis and elaboration on the design,
the Quartus II software displays the hierarchy of the design. On the View menu, click
Project Navigator. With the hierarchy of the design fully expanded, right-click on any
design entity in the design, and click Create New LogicLock Region to create a
LogicLock region and assign the entity to the new region.
Creating LogicLock Regions with the LogicLock Regions window
To create a LogicLock region with the LogicLock Regions window, on the
Assignments menu, click LogicLock Regions Window. In the LogicLock Regions
window, click <<new>>.
Creating LogicLock Regions with the Design Partition Planner
To create a LogicLock region and assign a partition to it with the Design Partition
Planner, right-click the partition and then click Create LogicLock Region.
Creating LogicLock Regions with the Chip Planner
To create a LogicLock region in the Chip Planner, click the Create LogicLock Region
command on the View menu, then click and drag on the Chip Planner floorplan to
create a region of your preferred location and size.
Creating Nonrectangular LogicLock Regions
When you create a floorplan for your design, you may want to create nonrectangular
LogicLock regions to exclude certain resources from the LogicLock region. You might
also create a nonrectangular LogicLock region to place certain parts of your design
around specific device resources to improve performance.
To create a nonrectangular region with the Merge LogicLock Region command,
follow these steps:
1. In the Chip Planner, create two or more contiguous or non-contiguous rectangular
regions as described in “Creating LogicLock Regions” on page 15–4.
2. Arrange the regions that you have created into the locations where you want the
nonrectangular region to be.
3. Select all the individual regions that you want to merge by clicking each of them
while pressing the Shift key.
4. Right-click the title bar of any of the LogicLock regions that you want to merge,
point to LogicLock Regions, and then click Merge LogicLock Region. The
individual regions that you select merge to create a single new region.
By default, the new LogicLock region has the same name as the component region
containing the greatest number of resources; however, you can rename the new
region. In the LogicLock Regions Window, the new region is shown as having a
Custom Shape.
Figure 15–1 illustrates using the Merge LogicLock Region command to form a
nonrectangular LogicLock region by merging two rectangular LogicLock regions.
Figure 15–1. Using the Merge LogicLock Region command to create a nonrectangular region
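The effect of merging rectangular regions can be sketched in Python. This is an illustration under stated assumptions, not the Chip Planner's implementation: the function names are hypothetical, each region is an `(x, y, width, height)` rectangle on an integer grid, and the number of grid cells stands in for "number of resources" when the default name is chosen.

```python
def cells(x, y, width, height):
    """Set of grid cells covered by a rectangle whose lower-left cell is (x, y)."""
    return {(cx, cy) for cx in range(x, x + width) for cy in range(y, y + height)}

def merge_regions(regions):
    """Merge named rectangles into one nonrectangular region.

    regions maps a region name to an (x, y, width, height) rectangle.
    The merged region covers the union of the cells and takes the name of
    the component that covers the most cells (an assumed stand-in for
    "greatest number of resources").
    """
    covered = {name: cells(*rect) for name, rect in regions.items()}
    merged_cells = set().union(*covered.values())
    default_name = max(covered, key=lambda name: len(covered[name]))
    return default_name, merged_cells
```

For example, merging a 4x2 rectangle with a 2x2 rectangle stacked on its left half yields an L-shaped set of 12 cells that carries the larger rectangle's name.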
Hierarchical (Parent and Child) LogicLock Regions
To further constrain module locations, you can define a hierarchy for a group of
regions by declaring parent and child regions. The Quartus II software places a child
region completely within the boundaries of its parent region. Additionally, parent and
child regions allow you to further improve the performance of a module by constraining
nodes in the critical path of a module.
To make one LogicLock region a child of another LogicLock region, in the LogicLock
Regions window, select the new child region and drag and drop the new child region
into its new parent region.
Note: The LogicLock region hierarchy does not have to be the same as the design hierarchy.
You can create both auto-sized and fixed-sized LogicLock regions within a parent
LogicLock region; however, the parent of a fixed-sized child region must also be
fixed-sized. The location of a locked parent region is locked relative to the device; the
location of a locked child region is locked relative to its parent region. If you change
the parent’s location, the locked child’s origin changes, but maintains the same
placement relative to the origin of its parent. The location of a floating child region can
float within its parent. Complex region hierarchies might result in some LABs not
being used, effectively increasing the resource utilization in the device. Do not create
more levels of hierarchy than you need.
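The containment and relative-origin rules for parent and child regions can be sketched in Python. The `Rect` class and both functions are hypothetical and assume integer grid coordinates; they only illustrate the geometry described above.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    x: int
    y: int
    width: int
    height: int

def contains(parent, child):
    """True if child lies entirely within parent (absolute coordinates)."""
    return (parent.x <= child.x
            and parent.y <= child.y
            and child.x + child.width <= parent.x + parent.width
            and child.y + child.height <= parent.y + parent.height)

def child_absolute(parent, offset_x, offset_y, width, height):
    """Absolute rectangle of a locked child stored as an offset from its parent's origin."""
    return Rect(parent.x + offset_x, parent.y + offset_y, width, height)
```

Because the child is stored as an offset, moving the parent moves the child with it and the child keeps the same placement relative to the parent's origin.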
Placing LogicLock Regions
A fixed region must contain all resources required by the design block assigned to the
region. Although the Quartus II software can automatically place and size LogicLock
regions to meet resource and timing requirements, you can manually place and size
regions to meet your design requirements. You should consider the following if you
manually place or size a LogicLock region:
■ LogicLock regions with pin assignments must be placed on the periphery of the
  device, adjacent to the pins. For the Arria series, Cyclone series, MAX II, MAX V,
  and Stratix series of devices, you must also include the I/O block within the
  LogicLock Region.
■ Floating LogicLock regions can overlap with their ancestors or descendants, but
  not with other floating LogicLock regions.
Placing Device Resources into LogicLock Regions
A LogicLock region includes all device resources within its boundaries, including
memory and pins. The Quartus II software does not include pins automatically when
you assign an entity to a region. You can manually assign pins to LogicLock regions;
however, this placement puts location constraints on the region. The software only
obeys pin assignments to locked regions that border the periphery of the device. For
the Arria series, Cyclone series, MAX II, MAX V, and Stratix series of devices, the
locked regions must include the I/O pins as resources.
Note: Pin assignments to LogicLock regions are effective only in fixed and locked regions.
Pin assignments to floating regions do not influence the placement of the region.
Only one LogicLock region can claim a device resource. If a LogicLock region
boundary includes part of a device resource, the Quartus II software allocates the
entire resource to that LogicLock region. When the Quartus II software places a
floating auto-sized region, it places the region in an area that meets the requirements
of the contents of the LogicLock region.
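The rule that a partially covered resource belongs entirely to the region can be sketched in Python. This is a simplified model with hypothetical names: regions and resources are `(x, y, width, height)` rectangles, and any overlap counts as a claim.

```python
def overlaps(a, b):
    """True if rectangles a and b, given as (x, y, width, height), share any area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def claimed_resources(region, resources):
    """Names of resources that a region touches, even partially."""
    return sorted(name for name, rect in resources.items() if overlaps(region, rect))

def contested(regions, resources):
    """Map each resource touched by more than one region to those regions."""
    result = {}
    for res_name, rect in resources.items():
        owners = sorted(n for n, r in regions.items() if overlaps(r, rect))
        if len(owners) > 1:
            result[res_name] = owners
    return result
```

Since only one region can claim a device resource, any entry returned by `contested` points to a region boundary that may need adjusting.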
Note: If you want to import multiple instances of a module into a top-level design, you
must ensure that the device has two or more locations with exactly the same device
resources. (You can determine this from the applicable device handbook.) If the device
does not have another area with exactly the same resources, the Quartus II software
generates a fitting error during compilation of the top-level design.
LogicLock Regions Window
You can use the LogicLock Regions window to create LogicLock regions, assign nodes
and entities to them, and modify the properties of a LogicLock region such as size,
state, width, height, origin, and whether the region is a reserved region. The
LogicLock Regions window also has a recommendations toolbar; select a LogicLock
region from the drop-down list in the recommendations toolbar to display the
relevant suggestions to optimize that LogicLock region. You can customize the
LogicLock Regions window by dragging and dropping the columns to change their
order; you can also show and hide optional columns by right-clicking any column
heading and then selecting the appropriate columns in the shortcut menu.
Figure 15–2. LogicLock Regions Window
The LogicLock Region Properties dialog box provides a summary of all LogicLock
regions in your design. Use the LogicLock Region Properties dialog box to obtain
detailed information about your LogicLock region, such as which entities and nodes
are assigned to your region and which resources are required. The LogicLock Region
Properties dialog box shows the properties of the current selected regions and allows
you to modify them. To open the LogicLock Region Properties dialog box,
double-click any region in the LogicLock Regions window, or right-click the region
and click Properties.
Note: For designs that target Arria series, Cyclone series, Stratix series, MAX II, and
MAX V devices, the Quartus II software automatically creates a LogicLock region that
encompasses the entire device. This default region is labelled Root_Region, and is
locked and fixed.
Note: For Arria series, Cyclone series, Stratix series, MAX II, and MAX V devices, the
origin of the LogicLock region is located at the lower-left corner of the region. For all
other supported devices, the origin is located at the upper-left corner of the region.
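The effect of the origin corner on a region's extent can be sketched in Python. This is an illustration only: the function name is hypothetical, the family check is a simple prefix match, and the sketch assumes a coordinate system in which Y increases upward.

```python
# Families whose region origin is the lower-left corner, per the text above.
LOWER_LEFT_FAMILIES = ("Arria", "Cyclone", "Stratix", "MAX II", "MAX V")

def region_bounds(family, origin_x, origin_y, width, height):
    """Return (left, bottom, right, top) for a region, assuming Y grows upward."""
    if family.startswith(LOWER_LEFT_FAMILIES):
        bottom = origin_y            # origin is the lower-left corner
    else:
        bottom = origin_y - height   # origin is the upper-left corner
    return (origin_x, bottom, origin_x + width, bottom + height)
```

The same origin, width, and height therefore describe different areas of the floorplan depending on the device family.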
Reserved LogicLock Region
The Quartus II software honors all entity and node assignments to LogicLock regions.
Occasionally, entities and nodes do not occupy an entire region, which leaves some of
the region’s resources unoccupied. To increase the region’s resource utilization and
performance, the Quartus II software’s default behavior fills the unoccupied resources
with other nodes and entities that have not been assigned to another region. You can
prevent this behavior by turning on Reserved on the General tab of the LogicLock
Region Properties dialog box. When you turn on this option, your LogicLock region
contains only the entities and nodes that you specifically assigned to your LogicLock
region.
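The default fill behavior and the effect of the Reserved option can be sketched as a small Python decision function. The function and its parameters are hypothetical and describe only the rule stated above, not the Fitter's actual algorithm.

```python
def may_place(node_region, target_region, reserved):
    """Can a node be placed inside target_region?

    node_region: name of the region the node is assigned to, or None.
    reserved:    the target region's Reserved setting (True for On).
    """
    if node_region == target_region:
        return True              # members belong in their own region
    if node_region is not None:
        return False             # assigned to another region, so it stays there
    return not reserved          # unassigned logic may fill a non-reserved region
```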
Excluded Resources
The Excluded Resources feature allows you to easily exclude specific device resources
such as DSP blocks or M4K memory blocks from a LogicLock region. For example,
you can assign a specific entity to a LogicLock region but allow the DSP blocks of that
entity to be placed anywhere on the device. Use the Excluded Resources feature on a
per-LogicLock region member basis.
To exclude certain device resources from an entity, in the LogicLock Region
Properties dialog box, highlight the entity in the Design Element column, and click
Edit. In the Edit Node dialog box, under Excluded Element Types, click the Browse
button. In the Excluded Resources Element Types dialog box, you can select the
device resources you want to exclude from the entity. When you have selected the
resources to exclude, the Excluded Resources column is updated in the LogicLock
Region Properties dialog box to reflect the excluded resources.
Note: The Excluded Resources feature prevents certain resource types from being included
in a region, but it does not prevent the resources from being placed inside the region
unless you set the region’s Reserved property to On. To indicate to the Fitter that
certain resources are not required inside a LogicLock region, define a resource filter.
For more information about resource filters, refer to “LogicLock Resource Exclusions”
in the Best Practices for Incremental Compilation Partitions and Floorplan Assignments
chapter in volume 1 of the Quartus II Handbook.
Additional Quartus II LogicLock Design Features
To complement the LogicLock Regions window, the Quartus II software has
additional features to help you design with LogicLock regions.
Analysis and Synthesis Resource Utilization by Entity
The Compilation Report contains an Analysis and Synthesis Resource Utilization by
Entity section, which reports resource usage statistics, including entity-level
information. You can use this feature to verify that any LogicLock region you
manually create contains enough resources to accommodate all the entities you assign
to it.
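The capacity check described above can be sketched in Python. The function name and the dictionary layout are hypothetical; in practice the per-entity counts would be read from the Compilation Report and the region capacity from the LogicLock Region Properties dialog box.

```python
def shortfall(region_capacity, entity_usage):
    """Resources for which the assigned entities need more than the region offers.

    region_capacity: {resource_type: count available in the region}
    entity_usage:    {entity_name: {resource_type: count used}}
    Returns {resource_type: missing_count} for every resource that falls short.
    """
    needed = {}
    for usage in entity_usage.values():
        for resource, count in usage.items():
            needed[resource] = needed.get(resource, 0) + count
    return {res: n - region_capacity.get(res, 0)
            for res, n in needed.items() if n > region_capacity.get(res, 0)}
```

An empty result means the region, as sized, has room for every entity assigned to it.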
Quartus II Revisions Feature
When you evaluate different LogicLock regions in your design, you might want to
experiment with different configurations to achieve your desired results. The
Quartus II Revisions feature allows you to organize the same project with different
settings until you find an optimum configuration.
To use the Revisions feature, on the Project menu, click Revisions. In the Revisions
dialog box, you can create and specify revisions. You can create a revision from the
current design or any previously created revisions. Each revision can have an
associated description. You can use revisions to organize the placement constraints
created for your LogicLock regions.
LogicLock Assignment Precedence
You can encounter conflicts during the assignment of entities and nodes to LogicLock
regions. For example, an entire top-level entity might be assigned to one region and a
node within this top-level entity assigned to another region. To resolve conflicting
assignments, the Quartus II software maintains an order of precedence for LogicLock
assignments. The following order of precedence, from highest to lowest, applies:
1. Exact node-level assignments
2. Path-based and wildcard assignments
3. Hierarchical assignments
For more information about LogicLock assignment precedence, refer to Understanding
Assignment Priority in Quartus II Help.
Note: Open the Priority dialog box by selecting Priority on the General tab of the
LogicLock Region Properties dialog box. You can change the priority of path-based
and wildcard assignments with the Up and Down buttons in the Priority dialog box.
To prioritize assignments between regions, you must select multiple LogicLock
regions and then open the Priority dialog box from the LogicLock Region Properties
dialog box.
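The order of precedence for conflicting LogicLock assignments can be sketched in Python. This is an illustration, not the software's resolution algorithm: the function name is hypothetical, wildcard matching uses Python's `fnmatch` rules rather than Quartus II wildcard semantics, and choosing the deepest enclosing entity for hierarchical assignments is an assumption.

```python
from fnmatch import fnmatchcase

def resolve_region(node, exact, patterns, hierarchical):
    """Pick the region for node using the precedence order described above.

    exact:        {full node name: region}
    patterns:     [(wildcard pattern, region), ...], highest priority first
    hierarchical: {entity path: region}; an entity covers every node below it
    """
    if node in exact:                              # 1. exact node-level
        return exact[node]
    for pattern, region in patterns:               # 2. path-based and wildcard
        if fnmatchcase(node, pattern):
            return region
    # 3. hierarchical: the deepest enclosing entity wins (an assumption)
    parents = [e for e in hierarchical if node.startswith(e + "|")]
    if parents:
        return hierarchical[max(parents, key=len)]
    return None
```

The order of the `patterns` list plays the role of the priority that you set with the Up and Down buttons in the Priority dialog box.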
Virtual Pins
A virtual pin is an I/O element that is temporarily mapped to a logic element and not
to a pin during compilation, and is then implemented as a LUT. Virtual pins should be
used only for I/O elements in lower-level design entities that become nodes when
imported to the top-level design. You can create virtual pins by assigning the Virtual
Pin logic option to an I/O element.
You might use virtual pin assignments when you compile a partial design, because
not all the I/Os from a partial design drive chip pins at the top level.
The virtual pin assignment identifies the I/O ports of a design module that are
internal nodes in the top-level design. These assignments prevent the number of I/O
ports in the lower-level modules from exceeding the total number of available device
pins. Every I/O port that you designate as a virtual pin becomes mapped to either a
logic cell or an adaptive logic module (ALM), depending on the target device.
Note: The Virtual Pin logic option must be assigned to an input or output pin. If you
assign this option to a bidirectional pin, tri-state pin, or registered I/O element,
Analysis and Synthesis ignores the assignment. If you assign this option to a tri-state
pin, the Fitter inserts an I/O buffer to account for the tri-state logic; therefore, the
pin cannot be a virtual pin. You can use multiplexer logic instead of a tri-state pin if
you want to continue to use the assigned pin as a virtual pin. Do not use tri-state logic
except for signals that connect directly to device I/O pins.
In the top-level design, you connect these virtual pins to an internal node of another
module. By making assignments to virtual pins, you can place those pins in the same
location or region on the device as that of the corresponding internal nodes in the
top-level module. You can use the Virtual Pin option when compiling a LogicLock
module with more pins than the target device allows. The Virtual Pin option can
enable timing analysis of a design module that more closely matches the performance
of the module after you integrate it into the top-level design.
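The eligibility rules for the Virtual Pin option, and its effect on the number of device pins a module needs, can be sketched in Python. The function names and the direction labels are hypothetical; the sketch only encodes the restrictions described in this section.

```python
ELIGIBLE_DIRECTIONS = {"input", "output"}

def virtual_pin_candidates(ports):
    """Ports of a lower-level module that could take the Virtual Pin option.

    ports: iterable of (name, direction, registered_io) tuples, where direction
    is "input", "output", "bidir", or "tristate" (hypothetical labels).
    """
    return [name for name, direction, registered_io in ports
            if direction in ELIGIBLE_DIRECTIONS and not registered_io]

def real_pins_needed(ports, virtual):
    """Number of device pins the module still needs once virtual pins are set."""
    return sum(1 for name, _, _ in ports if name not in virtual)
```

Comparing the result of `real_pins_needed` with the pin count of the target device shows whether enough ports have been made virtual for the module to compile on its own.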
Note: In the Node Finder, you can set Filter Type to Pins: Virtual to display all assigned
virtual pins in the design. From the Assignment Editor, to access the Node Finder,
double-click the To field; when the arrow appears on the right side of the field, click
the arrow and select Node Finder.
Using LogicLock Regions in the Chip Planner
You can easily create LogicLock regions in the Chip Planner and assign resources to
them.
Viewing Connections Between LogicLock Regions in the Chip Planner
You can view and edit LogicLock regions using the Chip Planner. To view and edit
LogicLock regions, select the Floorplan Editing layer setting, or any layer setting that
has the User-assigned LogicLock regions setting enabled.
The Chip Planner shows the connections between LogicLock regions. By default, you
can view each connection as an individual line. You can choose to display connections
between two LogicLock regions as a single bundled connection rather than as
individual connection lines. To use this option, open the Chip Planner and on the
View menu, click Inter-region Bundles.
For more information about the Inter-region Bundles dialog box, refer to Inter-region
Bundles Dialog Box in Quartus II Help.
Using LogicLock Regions with the Design Partition Planner
You can optimize timing in a design by placing entities that share significant logical
connectivity close to each other on the device. By default, the Fitter usually places
closely connected entities in the same area of the device; however, you can use
LogicLock regions, together with the Design Partition Planner and the Chip Planner,
to help ensure that logically connected entities retain optimal placement from one
compilation to the next.
You can view the logical connectivity between entities with the Design Partition
Planner, and the physical placement of those entities with the Chip Planner. In the
Design Partition Planner, you can identify entities that are highly interconnected, and
place those entities in a partition. In the Chip Planner, you can create LogicLock
regions and assign each partition to a LogicLock region, thereby preserving the
placement of the entities.
For more information about using LogicLock regions with design partitions, refer to
the Quartus II Incremental Compilation for Hierarchical and Team-Based Design and the
Best Practices for Incremental Compilation Partitions and Floorplan Assignments chapters in
volume 1 of the Quartus II Handbook. For more information about using the Design
Partition Planner with the Chip Planner, refer to About the Design Partition Planner and
Using the Design Partition Planner in Quartus II Help.
Design Floorplan Analysis Using the Chip Planner
The Chip Planner helps you visually analyze the floorplan of your design at any stage
of your design cycle. With the Chip Planner, you can view post-compilation
placement, connections, and routing paths. You can also create LogicLock regions and
location assignments. The Chip Planner allows you to create new logic cells and I/O
atoms and to move existing logic cells and I/O atoms in your design. You can also see
global and regional clock regions within the device, and the connections between I/O
atoms, PLLs and the different clock regions.
From the Chip Planner, you can launch the Resource Property Editor, which you can
use to change the properties and parameters of device resources, and modify
connectivity between certain types of device resources. The Change Manager records
any changes that you make to your design floorplan so that you can selectively undo
changes if necessary.
For more information about the Resource Property Editor and the Change Manager,
refer to the Engineering Change Management with the Chip Planner chapter in volume 2
of the Quartus II Handbook, and to About the Resource Property Editor and About the
Change Manager in Quartus II Help.
The following sections present Chip Planner floorplan views and design analysis
procedures which you can use with any predefined task, unless a procedure requires
a specific task or editing mode.
Chip Planner Floorplan Views
The Chip Planner uses a hierarchical zoom viewer that shows various abstraction
levels of the targeted Altera device. As you zoom in, the level of abstraction decreases,
revealing more detail about your design.
For more information about Chip Planner floorplan views, refer to the Engineering
Change Management with the Chip Planner chapter in volume 2 of the Quartus II
Handbook.
Bird’s Eye View
The Bird’s Eye View displays a high-level picture of resource usage for the entire chip
and provides a fast and efficient way to navigate between areas of interest in the Chip
Planner.
The Bird’s Eye View is particularly useful when the parts of your design that you
want to view are at opposite ends of the chip and you want to quickly navigate
between resource elements without losing your frame of reference.
For more information about the Bird’s Eye View, refer to Bird’s Eye View and
Displaying Resources and Information in Quartus II Help.
Properties Window
The Properties Window displays detailed properties of the objects (such as atoms,
paths, LogicLock regions, or routing elements) currently selected in the Chip Planner.
To display the Properties Window, click Properties on the View menu in the Chip
Planner.
Viewing Architecture-Specific Design Information
By adjusting the Layer Settings in the Chip Planner, you can view the following
architecture-specific information related to your design:
■ Device routing resources used by your design: View how blocks are connected,
  as well as the signal routing that connects the blocks.
■ LE configuration: View logic element (LE) configuration in your design. For
  example, you can view which LE inputs are used; if the LE utilizes the register, the
  look-up table (LUT), or both; as well as the signal flow through the LE.
■ ALM configuration: View ALM configuration in your design. For example, you
  can view which ALM inputs are used, if the ALM utilizes the registers, the upper
  LUT, the lower LUT, or all of them. You can also view the signal flow through the
  ALM.
■ I/O configuration: View device I/O resource usage. For example, you can view
  which components of the I/O resources are used, if the delay chain settings are
  enabled, which I/O standards are set, and the signal flow through the I/O.
■ PLL configuration: View phase-locked loop (PLL) configuration in your design.
  For example, you can view which control signals of the PLL are used with the
  settings for your PLL.
■ Timing: View the delay between the inputs and outputs of FPGA elements. For
  example, you can analyze the timing of the DATAB input to the COMBOUT output.
In addition, you can modify the following device properties with the Chip Planner:
■ LEs and ALMs
■ I/O cells
■ PLLs
■ Registers in RAM and DSP blocks
■ Connections between elements
■ Placement of elements
For more information about LEs, ALMs, and other resources of an FPGA device, refer
to the relevant device handbook.
Viewing Available Clock Networks in the Device
When you select a task with clock region layers enabled, you can display the areas of
the chip that are driven by global and regional clock networks. This global clock
display feature is available for Arria GX, Arria II, Cyclone II, Cyclone III,
HardCopy II, HardCopy III, Stratix II, Stratix II GX, Stratix III, Stratix IV, and Stratix V
device families.
Depending on the clock layers activated in the selected task, the Chip Planner
displays regional and global clock regions in the device, and the connectivity between
clock regions, pins, and PLLs. Clock regions appear as rectangular overlay boxes with
labels indicating the clock type and index. You can select each clock network region by
clicking the clock region. The clock-shaped icon at the top-left corner indicates that
the region represents a clock network region. You can change the color in which the
Chip Planner displays clock regions in the Options dialog box, which you open from
the Tools menu.
The Layers Settings dialog box lists layers for different clock region types; when the
selected device does not contain a given clock region, the option for that category is
unavailable in the dialog box. You can customize the Chip Planner’s display of clock
regions by creating a custom task with selected clock layers enabled in the Layers
Settings dialog box.
For more information about displaying clock regions, refer to Displaying Resources and
Information in Quartus II Help.
Viewing Critical Paths
Critical paths are timing paths in your design that have a negative slack. These timing
paths can span from device I/Os to internal registers, registers to registers, or from
registers to device I/Os. The slack of a path determines its criticality; slack appears in
the timing analysis report. Design analysis for timing closure is a fundamental
requirement for optimal performance in highly complex designs. The analytical
capability of the Chip Planner helps you close timing on complex designs.
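The definition of a critical path as a path with negative slack can be sketched in Python. The function name and the path labels are hypothetical; the slack values would come from the timing analysis report.

```python
def critical_paths(paths):
    """Return (name, slack) pairs with negative slack, worst slack first.

    paths: {path name: slack in ns}
    """
    failing = [(name, slack) for name, slack in paths.items() if slack < 0]
    return sorted(failing, key=lambda item: item[1])
```

Sorting by slack puts the most critical path first, which is usually the first one to locate in the Chip Planner.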
Viewing critical paths in the Chip Planner helps you understand why a specific path
is failing. You can see if any modification in the placement can reduce the negative
slack. To expand or collapse a path and show or hide its individual connections, click
Expand Connections in the toolbar, or click the “+/-” on the label.
You can locate failing paths from the timing report in the TimeQuest Timing
Analyzer. To locate the critical paths, run the Report Timing task from the Custom
Reports group in the Tasks pane of the TimeQuest Timing Analyzer. From the View
pane, which lists the failing paths, right-click on any failing path or node, and select
Locate Path. From the Locate dialog box, select Chip Planner to see the failing path in
the Chip Planner.
Note: To display paths in the floorplan, you must first make timing settings and perform a
timing analysis.
f For more information about performing static timing analysis with the Quartus II
TimeQuest Timing Analyzer, refer to The Quartus II TimeQuest Timing Analyzer
chapter in volume 3 of the Quartus II Handbook.
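The timing analysis that must precede path location can also be scripted. The following is a minimal sketch, assuming it is run with the TimeQuest executable (quartus_sta -t) on a design that has already been fitted. The project name my_project, the script file name, and the panel name are hypothetical placeholders.

```tcl
# Sketch only: run as "quartus_sta -t report_failing.tcl".
# "my_project" is a hypothetical project name; substitute your own.
project_open my_project

# Build the timing netlist from the fitted design and apply the
# constraints from the SDC files listed in the project settings.
create_timing_netlist
read_sdc
update_timing_netlist

# Report the 10 worst setup paths with full path detail.
report_timing -setup -npaths 10 -detail full_path \
    -panel_name "Worst Setup Paths"

delete_timing_netlist
project_close
```

Paths with negative slack in the resulting report are the candidates to locate in the Chip Planner from the TimeQuest GUI.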
Viewing Routing Congestion
The Routing Congestion task allows you to determine the percentage of routing
resources in use following a compilation. This feature can identify areas that
lack routing resources, helping you make design changes that relieve routing
congestion.
To view routing congestion in the Chip Planner, select the Routing Congestion task.
The Routing Utilization Settings dialog box appears whenever you select the
Routing Congestion task; this dialog box allows you to set a congestion threshold
value, and to specify the types of routing interconnects of interest (Figure 15–3).
Figure 15–3. Routing Utilization Settings dialog box
h For more information about displaying routing congestion, refer to Displaying
Resources and Information in Quartus II Help.
The routing congestion map uses the color and shading of logic resources to indicate
relative resource utilization; darker shading represents a greater utilization of routing
resources (black indicates zero utilization). Areas where routing utilization exceeds
the threshold value specified in the Routing Utilization Settings dialog box appear in
red. The congestion map can help you determine whether you can modify the
floorplan, or make changes to the RTL to reduce routing congestion.
The color and shading displayed by the congestion map for a particular area of the
device is based on the total utilization of all interconnect types that you select in the
Routing Utilization Settings dialog box. For example, consider the following routing
utilization:
Table 15–2. Example routing utilization
Interconnect type   Total number of elements   Number of elements used   Percent utilization
R3                  216                        69                        32%
R6                  108                        71                        66%
R24                 48                         46                        96%
All interconnect    372                        186                       50%
If, in the Routing Utilization Settings dialog box, you select All interconnect, the
color displayed in the congestion map corresponds to a utilization of 50%. If you
select only R3 interconnect, the color displayed corresponds to 32%. If you select only
R24, the color displayed corresponds to 96%.
To identify a lack of routing resources, it is necessary to investigate each routing
interconnect type separately by selecting, in the Routing Utilization Settings dialog
box, each interconnect type in turn.
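The arithmetic behind Table 15–2 shows why the combined figure can hide a congested interconnect type. The following plain Tcl sketch recomputes the percentages from the table's counts; the helper name percent_used is hypothetical, and the script does not call any Quartus II command.

```tcl
# Hypothetical helper: percent utilization, rounded to the nearest integer.
proc percent_used {used total} {
    return [expr {round(100.0 * $used / $total)}]
}

set all_total 0
set all_used  0

# Counts from Table 15-2, as triples of: type, total elements, elements used.
foreach {type total used} {R3 216 69  R6 108 71  R24 48 46} {
    puts [format "%-16s %3d%%" $type [percent_used $used $total]]
    incr all_total $total
    incr all_used  $used
}

# The combined figure averages over every selected interconnect type.
puts [format "%-16s %3d%%" "All interconnect" [percent_used $all_used $all_total]]
```

The combined value works out to 50% even though R24 alone is at 96%, which is why each interconnect type should be inspected separately.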
Viewing I/O Banks
The Chip Planner can show all of the I/O banks of the device. To see the I/O bank
map of the device, turn on the I/O Banks layer in the Layers Settings dialog box.
Viewing High-Speed Serial Interfaces (HSSI)
For the Stratix V device family, the Chip Planner displays a detailed block view of the
receiver and transmitter channels of the high-speed serial interfaces. Figure 15–4
shows the blocks of a Stratix V HSSI receiver channel.
Figure 15–4. Stratix V HSSI receiver channel
Generating Fan-In and Fan-Out Connections
The ability to display fan-in and fan-out connections enables you to view the atoms
that fan in to or fan out from the selected atom. To remove the connections displayed,
use the Clear Unselected Connections icon in the Chip Planner toolbar.
Generating Immediate Fan-In and Fan-Out Connections
The ability to display immediate fan-in and fan-out connections enables you to view
the resource that is the immediate fan-in or fan-out connection for the selected atom.
For example, if you select a logic resource and choose to view the immediate fan-in for
that resource, you can see the routing resource that drives the logic resource. You can
generate immediate fan-ins and fan-outs for all logic resources and routing resources.
To remove the displayed connections from the screen, click the Clear Connections
icon in the toolbar.
Highlight Routing
The Highlight Routing command enables you to highlight the routing resources used
by a selected path or connection. Figure 15–5 shows the routing resources in use
between two logic elements.
Figure 15–5. Highlight Routing
f You can view and edit resources in the FPGA using the Resource Property Editor. For
more information, refer to the Engineering Change Management with the Chip Planner
chapter in volume 2 of the Quartus II Handbook.
Show Delays
With the Show Delays command, you can view timing delays for paths located from
TimeQuest Timing Analyzer reports. For example, you can view the delay between
two logic resources or between a logic resource and a routing resource. Figure 15–6
shows the delay associated with a path located from a TimeQuest Timing Analyzer
report.
Figure 15–6. Show Delays
Exploring Paths in the Chip Planner
You can use the Chip Planner to explore paths between logic elements. The following
example uses the Chip Planner to traverse paths from the Timing Analysis report.
Locate Path from the Timing Analysis Report to the Chip Planner
To locate a path from the Timing Analysis report to the Chip Planner, perform the
following steps:
1. Select the path you want to locate.
2. Right-click the path in the Timing Analysis report, point to Locate, and click
Locate in Chip Planner (Floorplan & Chip Editor). The path is displayed with its
timing data in the Chip Planner main window and is listed in the Locate History
window.
3. To view the routing resources taken for a path you have located in the Chip
Planner, select the path and then click the Highlight Routing icon in the Chip
Planner toolbar, or from the View menu, click Highlight Routing.
Analyzing Connections for a Path
To determine the connections between items in the Chip Planner, click the Expand
Connections icon on the toolbar. To add the timing delays for paths located from the
TimeQuest Timing Analyzer, click the Show Delays icon on the toolbar. Figure 15–7
shows the connections for a path located from the TimeQuest Timing Analyzer that
are displayed in the Chip Planner. To see the constituent delays on the selected path,
click on the “+” sign next to the path delay displayed in the Chip Planner.
Figure 15–7. Path Analysis
Viewing Assignments in the Chip Planner
You can view location assignments by selecting the appropriate layer set in the Chip
Planner. To view location assignments, select the Floorplan Editing task or any
custom task that displays block utilization, and the Assignment editing mode. See
Figure 15–8.
The Chip Planner shows location assignments graphically, by displaying assigned
resources in a particular color (gray, by default). You can create or move an
assignment by dragging the selected resource to a new location.
Figure 15–8. Viewing Assignments in the Chip Planner
You can make node and pin location assignments and assignments to LogicLock
regions and custom regions using the drag-and-drop method in the Chip Planner. The
Fitter applies the assignments that you create during the next place-and-route
operation.
h For more information about managing assignments in the Chip Planner, refer to
Working With Assignments in the Chip Planner in Quartus II Help.
Viewing High-Speed and Low-Power Tiles in the Chip Planner
The Chip Planner has a predefined task, Power, which shows the power map of
Stratix III, Stratix IV, and Stratix V devices; these devices have ALMs that can operate
in either high-speed mode or low-power mode. The power mode is set during the
fitting process in the Quartus II software. These ALMs are grouped together to form
larger blocks, called “tiles.”
f To learn more about power analyses and optimizations in Stratix III devices, refer to
AN 437: Power Optimization in Stratix III FPGAs. To learn more about power analyses
and optimizations in Stratix IV devices, refer to AN 514: Power Optimization in
Stratix IV FPGAs.
When you select the Power task in the Chip Planner for Stratix III, Stratix IV, or
Stratix V devices, the Chip Planner displays low-power and high-speed tiles in
contrasting colors; yellow tiles operate in a high-speed mode, while blue tiles operate
in a low-power mode (see Figure 15–9). When you select the Power task, you can
perform all floorplanner-related functions for this task; however, you cannot edit tiles
to change the power mode.
Figure 15–9. Viewing High-Speed and Low Power Tiles in a Stratix III Device
Scripting Support
You can run procedures and specify the settings described in this chapter in a Tcl
script. You can also run some procedures at a command prompt. For detailed
information about scripting command options, refer to the Quartus II command-line
and Tcl API Help browser. To run the Help browser, type the following command at
the command prompt:
quartus_sh --qhelp
h Information about scripting command options is also available in API Functions for Tcl
in Quartus II Help.
f For more information about Tcl scripting, refer to the Tcl Scripting chapter in volume 2
of the Quartus II Handbook. For more information about command-line scripting, refer
to the Command-Line Scripting chapter in volume 2 of the Quartus II Handbook. For
information about all settings and constraints in the Quartus II software, refer to the
Quartus II Settings File Manual.
Initializing and Uninitializing a LogicLock Region
You must initialize the LogicLock data structures before creating or modifying any
LogicLock regions and before executing any of the Tcl commands listed below.
Use the following Tcl command to initialize the LogicLock data structures:
initialize_logiclock
Use the following Tcl command to uninitialize the LogicLock data structures before
closing your project:
uninitialize_logiclock
Creating or Modifying LogicLock Regions
Use the following Tcl command to create or modify a LogicLock region:
set_logiclock -auto_size true -floating true -region <my_region-name>
Note: The command in the above example sets the size of the region to auto and the state to
floating.
If you specify a region name that does not exist in the design, the command creates
the region with the specified properties. If you specify the name of an existing region,
the command changes all properties you specify and leaves unspecified properties
unchanged.
For more information about creating LogicLock regions, refer to “Creating LogicLock
Regions” on page 15–4.
Obtaining LogicLock Region Properties
Use the following Tcl command to obtain LogicLock region properties. This example
returns the height of the region named my_region:
get_logiclock -region my_region -height
Assigning LogicLock Region Content
Use the following Tcl commands to assign or change nodes and entities in a
LogicLock region. This example assigns all nodes with names matching fifo* to the
region named my_region.
set_logiclock_contents -region my_region -to fifo*
You can also make path-based assignments with the following Tcl command:
set_logiclock_contents -region my_region -from fifo -to ram*
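The individual commands above can be combined into one script. The following is a minimal sketch, assuming an executable that provides the LogicLock Tcl commands (quartus_cdb -t is assumed here) and assuming the package is loaded under the name logiclock. The project name my_project and the script file name are hypothetical.

```tcl
# Sketch only: run as "quartus_cdb -t make_region.tcl".
# "my_project" and "my_region" are hypothetical names.
load_package project
load_package logiclock   ;# assumed package name for the LogicLock commands

project_open my_project
initialize_logiclock

# Create (or update) an auto-sized, floating region.
set_logiclock -region my_region -auto_size true -floating true

# Assign every node whose name matches fifo* to the region.
set_logiclock_contents -region my_region -to fifo*

# Read back one property to confirm that the region exists.
puts "Height of my_region: [get_logiclock -region my_region -height]"

uninitialize_logiclock
export_assignments   ;# write the new assignments to the settings file
project_close
```

Calling uninitialize_logiclock before project_close follows the ordering described in “Initializing and Uninitializing a LogicLock Region”.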
Save a Node-Level Netlist for the Entire Design into a Persistent Source File
Make the following assignments to cause the Quartus II Fitter to save a node-level
netlist for the entire design into a .vqm file:
set_global_assignment -name LOGICLOCK_INCREMENTAL_COMPILE_ASSIGNMENT ON
set_global_assignment -name LOGICLOCK_INCREMENTAL_COMPILE_FILE <file name>
Any path specified in the file name is relative to the project directory. For example,
specifying atom_netlists/top.vqm places top.vqm in the atom_netlists subdirectory
of your project directory.
A .vqm file is saved in the directory specified at the completion of a full compilation.
Note: The saving of a node-level netlist to a persistent source file is not supported for
designs targeting newer devices such as Arria GX, Arria II, Cyclone III, MAX V,
Stratix III, Stratix IV, or Stratix V.
Setting LogicLock Assignment Priority
Use the following Tcl code to set the priority for a LogicLock region’s members. This
example reverses the priorities of the LogicLock region in your design.
# Build a list of the members in reverse priority order.
set reverse [list]
foreach member [get_logiclock_member_priority] {
    set reverse [linsert $reverse 0 $member]
}
set_logiclock_member_priority $reverse
Assigning Virtual Pins
Use the following Tcl command to turn on the virtual pin setting for a pin called
my_pin:
set_instance_assignment -name VIRTUAL_PIN ON -to my_pin
For more information about assigning virtual pins, refer to “Virtual Pins” on
page 15–9.
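When several ports of a lower-level entity need the same setting, the assignment can be applied in a loop. The following is a small sketch; the pin names in the list are hypothetical and stand in for the ports of your own design.

```tcl
# Hypothetical list of lower-level I/O ports to treat as virtual pins.
set virtual_pins {my_pin data_in[0] data_in[1]}

# Apply the same VIRTUAL_PIN assignment to each name in the list.
foreach pin $virtual_pins {
    set_instance_assignment -name VIRTUAL_PIN ON -to $pin
}
```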
f For more information about Tcl scripting, refer to the Tcl Scripting chapter in volume 2
of the Quartus II Handbook.
Conclusion
Design floorplan analysis is a valuable method for achieving timing closure and
optimal performance in highly complex designs. With analysis capability, the
Quartus II Chip Planner helps you close timing quickly on your designs. Using the
Chip Planner together with LogicLock and Incremental Compilation enables you to
compile your designs hierarchically, preserving the timing results from individual
compilation runs. You can use LogicLock regions as part of an incremental
compilation methodology to improve your productivity. You can also include a
module in one or more projects while maintaining performance and reducing
development costs and time to market. LogicLock region assignments give you
complete control over logic and memory placement to improve the performance of
nonhierarchical designs as well.
Document Revision History
Table 15–3 shows the revision history for this chapter.
Table 15–3. Document Revision History
Date             Version   Changes
May 2011         11.0.0    ■ Updated for the 11.0 release.
                           ■ Edited “LogicLock Regions”
                           ■ Updated “Viewing Routing Congestion”
                           ■ Updated “Locate History”
                           ■ Updated Figures 15-4, 15-9, 15-10, and 15-13
                           ■ Added Figure 15-6
December 2010    10.1.0    ■ Updated for the 10.1 release.
July 2010        10.0.0    ■ Updated device support information
                           ■ Removed references to Timing Closure Floorplan; removed “Design Analysis Using the
                             Timing Closure Floorplan” section
                           ■ Added links to online Help topics
                           ■ Added “Using LogicLock Regions with the Design Partition Planner” section
                           ■ Updated “Viewing Critical Paths” section
                           ■ Updated several graphics
                           ■ Updated format of Document Revision History table
November 2009    9.1.0     ■ Updated supported device information throughout
                           ■ Removed deprecated sections related to the Timing Closure Floorplan for older device
                             families. (For information on using the Timing Closure Floorplan with older device
                             families, refer to previous versions of the Quartus II Handbook, available in the Quartus II
                             Handbook Archive.)
                           ■ Updated “Creating Nonrectangular LogicLock Regions” section
                           ■ Added “Selected Elements Window” section
                           ■ Updated table 12-1
May 2008         8.0.0     ■ Updated the following sections:
                               ■ “Chip Planner Tasks and Layers”
                               ■ “LogicLock Regions”
                               ■ “Back-Annotating LogicLock Regions”
                               ■ “LogicLock Regions in the Timing Closure Floorplan”
                           ■ Added the following sections:
                               ■ “Reserve LogicLock Region”
                               ■ “Creating Nonrectangular LogicLock Regions”
                               ■ “Viewing Available Clock Networks in the Device”
                           ■ Updated Table 10–1
                           ■ Removed the following sections:
                               ■ Reserve LogicLock Region Design Analysis Using the Timing Closure Floorplan
f For previous versions of the Quartus II Handbook, refer to the Quartus II Handbook
Archive.
16. Netlist Optimizations and Physical Synthesis
December 2010
QII52007-10.0.1
The Quartus® II software offers physical synthesis optimizations to improve your
design beyond the optimization performed in the normal course of the Quartus II
compilation flow.
Physical synthesis optimizations can help improve the performance of your design
regardless of the synthesis tool used, although the effect of physical synthesis
optimizations depends on the structure of your design.
Netlist optimization options work with the atom netlist of your design, which
describes a design in terms of Altera®-specific primitives. An atom netlist file can be
an Electronic Design Interchange Format (.edf) file or a Verilog Quartus Mapping
(.vqm) file generated by a third-party synthesis tool, or a netlist used internally by the
Quartus II software. Physical synthesis optimizations are applied at different stages of
the Quartus II compilation flow, either during synthesis, fitting, or both.
This chapter explains how the physical synthesis optimizations in the Quartus II
software can modify your design’s netlist to improve the quality of results. This
chapter also provides information about preserving compilation results through
back-annotation and writing out a new netlist, and provides guidelines for applying
the various options.
Note: Because the node names for primitives in the design can change when you use
physical synthesis optimizations, you should evaluate whether your design flow
requires fixed node names. If you use a verification flow that might require fixed node
names, such as the SignalTap® II Logic Analyzer, formal verification, or the
LogicLock-based optimization flow (for legacy devices), you must turn off physical synthesis
options.
WYSIWYG Primitive Resynthesis
If you use a third-party tool to synthesize your design, use the Perform WYSIWYG
primitive resynthesis option to apply optimizations to the synthesized netlist.
The Perform WYSIWYG primitive resynthesis option directs the Quartus II software
to unmap the logic elements (LEs) in an atom netlist to logic gates, and then remap
the gates back to Altera-specific primitives. Third-party synthesis tools generate either
an .edf or .vqm atom netlist file using Altera-specific primitives. When you turn on
the Perform WYSIWYG primitive resynthesis option, the Quartus II software can
apply techniques specific to the device architecture during the remapping
process. This feature remaps the design using the Optimization Technique specified
for your project (Speed, Area, or Balanced).
Note: The Perform WYSIWYG primitive resynthesis option has no effect if you are using
Quartus II integrated synthesis to synthesize your design.
To turn on the Perform WYSIWYG primitive resynthesis option, perform the
following steps:
1. On the Assignments menu, click Settings. The Settings dialog box appears.
2. In the Category list, select Analysis & Synthesis Settings. The Analysis &
Synthesis Settings page appears.
3. Turn on Perform WYSIWYG primitive resynthesis, and click OK.
If you want to perform WYSIWYG resynthesis on only a portion of your design, you
can use the Assignment Editor to assign the Perform WYSIWYG primitive
resynthesis logic option to a lower-level entity in your design. This logic option is
available for all Altera devices supported by the Quartus II software except MAX 3000
and MAX 7000 devices.
The results of the remapping depend on the Optimization Technique you choose. To
select an Optimization Technique, perform the following steps:
1. In the Category list of the Settings dialog box, select Analysis & Synthesis
Settings. The Analysis & Synthesis Settings page appears.
2. Under Optimization Technique, select Speed, Area, or Balanced to specify how
the Quartus II technology mapper optimizes the design. The Balanced setting is
the default for many Altera device families; this setting optimizes the timing
critical parts of the design for speed and the rest of the design for area.
3. Click OK.
f Refer to the Quartus II Integrated Synthesis chapter in volume 1 of the Quartus II
Handbook for details on the Optimization Technique option.
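The GUI settings described above can also be made as assignments in a Tcl script or in the Quartus II Settings File. The following is a sketch, not a definitive recipe: the assignment names are given to the best of our knowledge and should be confirmed in the Quartus II Settings File Manual. The entity name my_entity is hypothetical, and the Optimization Technique variable name is family-specific (a Cyclone II project is assumed here).

```tcl
# Turn on WYSIWYG primitive resynthesis for the whole project.
set_global_assignment -name ADV_NETLIST_OPT_SYNTH_WYSIWYG_REMAP ON

# Alternatively, limit it to one lower-level entity.
# "my_entity" is a hypothetical entity name.
set_instance_assignment -name ADV_NETLIST_OPT_SYNTH_WYSIWYG_REMAP ON -to my_entity

# Select the Optimization Technique used during remapping.
# The variable name depends on the device family (Cyclone II assumed);
# valid values are SPEED, AREA, or BALANCED.
set_global_assignment -name CYCLONEII_OPTIMIZATION_TECHNIQUE SPEED
```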
Figure 16–1 shows the Quartus II software flow for the WYSIWYG primitive
resynthesis feature.
Figure 16–1. WYSIWYG Primitive Resynthesis
The Perform WYSIWYG primitive resynthesis option unmaps and remaps only logic
cells, also referred to as LCELL or LE primitives, and regular I/O primitives (which
may contain registers). Double data rate (DDR) I/O primitives, memory primitives,
digital signal processing (DSP) primitives, and logic cells in carry/cascade chains are
not remapped. Logic specified in an encrypted .vqm file or an .edf file, such as
third-party intellectual property (IP), is not touched.
The Perform WYSIWYG primitive resynthesis option can change node names in the
.vqm file or .edf file from your third-party synthesis tool, because the primitives in the
atom netlist are broken apart and then remapped by the Quartus II software. The
remapping process removes duplicate registers, but registers that are not removed
retain the same name after remapping.
Any nodes or entities that have the Netlist Optimizations logic option set to Never
Allow are not affected during WYSIWYG primitive resynthesis. You can use the
Assignment Editor to apply the Netlist Optimizations logic option. This option
disables WYSIWYG resynthesis for parts of your design.
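As a scripted alternative to the Assignment Editor, the same logic option can be applied with an instance assignment. This is a sketch under stated assumptions: the assignment name and the spelling of its value should be confirmed in the Quartus II Settings File Manual, and my_entity is a hypothetical entity name.

```tcl
# Exclude one entity from netlist optimizations, including
# WYSIWYG primitive resynthesis. "my_entity" is hypothetical, and the
# value spelling (NEVER_ALLOW) is an assumption to verify.
set_instance_assignment -name ADV_NETLIST_OPT_ALLOWED NEVER_ALLOW -to my_entity
```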
Note: Primitive node names are specified during synthesis. When netlist optimizations are
applied, node names might change because primitives are created and removed. HDL
attributes applied to preserve logic in third-party synthesis tools cannot be
maintained because those attributes are not written into the atom netlist read by the
Quartus II software.
If you use the Quartus II software to synthesize, you can use the Preserve Register
(preserve) and Keep Combinational Logic (keep) attributes to maintain certain
nodes in the design.
f For more information about using these attributes during synthesis in the Quartus II
software, refer to the Quartus II Integrated Synthesis chapter in volume 1 of the
Quartus II Handbook.
Performing Physical Synthesis Optimizations
The Quartus II design flow involves separate steps of synthesis and fitting. The
synthesis step optimizes the logical structure of a circuit for area, speed, or both. The
Fitter then places and routes the logic cells to ensure critical portions of logic are close
together and use the fastest possible routing resources. In this
push-button flow, the synthesis stage cannot anticipate the routing delays seen in
the Fitter. Because routing delays are a significant part of the typical critical path
delay, the physical synthesis optimizations available in the Quartus II software take
those routing delays into consideration and focus timing-driven optimizations on
the critical parts of the design. This tight integration of the fitting and synthesis processes is
known as physical synthesis.
The following sections describe the physical synthesis optimizations available in the
Quartus II software, and how they can help improve your performance results.
Physical synthesis optimization options can be used with Arria series, Cyclone,
HardCopy, and Stratix series device families.
If you are migrating your design to a HardCopy II device, you can target physical
synthesis optimizations to the FPGA architecture in the FPGA-first flow or to the
HardCopy II architecture in the HardCopy-first flow. The optimizations are mapped
to the other device architecture during the migration process.
Note: You cannot target optimizations to both device architectures individually because
doing so results in a different post-fitting netlist for each device.
f For more information about physical synthesis optimizations, refer to Physical
Synthesis Optimizations Page (Settings Dialog Box) in Quartus II Help. For more
information about using physical synthesis with HardCopy devices, refer to the
Quartus II Support for HardCopy Series Devices chapter in volume 1 of the Quartus II
Handbook.
You can choose the physical synthesis optimization options you want for your design
during synthesis and fitting in the Physical Synthesis Optimizations page under the
Compilation Process Settings page in the Settings dialog box. The settings include
optimizations for improving performance and fitting in the selected device.
You can also set the effort level for physical synthesis optimizations. Normally,
physical synthesis optimizations increase the compilation time; however, you can
select the Fast effort level if you want to limit the increase in compilation time. When
you select the Fast effort level, the Quartus II software performs limited register
retiming operations during fitting. The Extra effort level runs additional algorithms to
get the best circuit performance, but results in increased compilation time.
To optimize performance, the following options are available:
■
Perform physical synthesis for combinational logic
■
Perform register retiming
■
Perform automatic asynchronous signal pipelining
■
Perform register duplication
To optimize for better fitting, you can choose from the following options:
■
Perform physical synthesis for combinational logic
■
Perform logic to memory mapping
To view and modify the physical synthesis optimization options, perform the
following steps:
1. On the Assignments menu, click Settings. The Settings dialog box appears.
2. In the Category list, select Physical Synthesis Optimizations under Compilation
Process Settings. The Physical Synthesis Optimizations page appears.
3. Specify the options for performing physical synthesis optimizations.
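The options on the Physical Synthesis Optimizations page correspond to global assignments that can be set from a Tcl script. The following is a sketch under the assumption that these assignment names apply to your software version and device family; confirm them in the Quartus II Settings File Manual before relying on them.

```tcl
# Performance-oriented physical synthesis options.
set_global_assignment -name PHYSICAL_SYNTHESIS_COMBO_LOGIC ON
set_global_assignment -name PHYSICAL_SYNTHESIS_REGISTER_RETIMING ON
set_global_assignment -name PHYSICAL_SYNTHESIS_REGISTER_DUPLICATION ON

# Effort level: NORMAL, EXTRA, or FAST. FAST limits the increase
# in compilation time at the cost of fewer optimizations.
set_global_assignment -name PHYSICAL_SYNTHESIS_EFFORT FAST
```

Choosing which options to turn on depends on whether registers must stay intact, for example when the flow includes formal verification.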
Some physical synthesis options affect only registered logic and some options affect
only combinational logic. Select options based on whether you want to keep the
registers intact or not. For example, if your verification flow involves formal
verification, you might have to keep the registers intact.
All physical synthesis optimizations write results to the Netlist Optimizations report,
which provides a list of atom netlist files that were modified, created, and deleted
during physical synthesis. To access the Netlist Optimizations report, perform the
following steps:
1. On the Processing menu, click Compilation Report.
2. In the Compilation Report list, select Netlist Optimizations under Fitter.
Similarly, physical synthesis optimizations performed during synthesis write results
to the synthesis report. To access this report, perform the following steps:
1. On the Processing menu, click Compilation Report.
2. In the Compilation Report list, select Analysis & Synthesis.
3. In the Optimization Results folder, select Netlist Optimizations. The Physical
Synthesis Netlist Optimizations table appears, listing the physical synthesis
netlist optimizations performed during synthesis.
Nodes or entities that have the Netlist Optimizations logic option set to Never Allow
are not affected by the physical synthesis algorithms. You can use the Assignment
Editor to apply the Netlist Optimizations logic option. Use this option to disable
physical synthesis optimizations for parts of your design.
Automatic Asynchronous Signal Pipelining
The Perform automatic asynchronous signal pipelining option on the Physical
Synthesis Optimizations page in the Compilation Process Settings section of the
Settings dialog box allows the Quartus II Fitter to perform automatic insertion of
pipeline stages for asynchronous clear and asynchronous load signals during fitting
when these signals negatively affect performance. You can use this option if
asynchronous control signal recovery and removal times are not achieving their
requirements.
The Perform automatic asynchronous signal pipelining option improves
performance for designs in which asynchronous signals in very fast clock domains
cannot be distributed across the chip fast enough due to long global network delays.
This optimization performs automatic pipelining of these signals, while attempting to
minimize the total number of registers inserted.
Note: The Perform automatic asynchronous signal pipelining option adds registers to nets
driving the asynchronous clear or asynchronous load ports of registers. These
additional registers add register delays (latency) to the reset, with the same
number of register delays added for each destination using the reset. The additional register
delays can change the behavior of the signal in the design; therefore, you should use
this option only if additional latency on the reset signals does not violate any design
requirements. This option also prevents the promotion of signals to global routing
resources.
The Quartus II software performs automatic asynchronous signal pipelining only if
Enable Recovery/Removal analysis is turned on. If you use the TimeQuest Timing
Analyzer, Enable Recovery/Removal analysis is turned on by default. Pipelining is
allowed only on asynchronous signals that have the following properties:
■
The asynchronous signal is synchronized to a clock (a synchronization register
drives the signal)
■
The asynchronous signal fans out only to asynchronous control ports of registers
The Quartus II software does not perform automatic asynchronous signal pipelining
on asynchronous signals that have the Netlist Optimizations logic option set to Never
Allow.
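A minimal scripted equivalent might look like the following. It uses the PHYSICAL_SYNTHESIS_ASYNCHRONOUS_SIGNAL_PIPELINING and ADV_NETLIST_OPT_ALLOWED variables from the scripting tables in this chapter; the reset node name is a hypothetical placeholder, and the fragment assumes that a project is already open.

```tcl
# Assumes a project is already open.

# Turn on automatic asynchronous signal pipelining for the whole design.
set_global_assignment -name PHYSICAL_SYNTHESIS_ASYNCHRONOUS_SIGNAL_PIPELINING ON

# Keep one reset net out of the optimization because extra reset latency
# is not acceptable there. "rst_sync:u_rst|rst_n_out" is a hypothetical node.
set_instance_assignment -name ADV_NETLIST_OPT_ALLOWED "NEVER ALLOW" \
    -to "rst_sync:u_rst|rst_n_out"
```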
Physical Synthesis for Combinational Logic
To optimize the design and reduce delay along critical paths, you can turn on the
Perform physical synthesis for combinational logic option, which swaps the look-up
table (LUT) ports within LEs so that the critical path has fewer layers through which
to travel. The Perform physical synthesis for combinational logic option also allows
the duplication of LUTs to enable further optimizations on the critical path.
For more information about using the Perform physical synthesis for combinational
logic option, refer to Physical Synthesis Optimizations Page (Settings Dialog Box) and to
Setting Up and Running the Fitter in Quartus II Help.
The Perform physical synthesis for combinational logic option affects only
combinational logic in the form of LUTs. These transformations might occur during
the synthesis stage or the Fitter stage during compilation. The registers contained in
the affected logic cells are not modified. Inputs into memory blocks, DSP blocks, and
I/O elements (IOEs) are not swapped.
The Quartus II software does not perform combinational optimization on logic cells
that have the following properties:
■
Are part of a chain
■
Drive global signals
■
Are constrained to a single logic array block (LAB) location
■
Have the Netlist Optimizations option set to Never Allow
If you want to consider logic cells with any of these conditions for physical synthesis,
you can override these rules by setting the Netlist Optimizations logic option to
Always Allow on a given set of nodes.
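The following fragment sketches how this option and the Always Allow override could be set from Tcl, using the PHYSICAL_SYNTHESIS_COMBO_LOGIC and ADV_NETLIST_OPT_ALLOWED variables listed in the scripting tables of this chapter. The node name is a hypothetical placeholder, and a project is assumed to be open.

```tcl
# Assumes a project is already open.

# Turn on physical synthesis for combinational logic.
set_global_assignment -name PHYSICAL_SYNTHESIS_COMBO_LOGIC ON

# Allow the optimization on a logic cell that would otherwise be skipped
# (for example, one constrained to a single LAB location).
# "alu:u_alu|carry_sel" is a hypothetical node name.
set_instance_assignment -name ADV_NETLIST_OPT_ALLOWED "ALWAYS ALLOW" \
    -to "alu:u_alu|carry_sel"
```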
Physical Synthesis for Registers—Register Duplication
The Perform register duplication option on the Physical Synthesis Optimizations
page in the Compilation Process Settings section of the Settings dialog box allows
the Quartus II Fitter to duplicate registers based on Fitter placement information. You
can also duplicate combinational logic when this option is enabled. A logic cell that
fans out to multiple locations can be duplicated to reduce the delay of one path
without degrading the delay of another. The new logic cell can be placed closer to
critical logic without affecting the other fan-out paths of the original logic cell.
For more information about the Perform register duplication option, refer to Physical
Synthesis Optimizations Page (Settings Dialog Box) and to Setting Up and Running the
Fitter in Quartus II Help.
The Quartus II software does not perform register duplication on logic cells that have
the following properties:
■
Are part of a chain
■
Contain registers that drive asynchronous control signals on another register
■
Contain registers that drive the clock of another register
■
Contain registers that drive global signals
■
Contain registers that are constrained to a single LAB location
■
Contain registers that are driven by input pins without a tSU constraint
■
Contain registers that are driven by a register in another clock domain
■
Are considered virtual I/O pins
■
Have the Netlist Optimizations option set to Never Allow
For more information about virtual I/O pins, refer to the Analyzing and Optimizing the
Design Floorplan chapter in volume 2 of the Quartus II Handbook.
If you want to consider logic cells that meet any of these conditions for physical
synthesis, you can override these rules by setting the Netlist Optimizations logic
option to Always Allow on a given set of nodes.
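As an illustration only, register duplication can be turned on globally while a few registers are protected from it. The variable names come from the scripting tables in this chapter; the register names are hypothetical, and a project is assumed to be open.

```tcl
# Assumes a project is already open.

# Turn on register duplication during fitting.
set_global_assignment -name PHYSICAL_SYNTHESIS_REGISTER_DUPLICATION ON

# Hypothetical registers that must keep their original names,
# for example because a debug flow refers to them.
set protected_regs {
    "uart:u_uart|rx_sync_reg"
    "dma:u_dma|req_reg"
}

# Exclude each listed register from netlist optimizations.
foreach reg $protected_regs {
    set_instance_assignment -name ADV_NETLIST_OPT_ALLOWED "NEVER ALLOW" -to $reg
}
```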
Physical Synthesis for Registers—Register Retiming
The Perform Register Retiming option enables the movement of registers across
combinational logic, allowing the Quartus II software to trade off the delay between
timing-critical paths and non-critical paths. Register retiming can be done during
Quartus II integrated synthesis or during the Fitter stages of design compilation.
Figure 16–2 shows an example of register retiming in which the 10-ns critical delay is
reduced by moving the register relative to the combinational logic.
Figure 16–2. Register Retiming Diagram
Retiming can create multiple registers at the input of a combinational block from a
register at the output of a combinational block. In this case, the new registers have the
same clock and clock enable. The asynchronous control signals and power-up level
are derived from previous registers to provide equivalent functionality. Retiming can
also combine multiple registers at the input of a combinational block to a single
register (Figure 16–3).
Figure 16–3. Combining Registers with Register Retiming
To move registers across combinational logic to balance timing, perform the following
steps:
1. On the Assignments menu, click Settings. The Settings dialog box appears.
2. In the Category list, select Physical Synthesis Optimizations under Compilation
Process Settings. The Physical Synthesis Optimizations page appears.
3. Specify your preferred option under Optimize for performance (physical
synthesis) and Effort level.
4. Click OK.
For more information about the Optimize for performance (physical synthesis)
options and effort levels, refer to Physical Synthesis Optimizations Page (Settings Dialog
Box) in Quartus II Help.
If you want to prevent register movement during register retiming, you can set the
Netlist Optimizations logic option to Never Allow. You can apply this option to
either individual registers or entities in the design using the Assignment Editor.
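A possible scripted form of these settings is sketched below. PHYSICAL_SYNTHESIS_REGISTER_RETIMING and ADV_NETLIST_OPT_ALLOWED are the variables listed in the scripting tables of this chapter; the register name is a hypothetical placeholder, and a project is assumed to be open.

```tcl
# Assumes a project is already open.

# Turn on register retiming.
set_global_assignment -name PHYSICAL_SYNTHESIS_REGISTER_RETIMING ON

# Prevent one register from being moved across combinational logic.
# "filter:u_fir|tap_out_reg" is a hypothetical register name.
set_instance_assignment -name ADV_NETLIST_OPT_ALLOWED "NEVER ALLOW" \
    -to "filter:u_fir|tap_out_reg"
```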
In digital circuits, synchronization registers are instantiated on cross clock domain
paths to reduce the possibility of metastability. The Quartus II software detects such
synchronization registers and does not move them, even if register retiming is turned
on.
The following sets of registers are not moved during register retiming:
■
Both registers in a direct connection from input pin-to-register-to-register if both
registers have the same clock and the first register does not fan out to anywhere
else. These registers are considered synchronization registers.
■
Both registers in a direct connection from register-to-register if both registers have
the same clock, the first register does not fan out to anywhere else, and the first
register is fed by another register in a different clock domain (directly or through
combinational logic). These registers are considered synchronization registers.
The Quartus II software assumes that a synchronization register chain consists of two
registers. If your design has synchronization register chains with more than two
registers, you must indicate the number of registers in your synchronization chains so
that they are not affected by register retiming. To do this, perform the following steps:
1. On the Assignments menu, click Settings. The Settings dialog box appears.
2. In the Category list, select Analysis & Synthesis Settings. The Analysis &
Synthesis Settings page appears.
3. Click More Settings. The More Analysis & Synthesis Settings dialog box
appears.
4. In the Name list, select Synchronization Register Chain Length and modify the
setting to match the synchronization register length used in your design. If you set
a value of 1 for the Synchronization Register Chain Length, it means that any
registers connected to the first register in a register-to-register connection can be
moved during retiming. A value of n > 1 means that any registers in a sequence of
length 1, 2,… n are not moved during register retiming.
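For scripted flows, the chain length can probably be set with a global assignment such as the one below. SYNCHRONIZATION_REGISTER_CHAIN_LENGTH is assumed here to be the .qsf variable behind this setting; it is not listed in the tables of this chapter, so confirm the name in the Quartus II Settings File Manual before relying on it. The value 3 is only an example.

```tcl
# Assumes a project is already open.

# Assumed variable name (not taken from this chapter's tables).
# Treat register chains of up to three registers as synchronization
# chains so that register retiming leaves them in place.
set_global_assignment -name SYNCHRONIZATION_REGISTER_CHAIN_LENGTH 3
```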
The Quartus II software does not perform register retiming on logic cells that have the
following properties:
■
Are part of a cascade chain
■
Contain registers that drive asynchronous control signals on another register
■
Contain registers that drive the clock of another register
■
Contain registers that drive a register in another clock domain
■
Contain registers that are driven by a register in another clock domain
Note: The Quartus II software does not usually retime registers across different
clock domains; however, if you use the Classic Timing Analyzer and specify
a global fMAX requirement, the Quartus II software interprets all clocks as
related. Consequently, the Quartus II software might try to retime register-to-register
paths associated with different clocks.
To avoid this circumstance, provide individual fMAX requirements to each
clock when using Classic Timing Analysis. When you constrain each clock
individually, the Quartus II software assumes no relationship between
different clock domains and considers each clock domain to be asynchronous
to other clock domains; hence no register-to-register paths crossing clock
domains are retimed.
When you use the TimeQuest Timing Analyzer, register-to-register paths
across clock domains are never retimed, because the TimeQuest Timing
Analyzer treats all clock domains as asynchronous to each other unless they
are intentionally grouped.
■
Contain registers that are constrained to a single LAB location
■
Contain registers that are connected to SERDES
■
Are considered virtual I/O pins
■
Have the Netlist Optimizations logic option set to Never Allow
For more information about virtual I/O pins, refer to the Analyzing and Optimizing the
Design Floorplan chapter in volume 2 of the Quartus II Handbook.
If you want to consider logic cells that meet any of these conditions for physical
synthesis, you can override these rules by setting the Netlist Optimizations logic
option to Always Allow on a given set of registers.
Preserving Your Physical Synthesis Results
The Quartus II software generates the same results on every compilation for the same
source code and settings on a given system; hence, you do not need to preserve your
results from compilation to compilation. When you make changes to the source code
or to the settings, you usually get the best results by allowing the software to compile
without using previous compilation results or location assignments. In some cases,
you can skip the synthesis stage of the compilation by running the Fitter or another
Quartus II executable directly, without first rerunning Analysis & Synthesis
(quartus_map).
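One way to run only the Fitter from a script is sketched below, using the execute_module command from the flow package. The project name is a hypothetical placeholder, and the sketch assumes that synthesis results from an earlier compilation already exist in the project database.

```tcl
# Run from a Quartus II Tcl shell, for example: quartus_sh -t refit.tcl
# (the script name is hypothetical).
load_package flow

# "my_design" is a hypothetical project name.
project_open my_design

# Run only the Fitter; Analysis & Synthesis (quartus_map) is not rerun.
execute_module -tool fit

project_close
```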
When you use the Quartus II incremental compilation flow, you can preserve
synthesis results for a particular partition of your design by choosing a netlist type of
post-synthesis. If you want to preserve fitting results between compilation runs,
choose a netlist type of post-fit during incremental compilation.
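In a script, the netlist type of a partition is typically set with an assignment along the following lines. PARTITION_NETLIST_TYPE and its values are not listed in this chapter, so treat them as assumptions and check them against the incremental compilation chapter; the partition name is a hypothetical placeholder.

```tcl
# Assumes a project with design partitions is already open.
# "cpu_core_partition" is a hypothetical partition (section) name.

# Preserve fitting results for this partition between compilations.
set_global_assignment -name PARTITION_NETLIST_TYPE POST_FIT \
    -section_id "cpu_core_partition"

# To preserve synthesis results only, POST_SYNTH would be used instead
# of POST_FIT in the assignment above.
```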
The rest of this section is relevant only for those designs using older devices that do
not support incremental compilation.
For information about the incremental compilation design methodology, refer to the
Quartus II Incremental Compilation for Hierarchical and Team-Based Design chapter in
volume 1 of the Quartus II Handbook, and to About Incremental Compilation in
Quartus II Help.
You can preserve the resulting nodes from physical synthesis in older devices that do
not support incremental compilation. You might need to preserve nodes if you use the
LogicLock flow to back-annotate placement, import one design into another, or both.
For all device families that support incremental compilation, use that feature to
preserve results.
To preserve the nodes from Quartus II physical synthesis optimization options for
older devices that do not support incremental compilation (such as MAX II devices),
perform the following steps:
1. On the Assignments menu, click Settings. The Settings dialog box appears.
2. In the Category list, select Compilation Process Settings. The Compilation
Process Settings page appears.
3. Turn on Save a node-level netlist of the entire design into a persistent source
file. This setting is not available for Cyclone III, Stratix III, and newer devices.
4. Click OK.
The Save a node-level netlist of the entire design into a persistent source file option
saves your final results as an atom-based netlist in .vqm file format. By default, the
Quartus II software places the .vqm file in the atom_netlists directory under the
current project directory. To create a different .vqm file using different Quartus II
settings, in the Compilation Process Settings page, change the File name setting.
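The same option and file name can be set from Tcl with the LOGICLOCK_INCREMENTAL_COMPILE_ASSIGNMENT and LOGICLOCK_INCREMENTAL_COMPILE_FILE variables listed in the scripting tables of this chapter. The file name below is a hypothetical example, and a project is assumed to be open.

```tcl
# Assumes a project is already open.

# Save a node-level netlist of the entire design into a persistent
# source file (.vqm).
set_global_assignment -name LOGICLOCK_INCREMENTAL_COMPILE_ASSIGNMENT ON

# Hypothetical output file name; the default location is the
# atom_netlists directory under the project directory.
set_global_assignment -name LOGICLOCK_INCREMENTAL_COMPILE_FILE \
    atom_netlists/my_design_physsyn.vqm
```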
If you use the physical synthesis optimizations and want to lock down the location of
all LEs and other device resources in the design with the Back-Annotate Assignments
command, a .vqm file netlist is required. The .vqm file preserves the changes that
physical synthesis made to your original netlist. Because the physical synthesis
optimizations depend on the placement of the nodes in the design, back-annotating the
placement changes the results from physical synthesis. Changing the results means that
node names are different, and your back-annotated locations are no longer valid.
You should not use a Quartus II-generated .vqm file or back-annotated location
assignments with physical synthesis optimizations unless you have finalized the
design. Making any changes to the design invalidates your physical synthesis results
and back-annotated location assignments. If you require changes later, use the new
source HDL code as your input files, and remove the back-annotated assignments
corresponding to the Quartus II-generated .vqm file.
To back-annotate logic locations for a design that was compiled with physical
synthesis optimizations, first create a .vqm file. When recompiling the design with the
hard logic location assignments, use the new .vqm file as the input source file and
turn off the physical synthesis optimizations for the new compilation.
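The recompilation setup described above might be scripted roughly as follows. The PHYSICAL_SYNTHESIS_* variables come from the scripting tables in this chapter; VQM_FILE is assumed to be the source-file assignment for a .vqm netlist, and the file name is hypothetical. The original HDL source file assignments would also need to be removed from the project, which this sketch does not do.

```tcl
# Assumes a project is already open and a .vqm file was written by a
# previous compilation with physical synthesis turned on.

# Use the saved netlist as the design source (assumed variable name,
# hypothetical file name).
set_global_assignment -name VQM_FILE atom_netlists/my_design_physsyn.vqm

# Turn off the physical synthesis optimizations for the new compilation
# so that node names match the back-annotated locations.
set_global_assignment -name PHYSICAL_SYNTHESIS_COMBO_LOGIC OFF
set_global_assignment -name PHYSICAL_SYNTHESIS_REGISTER_DUPLICATION OFF
set_global_assignment -name PHYSICAL_SYNTHESIS_REGISTER_RETIMING OFF
set_global_assignment -name PHYSICAL_SYNTHESIS_ASYNCHRONOUS_SIGNAL_PIPELINING OFF
```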
If you are importing a .vqm file and back-annotated locations into another project that
has any Netlist Optimizations turned on, you must apply the Never Allow
constraint to make sure node names do not change; otherwise, the back-annotated
location or LogicLock assignments are invalid.
Note: For newer devices, such as the Arria, Cyclone, or Stratix series, use incremental
compilation to preserve compilation results instead of using logic back-annotation.
Physical Synthesis Options for Fitting
The Quartus II software provides physical synthesis optimization options for
improving fitting results. To access these options, perform the following steps:
1. On the Assignments menu, click Settings. The Settings dialog box appears.
2. In the Category list, select Physical Synthesis Optimizations under Compilation
Process Settings. The Physical Synthesis Optimizations page appears.
3. Under Optimize for fitting (physical synthesis for density), there are two physical
synthesis options available to improve fitting your design in the target device:
Physical synthesis for combinational logic and Perform logic to memory
mapping (Table 16–1).
Table 16–1. Physical Synthesis Optimizations Options
Option | Function
Physical Synthesis for Combinational Logic | When you select this option, the Fitter detects duplicate combinational logic and optimizes combinational logic to improve the fit.
Perform Logic to Memory Mapping | When you select this option, the Fitter can remap registers and combinational logic in your design into unused memory blocks to achieve a fit.
For more information about physical synthesis optimization options, refer to Physical
Synthesis Optimizations Page (Settings Dialog Box) in Quartus II Help.
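The two fitting options are not listed in the scripting tables of this chapter. The fragment below assumes that their .qsf variable names are PHYSICAL_SYNTHESIS_COMBO_LOGIC_FOR_AREA and PHYSICAL_SYNTHESIS_MAP_LOGIC_TO_MEMORY_FOR_AREA; verify both names in the Quartus II Settings File Manual before using them.

```tcl
# Assumes a project is already open.
# Both variable names below are assumptions, not taken from this chapter.

# Physical synthesis for combinational logic (optimize for fitting).
set_global_assignment -name PHYSICAL_SYNTHESIS_COMBO_LOGIC_FOR_AREA ON

# Perform logic to memory mapping (optimize for fitting).
set_global_assignment -name PHYSICAL_SYNTHESIS_MAP_LOGIC_TO_MEMORY_FOR_AREA ON
```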
Applying Netlist Optimization Options
The improvement in performance when using netlist optimizations is design
dependent. If you have restructured your design to balance critical path delays, netlist
optimizations might yield minimal improvement in performance. You may have to
experiment with available options to see which combination of settings works best for
a particular design. Refer to the messages in the compilation report to see the
magnitude of improvement with each option, and to help you decide whether you
should turn on a given option or specific effort level.
Turning on more netlist optimization options can result in more changes to the node
names in the design; bear this in mind if you are using a verification flow, such as the
SignalTap II Logic Analyzer or formal verification, that requires fixed or known node
names.
Applying all of the physical synthesis options at the Extra effort level generally
produces the best results for those options, but adds significantly to the compilation
time. You can also use the Physical synthesis effort level options to decrease the
compilation time. The WYSIWYG primitive resynthesis option does not add much
compilation time relative to the overall design compilation time.
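As a sketch, the trade-off between result quality and compilation time could be scripted as shown below. The PHYSICAL_SYNTHESIS_* option variables come from the scripting tables in this chapter; PHYSICAL_SYNTHESIS_EFFORT and its EXTRA value are assumed names for the effort level setting and should be checked in the Quartus II Settings File Manual.

```tcl
# Assumes a project is already open.

# Turn on all performance-oriented physical synthesis options.
set_global_assignment -name PHYSICAL_SYNTHESIS_COMBO_LOGIC ON
set_global_assignment -name PHYSICAL_SYNTHESIS_REGISTER_DUPLICATION ON
set_global_assignment -name PHYSICAL_SYNTHESIS_REGISTER_RETIMING ON
set_global_assignment -name PHYSICAL_SYNTHESIS_ASYNCHRONOUS_SIGNAL_PIPELINING ON

# Assumed variable name and value: request the Extra effort level,
# accepting a longer compilation time.
set_global_assignment -name PHYSICAL_SYNTHESIS_EFFORT EXTRA
```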
To find the best results, you can use the Quartus II Design Space Explorer (DSE) to
apply various sets of netlist optimization options.
For more information about DSE, refer to About Design Space Explorer in Quartus II
Help.
Scripting Support
You can run procedures and make settings described in this chapter in a Tcl script.
You can also run some procedures at a command prompt. For detailed information
about scripting command options, refer to the Quartus II Command-Line and Tcl API
Help browser. To run the Help browser, type the following command at the command
prompt:
quartus_sh --qhelp
For more information about Tcl scripting, refer to the Tcl Scripting chapter in volume 2
of the Quartus II Handbook and API Functions for Tcl in Quartus II Help. Refer to the
Quartus II Settings File Manual for information about all settings and constraints in the
Quartus II software. For more information about command-line scripting, refer to the
Command-Line Scripting chapter in volume 2 of the Quartus II Handbook.
You can specify many of the options described in this section on either an instance or
global level, or both.
Use the following Tcl command to make a global assignment:
set_global_assignment -name <QSF variable name> <value>
Use the following Tcl command to make an instance assignment:
set_instance_assignment -name <QSF variable name> <value> \
-to <instance name>
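For example, a short script that fills in these templates might look like the following. It uses the ADV_NETLIST_OPT_SYNTH_WYSIWYG_REMAP variable, which this chapter lists as supported at both the global and the instance level; the project and instance names are hypothetical placeholders.

```tcl
# Run from a Quartus II Tcl shell, for example with quartus_sh -t.
# "my_design" and "vendor_ip:u_ip" are hypothetical names.
project_open my_design

# Global assignment: turn on WYSIWYG primitive resynthesis for the design.
set_global_assignment -name ADV_NETLIST_OPT_SYNTH_WYSIWYG_REMAP ON

# Instance assignment: turn the same option off for one instance.
set_instance_assignment -name ADV_NETLIST_OPT_SYNTH_WYSIWYG_REMAP OFF \
    -to "vendor_ip:u_ip"

project_close
```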
Synthesis Netlist Optimizations
Table 16–2 lists the Quartus II Settings File (.qsf) variable names and applicable values
for the settings discussed in “WYSIWYG Primitive Resynthesis” on page 16–1. The
.qsf file variable name is used in the Tcl assignment to make the setting along with the
appropriate value. The Type column indicates whether the setting is supported as a
global setting, an instance setting, or both.
Table 16–2. Synthesis Netlist Optimizations and Associated Settings
Setting Name | Quartus II Settings File Variable Name | Values | Type
Perform WYSIWYG Primitive Resynthesis | ADV_NETLIST_OPT_SYNTH_WYSIWYG_REMAP | ON, OFF | Global, Instance
Optimization Technique | <Device Family Name>_OPTIMIZATION_TECHNIQUE | AREA, SPEED, BALANCED | Global, Instance
Power-Up Don’t Care | ALLOW_POWER_UP_DONT_CARE | ON, OFF | Global
Save a node-level netlist into a persistent source file | LOGICLOCK_INCREMENTAL_COMPILE_ASSIGNMENT; LOGICLOCK_INCREMENTAL_COMPILE_FILE | ON, OFF; <file name> | Global
Allow Netlist Optimizations | ADV_NETLIST_OPT_ALLOWED | "ALWAYS ALLOW", DEFAULT, "NEVER ALLOW" | Instance
Physical Synthesis Optimizations
Table 16–3 lists the .qsf file variable name and applicable values for the settings
discussed in “Performing Physical Synthesis Optimizations” on page 16–3. The .qsf
file variable name is used in the Tcl assignment to make the setting, along with the
appropriate value. The Type column indicates whether the setting is supported as a
global setting, an instance setting, or both.
Table 16–3. Physical Synthesis Optimizations and Associated Settings
Setting Name | Quartus II Settings File Variable Name | Values | Type
Physical Synthesis for Combinational Logic | PHYSICAL_SYNTHESIS_COMBO_LOGIC | ON, OFF | Global
Automatic Asynchronous Signal Pipelining | PHYSICAL_SYNTHESIS_ASYNCHRONOUS_SIGNAL_PIPELINING | ON, OFF | Global
Perform Register Duplication | PHYSICAL_SYNTHESIS_REGISTER_DUPLICATION | ON, OFF | Global
Perform Register Retiming | PHYSICAL_SYNTHESIS_REGISTER_RETIMING | ON, OFF | Global
Power-Up Don’t Care | ALLOW_POWER_UP_DONT_CARE | ON, OFF | Global, Instance
Power-Up Level | POWER_UP_LEVEL | HIGH, LOW | Instance
Allow Netlist Optimizations | ADV_NETLIST_OPT_ALLOWED | "ALWAYS ALLOW", DEFAULT, "NEVER ALLOW" | Instance
Save a node-level netlist into a persistent source file | LOGICLOCK_INCREMENTAL_COMPILE_ASSIGNMENT; LOGICLOCK_INCREMENTAL_COMPILE_FILE | ON, OFF; <file name> | Global
Incremental Compilation
For information about scripting and command line usage for incremental compilation
as mentioned in “Preserving Your Physical Synthesis Results” on page 16–10, refer to
the Quartus II Incremental Compilation for Hierarchical and Team-Based Design chapter in
volume 1 of the Quartus II Handbook.
Back-Annotating Assignments
You can use the logiclock_back_annotate Tcl command to back-annotate resources
in your design. This command can back-annotate resources in LogicLock regions, and
resources in designs without LogicLock regions.
For more information about back-annotating assignments, refer to “Preserving Your
Physical Synthesis Results” on page 16–10.
The following Tcl command back-annotates all registers in your design:
logiclock_back_annotate -resource_filter "REGISTER"
The logiclock_back_annotate command is in the backannotate package.
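A minimal script built around this command might look like the following sketch. The project name is a hypothetical placeholder, and the sketch assumes that the design has already been compiled successfully so that placement information is available.

```tcl
# Run from a Quartus II Tcl shell that supports the backannotate package.
# "my_design" is a hypothetical project name.
load_package backannotate

project_open my_design

# Back-annotate the locations of all registers in the design.
logiclock_back_annotate -resource_filter "REGISTER"

project_close
```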
Conclusion
Physical synthesis optimizations restructure and optimize your design netlist. You
can take advantage of these Quartus II netlist optimizations to help improve your
quality of results.
Document Revision History
Table 16–4 shows the revision history for this chapter.
Table 16–4. Document Revision History
Date | Version | Changes
December 2010 | 10.0.1 | Template update.
July 2010 | 10.0.0 | Added links to Quartus II Help in several sections; removed Referenced Documents section; reformatted Document Revision History.
November 2009 | 9.1.0 | Added information to “Physical Synthesis for Registers—Register Retiming”; added information to “Applying Netlist Optimization Options”; made minor editorial updates.
March 2009 | 9.0.0 | Was chapter 11 in the 8.1.0 release; updated “Physical Synthesis for Registers—Register Retiming” and “Physical Synthesis Options for Fitting”; updated “Performing Physical Synthesis Optimizations”; deleted Gate-Level Register Retiming section; updated the referenced documents.
November 2008 | 8.1.0 | Changed to 8½” × 11” page size. No change to content.
May 2008 | 8.0.0 | Updated “Physical Synthesis Optimizations for Performance” on page 11-9; added “Physical Synthesis Options for Fitting” on page 11-16.
For previous versions of the Quartus II Handbook, refer to the Quartus II Handbook
Archive.