UM1237
User manual
STxP70 compiler
Overview
The purpose of the STxP70 compilation driver (stxp70cc) is to manage the stages of the compilation process: preprocessing, compiling into assembly language, assembling and linking. The assembler file is assembled using stxp70-as and linked using stxp70-ld to provide an STxP70 binary image. All these phases are hidden behind the driver tool stxp70cc.
This user manual provides detailed information to enable users to write efficient code optimized to run on the STxP70 processors and to compile and link it ready for execution by sxrun. The manual covers:
• stxp70cc driver options
• pragmas supported by stxp70cc
• compiler optimization techniques
• GNU C language extensions
• GNU asm construct
• built-in functions
The load/run tool sxrun and the STxP70 debugger sxgdb are described in the STxP70
Professional Toolset user manual (7833754).
May 2013 8027948 Rev 15 1/166
www.st.com
Contents
Overview
Preface
1 STxP70 development system
Toolset software requirements
Example command-lines
Compiling code for STxP70-3 or STxP70-4
2 stxp70cc
C preprocessor options
Debugging options
Code coverage options
Call trace instrumentation options
Code generation options
Position independent code generation (PIC)
Sending options to a specific phase
Directory and library options
Environment variables
3 Pragmas
Pragmas short description and syntax
Loop optimization pragmas
#pragma loopmod
#pragma loopmin<itercount> (minc)
#pragma loopmax<itercount> (maxc)
Code generation pragmas
Miscellaneous pragmas
#pragma disable_extgen ( fct1, fct2, ... )
#pragma force_extgen ( fct1, fct2, ... )
#pragma disable_specific_extgen ( extname[, fct1, fct2, ... ] )
#pragma force_specific_extgen ( extname[, fct1, fct2, ... ] )
4 Optimization guide
stxp70cc inlining options
Advanced control of the unroller
Built-in assume and pragma loopmod
Memory dependences in C programs
Aliasing rules in C/C++ programs
Profiling data generation
Special case of programs that never exit
Amount of heap required for profiling
Instrumenting functions: -finstrument-functions
Instrumenting calls to functions: -minstrument-calls
Interprocedural analysis optimization (IPA)
IPA command line options
Limitations and special cautions
Floating-point code generation
Precision of floating-point arithmetic in programs
Controlling the precision of floating-point
Use of STxP70 with FPx
Examples of floating-point arithmetic on the STxP70
General description and purpose
Description and syntax of an ACF
ACF grammar description
Behavior of -macf-template option
Scope and known limitations
5 GNU C extensions supported by stxp70cc
Extensions to the C language family
Statements and declarations in expressions
Locally declared labels
Naming an expression's type
Referring to a type with typeof
Generalized Lvalues
Conditionals with omitted operands
Specifying a register for a local variable
Macro with variable number of arguments
Strings literals with embedded newlines
Non-Lvalue arrays may have subscripts
Arithmetic on void and function pointers
Dollar signs in identifier names
Prototypes and old-style function definitions
C++ comments
Character ESC in constants
Inquiring on alignment of types or variables
Incomplete enum type
Function names as strings
Placement and layout
Miscellaneous attributes
6 GNU ASM
Differences between the STxP70 core versions
GNU ASM optimization
Parsing and optimization of GNU assembly statement
7 Built-in functions
Header files and C-models files
General naming scheme, relevant files
Types and special built-ins for audio scalar/SIMD extensions
Using built-ins on an STxP70 platform
Standard use of built-in C-models
Use of built-in C-models on STxP70 target
8 MPx native support
Goal of the MPx scalar support
Control of the MPx native support
Function pragmas
Scope of the MPx native support
Built-in based support with MPx_Vx type
Support of type equivalence between long long and MPx_Vx
Automatic MPx code generation on long long arithmetic
Pattern recognition for integer and fractional data types
Automatic code generation
Operations mapped to single MPx instructions
Operations mapped to meta-instructions
Important remarks and known limitations
Avoid mixing MPx and long long
Long long passed as function parameters
Long long life span crossing function call
Efficiency of code in meta-instructions
Mapping exact conversions and single statement expressions
Limitations regarding mapping of fractional instructions
Unsupported mapping
Direct mapping of long long arithmetic
Meta-instruction, case of a long long max
Case of the 32-bit multiplication
9 Relocatable loader library
Introduction to dynamic linking
Position-independent code
The dynamic loader
Introduction to the relocatable loader library
Run-time model overview
Relocatable run-time model
Relocatable loader library API
Building a relocatable library or main module
Importing and exporting symbols
Optimization options
Memory protection support
STxP70 targeting of RL_LIB
10 Compiler bugs
Identifying a compiler bug
Checks performed by user
Reporting a compiler bug
Known bugs and limitations
11 Revision history
Preface
This document is part of the documentation suite detailed below. Comments on this or other manuals in the documentation suite should be made by contacting your local
STMicroelectronics sales office or distributor.
Documentation suite
STxP70 compiler user manual (8027948)
This manual describes the C compiler for STMicroelectronics STxP70 cores.
STxP70 Professional Toolset user manual (7833754)
This document explains the toolset architecture and provides information about how to develop and debug applications running on STxP70 systems.
Advanced debugging with the STxP70-4 instruction-accurate simulator (Doc ID 024404)
This document describes the commands implemented in the instruction-accurate simulator for debugging applications.
STxP70 utilities reference manual (8210925)
This document provides, in a single volume, a command line reference for each of the generic, STxP70-3 and STxP70-4 utilities provided with the STxP70 toolset that are not documented elsewhere. For each utility, the manual provides a command line synopsis, a brief description of the utility, the complete list of available options, and its return value.
Building STxP70 libraries application note (8226669)
This document explains how to produce a set of standard libraries for the STxP70 compilation tools optimized for the user’s specific purposes.
Conventions used in this guide
General notation
The notation in this document uses the following conventions:
• sample code, keyboard input and file names,
• variables, code variables and code comments,
• equations and math,
• screens, windows, dialog boxes and tool names,
• instructions.
Software notation
Syntax definitions are presented in a modified Backus-Naur Form (BNF).
• Terminal strings of the language, that is those not built up by rules of the language, are printed in teletype font. For example, void.
• Non-terminal strings of the language, that is those built up by rules of the language, are printed in italic teletype font. For example, name.
• If a non-terminal string of the language starts with a non-italicized part, it is equivalent to the same non-terminal string without that non-italicized part. For example, vspace-name.
• Each phrase definition is built up using a double colon and an equals sign to separate the two sides (‘::=’).
• Alternatives are separated by vertical bars (‘|’).
• Optional sequences are enclosed in square brackets (‘[’ and ‘]’).
• Items which may be repeated appear in braces (‘{’ and ‘}’).
1 STxP70 development system
The purpose of the stxp70cc compilation driver is to translate a program written in the C language into STxP70 assembly language so that it is suitable for assembly, linking, and execution. The assembler file is assembled using stxp70-as and linked using stxp70-ld (a) to provide an STxP70 binary image. All these phases are hidden behind the driver tool stxp70cc.
The stxp70cc compilation driver and core compiler are common to both STxP70 versions 3 and 4. A specific command line and GUI option can be used to generate code for either target. See Section 1.2.2: Compiling code for STxP70-3 or STxP70-4 on page 14.
The stxp70cc compiler uses the GNU C language parser, and implements state-of-the-art compiler optimizations. Thanks to this GNU C language parser, the stxp70cc compiler is closely compatible with the GNU C compiler, both at the driver level and on C language extensions (GNU Compiler Collection project; see http://www.gnu.org/software/gcc/gcc.html). The processor-independent compiler optimizations available in the stxp70cc compiler are mostly inherited from the Open64 project hosted on SourceForge; see http://open64.sourceforge.net. Other compiler optimizations that are specific to the STxP70 family of processors have been developed by STMicroelectronics.
These include:
• use of hardware loop mechanisms of the STxP70 core (hardware loops and JRGTUDEC instructions)
• use of the special addressing modes of the STxP70 core
• use of the memory space defined in the STxP70 ABI in order to increase the efficiency of memory accesses
• aggressive instruction selection, including mapping of the user boolean variables to the branch registers
• instruction scheduling
• aggressive transformation of loops
• compiler intrinsics and built-ins support
• compiler support for the X3, FPX and MPx extensions
The binary image can be executed on an STxP70 hardware target or by using the sxrun simulator or the sxgdb debugger. The binary format used for the image is ELF and the debug format is DWARF2.
Where applicable, the available options are accessible through a command-line interface similar to the UNIX style. This will be familiar to most gcc and cc users. The toolset is installed in a directory structure which also follows the UNIX structure, that is bin and lib.
Wherever possible, compatibility with the options of the former sxcc compiler has been preserved.
The compiler supports the ANSI C89 standard and partially supports the ANSI C99 standard; see Section 2.4: C99 support on page 41.
a. For usage information see the GNU linker document “Using ld” that is supplied with the toolset.
The STxP70 Professional Toolset is a set of tools that allow C programs compiled for an STxP70 target to be simulated on a host workstation or executed on an STxP70 target.
The STxP70 Professional Toolset is mainly intended for tool developers, for operating system development and for applications that require modeling interrupts and real-time behavior. It includes the whole set of tools that manipulate STxP70 object files, including the STxP70 assembler, compiler, linker, load/run tool, debugger and archiver. Here, STxP70 assembler files are translated to STxP70 object files that the linker merges to produce an STxP70 executable image. This image file does not run natively on the host workstation and requires an interpreter to be executed. See Section 1.2.1: Example command-lines on page 13.
Figure 1 shows the main components of the STxP70 Professional Toolset (when IPA is not used).
Figure 1. Components of the STxP70 Professional Toolset interfaces
[Figure: .c source files, STxP70 C Compiler, STxP70 assembler files (.s), STxP70 assembler (stxp70-as), STxP70 object files, STxP70 linker (stxp70-ld), STxP70 binary (.elf), STxP70 load/run tool (sxrun), STxP70 debugger (sxgdb). Other inputs shown: STxP70 libraries, target board boot and sysconf files.]
1.2 Toolset software requirements
The stxp70cc compiler produces an STxP70 object file in the STxP70 object file format (ELF). See the relevant chapter in the STxP70 ABI manual (7937486) for details.
1.2.1 Example command-lines
Assuming that we want to compile two files, file1.c and file2.c, into an STxP70 executable a.elf, the set of commands to issue is:
$[1] stxp70cc -c file1.c
$[2] stxp70cc -c file2.c
$[3] stxp70cc -o a.elf file1.o file2.o
This assumes that the user has sourced the appropriate shell file in the <tools-dir>/bin folder. In most cases, the one needed is STxP70.csh. This ensures that all needed configuration environment variables are properly set.
Command [1] causes the following steps to be executed:
<tools-dir>/bin/stxp70cc # stxp70cc driver
<tools-dir>/lib/cpp <cpp_flags> file1.c file1.i # C preprocessor
<tools-dir>/lib/cmplrs/<C compiler> <C Compiler flags> file1.i file1.s # C compiler
<tools-dir>/bin/stxp70-as <stxp70-as_flags> file1.s file1.o # STxP70 Assembler
Command [2] causes the following steps to be executed:
<tools-dir>/bin/stxp70cc # stxp70cc driver
<tools-dir>/lib/cpp <cpp_flags> file2.c file2.i # C preprocessor
<tools-dir>/lib/cmplrs/<C compiler> <C Compiler flags> file2.i file2.s # C compiler
<tools-dir>/bin/stxp70-as <stxp70-as_flags> file2.s file2.o # STxP70 Assembler
Command [3] causes the link stage to be executed. Refer to the STxP70 linker user manual for further details.
Once steps [1] to [3] are completed, an STxP70 executable binary a.elf is generated. This can be executed using the stand-alone driver for the load/run tool (available as sxrun) in the following way:
$[4] sxrun a.elf
This causes the a.elf STxP70 binary to be “interpreted” by the sxrun command. The simulator also provides some minimal tracing, cycle counting and statistics facilities.
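The same sequence extends to any number of source files. The following is a minimal sketch, not part of the toolset: a hypothetical shell function, build_cmds, that only prints the stxp70cc and sxrun commands for a list of C files so that they can be reviewed before being run. The function name and the print-only approach are assumptions made for illustration, and the sketch assumes that all sources are in the current directory.

```shell
# Hypothetical helper (not part of the STxP70 toolset): print, without
# executing, the commands needed to build <output> from a list of .c files.
# Assumes the sources are in the current directory.
# Usage: build_cmds <output> <file.c>...
build_cmds() {
    out=$1
    shift
    objs=""
    for src in "$@"; do
        # One compile-only step per source file, as in commands [1] and [2]
        echo "stxp70cc -c $src"
        objs="$objs ${src%.c}.o"
    done
    # Link step, as in command [3]
    echo "stxp70cc -o $out$objs"
    # Run step on the load/run tool, as in command [4]
    echo "sxrun $out"
}

build_cmds a.elf file1.c file2.c
```

Called with a.elf, file1.c and file2.c, the function prints commands [1] to [4] in order, without the $[n] labels.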
1.2.2 Compiling code for STxP70-3 or STxP70-4
By default, the code is compiled for STxP70-3. However, the -mcore command line option can be used to compile code for STxP70-4. In the example below, lines [1] and [2] generate a version 3 executable and line [3] generates an executable for version 4:
$[1] stxp70cc file1.c
$[2] stxp70cc -mcore=stxp70v3 file2.c
$[3] stxp70cc -mcore=stxp70v4 file3.c
Except for a few instructions, the STxP70-3 and STxP70-4 are assembly compatible. They are not binary compatible. More details are provided in the following sections.
Warning: The assembly code examples in this document use the STxP70-3 assembly syntax. On STxP70-4, it is possible to form bundles of one or two instructions. Two successive bundles must be separated by a “;;” pattern. Two successive lines not separated by a “;;” are considered a single bundle, meaning that the two instructions are issued in the same cycle.
2 stxp70cc
The stxp70cc compiler is similar to any command-line compiler. It is invoked either from a command line interpreter or from a Makefile, and implicitly recognizes files by their extension.
2.1 Invoking the compiler
The C compiler is invoked using the stxp70cc command:
stxp70cc {<argument>}
where:
<argument> = <option> | <input_file>
Examples:
stxp70cc -S file.c # produces file.s
stxp70cc -c file.c # produces file.o
Conflicting options are resolved by using the last option on the command line.
2.1.1 Input and output
File extension naming conventions are summarized in Table 1 and Table 2.
Table 1. Input names conventions
Extension   Convention
.c          C language source file to be pre-processed and compiled
.h          C language header file
.i          C language source file already pre-processed
.s          Assembly language source file to be assembled
.S          Assembly language source file to be pre-processed and assembled
Table 2. Output names conventions
Extension   Convention                      Produced by option(s)
.s          Assembly language output file   -S
.o          Object file                     -c
The final executable file does not need to have a specific file extension. If no output file name is specified through the -o option, the executable generated is named a.out.
Examples:
stxp70cc file.c # generates the executable a.out
stxp70cc file.c -o file.u # generates the executable file.u
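The naming rules above can be summarized in a few lines of shell. The following is a minimal sketch under stated assumptions: default_output is a hypothetical function, not a toolset command, and it only reproduces the conventions given in this section (the .s and .o extensions for -S and -c, and a.out when no -o option is given).

```shell
# Hypothetical helper: report the default output name implied by the
# naming conventions of this section for one input file and one option.
# Usage: default_output <input_file> [-S|-c]
default_output() {
    stem=${1%.*}                # drop the input extension (.c, .i, .s, ...)
    case "${2:-}" in
        -S) echo "$stem.s" ;;   # assembly language output file
        -c) echo "$stem.o" ;;   # object file
        "") echo "a.out" ;;     # executable, when -o is not given
        *)  echo "unknown option: $2" >&2; return 1 ;;
    esac
}

default_output file.c -S
default_output file.c -c
default_output file.c
```

The three calls print file.s, file.o and a.out respectively. An explicit -o option overrides all of these names.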
This section provides information on the command line options of stxp70cc.
If the compiler driver is given the -help option, it displays the list of available options, and then terminates.
The -help option can also be followed by a keyword, separated from -help by a colon. All entries matching the keyword are displayed on the standard output, for example:
stxp70cc -help:W
This command displays all options containing the -W string. In this example, all options related to the emission of compiler warnings are listed.
The overall options listed in Table 3 control the type of processing performed by stxp70cc and the output it generates, for example: an executable, an object file, an assembler file, a pre-processed file, an archive or a dependency list.
Output files produced by these options default to <original_file_name>.<output_extension> and can be renamed using the -o option.
Table 3. Overall options
Option         Description
-c             Compile or assemble the source file, but do not link.
-S             Stop after the compilation phase.
-E             Stop after the preprocessing phase. Output is sent to stdout.
-v             Print on stderr the commands executed to run the compilation phases. The message generated indicates the release identity.
--version      Display the version numbers of the invoked compiler and stop.
-dumpversion   Print the compiler front-end version (for example, 3.3.3) and stop.
-keep          Keep intermediary files produced by the compilation phases in the current folder.
-keep_dir      Used in combination with -keep or -Mkeepasm, this option specifies the location to be used to store intermediate files.
2.2.3 stxp70cc core selection option
The STxP70 tools delivered in the STxP70 toolset R4.0.0 and higher support both STxP70-3 and STxP70-4. The STxP70-4 differs from the STxP70-3 in three ways:
• it implements a variable length encoding of the instruction set (VLIS)
• it implements dual issue
• it supports dual arithmetic and logic unit (ALU) configuration
The -mcore option must be used to select the version of the core. By default, the code is compiled for STxP70-3. The STxP70-3 and STxP70-4 are assembly compatible, except for a few instructions.
In the examples below, lines [1] and [2] generate a version 3 executable and line [3] generates an executable for version 4:
$[1] stxp70cc file1.c
$[2] stxp70cc -mcore=stxp70v3 file2.c
$[3] stxp70cc -mcore=stxp70v4 file3.c
Table 4. The core selection -mcore option
Option     Description
stxp70v3   Assembly, object and binary files are generated for single issue, fixed length encoding STxP70-3
stxp70v4   Assembly, object and binary files are generated for single/dual issue, variable length encoding STxP70-4
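In a build script, the core version is often held in a variable. The following is a minimal sketch, assuming a hypothetical function core_flag that maps a version number to one of the two -mcore values of Table 4; the function is illustrative only and is not provided by the toolset.

```shell
# Hypothetical helper: map a core version number to the -mcore option.
# Usage: core_flag <3|4>
core_flag() {
    case "$1" in
        3) echo "-mcore=stxp70v3" ;;   # single issue, fixed length encoding
        4) echo "-mcore=stxp70v4" ;;   # single/dual issue, variable length encoding
        *) echo "unsupported core version: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the command line for a version 4 build
echo "stxp70cc $(core_flag 4) file3.c"
```

The last line prints the same command as line [3] above. Rejecting any other version number avoids silently falling back to the STxP70-3 default.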
Note: The set of options that must or can be set is strongly dependent on the core selected. This is especially true for the configuration and code generation options presented in the tables of the next section. In particular, the STxP70-4 can be configured for single or dual issue, as well as single or dual ALU. Each of those choices corresponds to specific compiler options.
2.2.4 stxp70cc compiler generic options
Prefixes of generic options
The generic options listed in Table 5 provide a generic means of passing fine-grained options to a given phase of the compiler.
Table 5. Generic options
Option             Description
-Msxflag           stxp70cc interprets the -Msxflag option as an extra code generation or environment option. The list of possible sxflags is given in Table 6. Note that, due to the GNU front-end, the -M prefix is also used for dependency handling options.
-W<phase>,<arg>    This option is used to pass arguments to a specific phase. The phase names are p, f, b, a, l for pre-processor, front-end, back-end, assembler and linker respectively.
-Y<phase>,<path>   This option is used to change the path to one of the phases. The phase names are p, f, b, a, l, I, S, L for pre-processor, front-end, back-end, assembler, linker, include, startup, libraries respectively.
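Because -W accepts only the phase letters p, f, b, a and l, a script can check the letter before building the option. The following is a minimal sketch, assuming a hypothetical function phase_opt; ARG is a placeholder for a real argument of the target phase and is not an actual linker or assembler flag.

```shell
# Hypothetical helper: build a -W<phase>,<arg> option, accepting only the
# phase letters documented for -W (p, f, b, a, l).
# Usage: phase_opt <phase> <arg>
phase_opt() {
    case "$1" in
        p|f|b|a|l) echo "-W$1,$2" ;;
        *) echo "invalid phase for -W: $1" >&2; return 1 ;;
    esac
}

# ARG is a placeholder, not a real linker argument
phase_opt l ARG
```

The call prints -Wl,ARG. The letters I, S and L are rejected here because they are documented for -Y only.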
Code generation/configuration and environment options with -M prefix
Table 6 lists the options that can be used with the -M flag. These options have a special status, as they ensure backward compatibility with the sxcc compiler. Due to the differences in compiler internals, some options have been adapted or removed.
The options that accept further controls are described in the following pages.
Several of the options are able to place certain data items into specific areas of memory called special data areas. See Section 5.2.1: Placement and layout on page 101 for information about the special data areas.
The options in Table 6 correspond to code generation and environment options that can be set using the generic -M flag.
Table 6. Generic options with -M flag
Option
    Description
config[=context:<n>|regbank:<n>|mult:<n>|bypass:<n>|bhb:<n>|efuif:<n>|mfuif:<n>|extmemif:<n>|itcnodes:<n>|noevc|evcglobal:<n>|evclocal:<n>|hwloop:<n>|dcache:<n>|dmsize:<n>|pcache:<n>|pmsize:<n>|pixel:<n>|pixelsize:<n>|rompatch:<n>|maxszmis:<n>|minadmis:<n>|vliw:<n>]
    Defines the processor configuration. Further information on these suboptions is given in Code generation and configuration controls on page 19. The assembler performs some consistency checking based on this configuration option.
    The last option (vliw) is only available on STxP70-4.
    It is possible to combine several suboptions in a single -Mconfig option bundle. In this case, suboptions must be separated by a “,”. For instance: -Mconfig=vliw:no,noevc
da[={<n>|all}]
    Places certain data items in the data area (GP-based on 32 Kbytes)
sda[={<n>|all}]
    Places certain data items in the small data area (GP-based on 4 Kobjects)
tda[={<n>|all}]
    Places certain data items in the low memory or tiny data area (32-Kbyte size)
enablefractgen
    Deprecated, and replaced by extoption. Allows the compiler to generate fractional instructions of the MPx. Refer to Chapter 8: MPx native support on page 122
extension[=fpx|MP1x][:novliw|single|dual]
    Connects extension (MP1x, fpx), using the specified VLIW configuration. When STxP70-3 is used, only the novliw suboption can be specified.
    MP1x has been supported through intrinsics and specific types since compiler version 3.2.0. Version 3.3.0 introduces so-called “native support”, which provides automatic code generation from pure C. Refer to Chapter 8: MPx native support on page 122
extoption=extension:option
    Pass a given option to the extension. Refer to Chapter 8: MPx native support on page 122
extrcdir=directory_path
    Use the stxp70extrc user-defined extension definition file from the directory found on directory_path.
farcall
    Specifies that all calls and jumps are far (with absolute addresses).
got[=small|standard|large]
    Select the global offset table (GOT) model for position independent code and data (PIC and PID) generation. See Chapter 9: Relocatable loader library on page 136
hwloop[=option]
    Controls use of the hardware loops feature. Further information can be found in Code generation and configuration controls on page 19
itstackalign=<n>
    This option instructs the compiler to align the stack of interruption routines to the specified boundary, as a number of bytes. Default is 8 (that is, 64 bits).
keepasm
    Preserve intermediate files. Files are located in the local folder by default. The -keep_dir option can be used in combination to specify a different folder where intermediate files must be stored.
mode16
    Instructs the compiler to use the 16 register set (instead of the default 32 register set). Notice that, contrary to the -Mconfig option, this option is not a configuration option. No checking is made at assembler level regarding the register indices.
lib16
    Instructs the compiler to link in a version of the C library that uses the 16 register set.
lib32
    Instructs the compiler to link in a version of the C library that uses the 32 register set.
nostartup
    Instructs the linker not to use the standard boot sequence but one provided by the user.
The control for the options listed in
Code generation and configuration controls
and
Environment controls on page 25
Code generation and configuration controls
The code generation and configuration controls are listed below.
-Mconfig[=context:<n>|regbank:<n>|mult:<n>|efuif:<n>|mfuif:<n>|
extmemif:<n>|itcnodes:<n>|noevc|evcglobal:<n>|evclocal:<n>|
hwloop:<n>|dcache:<n>|dmsize:<n>|pcache:<n>|pmsize:<n>|pixel:<n>|
pixelsize:<n>|rompatch:<n>|maxszmis:<n>|minadmis:<n>|vliw:<n>]
Use -Mconfig to specify the configuration of the STxP70 core IP. The subflags to this option are listed in Table 7.
Table 7.
Subflags allowed in the -Mconfig option
Subflag Description
context:<n>
Defines context number. Where n can be: 1 | 2 | 4 | 8.
regbank:<n>
Defines register bank number. Where n can be: 1 | 2.
mult:<n>
Defines multiplier implementation. Where n can be: yes | no. Note that using the FPX enables the multiplier as well.
bypass:<n>
Defines memory bypass configuration. Where n can be: no | mem2_exe. (mem2_exe indicates that a bypass is implemented between the memory2 and execution stages of the pipeline. When a bypass is present, the load-use penalty is one cycle instead of the two cycles incurred when the bypass is not implemented.)
bhb:<n>
Defines branch history buffer configuration. Where n can be: yes | no.
efuif:<n>
Defines extension functional unit interface width. Where n can be: no | 32 | 64 | 128 | 256 | 512.
mfuif:<n>
Defines MFU interface width. Where n can be: no | 32 | 64 | 128 | 256 | 512.
extmemif:<n>
Defines external memory interface width. Where n can be: no | 32 | 64.
itcnodes:<n>
Defines ITC number of nodes. Where n can be: no | 8 | 16 | 32.
noevc
No EVC implementation.
evcglobal:<n>
Defines EVC number of global events. Where n can be: 4 | 8 | 16 | 32.
evclocal:<n>
Defines EVC number of local events. Where n can be: 4 | 8 | 16 | 32.
hwloop:<n>
Defines hardware loop implementation. Where n can be: no | bycxt | forall.
dmsize:<n>
Defines data memory size. Where n can be: no | 512 | 1k | 2k | 4k | 8k | 16k | 32k | 64k | 128k | 256k | 512k | 1M | 2M | 4M.
dcache:<n>
Defines data cache implementation. Where n can be: yes | no.
pmsize:<n>
Defines program memory size. Where n can be: no | 512 | 1k | 2k | 4k | 8k | 16k | 32k | 64k | 128k | 256k | 512k | 1M | 2M | 4M.
pcache:<n>
Defines program cache implementation. Where n can be: yes | no.
pixel:<n>
Defines the pixel mode implementation. Where n can be: yes | no.
pixelsize:<n>
Defines the pixel data size. Where n can be: 8 | 10 | 12 | 14.
rompatch:<n>
Defines the ROM patch controller implementation. Where n can be: yes | no.
maxszmis:<n>
Defines the size of the largest memory access supporting misalignment. Where n can be: no | 2 | 4 | 8 | 16 | 32 | 64.
minadmis:<n>
Defines the minimal address alignment at which misaligned memory accesses are supported. Where n can be: no | 2 | 4 | 8 | 16 | 32.
vliw:<n>
This STxP70-4 specific option indicates the number of issues and ALUs available on the core. The value of n can be: no | singlecoreALU | dualcoreALU. These values must be interpreted as follows:
– no: the core is single issue, single ALU,
– singlecoreALU: the core is dual issue, single ALU,
– dualcoreALU: the core is dual issue, dual ALU.
If the vliw option is not set and code is compiled for STxP70-4, then the default behavior corresponds to -Mconfig=vliw:no.
By default, -Mconfig enables four contexts, two register banks, multiplier, no memory bypass, no branch history buffer (BHB), 32-bit EFU interface, 32-bit MFU interface,
32-bit external memory interface, eight ITC nodes, EVC with 16 global and 16 local events, two hardware loops for all contexts, 4 Mbytes data memory, no data cache,
4 Mbytes program memory, no pcache, no pixel support, no ROM patch support, no misaligned memory access and single issue architecture.
-Mda[={ <n> | all }]
Place data objects of aggregate alignment <= n bytes in the region of memory called the medium data area (DA). It is possible to generate optimized (that is, shorter) addresses for data in the medium data area. (GP-based addugp is used instead of
make and more.)
The parameter n can be one of (1, 2, 4, 8). Specifying all eliminates the size constraint. -Mda is equivalent to -Mda=all.
Note that -Mda options are ignored if IPA memory placement is enabled. Refer to Section 4.8: Interprocedural analysis optimization (IPA) on page 76 for further details.
-Msda[={ <n> | all }]
Place data objects of aggregate alignment <= n bytes in the region of memory called the small data area (SDA). It is possible to generate optimized (that is, shorter) addresses for data in the small data area. (GP-based addressing mode can be used, thus constructing the address and performing the access itself in the same instruction.)
The parameter n can be one of (1, 2, 4, 8). Specifying all eliminates the size constraint. -Msda is equivalent to -Msda=all.
In the case of a structure that contains fields of different types, the decision of where to place the variable depends on the alignment of the largest data type, whereas the choice of the section to be used depends on the size of the smallest field. This means that a structure with both int and char fields is placed in the SDA if the option is either -Msda=all or -Msda=4. If placement is achieved, then the structure is placed in SDA1.
Note that -Msda options are ignored if IPA memory placement is enabled. Please refer to Section 4.8: Interprocedural analysis optimization (IPA) on page 76 for further details.
-Mtda[={ <n> | all }]
Place data objects of aggregate alignment <= n bytes in the region of memory called the low memory data area (TDA). It is possible to generate optimized (that is, shorter) addresses for data in the low memory area. Addresses in the TDA area are encoded using a maximum of 15 bits and therefore may be constructed using a single make instruction.
The parameter n can be one of (1, 2, 4, 8). Specifying all eliminates the size constraint. -Mtda is equivalent to -Mtda=all.
-Mdarange=[minSize],maxSize
Use data area (DA) addressing mode on selected variables with a size between minSize and maxSize bytes.
-Msdarange=[minSize],maxSize
Use small data area (SDA) addressing mode on selected variables with a size between minSize and maxSize bytes.
-Mtdarange=[minSize],maxSize
Use tiny data area (TDA) addressing mode on selected variables with a size between minSize and maxSize bytes.
-Menablefractgen
Enables generation of the fractional instructions when MP1x is present. This option was formerly named -Mfractsupport. These two options are now deprecated, and replaced by the suboption -Mextoption. Refer to Chapter 8: MPx native support on page 122 for further details on this option.
-Mextension[=fpx|MP1x][:novliw|single|dual]
Only the X3 extension is connected by default. (This means that the corresponding option x3 is no longer available.)
Connect extension fpx to the core to enable floating point arithmetic. Activating fpx allows the compiler to generate floating point extension specific instructions, which include native floating point (32-bit) arithmetic instructions and some integer instructions (such as multiply and divide) that complete the core integer support.
MP1x has been supported in the compiler since version 3.2.0 using built-in functions and specific data types. Version 3.3.0 introduces the so-called “native support” of the MPx extension. This means that the compiler can generate code that makes use of MPx registers and instructions from pure C code (that is, even if no MPx built-in functions and types are present). More details can be found in Chapter 8: MPx native support on page 122.
The VLIW configuration can be specified for the extension. For an extension on STxP70-3, only the novliw configuration can be used.
-Mextoption
Used to pass different options to the extensions. Refer to Chapter 8: MPx native support on page 122 for further details on this option.
-Mextrcdir=directory_path
Specifies where to find a particular extension package, which may be a location outside the user workspace. The -Mextrcdir option enables the user to switch between different extensions, stored in different locations. Full directory paths are recommended but are not mandatory.
The directory path specified to -Mextrcdir must include the subdirectory _STxP70-Extension_ where the stxp70extrc file is located. (This is the directory/file structure used by sximport when the extension is imported. sximport creates or updates an extension configuration file called stxp70extrc and puts it in the subdirectory _STxP70-Extension_. stxp70extrc indicates where the different files relating to the extension, for example header files and libraries, are located.)
For example:
stxp70cc -Mextension=MP1x -Mextrcdir=My_Extrcdir/_STxP70-Extension_
This command sets the directory path to find the extension package in My_Extrcdir/_STxP70-Extension_.
The compiler checks that the location specified by -Mextrcdir contains the file stxp70extrc.
If the -Mextrcdir option is not specified, the {SX}/sxext/_STxP70-Extension_ directory is used by default.
The STxP70 Utilities manual (8210925) documents several utilities that interact with the extension package, for example sximport, stxp70-elfdump, stxp70objcopy.
The STxP70 User-defined extension methodology guide (8175272), “How to integrate an Extension in an application” chapter, gives further information about extension libraries.
-Mfarcall
Specify that all calls are far. The compiler generates a calling sequence composed of a make/more/calla sequence instead of callr.
-Mhwloop[=option]
Controls hardware loop code generation. The default (-Mhwloop specified with no suboptions) is equivalent to:
• -Mhwloop=all if the core configuration includes hardware loops
• -Mhwloop=jrgtudeconly if the core configuration does not include hardware loops
option can be any of the values listed in Table 8.
Table 8.
List of options for -Mhwloop
Option Description
none
Disables hardware loop and special jump code generation. By hardware loop, we mean setle/ls/lc structures; by special jump, we mean jrgtudec special jumps. However, hardware loops forced by means of pragmas are still generated if supported by the core configuration.
jrgtudeconly
Disables setle/ls/lc hardware loop code generation. However, hardware loops forced by means of pragmas are still generated if supported by the core configuration.
hwlooponly
Disables jrgtudec special jump code generation. A warning is generated if the core configuration does not have hardware loops.
all
Enables hardware loops for all loops wherever possible. A warning is generated if the core configuration does not have hardware loops.
Hardware loops are discarded at -O0 and -O1.
-Mitstackalign=<n>
By default, the stack of interruption routines (IT) is aligned to an 8-byte/64-bit boundary.
As a consequence, extra instructions are added to IT prolog and epilog to handle this realignment. Since IT are often speed-critical parts of code, this may be a severe drawback.
This option instructs the compiler to align the stack of IT to a smaller boundary
(typically: 4 bytes/32 bits) to avoid the overhead in prolog and epilog of those routines.
Several methods are provided for controlling the alignment of the stack. For interruption routines, the precedence is as follows, in decreasing order:
– aligned_stack attribute, which specifies the alignment of the stack of a given interruption routine
– interrupt_nostackalign attribute, which indicates that the stack of a given interruption routine is to be aligned on a 4-byte/32-bit boundary
– -Mitstackalign option
– default (8 bytes/64 bits)
For any other function (not an interruption routine), the precedence is as follows, in decreasing order:
– aligned_stack attribute
– default (8 bytes/64 bits)
See Section 5.2: Attributes on page 101 for the attributes which control the alignment of the stack of functions and interruption routines.
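The precedence rules above can be modelled as a small decision function. This is only an illustrative sketch: the structure, its field names and the function are hypothetical and are not part of the toolset, and the value 0 is assumed to mean "not specified".

```c
/* Hypothetical model of the documented precedence; not a toolset API. */
struct stack_align_request {
    int is_interrupt_routine;       /* nonzero for an interruption routine */
    unsigned aligned_stack_bytes;   /* aligned_stack attribute value, 0 if absent */
    int has_interrupt_nostackalign; /* nonzero if the attribute is present */
    unsigned itstackalign_bytes;    /* -Mitstackalign=<n> value, 0 if not given */
};

/* Returns the stack alignment in bytes that the rules above would select. */
unsigned effective_stack_alignment(const struct stack_align_request *req)
{
    /* The aligned_stack attribute wins for every kind of function. */
    if (req->aligned_stack_bytes != 0)
        return req->aligned_stack_bytes;

    if (req->is_interrupt_routine) {
        /* interrupt_nostackalign forces a 4-byte/32-bit boundary. */
        if (req->has_interrupt_nostackalign)
            return 4;
        /* The command-line option only applies to interruption routines. */
        if (req->itstackalign_bytes != 0)
            return req->itstackalign_bytes;
    }

    /* Default: 8 bytes/64 bits. */
    return 8;
}
```

In this model, an ordinary function ignores -Mitstackalign entirely, which matches the second precedence list.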
-Mmode16
The STxP70 compiler generates code for a context with 32 registers. Selecting the -Mmode16 option switches to a context with 16 registers. Note that the impact of this option is slightly different from that of -Mconfig=regbank:1. Namely, no assumption is made on the core configuration regarding register banks, and no checking is performed at assembly level to ensure that only the lower bank is used.
-Mnoextgen[=ext1,ext2,...]
Disables the code generation for the specified extensions. This option only has an effect when MPx extensions are used. It has no effect with fpx.
Environment controls
The environment controls are listed below.
-Mlib16
Instructs the compiler to link with a version of the C library that uses 16 registers of the core. This is the default behavior when using 16 register contexts.
-Mlib32
Instructs the compiler to link with a version of the C library that uses the 32 register set of the core.
-Mnostartup
Instructs the linker not to use the standard boot.o file at link time. It is then the user’s responsibility to provide a boot object file at link time.
2.2.5 C preprocessor options
The preprocessor is run on each C source file before actual compilation. The options in Table 9 control how the sources are preprocessed.
Table 9.
Preprocessor options
Option Description
-E
Only the preprocessor is run.
-C
The preprocessor does not discard comments.
-CC
The preprocessor copies comments inside macros to the output file when the macro is expanded. This is intended for use by applications which place metadata or directives inside comments. Use with the -E option.
-P
The preprocessor discards #line information. Use with the -E option.
-Ddef
Define the macro def with the string 1 as its definition.
-Ddef=defn
Define the macro def as defn.
-M
Generates a list of object file dependencies suitable for a makefile.
-MM
Similar to -M, but ignores system header files, that is, header files included by <header.h>.
-MG
Along with -M or -MM, treat missing files as generated in the local directory.
-H
Display the name and path of the header in use.
-dM
Print a list of macro definitions in use after preprocessing. Use with the -E option.
-dD
Print a list of macro definitions in use while preprocessing. Use with the -E option.
-dN
Same as -dD, except that the macro arguments are not shown. Use with the -E option.
-fpreprocessed
Indicate to the preprocessor that the input file has already been preprocessed.
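As an illustration of how the two -D forms reach the source code, the following sketch reads one macro defined without a value and one defined with a value. The macro names TRACE and BUFFER_SIZE and both functions are hypothetical, chosen only for this example.

```c
/* Hypothetical source file; TRACE and BUFFER_SIZE are illustrative names. */

/* Fallback used when no -DBUFFER_SIZE=<value> is given on the command line. */
#ifndef BUFFER_SIZE
#define BUFFER_SIZE 64
#endif

/* Reports the buffer size selected at preprocessing time. */
int configured_buffer_size(void)
{
    return BUFFER_SIZE;
}

/* Reports whether the file was preprocessed with -DTRACE. */
int trace_enabled(void)
{
#ifdef TRACE
    return TRACE; /* -DTRACE defines TRACE with the string 1 */
#else
    return 0;
#endif
}
```

Without any -D option the functions return 64 and 0. Passing -DTRACE -DBUFFER_SIZE=128 to the driver would be expected to change the results to 128 and 1.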
2.2.6 C dialect options
The option -std=value instructs the compiler front-end to select the appropriate C language dialect to use. For instance, the C99 restrict keyword is only recognized with the -std=c99 option. However, this keyword also exists as a GNU extension keyword, spelled either __restrict or __restrict__, which is recognized by default. Possible values for std are listed in Table 10.
Table 10.
C dialect options
Option Description
-std=iso9899:1990
Same as -ansi
-std=iso9899:199409
ISO C as modified in amendment 1
-std=iso9899:1999
ISO C 99
-std=c89
Same as -std=iso9899:1990
-std=c99
Same as -std=iso9899:1999
-std=gnu89
This is the default, iso9899:1990 + gnu extensions
-std=gnu99
iso9899:1999 + gnu extensions
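The following sketch shows the GNU spelling of the restrict qualifier mentioned above, written so that it does not depend on C99 syntax. The function name add_arrays is hypothetical. The qualifier is a promise from the programmer that the three arrays do not overlap; the compiler does not check it.

```c
#include <stddef.h>

/* Element-wise sum. __restrict__ tells the compiler that dst, a and b
 * never alias each other, which can allow better scheduling of the
 * loads and stores. */
void add_arrays(int *__restrict__ dst,
                const int *__restrict__ a,
                const int *__restrict__ b,
                size_t n)
{
    size_t i; /* declared outside the loop to stay valid in C89 dialects */

    for (i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Calling this function with overlapping arrays breaks the promise and leads to undefined behavior, so the qualifier should only be used where the non-overlap is guaranteed by construction.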
Diagnostic messages can be requested from the compiler to flag potentially erroneous or dangerous C program constructions. Table 11 lists a subset of the GCC options.
Table 11.
General warning options
Option Description
-Wall
Enables all warnings.
-w
Disables all warnings.
-Werror
Turns warnings into errors.
-pedantic
Issues all warnings needed for strict ANSI C compliance.
-pedantic-error
Turn all pedantic warnings into errors.
Note: The tables in this section give the positive form of each option. The negative form of each option can be constructed by replacing the -W prefix with a -Wno- prefix, for example -Wno-format disables the printing of warning messages associated with calls to the printf and scanf family of library functions.
The online help and “man” page of the stxp70cc driver list the full set of possible warning options.
Table 12.
Detailed warning options
Option Description
-m[no-]warn-packstruct
-mwarn-packstruct: this option enables the emission of warnings/errors when option -fpack-struct is set (see Table 15). The warnings emitted are the most conservative ones, and are based on the evaluation of a risk that a misalignment occurs.
-mno-warn-packstruct: this option disables the emission of warnings/errors when option -fpack-struct is set. This is the default behavior.
-m[no-]warn-smart-packstruct
-mwarn-smart-packstruct: this option enables only the emission of smarter warnings/errors when option -fpack-struct is set. These warnings are more accurate: some of them are filtered out if the compiler can assess that a misalignment cannot occur, due to the layout of the structure.
-mno-warn-smart-packstruct: this option disables the emission of smarter warnings/errors when option -fpack-struct is set. This is the default behavior.
-Waggregate-return
Warn if any functions that return structures or unions are defined or called.
-Wbad-function-cast
Warn whenever a function call is cast to a non-matching type.
-Wcast-align
Warn whenever a pointer is cast such that the required alignment of the target is increased.
-Wcast-qual
Warn whenever a pointer is cast so as to remove a type qualifier from the target type.
-Wchar-subscripts
Warn if an array subscript has type char.
-Wcomment
Warn if nested comments are detected.
-Wconversion
Warn if a prototype causes a type conversion that is different from what would happen in the absence of a prototype.
-Werror-implicit-function-declaration
Output an error when a function is used but not declared.
-Wformat
Check calls to the printf and scanf family of library functions.
-Wimplicit
Same as -Wimplicit-int and -Wimplicit-function-declaration.
-Wimplicit-function-declaration
Warn when a function is used but not declared.
-Wimplicit-int
Check that all declarations specify a type, which is int by default in C89.
-Wlarger-than-number
Warn if an object is larger than number bytes.
-Wlong-long
Warn if the long long type is used. Only active along with -pedantic.
-Wmissing-braces
Warn if an aggregate or union initializer is not fully bracketed.
-Wmissing-declarations
Warn if a global function is defined without a previous declaration.
-Wmissing-noreturn
Warn about functions which might be candidates for attribute noreturn.
-Wmissing-prototypes
Warn if a global function is defined without a previous prototype declaration.
-Wmultichar
Warn if a multi-character constant is used.
-Wnested-externs
Warn if an extern declaration is encountered within a function.
-Wpacked
Warn if a structure is given the packed attribute, but the packed attribute has no effect on the layout or size of the structure.
-Wpadded
Warn if padding is included in a structure, either to align an element of the structure or to align the whole structure.
-Wparentheses
Warn if parentheses are omitted in certain contexts.
-Wpointer-arith
Warn about anything that depends on the “size of” a function type or of void.
-Wredundant-decls
Warn if anything is declared more than once in the same scope.
-Wreturn-type
Warn when a function is defined with a return type that defaults to int.
-Wshadow
Warn whenever a local variable shadows another variable.
-Wsign-compare
Warn when a comparison between signed and unsigned values could produce an incorrect result.
-Wstrict-prototypes
Warn if a function is declared or defined without specifying the argument types.
-Wswitch
Warn whenever a switch statement may be incomplete.
-Wtrigraph
Warn if any trigraphs are encountered that might change the meaning of the program.
-W[no-]uninitialized
-Wuninitialized: warn if an uninitialized automatic variable is detected. Optimization must be enabled (see Section 2.2.12: Optimization options) in order for -Wuninitialized or -Wall to report uninitialized variables. See also the entries for the -trapuv and -zerouv options.
-Wno-uninitialized instructs the compiler not to warn about uninitialized variables.
-Wunknown-pragmas
Warn when a #pragma is encountered which is not understood by stxp70cc.
-Wunused
Warn whenever a static function, a label, a parameter or a value is not used.
-Wwrite-strings
Warn when trying to write to a string constant.
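To make the -Wsign-compare entry concrete, the sketch below contrasts a comparison that mixes signed and unsigned operands with a version that handles the sign explicitly. Both function names are hypothetical. It assumes the usual C conversion rules, under which the int operand is converted to unsigned before the comparison.

```c
/* Naive comparison: a is converted to unsigned, so a negative value
 * becomes a very large number. This is the kind of expression that
 * -Wsign-compare is meant to report. */
int is_less_naive(int a, unsigned b)
{
    return a < b;
}

/* Explicit handling: any negative value is smaller than an unsigned
 * value; otherwise the cast is safe and the comparison is exact. */
int is_less_checked(int a, unsigned b)
{
    return a < 0 || (unsigned)a < b;
}
```

For a = -1 and b = 1, the naive version reports that -1 is not less than 1, while the checked version gives the mathematically expected answer.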
Note:
The -g option instructs stxp70cc to generate symbolic information for debugging. DWARF2 format is used.
The -g option may be used with optimization up to level -O2 and with -Os (see Section 2.2.12: Optimization options).
Minimal debug information (that is, call frames) is generated whatever options are selected.
The STxP70 compiler (version 3.4.0 and higher) supports profiling options. The dedicated -pg option instructs the compiler to generate gprof profiling information. See
for more information on this topic.
2.2.10 Code coverage options
The stxp70cc compiler (version 3.4.0 and higher) supports code coverage options. Two options are provided.
• The -ftest-coverage option instructs the compiler to generate a code coverage file for the GNU gcov code coverage utility.
• The -fprofile-arcs option instructs the compiler to generate information that allows gcov to reconstruct the program flow graph.
See Section 4.6: Code coverage on page 72 for further details on this topic.
2.2.11 Call trace instrumentation options
The options -finstrument-functions and -minstrument-calls instruct stxp70cc to generate instrumentation calls. See Section 4.7: Call trace on page 74 for further details on call trace instrumentation.
2.2.12 Optimization options
The options in Table 13 control optimization levels.
Table 13.
Optimize options
Option Description
-O0
No optimization.
-O1
Minimal optimization.
-Os
Optimize for code size.
-O2
Global optimization, speed orientated.
-O3
Aggressive optimization, speed orientated.
-O4
Aggressive optimization, speed orientated. Enables aggressive loop unrolling when compiling code for STxP70-4 in dual issue/dual ALU core configuration.
Note: 1 -O optimization is equivalent to -O2.
2 -Os optimization applies the optimizations of -O2, except for those that increase the code size (such as unrolling).
The options in Table 14 enable finer control of the optimization level.
Table 14.
Advanced options
Option Description
--deadcode
This option forces the binary optimizations (binopt) performed after the link stage. For instance, enabling this option removes non-static functions that are never called in the executable binary file. This is the default behavior when the highest optimization level is set (-Os and -O4).
--no-deadcode
This option disables the binary optimizations (binopt) performed after the link stage.
-f[no-]strict-aliasing
-fstrict-aliasing enables the compiler to assume the strictest aliasing rules applicable to the language being compiled. For C and C++ this activates optimizations based on the type of expressions. In particular, an object of one type is assumed never to reside at the same address as an object of a different type, unless the types are almost the same (the aliasing rules are stated in the ANSI C standard, in clause 6.5 (7) Expressions). For example, an unsigned int can alias an int, but not a void * or a double. The type char and types with the may_alias attribute can alias any other type.
The default is -fstrict-aliasing. If this causes problems in legacy code, use -fno-strict-aliasing to disable it.
-f[no-]unroll-loops
-funroll-loops forces loop unrolling. This is the default at -O2, -O3 and -O4.
-fno-unroll-loops disables loop unrolling. This is the default at -Os.
Loops with a #pragma unroll directive are not affected by these two options.
See Section 4.2: Loop unrolling on page 63 for details of the unrolling policy.
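As an illustration of the strict aliasing rule described for -fstrict-aliasing, the sketch below reads the bit pattern of a float without accessing it through an incompatible pointer type. The function name float_bits is hypothetical, and the example assumes that float and uint32_t have the same size, which is typical but not guaranteed by the C standard.

```c
#include <stdint.h>
#include <string.h>

/* Returns the object representation of f as a 32-bit integer.
 *
 * Writing *(uint32_t *)&f instead would access a float object through
 * an lvalue of type uint32_t, which the aliasing rules do not allow;
 * with -fstrict-aliasing the compiler may then reorder or drop the
 * access. Copying the bytes with memcpy is permitted because it works
 * on the object representation as characters. */
uint32_t float_bits(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits); /* assumes sizeof(float) == 4 */
    return bits;
}
```

Legacy code that relies on pointer casts of this kind is the usual reason for falling back to -fno-strict-aliasing.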
The options in Table 15 control various aspects of the code generation.
Table 15.
Code generation options
Option Description
-gnu3
The GCC front-end version 3.3.3 is used.
-gnu4
The GCC front-end version 4.2.0 is used. This is the default for toolset 2011.1 and higher.
-macf-decl <act_filename.acf>
Reads acf_filename.acf as an ACF, using the default configuration declared in the file as the active configuration. See Section 4.10: Application configuration files on page 82.
-macf-active "string"
Use in conjunction with -macf-decl <act_filename.acf>. Enables a configuration named string to be specified as the active configuration. string must be defined in <act_filename.acf>. See Section 4.10: Application configuration files on page 82 for details.
-macf-template {source_filename1 ...}
Generates the ACF template for the application implemented by the source files specified. The source files must be linkable, and the compilation must include a link stage to ensure that the template is complete. See Section 4.10: Application configuration files on page 82 for details.
-fb <name>
Not yet supported.
-fb_create <name>
Not yet supported.
-fsigned-char
-funsigned-char
-fsigned-char implements type char as signed.
-funsigned-char implements char as unsigned.
Note that when the -funsigned-char option is used, the __CHAR_UNSIGNED__ preprocessor symbol is defined.
The compiler default is signed.
-fsigned-bitfields
-funsigned-bitfields
-fno-signed-bitfields
-fno-unsigned-bitfields
These options control whether a bitfield is signed or unsigned, when the declaration does not use either ‘signed’ or ‘unsigned’.
The compiler default is signed.
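The effect of -fsigned-char and -funsigned-char can be observed from C code through the __CHAR_UNSIGNED__ symbol mentioned above. The sketch below is illustrative only; the function names are hypothetical and 8-bit chars are assumed.

```c
#include <limits.h>

/* Reports how plain char was configured for this translation unit. */
int plain_char_is_unsigned(void)
{
#ifdef __CHAR_UNSIGNED__
    return 1; /* defined when char is unsigned, for example with -funsigned-char */
#else
    return 0;
#endif
}

/* Widening a plain char depends on its signedness: a byte with the top
 * bit set becomes negative when char is signed. */
int widen_plain(char c)
{
    return c;
}

/* Going through unsigned char gives the same result with either option. */
int widen_portable(char c)
{
    return (unsigned char)c;
}
```

Code that stores raw bytes in plain char and then widens them is the usual place where a change of this option alters behavior, so the cast used in widen_portable is a simple way to stay independent of the setting.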
-ffixed-reg=<register-list>
<register-list> is a list of one or several comma-separated register names or dash-separated register ranges, either general purpose registers or boolean registers. The syntax used for registers is:
– rn for core GPR, where n can be 0 to 31,
– gn for core guard registers, where n can be 0 to 7,
– fn for fpx extension registers, where n can be 0 to 15.
This option makes the given registers fixed registers; that is, the code generated by the compiler never uses them.
There are, however, some registers that are used by the compiler for ABI register conventions. See the table of general registers in the STxP70 ABI manual. The registers with a specified use must not be reserved with this option.
Note that specific care must be taken when using this option since low-level library and run-time support code are not specifically built to support non-ABI register usage. For instance, reserving the r5 register does not prevent already compiled library code from using it. Using this option generally requires rebuilding a set of libraries either with the same option (for C/C++ code) or to take into account that this option has been used.
Examples:
stxp70cc -ffixed-reg=r6,g0
stxp70cc -ffixed-reg=f12-f15
-mdisabled-reg=<register-list>
This option is similar to -ffixed-reg described above, except that the corresponding registers cannot be used by the register GNU extension or asm statement clobber list.
The syntax of the <register-list> is the same as for option -ffixed-reg above.
Note that the -Mmode16 configuration option is based on this option.
-fshort-double
By default, the compiler assumes double precision floating point. This means that floating point constants with implicit type declaration are promoted to double precision. This promotion is propagated in the expression where the constant is used. For example, the expression used to compute C is performed as double precision because of the implicit constant type:
    float A;
    float B;
    float C=A*B*3.45;
If the constant is explicitly declared as a single precision, the expression remains in single precision:
    float A;
    float B;
    float C=A*B*3.45F;
The option -fshort-double instructs the compiler to assume single precision instead.
When the FPx floating point extension is used, this option is required to ensure an efficient code generation. A warning is emitted if FPx is used without this option.
More details can be found in Section 4.9: Floating-point code generation on page 79.
-mlib-short-double
This option instructs the compiler to use single precision floating point libraries.
This option is forced as soon as the -fshort-double option is set. On the STxP70, this option is deprecated, since it is forced to fit the default code generation setting. It is preserved mainly for legacy reasons.
-mlib-nofloat
Instructs the compiler to use the C library without floating-point support. Leads to a much smaller C library (nearly half the size of the default library).
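The constant promotion described for -fshort-double can be checked from C with sizeof, since the type of a floating constant is fixed by its suffix. The sketch below is illustrative; the function names are hypothetical, and the sizes it reports correspond to the default setting (without -fshort-double).

```c
#include <stddef.h>

/* An unsuffixed constant has type double, so the product is evaluated
 * in double precision and converted back to float on return. */
float scale_promoted(float a, float b)
{
    return a * b * 3.45;
}

/* With the F suffix every operand is a float, so the expression can
 * stay in single precision. */
float scale_single(float a, float b)
{
    return a * b * 3.45F;
}

/* Sizes of the two constants, as seen by the compiler. */
size_t unsuffixed_constant_size(void)
{
    return sizeof(3.45);
}

size_t suffixed_constant_size(void)
{
    return sizeof(3.45F);
}
```

Adding the F suffix to constants is a source-level alternative for individual expressions when changing the meaning of double for a whole file is not wanted.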
-fpack-struct
Instructs the compiler to pack structures. The goal of this option is to reduce the memory footprint of the data sections of the objects and binary files. Note that this may induce a need for misaligned accesses, which usually increases the size of the code in the text section. Gains in size will be more significant if large arrays of structures are used.
This option should be used by advanced users only. It may conflict with the assumptions or semantics of the source code. For instance:
– if the source code performs some verifications based on the size of a structure, then enabling this option may cause the check to fail
– in some cases, some alignment constraints may no longer hold when the option is set
Some warnings and errors are emitted to prevent the compiler from silently performing non-conservative code generation. See the options -m[no-]warn-packstruct and -m[no-]warn-smart-packstruct for controlling warnings.
If you encounter a problem with this option, it is advised to disable it, and check if the issue is still present.
-fshort-enums
Instructs the compiler to use the shortest integer type required to represent the values of an enumeration. The goal of this option is to reduce the memory footprint of the data sections of the objects and binary files. This option is more likely to have a real impact if it is used in combination with -fpack-struct.
This option should be used by advanced users only. It may conflict with the assumptions or semantics of the source code. For instance:
– if the source code performs some verifications based on the size of a structure, then enabling this option may cause the check to fail
– in some cases, some alignment constraints may no longer hold when the option is set
Some warnings and errors are emitted to prevent the compiler from silently performing non-conservative code generation. If you encounter a problem with this option, it is advised to disable it, and check if the issue is still present.
-fverbose-asm
-fno-verbose-asm
The -fno-verbose-asm option removes extra commentary information in the generated assembly code.
The default is to have verbose asm output.
-falign-functions
-falign-functions=n
Align the start of functions to the next power of two greater than n (if n is specified), skipping up to n bytes.
-falign-loops
-falign-loops=n
Align the first address of loops to the next power of two greater than n (if n is specified), skipping up to n bytes.
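The size and alignment assumptions that -fpack-struct can invalidate are easy to inspect with sizeof and offsetof. The sketch below uses a hypothetical structure and helper functions; the actual amount of padding depends on the target and on the options in use.

```c
#include <stddef.h>

/* Hypothetical structure mixing small and word-sized members. */
struct sample {
    char tag;
    int value;
    char flag;
};

/* Sum of the member sizes, ignoring any padding. */
size_t sample_payload_bytes(void)
{
    return sizeof(char) + sizeof(int) + sizeof(char);
}

/* Padding bytes that the current layout adds on top of the payload.
 * A fully packed layout would be expected to give 0. */
size_t sample_padding_bytes(void)
{
    return sizeof(struct sample) - sample_payload_bytes();
}

/* Offset of the int member: sizeof(char) in a packed layout, and
 * usually larger when the member is naturally aligned. */
size_t sample_value_offset(void)
{
    return offsetof(struct sample, value);
}
```

Source code that hard-codes a structure size, or that assumes the int member is aligned, is the kind of code the warnings above are meant to catch.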
-falign-jumps
-falign-jumps=n
Align the target address of jumps to the next power of two greater than n (if n is specified), skipping up to n bytes.
-falign-labels
-falign-labels=n
Align the labels to the next power of two greater than n (if n is specified), skipping up to n bytes.
-falign-instructions
-falign-instructions=n
Align the instructions to the next power of two greater than n (if n is specified), skipping up to n bytes.
-ffast-math
Defines the preprocessor macro __FAST_MATH__ and invokes -fno-math-errno.
-f[no-]math-errno
-fmath-errno causes the compiler to generate code to set the mathematical error flag in floating point code. The compiler also makes use of the slower libm from Newlib, with errno setting. This is the default behavior when the FPx floating point extension is not used.
-fno-math-errno causes the compiler not to generate code to set the mathematical error flag in floating point code. The compiler also makes use of fast libm overrides, for example sqrtf from the FLIP library, with no errno setting. This is the default behavior when the FPx floating point extension is used.
-mreassoc=0
No re-associations, folding or simplifications. This is the default.
-mreassoc=1
Accurate simplifications that are correct for finite arithmetic are allowed, for instance, a/a -> 1.0, recip(recip(a)) -> a. For example, the transformation a/a -> 1.0 is not valid when a is 0.0 because in this case 0.0/0.0 -> NaN.
-mreassoc=2
Aggressive re-association of expressions is performed to favor the selection of fused multiply-add routines. Such changes in the evaluation order can lead to slightly different results, compared to the original evaluation order.
-fpic
Generate position independent code (data accesses only). See Chapter 9: Relocatable loader library on page 136.
--rlib
Build a relocatable library that can be loaded by RL_LIB. See Chapter 9: Relocatable loader library on page 136.
--rmain
Build a main program suitable for loading relocatable libraries. See Chapter 9: Relocatable loader library on page 136.
-maggressive_unroll=n
Modify the aggressiveness of the default unrolling policy. n is a value in the range [0, 6]. The higher it is, the more aggressive the unrolling. Refer to Section 4.2: Loop unrolling on page 63 for details about this option and the values of n.
Table 15. Code generation options (continued)
Option | Description
-trapuv | Initialize uninitialized local variables to pre-defined values. -trapuv helps to find issues that are due to uninitialized variables. This option has a slight performance impact. It affects local scalar, array variables and memory returned by alloca. It does not affect the behavior of globals or memory allocated with malloc. Integer variables are initialized to 0xdeaddead. Float variables are initialized to 0xfffa5a5a (floating-point NaN). Pointer variables are initialized to 0x0. A sub-type is given a sub part of the pattern of its original type: char is initialized to 0xad; short is initialized to 0xdead; long long is initialized to 0xdeaddeaddeaddeadLL; double is initialized to 0xfffa5ffffa5a5a5a (NaN). Default values of patterns can be controlled as follows: -DEBUG:trapuv_int_value=0xffffffff to change the integer pattern to 0xffffffff; -DEBUG:trapuv_float_value=0xeeeeeeee to change the float pattern to 0xeeeeeeee; -DEBUG:trapuv_pointer_value=0xdddddddd to change the pointer pattern to 0xdddddddd. Note: Using -trapuv removes the possibility of using -Wuninitialized.
-zerouv | Sets uninitialized variables to zero at runtime. This option has a slight performance impact. It affects local scalar, array variables and memory returned by alloca. It does not affect the behavior of globals or memory allocated with malloc. Note: Using -zerouv removes the possibility of using -Wuninitialized.
Table 15. Code generation options (continued)
Option | Description
-m[no-]parse-asmstmts | -mparse-asmstmts causes the compiler to parse and optimize user defined GNU assembly statements. When set, the compiler analyzes the content of each GNU assembly statement, and optimizes it if possible. -mno-parse-asmstmts causes the compiler not to parse and optimize user defined GNU assembly statements. The compiler leaves the instructions of the GNU assembly statement unchanged, except regarding register allocation. This is the default behavior. See Section 6.8: Parsing and optimization of GNU assembly statement on page 114 for details.
-m[no-]parse-meta-asmstmts | -mparse-meta-asmstmts is similar to -mparse-asmstmts, but applies only to the GNU assembly statements used internally by the compiler to automatically map the instructions of the extensions. This is the default behavior. -mno-parse-meta-asmstmts is similar to -mno-parse-asmstmts, but applies only to the GNU assembly statements used internally by the compiler to automatically map the instructions of the extensions. See Section 6.8: Parsing and optimization of GNU assembly statement on page 114 for details.
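The caveat given for -mreassoc=1 in Table 15 can be checked on any host compiler. The following is a minimal sketch, not taken from the toolset: the names ratio_to_self and ratio_self_check are hypothetical, and the code assumes IEEE 754 arithmetic with no fast-math style option enabled. It shows why folding a/a into 1.0 is only correct for finite, non-zero values.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: computes a/a at run time.
   The volatile copies keep the compiler from folding the division. */
float ratio_to_self(float a)
{
    volatile float num = a;
    volatile float den = a;
    return num / den;
}

/* Self-check: 1.0 for an ordinary value, NaN for 0.0/0.0. */
void ratio_self_check(void)
{
    assert(ratio_to_self(2.0f) == 1.0f);
    assert(isnan(ratio_to_self(0.0f)));
}
```

A simplifier that rewrites a/a as 1.0 would make the second check fail, which is the behavior change the table entry warns about.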
The options -OPT:unroll_size, -OPT:cray_ivdep and -OPT:liberal_ivdep modify the behavior of pragmas and are documented in Section 3.2.1: #pragma unroll (n) on page 45 and Section 3.2.2: #pragma ivdep on page 46.
The -OPT:alias option is documented in Section 4.3: Memory dependences in C programs on page 65.
The -inline, -noinline and -INLINE options are provided to control inlining of functions, see Section 4.1.1: Single file inlining on page 55. Only functions marked with the inline keyword are subject to inlining unless specified otherwise.
The -ipa option enables interprocedural analysis, and is described in Interprocedural analysis optimization (IPA) on page 76. This section documents a range of advanced -IPA options that provide control over the optimizations performed.
Note: The STxP70 compiler now provides some support for position independent code (PIC) generation and dynamic loading of shared components. This support is partial, since only data accesses are position independent. See Chapter 9: Relocatable loader library on page 136.
2.2.18 Sending options to a specific phase
The -W<phase>,<arg> option passes the specified argument <arg> to a specific processing phase <phase> of stxp70cc. Table 16 lists the different values of <phase>.
Table 16. Possible value for phase
Value of phase | Description
p | Preprocessor cpp
f | Compiler front-end
a | Assembler stxp70-as
l | Linker stxp70-ld
o | Binary optimizer tool binopt (not yet used by stxp70cc)
There must be a comma, and no spaces, between the option -W<phase> and the argument. Anything occurring after a space is treated as the next option to stxp70cc. Also, the argument is only passed to <phase> if <phase> is normally run from the specified command.
For example:
stxp70cc -O3 -Wl,-strict_warn a.out
This command causes the linker to emit strict warnings regarding link files.
Table 17 lists the options that select header files, libraries and compiler executables.
Table 17. Directory options
Option | Description
-Idirectory | Add to the beginning of the search list for include files.
-nostdinc | No predefined include search path.
-l<library> | Search the library named lib<library>.a when linking. The linker looks for the library in the directories specified by the -L options and then in a standard list of directories. The position of this option on the command line makes a difference. The linker processes object files and libraries in the order that they are specified on the command line. For example, if "stxp70cc file1.o file2.o -lmylib" is specified, then the files are processed in the order file1.o, file2.o, libmylib.a. However, if "stxp70cc file1.o -lmylib file2.o" is specified, then the files are processed in the order file1.o, libmylib.a, file2.o. In this case, file2.o should not refer to any symbols defined in libmylib.a.
-L<directory> | Add to the beginning of the search list for library files.
-nostdlib | No predefined libraries search path.
The search path for the various phases of the compiler can be overridden by using the option -Y<phase>,<path>, where <phase> can take the values listed in Table 16 and <path> is the path of the required tool. There must be a comma, and no spaces, separating -Y<phase> and <path>.
Currently there are no special environment variables that affect stxp70cc.
Predefined macros are described in Table 18.
Note:
1 The list of macros currently defined can be obtained by typing: stxp70cc -E -dM filename.c where filename.c can be any .c file, including an empty file.
2 Do not rely on a macro that is not documented, even if it is currently defined.
3 Some macro values are subject to change because of evolution of compiler design. This may affect, for instance, front-end identification values.
Table 18. Predefined macros
Name | Default definition | Description | Option
__open64__ | Defined | Compiler technology identification | -
__GNUC__ | 3 | Front end major release identification | -no-gcc
__GNUC_MINOR__ | 3 | Front end minor release identification | -no-gcc
__stxp70cc__ | Defined, value depends on major compiler version | Compiler identification | -
__STXP70CC_MINOR__ | Defined, value depends on minor compiler version | Compiler identification | -
__STXP70CC_PATCHLEVEL__ | Defined, value depends on compiler patch level | Compiler identification | -
__STXP70CC_DATE__ | Defined, value depends on compiler release date | Compiler identification | -
__STXP70CC_VERSION__ | Defined, value is an identification string | Compiler identification | -
__LITTLE_ENDIAN__ | Defined by default | Endianness identification | -
_LANGUAGE_C | Defined for C source | Language currently processed is C language. | -
Table 18. Predefined macros (continued)
Name | Default definition | Description | Option
_LANGUAGE_ASSEMBLY | Defined for ASM source | Language currently processed is assembly language. | -
__STRICT_ANSI__ | Defined when -std=c89, -std=c99 or -ansi is used | Compiler is in strict ansi mode. | -std
__STDC_VERSION__ | Defined when -std=c99 is used, with value 199901L | Compiler is in C99 ansi mode. | -std
__OPTIMIZE__ | Defined as soon as optimization is on. | Optimization mode detection. | -O
__OPTIMIZE_SIZE__ | Defined with -Os | Optimization size detection. | -Os
__INLINE_INTRINSICS | Defined | Intrinsics inlining mode detection. | -OPT:inline_intrinsics
__STDC_HOSTED__ | Defined by default. | Hosting mode. | -f[no-]hosted, -f[no-]freestanding
__FAST_MATH__ | Defined when the -ffast-math option is used. | Libraries or user code can take advantage of this definition to define alternative sequences of floating point code. | -ffast-math
Note: The C standard guarantees that the __cplusplus symbol is never defined when compiling C source code.
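As an illustration of how the macros in Table 18 might be used, the sketch below selects behavior at compile time from __OPTIMIZE__, __OPTIMIZE_SIZE__ and __FAST_MATH__. The function names build_flavor and fast_math_enabled are hypothetical and are not part of the toolset. The size macro is tested first on the assumption that both optimization macros may be defined together when optimizing for size.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: reports which optimization mode the file was built in. */
const char *build_flavor(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "size";
#elif defined(__OPTIMIZE__)
    return "speed";
#else
    return "none";
#endif
}

/* Hypothetical helper: 1 when __FAST_MATH__ is defined, 0 otherwise. */
int fast_math_enabled(void)
{
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}
```

Only documented macros are tested here, in line with the note that undocumented macros should not be relied upon.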
The stxp70cc compiler supports a subset of the C99 standard. Most features are implicitly available through default compiler command line options, with the notable exception of the restrict keyword, which requires the -std=c99 command line option to be specified.
It is recommended that any code fragment that depends upon C99 specific behavior be guarded by the following preprocessing definitions, which are correctly triggered when the -std=c99 command line option is used:
#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
// Your C99 dependent code here
#else
#error "This source file depends upon C99 features not available with this compiler."
#endif
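Where failing the build is too strict, the same test can instead select a fallback. The sketch below is an assumption-laden example rather than a recommended pattern from this manual: the macro VEC_RESTRICT and the function vec_add are hypothetical names. The restrict qualifier is used only when __STDC_VERSION__ reports C99 or later, and is dropped otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro: expands to restrict only in C99 mode or later. */
#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
#define VEC_RESTRICT restrict
#else
#define VEC_RESTRICT
#endif

/* Element-wise addition; the caller promises that dst, a and b do not overlap. */
void vec_add(int *VEC_RESTRICT dst, const int *VEC_RESTRICT a,
             const int *VEC_RESTRICT b, size_t n)
{
    size_t i;
    assert(dst != NULL && a != NULL && b != NULL);
    for (i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

The result is the same in both modes. Only the aliasing information available to the optimizer changes.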
Table 19 summarizes the status of the stxp70cc compiler C99 support.
Table 19. C99 support in stxp70cc
Feature as described in the C99 standard | Status
Restricted character set support via digraphs | YES
iso646.h included | YES
Wide character library support | NO: type not supported and library not provided.
More precise aliasing rules via effective type | YES: provided that the -fnostrict-aliasing option is not used
Restricted pointer | YES: provided that the -fnostrict-aliasing option is not used
Variable length arrays | PARTIAL: only local allocation, but no other features
Flexible array members | YES
Static and type qualifiers in parameter array declarators | YES
Complex support (<complex.h>) | NO
Type generic math macros (<tgmath.h>) | NO: include file not provided
The long long int type and library functions | YES
Increased minimum translation unit | YES
Additional floating-point characteristics (<float.h>) | NO
Remove implicit int | YES
Reliable integer division | YES
Universal character names | NO
Extended identifiers | NO
Hexadecimal floating-point constants | YES
Compound literals | YES
Designated initializers | YES
// comments | YES
Extended integer type and library functions in <inttypes.h> and <stdint.h> | YES
Remove implicit function declaration | NO: can get warning
Preprocessor arithmetic done in intmax_t/uintmax_t | YES
Mixed declaration and code | YES
New block scope for selection and iteration statements | YES
Integer constant type rules | YES
Integer promotion rules | YES
vararg macro | YES
Table 19. C99 support in stxp70cc (continued)
Feature as described in the C99 standard | Status
The vscanf family of function in <stdio.h> | YES
Additional math library functions in <math.h> | NO
Floating point environment access in <fenv.h> | NO
ISO 60559 Arithmetic support | NO
Trailing comma allowed in enum declaration | YES
%lf conversion allowed in printf | NO
Inline functions | YES: but not fully ansi compliant in the extern inline case
The snprintf family of functions in <stdio.h> | YES
Boolean type in <stdbool.h> | NO bool native type but <stdbool.h> header provided
Idempotent type qualifiers | YES: but still emits warnings
Empty macro arguments | YES
New struct type compatibility rules (tag compatibility) | YES
Additional predefined macro names | MOST
_Pragma preprocessing operator | YES
Standard pragmas | NO
__func__ predefined identifier | YES
VA_COPY macro | NO
Additional strftime conversion specifiers | NO: library support not provided
LIA compatibility annex | NO
Deprecate ungetc at the beginning of a binary file | NO
Remove deprecation of aliased array parameters | YES
Conversion of array to pointer not limited to lvalues | YES
Relaxed constraints on aggregate and union initialization | YES
Relaxed restrictions on portable header names | YES
Return without expression not permitted in function that returns a value (and vice versa) | YES
3 Pragmas
This chapter provides details of the #pragma directives that are recognized by stxp70cc.
3.1 Pragmas short description and syntax
Table 20. stxp70cc pragmas
Pragma / Scope / Description / Optimization level (1)

#pragma unroll (unroll_amount)
  Scope: Start of a loop body
  Description: Unrolls the loop unroll_amount times
  Optimization level: -O2
#pragma ivdep
  Scope: Start of a loop body
  Description: Liberalizes dependence analysis
  Optimization level: -O3
#pragma loopdep PARALLEL | VECTOR | LIBERAL
  Scope: Start of a loop body
  Description: Liberalizes dependence analysis
  Optimization level: -O3
#pragma loopmod(q,r)
  Scope: Start of a loop body
  Description: Provides trip count modularity information
  Optimization level: -O2
#pragma looptrip(n)
  Scope: Start of a loop body
  Description: Provides trip count estimation information
  Optimization level: -O2
#pragma loopseq READ | WRITE
  Scope: Start of a loop body
  Description: Ordering of the READ (or WRITE) accesses
  Optimization level: -O2
#pragma hwloop none | forcehwloop<loopid> | forcejrgtudec
  Scope: Start of a loop body
  Description: Controls mapping of HW loops and JRGTUDEC
  Optimization level: -O2
#pragma loopmin<itercount> (minc)
  Scope: Start of a loop body
  Description: Controls the guards to be placed around loops
  Optimization level: -O2
#pragma loopmax<itercount> (maxc)
  Scope: Start of a loop body
  Description: Controls specific cases of HW loop mapping
  Optimization level: -O2
#pragma frequency_hint NEVER|FREQUENT
  Scope: Applies to the function or statement that follows the pragma
  Description: Execution frequency hint
  Optimization level: -O1
#pragma ident "string"
  Scope: -
  Description: Adds a .comment section to an assembly file.
  Optimization level: -O0
#pragma weak symbol
  Scope: -
  Description: Marks a symbol as weak
  Optimization level: -O0
#pragma disable_extgen (fct1,fct2,...)
  Scope: -
  Description: Disables native code generation for all extensions in the given functions.
  Optimization level: -O2
#pragma force_extgen (fct1,fct2,...)
  Scope: -
  Description: Enables native code generation for all extensions in the given functions.
  Optimization level: -O2
#pragma disable_specific_extgen (extname[,fct1,fct2,...])
  Scope: -
  Description: Disables native code generation for specified extensions in the given functions.
  Optimization level: -O2
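Table 20. stxp70cc pragmas (continued)
Pragma / Scope / Description / Optimization level (1)

#pragma force_specific_extgen (extname[,fct1,fct2,...])
  Scope: -
  Description: Enables native code generation for specified extensions in the given functions.
  Optimization level: -O2
#pragma inline_next (function)
  Scope: Function call site
  Description: Inlining (2)
  Optimization level: -O1
#pragma noinline_next (function)
  Scope: Function call site
  Description: Inlining (2)
  Optimization level: -O1
#pragma inline_function (function)
  Scope: Function
  Description: Inlining (2)
  Optimization level: -O1
#pragma noinline_function (function)
  Scope: Function
  Description: Inlining (2)
  Optimization level: -O1
#pragma inline_file (function)
  Scope: File
  Description: Inlining (2)
  Optimization level: -O1
#pragma noinline_file (function)
  Scope: File
  Description: Inlining (2)
  Optimization level: -O1
#pragma defaultinline (function)
  Scope: -
  Description: Inlining (2)
  Optimization level: -O1

1. This column denotes the lowest optimization level for which the pragma has an effect. For example -O0 means the pragma is applicable even when optimization is switched off. A list of optimization levels is given in Section 2.2.12: Optimization options on page 29.
2. All inlining pragmas are described in Section 4.1.4: Inlining pragmas on page 58.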
3.2 Loop optimization pragmas
3.2.1 #pragma unroll (n)
This pragma suggests to the compiler the type of loop unrolling that should be done. The pragma is a recommendation to the compiler to add n-1 copies of the loop body to the inner loop. The value of n must be at least 1. If it is 1, then unrolling is not performed.
If the loop that this pragma immediately precedes is an inner loop, then it implies standard inner loop unrolling. See Figure 2.
Figure 2. Inner loop unrolling example
for (i=0; i < 10; i++)
#pragma unroll (2)
  for (j=0; j < 10; j++)
    a[i][j] = a[i][j]+b[i][j];
becomes:
for (i=0; i < 10; i++)
  for (j=0; j < 10; j+=2) {
    a[i][j] = a[i][j] +b[i][j];
    a[i][j+1] = a[i][j+1]+b[i][j+1];
  }
If the loop that this pragma immediately precedes is an outer loop that contains only an inner loop, then the compiler attempts to unroll the outer loop and perform loop fusion on the resulting inner loops. This transformation, known as “unroll-and-jam”, is especially useful to create parallel execution opportunities when the innermost loop alone does not present such opportunities. See Figure 3.
Figure 3. Unroll-and-jam example
// Ensure ad[] and sd[] do not alias.
#pragma unroll(2)
for (i=0; i<16; i++) {
  int sum = 0;
  for (k=M; k<8+M; k++) {
    sum += sd[k]*sd[k-i];
  }
  ad[i] = sum;
}
becomes:
for (i=0; i<16; i+=2) {
  int sum0 = 0;
  int sum1 = 0;
  for (k=M; k<8+M; k++) {
    sum0 += sd[k]*sd[k-i];
    sum1 += sd[k]*sd[k-i-1];
  }
  ad[i] = sum0;
  ad[i+1] = sum1;
}
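The equivalence of the two forms in Figure 3 can be checked with a small host-side test. The sketch below makes several assumptions: M is fixed at 16 so that sd[k-i] and sd[k-i-1] never index below zero, the input data is arbitrary, and the function names autocorr_ref, autocorr_jammed and autocorr_versions_agree are hypothetical.

```c
#include <assert.h>

#define M 16            /* assumed offset; must be >= 15 to keep indices >= 0 */
#define SD_LEN (M + 8)  /* highest index read is M + 7 */

/* Original loop nest from Figure 3. */
void autocorr_ref(const int *sd, int *ad)
{
    int i, k;
    for (i = 0; i < 16; i++) {
        int sum = 0;
        for (k = M; k < 8 + M; k++)
            sum += sd[k] * sd[k - i];
        ad[i] = sum;
    }
}

/* Outer loop unrolled by 2, with the two inner loops fused. */
void autocorr_jammed(const int *sd, int *ad)
{
    int i, k;
    for (i = 0; i < 16; i += 2) {
        int sum0 = 0, sum1 = 0;
        for (k = M; k < 8 + M; k++) {
            sum0 += sd[k] * sd[k - i];
            sum1 += sd[k] * sd[k - i - 1];
        }
        ad[i] = sum0;
        ad[i + 1] = sum1;
    }
}

/* Returns 1 when both versions produce identical results. */
int autocorr_versions_agree(void)
{
    int sd[SD_LEN], ref[16], jam[16], i;
    assert(M >= 15);
    for (i = 0; i < SD_LEN; i++)
        sd[i] = (i * 7) % 11 - 5;   /* arbitrary test pattern */
    autocorr_ref(sd, ref);
    autocorr_jammed(sd, jam);
    for (i = 0; i < 16; i++)
        if (ref[i] != jam[i])
            return 0;
    return 1;
}
```

The two accumulators in the fused inner loop are independent of each other, which is the parallel execution opportunity that unroll-and-jam is meant to expose.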
The following tips provide information on how to control the desired inner loop unrolling with the pragma unroll value.
• A counted loop with a compile-time constant trip count is always fully unrolled if a pragma unroll with a value greater than or equal to the loop trip count is specified.
• When a counted loop is not fully unrolled, the pragma unroll value is rounded to the greatest power of two lower than the specified unrolling value.
• The maximum size of a loop after unrolling is controlled by the command line option -OPT:unroll_size=<n>.
3.2.2 #pragma ivdep
This pragma instructs the compiler to liberalize dependence analysis between memory accesses. The #pragma ivdep applies only to the innermost loops in a set of nested loops; therefore, if it is used on a loop that has an inner loop, the compiler ignores it. By default, this pragma allows the compiler to assume there are no memory dependences between loop iterations.
The following command line options modify the ivdep semantic.
• -OPT:cray_ivdep=TRUE
Only ignore backward memory dependences (Cray semantics).
• -OPT:liberal_ivdep=TRUE
Also ignore all memory dependences in the same loop iteration.
For example:
#pragma ivdep
for (i = 0; i < n; i++) {
  a[b[i]] = a[b[i]]+3; // These dependencies cannot be computed by the compiler
}
The loopdep pragma instructs the compiler to liberalize dependence analysis between memory accesses, based on the specified type of loop dependences. Contrary to the pragma ivdep described above, its semantics cannot be modified by command line options.
The loopdep pragma takes an argument that tells the compiler which kind of loop dependences it can ignore: VECTOR, PARALLEL or LIBERAL.
#pragma loopdep VECTOR
#pragma loopdep VECTOR allows the compiler to assume there are no backward memory dependences between loop iterations. This pragma is equivalent to #pragma ivdep with -OPT:cray_ivdep=TRUE.
Example:
#pragma loopdep VECTOR
for (i = 0; i < n; i++) {
  a[i] = a[i+k]+3;
}
In this example, the compiler cannot tell when a[i+k] does not depend on a[i], but this is in fact the case if k is always > 0 in the program. The pragma allows the compiler to assume there are no dependences between the read of a[i+k] in the current loop iteration, and the write of a[i] in the following loop iterations. The compiler could rewrite the loop as:
for (i = 0; i < n; i+=2) {
  t0 = a[i+k]+3;
  t1 = a[i+1+k]+3;
  a[i] = t0;
  a[i+1] = t1;
}
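The condition on k can be observed directly on a host machine. The sketch below is a hedged example, not output from stxp70cc: it compares the original loop with the pairwise rewrite for a given offset k, using a fixed even trip count N and hypothetical function names. The two versions agree for k >= 0 and differ for a backward dependence such as k = -1.

```c
#include <assert.h>

#define N 8  /* even trip count, so the pairwise version needs no residual loop */

/* Original loop: each iteration reads a[i+k] and writes a[i]. */
void shift_add_ref(int *a, int k)
{
    int i;
    for (i = 0; i < N; i++)
        a[i] = a[i + k] + 3;
}

/* Rewritten loop: both reads are performed before both writes. */
void shift_add_pairs(int *a, int k)
{
    int i;
    for (i = 0; i < N; i += 2) {
        int t0 = a[i + k] + 3;
        int t1 = a[i + 1 + k] + 3;
        a[i] = t0;
        a[i + 1] = t1;
    }
}

/* Returns 1 when both versions give the same array for offset k.
   The buffers below keep all accesses in range for -8 <= k <= 16. */
int shift_add_agree(int k)
{
    int x[32], y[32], i;
    assert(k >= -8 && k <= 16);
    for (i = 0; i < 32; i++)
        x[i] = y[i] = i * i;
    shift_add_ref(x + 8, k);
    shift_add_pairs(y + 8, k);
    for (i = 0; i < 32; i++)
        if (x[i] != y[i])
            return 0;
    return 1;
}
```

With k = -1 the second read in each pair picks up the old value of a[i] instead of the one just written, which is why the pragma must only be used when the assertion really holds.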
#pragma loopdep PARALLEL
#pragma loopdep PARALLEL allows the compiler to assume there are no dependences between any two memory accesses that are in different loop iterations. This pragma is equivalent to #pragma ivdep with -OPT:cray_ivdep=FALSE -OPT:liberal_ivdep=FALSE.
For example:
#pragma loopdep PARALLEL
for (i = 0; i < n; i++)
  a[b[i]] = a[b[i]] + 3;
In this example, the compiler cannot tell that either the load or store of a[b[i]] in the current loop iteration does not depend on the load or store of a[b[i]] in a following loop iteration. This is in fact the case if b[i] != b[j] for all i != j. The compiler could rewrite the loop as:
for (i = 0; i < n; i+=2) {
  t1 = a[b[i+1]] + 3;
  t0 = a[b[i]] + 3;
  a[b[i+1]] = t1;
  a[b[i]] = t0;
}
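Since the pragma is only safe when b[i] != b[j] for all i != j, it can be worth validating the index array in debug builds. The helper below is a minimal sketch under that assumption. The name indices_are_distinct is hypothetical, and the quadratic scan is chosen for clarity rather than speed.

```c
#include <assert.h>

/* Returns 1 when no value appears twice in b[0..n-1], 0 otherwise. */
int indices_are_distinct(const int *b, int n)
{
    int i, j;
    assert(n >= 0);
    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (b[i] == b[j])
                return 0;
    return 1;
}
```

A typical use would be a call such as assert(indices_are_distinct(b, n)) placed just before the loop that carries the pragma.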
#pragma loopdep LIBERAL
#pragma loopdep LIBERAL allows the compiler to assume there are no dependences between any two memory accesses that are either in the same, or different, loop iterations. This pragma is equivalent to #pragma ivdep with -OPT:liberal_ivdep=TRUE.
Example:
#pragma loopdep LIBERAL
for (i = 0; i < n; i++) {
  a[j] = b[i];
  c[i] = a[i] + 3;
}
In this example, the compiler cannot tell that the load of a[i] does not depend on the store of a[j]. This is in fact the case if i != j for all values of i and j in the loop iterations.
#pragma loopmod(q,r) tells the compiler the number of times a loop is taken in terms of a multiple q and a residual r.
The syntax of this pragma is:
#pragma loopmod(q,r)
where q is a strictly positive integer and r is a non-negative integer with r < q.
For example:
#pragma loopmod (4,0)
This tells the compiler that the loop is taken 0, 4, 8, 12 ... times.
#pragma loopmod (4,1)
This tells the compiler that the loop is taken 1, 5, 9, 13 ... times.
When applied to an inner loop, this pragma indicates that the trip count tc, that is, the number of iterations executed by any execution of the loop, can be written as:
tc = p * q + r with q > 0, r >= 0
for some integer p >= 0. This information helps the compiler in loop unrolling optimization, and in software pipelining.
When unrolling loops, the compiler creates multiple loop bodies (the unrolling factor specifies the number of loop bodies created). However, the compiler cannot always statically determine the trip count. When it cannot determine the trip count, the compiler must also create residual code in case the unrolling factor is not a divisor of the loop trip count.
However, it is possible for application writers to know the modular properties of some of the loops in their own code. When this accurate information is given to the compiler, the residual code can be largely removed or better optimized.
Note: Giving inexact information on the trip count may lead to incorrect code. Make sure that the asserted property is valid in all cases.
The following example shows the use of the #pragma loopmod.
void copychar(unsigned char* __restrict p, unsigned char * q, int sz)
{
  int i ;
  assert(sz % 4 == 0) ;
#pragma loopmod(4,0)
  for(i=0; i<sz; i++)
    p[i] = q[i];
}
The function copychar duplicates a byte stream, whose size must be a multiple of 4.
During unrolling, and without the pragma, the compiler would create a residual loop. This is totally removed when the pragma information is asserted. In this example, the pragma does not provide the compiler with any information about the memory alignment of p or q, which the compiler would need to generate word accesses after unrolling.
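To make the effect concrete, the sketch below shows by hand what a four-way unrolled copy without residual code could look like once the trip count is known to be a multiple of 4. It is an illustration only: the function name copychar_unrolled is hypothetical, and the code actually generated by the compiler may differ.

```c
#include <assert.h>

/* Hand-unrolled byte copy; valid only when sz is a multiple of 4,
   which is the property that #pragma loopmod(4,0) asserts. */
void copychar_unrolled(unsigned char *p, const unsigned char *q, int sz)
{
    int i;
    assert(sz % 4 == 0);
    for (i = 0; i < sz; i += 4) {
        p[i]     = q[i];
        p[i + 1] = q[i + 1];
        p[i + 2] = q[i + 2];
        p[i + 3] = q[i + 3];
    }
    /* No residual loop is needed: i reaches sz exactly. */
}
```

Without the modularity guarantee, a second loop handling the remaining 1 to 3 bytes would have to follow the unrolled loop.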
#pragma looptrip(n) instructs the compiler that the estimate of the number of iterations of the loop (the loop trip count estimate) is n. This is not an assertion that the loop effectively iterates n times.
A number of optimizations are affected by the #pragma looptrip (n), when the compiler has not already determined the exact trip count:
• basic block frequency estimation uses this information as an approximation of the loop trip count
• unrolling and cross-iteration optimizations are reduced if the given loop trip count estimate is low
• software pipelining is limited if the estimate is low
• automatic data prefetch generation is limited if the estimate is low
One usage scenario is for ‘for’ loops with trip counts of unknown values where the user knows that the approximate effective value is low:
#pragma looptrip(4)
for (i=0; i<n; i++) a[i] = b[i] ;
This example avoids non-beneficial optimizations. On such loops the compiler trip count estimate without the pragma is 100.
A second scenario is for ‘while’ loops where the user knows that the approximate effective trip count is high:
#pragma looptrip(100)
while (*p++=*s++) ;
This example gives a better approximation of the weight of the loop. Generally the compiler trip count estimate for a while loop is very low.
Possible error messages are:
• Warning : pragma ‘LOOPTRIP’ : inconsistent with computed value, ignored
• Warning : pragma ‘LOOPTRIP’ : not followed by a loop, ignored
• Warning : malformed ‘#pragma looptrip (n)’
#pragma hwloop none
#pragma hwloop forcehwloop <loopid>
#pragma hwloop forcejrgtudec
The hwloop pragmas allow fine control of the special looping mechanisms available on the STxP70 processor. They must all be placed before the loop statement. They respectively allow:
hwloop none
Block the mapping of both hardware loops and JRGTUDEC special instructions.
hwloop forcehwloop <loopid>
Force a given loop to make use of a hardware loop. Note that the mapping is performed by the compiler only if it is legal to do so. The loopid argument is optional. It allows the user to force the use of either of the two hardware loop registers, so the possible values are 0 and 1. The main interest is to force the use of the saved loop register L0 when a call is present in the loop body but the callee is known to have no side effect on HW loop registers (that is, is HW loop free), thus avoiding the need to save and restore the loop register. It is the user's responsibility to ensure that using the specified register is legal.
hwloop forcejrgtudec
Force a given loop to make use of the JRGTUDEC special instruction.
The hardware loop pragmas must be placed before the loop statement:
#pragma hwloop forcejrgtudec
for(i=0; i<n; i++) {
  a[i] = ...;
}
3.2.8 #pragma loopmin<itercount> (minc)
The hardware loop register of the STxP70 core that holds the trip count is 32 bits wide. This register is named LC. The zero value, however, is not legal from a hardware standpoint. Furthermore, no special instruction is available to indicate that the hardware loop must be skipped. Therefore, if the value used to set the LC register may be less than or equal to zero, a guard is needed.
Use the loopmin pragma to instruct the compiler that the loop trip count is at least minc. If minc is 1 or more, then the compiler is allowed to remove the guard that is needed otherwise. This saves both cycles and bytes because of the removal of comparison and branching instructions.
The loopmin and loopminitercount syntaxes are equivalent. The second one is for legacy code that formerly used the sxcc compiler.
Use this pragma as follows:
#pragma loopmin (1) // loopminitercount can be used as well
for(i=0; i<n; i++) {
  a[i] = ...;
}
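The role of the guard can be pictured in plain C. The sketch below is only an analogy, with hypothetical function names: a bottom-tested loop stands in for a hardware loop, which always executes its body at least once. The first function needs the explicit n <= 0 test, and the second relies on the caller's guarantee that n >= 1, which is the kind of guarantee loopmin (1) expresses.

```c
#include <assert.h>

/* Guarded form: the test on n protects the bottom-tested loop. */
int sum_guarded(const int *a, int n)
{
    int s = 0, i = 0;
    if (n <= 0)          /* guard: compare and branch before the loop */
        return 0;
    do {
        s += a[i];
        i++;
    } while (i < n);
    return s;
}

/* Unguarded form: correct only when the trip count is at least 1. */
int sum_min1(const int *a, int n)
{
    int s = 0, i = 0;
    assert(n >= 1);      /* the property promised by the caller */
    do {
        s += a[i];
        i++;
    } while (i < n);
    return s;
}
```

Removing the guard saves the comparison and branch, at the price of incorrect behavior if the promise is ever broken.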
#pragma loopmax<itercount> (maxc)
Use the loopmax pragma to instruct the compiler that a loop trip count is at most maxc. This pragma is not generally useful on an STxP70 core. In a few cases, it is useful as a workaround for hardware problems that appear when the actual trip count exceeds a given range (for instance, a 16-bit integer).
Use this pragma as follows:
#pragma loopmaxitercount (1)
for(i=0; i<n; i++) {
  a[i] = ...;
}
#pragma loopseq READ
#pragma loopseq WRITE
This pragma instructs the compiler that the memory READ accesses (or respectively the memory WRITE accesses) should be sequenced as they appear in the loop. This is not an assertion that the accesses must be kept in sequence. For instance, it is not a replacement for volatile accesses, where keeping the order is mandatory.
The effect of this pragma is that the scheduler serializes all load prefetch operations (or respectively all stores) in the loop. Therefore the memory read (or write) accesses, as written in the C code are kept in order, as long as no aggressive transformation occurs in the loop.
The following scenario can occur when the user wants to keep memory writes in order to take advantage of a combining write buffer:
#pragma loopseq WRITE
for(i=0; i<n; i++) {
  a[i] = ...;
  a[i+1] = ... ;
  a[i+2] = ... ;
  a[i+4] = ... ;
}
The pragma hints that the compiler should keep writes to the array in order. If the loop is unrolled, generating a large number of stores, this improves locality and may take advantage of combining write buffers. By default the compiler does not put restrictions on the ordering of non-overlapping store operations.
A second scenario is when the user has scheduled prefetch and load operations by hand, and wants to ensure that the compiler does not reorder them.
#pragma loopseq READ
for(i=0; i<n; i+=S) {
  ... = a[i] ;
  __builtin_prefetch(&a[i+S]) ;
}
The pragma hints that the compiler should keep the load and prefetch in order. In this example, the prefetch is not placed before it is effectively used in the next iteration by the load.
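A complete version of such a hand-scheduled loop might look like the sketch below. It is written under a few assumptions: the stride STRIDE and the function name sum_strided are hypothetical, the loopseq pragma is left out so that the code also builds on a host compiler, and the prefetch is skipped on the last iteration to avoid forming an address beyond the array.

```c
#include <assert.h>

#define STRIDE 4  /* hypothetical stride between consecutive accesses */

/* Sums every STRIDE-th element, prefetching the element that the
   next iteration will load. */
long sum_strided(const int *a, int n)
{
    long total = 0;
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i += STRIDE) {
        total += a[i];                       /* load for this iteration */
        if (i + STRIDE < n)
            __builtin_prefetch(&a[i + STRIDE]); /* prefetch for the next one */
    }
    return total;
}
```

The prefetch has no effect on the computed result. Only its position relative to the load matters for performance, which is what the READ sequencing hint is intended to preserve.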
#pragma frequency_hint
This pragma allows the user to specify information about the execution frequency for certain regions of code with the following frequency specifications:
NEVER
This region of code is never or rarely executed. The compiler might move this region of the code away from the normal path. This movement might either be at the end of the procedure or at some point to an entirely separate section.
FREQUENT
This region of code is frequently executed. The compiler might try to put this region in the fall through path.
Example:
if (debug) {
#pragma frequency_hint NEVER
  trace();
}
3.3.1 #pragma ident “string”
Adds a .comment section in an assembly file.
#pragma weak marks a symbol as weak.
This pragma instructs the link editor not to issue a warning if it does not find a defining declaration of the specified weak symbol, in which case the symbol is set to 0. It also allows the current definition to be overridden by a non-weak definition. See Figure 4.
Figure 4. #pragma weak example
#pragma weak opt_handler
extern void opt_handler (void);
int main(int argc, char *argv[])
{
  /* If opt_handler has not been defined, the linker does not
     complain and the condition is false.*/
  /* If opt_handler has been defined, the opt_handler is
     invoked.*/
  if (opt_handler) opt_handler();
}
#pragma disable_extgen can be used only when the MPx extension is used. It disables native code generation for all extensions. Refer to Chapter 8: MPx native support on page 122 for further details.
#pragma force_extgen can be used only when the MPx extension is used. It forces native code generation for all extensions. Refer to Chapter 8: MPx native support on page 122 for further details.
#pragma disable_specific_extgen can be used only when the MPx extension is used. It disables native code generation for the specified extension. Refer to Chapter 8: MPx native support on page 122 for further details. The typical use is:
#pragma disable_specific_extgen ( MP1x, fct1, fct2)
#pragma force_specific_extgen can be used only when the MPx extension is used. It forces native code generation for the specified extension. The typical use is:
#pragma force_specific_extgen ( MP1x, fct1, fct2)
Refer to Chapter 8: MPx native support on page 122 for further details.
4 Optimization guide
This chapter describes specific compiler options and techniques that can be used to gain maximum performance in your application.
4.1 Inlining
Inline function expansion is performed for function calls that the compiler estimates to be frequently executed. These estimations are based on a set of heuristics. The compiler might decide to replace the instructions of the call with code for the function itself (inline the call).
The compiler supports both the single file inlining mode, as described in Section 5.2.1: Placement and layout on page 101, and cross file inlining through IPA, as described in Section 4.8: Interprocedural analysis optimization (IPA) on page 76.
The purpose of this section is to make users aware of the underlying algorithms used to select functions to inline. First, it describes how possible candidates are selected for inlining, and how the selection is finalized, taking size conditions into account. Then, user-level compiler switches are listed, to show how the inlining process can be controlled.
The inlining decisions of the compiler can be observed with the -INLINE:list option. We recommend using this option when tuning inlining decisions. The exact scope and syntax of the -INLINE option are described throughout this section.
There are two kinds of candidates for inlining: may-inline and must-inline functions.
May-inline functions are selected by the compiler according to the following conditions:
• the function is declared with the inline C keyword
• functions not declared inline are may-inline candidates only if the -INLINE:only_inline=off option is specified. In this case, a function is a may-inline candidate if:
– it is declared with the static C keyword
– its name is not weak
– its address is neither passed nor saved
Must-inline functions are specified by the user, through the command line option:
-INLINE:must=fn1,fn2,...
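As a rough illustration of the may-inline rules above, the following sketch marks which functions would qualify. The function names (add_one, twice, negate, compute) are hypothetical, and the comments restate the rules listed in this section rather than observed compiler output; the -INLINE:list option remains the way to check the actual decisions.

```c
/* Declared with the inline keyword: a may-inline candidate by default. */
static inline int add_one(int x) { return x + 1; }

/* Not declared inline: per the rules above, a may-inline candidate only
   with -INLINE:only_inline=off (static, not weak, address never taken). */
static int twice(int x) { return 2 * x; }

/* Not declared inline and its address is saved in a function pointer,
   so it fails the "address is neither passed nor saved" condition. */
static int negate(int x) { return -x; }
static int (*const saved_op)(int) = negate;

/* Caller in which the three calls above would be considered. */
int compute(int x)
{
    return add_one(x) + twice(x) + saved_op(x);
}
```

To force one of these functions regardless of cost, the must-inline option described above would be used instead, for example -INLINE:must=twice.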
May-inline and must-inline functions are then checked against several criteria to decide whether to inline them or not.
Inlining criteria
Each candidate function is checked against inlining-exclusion cases, which include:
• no-inlining requested by the user (-INLINE:never=fn, -INLINE:off command line options)
• recursive function
• vararg function
• exception handler
After this preliminary test, each candidate function is inlined regardless of cost if it is marked must-inline, or if the -INLINE:all option has been specified by the user.
Otherwise, cost evaluation is used to decide whether to inline or not, and the candidate function is rejected if its estimated cost is above a given threshold set by the compiler. The -INLINE:list=on option can be used to list what is inlined. Changing the compiler limits is not recommended, since this can lead to longer compilation times or increased memory usage or both, with no noticeable performance benefit.
Finally:
• the function to be inlined must be defined and visible in the same source file as the function using it
• a static function that is inlined can, in specific circumstances, be considered “dead” and removed from the final object file (b)
Table 21 specifies the options to control the stand-alone inlining.
More than one sub-option can be specified to -INLINE:option, either by using colons to separate each sub-option or by specifying multiple options on the command line. Some -INLINE:option settings either enable or disable a feature. To disable a feature, specify the sub-option with either =OFF, =FALSE or =0 (all these strings are case insensitive, for example -INLINE:list=OFF). To enable a feature, either use the option name alone (for example -INLINE:list) or put any other string on the right of the “=” sign (as in -INLINE:list=all). It is generally recommended to use =ON, =TRUE or =1 for the sake of clarity (for example -INLINE:list=ON).
Table 21. Standalone inlining options
Option                           Description
-inline                          Enable inlining on inline functions. This is activated by default at optimization levels > 1.
-noinline                        Disable inlining.
-INLINE:(on|off)                 Enable/disable inlining. Use of other -INLINE options implicitly sets this to on.
-INLINE:aggressive=(on|off)      Inline even non-leaf, out-of-loop calls. Default is off.
-INLINE:all                      Forces may-inline functions to be inlined, bypassing cost evaluation. This option conflicts with -INLINE:off, and takes precedence if both are specified. Default is off.
-INLINE:all_inline               Inline all functions marked by the C language inline keyword.
-INLINE:dfe                      Allow dead function elimination. Default is on.
-INLINE:list=(on|off)            List compiler actions. Default is off.
-INLINE:must=name1[,name2...]    Always attempt to inline the named subroutines in addition to the default heuristic.
-INLINE:never=name1[,name2...]   Never attempt to inline the named subroutines.
-INLINE:only_inline=(on|off)     Default is on. Inline only functions marked by the C language inline keyword. The -INLINE:only_inline=off option is mandatory to allow inlining of non inline functions.
-INLINE:size_static=(on|off)     Set to on, this option limits the inlining of static functions. Set to off, this option allows more aggressive inlining of static functions. When code is optimized for size (-Os) and for optimization levels -O0, -O1 and -O2, the default is on; when code is optimized for speed (-O3, -O4), the default is off.
-INLINE:specfile=filename        Specifies a filename containing inlining options. Default is none.
-INLINE:static=(on|off)          Default is off. Allow static functions to be candidates for inlining.
b. Note that this dead code removal was not performed in earlier versions of the stxp70cc compiler (that is, the compiler provided in toolset 3.1.0 and earlier). With those versions, inlining usually causes an increase in size, because the original (not inlined) instance is preserved in the final executable code, even if it is never called.
In addition to these options, the option given in Table 22 may be of interest when building a large body of inline functions (which is not recommended and may adversely affect performance).
Table 22. Option changing inlining behavior
Option               Description
-OPT:0limit=[0..n]   Functions larger than size n are not optimized. Default is 3000. Specifying 0 removes any limit but may lead to a very long compile time.
Inlining static functions
When the option -INLINE:size_static=on is specified, the compiler assesses the total size increase that would result from the inlining of all the calls to the static callee function in the current caller. If this increase is above a given threshold, none of the calls to this callee function in the current caller are inlined.
When the option -INLINE:size_static=off is specified, the compiler assesses the size increase that would result from the inlining of the calls to the static callee function incrementally. The first calls to the callee are inlined until the size increase becomes greater than the threshold; inlining of any further calls is then suspended.
4.1.3 Extern inline functions
If both inline and extern are specified in a function definition, then the definition is used only for inlining. The function is never compiled on its own, not even if its address is referred to explicitly. The address becomes an external reference, as if the function had only been declared but not defined.
This combination of inline and extern has almost the same effect as a macro. The way to use it is to put a function definition in a header file with these keywords, and put another copy of the definition (lacking inline and extern) in a library file. The definition in the header file will cause most calls to the function to be inlined. If any instances of the function remain, they will refer to the single copy in the library.
The inlining process can be controlled within the C source code using #pragmas.
The stxp70cc compiler already supports several command-line options to configure its inlining behavior, but they are not flexible enough on their own. For instance, with the option -INLINE:never=foo the user can disable the inlining of foo everywhere it is called; conversely, with -INLINE:must=foo the user can force inlining of foo everywhere.
The user has the ability to force inlining or non-inlining at call sites through the use of pragmas. In addition, the noinline and always_inline attributes can be used at function declaration.
Pragmas
To force inlining or non-inlining of a function in the scope of a call site, the following two pragmas are introduced:
• #pragma inline_next (foo,...) forces inlining of function foo in the next statement
• #pragma noinline_next (foo,...) prevents inlining of function foo in the next statement
The ... denotes that it is possible to provide several function names with the same pragma. It is equivalent to several pragma lines.
Two similar pragmas are provided that can be used within the scope of a function:
• #pragma inline_function (foo,...) forces inlining of function foo every time it is called until the end of the current function
• #pragma noinline_function (foo,...) prevents inlining of function foo every time it is called until the end of the current function
The two call site scope pragmas take precedence over these two function scope pragmas.
Two lower priority pragmas are provided, with file scope:
• #pragma inline_file (foo,...) forces inlining of function foo every time it is called until the end of the current source file
• #pragma noinline_file (foo,...) prevents inlining of function foo every time it is called until the end of the current source file
Finally, to revert inlining policy to the default one (that is, rely on the inliner’s evaluation of callee weight), the following pragma is introduced:
#pragma defaultinline (foo,...)
Function naming
As a special case, if the user does not provide any function name, the corresponding pragma applies to all functions called in the scope of the pragma. In this case, parentheses around the function names are optional.
User diagnostics
Several warning messages are provided to the user to help track errors.
If two conflicting pragmas are provided, only the later one is taken into account. For instance:
#pragma inline_next (foo)
#pragma noinline_next (foo)
foo();
This generates the following warning:
warning: #pragma noinline_next (foo) overrides previous #pragma inline_next (foo)
If pragmas are provided at an invalid scope (that is, outside of a function), the following message is displayed:
warning: #pragma noinline_function (foo) ignored (incorrect scope)
To help track misspelling, a warning is also displayed if a pragma could not be applied to any function call.
#pragma noinline_next (bar)
foo(i);
This generates the following warning:
warning: #pragma noinline_next (bar) matched no call
noinline and always_inline attributes
In order to enable the user to inhibit inlining of one function wherever it is called, the noinline attribute is introduced, and is used at the function declaration level.
Conversely, to enable the user to force inlining of one function wherever it is called, the always_inline attribute is introduced.
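The two attributes can be sketched as follows. This is a minimal illustration assuming the GNU __attribute__((...)) spelling used elsewhere in this manual; the function names (record_event, clamp_to_byte, process_sample, events_recorded) are hypothetical.

```c
static int event_count = 0;

/* noinline: a real call is kept at every call site of record_event. */
__attribute__((noinline)) void record_event(int weight)
{
    event_count += weight;
}

/* always_inline: the body is expanded at every call site. */
static inline __attribute__((always_inline)) int clamp_to_byte(int v)
{
    if (v < 0) return 0;
    if (v > 255) return 255;
    return v;
}

/* Caller that uses both functions. */
int process_sample(int v)
{
    int c = clamp_to_byte(v);   /* expanded in place */
    record_event(1);            /* remains a function call */
    return c;
}

/* Accessor for the counter updated by record_event(). */
int events_recorded(void)
{
    return event_count;
}
```

Because the attributes are attached to the declaration, they apply to every call without repeating a pragma at each call site.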
Precedence
Command-line options -INLINE:must=foo and -INLINE:never=foo take precedence over both pragmas and attributes.
Attributes take precedence over pragmas. That is, a function declared with __attribute__((noinline)) is never inlined, regardless of pragma inline_xxx statements. However, the user can override this behavior with the -INLINE:must=foo command-line option.
If several contradictory pragmas with the same scope apply to the same function, the last one overrides the earlier ones.
Examples
Example one (Figure 5) illustrates the use of the #pragma noinline_next directive. All calls to f1() are candidates for inlining, except the one directly following #pragma noinline_next.
Figure 5. #pragma noinline_next example
#include <stdio.h>
int ig = 0;
inline void f1(int i) {ig += i;}
void main()
{
  f1(1); // f1 is candidate for inlining
#pragma noinline_next (f1)
  f1(2); // f1 is not marked for inlining
  f1(3); // f1 is candidate for inlining
  printf("result is %d\n", ig);
}
Example two (Figure 6) illustrates the use of the #pragma inline_function directive.
All calls to f1() following the #pragma inline_function (f1) directive are forced to be inlined, except the one directly following #pragma noinline_next (f1). The call to f2() following the #pragma inline_next (f2) is also forced to be inlined, while the first call to f2() is only a candidate for inlining (inlining depends on the respective weights of f2() and its caller).
Figure 6. #pragma inline_function example
#include <stdio.h>
int ig = 0;
int jg = 0;
inline void f1(int i) {ig += i ;}
inline void f2(int i) {jg += i ;}
void main()
{
#pragma inline_function (f1)
  f1(1); // f1 is forced to be inlined
  f2(1); // f2 is candidate for inlining
#pragma noinline_next (f1)
  f1(2); // f1 is not marked for inlining
#pragma inline_next (f2)
  f2(3); // f2 is forced to be inlined
  f1(3); // f1 is forced to be inlined
  printf("result is %d %d\n", ig, jg);
}
Example three (Figure 7) illustrates the use of the #pragma defaultinline directive.
Figure 7. #pragma defaultinline example
#include <stdio.h>
int ig = 0;
int jg = 0;
inline void f1(int i) {ig += i ;}
inline void f2(int i) {jg += i ;}
void main()
{
#pragma noinline_function (f1)
  f1(1); // f1 is not marked for inlining
  f2(1); // f2 is candidate for inlining
#pragma inline_next (f1)
  f1(2); // f1 is forced to be inlined
#pragma noinline_next (f2)
  f2(3); // f2 is not marked for inlining
#pragma defaultinline (f1)
  f1(4); // f1 is candidate for inlining
  printf("result is %d %d\n", ig, jg );
}
Example four (Figure 8) illustrates the use of several function names or an empty name list with #pragma directives.
Figure 8. Empty or multiple function name example
#pragma noinline_file ()
int f(int i) { return i+1; }
int g(int i) {
#pragma inline_next (f,g) // ignored
  j += f(i) + f(i); // f is not marked for inlining
}
int h(int i) {
#pragma noinline_next ()
  int j=i + f(i) + g(i); // f and g are not marked for inlining
#pragma inline_next (f,g)
  j+=i + f(i) + g(i); // f and g are forced to be inlined
}
void main()
{
  // g and h are not marked for inlining
  printf("result is %d %d\n", g(0), h(0));
}
Example five (Figure 9) illustrates the use of the noinline attribute and shows how the attribute has precedence over #pragma.
Figure 9. noinline attribute example
#include <stdio.h>
#pragma inline_file(f3)
int ig = 0;
void __attribute__ ((noinline)) f3(int i) { ig += i ; }
int main()
{
  f3(1); // f3 is not marked for inlining
#pragma inline_next(f3)
  f3(2); // f3 is not marked for inlining
#pragma defaultinline (f3)
  f3(3); // f3 is not marked for inlining
  printf("result is %d\n", ig);
}
This section describes how the stxp70cc compiler implements loop unrolling.
4.2.1 Default unrolling policy
The way loops are unrolled depends on the optimization level and on the version and configuration of the core (single or dual ALU/dual issue, and the number of general purpose registers (GPRs)).
Two main parameters are controlled:
• the maximum unrolling factor to be applied
• the maximum size of the loop after unrolling (this size corresponds to the number of instructions in the internal representation rendered by the compiler when unrolling is applied)
The exact parameters used to control unrolling are listed in Table 23.
Table 23. Loop unrolling parameters
Optimization level
-O0, -O1, -Os
-O2
-O3
-O4
-O4
-O4
Core
All
All
All
STxP70-3
STxP70-4-single issue
STxP70-4-dual issue,
16 GPRs
STxP70-4-dual issue,
32 GPRs
2
4
4
Maximum unrolling factor
No unrolling
2
2
Maximum unroll size
No unrolling
32
64
64
64
128
Note: 1 Depending on the internal analysis, the compiler is free to apply an actual unrolling factor which is smaller than the maximum specified for the optimization level and core. This is especially the case if a smaller unrolling factor enables the compiler to avoid the generation of a remainder loop.
2 The #pragma unroll directive takes precedence over the default behavior of the loop unroller.
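To make the notion of a remainder loop concrete, the sketch below shows by hand what a loop unrolled by a factor of 4 can look like. It is an illustration only, not the code the compiler actually generates, and the function name copy_unrolled4 is hypothetical.

```c
/* Copy n ints, written the way a loop unrolled by a factor of 4 might
   look after transformation (hand-written illustration only). */
void copy_unrolled4(int *dest, const int *src, int n)
{
    int i = 0;

    /* Main unrolled loop: four copies per iteration. */
    for (; i + 3 < n; i += 4) {
        dest[i]     = src[i];
        dest[i + 1] = src[i + 1];
        dest[i + 2] = src[i + 2];
        dest[i + 3] = src[i + 3];
    }

    /* Remainder loop: handles the last n % 4 elements. It is only
       needed when n is not known to be a multiple of 4. */
    for (; i < n; i++) {
        dest[i] = src[i];
    }
}
```

If the trip count is known to be a multiple of the unrolling factor, the second loop never executes and can be dropped, which is why a smaller factor that divides the trip count can be preferable.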
4.2.2 Advanced control of the unroller
The following facilities are provided to fine tune loop unrolling:
• the loop unroll pragma #pragma unroll
This pragma can be used to apply a precise unrolling factor to a given loop. This pragma is described in Section 3.2.1: #pragma unroll (n) on page 45.
• the stxp70cc -maggressive_unroll=n option
This option enables the aggressiveness of the unroller to be set. This option takes an integer in the range [0, 6] as an argument. It applies the unrolling parameters described in Table 24.
Table 24. -maggressive_unroll option: values of n
Level   Maximum unroll factor   Maximum unroll size
0       No effect               No effect
1       2                       64
2       2                       128
3       4                       64
4       4                       128
5       8                       64
6       8                       128
The precedence order is as follows:
• #pragma unroll takes precedence over both the default unroller behavior and the -maggressive_unroll option
• the -maggressive_unroll option takes precedence over the default unroller behavior
4.2.4 Built-in assume and pragma loopmod
The built-in __builtin_assume can be used to tell the compiler that the loop count is a multiple of a given integer. This allows the compiler to apply an unrolling factor which does not cause the generation of a remainder loop. This saves code size while often ensuring a better efficiency of the final code.
The following code provides an example where the loop count is stated to be a multiple of 4:
__builtin_assume((lcount&3)==0);
for(i=0; i<lcount; i++) {
*dest=*src;
dest++; src++;
}
The built-in can be easier to integrate in the code than the #pragma loopmod described in Section 3.2.4: #pragma loopmod on page 48.
4.3 Memory dependences in C programs
Precise analysis of memory dependences is key to compilation optimization, since it enables the compiler to more freely schedule load instructions above store instructions. By default, a C compiler assumes that any pair of memory accesses that reference distinct types are not aliased (that is, not memory dependent). However, real world cases almost always involve pointers to the same types that are actually un-aliased: the compiler cannot generally deduce this property and must rely on additional information. This information can be provided either through the C language restrict keyword, or with the compiler option:
-OPT:alias=value
where possible values are listed in Table 25.
Table 25. Possible values of the -OPT:alias option
Value      Description
any        The default. Any pair of memory accesses may be aliased.
typed      Any pair of memory accesses that reference distinct types are not aliased.
unnamed    Assume pointers never point to global objects.
restrict   Assume that different pointers never point to the same area.
disjoint   Assume multiple pointer indirection never overlap.
Although the compiler is able to compute precise memory dependences in many cases, this is not possible when complex memory accesses are involved, such as in the following example:
for (i = 1; i < n; i ++) {
  a[i-1] = a[i] + b[i];
}
for (i = 1; i < n; i ++) {
  c[d[i]] = c[i] + 1;
}
On the first loop, the compiler can fully determine the dependences between memory accesses, provided that it knows that a and b point to distinct memory locations (see the C language restrict qualifier). On the second loop, however, without information on values in d, the compiler assumes that all memory accesses in the loop are dependent. In particular, the sequence of load and store memory accesses in the iterations of the loop must be strictly respected, resulting in a poor instruction schedule if the loop is unrolled or software pipelined.
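The two loops can be wrapped into functions to show where the restrict qualifier would go. This is a sketch under the assumption that C99 restrict is accepted by the compiler in use; the function names shift_add and scatter_inc are hypothetical.

```c
/* First loop of the example: restrict tells the compiler that a and b
   never refer to the same memory, so the dependences between the
   accesses can be fully determined. */
void shift_add(int *restrict a, const int *restrict b, int n)
{
    int i;
    for (i = 1; i < n; i++) {
        a[i - 1] = a[i] + b[i];
    }
}

/* Second loop: c is both read and written, and the store index comes
   from d. Without knowledge of the contents of d, the loads and stores
   on c must be kept in their original order. */
void scatter_inc(int *c, const int *d, int n)
{
    int i;
    for (i = 1; i < n; i++) {
        c[d[i]] = c[i] + 1;
    }
}
```

Note that restrict does not help with the second loop, because the conflicting accesses all go through the same pointer c.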
A useful property for loop optimizations is when a loop is vectorizable. This property can be enforced on a loop by using the #pragma loopdep VECTOR. A vectorizable loop is such that it can be decomposed into a sequence of loops, one per statement of the original loop, without changing the program results. Moreover, for each loop resulting from that decomposition (that contains only one statement), all load memory accesses can be performed before all store memory accesses, which means that a vector version of the loop can be written. In practice, unless the target processor is a real vector processor, the compiler does not decompose vectorizable loops as described. Rather, it uses the vectorizable property of the original loop to remove dependences between memory accesses.
In the example above, the first loop is vectorizable, provided that a and b do not overlap.
The second loop is also vectorizable if the assertion (d[i]<=i) holds for all i.
Another useful property for loop optimizations is when a loop can be parallelized. This property can be enforced on a loop by using the #pragma loopdep PARALLEL. A parallelized loop is one where memory accesses that reference a given memory location may occur only in the same iteration of the loop. As a result, the sequence of memory accesses of the original loop can be changed in any way that preserves the relative order of memory accesses originating from the same loop iteration. Note that a parallelized loop is always vectorizable, so the #pragma loopdep PARALLEL is stronger (but less generally applicable) than the #pragma loopdep VECTOR.
In the example above, the first loop cannot be parallelized. The second loop can be parallelized if the assertion (d[i]==i) holds for all values of i.
The last useful property for loop optimizations is when a loop is liberal. This property can be enforced on a loop by using the #pragma loopdep LIBERAL. A liberal loop is one where all its memory accesses reference unique memory locations. As a result, all the memory accesses in the loop can be freely reordered. Note that a liberal loop can always be parallelized, so the #pragma loopdep LIBERAL is stronger (but less generally applicable) than the #pragma loopdep PARALLEL.
In the example above, the second loop is liberal if the assertion: (d[i]<1 || d[i]>=n) holds for all i. (For clarity, we omitted this case for the VECTOR and PARALLEL pragmas.)
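The three assertions quoted above for the second loop can be checked on a given index array with a small helper. This is a hypothetical sketch: the function name classify_index_array and its return codes are not part of the toolset, and it only tests the conditions stated in this section.

```c
/* Hypothetical helper: reports the strongest assertion from this section
   that the index array d satisfies for the loop
       for (i = 1; i < n; i++) c[d[i]] = c[i] + 1;
   Return values: 3 = liberal, 2 = parallel, 1 = vector, 0 = none. */
int classify_index_array(const int *d, int n)
{
    int liberal = 1, parallel = 1, vector = 1;
    int i;

    for (i = 1; i < n; i++) {
        if (!(d[i] < 1 || d[i] >= n)) liberal = 0;  /* (d[i]<1 || d[i]>=n) */
        if (d[i] != i)                parallel = 0; /* (d[i]==i) */
        if (d[i] > i)                 vector = 0;   /* (d[i]<=i) */
    }

    if (liberal)  return 3;
    if (parallel) return 2;
    if (vector)   return 1;
    return 0;
}
```

A check of this kind is only useful during development, for example in a test harness, since the pragmas themselves are promises made to the compiler and are not verified at run time.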
The restrict qualifier, which applies to pointers or arrays in a C program, is also highly useful to remove dependences between memory accesses inside and outside loops. The restrict property states that two memory accesses originating from different pointers or arrays cannot reference the same memory location, when at least one of the pointers or array has the restrict qualifier. Please note that all memory accesses based on a given restrict pointer or array are still assumed dependent, unless it is obvious to the compiler that they are not, or there is a #pragma loopdep on the loop that applies to these dependences.
4.4 Aliasing rules in C/C++ programs
The -fstrict-aliasing option is enabled by default and allows the compiler to assume the strictest aliasing rules applicable to the language being compiled (the aliasing rules are stated in clause 6.5 (7) of the ISO/IEC Standard (Expressions)).
For C and C++, this activates optimizations based on the type of expressions. In particular, an object of one type is assumed never to reside at the same address as an object of a different type, unless the types are almost the same. For example, an unsigned int can alias an int, but not a void* or a double. A character type may alias any other type.
The type attribute may_alias is also available so that accesses to objects with types with this attribute are not subject to type-based alias analysis. Instead they are assumed to be able to alias any other type of object.
The -fno-strict-aliasing option can be used to disable the default option if required.
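The may_alias type attribute can be sketched as follows, using the GNU syntax. The typedef name aliasing_int and the function name float_with_int_bits are hypothetical, and the example assumes that int and float are both 32 bits wide and that float uses the IEEE 754 single-precision format.

```c
/* An integer type whose accesses are not subject to type-based alias
   analysis. The typedef name is hypothetical. */
typedef int __attribute__((may_alias)) aliasing_int;

/* Overwrite the bits of a float through an aliasing integer pointer.
   Because the pointed-to type carries may_alias, the compiler must
   assume that the store can modify the float object. */
float float_with_int_bits(int bits)
{
    float a = 0.0f;
    aliasing_int *pa = (aliasing_int *)&a;
    *pa = bits;
    return a;
}
```

Under these assumptions, passing 0x40000000 yields 2.0f, since that is the single-precision bit pattern of 2.0.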
Particular attention is required before reporting a compiler issue related to aliasing, specifically when code runs correctly with the -fno-strict-aliasing option, but diverges when the default aliasing option is used. This is often caused by a violation of aliasing rules, which are part of the ISO C/C++ standard. These rules say that a program is invalid if it accesses a variable through a pointer of an incompatible type.
The example shown in Figure 10 demonstrates this violation, where a float is accessed through a pointer to integer.
Figure 10. Aliasing example, using a cast
#include <stdio.h>
int main(int argc, char *argv[])
{
  float a = 0.0f ;
  int *pa = (int *)&a ;
  *pa = 0x40000000; /* violation of aliasing rules */
  if (a != 0.0f)
    puts("LEGACY BEHAVIOR") ;
  else
    puts("STRICT ALIASING BEHAVIOR") ;
  return 0;
}
The aliasing rules were designed to allow compilers to perform more aggressive optimization. Basically, a compiler can assume that all changes to variables happen through pointers or references to variables of a type compatible with the accessed variable. Dereferencing a pointer that violates the aliasing rules results in undefined behavior.
In the case above, the compiler may assume that no access through an integer pointer can change the float a. Therefore, the actual value of a may be unaffected by the write through pa. What really happens is up to the compiler and may change with architecture and optimization level.
To disable optimizations based on alias-analysis for ‘faulty legacy code’, the option -fno-strict-aliasing must be used as a work-around.
Because the practice of reading from a union member other than the one most recently written to (called “type-punning”) is common, type-punning is allowed even with -fstrict-aliasing, provided the memory is accessed through the union type.
To fix the code in Figure 10 above, you can use a union instead of a cast, as shown in Figure 11.
Note: This is a GCC extension which might not work with other compilers.
Figure 11. Aliasing example, using a union
#include <stdio.h>
/*
According to GNU documentation, this code should work in
both strict and non-strict aliasing rules
*/
int main(int argc, char *argv[])
{
  union {
    float f ;
    int i;
  } u;
  u.f = 0.0f ;
  u.i = 0x40000000 ; /* is 2.0f */
  if (u.f != 2.0f)
    puts("NON-GNU BEHAVIOR") ;
  else
    puts("GNU ALIASING BEHAVIOR") ;
  return 0;
}
Now the result is always GNU ALIASING BEHAVIOR.
Finally, to fully respect the ANSI C/C++ aliasing rules, it is necessary to write the data through a character type before reading it again. See Figure 12. The drawback of this standard-conforming solution is that it has to account for endianness, and that it is less efficient than simply writing through an integer.
Figure 12. Aliasing example, writing through a character type
/*
According to ANSI standard, this code should work in
both strict and non-strict aliasing rules
*/
#include <stdio.h>
#define EXTRACTBYTE(val, pos) (((val) >> (pos*8)) & 0xff)
int main(int argc, char *argv[])
{
  union
  {
    float f ;
    char c[4] ;
  } u;
  const unsigned int twoasint = 0x40000000 ;
  u.f = 0.0f ;
#if defined(__BIG_ENDIAN__)
u.c[0] = EXTRACTBYTE(twoasint, 3) ;
u.c[1] = EXTRACTBYTE(twoasint, 2) ;
u.c[2] = EXTRACTBYTE(twoasint, 1) ;
u.c[3] = EXTRACTBYTE(twoasint, 0) ;
#elif defined(__LITTLE_ENDIAN__)
u.c[0] = EXTRACTBYTE(twoasint, 0) ;
u.c[1] = EXTRACTBYTE(twoasint, 1) ;
u.c[2] = EXTRACTBYTE(twoasint, 2) ;
u.c[3] = EXTRACTBYTE(twoasint, 3) ;
#else
#error "Unknown endianness : please define either __BIG_ENDIAN__ or __LITTLE_ENDIAN__"
#endif
if (u.f != 2.0f)
puts("UNEXPECTED BEHAVIOR") ;
else
puts("ANSI ALIASING BEHAVIOR") ;
return 0;
}
In this case, the program always prints “ANSI ALIASING BEHAVIOR” regardless of the compiler and its optimization options.
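As a complementary sketch, not taken from this manual, the same byte-wise copy can be delegated to the standard memcpy() function, which copies the object representation without the program having to spell out the byte order. The function name float_from_bits is hypothetical, and the code assumes that unsigned int and float are both 32 bits wide with an IEEE 754 single-precision float.

```c
#include <string.h>

/* Build a float from a 32-bit pattern by copying its bytes.
   Assumes unsigned int and float have the same size (32 bits). */
float float_from_bits(unsigned int bits)
{
    float f = 0.0f;
    memcpy(&f, &bits, sizeof f);   /* byte copy, no type-based aliasing issue */
    return f;
}
```

Since both objects are stored in the native byte order of the target, no endianness test is needed in this variant. Whether the copy is optimized into a single move depends on the compiler and the optimization level.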
4.5 Profiling
Before optimizing any application, we recommend that you analyze the critical areas of your code to identify where optimization will have the most effect.
Profiling creates an instrumented program from your source code. Whenever this instrumented code is executed, the program generates an information file that can be displayed using the stxp70-gprof utility, supplied with the toolset.
Warning: Note that the functions in the toolset libraries (most especially, the standard C library) are not instrumented for intrusive profiling. Therefore, the time and cycles spent in the library functions are assigned to the caller functions in the application.
This section is not a complete guide to profiling, but a brief refresher on how to proceed with the compiler.
4.5.1 Profiling data generation
Profiling is enabled by the -pg compiler option. For example:
stxp70cc -O2 -pg *.c -o myexe
4.5.2 Using profiling data
The first run of a program compiled using the -pg option generates a file called gmon.out.000. This file can be viewed with the stxp70-gprof utility.
After each run in the same directory, the numerical suffix of gmon.out.000 is incremented. The profile information for the next run is therefore gmon.out.001, and so on.
Note that a second file named stprof.out.xxx is also created. This file provides timing measurements related to the call tree. The following data are available:
• basic_time: only the time spent within a function
• callcost_time: time spent in the function and its children
• count: number of function calls
The symbolic information available in the profile information can be augmented by using the -g option when compiling the source code.
Users who are familiar with the standard gprof tool may use gprof to read the profiling output file. In this case, it is necessary to pass the option --graph to the tool:
gprof --graph myexe gmon.out.000
4.5.3 Special case of programs that never exit
Usually, profiling data are generated at program exit. Many embedded applications, however, are built as infinite loops and thus never exit. To enable profiling of such applications, the toolset provides a dedicated function named UserProfilingWrite().
When this function is called, it updates the following profiling output files:
• call-graph file gmon.out.xxx
• time profiling file stprof.out.xxx
In those file names, xxx stands for a magic number that is incremented each time this profiling function is called. It is only possible to use the function UserProfilingWrite() if the correct toolset header file gprof.h is included in the source code:
#include <gprof.h>
Warning: We recommend that you use UserProfilingWrite() outside critical or very often executed loops. It should be called only a few times in a program. Be aware that a call to this function may have side effects on compiler optimizations, and may therefore bias results if placed in critical parts of the code.
The profiling functions make use of the 64-bit cycle counters of the STxP70 core, and the value of the counter is read each time a function is entered and exited. Therefore, using those counters must be avoided when profiling is enabled. The predefined profiling macro __LIBGPROF_CYCLE_PROFILING (which is automatically defined when the -pg option is set) can be used to protect the user-defined instrumentation code based on cycle counters.
The small code sample below illustrates how to use this macro to avoid conflict between profiling and user instrumentation involving cycle counters:
#ifndef __LIBGPROF_CYCLE_PROFILING
clrcc();
startcc();
#endif
Usually, program instrumentation dedicated to profiling does not require any more heap bytes than specified in the standard link script. However, in some specific applications – in particular when involving a large number of routines – the standard heap size may be too small. If this happens, the following message can appear at application run-time:
ERROR : profiling : cannot malloc profiling stack of XXX bytes: please increase heap!
To overcome this problem, edit the link script file associated with your application and increase the padding of the .heap section. By default, the .heap section contribution line is:
.heap ALIGN(16) PAD(64K) NOINIT : { } > EXTSM
This means that the .heap section base is aligned on an address that is a multiple of 16, is 64 Kbytes in size and is not zero-initialized at startup. Moreover, this section is located in the EXTSM memory region. To increase the padding of this contribution, replace 64K with a larger value, depending on the XXX amount reported in the error message above.
Please note that if you do not specify a link script on your link command-line, the sx_valid.ld file used by default is the one located in the folder:
<Toolset_Root>/arch_v3/stxp70cc/<stxp70cc_version>/lib/ldscript
Copy this file into your application project, modify its content according to statements above and add it to your link command.
The toolset provides several options to generate test coverage data that can be used with the GNU gcov test coverage program. Both the -ftest-coverage and -fprofile-arcs options produce data files that can then be input to gcov. See the Using the GNU Compiler Collection (GCC) manual provided with this product for a description of how to apply code coverage techniques.
Table 26. Standalone inlining options
Option Description
-fbranch-probabilities
Re-compile a program that has already been compiled with the -fprofile-arcs option. The -fbranch-probabilities option instructs the compiler to optimize using estimated branch probabilities generated by -fprofile-arcs.
-fcoverage-counter64
Instruct the compiler to use a 64-bit edge counter instead of the default 32-bit counter. Each counter is saved as 64 bits and so the output can still be used with any gcov utility. Use this option if you think a statement is executed more than 2^32 times.
-fprofile-arcs
Instrument the "arcs" of the program flow during compilation. For each function of your program, stxp70cc creates a program flow graph, then finds a spanning tree for the graph. Only arcs that are not on the spanning tree have to be instrumented; the compiler adds code to count the number of times that these arcs are executed.
-fprofile-arcs also makes it possible to estimate branch probabilities, and to calculate basic block execution counts. In general, basic block execution counts alone do not give enough information to estimate all branch probabilities.
When the program exits, -fprofile-arcs saves a list of arcs in the program flow graph to a file called sourcename.gcda. gcov can reconstruct the program flow graph and compute all basic block and arc execution counts from the information in this file.
Use the compiler option -fbranch-probabilities when recompiling to apply further optimizations.
-ftest-coverage
Create a data file for the GNU gcov code coverage utility. The name of the data file begins with the name of your source file: sourcename.gcno. It contains a mapping from basic blocks to line numbers, which gcov uses to associate basic block execution counts with line numbers.
Note: When recompiling, you must use the same code generation and optimization options for both compilations. The only difference allowed is to replace -fprofile-arcs with -fbranch-probabilities.
When running interprocedural analysis, all the sources are merged into a single file (or several files for large programs). Therefore, the compiler is unable to know which procedure belongs to which .c or .cxx file, and the correspondence between a .c or .cxx file and a .gcno or .gcda file can no longer be established. The name of the .gcda and .gcno files is the name of the final executable, plus "_", plus the number of the .s file that IPA has created. Since all the original .c or .cxx filenames are saved in the .gcno file, gcov is able to associate each procedure with a source file.
You will need a copy of gcov with a version number higher than or equal to 3.4.4.
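The reason for the -fcoverage-counter64 option can be illustrated with a small sketch. It is plain C rather than toolset code, the function names are hypothetical, and it only models how a 32-bit and a 64-bit execution counter behave once the count reaches 2^32:

```c
#include <stdint.h>

/* Hypothetical model of a default 32-bit edge counter: it wraps to 0
 * once the count would reach 2^32. */
uint32_t bump_counter32(uint32_t counter)
{
    return (uint32_t)(counter + 1u);
}

/* Hypothetical model of a 64-bit edge counter: the same increment
 * keeps counting past 2^32. */
uint64_t bump_counter64(uint64_t counter)
{
    return counter + 1u;
}
```

Incrementing a 32-bit counter that already holds 4294967295 yields 0, so the reported execution count would be misleading, whereas the 64-bit counter yields 4294967296.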
This section describes the use of the options -finstrument-functions and -minstrument-calls.
The -finstrument-functions option provides standard GCC functionality. Using this option generates instrumentation calls for entry and exit to functions.
Just after function entry and just before function exit, the following profiling functions are called with the address of the current function and its call site:
void __cyg_profile_func_enter (void *this_fn, void *call_site);
void __cyg_profile_func_exit (void *this_fn, void *call_site);
The first argument is the address of the start of the current function. This may be looked up specifically in the symbol table.
The second argument is the address of the call site from where the current function was invoked. It corresponds to an address in the range of the caller function addresses that may be found in the symbol table of the executable.
Functions that are inlined by the compiler are not instrumented. To force instrumentation of all functions, use the -fno-inline option to disable inlining.
A function may be given the attribute no_instrument_function, in which case instrumentation is not done for this function. This can be used, for example, for the profiling functions listed above, high-priority interrupt routines, and any functions from which the profiling functions cannot safely be called (perhaps signal handlers, if the profiling routines generate output or allocate memory).
The program must be linked with an object file that implements the two functions above to link correctly.
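As an illustration of this requirement, the following is a minimal sketch of an object file that provides the two hooks. Only the two hook prototypes come from the text above; the counters and the accessor functions are hypothetical names introduced for the example. The hooks and accessors carry the no_instrument_function attribute so that they are never instrumented themselves.

```c
#include <stddef.h>

/* Hypothetical bookkeeping, not part of the toolset. */
static unsigned long enter_count;
static unsigned long exit_count;
static void *last_entered_fn;

#define NO_INSTRUMENT __attribute__((no_instrument_function))

void __cyg_profile_func_enter(void *this_fn, void *call_site) NO_INSTRUMENT;
void __cyg_profile_func_exit(void *this_fn, void *call_site) NO_INSTRUMENT;
unsigned long profile_enter_count(void) NO_INSTRUMENT;
unsigned long profile_exit_count(void) NO_INSTRUMENT;
void *profile_last_entered(void) NO_INSTRUMENT;

/* Called just after entry to each instrumented function. */
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)call_site;            /* call site is not used in this sketch */
    last_entered_fn = this_fn;
    enter_count++;
}

/* Called just before exit from each instrumented function. */
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    exit_count++;
}

/* Hypothetical accessors used to read the counters back. */
unsigned long profile_enter_count(void) { return enter_count; }
unsigned long profile_exit_count(void)  { return exit_count; }
void *profile_last_entered(void)        { return last_entered_fn; }
```

The hooks deliberately avoid output and memory allocation, which keeps them safe to call from most contexts; a real implementation might record the addresses in a buffer for post-processing against the symbol table.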
Note: The option -minstrument-calls is not a standard GCC option.
Use this option to generate instrumentation calls just before, and just after each function call.
The following profiling function is called with the address of the caller function and the address of the callee function:
void __profile_cal(void *caller_fn, void *callee_fn, const char *caller_name, const char *callee_name, int event);
The arguments to this function are as follows:
caller_fn
This is the address of the start of the current function (the caller function), which can be looked up specifically in the symbol table.
callee_fn
This is the address of the start of the called function (the callee function), which can be looked up specifically in the symbol table.
caller_name
This is the name of the caller function.
callee_name
This is the name of the callee function, or NULL if the call is an indirect call.
The function names passed in the third and fourth arguments are pointers to static strings that have the lifetime of the instrumented executable or shared object.
The function names are the mangled names in C++.
event
This is 0 when this function is invoked just before a call, instrumenting a function entry. It is 1 when this function is invoked just after a call, instrumenting a function exit.
Function calls that are inlined by the compiler are not instrumented.
To force instrumentation of all functions use the -fno-inline option to disable inlining.
A function may be given the attribute no_instrument_function. This instrumentation is not done if either the caller or the callee function has the attribute no_instrument_function.
The program must be linked with an object file that implements the function above to link correctly.
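A minimal sketch of such an object file is shown below. The function name and prototype are copied exactly as printed above and should be checked against the toolset before use; the counters and accessors are hypothetical names introduced for the example. The sketch counts calls and returns, and uses the NULL callee name to detect indirect calls.

```c
#include <stddef.h>

/* Hypothetical bookkeeping, not part of the toolset. */
static unsigned long calls_started;
static unsigned long calls_finished;
static unsigned long indirect_calls;

#define NO_INSTRUMENT __attribute__((no_instrument_function))

/* Prototype as printed in this manual. */
void __profile_cal(void *caller_fn, void *callee_fn,
                   const char *caller_name, const char *callee_name,
                   int event) NO_INSTRUMENT;
unsigned long profile_calls_started(void) NO_INSTRUMENT;
unsigned long profile_calls_finished(void) NO_INSTRUMENT;
unsigned long profile_indirect_calls(void) NO_INSTRUMENT;

void __profile_cal(void *caller_fn, void *callee_fn,
                   const char *caller_name, const char *callee_name,
                   int event)
{
    (void)caller_fn;
    (void)callee_fn;
    (void)caller_name;

    if (event == 0) {
        /* Invoked just before the call. */
        calls_started++;
        if (callee_name == NULL)
            indirect_calls++;   /* NULL name means an indirect call */
    } else {
        /* Invoked just after the call (event is 1). */
        calls_finished++;
    }
}

/* Hypothetical accessors used to read the counters back. */
unsigned long profile_calls_started(void)  { return calls_started; }
unsigned long profile_calls_finished(void) { return calls_finished; }
unsigned long profile_indirect_calls(void) { return indirect_calls; }
```

Because the names are passed as static strings, a real implementation could store the pointers directly without copying them.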
The main differences with the -finstrument-functions option are listed below.
• This instrumentation tracks (caller, callee) address pairs instead of (call_site, callee) address pairs. If the call site information is required, use the -finstrument-functions option.
• This instrumentation provides the caller and callee names when available, which avoids a specific post-processing pass to retrieve the function names.
• This instrumentation is at the call site and not in the callee. Therefore, for instance, calls to top-level library functions (which are not instrumented) are seen, while the option -finstrument-functions does not see them. To disable the instrumentation of the call to a particular library routine, you must declare it with the no_instrument_function attribute.
• This instrumentation is not standard GCC functionality.
4.8 Interprocedural analysis optimization (IPA)
The -ipa option enables interprocedural analysis. With this option enabled, the compiler identifies opportunities for optimization across module boundaries. It does this by extending its scope for optimization and inlining from a single module to multiple modules.
Warning: The -ipa option in addition to the required optimization level must be included in both the compiler and linker phases.
The major benefits of IPA are:
• interprocedural constant propagation
• interprocedural alias analysis
• inter-module inlining
• interprocedural placement of data in specific memory spaces. On the STxP70, the possible spaces are DA and SDA. These can be controlled manually using options and attributes already described (see also Table 6: Generic options with -M flag on page 18). The command line options that control manual memory placement (such as -Mda and -Msda) are ignored when automatic placement is enabled.
A more advanced use of IPA is function specialization (also known as cloning).
The only mandatory option to trigger IPA compilation is -ipa.
The compilation and link time is longer because much of the optimization work is driven from the linker. This can be observed by using the -v compiler option.
The following steps are performed when building an executable in IPA mode:
• the .c files are translated into special .o files
• the .o files are merged together (code, symbol table)
• the .o files are analyzed and optimized
• the final link is performed
Because IPA mode optimizations are carried out by the linker as well as the compiler, the optimization is carried out only if the appropriate command line options are passed to both the linker and the compiler. It may, therefore, be necessary to modify the Makefile accordingly.
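The kind of opportunity that interprocedural constant propagation and inter-module inlining look for can be sketched as follows. The file and function names are hypothetical, and both "modules" are shown in a single listing for brevity. Whether the compiler actually applies these transformations depends on the options and heuristics described in this section.

```c
/* Hypothetical module filter.c: the gain parameter is a formal
 * parameter of a function defined in one module. */
int scale_sample(int sample, int gain)
{
    return sample * gain;
}

/* Hypothetical module main.c: every caller in the program passes the
 * same constant for gain. Without IPA the compiler only sees one module
 * at a time and cannot know this; with IPA, the constant 4 could be
 * propagated into scale_sample(), or the call could be inlined across
 * the module boundary. */
int process_left(int sample)
{
    return scale_sample(sample, 4);
}

int process_right(int sample)
{
    return scale_sample(sample + 1, 4);
}
```

The observable results are the same with or without IPA; only the generated code is expected to differ.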
4.8.2 IPA command line options
Table 27 describes advanced IPA options.
Table 27. Advanced IPA options
Option Description
-dryipa
The -dryipa option replaces the -dryrun option, which is no longer relevant for IPA. The -dryipa option dumps details of the different steps invoked by the driver.
-IPA:aggr_cprop=ON|OFF
Enable or disable aggressive inter-procedural constant propagation. This option attempts to avoid passing constant parameters, replacing formal parameters by their corresponding constant values. The default is ON.
-IPA:cgi=ON|OFF
Enable or disable constant global variable identification. This option marks non-scalar global variables that are never modified as constants, and propagates their constant values to all files. The default is ON.
-IPA:cprop=ON|OFF
Enable or disable inter-procedural constant propagation. This option identifies formal parameters which always have a specific constant value. The default is ON. See also -IPA:aggr_cprop.
-IPA:depth=n
This option is identical to -IPA:maxdepth=n.
-IPA:dfe=ON|OFF
Enable or disable dead function elimination. This option removes subprograms which are never called from the program. The default is ON.
-IPA:dve=ON|OFF
Enable or disable dead variable elimination. This option removes variables which are never referenced from the program. The default is ON.
-IPA:forcedepth=n
Set inline depths. Instead of the default inlining heuristics, this option directs IPA to attempt to inline all functions at a depth of (at most) n in the call graph, where functions which make no call are at depth 0, those which call only depth 0 functions are at depth 1, and so on. This ignores the default heuristic limits on inlining.
-IPA:inline=ON|OFF
Perform inter-file subprogram inlining during main IPA processing. The default is ON.
-IPA:keeplight=ON|OFF
Direct IPA not to send -keep to the compiler, in order to save disk space. The default is ON. Setting it to OFF leaves intermediate files in a directory which has the name of the final executable but suffixed with .ipakeep.
-IPA:maxdepth=n
Direct IPA not to attempt to inline functions at a depth of more than n in the call graph, where functions which make no call are at depth 0, those which call only depth 0 functions are at depth 1, and so on. Inlining remains subject to overriding limits on code expansion. See also forcedepth, space and plimit.
-IPA:mem_placement=ON|OFF
Enable or disable automatic placement of variables into the special SDA and DA memory spaces. This STxP70 specific optimization results in a more efficient address construction in the use of GP-based instructions. Default is ON when optimization level is O2 or higher (O2, O3, O4 and Os), OFF otherwise. Command line options that control the manual memory placement are ignored when automatic memory placement is enabled.
-IPA:mem_array=ON|OFF
Enable or disable automatic placement of array variables into special memory spaces.
-IPA:mem_struct=ON|OFF
Enable or disable automatic placement of structure variables into special memory spaces.
-IPA::SDAspace=n
Set the size of the SDA memory space to n bytes (the default is 4096).
-IPA::DAspace=n
Set the size of the DA memory space to n bytes (the default is 32768).
-IPA:multi_clone=n
Specify the maximum number of clones that can be created from a single procedure. By default, this value is 0. Aggressive procedure cloning may provide opportunities for interprocedural optimization, but it also may significantly increase the code size.
-IPA:node_bloat=n
When used in conjunction with -IPA:multi_clone, this option specifies the maximum percentage growth (n) of the total number of procedures relative to the original program.
-IPA:plimit=n
Stop inlining in a particular subprogram when it reaches a size of n bytes in the intermediate representation. The default is 2500.
-IPA:space=n
Stop inlining when the program size has increased by n%. For example, space=20 limits code expansion due to inlining to approximately 20%. The default is 100%.
-IPA:specfile=filename
Open filename to read more options. A spec file contains zero or more IPA options.
4.8.3 Limitations and special cautions
IPA and debug options
IPA optimization is not compatible with the -g compiler option. If both options are passed to stxp70cc, then the -ipa option is automatically disabled by the driver, and debugging information is generated.
IPA and compilation stages
The full benefit of IPA optimization is obtained only if both the compilation and the link stages receive the -ipa option and the optimization level on the command line. This is particularly important when existing makefiles have separate stages and flags for compilation and link stages.
IPA memory placement versus options and attributes
The manual placement of variables in the special memory spaces takes precedence over the automatic placement. The automatic placement takes precedence over the command line options that control manual memory placements. For instance:
• the automatic placement does not operate on a variable if an attribute instructs the compiler to place it manually in a specific memory space
• if the memory spaces are already filled with variables placed manually as a consequence of either attribute, the automatic placement has no effect
• if manual memory placement and automatic memory placement options are passed to the compiler, then the options that control the manual memory placement are ignored
This section describes the stxp70cc command line options for controlling floating-point.
The IEEE754 standard defines two types of floating-point representation:
• The "single precision" is a 32-bit representation. It corresponds to the float data type in C.
• The "double precision" is a 64-bit representation. It corresponds to the double data type in C.
By default, a C compiler considers that floating-point calculations must be performed with double precision, unless explicitly specified by the programmer. Furthermore, if any 32-bit floating-point data is encountered in a floating-point calculation, it is promoted to 64-bit precision. This aims at ensuring that the maximum precision is preserved.
Syntax
In a program which must only use 32-bit floating-point arithmetic, a programmer should:
• declare all floating-point variables as 32-bit variables, that is "float"
• use only 32-bit floating-point constants, that is, use the "F" suffix (for example, "5.3F" is interpreted as a 32-bit constant, whereas "5.3" is considered as a 64-bit constant).
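The effect of the "F" suffix on the type of an expression can be checked directly in standard C. The sketch below uses hypothetical function names and relies only on sizeof, which reports the size of the type in which the multiplication is carried out. It assumes the default compiler behavior, where double is wider than float.

```c
#include <stddef.h>

/* The constant 5.3 has type double, so the float operand is promoted
 * and the product has type double. */
size_t product_size_unsuffixed(float a)
{
    return sizeof(a * 5.3);
}

/* The constant 5.3F has type float, so the product stays in float. */
size_t product_size_suffixed(float a)
{
    return sizeof(a * 5.3F);
}
```

With the default settings the first function returns sizeof(double) and the second returns sizeof(float). The operand of sizeof is not evaluated, so no floating-point code is generated for these checks.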
Limitation and options
When the mechanism for controlling floating-point precision is only implemented by syntax, this can cause problems:
• many programmers are not aware that floating-point constants without the F suffix are interpreted as 64-bit constants
• if the whole precision of a program needs to be modified, then all types and constants may have to be changed, which may be tedious
The option -fshort-double is to be used to change the default behavior of the compiler, and assume that floating-point arithmetic must be carried out in 32-bit arithmetic, even if "double" types or constants without the F suffix are used.
The option -mlib-short-double is to be used when specific libraries are provided to support short double code generation. On the STxP70, this option is deprecated, since it is forced to fit the default code generation setting. It is preserved mainly for legacy reasons.
4.9.3 Use of STxP70 with FPx
On any core without specific floating-point support, performing floating-point calculations in 32-bit or 64-bit arithmetic mainly results in calling different runtimes, or in different expansion of floating-point operations. This has a limited impact on performance.
On cores with 32-bit floating-point support, the problem is different. A program with 64-bit floating-point arithmetic cannot use the floating-point support of the core, which means that it will call the runtime instead. This is the case for the STxP70 with the FPx floating-point extension.
In other words, the FPx can be used efficiently only when floating-point arithmetic is 32-bit.
This is why it is highly recommended to use the option -fshort-double when the FPx is used, because it ensures that all floating-point computations are performed using 32-bit precision.
From the STxP70 toolset 4.1.0 onwards, a warning is emitted if the FPx is used without this option.
On the STxP70, -mlib-short-double is deprecated and no longer has any effect. It is still recognized for legacy reasons.
4.9.4 Examples of floating-point arithmetic on the STxP70
Example 1: effect
Consider the following function:
float fct (float A)
{
return A * 5.3;
}
If this code is compiled with the options -O3 -Mextension=fpx, then the compiler generates the following code:
.global fct
fct:
pushrl LK ;;
subu R15, R15, 4 ;;
.LEH_post_adjust_sp_fct_1:
callr __stod ;;
L_BB2_fct:
make R2, 13107 ;;
more R2, 13107 ;;
make R3, 16405 ;;
more R3, 13107 ;;
callr __muld ;;
L_BB3_fct:
addu R15, R15, 4 ;;
poprl LK ;;
jr __dtos ;;
Note: Because the default compiler behavior is 64-bit floating-point, the constant is considered 64-bit, and the whole calculation is promoted to 64-bit. As a consequence, the multiplication is performed by the 64-bit runtime. The FPx cannot be used although this was specified in the command line.
Example 2: adding -fshort-double
Adding the option -fshort-double to the command line modifies the default behavior of the compiler and the floating-point calculations are all performed in 32-bit. The resulting code now makes use of the FPx:
.global fct
fct:
L_BB1_fct:
make R0, 16553 ;;
more R0, 39322 ;;
fmvr2f F1, R0 ;;
fmul F0, F0, F1 ;;
rts ;;
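The immediates in the two listings can be related to the IEEE754 encodings of the constant. The sketch below uses hypothetical helper names and assumes a host with IEEE754 float and double. It extracts the bit patterns of 5.3 and 5.3F and splits them into 16-bit halves. It also assumes, from the values alone rather than from the instruction set documentation, that make and more each supply one 16-bit half of a register.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw 64-bit encoding of a double. */
uint64_t double_bits(double value)
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Return the raw 32-bit encoding of a float. */
uint32_t float_bits(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Upper and lower 16-bit halves of a 32-bit word. */
uint16_t high_half(uint32_t word) { return (uint16_t)(word >> 16); }
uint16_t low_half(uint32_t word)  { return (uint16_t)(word & 0xFFFFu); }
```

The double 5.3 is encoded as 0x4015333333333333, whose 16-bit pieces are 16405 (0x4015) and 13107 (0x3333), the values seen in Example 1. The float 5.3F is encoded as 0x40A9999A, whose halves are 16553 (0x40A9) and 39322 (0x999A), the values seen in Example 2.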
Example 3: specifying 32-bit floating-point using only syntax
Alternatively, the same result could be reached by modifying the source code as follows, and compiling without the -fshort-double option:
float fct_float (float A)
{
return A * 5.3F;
}
When -fshort-double is used, "double" data types are interpreted as 32-bit floating-point. This means that the following function, compiled with -O3 -Mextension=fpx -fshort-double, leads to the same result as in Example 2: adding -fshort-double, and thus effectively makes use of the FPx:
double fct (double A)
{
return A * 5.3;
}
This section introduces and describes application configuration files (ACF), which facilitate the fine tuning of compiler options in files and functions.
4.10.1 General description and purpose
Note:
Open64-based compilers do not allow fine-grained option settings. This means that, except for pragmas and attributes (such as inlining) that are already implemented, compiler options apply to all functions in a file, and to all files on a command line.
When IPA is not enabled, this limitation can be partly worked around:
• by using different command lines to generate object files
• by splitting code into different files if particular functions must be compiled with different options
On STxP70-v4, some optimizations are performed at linker or post-link level. Those optimizations can depend on compilation options. Applying different options at compiler level and at linker or post-link level must be done with caution.
In any case, when IPA is enabled, this workaround cannot be applied. This may be problematic for the debugging and fine tuning of large applications. Nor is the workaround easy to implement in the context of the STWorkbench.
The application configuration files (ACF) have thus been implemented to apply specific compiler options to the different files and functions. The full set of options to be applied to the files and functions of the same application is called a configuration. An application configuration file can define several configurations, corresponding to different tuning scenarios. Those configurations can then be selected by a dedicated compiler option.
Principles and overview of the implementation
The implementation of application configuration files takes place directly at compiler driver level. It allows fine-grained control of options at global level, file level and function level.
An application configuration file contains structured information to be attached to the corresponding functions or files.
It is read by the compiler if specified by a dedicated option. Then it is parsed by the driver, which applies the options at the requested level.
An ACF reproduces part of, or the whole of the application it is designed for, by listing files and functions names in a configuration. It can contain several configurations, and only one will be active during a compilation phase.
Figure 13 shows an example of an application configuration file.
Figure 13. Example application configuration file
configuration "c1" { // Starts the definition of a configuration called c1
  -Os // Option defined for all the application
  file "f1" { // Configuration specification for file f1
    -O3 // In file f1, use speed optimization level
    function "foo" { // Configuration specification for function foo
      -O2
      -CG:if_conv=false // In function foo, disable if-conversion
    }
  }
}
configuration "c2" { // Other configuration
  -O3
}
active configuration "c1"
In Figure 13, notice the definition of two possible configurations "c1" and "c2".
• If configuration "c1" is applied, then all files are compiled with the -Os option, except file "f1", which is compiled with the option -O3. Furthermore, function "foo" in file "f1" is compiled with the option -O2, and if-conversion is disabled.
• If configuration "c2" is applied, then all files are compiled with option -O3, without any exception.
• By default, configuration "c1" is applied as the active configuration. The configuration "c2" can be activated by a dedicated compiler option (see Specifying the active configuration).
Listing files or functions
It is possible to use a list of files or functions in a configuration, if several files (or functions) have to be compiled with the same set of options. The asterisk wildcard character "*" can be used in the names of files (or functions) to match several names with one pattern. For example, an ACF could contain a section such as the one shown in Figure 14.
Figure 14. Listing files and functions
file "f*" { // Configuration specification for all files with a name starting with 'f'
  -Os // In those files, use size optimization level
  function "foo1" "foo2" "foo3" { // Configuration specification for functions foo1, foo2 and foo3
    -O3
  }
}
In this case, all files whose name is prefixed by an "f" are compiled with the option -Os.
Functions "foo1", "foo2", "foo3" are compiled with the option -O3.
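The way a pattern such as "f*" selects names can be sketched with a small matcher. This is only an illustration of asterisk matching in general: the function name is hypothetical, and the exact matching rules applied by the driver are not specified in this section.

```c
#include <stdbool.h>

/* Hypothetical helper: return true if name matches pattern, where '*'
 * in the pattern stands for any sequence of characters (including
 * none) and every other character must match literally. */
bool acf_name_matches(const char *pattern, const char *name)
{
    if (*pattern == '\0')
        return *name == '\0';

    if (*pattern == '*') {
        /* Either '*' matches nothing, or it absorbs one character
         * of the name and we try again. */
        if (acf_name_matches(pattern + 1, name))
            return true;
        return *name != '\0' && acf_name_matches(pattern, name + 1);
    }

    return *pattern == *name && acf_name_matches(pattern + 1, name + 1);
}
```

Under these assumptions, the pattern "f*" matches "f1" and "filter" but not "g1", and a name without an asterisk such as "foo1" matches only itself.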
configuration_file ::= configuration_file configuration | configuration_file active_configuration
active_configuration ::= active configuration string
configuration ::= configuration string { one_configuration } | configuration string { }
one_configuration ::= one_configuration file_conf | one_configuration global_option | file_conf | global_option
global_option ::= options
options ::= <list of compiler options>
file_conf ::= file files_name { one_file_conf } | file files_name { }
files_name ::= files_name string | string | <nothing>
one_file_conf ::= one_file_conf func_conf | one_file_conf file_option | func_conf | file_option
file_option ::= options
func_conf ::= function files_name { one_func_conf }
one_func_conf ::= one_func_conf option | <nothing>
string ::= " <characters> "
4.10.4 Using the ACF
Compiling with an ACF
The option -macf-decl can be used to instruct the compiler to read and use an ACF:
stxp70cc -macf-decl my_acf.acf
The driver then parses the given file and applies the defined options at the requested level, provided that a default configuration is defined in the file.
Note: Options defined in a configuration file take precedence over options defined on the command line (or in an STWorkbench session).
Specifying the active configuration
The active configuration can be specified by two different means:
• Using the dedicated keyword in the ACF:
active configuration "string"
For example:
active configuration "c1"
• Using the compiler option:
-macf-active string
For example:
stxp70cc -macf-decl my_acf.acf -macf-active c1
Note: The -macf-active option takes precedence over the active configuration keyword in the ACF.
Some warnings are emitted if no active configuration can actually be selected and applied. In this case the ACF is ignored.
Creation of the ACF template
Even if the syntax is quite simple, writing the ACF for a large application can be tedious. It is therefore possible to automatically create the template of the ACF to be used on a given application by using the dedicated option -macf-template.
For example, the following command creates the template of the ACF needed to compile an application implemented in four source files; the template is created with the fixed name template.acf:
stxp70cc -macf-template file1.c file2.c file3.c main.c
This file lists all files and functions present in the application in a single configuration, with no specific option. It also defines this configuration as the default one and names it "c1".
The file template.acf is created locally, in the compilation directory. If a file with this name already exists in this folder, the new content may be appended.
The template file remains incomplete until the link stage is run. This enables subsequent compilation steps to append to it. It is only when the template is linked that it is closed and cannot be appended to any further. The mechanism for appending to and closing the template file is described further in Section 4.10.5: Behavior of -macf-template option.
Summary
There are three ways to handle an ACF, demonstrated by the following examples:
• stxp70cc -macf-decl acf_filename.acf
Reads acf_filename.acf as an ACF, using the default configuration declared in the file as the active configuration.
• stxp70cc -macf-decl acf_filename.acf -macf-active c1
Reads acf_filename.acf as an ACF file, and uses the command line option to define the active configuration as c1. Configuration "c1" must be defined in the ACF acf_filename.acf.
• stxp70cc -macf-template source_file1.c source_file2.c source_file3.c source_main.c
Generates the ACF template for the application implemented by the source files specified. The source files must be linkable, and the compilation must include a link stage to ensure that the template is complete. For example:
stxp70cc -macf-template source_file1.o source_file2.o source_file3.o source_main.o
4.10.5 Behavior of -macf-template option
Note:
The use of the -macf-template option is introduced in Creation of the ACF template on page 85.
The configuration defined and considered as the default in the template file is always named "c1".
The behavior of the -macf-template option depends on whether a template file already exists and also on whether it is considered complete and closed.
If the template.acf file is generated by one or more compilations without a link stage, the template file remains incomplete (and unusable) until the link stage is run.
Case 1: template.acf does not exist
1. The following command is issued to create a file template.acf:
stxp70cc -c -macf-template foo1.c
– this template contains the definition of a configuration "c1" for file foo1.c and all functions therein
– the closing bracket for "c1" is missing, and the default configuration is not declared
2. The following command is now used to create a template for the file foo2.c:
stxp70cc -c -macf-template foo2.c
– we have the pre-existing file template.acf created by the command in step 1
– this new command appends the information related to file foo2 and all functions therein to the configuration "c1" of the pre-existing file template.acf
– the closing bracket for "c1" and the declaration of the default configuration are still missing
3. Finally the following command is used to close the template and link it:
stxp70cc -macf-template foo1.o foo2.o
– this last command only invokes the link stage. The file template.acf is closed, with "c1" declared as the default configuration
Steps 1 to 2 above generate the same file template.acf as the equivalent single command:
stxp70cc -macf-template foo1.c foo2.c
Case 2: template.acf exists and is closed
If the creation of a template is run with an existing, complete and closed template.acf file in the current folder, then the syntax will be invalid, and the parser will reject the resulting configuration file with an error message.
Makefiles
Compilation through a makefile performs independent calls to the compiler to generate object files before linking. In this context, the generation of an ACF template requires an incremental behavior. The template generation mechanism tests whether the template file template.acf exists in the compilation directory. If it exists, it opens it in append mode. Otherwise, it creates it. At the linker or archive creation stage, the following actions are performed:
• the template file is closed from a syntactical point of view (close of last '}', and the active configuration lines are written)
• buffer and file are closed from a file system point of view
If the compilation does not end with a linker or archive creation stage (only use of the -S or -c option), then the buffer is flushed, the file is closed, but the file is not closed from a syntactical point of view. Since it does not end with the expected pattern, the corresponding template is not usable.
4.10.6 Scope and known limitations
Compiler options
Most stxp70cc compiler options, both external and internal, can be used in the ACF.
Nevertheless, it would not make any sense to apply some of the options to only a subset of the files or functions. This is especially true for the compiler options which describe the hardware configuration.
The following options are not taken into account at file or function level:
• -Mconfig options: These options describe the hardware setup used to run the binary file to be generated by the compiler. Since this hardware is the same for all the parts of the code, those options should be the same in all files and functions. They are taken into account if they are defined at the global level of an ACF. They are ignored if they are defined only for some files or functions.
• -Mextension options: These options describe which extensions are available on the hardware, and can be used to generate the code. Like the -Mconfig options, they are accepted at global level, but discarded at file or function level.
• -Mmode16 or -Mmode32: This option does not describe the hardware configuration, but rather the registers to be used during code generation. This option is accepted at global and file levels, but not at function level. This is linked to technical reasons in relation with ABI handling (register saving at entry and exit of functions), which must be consistent over the whole application.
Inliner
The inliner operates on a full compilation unit and therefore takes into consideration the optimization level specified at global or file level, but not at function level. As a result, when using ACFs, different assembly code can be obtained for a given function even though, depending on the scenario used, the function is apparently compiled at the same optimization level in both cases.
For instance, consider the file f1.c:
  int foo1() { return 1; }
  int foo2() { return 2; }
  int foo3() { return foo1() + foo2(); }
1. First scenario
The file is compiled by the following command line, based on a global -Os option:
  stxp70cc -Os -c f1.c
Here foo3() is compiled using -Os.
Assembly code for foo3() contains calls to foo1() and foo2(), which are not inlined because of -Os.
2. Second scenario
An ACF acf1.acf is defined with the following directives:
  file "f1" {
    function "foo3" { -Os }
  }
Code is compiled using this ACF:
  stxp70cc -O3 -c -macf-decl acf1.acf f1.c
Here foo3() is compiled using -Os.
Assembly code for foo3() does not contain calls to foo1() and foo2(), which are inlined because of -O3, which is visible to the inliner.
3. Third scenario
Code is compiled with option -O3:
  stxp70cc -O3 -c f1.c
Here foo3() is compiled using -O3.
Assembly code for foo3() does not contain calls to foo1() and foo2().
4. Fourth scenario
An ACF acf1.acf is defined with the following directives:
  file "f1" {
    function "foo3" { -O3 }
  }
Code is compiled using this ACF:
  stxp70cc -Os -c -macf-decl acf1.acf f1.c
Here foo3() is compiled using -O3.
Assembly code for foo3() contains calls to foo1() and foo2(), which are not inlined because -Os is visible to the inliner.
Intuitively, the user might expect to obtain the same code for scenarios 1 and 2, as well as for scenarios 3 and 4, but this is not the case because of the implementation of inlining.
5 GNU C extensions supported by stxp70cc
GNU cc provides a large set of extensions that are widely used in the GNU Linux community. These extensions can be used to:
• describe embedded features, for example, data section placement
• provide guidance to the compiler for optimization, for example, the noreturn function attribute
• provide language extensions, for example, conditional lvalue or C99 features
The GNU extensions are sometimes the only way to access ELF features that are not directly available in the C language; for example, to declare a symbol as weak.
5.1.1
stxp70cc provides several language features not found in ANSI standard C. (The -pedantic option directs stxp70cc to print a warning message if any of these features are used.) To test for the availability of these features in conditional compilation, check for the predefined macro __GNUC__, which is always defined under stxp70cc.
It is recommended to always guard code containing stxp70cc extensions with the C preprocessor macro __GNUC__.
#if __GNUC__
/* Original GNU code */
#else
/* Work-around code */
#endif
Statements and declarations in expressions
Statements and declarations in expressions allow complicated C statements to be written and used as if they were a simple C expression, optionally returning a result value. Local declarations and labels may be embedded.
This provides a way to construct a safe preprocessor macro that comprises several statements, without using the do { } while(0) trick that swallows the semi-colon.
#define cfoo() \
({ int y = foo (); int z; \
   if (y > 0) z = y; \
   else z = - y; \
   z; })
When GNU extensions are used in conjunction with expression statements and macros, they enable service labels to be used, that is, labels whose scope is limited to the current statement. See Figure 15.
Figure 15. Locally declared labels example
#define SEARCH(array, max, target) \
({ \
__label__ found; \
typeof (target) _SEARCH_target = (target); \
typeof (*(array)) *_SEARCH_array = (array); \
int i, j; \
int value; \
for (i = 0; i < max; i++) \
for (j = 0; j < max; j++) \
if (_SEARCH_array[i][j] == _SEARCH_target) \
{ value = i; goto found; } \
value = -1; \
found: \
value; \
})
5.1.4
The address of a label defined in the current function, or a containing function, can be obtained with the extended && unary operator, which yields a value of type void*. See Figure 16.
Figure 16. Labels as values example
const char * cgoto(int i)
{
  void *ptr = &&foo;
  static void *array[] = { &&foo, &&bar, &&hack };
  goto *array[i];
foo:
bar:
hack:
  return "hack" ;
}
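As a complement to Figure 16, the following minimal sketch shows a common use of label addresses: a dispatch table for a small interpreter loop. The opcode names and the run_program() function are hypothetical and introduced only for illustration; they are not part of any STxP70 library.

```c
#include <stddef.h>

/* Hypothetical opcodes for a tiny accumulator machine (illustration only). */
enum { OP_INC, OP_DOUBLE, OP_HALT };

/* Runs a program by jumping through a table of label addresses. */
int run_program(const unsigned char *code)
{
    /* Each entry is the address of a label defined in this function. */
    static void *dispatch[] = { &&do_inc, &&do_double, &&do_halt };
    int acc = 0;
    size_t pc = 0;

    goto *dispatch[code[pc]];

do_inc:
    acc += 1;
    pc++;
    goto *dispatch[code[pc]];

do_double:
    acc *= 2;
    pc++;
    goto *dispatch[code[pc]];

do_halt:
    return acc;
}
```

The table is indexed directly by opcode, so the caller must only pass opcodes that exist in the table, and the program must end with OP_HALT.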
Naming an expression's type
A name can be given to the type of an expression using a typedef declaration with an initializer. To define name as a type name for the type of expression, write:
typedef name = expression;
This can be used in conjunction with the statements-within-expressions feature described above. For example, to define a safe “maximum” macro that operates on any arithmetic type:
#define max(a,b) \
({typedef _ta = (a), _tb = (b); \
_ta _a = (a); _tb _b = (b); \
_a > _b ? _a : _b; })
The reason for using names that start with underscores for the local variables is to avoid conflicts with variable names that occur within the expressions that are substituted for a and b.
In the future the GNU C language may include a new form of declaration syntax that allows the declaration of variables whose scopes start only after their initializers; this will be a more reliable way to prevent such conflicts.
typeof allows you to refer to an object's data type by referring to an object of that type. It is particularly useful for writing generic and safe macro definitions, which can then be applied to various primitive types or user-defined data types. Without this extension, it is necessary to define as many specific macros as the number of different types used in calls to the generic macro.
#define max(a,b) ({ \
typeof (a) _a = (a); \
typeof (b) _b = (b); \
_a > _b? _a: _b; \
})
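The benefit of the typeof-based macro can be checked with a small sketch that counts how many times an argument with a side effect is evaluated. The names MAX_ONCE, MAX_NAIVE and next_value() are hypothetical, and the sketch uses the alternate spelling __typeof__, which GCC also accepts in strict ISO modes; it is assumed here that stxp70cc accepts both spellings.

```c
/* Same shape as the max macro above, renamed to avoid clashing with it. */
#define MAX_ONCE(a,b) ({ \
    __typeof__ (a) _a = (a); \
    __typeof__ (b) _b = (b); \
    _a > _b ? _a : _b; \
})

/* Naive macro shown for comparison: the larger argument is evaluated twice. */
#define MAX_NAIVE(a,b) ((a) > (b) ? (a) : (b))

static int calls;                       /* counts evaluations of next_value() */
static int next_value(void) { return ++calls; }

/* Number of times the argument is evaluated by the typeof-based macro. */
int count_evals_once(void)
{
    calls = 0;
    (void) MAX_ONCE(next_value(), 0);
    return calls;
}

/* Number of times the argument is evaluated by the naive macro. */
int count_evals_naive(void)
{
    calls = 0;
    (void) MAX_NAIVE(next_value(), 0);
    return calls;
}
```

With the typeof-based macro each argument is copied into a local variable first, so a side effect in an argument occurs exactly once.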
Compound expressions, conditional expressions and casts are allowed as lvalues provided their operands are lvalues. For example:
(a, b) += 5;
The middle operand in a conditional expression may be omitted, for example:
z = x ? : y;
The long long type (64-bit integer) is supported by the stxp70cc compiler. It is now also an ISO C99 feature.
long long x;
Floating-point numbers can be written in hexadecimal format:
float f = 0x1.fp3;
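The following sketch illustrates two of the extensions above under stated assumptions: lookup() and value_or_default() are hypothetical helpers used to show that the first operand of x ?: y is evaluated only once, and hex_float_example() simply returns the constant from the text, whose value is (1 + 15/16) × 2^3 = 15.5.

```c
static int lookups;   /* counts calls to lookup() */

/* Hypothetical helper: returns a configured value, or 0 when none is set. */
static int lookup(int configured)
{
    lookups++;
    return configured;
}

/* x ?: y yields x when x is nonzero, otherwise y; x is evaluated only once. */
int value_or_default(int configured, int fallback)
{
    return lookup(configured) ?: fallback;
}

/* Hexadecimal floating-point constant: mantissa 0x1.f, binary exponent 3. */
float hex_float_example(void)
{
    float f = 0x1.fp3;
    return f;
}
```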
5.1.10 Specifying a register for a local variable
A register in either the core or an extension may be specified for a local variable, for example:
// R6 core register allocated to the myvar long variable
register long myvar asm ("r6") = name;
// The part number 1 of 128-bit width in the register 2
// of the register class D of the user defined extension MP2x
// is allocated to the variable myvarext
register MP2x_DP myvarext asm ("D2_P1");
The syntax for extension register specification is described in detail in Syntax of scalar/SIMD audio extension register lists on page 110.
Note: The extension multi-level register must always be specified using the smallest subpart syntax. It is however possible to allocate a top level register. In this case, the specified sub register must be the first one of the group composing the full register. For instance:
// declare a variable at level P allocated to D2_P1
register MP2x_DP var64 asm ("D2_P1");
// declare a variable at level X allocated to D1_P0 and D1_P1
register MP2x_DX var128 asm ("D1_P0");
5.1.11 Array of length zero
Zero length arrays are allowed in GNU C. They are very useful as the last element of a structure that is really a header for a variable length object. See Figure 17.
Figure 17. Zero length array example
#include <stdio.h>
#include <stdlib.h>
struct line {
  int length;
  char contents[0];
};
struct line *newline( unsigned int this_length)
{
  struct line *thisline = (struct line *)
    malloc (sizeof (struct line) + this_length);
  thisline->length = this_length;
  return thisline ;
}
void delline(struct line *thisline)
{
  free(thisline) ;
}
int main(int argc, char *argv[])
{
  enum { __MAXL = 128 } ;
  enum { __L = 16 } ;
  struct line *lines[__MAXL] ;
  int i ;
  printf("sizeof(line) : %d\n", (int) sizeof(struct line)) ;
  for(i=0; i< __MAXL; i++) {
    lines[i] = newline(__L) ;
  }
  for(i=0; i< __MAXL; i++) {
    delline(lines[i]) ;
  }
  puts("Done.") ;
  return 0 ;
}
5.1.12 Array of variable length
An array of variable length is an automatic array defined with a length that is not a constant expression. This type of array is also known as a VLA. See Figure 18.
Figure 18. Variable length array example
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void sadcat(char *s1, char *s2)
{
  char str[strlen (s1) + strlen (s2) + 1];
  strcpy (str, s1);
  strcat (str, s2);
  printf("%s + %s == %s\n", s1, s2, str) ;
  printf ("sizeof(str) = %d\n", (int) sizeof(str));
}
void tester (int len, char buffer[len][len]) {
  int i=0, j=0;
  char tt[len][len];
  for (i=0; i<len; i++)
    for (j=0; j<len; j++)
      buffer [i][j] = i*j;
  printf ("sizeof(tt) = %d\n", (int) sizeof(tt));
  printf ("sizeof(buffer) = %d\n", (int) sizeof(buffer));
}
char data[10][10];
int main(int argc, char *argv[])
{
  sadcat("Foo", "Bar") ;
  tester (4, data);
  tester (10, data);
  return 0 ;
}
This extension enables a macro to be defined that takes a variable number of arguments and can safely be expanded into a call to a function with a variable number of arguments. These macros are also called CPP vararg macros.
For example, the following C program:
#define eprintf(format, args...) fprintf (stderr, format, ##args)
eprintf ("success!\n");
eprintf ("%s%d: ", input_file_name, line_number);
is expanded to:
fprintf ((&__iob[2]), "success!\n");
fprintf ((&__iob[2]), "%s%d: ", input_file_name, line_number);
Note: GNU C supports two types of “variable number of arguments” syntax: the ISO C99 format, which uses __VA_ARGS__, and the GNU format, which uses ##args. The ISO C99 format does not support the case where the number of parameters passed as part of the ellipsis is zero. GNU C reuses the ## trick to absorb the comma in this case. See Figure 19.
Figure 19. Variable number of arguments example
#include <stdio.h>
#define gnu_eprintf(format, args...) \
  fprintf (stdout, "gnu_eprintf " format, ## args)
#define isoc99_eprintf(format, ...) \
  fprintf (stdout, "isoc99_eprintf " format, __VA_ARGS__)
#define extended_isoc99_eprintf(format, ...) \
  fprintf (stdout, "extended_isoc99_eprintf " format, \
           ## __VA_ARGS__)
#define errprintf(args...) \
  gnu_eprintf ("errprintf " "%s\n", ## args)
int main(int argc, char *argv[]) {
  /* Try 1, 2, 3 arguments */
  gnu_eprintf ("One argument: %s. Done.\n", __FILE__);
  gnu_eprintf ("Two arguments: %s:%d. Done.\n", __FILE__, \
               __LINE__);
  isoc99_eprintf ("One argument: %s. Done.\n", __FILE__);
  isoc99_eprintf ("Two arguments: %s:%d. Done.\n", __FILE__, \
                  __LINE__);
  extended_isoc99_eprintf ("One argument: %s. Done.\n", __FILE__);
  extended_isoc99_eprintf ("Two arguments: %s:%d. Done.\n", \
                           __FILE__, __LINE__);
  extended_isoc99_eprintf ("Three arguments: %s:%s:%d. Done.\n", \
                           __FUNCTION__, __FILE__, __LINE__);
  /* The case with no arguments ... */
  gnu_eprintf ("No arguments. Done.\n");
  /* The line below causes a syntax error */
  isoc99_eprintf ("No arguments. Done.\n");
  extended_isoc99_eprintf ("No arguments. Done.\n");
  /* Cascade of macros with variable number of arguments */
  errprintf (__FILE__);
  return 0 ;
}
GNU cpp permits string literals to cross multiple lines without escaping the embedded newlines. Each embedded newline is replaced with a single newline character in the resulting string literal, regardless of what form the newline took originally.
The macro definition:
#define MESSAGE \
"Hello,
good brave new World!
"
would be written under ISO:
#define MESSAGE \
"Hello,\n" \
"good brave new World!\n"
In ISO C99, arrays that are not lvalues still decay to pointers, and may be subscripted. However, they may not be modified or used after the next sequence point, and the unary operator “&” may not be applied to them. See Figure 20.
Figure 20. Non-lvalue arrays example
#include <stdio.h>
struct foo {int a[4];};
struct foo f() {
  static const struct foo f = { 2, 4, 8, 16 };
  return f ;
}
void bar (void)
{
  int i;
  for (i=0; i<4; i++)
    printf ("f().a[%d] == %d\n", i, f().a[i]) ;
}
int main(int argc, char *argv[])
{
  bar ();
  f().a[0] = 15;
  bar ();
  return 0 ;
}
5.1.16 Arithmetic on void and function pointers
In GNU C, addition and subtraction are supported on pointers to void and on pointers to functions. The size used for a void or for a function is 1. This means that sizeof is also allowed for void and for a function, and always returns 1. See Figure 21.
Figure 21. Arithmetic on void and function pointers example
void f0(void) {}
void *p = 0;
void (*pf)(void) = 0;
void bar (void) {
  p++;
  pf++;
  printf ("sizeof(void) = %d\n", (int) sizeof(void));
  printf ("sizeof(func) = %d\n", (int) sizeof(f0));
}
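A typical use of this extension is byte-granular address computation without casting to char *. The sketch below assumes nothing beyond the rule stated above (void has size 1); the function names advance_bytes() and byte_distance() are hypothetical.

```c
#include <stddef.h>

/* Returns the address 'nbytes' bytes past 'base' using GNU void* arithmetic. */
void *advance_bytes(void *base, size_t nbytes)
{
    return base + nbytes;   /* GNU C: void is treated as having size 1 */
}

/* Distance in bytes between two void pointers (GNU C extension). */
ptrdiff_t byte_distance(void *from, void *to)
{
    return to - from;
}
```

In strict ISO C the same computations require an explicit cast to a character pointer type.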
As in standard C++ and ISO C99, the elements of an aggregate initializer for an automatic variable are not required to be constant expressions. For example:
int foo (int f, int g)
{
  int beat_freqs[2] = { f-g, f+g };
  return beat_freqs[0] * beat_freqs[1] ;
}
Compound literals used to be called “Constructor Expressions” before ISO C99 normalized them under the term “Compound Literals”. A compound literal looks like a cast containing an initializer. See Figure 22.
Figure 22. Compound literal example
#include <stdio.h>
#include <malloc.h>
struct foo {int a; char b[2];} ;
struct foo * givefoo(int x, int y, char a, char b) {
  struct foo * sfoo = (struct foo *) malloc(sizeof (struct foo));
  /* Fill in the anonymous struct at once with a Compound Literal */
  *sfoo = (struct foo) {x + y, a, b};
  return sfoo;
}
GNU C allows initialization of objects with static storage duration by compound literals, whereas ISO C99 does not.
This extension was called “GNU Style Labeled Elements in Initializers”. It is now an ISO C99 feature. It allows the initialization of particular elements of an aggregate, a structure or an array, by specifying the member name or the indices of the elements to initialize, in any order. See Figure 23.
Figure 23. Designated initializers example
const int widths[] = { [0 ... 9] = 1, [10 ... 99] = 2, [100] = 3 };
int a[6] = { [4] 29, [2] = 15 } ;
enum { v1 = 1, v2 = 2 , v4 = 4 } ;
int b[6] = { [1] = v1, v2, [4] = v4 } ;
struct point { int x, y; };
struct point makep(int xvalue, int yvalue )
{
  struct point p = { y: yvalue, x: xvalue };
  return p ;
}
struct point makepp(int xvalue, int yvalue )
{
  struct point p = { .y = yvalue, .x = xvalue };
  return p ;
}
With GNU C the = character can be omitted after the [index] indication.
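Range designators are convenient for building lookup tables. The following sketch builds a character classification table; the enumerator names, the table char_class and the function classify() are hypothetical and serve only to illustrate the syntax.

```c
enum { N_CODES = 128 };

/* Character classes used by the table below (hypothetical names). */
enum { CLS_OTHER = 0, CLS_DIGIT = 1, CLS_UPPER = 2, CLS_LOWER = 3 };

/* Range designators are a GNU extension; unlisted elements default to 0. */
static const unsigned char char_class[N_CODES] = {
    ['0' ... '9'] = CLS_DIGIT,
    ['A' ... 'Z'] = CLS_UPPER,
    ['a' ... 'z'] = CLS_LOWER,
};

struct point { int x, y; };

/* ISO C99 designators: members can be given in any order. */
static const struct point shifted = { .y = 5, .x = -2 };

/* Returns the class of character code c, or CLS_OTHER if out of range. */
int classify(int c)
{
    return (c >= 0 && c < N_CODES) ? char_class[c] : CLS_OTHER;
}
```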
Case ranges may be specified with integer value intervals in switch statements.
const char * which (int v) {
switch (v) {
case 0 ... 31: return "Control";
case 'A' ... 'Z': return "Upper";
case 'a' ... 'z': return "Lower";
default: return "None";
}
}
5.1.21 Cast to a union type
A cast to union type is similar to other casts, except that the type specified is a union type.
The type is specified either with the union tag or with a typedef name.
union foo { int i; double d; } u, v;
void makefoo (int i, double f) {
  u = (union foo) i;
  v = (union foo) f;
}
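A cast to a union type can also be used directly in a return statement, which avoids a temporary variable. In this sketch the union number and the two constructor functions are hypothetical names chosen for illustration.

```c
union number { int i; double d; };

/* Builds a union from an int; same effect as assigning the member i. */
union number from_int(int v)
{
    return (union number) v;
}

/* Builds a union from a double; same effect as assigning the member d. */
union number from_double(double v)
{
    return (union number) v;
}
```

The member that is initialized is selected by the type of the operand, so the operand type must exactly match the type of one of the union members.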
5.1.22 Dollar signs in identifier names
Dollar signs are allowed in identifier names.
int $a;
5.1.23 Prototypes and old-style function definitions
GNU C extends ISO C to allow a function prototype to override a later old-style nonprototype definition.
int isroot (uid_t);
int isroot (x) /* ??? lossage here ??? */
uid_t x;
{
return x == 0;
}
// C++ comment
C++ comments are not recognized when the stxp70cc option -ansi is used. This is to avoid problems with constructs that contain the character sequence “//”. For example:
x = a //**/b;
5.1.25 Character ESC in constants
The sequence “\e” is recognized in string or character constants as an ASCII <escape> character.
char escape = '\e';
char s[] = "\e\e";
5.1.26 Inquiring on alignment of types or variables
__alignof__ allows enquiries about how an object is aligned, or the minimum alignment required by a type or variable.
struct foo { int x; char y; } f;
int x = __alignof__ (double);
int b = __alignof__ (f.y);
Warning: The STxP70 ABI states that the stack is aligned to a 64 bit boundary. However, for wider extension data types, it is necessary to increase this value. A dedicated attribute aligned_stack is defined for this purpose.
An enum type can be defined without specifying its possible values.
typedef enum _e e;
struct _s {
  e* p;
} s;
enum _e { red, green, blue, black };
e x;
5.1.28 Function names as strings
GNU cc predefines two magic identifiers to hold the name of the current function. The identifier __FUNCTION__ holds the name of the function as it appears in the source. The identifier __PRETTY_FUNCTION__ holds the name of the function printed in a language specific fashion.
char here[] = "Function " __FUNCTION__ " in file " __FILE__;
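Recent GCC releases treat __FUNCTION__ as a variable rather than a string literal, so concatenating it with adjacent string literals as above may be rejected by newer GNU front ends. A form that works in both cases is to pass it as a "%s" argument. The macro TRACE_TO and the function trace_example() below are hypothetical names used for illustration.

```c
#include <stdio.h>

/* Writes "<function>: <msg>" into buf and returns the number of characters. */
#define TRACE_TO(buf, size, msg) \
    snprintf ((buf), (size), "%s: %s", __FUNCTION__, (msg))

/* Formats a trace line that carries the name of this function. */
int trace_example(char *buf, int size)
{
    return TRACE_TO(buf, size, "entered");
}
```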
5.2 Attributes
Attributes are generally a much better design than a #pragma directive for several reasons. Firstly, an attribute specification is a piece of C language that can be generated by a cpp macro definition, whereas generation of a #pragma directive is generally not supported by non-GNU C preprocessors. Secondly, it avoids the scoping issues of the #pragma directive.
Several attributes can be applied to the same object by using a comma to separate them. For example, to declare a symbol that is both weak and aliased:
void useful (void) __attribute__ ((weak, alias("useful_func")));
5.2.1 Placement and layout section
When applied to a function, places the function in a user-defined section.
void myfunc (void) __attribute__ ((section(".mytext")));
void myfunc (void) {
printf ("From myfunc in .mytext section.\n");
}
When applied to a data object, places the data in a user-defined section.
struct duart a __attribute__ ((section ("DUART_A"))) = { 0 };
Support must be explicitly added in the startup file or system loader to load the newly created section.
memory
The STxP70 processor provides several special memory spaces that allow less costly accesses.
• Tiny Data Area (TDA)
Data in the TDA is accessed using a single instruction of the form baseaddress+offset, where offset is expressed in elements. The TDA is based at address 0 (which is byte 4 as accessing address 0 is not possible in C). Due to the way it is accessed, only 32 Kbytes can be placed in the TDA.
• Small Data Area (SDA)
Data in the SDA is accessed using a single instruction of the form baseaddress+offset, where offset is expressed in elements. An element can be a byte, 16-bit word, or 32-bit word depending on the type of the data object. An aggregate of 4,096 elements can be placed in SDA. This can be a mixture of scalars, arrays, and structures of various sizes and with element sizes of byte, 16-bit word, or 32-bit word, but the aggregate number of elements over all entries cannot exceed 4,096.
• Data Area (DA)
The addresses of data in the DA are built using a single instruction of the form addugp Ri, offset, where offset is expressed in bytes. An aggregate of 32,768 bytes can be placed in the DA. This can be a mixture of scalars, arrays, and structures of various sizes and with element sizes of byte, 16-bit word or 32-bit word.
Three attributes are defined to instruct the compiler to place a variable in these spaces:
int __attribute__ ((memory ("tda"))) x; // x is placed in TDA
int __attribute__ ((memory ("sda"))) y; // y is placed in SDA
int __attribute__ ((memory ("da"))) z; // z is placed in DA
aligned
When applied to a variable or a structure field, specifies a minimum alignment for a variable or structure field, measured in bytes. The aligned attribute can only increase the alignment; it can be decreased by specifying packed as well.
int x __attribute__ ((aligned (16))) = 0;
struct _s { int x[2] __attribute ((aligned (8))); };
short array [3] __attribute ((aligned));
When applied to a type:
typedef int more_aligned_int __attribute__ ((aligned(8)));
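The effect of the aligned attribute can be observed with __alignof__ and by inspecting an object's address. In the sketch below the buffer name dma_buffer and the helper functions are hypothetical; only the aligned attribute and __alignof__ come from this manual.

```c
#include <stdint.h>

typedef int more_aligned_int __attribute__ ((aligned (8)));

/* A buffer whose start address is forced to a 16-byte boundary. */
static char dma_buffer[32] __attribute__ ((aligned (16)));

/* Returns nonzero if 'p' lies on a multiple of 'boundary' bytes. */
int is_aligned(const void *p, uintptr_t boundary)
{
    return ((uintptr_t) p % boundary) == 0;
}

/* Exposes the buffer address so that its alignment can be checked. */
const void *dma_buffer_address(void)
{
    return dma_buffer;
}
```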
Warning: It is also possible to make use of a specific syntax for aligned data types. This is based on the addition of the _aligned suffix to the type name. This syntax can be applied to any data type, but is especially recommended for the SIMD audio extension (see also Aligned data types on page 118).
aligned_stack
When applied to a function, this attribute specifies that the head of the stack must be aligned to a given boundary. The value provided as an argument corresponds to the number of bytes to which the stack must be aligned. The argument must be a power of 2, strictly greater than 8 and lower than or equal to 256.
For instance, the attribute below specifies that the stack of function fct() must be aligned to a 128-bit boundary:
void fct() __attribute__ ((aligned_stack(16)));
void fct()
{
  ...
}
Warning: Several means are provided to control the alignment of the stack. Refer to Table 6: Generic options with -M flag on page 18 for the description of the related option and precedence rules. Note that the compiler is also able to perform self-alignment of the stack on many occasions, taking the size of local variables into account.
weak
When applied to a function, causes the function to be emitted as a weak symbol. The symbol is set to 0 if it is not defined at link time. This is primarily of use in defining library functions that can be overridden in user code:
void d_stub (void) __attribute__ ((weak));
if (d_stub) {
  d_stub();
}
When applied to data, causes the declaration to be emitted as a weak symbol rather than a global symbol. This is primarily of use in defining variables that can be overridden in user code:
int debug __attribute__ ((weak)) = 0;
alias
Applies only to functions: provides an alias name for a given function. It is often used in conjunction with the weak attribute to define an alternate weak name for a given function.
void useful_func (void) {
  /* ... Do something ... */
}
void useful (void) __attribute__ ((alias("useful_func")));
packed
Applies only to data: Specifies that a variable or structure field should have the smallest possible alignment (one byte for a variable, and one bit for a field) unless a larger value is specified with the aligned attribute.
The specified data alignment is applied during data layout, and the code generator emits a safe sequence of instructions to avoid causing a misalign trap.
struct foo { char a; int x __attribute__ ((packed)); };
used
The GCC manual specifies that the used attribute may only apply to functions. For stxp70cc it may also apply to variables.
• The used attribute, attached to a function, means that the code must be emitted for this function, even if this function appears never to be referenced.
• This attribute, attached to a variable, means that the definition must be emitted for the variable even if it appears that the variable is not referenced.
The used attribute follows the same syntax as any GCC attribute.
For a procedure:
static int Foo() __attribute__ ((used)) ;
For uninitialized data:
static int foo __attribute__((used)) ;
For initialized data:
static int foo __attribute__((used)) = 2 ;
The assembly has been specifically extended to support this attribute:
.type Foo, @function, used
.type foo, @object, used
A motivation for using this attribute is to avoid the deletion of an unreferenced symbol by the dead code, dead data or IPA optimization. This can be useful for debugging purposes (for instance, a function dumping a specific data structure that is only called interactively from debugging sessions is removed if not marked as ‘used’, since the compiler does not find any reference to it).
constructor and destructor
Applies only to functions: The constructor attribute causes the function to be called automatically before execution enters main(). Similarly, the destructor attribute causes the function to be called automatically after exit().
void initdata (void) __attribute__ ((constructor));
void terminatedata (void) __attribute__ ((destructor));
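A constructor is typically used to prepare data before main() starts. The following sketch fills a small lookup table from a constructor; the names table_ready, squares, init_squares() and square_of() are hypothetical and introduced only for illustration.

```c
static int table_ready;          /* set by the constructor before main() */
static int squares[8];

/* Runs automatically before main(): fills the lookup table. */
static void init_squares (void) __attribute__ ((constructor));
static void init_squares (void)
{
    int i;
    for (i = 0; i < 8; i++)
        squares[i] = i * i;
    table_ready = 1;
}

/* Returns i squared for 0 <= i < 8, using the pre-filled table. */
int square_of(int i)
{
    return squares[i];
}
```

By the time main() is entered, the table is already filled, so callers of square_of() do not need an explicit initialization call.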
5.2.2 Optimization
This section only applies to functions.
noreturn
Enables a function to be declared that cannot return, such as abort or exit. It is a useful indication to optimizers.
void byebye () __attribute__ ((noreturn));
malloc
Used to tell the compiler that a function returns a pointer that cannot alias anything. It is a useful indication to optimizers.
void * get_block (int) __attribute__ ((malloc));
The visibility attributes are supported as follows:
__attribute__((__visibility__("visibility-type")))
__attribute__((visibility("visibility-type")))
where visibility-type can be default, hidden, protected or internal.
default
Default visibility is the normal case for ELF. This value is available for the visibility attribute to override other options that may change the assumed visibility of symbols.
hidden
Hidden visibility indicates that the symbol is not placed into the dynamic symbol table. This means that no other module (executable or shared library) can reference it directly.
protected
Protected visibility indicates that the symbol is placed in the dynamic symbol table, but that references within the defining module bind the local symbol. This means that the symbol cannot be overridden by another module.
internal
Internal visibility is similar to hidden visibility, but has additional processor-specific semantics. For the STxP70, this means that the function is never called from another module.
Hidden symbols cannot be referenced directly by other modules but they can be referenced indirectly by function pointers. By indicating that a symbol cannot be called from outside the module, the compiler may for instance omit the load of a PIC register since it is known that the calling function has already defined the correct value.
interrupt and interrupt_nostkaln
The interrupt attribute specifies that a function is an interrupt routine. This imposes:
• a save/restore of all registers at entry/exit of the function
• the use of an rte instruction to return from the routine (instead of an rts)
• a proper stack alignment at entry/exit of the routine
The interrupt_nostkaln attribute has the same effect, except that it does not perform any stack realignment.
void __attribute__ ((interrupt)) it_routine_1(...)
{
  ...
}
format_arg
The format_arg attribute specifies that a function takes a format string for a printf, scanf, strftime or strfmon style function and modifies it, so that the result can be passed to a printf, scanf, strftime or strfmon style function.
extern char * my_dgettextprintf (void *my_domaint,
const char *my_format) __attribute__ ((format_arg(2)));
mode
This attribute specifies the data type for the declaration as whichever type corresponds to the mode. Refer to the GNU Compiler Collection Internals document for the definitions of modes, http://gcc.gnu.org/onlinedocs/gccint .
Use the keywords __byte__, __word__ and __pointer__ to indicate the mode corresponding to these quantities.
unsigned int qi __attribute__ ((mode (QI)));
unsigned int w __attribute__ ((mode (__word__)));
5.2.5 Built-ins
A built-in is used in the same way as a function call, but is expanded by the compiler very early in the intermediate representation instead of generating a function call. On STxP70, most machine and extension instructions can also be accessed using built-ins. Refer to Chapter 7: Built-in functions on page 115 for further information.
__builtin_constant_p
This built-in tests if a value is a constant at compile time.
int x;
#define C 1
int main () {
  if (__builtin_constant_p (C) == 1)
    printf ("C is proved to be a constant\n");
  if (__builtin_constant_p (x) == 0)
    printf ("x is not proved to be a constant\n");
  return 0;
}
__builtin_return_address
__builtin_return_address gets the return address of the currently executing function.
void bar () {
printf ("RA = 0x%08x\n", (int)__builtin_return_address (0));
}
__builtin_expect
long __builtin_expect (long exp, long c)
__builtin_expect provides the compiler with branch prediction information.
The return value is the value of exp, which should be an integral expression. The value of c must be a compile-time constant. The semantics of the built-in are that it is expected that exp == c.
For example:
if (__builtin_expect (exp, 0))
  foo ();
indicates that a call to foo() is not expected as exp should be 0.
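__builtin_expect is often wrapped in a pair of macros so that the intent reads clearly at the call site. The macros likely and unlikely and the function checked_div() below are hypothetical names, shown as a minimal sketch rather than a recommended STxP70 idiom.

```c
/* The !! normalizes any nonzero condition to 1 before the comparison. */
#define likely(x)   __builtin_expect (!!(x), 1)
#define unlikely(x) __builtin_expect (!!(x), 0)

/* Divides a by b; returns 0 on success, -1 on the rare division by zero. */
int checked_div(int a, int b, int *result)
{
    if (unlikely (b == 0))
        return -1;          /* path hinted as rarely taken */
    *result = a / b;        /* path hinted as the common case */
    return 0;
}
```

The hint only influences code layout and scheduling; the value of the condition, and therefore the behavior of the program, is unchanged.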
__builtin_classify_type
__builtin_classify_type(object) ignores the value of the object and considers only its data type. It returns an enum describing what kind of type object is. See Figure 24.
Figure 24. __builtin_classify_type example
enum type_class __builtin_classify_type(object)
enum type_class
{
no_type_class = -1,
void_type_class, integer_type_class, char_type_class,
enumeral_type_class, boolean_type_class,
pointer_type_class, reference_type_class, offset_type_class,
real_type_class, complex_type_class,
function_type_class, method_type_class,
record_type_class, union_type_class,
array_type_class, string_type_class, set_type_class,
file_type_class, lang_type_class
};
GNU ASM
The stxp70cc compiler accepts “extended inline assembly” asm, as part of C programs.
This chapter only summarizes the main features of the asm implementation and describes its limitations. It is not a substitute for the GNU documentation.
6.1 Syntax
General syntax
asm(template : output operands : input operands : clobber list);
or
__asm__(template : output operands : input operands : clobber list);
Where:
• template is the assembler instruction, defined as a string constant
• output operands is a list of comma separated output operands
• input operands is a list of comma separated input operands
• clobber list is a list of comma separated clobbered operands
The template section contains plain assembler, and uses ordinary STxP70 assembler syntax, with the notable exception of the %i (i is a positive integer) notation that refers to the ith output or input operand.
Multiple consecutive strings are automatically concatenated to enable a readable and correct template input. Multiple assembler instructions can be put together in a single asm template, separated by explicit newline characters ‘\n’.
If there are no output operands but there are input operands, two consecutive colons must be used in place of the output operands.
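The operand layout can be illustrated without relying on any particular STxP70 mnemonic by using an empty template, which emits no instruction. In this sketch the function names are hypothetical, and the volatile qualifier and the "memory" clobber are standard GNU C features whose support by stxp70cc is assumed here rather than taken from this manual.

```c
/* No outputs, one input: note the two consecutive colons.
   The empty template emits no instruction; the "memory" clobber tells
   the compiler that memory may be read or written by the asm. */
void keep_alive(const void *p)
{
    __asm__ __volatile__ ("" : : "r" (p) : "memory");
}

/* One read-write operand ("+r"): the value is unchanged by the empty
   template, but the compiler must treat it as modified by the asm. */
int launder(int v)
{
    __asm__ __volatile__ ("" : "+r" (v));
    return v;
}
```

In real code the template string would hold STxP70 assembler text with %0, %1 and so on referring to the operands in the order in which they are listed.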
In the output and input list:
• each operand is described by an “operand constraint string” followed by a C expression in parentheses
• the available constraints are the following:
– r: general purpose register operand
– b: boolean register operand
– i: immediate integer operand, including symbolic constants only known at assembly time
– n: immediate integer operand, known at compile time
– g: guard register
– fpx_FX: FPx register (STxP70-4 only)
– the type attached to a scalar or SIMD audio extension (for instance, MP2x_VP or MP2x_VX)
• an operand constraint can be prefixed by the following modifiers:
– =: write-only operand, used for output operands
– &: early clobber operand, does not prevent the use of =
– +: operand is used for both input and output
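As a minimal sketch of how the r constraint combines with the = and + modifiers, the fragment below wraps two one-instruction asm statements. The function names are hypothetical; the instruction forms (add with a literal, or with a literal 0 used as a register move) are taken from the example figures in Section 6.7, and the __SX__ guard with its C fallback is an assumption so that the code can also be built on a host.

```c
#include <assert.h>

/* "+r": x is held in a general purpose register and is both read
 * and written by the instruction. */
static unsigned int demo_increment(unsigned int x)
{
#ifdef __SX__
    asm (" add %0, %0, 1 ;;" : "+r" (x));
#else
    x = x + 1;                /* host fallback (assumption) */
#endif
    return x;
}

/* "=r": dst is a write-only output; "r": src is an input.
 * %0 is the output operand, %1 the input operand. */
static unsigned int demo_copy(unsigned int src)
{
    unsigned int dst;
#ifdef __SX__
    asm (" or %0, %1, 0 ;;" : "=r" (dst) : "r" (src));
#else
    dst = src | 0;            /* host fallback (assumption) */
#endif
    return dst;
}

static void demo_constraints_self_check(void)
{
    assert(demo_increment(41) == 42);
    assert(demo_copy(7) == 7);
}
```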
In the clobber list:
• general registers are referred to by ri (where i has the range [0,31]); they map to the corresponding Ri hardware registers [0,31] (c)
• FPx extension registers are referred to by fi (where i has range [0,15]); they map to the corresponding Fi hardware registers
• guard registers are referred to by gi (where i has the range [0,7]); they map to the corresponding Gi hardware guard registers
• scalar or SIMD audio extension registers are referenced by a name determined by the extension and level
Syntax of scalar/SIMD audio extension register lists
The STxP70 core accepts scalar and SIMD audio extensions with multi-level register files.
The syntax has been extended to support such extension registers.
For non-SIMD registers (that is, registers with level “X” only), a register name is constructed using the following template:
<registerfile_name><register_id>
Where:
• <registerfile_name> is the name of the extension register file
• <register_id> is the number of the register
For example, when considering a register file T with a single level hierarchy, the registers are referenced as "T0", "T1", "T2" and so forth.
For SIMD register files, register names are constructed according to the following template:
<regfile_name><reg_id_max_level>_<regfile_min_level><reg_subid_min_level>
Where:
• <regfile_name> is the name of the scalar or SIMD audio extension register file
• <reg_id_max_level> is the number of the register at the highest hierarchy level (level “X”)
• <regfile_min_level> is a letter specifying the smallest level accessible for the register file:
– "X" for a single level register file
– "P" for a 2-level register file
– "Q" for a 4-level register file
• <reg_subid_min_level> is the offset of the register at the smallest hierarchy level.
c. If the configuration only includes 1 bank (16 registers), then the range is only [0,15].
For example, when considering the register file V of the MP2x extension, with a two level hierarchy, registers are referenced as "MP2x_V0_P0", "MP2x_V0_P1", "MP2x_V1_P0", "MP2x_V1_P1" and so forth.
Registers are always specified at the smallest hierarchy level. Therefore, to mark the full V0 register as clobbered, both subparts "V0_P0" and "V0_P1" must be specified in the clobber list.
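The naming templates above can be checked with a small host-side helper. The sketch below is not part of the toolset: demo_subparts and demo_clobber_name are hypothetical names, the count of one, two or four subparts for the letters X, P and Q is an assumption based on the halves and quarters described for the register levels, and level "X" is assumed to use the shorter non-SIMD template.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Subparts per full register at the smallest level (assumption):
 * X = 1, P = 2 halves, Q = 4 quarters. Returns 0 for other letters. */
static unsigned int demo_subparts(char min_level)
{
    switch (min_level) {
    case 'X': return 1;
    case 'P': return 2;
    case 'Q': return 4;
    default:  return 0;
    }
}

/* Writes the clobber-list name of subpart `sub` of register `id`
 * into buf. Returns 0 on success, -1 on a bad argument or a buffer
 * that is too small. */
static int demo_clobber_name(char *buf, size_t size, const char *regfile,
                             unsigned int id, char min_level,
                             unsigned int sub)
{
    unsigned int parts = demo_subparts(min_level);
    int n;

    if (parts == 0 || sub >= parts)
        return -1;
    if (min_level == 'X')
        n = snprintf(buf, size, "%s%u", regfile, id);
    else
        n = snprintf(buf, size, "%s%u_%c%u", regfile, id, min_level, sub);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

static void demo_clobber_self_check(void)
{
    char name[32];

    /* Both halves are needed to cover the full V0 register. */
    assert(demo_clobber_name(name, sizeof name, "MP2x_V", 0, 'P', 0) == 0);
    assert(strcmp(name, "MP2x_V0_P0") == 0);
    assert(demo_clobber_name(name, sizeof name, "MP2x_V", 0, 'P', 1) == 0);
    assert(strcmp(name, "MP2x_V0_P1") == 0);
}
```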
Register file disambiguation
Due to the limited length of register file names, different register files may have similar names. To distinguish between the different register files, the register file name can be prefixed by an optional string, if necessary. The prefix has the following syntax:
%<registerfile_name><registerfile_smallest_level>%
Where:
• <registerfile_name> is the name of the scalar or SIMD audio extension register file
• <registerfile_smallest_level> is a letter specifying the smallest level accessible for the register file:
– “X” for a single level register file
– “P” for a 2-level register file
– “Q” for a 4-level register file.
6.2 Assumptions
The following assumptions apply.
• Output operand expressions must be lvalues.
• The compiler assumes that the input is consumed before the outputs are produced, unless an output operand has the ‘&’ constraint modifier (also called “early clobber”).
The compiler does not assign the same register to an input operand and an early clobber operand. However, the compiler may assign the same register to an input operand and to a non-early clobber output operand.
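The effect of the early clobber rule can be modelled on a host machine. The sketch below is only a model, not STxP70 code: it simulates a tiny register file and a two-instruction template (make out, 0 followed by or out, in, 0) that writes its output before reading its input. All names are hypothetical.

```c
#include <assert.h>

enum { DEMO_NUM_REGS = 4 };

/* Models the template:  make out, 0 ;;   or out, in, 0 ;;
 * The output register is written before the input register is read. */
static unsigned int demo_run(unsigned int regs[DEMO_NUM_REGS],
                             unsigned int out, unsigned int in)
{
    assert(out < DEMO_NUM_REGS && in < DEMO_NUM_REGS);
    regs[out] = 0;               /* make out, 0                      */
    regs[out] = regs[in] | 0;    /* or out, in, 0 (reads in late)    */
    return regs[out];
}

/* "=&r": the compiler never shares the register with an input. */
static unsigned int demo_with_early_clobber(unsigned int value)
{
    unsigned int regs[DEMO_NUM_REGS] = { 0 };
    regs[1] = value;
    return demo_run(regs, 0, 1);
}

/* "=r": the compiler may pick the same register for input and output,
 * so the first instruction destroys the input. */
static unsigned int demo_without_early_clobber(unsigned int value)
{
    unsigned int regs[DEMO_NUM_REGS] = { 0 };
    regs[1] = value;
    return demo_run(regs, 1, 1);
}
```

With the input value 5, the first variant returns 5 and the second returns 0, which is why outputs written before all inputs are consumed need the & modifier.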
6.3 Volatile
The volatile syntax is either:
asm volatile (template : output operands : input operands : clobber list);
or:
__asm__ volatile (template : output operands : input operands : clobber list);
The volatile keyword indicates that an instruction has side effects. A volatile statement is not deleted if it is reachable. The order of volatile asm statements and/or other volatile accesses is preserved. A consecutive sequence of volatile asm statements may not stay perfectly consecutive, since some other instructions may be scheduled in between. To achieve the effect of keeping instructions perfectly consecutive, use a single asm statement.
An asm statement without any operands or clobbers is treated identically to a volatile asm statement, as is an asm statement without an output operand.
6.4 Restrictions
The following restrictions apply.
• The compiler does not parse the assembler instruction template; this means that it does not check if it is valid assembler input.
• Up to 10 operands, results and clobbered registers are allowed.
• Multiple alternative constraints are not supported.
• At -O3 and -O4 optimization levels, the loop nest optimizer is disabled for loops containing asm statements.
6.5 Differences between the STxP70 core versions
The VLIW/VLIS STxP70-4 is designed to be assembly compatible with STxP70-3, except for a few instructions. This means that assembly statements written for STxP70-3 should work on STxP70-4. The main exceptions are related to:
• the MAKE and MORE instructions, which should be replaced by a single MAKE32 instruction on STxP70-4
• the SIMD comparisons, which are no longer supported on STxP70-4
• the “;;” pattern to be used to separate bundles of instructions. For compatibility reasons, this pattern becomes mandatory on both STxP70-3 and STxP70-4. Code without “;;” is still accepted on STxP70-3, but this deprecated syntax is strongly discouraged.
6.6 GNU ASM optimization
The compiler unrolls loops containing GNU asm statements. The compiler is not aware of the resource requirements introduced by the opaque asm statement, therefore the unrolling decision may be less precise compared with other situations.
It is possible to prevent the compiler from unrolling by using either an option or a #pragma.
If the asm statement contains any control-flow, it must be contained completely within the asm statement.
See Section 3.2.1: #pragma unroll (n) on page 45 for information on #pragma unroll.
6.7 Example
Figure 25 illustrates a typical use of an asm statement on the STxP70 core.
Figure 25. Example of an asm statement
unsigned int foo(unsigned int * ptr)
{
    unsigned int res;
    unsigned int count;
    unsigned int val;
    asm (
        " setls L1, L_2 ;;\n\t"
        " setlc L1, 8 ;;\n\t"
        " setle L1, L_3+-4 ;;\n\t"
        " make %0, 0 ;;\n\t"
        " make %1, 0 ;;\n\t"
        "L_2:\n\t"
        " lw %2, @(%3 !+ 4) ;;\n\t"
        " cmpneu g0, %2, 0 ;;\n\t"
        "g0? bset %0, %0, %1 ;;\n\t"
        " add %1, %1, 1 ;;\n\t"
        "L_3:\n\t"
        : "=&r" (res), "=&r" (count), "=&r" (val), "+r" (ptr)
        :
    );
    return res;
}
The example in Figure 25 delivers the assembly code given in Figure 26.
Figure 26. Example output of an asm statement
.entry
.global foo
foo:
L_BB1_foo:
or R4, R0, 0 ;;
setls L1, L_2 ;;
setlc L1, 8 ;;
setle L1, L_3+-4 ;;
make R1, 0 ;;
make R2, 0 ;;
L_2:
lw R3, @(R4 !+ 4) ;;
cmpneu g0, R3, 0 ;;
g0? bset R1, R1, R2 ;;
add R2, R2, 1 ;;
L_3:
L_BB2_foo:
or R0, R1, 0 ;;
rts
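For readers less familiar with the hardware loop instructions, the following portable C function is one possible reading of what the asm statement of Figure 25 computes. It is a sketch under stated assumptions, not a specification: the name foo_reference is hypothetical, the !+ addressing mode is assumed to post-increment the pointer by 4 bytes, and bset is assumed to set in res the bit whose index is given by count.

```c
#include <assert.h>

/* Bit i of the result is set when the i-th of the 8 words pointed to
 * by ptr is non-zero (assumed reading of Figure 25). */
static unsigned int foo_reference(const unsigned int *ptr)
{
    unsigned int res = 0;                    /* make %0, 0           */
    unsigned int count;

    for (count = 0; count < 8; count++) {    /* setlc L1, 8          */
        unsigned int val = *ptr++;           /* lw %2, @(%3 !+ 4)    */
        if (val != 0)                        /* cmpneu g0, %2, 0     */
            res |= 1u << count;              /* g0? bset %0, %0, %1  */
    }
    return res;
}

static void foo_reference_self_check(void)
{
    /* Words 0, 3 and 7 are non-zero: bits 0, 3 and 7 are set. */
    const unsigned int words[8] = { 1, 0, 0, 5, 0, 0, 0, 9 };
    assert(foo_reference(words) == 0x89u);
}
```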
6.8 Parsing and optimization of GNU assembly statement
The STxP70 compiler is capable of parsing, analyzing and optimizing the content of the GNU assembly statements. The main optimizations it can achieve are those carried out at the lowest level of the compiler, for example scheduling, removal of useless instructions, constant propagation.
By default, the compiler does not perform any parsing and optimization of user defined assembly statements. This parsing and optimization feature can be enabled with the option -mparse-asmstmts.
Some GNU assembly statements are used internally by the compiler to map extension instructions from C code. By default, those specific internal assembly statements are parsed and optimized by the compiler. This parsing and optimization feature can be disabled with the option -mparse-meta-asmstmts.
7 Built-in functions
The stxp70cc compiler recognizes a number of built-ins. These are used to generate assembly language statements that cannot otherwise be expressed through standard ANSI C/C++.
The built-ins are specified and called just like standard ANSI C/C++ functions and procedures, using standard types. However, they are treated in a special way by the compiler. The built-ins apply to the STxP70 core instructions, X3 instructions, floating point FPx extension instructions, as well as scalar and SIMD audio extension (MPx) instructions.
On the core, FPx and MPx extension, built-ins may be needed to make use of instructions that the compiler cannot capture automatically, or to work around a missing optimization.
For technical reasons the set of core/X3 built-ins does not currently cover the full set of instructions. For instance, the load/store instructions are not available as built-ins. This also includes specific load/store instructions such as the lsetub instruction. Instructions that do not exist as built-ins can still be mapped by using the GNU assembly statements, see Chapter 6: GNU ASM on page 109.
7.1 Header files and C-models files
Several header and source files are provided to use built-ins for the core and for the X3, FPx and MPx extensions.
• A header file named builtins_<extension>.h contains the definitions of the built-ins themselves, as described in Section 7.2: Naming built-ins on page 116.
• A header and a source file named builtins_model_<extension>.h and builtins_model_<extension>.c respectively. These files contain the declaration and the definition of the STxP70 built-ins, modelled as C functions, and acting as executable specifications. This has the benefit that models can be used to develop specialized algorithms (DSP, video, and so on) on a workstation, and these can be immediately and safely ported to the STxP70 core and extensions.
• Finally, a generic header file named <extension>.h facilitates the use of built-ins or C-models (see Section 7.3: Using built-ins from C on page 120). It includes the two headers mentioned above, plus the definition of some macros providing a unified view of built-ins and C-models. Only the generic header file for a given extension needs to be included in the application source code.
The <extension> suffix is one of:
• sx for STxP70 core
• x3 for STxP70 X3 extension
• fpx for FPx floating point and integer arithmetics extension
• the alias of the audio scalar or SIMD extension, for instance MP2x
The header and source files mentioned above are delivered with the current compiler distribution (except for the audio scalar and SIMD extensions).
7.2 Naming built-ins
The STxP70 built-ins make use of a flexible common naming scheme. The names of intrinsic built-ins and the corresponding C-models are complementary, and either is invoked (depending upon context) by using a dedicated simplified macro.
• The basic built-ins defined in file builtins_<extension>.h all have names in the form: __builtin_<extension>_<mnemonic>[_<operand_type>].
• Similarly, the names of the C-models found in the files builtins_model_<extension>.[c|h] are: __cmodel_<extension>_<mnemonic>[_<operand_type>].
• Finally, the generic macro defined in file <extension>.h gives a unified view of built-ins and C-models. Its simplified name is built as: <extension>_<mnemonic>[_<operand_type>].
Where:
<extension> is the alias of the core or the extension and is one of the following:
• sx for core
• x3 for X3 extension
• fpx for floating point extension
• the alias of a SIMD extension, for instance MP1x
<mnemonic> is the actual mnemonic of the instruction as it appears in the instruction set of either the core or the extension.
<operand_type> is optional. It appears only in built-ins for the core, and for X3 and FPx extensions. It is necessary when the given instruction may accept different types of operands; for instance, either a register or a literal. In such cases, this part of the name denotes the type of the operand, and may be one of the following:
• this element is absent if the instruction exists with only one type of operand
• r denotes an operand in a general purpose register
• iN denotes a literal operand of size N bits
• g denotes the instruction is guarded (used in X3 built-in names)
The operand types may appear in the name of the built-in in an order that differs from the order of the corresponding operands in the assembly instruction. For instance, writing the following built-in:
x3_cancelg_i8_i2_g(0x1, 0x5);
leads to the emission of the following assembly code:
cancelg b1, 5 ;;
The header files are located in the directory <toolsdir>/stxp70cc/4.1/include/models. This directory is pointed to by default when the code is compiled using stxp70cc. The <toolsdir> denotes the root folder of the toolset.
The C-models source files are located in the directory <toolsdir>/stxp70cc/4.1/src/models.
Example:
The core instruction addbp exists with a second operand that is either a register or a literal.
The corresponding built-ins are named as follows:
• int __builtin_sx_addbp_r(int, unsigned short) for register operand
• int __builtin_sx_addbp_i8(int, unsigned short) for u8 operand
The C-models have similar names:
• int __cmodel_sx_addbp_r(int, unsigned short) for register operand
• int __cmodel_sx_addbp_i8(int, unsigned short) for u8 operand
Finally, the unified macros for these built-ins and C-models are:
• sx_addbp_r when used for a register operand
• sx_addbp_i8 when used for a u8 operand
The presence of the two leading underscores on each name denotes (according to the ISO/IEC 9899 C Standard) that no such name should be defined by the user. More specifically:
“All identifiers that begin with an underscore and either an upper case letter or another underscore are always reserved for any use.”
7.2.2 Types and special built-ins for audio scalar/SIMD extensions
The built-ins for audio scalar or SIMD extensions may require data types that cannot be mapped to C native types. Vector operations may also be present on those extensions. This means that the naming scheme is slightly different from the scheme used for the core or on the other extensions.
The naming convention for data type names reflects this scheme. The naming convention uses an alias for the MPx that is dedicated to audio applications, which is currently either a scalar (MP1x) or an SIMD (MP2x) extension.
The instructions for these extensions are not currently mapped automatically by the compiler. They can only be invoked by using built-ins.
Data types
Scalar and SIMD audio extensions include two register banks at most. Each bank may have up to three consecutive “levels”, numbered from 0 to 2:
• level 0 corresponds to the full width of the register bank
• level 1 corresponds to the two halves of the register
• level 2 corresponds to the four quarters of the register
Furthermore, the register width is 2^n bits, ranging from 8 bits to 512 bits inclusive.
The names of the data types that can be allocated to such banks take this structure into account. They are built using the following template:
<extension>_<registerfile_name><register_level>
Where:
• <extension> is the alias of the SIMD extension
• <registerfile_name> is the name of the SIMD extension register file
• <register_level> is a letter denoting the type that can be allocated to this level:
– X stands for the full register width at level 0
– P stands for the sub-parts at level 1 (two halves)
– Q stands for the sub-parts at level 2 (four quarters). It is not instantiated on the current MPx
Aligned data types
Since the data types of those extensions are likely to be larger than the default alignment of the stack (64 bits), some variants are also provided which impose a consistent alignment.
Those aligned types have the special suffix _aligned appended to their names.
Example:
The MP2x extension contains a register bank called V with data accesses of 128 bits or 64 bits that supports two vector data types:
• MP2x_VX is a 128-bit data type
• MP2x_VP is a 64-bit data type
• MP2x_VQ is not instantiated
• MP2x_VX_aligned is a 128-bit data type, aligned to a 128-bit boundary
Special macros
The MP1x and MP2x extensions are all provided with a set of dedicated memory access and register move instructions. The latter can be invoked using dedicated macros that allow easy accesses to the register bank of the extension.
Example:
In the lines below, _part_ denotes the subpart of the wider register, which can be represented by either a literal or a variable. _word_i_ denotes a 32-bit word to be assigned to the subpart i of the corresponding register.
• Make macro builds a constant in an extension register:
– MP2x_make_VX(_VX_, _word_3_, _word_2_, _word_1_, _word_0_);
– MP2x_make_VP(_VP_, _word_1_, _word_0_);
– MP2x_make_VQ -> not instantiated
• Compose macro composes register subparts into a wider one:
– MP2x_compose_2xVP(_VX_, _VP_1_, _VP_0_);
– MP2x_compose_4xVQ -> not instantiated
– MP2x_compose_2xVQ -> not instantiated
• Split macro decomposes a register subpart into narrower ones:
– MP2x_split_2xVP(_VX_, _VP_1_, _VP_0_);
– MP2x_split_4xVQ -> not instantiated
– MP2x_split_2xVQ -> not instantiated
• Insert macro inserts a register subpart into a wider one:
– MP2x_insert_VP_into_VX(_VP_, _VX_, _part_);
– MP2x_insert_VQ_into_VX -> not instantiated
– MP2x_insert_VQ_into_VP -> not instantiated
• Extract macro extracts a register subpart from a wider one:
– MP2x_extract_VP_from_VX(_VP_, _VX_, _part_);
– MP2x_extract_VQ_from_VX -> not instantiated
– MP2x_extract_VQ_from_VP -> not instantiated
Specialized macros
Specialized versions of the insertion and extraction macros are provided to handle cases where the subpart of the wider register can be hard coded in the built-in name itself.
In the lines below, the macros do not accept an explicit _part_ parameter. The syntax of the name implicitly corresponds to a given subpart (for instance MP2x_insert_VP_into_VX0 takes the complete 64-bit register _VP_ and inserts it in the lowest half of the 128-bit register _VX_).
• Insert macro inserts a register subpart into a wider one:
– MP2x_insert_VP_into_VX0(_VP_, _VX_);
– MP2x_insert_VP_into_VX1(_VP_, _VX_);
– MP2x_insert_VQ_into_VX0 -> not instantiated
– MP2x_insert_VQ_into_VX1 -> not instantiated
– MP2x_insert_VQ_into_VX2 -> not instantiated
– MP2x_insert_VQ_into_VX3 -> not instantiated
– MP2x_insert_VQ_into_VP0 -> not instantiated
– MP2x_insert_VQ_into_VP1 -> not instantiated
• Extract macro extracts a register subpart from a wider one:
– MP2x_extract_VP_from_VX0(_VP_, _VX_);
– MP2x_extract_VP_from_VX1(_VP_, _VX_);
– MP2x_extract_VQ_from_VX0 -> not instantiated
– MP2x_extract_VQ_from_VX1 -> not instantiated
– MP2x_extract_VQ_from_VX2 -> not instantiated
– MP2x_extract_VQ_from_VX3 -> not instantiated
– MP2x_extract_VQ_from_VP0 -> not instantiated
– MP2x_extract_VQ_from_VP1 -> not instantiated
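To make the relationship between subparts and the wider register concrete, the following host-side model mimics the make, compose, insert and extract operations on plain C structures. It is a sketch only: the demo_ names and types are hypothetical, the functions return their result by value whereas the MP2x macros take the destination as their first argument, and the assumption is that part 0 is the lowest half, as stated for MP2x_insert_VP_into_VX0.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t word[2]; } demo_vp;   /* models a 64-bit VP  */
typedef struct { uint32_t word[4]; } demo_vx;   /* models a 128-bit VX */

/* Builds a 64-bit value from two 32-bit words (word_0 is lowest). */
static demo_vp demo_make_vp(uint32_t word_1, uint32_t word_0)
{
    demo_vp vp;
    vp.word[0] = word_0;
    vp.word[1] = word_1;
    return vp;
}

/* Composes two halves into a full register (vp_0 is the low half). */
static demo_vx demo_compose_2xvp(demo_vp vp_1, demo_vp vp_0)
{
    demo_vx vx;
    vx.word[0] = vp_0.word[0];
    vx.word[1] = vp_0.word[1];
    vx.word[2] = vp_1.word[0];
    vx.word[3] = vp_1.word[1];
    return vx;
}

/* Extracts half number `part` (0 or 1) from a full register. */
static demo_vp demo_extract_vp_from_vx(demo_vx vx, unsigned int part)
{
    demo_vp vp;
    assert(part < 2);
    vp.word[0] = vx.word[2 * part];
    vp.word[1] = vx.word[2 * part + 1];
    return vp;
}

/* Returns vx with half number `part` (0 or 1) replaced by vp. */
static demo_vx demo_insert_vp_into_vx(demo_vp vp, demo_vx vx,
                                      unsigned int part)
{
    assert(part < 2);
    vx.word[2 * part] = vp.word[0];
    vx.word[2 * part + 1] = vp.word[1];
    return vx;
}
```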
7.3 Using built-ins from C
This section explains the usage of the include files that are particular to built-ins and C-models.
All STxP70 built-in prototypes are available in the include files presented in Section 7.1: Header files and C-models files.
To make use of the built-ins of the core, X3 extension, FPx extension or SIMD extensions in an application, the relevant header files (as listed below) must be included in the application sources.
#include <sx.h> // for the core,
#include <x3.h> // for the X3 extension,
#include <fpx.h> // for the FPx arithmetic extension,
#include <MP2x.h> // for the MP2x SIMD audio extension.
By default, the stxp70cc compiler generates machine instructions corresponding to the built-in functions found in the source code.
Example:
#include <sx.h>
...
int fct(int a, int b)
{
    int c;
    c = sx_lzc(a); // leading zero count
    return c;
}
The above code produces the following assembly code, where the lzc instruction of the core has been properly mapped.
.global fct
fct: // 0x0
L_BB1_fct: // 0x0
lzc R0, R0
rts
In this case, it is equivalent to write the source code as:
#include <builtins_sx.h>
...
int fct(int a, int b)
{
    int c;
    c = __builtin_sx_lzc(a);
    return c;
}
This is because the macro sx_lzc is just mapped on the full built-in __builtin_sx_lzc by default as soon as the code is compiled for an STxP70 target.
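The dispatch performed by the unified macro can be pictured with the sketch below. It does not reproduce the contents of sx.h: the names demo_lzc and demo_cmodel_lzc are hypothetical, and the stand-in model assumes a 32-bit operand and a result of 32 for a zero input, which may differ from the actual instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a C-model of lzc: counts the zero bits
 * above the most significant set bit of a 32-bit value. */
static int demo_cmodel_lzc(uint32_t x)
{
    int n = 0;
    uint32_t mask;

    for (mask = UINT32_C(0x80000000); mask != 0 && (x & mask) == 0;
         mask >>= 1)
        n++;
    return n;
}

#ifdef __SX__
/* STxP70 target: use the machine instruction through the built-in. */
#include <builtins_sx.h>
#define demo_lzc(a) __builtin_sx_lzc(a)
#else
/* Any other host: fall back to the C model. */
#define demo_lzc(a) demo_cmodel_lzc((uint32_t)(a))
#endif

static void demo_lzc_self_check(void)
{
    assert(demo_lzc(1) == 31);
    assert(demo_lzc(0x00010000) == 15);
}
```

Application code written against the single macro name then compiles unchanged for the target and for a workstation.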
7.3.2 Standard use of built-in C-models
By default, the C-model files are designed to permit the use of the C-model on any host machine except the STxP70. There is no need to modify the source code. However, it is necessary to:
• add the path of the inc directory of the compiler in the toolset installation to the list of include paths
• add the file containing the source of the C-models to the list of source files to be compiled
Example:
Assuming that the toolset is installed in a directory named /home/myfolder, and a small file containing calls to core built-ins is to be compiled with C-models, using a GCC compiler, then the command line should contain the -I directive and the following source file.
gcc -I<tools_dir>/stxp70cc/4.1/include/models \
<tools_dir>/stxp70cc/4.1/src/models/builtins_model_sx.c ...
7.3.3 Use of built-in C-models on STxP70 target
In a few cases, it may be necessary to compile application code using C-models, rather than actual machine instructions, even on the STxP70 target. This may be useful, for example, for testing or debugging purposes.
This can be done either by calling the C-model explicitly, or by using the macro instead (thereby avoiding having to make any change to the source code). In the lzc example given earlier in this section, the following lines should appear:
#ifdef __SX__ // code is compiled for a STxP70 target
#undef __SX__ // hide the target and use the non STxP70 settings
#include <sx.h>
#define __SX__ // return to the regular settings for STxP70
#endif
...
int fct(int a, int b)
{
    int c;
    c = sx_lzc(a); // leading zero count C model is used
    return c;
}
8 MPx native support
8.1 Goal of the MPx scalar support
The goal of the MPx native support is to generate MPx code automatically from standard C code. The compiler:
• detects variables that can beneficially be allocated to the MPx register file
• inserts required type conversions in the internal representation (also called “alien type conversion”)
• detects some patterns of instructions that can beneficially be replaced by MPx integer or fractional instructions.
Legacy source code that already contains variables explicitly allocated to the MPx register file and calls to MPx built-ins is not affected by these changes. It is compiled as before and the generated assembly remains the same.
The SIMD variants of the MPx benefit from the same level of (scalar) support as the scalar variant. This means that the SIMD aspects of those variants are not dealt with by the compiler.
These new features allow the porting of applications to the MPx with less effort than previously, because:
• the extension type is no longer required, except in specific cases
• the use of intrinsics is more limited
In addition to pure audio applications, long long arithmetic also benefits from this support.
This chapter describes the scope of the MPx support, and explains how it can be used.
Examples are provided to help with comprehension.
8.2 Control of the MPx native support
By default, native support of the MPx is enabled in the compiler when:
• the code is compiled for the MPx when the option -Mextension=MP1x is set
• optimization level is equal to either -O2, -O3, -O4 or -Os
• the mapping of fractional instructions is enabled using the option -Mextoption=MP1x:enablefractgen (formerly called -Menablefractgen or -Mfractsupport; see Section 8.3.4: Pattern recognition for integer and fractional data types on page 125).
It is possible to disable this native support by using the option: -Mnoextgen
Pragmas are provided for fine-grained control of MPx support. They allow the developer to enable or disable MPx support in a given set of functions, declared as arguments to the pragmas, overriding the option passed to the compiler.
The syntax is as follows:
• #pragma disable_extgen (foo1, foo2) disables MPx scalar support in functions foo1 and foo2 in the file where it is placed, even if option -Mextension=MP1x is set and the optimization level is higher than -O1.
• #pragma force_extgen (foo1, foo2) forces MPx scalar support in functions foo1 and foo2 in the file where it is placed, even if option -Mnoextgen is set.
Those file scope pragmas must be placed at the beginning of a file. They affect all variants of the MPx (that is, both the scalar and SIMD variants).
A more focused version is also provided:
• #pragma disable_specific_extgen (extname, foo1, foo2) disables scalar support on specified extension in functions foo1 and foo2 in the file where it is placed, even if option -Mextension=MP1x is set and optimization level is higher than -O1
• #pragma force_specific_extgen (extname, foo1, foo2) forces scalar support on specified extensions in functions foo1 and foo2 in the file where it is placed, even if option -Mnoextgen is set
Those file scope pragmas must be placed at the beginning of a file. They affect all variants of the MPx (that is, both the scalar and SIMD variants).
8.3 Scope of the MPx native support
This section presents an overview of the features available in MPx native support. It consists of three main levels:
• built-in based support (already present in toolset 3.2.0)
• support of type equivalence between long long integer and MPx data types (new in toolset 3.3.0)
• automatic MPx code generation on MPx instructions and long long integer arithmetic (new in toolset 3.3.0)
Besides the overview presented in this section, the latter two levels are documented in detail in sections
Chapter 8.5: Automatic code generation
The native support now includes a limited pattern recognition facility, which can detect more complex patterns like mac for both integer and fractional data types.
8.3.1 Built-in based support with MPx_Vx type
This feature has been available since toolset release 3.2.0. With this level of support, the developer explicitly uses MPx built-ins and MPx types to write an application for the MPx, as in the following example C code:
MPx_Vx a, b, c;
MPx_ADDD(c, a, b);
This code places three 64-bit variables, a, b and c, in the MPx_Vx register set. It uses the MPx addition instruction to add a and b, storing the result in c. Since it uses built-ins and specific data types, this code is neither generic nor portable to another processor.
8.3.2 Support of type equivalence between long long and MPx_Vx
The MPx_Vx type matches the MPx registers, and is therefore semantically equivalent to the long long native type of the C language. In order to limit the work needed to port applications to the MPx, the compiler handles the semantic equivalence between MPx_Vx and long long. This means that the user can declare variables as long long type instead of MPx_Vx. The compiler is responsible for placing them in the MPx registers, if there is a benefit to be gained.
With this support, the C code in the example above can be simplified as follows:
long long a, b, c;
MPx_ADDD(c, a, b);
This C code is more portable, as it does not involve any specific type. Only the intrinsic (MPx_ADDD) is still specific. The code generated by the compiler is the same as the code generated with MPx types.
Warning: The heuristics currently used to place variables into MPx registers are based on a quite systematic behavior: as soon as a variable appears as a MPx_Vx parameter in a MPx built-in, then it is placed in a MPx register. The explicit use of MPx_Vx type in new code should be avoided and the long long data type used instead. More details can be found in Section 8.6: Important remarks and known limitations on page 129.
8.3.3 Automatic MPx code generation on long long arithmetic
The MPx instruction set includes long long integer arithmetic instructions (add, sub, shift, and so forth). In previous versions of the toolset, it was necessary to use built-in functions to map those instructions. In order to limit the effort when porting applications to the MPx, the current version of the compiler automatically maps these operations to MPx instructions.
The above example can now be written in standard C:
long long a, b, c;
c = a + b;
The compiler now ensures both:
• the placement of the variables a, b and c in the MPx registers
• the mapping of the MPx_ADDD instructions
In addition to pure arithmetic operations, the MPx also provides instructions that:
• clear the contents of a MPx register
• copy the contents of one MPx register into another
The compiler also maps the following instructions when dealing with either an assignment to zero or a copy operation:
long long a = 0; // mapped to a MPx register clear instruction
long long b = c; // mapped to a MPx register copy instruction
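As a small illustration of the source style this section describes, the functions below use only standard C long long operations. The function names are hypothetical, and whether a given statement is actually mapped to an MPx instruction depends on the options and heuristics described in this chapter; on a host compiler the code simply performs 64-bit arithmetic.

```c
#include <assert.h>

/* 64-bit addition: candidate for the MPx addition instruction. */
static long long demo_add64(long long a, long long b)
{
    long long c;
    c = a + b;
    return c;
}

/* Assignment to zero: candidate for a register clear instruction. */
static long long demo_clear64(void)
{
    long long a = 0;
    return a;
}

/* Plain copy: candidate for a register copy instruction. */
static long long demo_copy64(long long c)
{
    long long b = c;
    return b;
}

static void demo_ll_self_check(void)
{
    assert(demo_add64(0x100000000LL, 1) == 0x100000001LL);
    assert(demo_clear64() == 0);
    assert(demo_copy64(-7) == -7);
}
```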
8.3.4 Pattern recognition for integer and fractional data types
The compiler provides pattern recognition capabilities to detect a set of complex patterns and map them to their equivalent MPx instructions. These capabilities address both integer and fractional instructions.
The list of recognized instructions is provided in Table 28.
Table 28. Pattern recognition
Mnemonic   Equivalent source code                Comment
mafw       ll1+((long long)i1*i2)<<1             Requires -Mextoption=MP1x:enablefractgen
msfw       ll1-((long long)i1*i2)<<1             Requires -Mextoption=MP1x:enablefractgen
mpfw       ((long long)i1*i2)<<1                 Requires -Mextoption=MP1x:enablefractgen
mahll      (long long)((int)ll1+(int)ss1*ss2)    32b MAC with 16b multiplicands
mshll      (long long)((int)ll1-(int)ss1*ss2)    32b MAC with 16b multiplicands
shlrr2x    (long long)i1<<i2                     -
shrr2x     (int)(ll1<<i2)                        -
andcd      (ll1 & (!ll2))                        -
mph        (long long)ss1*ss2                    -
mpw        i1*i2                                 32b multiplier when no X3/FPx
maw        ll1+(long long)i1*i2                  -
msw        ll1-(long long)i1*i2                  -
Note: The first three rows correspond to fractional instructions, which are subject to specific limitations (see Section 8.6.6: Limitations regarding mapping of fractional instructions on page 131). Their mapping is therefore only performed if the dedicated flag -Mextoption=MP1x:enablefractgen is set.
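The integer patterns of Table 28 can be written directly in C, as in the sketch below. The function names are hypothetical and only the integer rows maw, msw and mph are shown; the mapping to the corresponding MPx instruction is up to the compiler and is not guaranteed by this code, which behaves as ordinary 64-bit arithmetic on a host.

```c
#include <assert.h>

/* maw pattern: 64-bit accumulate of a 32x32-bit product. */
static long long demo_maw(long long ll1, int i1, int i2)
{
    return ll1 + (long long)i1 * i2;
}

/* msw pattern: 64-bit subtract of a 32x32-bit product. */
static long long demo_msw(long long ll1, int i1, int i2)
{
    return ll1 - (long long)i1 * i2;
}

/* mph pattern: 16x16-bit product widened to 64 bits. */
static long long demo_mph(short ss1, short ss2)
{
    return (long long)ss1 * ss2;
}

/* A dot product built on the maw pattern. */
static long long demo_dot(const int *x, const int *y, int n)
{
    long long acc = 0;
    int i;

    for (i = 0; i < n; i++)
        acc = acc + (long long)x[i] * y[i];
    return acc;
}

static void demo_pattern_self_check(void)
{
    assert(demo_maw(10, 100000, 100000) == 10000000010LL);
    assert(demo_msw(0, 3, 4) == -12);
    assert(demo_mph(-300, 300) == -90000);
}
```

The cast to long long is applied to one multiplicand before the multiplication, as in the table, so that the product is computed on 64 bits.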
The example code listed here summarizes the equivalences that are accepted or rejected by the compiler front-end when MPx support is enabled.
Figure 27. Summary of type equivalence with MPx support
// declaration of variables
MPx_VX gvx; // forced to MPx
long long gll; // candidate to placement in MPx registers
int gi; // to be placed in GPR
// Initialisation of global variables
MPx_VX gvx_2 = 1234LL; // Accepted
MPx_Vx gvx_3 = (long long) 11.3f; // Accepted
MPx_Vx gvx_array[4] = {1, 10, -1, -10};
foo(long long In) {
...
// Assignments of local variable using function parameters
MPx_Vx A = In;
// Assignment of local variable using a constant
MPx_Vx B = 12LL;
// Constant assignment of global variables
gvx = 0LL; // Accepted
gvx = 1234LL; // Accepted
gvx = 0x12LL; // Accepted
gvx = 0; // Accepted
gvx = 1234; // Accepted
gvx = 0x12; // Accepted
// Variable assignment of global variables
gvx = gll; // Accepted
gll = gvx; // Accepted
gvx = (unsigned long long)gi; // Accepted
gvx = (long long)gi; // Accepted
gi = (int)gvx; // Accepted
gi = (unsigned int)gvx; // Accepted
// Unary/binary operator (not planned to be supported, use long long var instead)
gvx = gvx + gvx; // Not supported (error msg from front-end)
// Usage of long long variable in builtin calls
MPx_ADDD(gll, gll, gll); // Accepted
// Usage of long long variable in builtin calls (in/out param)
MPx_MAFW(gvx, 1, 2); // Accepted
// Usage of long long constant in builtin calls
MPx_ADDD(gll, 1234LL, 123LL); // Accepted
}
The result of instructions and built-ins in their functional form is always considered unsigned by convention. However, the actual type might be signed, which is not explicitly visible to the compiler. This must be taken into account, especially when writing comparisons.
For example, the following code is incorrect:
if (MP1x_SUBS_f(a, b) < 0) {
Because the MP1x_SUBS_f() result is unsigned, the compiler considers the comparison to be always false and removes the corresponding block as dead code.
The main recommendation for built-in usage is to avoid the functional form and use only the procedural version, in which the type of the result is given explicitly by the developer, for example:
int res = MP1x_SUBS_f(a, b);
if (res < 0) {
Alternatively, it is also possible to explicitly cast the built-in result to the proper type:
if ((int)MP1x_SUBS_f(a, b) < 0) {
However, the first method, using the procedural version, is the preferred method.
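The pitfall can be reproduced in portable C without the MPx built-ins. The sketch below is only an illustration: subs_f_model() is a hypothetical stand-in that returns an unsigned value, as the convention above describes, and it does not model the real MP1x_SUBS_f() semantics (for instance, any saturation).

```c
#include <stdint.h>

/* Hypothetical stand-in for a functional-form built-in: the result is
 * typed unsigned, although it logically carries a signed difference. */
static uint32_t subs_f_model(int32_t a, int32_t b)
{
    return (uint32_t)a - (uint32_t)b;
}

/* Incorrect: an unsigned value is never below zero, so the test is
 * always false and a compiler may drop the guarded block. */
static int is_negative_unsigned(int32_t a, int32_t b)
{
    return subs_f_model(a, b) < 0u;
}

/* Preferred: store the result in a variable of the intended signed
 * type before comparing (assumes the usual two's complement conversion). */
static int is_negative_signed(int32_t a, int32_t b)
{
    int32_t res = (int32_t)subs_f_model(a, b);
    return res < 0;
}
```

With a = 1 and b = 2, the first helper reports "not negative" while the second reports "negative", which is the difference the text warns about.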
8.5 Automatic code generation
8.5.1 Scope and principle
Some of the instructions available on the MPx can be mapped directly from operations in C code. This limits the need for intrinsics and contributes to performance enhancements. Two cases are possible:
• The operation derived from the C code matches one of the instructions of the MPx. For instance, this is the case with 64-bit addition, which can be mapped on the MPx ADDD instruction.
• The operation derived from the C code fits a sequence of instructions which may belong to either the core or the MPx instruction set. For instance, a 64-bit “min” operation does not exist on the MPx, but it can be emulated using a sequence of instructions involving both core and MPx instructions (MPx and core comparisons). These sequences are called “meta-instructions”.
The second case is especially useful, because it makes more extensive use of the MPx instructions with less effort from the developer. In addition to the pure audio applications for which it is designed, MPx support can also bring significant gains in applications that handle long long arithmetic.
In the current release of the compiler, the following C operations are directly mapped to individual MPx instructions:
• 64-bit signed and unsigned addition mapped to ADDD
• 64-bit signed and unsigned subtraction mapped to SUBD
• 64-bit left shift signed and unsigned mapped to SHLRD
• 64-bit arithmetic right shift signed mapped to SHRRD
• 64-bit arithmetic right shift unsigned mapped to SHRURD
• 64-bit logical right shift signed and unsigned mapped to SHRURD
• 64-bit negate signed and unsigned mapped to NEGD
• 64-bit bitwise NOT signed and unsigned mapped to NOTD
• 64-bit bitwise OR signed and unsigned mapped to ORD
• 64-bit bitwise AND signed and unsigned mapped to ANDD
• 64-bit bitwise exclusive OR (XOR) signed and unsigned mapped to XORD
• 64-bit bitwise negate OR (NOR) signed and unsigned mapped to NORD
8.5.3 Operations mapped to meta-instructions
The following operations of the C language are mapped to or emulated by meta-instructions:
• the ten 64-bit signed and unsigned comparisons (equal to, not equal to, greater than, less than, greater than or equal, less than or equal)
• the 64-bit signed and unsigned min
• the 64-bit signed and unsigned max
• the 64-bit absolute value
• the 64-bit signed and unsigned multiplication (takes two 64-bit operands and returns a 64-bit result)
• the 32-bit signed and unsigned multiplication (takes two 32-bit operands and returns a 32-bit result) (d)
• the 32-bit to 64-bit conversions
The number of actual instructions present in each meta-instruction depends on the complexity of the computation: for instance, comparisons are implemented in two instructions at most, whereas the 64-bit multiplication takes about 25 instructions.
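As an illustration, the plain C functions below use only operations named in this section. The function names are hypothetical, and the instruction noted in each comment is the mapping the lists above lead one to expect when MPx support is enabled; the actual code generated depends on the compiler release and options.

```c
/* Operations expected to map to a single MPx instruction. */
long long add64(long long a, long long b) { return a + b; }     /* ADDD */
long long sub64(long long a, long long b) { return a - b; }     /* SUBD */
long long and64(long long a, long long b) { return a & b; }     /* ANDD */
long long nor64(long long a, long long b) { return ~(a | b); }  /* NORD */

/* Operations expected to be emulated by a meta-instruction. */
long long min64(long long a, long long b) { return (a < b) ? a : b; }
long long max64(long long a, long long b) { return (a > b) ? a : b; }
long long abs64(long long a)              { return (a < 0) ? -a : a; }
```

Keeping each operation on long long operands, without intermediate narrowing, gives the compiler the simplest form to match.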
d. This mapping allows 32-bit multiplications to be mapped to the MPx multiplier in case the X3 or FPx 32-bit multiplier is not present in the configuration. Note, however, that in this case the resulting code is less efficient than with the 32-bit multiplier, since it requires one more instruction to extract the lower 32-bit part of the result.
8.6 Important remarks and known limitations
As already mentioned in an earlier warning, the MPx_Vx type should be avoided when writing new code. The following combinations are especially discouraged:
• simultaneous use of long long and MPx_Vx types in the same function
• C long long arithmetic applied to variables declared as MPx_Vx
For instance, the compiler considers the following code illegal:
MPx_Vx a;
long long b, c;
c = a + b;
Note: These restrictions do not affect legacy code, as this is only based on a combination of MPx types and built-ins.
8.6.2 Long long passed as function parameters
The ABI of the STxP70 core specifies that function arguments are passed in the core registers. This applies to long long variables as well, and this must be taken into account when making the choice to declare a variable as either MPx or long long type.
Consider the following code:
extern int bar ( long long );
int foo ( long long a ) {
    return ( bar ( a ) );
}
In this example, it makes no sense to store the long long variables in MPx_Vx registers, as the core registers are used for the function call in any case.
8.6.3 Long long life span crossing function call
The STxP70 ABI states that MPx registers are all considered to be scratch registers. This means that they do not retain their values across a function call.
Consider the following code:
int foo() {
    long long a;
    a = 0L;
    bar();
    a = a + b;
    [...]
}
In this example, if a is promoted to MPx_Vx for its full life span, it may be spilled (e) by the register allocator, which is extremely costly. A developer must bear this in mind when writing MPx code. Note that the cost is neither assessed nor handled by the compiler, so it is the developer’s responsibility to use the most efficient placement.
e. “Spilled” means that the contents of the register are temporarily stored in memory and then restored when needed.
8.6.4 Efficiency of code in meta-instructions
Currently, the compiler does not optimize the code in the meta-instructions. In those parts of code, the compiler performs register allocation, but it does not schedule the instructions, nor does it perform any advanced optimizations. Even if the code has been designed for efficiency, it is possible that sub-optimal patterns may exist in the final code if MPx native support is enabled.
This limitation might be overcome in future versions of the tools.
The current pattern recognition algorithm is limited, and is only able to recognize an expression if:
• the conversions are made explicit by casts, and correspond to the exact model of the instruction to be recognized
• it is located in a single C statement
Exact type conversions
For example, in the following code, the maw instruction is not recognized because of implicit type conversions:
long long mac;
int a, b;
mac += a*b; // multiplication result is 32bit
However, the maw instruction is recognized in the following code:
long long mac;
int a, b;
mac += (long long)a*b; // multiplication result is 64bit
Single statement expressions
A pattern is more likely to be recognized if it occurs within a single statement. For example, avoid code that resembles the following, as it may result in missed opportunities to map the maw instruction:
long long mac;
int a, b;
long long tmp;
tmp = (long long)a*b;
mac += tmp;
On the other hand, the maw instruction is always recognized in the code below:
long long mac;
int a, b;
mac += (long long)a*b;
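The two rules can be combined in a typical accumulation loop. The sketch below is a hypothetical example (mac_dot is not a toolset function): the widening cast is explicit and the multiply-accumulate sits in one statement. Whether the pattern is still recognized when the operands are array elements loaded in a loop is an assumption that should be checked on the generated assembly.

```c
#include <stddef.h>

/* Dot product accumulated on 64 bits. The cast widens the first
 * operand before the multiplication, so the product is computed on
 * 64 bits, and the accumulation stays in the same C statement. */
long long mac_dot(const int *a, const int *b, size_t n)
{
    long long mac = 0;
    for (size_t i = 0; i < n; i++) {
        mac += (long long)a[i] * b[i];
    }
    return mac;
}
```

Besides helping the pattern matcher, the explicit cast also changes the result in standard C: without it, a product such as 100000 * 100000 would overflow the 32-bit int type.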
8.6.6 Limitations regarding mapping of fractional instructions
The automatic mapping of fractional instructions is disabled by default. It is enabled only if the flag -Mextoption=MP1x:enablefractgen (f) is set.
Take care when enabling the automatic mapping of fractional instructions. It may induce two changes to the behavior:
1. The fractional instructions of the MPx are likely to modify the value of the saturation flag. Consequently, it is not safe to enable these instructions if the code contains built-ins that use saturation. This change is clearly a non-conservative one.
2. The use of fractional instructions modifies the behavior of overflow. The wrap-around performed in the scope of integer arithmetic is changed into clamping. Notice that this change is still conservative, as it remains compliant with the C standard. However, it introduces discrepancies between the core and the MPx with regard to the result of arithmetic overflow. For example, the multiplication of 0x7FFFFFFFFFFFFFFF with 0x7FFFFFFFFFFFFFFF provides the following results:
– without mapping of fractional instructions: 0x0000000000000001
– with mapping of fractional instructions: 0x7FFFFFFC00000001
Warning: The automatic recognition and mapping of fractional instructions should be enabled only if the following conditions are met:
- source code does not already contain built-ins that may read the saturation flag (otherwise, the semantics may not be preserved)
- clamping is acceptable for handling arithmetic overflow
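The wrap-around figure quoted above can be checked on any host with a few lines of C. This is only a sketch of the "without mapping" case: mul_wrap64 is a hypothetical helper, and it uses unsigned operands so that the overflow is well defined in standard C. It does not model the fractional MPx instructions.

```c
#include <stdint.h>

/* 64-bit multiplication with wrap-around (result modulo 2^64). */
static uint64_t mul_wrap64(uint64_t a, uint64_t b)
{
    return a * b;
}
```

Since (2^63 - 1)^2 = 2^126 - 2^64 + 1, only the final 1 survives the reduction modulo 2^64, which matches the value 0x0000000000000001 given in the text.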
The mapping of saturated arithmetic and the mapping of the cross register left shift instructions are not supported by the compiler.
f. The name of this option has changed: it was formerly named -Menablefractgen or -Mfractsupport, which was not accurate enough. The former names are still recognized, but their use is strongly discouraged.
8.7 Examples
Consider a simple function that performs the addition and shift of two long long input parameters, and returns the result as a long long integer:
long long fct(long long a, long long b)
{
long long tmp;
tmp = a + b;
tmp = tmp << 2;
return tmp;
}
No MPx support
When MPx is not present and MPx support is not enabled (stxp70cc -O3 test.c), the code generated relies solely on core instructions and runtimes:
.global fct
fct:
L_BB1_fct:
make R4, 0 ;;
addcu R4, R4, R4 ;;
addcu R0, R0, R2 ;;
make R2, 2 ;;
addcu R1, R1, R3 ;;
.global __shll
.type __shll, @function
jr __shll ;;
MPx support
When MPx is present and MPx support is enabled (stxp70cc -O3 -Mextension=MP1x test.c), then MPx instructions are mapped where needed:
.global fct
fct:
L_BB1_fct:
XRF0RR2X V0, R1, R0 ;;
XRF0RR2X V1, R3, R2 ;;
ADDD V0, V0, V1 ;;
SHLID V0, V0, 2 ;;
XRF0CSX2R R0, V0, V0 ;;
XRF0CSX2R R1, V0, V0 ;;
rts ;;
Note: 1. The moves between the core and the MPx registers are introduced to deal with ABI constraints. Those instructions are necessary only because the addition is isolated in a function. They are not present in successive long long arithmetic operations, and do not represent any extra cost. (Consequently, they are shown here in italic.)
2. The MPx instructions (ADDD, SHLID) are mapped automatically to perform long long operations.
Consider a piece of code that involves long long operations that do not fit a single MPx instruction. The following example is a function to find the maximum value between two alternatives, a and b.
long long fct(long long a, long long b)
{
long long tmp;
if(a>b) tmp=a;
else tmp=b;
return(tmp);
}
No MPx support
When MPx is not present and MPx support is not enabled (stxp70cc -Os test.c), the code generated relies only on core instructions and runtimes:
.global fct
fct:
L_BB1_fct:
cmpeq G0, R1, R3 ;;
cmpgtu G1, R0, R2 ;;
andg G0, G0, G1 ;;
cmpgt G1, R1, R3 ;;
org G0, G0, G1 ;;
G4? or R4, R2, 0 ;;
G0? or R4, R0, 0 ;;
G0? or R3, R1, 0 ;;
or R1, R3, 0 ;;
or R0, R4, 0 ;;
rts ;;
The core of the computation consists of the instructions that are not in italic. The sequence contains three comparisons and two boolean operations (GMI).
MPx support
When MPx is present and MPx support is enabled (stxp70cc -Os -Mextension=MP1x test.c), only two comparisons are needed. (The instructions in italic are not taken into account, as they are mainly needed because of the encapsulation of the code in a function.)
.global fct
fct:
L_BB1_fct:
XRF0RR2X V3, R1, R0 ;;
XRF0RR2X V2, R3, R2 ;;
cmpgtx2r R0, V3, V2 ;;
cmpne G0, R0, 0 ;;
L__0_4:
G4? XRF0CSX2R R0, V0, V2 ;;
G0? XRF0CSX2R R2, V1, V3 ;;
G4? XRF0CSX2R R1, V0, V0 ;;
G0? or R0, R2, 0 ;;
G0? XRF0CSX2R R2, V1, V1 ;;
G0? or R1, R2, 0 ;;
rts ;;
8.7.3 Case of the 32-bit multiplication
Consider the function below, which performs the multiplication of two 32-bit integers and returns the result as a 32-bit integer:
int fct(int a, int b)
{
    return (a*b);
}
The resulting assembly code depends on compiler options and core configuration.
No X3 multiplier, no MPx support
If code is compiled without the X3 32-bit multiplier and without the MPx native support (stxp70cc -O3 -Mconfig=mult:no test.c), then a runtime is called:
.global fct
fct:
L_BB1_fct:
.global __mulw
.type __mulw, @function
jr __mulw ;;
X3 multiplier, no MPx support
If code is compiled with the X3 32-bit multiplier, and without the MP1x support (stxp70cc -O3 -Mconfig=mult:yes test.c), then the 32-bit multiplication available in X3 is mapped:
.global fct
fct:
L_BB1_fct:
mp R0, R0, R1 ;;
rts ;;
No X3 multiplier, MPx support
If code is compiled without the X3 32-bit multiplier, but with the MPx support enabled (stxp70cc -O3 -Mextension=MP1x test.c), then the MPx 64-bit multiplier emulates a 32-bit multiplication. This requires one more instruction to extract the proper 32-bit result:
.global fct
fct:
L_BB1_fct:
mpw V2, R0, R1 ;;
xrf0csx2r R5, V2, V2 ;;
L__0_2:
or R0, R5, 0 ;;
rts ;;
Note: If both the X3 32-bit multiplier and the 64-bit MPx multiplier can be used to map a 32-bit multiplication, then the X3 multiplier is preferred.
9 Relocatable loader library
This chapter describes how dynamic loading is implemented using the relocatable loader library RL_LIB for the STxP70.
Table 29 and Table 30 list a number of acronyms and definitions used within this chapter.
Table 29. Acronyms
Acronym    Term
DLL        Dynamic link library
DSO        Dynamic shared object
GOT        Global offset table
GP         Global pointer – alias of R13 register in STxP70 ABI
PC         Program counter register
PIC        Position independent code
PID        Position independent data
Table 30. Definitions
Term         Definition
Preemption   Sometimes you may need to use some of the functions or data items from a shareable object, but may wish to replace others with your own definitions. For example, you may want to use the standard C runtime library shareable object, libc.so, but to use your own definitions for the heap management routines malloc() and free(). In this case it is important that calls to malloc() and free() within libc.so call your definition of the routines and not the definitions present in libc.so. Your definition should override, or preempt, the definition within the shareable object. This feature of shareable objects is called symbol preemption.
Relocation   Relocation is the process of connecting symbolic references with symbolic definitions. For example, when a program calls a function, the associated call instruction must transfer control to the proper destination address at execution. In other words, relocatable files must have information that describes how to modify their section contents, thus allowing executable and shared object files to hold the right information for a process's program image.
This section provides an introduction to the concepts used for dynamic linking.
All code within a dynamic link library (DLL) should be position independent (PIC). This allows the text segment of the DLL to remain pure so that it can be shared among many processes. Position-independence imposes two requirements on generated code:
• Code that forms an absolute address referring to any address in the DLL’s text or data segments is not allowed, because the code would have to be relocated at load time, making it non-sharable. All branches must be PC-relative, and references to the data segment and to constants and literals in the text segment must be relative to a base pointer (typically GP).
• Code that references symbols that are or may be imported from other loaded modules must use indirect addressing through a global offset table (GOT). The linker is expected to resolve procedure calls by creating import stubs, and the compilers must generate indirect loads and stores for data items that may be dynamically bound. In both cases, the indirection is made through the global offset table, allocated by the linker, and initialized by the dynamic loader. The global offset table is described in Procedure calls and long branch stubs through to Materializing function pointers on page 138.
Procedure calls and long branch stubs
Normal procedure calls can be prepared with the call instructions, which use PC-relative addressing. There are three possible cases at link time:
• If the target is not within the same module, or if it is subject to preemption by an earlier definition from another loaded module, the linker must allocate an import stub and resolve the relocation of the call instruction to the stub.
• If the target is known to be within the same module and the displacement is small enough, the call instruction can be statically resolved to the call target.
• If the target is within the same load module, but the displacement is too far for the call instruction, the linker must allocate a long branch stub. The long branch stub itself must satisfy the PIC requirements. If the target is within range of the stub, the stub may use a PC-relative goto instruction; otherwise, it must load the address of the target from the global offset table.
Access to the data segment
The DLL’s data segment must be accessed through the GP value that must be set by a DLL procedure before any use. The GP value is used to access both global offset tables and statically allocated data.
There are several cases:
• Global variables that are imported from another load module, or that are subject to preemption by an earlier definition in another load module, must be accessed indirectly through the global offset table. The compiler must generate code to load a pointer from the global offset table, using GP-relative addressing mode, and then access the data item using that pointer. The compiler does not have to allocate the global offset table; there are relocations defined in the object file format that instruct the linker to allocate a global offset table slot and to supply the GP-relative address of that slot.
• Statically allocated variables of local scope, or global variables whose definitions are not subject to pre-emption, may be accessed directly with GP-relative addressing mode.
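The two access paths can be pictured with a small C model. Everything in this sketch is hypothetical: module_data_t stands for the data segment addressed through GP, and its layout is invented for illustration, not taken from the STxP70 ABI.

```c
#include <stddef.h>

/* Hypothetical model of one module's data segment, addressed via GP. */
typedef struct {
    int *got[2];         /* GOT slots, filled in by the loader at load time */
    int  local_counter;  /* not subject to preemption: direct GP-relative access */
} module_data_t;

/* Direct access: one GP-relative load. */
static int read_local(const module_data_t *gp)
{
    return gp->local_counter;
}

/* Indirect access: load the pointer from the GOT slot (GP-relative),
 * then load the data item through that pointer. */
static int read_imported(const module_data_t *gp, size_t slot)
{
    const int *p = gp->got[slot];
    return *p;
}
```

The indirect path costs one extra load, which is the price of letting the loader decide at load time which definition the slot refers to.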
Access to constants and literals in the text segment
Constants and literals allocated in the text segment may be accessed with GP-relative addressing, or with indirect addressing through the global offset table.
Materializing function pointers
Function pointers may be materialized by indirect addressing through the global offset table.
Pointers to functions that are not subject to preemption may be materialized using GP-relative addressing. Function pointers may not be materialized from immediate operands.
When the linker determines that a procedure call refers to an entry point in a different load module, it resolves the reference locally by building an import stub with the same name as the intended target. The import stub contains code that loads the entry point address from the global offset table and then transfers control to it, as described in Section Calling sequence.
If the compiler has enough information to know that a particular entry point is in a different load module, it may generate a calling sequence that obviates the need for the linker to build an import stub. However, this calling sequence is ABI specific, and is not specified in this document.
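In C terms, an import stub behaves like a small forwarding function that reads its destination from a table slot. The sketch below is a model only: the names got_slot_target, target_stub, target_impl and bind_target are hypothetical, and a real stub is emitted by the linker in assembly, not written in C.

```c
typedef int (*entry_fn)(int);

/* Model of one GOT slot holding the entry point of an external
 * procedure. The dynamic loader writes it when binding symbols. */
static entry_fn got_slot_target;

/* Model of the import stub: local to the caller's module, it loads the
 * entry address from the GOT and transfers control to it. */
static int target_stub(int x)
{
    return got_slot_target(x);
}

/* Stand-in for the procedure defined in another load module. */
static int target_impl(int x)
{
    return x + 1;
}

/* Stand-in for the loader's binding step. */
static void bind_target(entry_fn f)
{
    got_slot_target = f;
}
```

The caller can be statically bound to target_stub at link time, while the actual destination is only known once bind_target() has run.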
The dynamic loader is a component of the operating system software that locates all load modules belonging to an application, loads them into memory, and binds the symbolic references among them. Most of the operations of the dynamic loader are specific to the particular operating system environment, and are further described in the ABIs for those environments. The common run-time architecture has been designed to minimize the amount of work involved in the binding process, by concentrating most of the relocation required in the global offset tables, and by prohibiting any items in the text segment that may require dynamic relocation.
9.1.4 Rationale
Code in main programs may be absolute or position independent. If an absolute program imports data from a DLL, the linker is forced to allocate the data in the main program’s data segment statically (this is commonly called the “.dynbss hack”). When data imported from DLLs is allocated in the main program’s data segment, the program may be subject to future compatibility problems when the DLL is replaced with a newer version. This issue may be avoided by requiring main programs to be position independent, at the cost of some efficiency in the main program. This compatibility/performance trade-off is not made in the common run-time architecture; it is left to the specific ABI.
Direct and indirect procedure calls are described in the following sections.
Direct procedure calls follow the sequence of steps shown in Figure 28. The following paragraphs describe these steps in detail.
1. Preparation for call. Values in scratch registers that must be kept alive across the call must be saved. They can be saved by either copying them into preserved registers or by saving them onto the memory stack.
The parameters must be set up in registers and memory as described in the Subroutine linkage and parameter passing chapter of the STxP70 Application binary interface manual (7937486).
2. Procedure call. All direct calls are made with a call relative instruction, which writes the link register (also known as LK) for the return link.
For direct local calls the PC-relative displacement to the target is computed at link time.
Compilers may assume that the standard displacement field in the call instruction is sufficiently wide to reach the target of the call. If the displacement is too large, the linker must supply a branch stub at some convenient point in the code; compilers must guarantee the existence of such a point by ensuring that code sections in the relocatable object files are no larger than the maximum reach of the call instruction.
Direct calls to other load modules cannot be statically bound at link time, so the linker must supply an import stub for the target procedure; this import stub obtains the address of the target procedure from the global offset table. The call instruction can then be statically bound using the PC-relative displacement to the import stub.
The call instruction saves the return link address in the link register, which is aliased to general purpose register R14.
3. Import stub (direct external calls only). The import stub is allocated in the load module of the caller, so that the call instruction may be statically bound to the address of the import stub. The import stub obtains the address of the target procedure’s entry point from the global offset table. In position-independent code (PIC), it must access the global offset table using the current GP (which means that the GP must be valid at the point of call). In absolute code, it can access the global offset table using an absolute reference, so the GP does not need to be valid at the point of call. The import stub then branches to the target entry point.
The detailed operation of an import stub is ABI specific.
When the target of a call is in the same load module, an import stub is not used.
However, for position-independent code, the GP value must still be valid for the caller at the point of call, so that if the target is an internal function, it can assume that the GP value is already correctly set.
The compiler may choose to generate calling code that performs the functions of the import stub. This saves a branch compared to using the import stub, but is less efficient than a direct call within the same load module. Therefore, the compiler should only do this if it deduces that the call target is in a separate load module, or that there is a high probability of this.
4. Procedure entry. The prologue code in the target procedure is responsible for allocating a frame on the memory stack, if necessary.
If it is a non-leaf procedure, it must save the link register in the memory stack frame.
The prologue must also save any preserved registers that will be used in this procedure.
If it is a position-independent procedure that makes calls or accesses global data, then it must establish the GP value in the GP register. The GP register (R13) is a preserved register, and therefore must be saved before being modified. A position-independent internal function may assume that the GP register already contains the correct value.
A position-independent leaf procedure that accesses global data is not required to put the GP value in R13; it may use a scratch register instead, thus avoiding the need for saving and restoring register R13.
5. Procedure exit. The epilogue code is responsible for restoring the link register and any preserved registers that were saved.
If a memory stack frame was allocated, the epilogue code must deallocate it. Finally, the procedure exits by branching through the link register with the return instruction.
6. After the call. Any saved values should be restored.
Figure 28. Direct procedure calls
Caller:
- Prepare the call: set up arguments, save registers
- Call: call
- After the call: restore registers
Import stub:
- load entry address
- goto
Callee:
- Entry: allocate memory frame, save return link, save registers
- Procedure body
- Exit: restore registers, restore return link, destroy memory frame, return
Indirect procedure calls follow nearly the same sequence, except that the branch target is set indirectly. This sequence is shown in Figure 29.
1. Preparation for call. Indirect calls are built by loading the entry point address into the link register. Values in scratch registers that must be kept alive across the call must be saved, which can be done by either copying them into preserved registers or by saving them on the memory stack. The parameters must be set up in registers and memory as described in the Subroutine linkage and parameter passing chapter of the STxP70 Application binary interface manual (7937486).
2. Procedure call. All indirect calls are made with the call indirect instruction, which reads and writes the link register. The call instruction saves the return link address in the link register.
3. Procedure entry, exit, and return. The remainder of the calling sequence is the same as for direct calls.
Figure 29. Indirect procedure calls
Caller:
- Prepare the call: load entry address, set up arguments, save registers
- Call: call
- After the call: restore registers
Callee:
- Entry: allocate memory frame, save return link, save registers
- Procedure body
- Exit: restore registers, restore return link, destroy memory frame, return
9.3 Introduction to the relocatable loader library
The relocatable loader library (RL_LIB) supports the creation and loading of DSOs (dynamic shared objects, also known as load modules) in an embedded environment. RL_LIB implements DSOs as defined in the standard for supporting ELF System V Dynamic Linking.
Note: For applications that do not rely on advanced OS features (such as file systems, virtual memory management and multi process segment sharing), use RL_LIB as an alternative to the standard ELF System V Dynamic Loader (libdl.so).
The ELF System V ABI supports several run-time models. Only some run-time models are suitable for embedded systems without the support of traditional operating system services.
The run-time model for an application dictates the method used for linking and loading.
RL_LIB implements the R_Relocatable run-time model. The application has a main module and several load modules. The main module is statically linked and loaded. The load modules are loaded on demand (by explicit calls to the loader) at run-time. The load modules are loaded at an arbitrary address and dynamic symbol binding is applied by the loader for symbols undefined in the load modules. In the hierarchy of loaded modules, the dynamic symbol binding traverses the modules from the bottom up.
9.3.2 Relocatable run-time model
The R_Relocatable run-time model, as implemented by RL_LIB, has the following features:
• one main module loaded at application startup by the system
• several load modules that can load at run-time and unload after use
• several modules can be resident at the same time
• a loaded module can load and unload other load modules (as for the main module)
• load modules can be loaded anywhere
• access to symbols in loaded modules from the loader through a call to the loader library
• the loader performs dynamic symbol binding when loading a module and symbols are searched in the load modules hierarchy bottom-up (to the main module)
• sharing of code and data objects between modules is achieved by linking to the objects in a common ancestor
• the loader library is statically linked with the main module
• the system support archive library should be linked with the main module
Figure 30 shows an example of an application that has four load modules A, B, C and D.
Figure 30. Example of an application with four load modules
(Diagram showing the main module and load modules A, B, C and D, with the symbols printf, malloc, exec_A, exec_B, exec_C and exec_D.)
Note: In Figure 30, curved arrows (from load modules to parent module) represent load-time symbol binding performed while the load module loads. Straight arrows (from loader module to loaded module) represent explicit symbol address resolution performed through the loader library API.
The following describes a possible scenario.
1. At run-time, the main module loads the module A into memory through the rl_load_file() function.
2. The loader, in the process of loading A into memory, binds the symbol printf (undefined in A) to the printf function defined in main.
3. The main program uses the rl_sym() function to retrieve a pointer to the function symbol exec_A in A.
4. As for A, the main program loads the module D and references to printf are resolved to the printf in main. In addition, references to malloc in D are also resolved to the malloc in main.
5. The main program retrieves a pointer to exec_D in D using the rl_sym() function.
6. The main program (at some point) invokes the function exec_A.
7. The function loads the two modules B and C.
8. The undefined reference to printf in B is resolved to the printf in main (the loader searches first in A, and then in main).
9. The undefined reference to malloc in C is resolved to the malloc in A (the loader searches for and finds it in A). Note that the malloc function called from D (malloc of main) is then different from the malloc function called from B (or C, or A), which is the malloc of A.
10. After retrieving symbol addresses using the rl_sym() function, module A can indirectly call functions or reference data in B and C.
At any time, the main module or the module A can unload one of the loaded modules.
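The bottom-up search described in the scenario can be modeled with a small stand-alone sketch. It does not use RL_LIB: toy_module, toy_resolve and the symbol tables are hypothetical names introduced only to illustrate the lookup order, assuming that each module records its parent and the symbols it defines.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a module; this is NOT the RL_LIB rl_handle_t type. */
typedef struct toy_module {
    const char *name;
    const struct toy_module *parent;   /* NULL for the main module */
    const char *const *defined;        /* NULL-terminated list of defined symbols */
} toy_module;

/* Symbols defined by each module, following the scenario around Figure 30. */
static const char *const main_syms[] = { "printf", "malloc", NULL };
static const char *const a_syms[]    = { "exec_A", "malloc", NULL };
static const char *const b_syms[]    = { "exec_B", NULL };
static const char *const c_syms[]    = { "exec_C", NULL };
static const char *const d_syms[]    = { "exec_D", NULL };

static const toy_module mod_main = { "main", NULL,      main_syms };
static const toy_module mod_a    = { "A",    &mod_main, a_syms };
static const toy_module mod_b    = { "B",    &mod_a,    b_syms };
static const toy_module mod_c    = { "C",    &mod_a,    c_syms };
static const toy_module mod_d    = { "D",    &mod_main, d_syms };

/* Return the first module that defines `symbol`, searching `start` first
   and then each ancestor up to the main module; NULL if none defines it. */
static const toy_module *toy_resolve(const toy_module *start, const char *symbol)
{
    const toy_module *m;
    const char *const *s;

    for (m = start; m != NULL; m = m->parent) {
        for (s = m->defined; *s != NULL; ++s) {
            if (strcmp(*s, symbol) == 0)
                return m;
        }
    }
    return NULL;
}
```

In this model, resolving malloc from C stops at A, while resolving malloc from D reaches main, which matches steps 4 and 9 of the scenario.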
The relocatable code generation model
The relocatable code generation model is the same as the code generation model for the System V model, with the following differences:
• No symbol can be preempted. Dynamic symbol binding always searches the current module first. This has the effect that a module containing a symbol definition can be sure that it will use this definition. For example, this enables inlining in load modules.
• Weak references are treated the same way as undefined references in load modules. Therefore, when traversing the module tree bottom-up, the first definition found is taken.
9.4 Relocatable loader library API
The relocatable loader library supports loading and unloading a module, and accessing a symbol address in a module by name. It is provided as a library, librl.a, with its associated header file, rl_lib.h.
The functions defined in this API are explained in the following sections.
All the functions manipulating a load module use a pointer to the rl_handle_t type. This is an abstract type for a load module handle.
A load module handle is allocated by the rl_handle_new() function and deallocated by the rl_handle_delete() function.
The main module handle is statically allocated and initialized in the startup code of the main module.
A module handle references one loaded module at a time. To load another module from the same handle, the previous module must first be unloaded.
rl_handle_new
Allocate and initialize a new handle
Definition:
rl_handle_t *rl_handle_new(
    const rl_handle_t *parent,
    int mode);
Arguments:
parent
The handle of the parent module.
mode
Determines the RL_LIB chunk mode. Valid values for mode are:
RL_ONE_CHUNK_MODE (defined to be 0)
RL_MULTIPLE_CHUNK_MODE (defined to be 1)
Returns:
The newly initialized handle.
Description:
The rl_handle_new() function allocates and initializes a new handle that can be used for loading and unloading a load module.
The handle of the parent module to which the loaded module will be connected is specified by the parent argument.
In RL_MULTIPLE_CHUNK_MODE, the mode argument activates two separate memory allocators: rl_text_memalign for text segments and rl_data_memalign for data segments. In RL_ONE_CHUNK_MODE, the mode argument activates one global memory allocator, rl_memalign, for any segment type.
Generally, a load module will be attached to the module using this function, therefore a handle will typically be allocated as follows:
rl_handle_t *new_handle = rl_handle_new(rl_this(), RL_ONE_CHUNK_MODE);
rl_handle_delete
Finalize and deallocate a module handle
Definition:
int rl_handle_delete(
    rl_handle_t *handle);
Arguments:
handle
The handle to deallocate.
Returns:
Returns 0 for success, -1 for failure.
Description:
The rl_handle_delete() function finalizes and deallocates a module handle.
The handle must not hold a loaded module. The loaded module must first have been unloaded by rl_unload() before calling this function. If successful, the value returned is 0. Otherwise the value returned is -1 and the error code returned by rl_errno() is set accordingly.
rl_this
Return the handle for the current module
Definition:
rl_handle_t *rl_this(void);
Arguments:
None.
Returns:
The handle for the current module.
Description:
The rl_this() function returns the handle for the current module. If called from the main module, it returns the handle of the main module. If called from a loaded module, it returns the handle that holds the loaded module.
This function is used when allocating a handle with rl_handle_new(). It can also be used, for example, to retrieve a symbol in the current module:
void *symbol_ptr = rl_sym(rl_this(), "symbol");
rl_parent
Return the handle for the parent of the current handle
Definition:
rl_handle_t *rl_parent(void);
Arguments:
None.
Returns:
The handle for the parent of the current handle.
Description:
The rl_parent() function returns the handle for the parent of the current handle (as returned by rl_this()).
It may be used, for example, to find a symbol in one of the parent modules:
void *symbol_in_parents = rl_sym_rec(rl_parent(), "symbol");
rl_load_addr
Return the memory load address of a loaded module
Definition:
const char *rl_load_addr(
    rl_handle_t *handle);
Arguments:
handle
The handle for the loaded module.
Returns:
The memory load address of the loaded module, or NULL.
Description:
The rl_load_addr() function returns the memory load address of a loaded module. It returns NULL if the handle does not hold a loaded module or if the handle passed is the main program handle.
rl_load_size
Return the memory load size of a loaded module
Definition:
unsigned int rl_load_size(
    rl_handle_t *handle);
Arguments:
handle
The handle for the loaded module.
Returns:
The memory load size of the loaded module, or 0.
Description:
The rl_load_size() function returns the memory load size of a loaded module. It returns 0 if the handle does not hold a loaded module or if the handle passed is the main program handle.
rl_file_name
Return the filename associated with the loaded module handle
Definition:
const char *rl_file_name(
    rl_handle_t *handle);
Arguments:
handle
The handle for the loaded module.
Returns:
The filename associated with the loaded module handle, or NULL.
Description:
The rl_file_name() function returns the filename associated with the loaded module handle. It returns NULL if no filename is associated with the current loaded module, if the handle does not hold a loaded module or if the handle passed is the main program handle.
rl_set_file_name
Specify a filename for the handle
Definition:
int rl_set_file_name(
    rl_handle_t *handle,
    const char *f_name);
Arguments:
handle
The handle for the module.
f_name
The filename to specify for the handle.
Returns:
Returns 0 for success, -1 for failure.
Description:
The rl_set_file_name() function is used to specify a filename for a handle. This filename is attached to the next module that will be loaded. It can be used to specify a filename for modules loaded from memory or to force a different filename for a module loaded from a file.
This function returns 0 if the filename was successfully set. If a module is already loaded or if the application runs out of memory, it returns -1 and the error code returned by rl_errno() is set accordingly.
rl_load_buffer
Load a relocatable module into memory
Definition:
int rl_load_buffer(
    rl_handle_t *handle,
    const char *image);
Arguments:
handle
The handle for the module.
image
The image of the load module.
Returns:
Returns 0 for success, -1 for failure.
Description:
The rl_load_buffer() function loads a relocatable module into memory from the image referenced by image.
It allocates the space for the loaded module in the heap, loads the segments from the memory image of the loadable module, links the module to the parent module of the handle and relocates and initializes the loaded module.
This function calls the action callback functions for RL_ACTION_LOAD after loading and before executing any code in the loaded module.
The value 0 is returned if the loading was successful. The value -1 is returned on failure and the error code returned by rl_errno() is set accordingly.
rl_load_file
Load a relocatable module into memory from a file
Definition:
int rl_load_file(
    rl_handle_t *handle,
    const char *f_name);
Arguments:
handle
The handle for the module.
f_name
The file from which to load the relocatable module.
Returns:
Returns 0 for success, -1 for failure.
Description:
The rl_load_file() function loads a relocatable module into memory from the file specified by f_name.
It opens the specified file with an fopen() call, allocates the space for the loaded module in the heap, loads the segments from the file, links the module to the parent module of the handle, and relocates and initializes the loaded module. The file is closed with fclose() before returning. This function calls the action callback functions for RL_ACTION_LOAD after loading and before executing any code in the loaded module.
The value 0 is returned if the load was successful. The value -1 is returned on failure and the error code returned by rl_errno() is set accordingly.
rl_load_stream
Load a relocatable module into memory from a byte stream
Definition:
typedef int rl_stream_func_t (
    void *cookie,
    char *buffer,
    int length);
int rl_load_stream(
    rl_handle_t *handle,
    rl_stream_func_t *stream_func,
    void *stream_cookie);
Arguments:
handle
The handle for the module.
stream_func
The user specified callback function.
stream_cookie
The user specified state.
Returns:
Returns 0 for success, -1 for failure.
Description:
The rl_load_stream() function loads a relocatable module into memory from a byte stream provided through a user specified callback function stream_func and the user specified state stream_cookie.
The callback function must be of type rl_stream_func_t. It is called multiple times by the loader to retrieve the load module data in the buffer buffer of length length until the module is loaded into memory. The loader always calls the callback function with a buffer length strictly greater than 0. The stream_cookie argument passed to rl_load_stream() is passed to the callback function in its cookie parameter. The cookie parameter is intended to be used by the callback function to update a private state.
The callback function must return the number of bytes transferred. If the returned value is less than the given buffer length or is -1, rl_load_stream() will in turn return an error and the error code returned by rl_errno() is set accordingly.
The rl_load_stream() function allocates the space for the loaded module from the heap, loads the segments by calling the callback function, links the module to the parent module of the handle, and relocates and initializes the loaded module. This function calls the action callback functions for RL_ACTION_LOAD after loading and before executing any code in the loaded module.
The value 0 is returned if the load was successful. The value -1 is returned on failure and the error code returned by rl_errno() is set accordingly.
This function can be used as an alternative to rl_load_buffer() or rl_load_file() to allow any loading method to be implemented.
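As an illustration of the callback contract, the following sketch shows a callback that serves bytes from a memory image and keeps its read position in the cookie. Only the rl_stream_func_t signature comes from this manual; mem_stream_t and mem_stream_read are hypothetical names, and the typedef is repeated here only so that the sketch compiles without rl_lib.h.

```c
#include <string.h>

/* Callback type as given in the definition of rl_load_stream(). */
typedef int rl_stream_func_t(void *cookie, char *buffer, int length);

/* Hypothetical private state: a module image in memory and a read position. */
typedef struct {
    const char *image;
    int size;
    int pos;
} mem_stream_t;

/* Copy up to `length` bytes from the image into `buffer` and return the
   number of bytes transferred. A count smaller than `length` signals to
   the loader that the stream ended early. */
static int mem_stream_read(void *cookie, char *buffer, int length)
{
    mem_stream_t *s = (mem_stream_t *)cookie;
    int avail = s->size - s->pos;
    int n = (length < avail) ? length : avail;

    memcpy(buffer, s->image + s->pos, (size_t)n);
    s->pos += n;
    return n;
}
```

With the loader library linked in, such a callback would be passed as rl_load_stream(handle, mem_stream_read, &state), where state is a mem_stream_t initialized with the image address and size.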
The following example illustrates how the rl_load_file() function may be implemented using the rl_load_stream() function:
#include <stdio.h>
#include <rl_lib.h>

/* User implementation of the callback function that reads from a file.
   The signature matches rl_stream_func_t, so no cast is needed. */
static int rl_stream_read(void *cookie, char *buffer, int length)
{
    FILE *file = (FILE *)cookie;
    /* Return the number of bytes actually read. */
    return (int)fread(buffer, 1, (size_t)length, file);
}
...
{
    /* Loads the module from a file. */
    FILE *file;
    int status;
    file = fopen(f_name, "rb");
    if (file == NULL) { /*... error... */ }
    status = rl_load_stream(handle, rl_stream_read, file);
    if (status == -1) { /*... error... */ }
    fclose(file);
}
...
rl_unload
Unload a previously loaded relocatable module
Definition:
int rl_unload(
    rl_handle_t *handle);
Arguments:
handle
The handle for the module.
Returns:
Returns 0 for success, -1 for failure.
Description:
The rl_unload() function unloads a previously loaded relocatable module. It finalizes, unlinks, and frees allocated memory for the loaded module. This function calls the action callback functions for RL_ACTION_UNLOAD before unloading and after having executed finalization code in the module.
The return value is 0 if the unloading is successful, otherwise the return value is -1 and the error code returned by rl_errno() is set accordingly.
rl_sym
Return a pointer reference to the symbol in the loaded module
Definition:
void *rl_sym(
    rl_handle_t *handle,
    const char *name);
Arguments:
handle
The handle for the loaded module.
name
The symbol in the loaded module.
Returns:
The pointer reference to the symbol.
Description:
The rl_sym() function returns a pointer reference to the symbol named name in the loaded module specified by handle. It searches the dynamic symbol table of the loaded module and returns a pointer to the symbol. The handle parameter can be the handle of any currently loaded module, or the handle of the main module.
If the symbol is not defined in the loaded module, NULL is returned. It is not generally an error for this function to return NULL. For example, the user may conditionally call a specific function only if it is defined in the module.
In this function, as well as in the rl_sym_rec() function, the name parameter must be the mangled symbol name. For instance, on some targets, C names are mangled by prefixing the name with an underscore (_). For example, to return a reference to the printf() function, the symbol name passed to rl_sym() will be "_printf".
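On a target that uses the underscore prefix, the mangled name can be built before the lookup. The helper below is a minimal sketch of that step; mangle_c_name is a hypothetical name, and whether a given STxP70 configuration applies this prefix is not stated here, so treat the prefix as an assumption.

```c
#include <stdio.h>

/* Write "_" followed by `name` into `out`.
   Returns 0 on success, -1 if `out` is too small to hold the result. */
static int mangle_c_name(const char *name, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "_%s", name);

    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

The resulting string is what would be passed as the name argument of rl_sym() or rl_sym_rec() on such a target.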
rl_sym_rec
Return a pointer reference to the symbol in the loaded module or one of its ancestors
Definition:
void *rl_sym_rec(
    rl_handle_t *handle,
    const char *name);
Arguments:
handle
The handle for the loaded module.
name
The symbol in the loaded module.
Returns:
The pointer reference to the symbol.
Description:
The rl_sym_rec() function returns a pointer reference to the symbol named name in the loaded module specified by handle or one of its ancestors.
This function searches the dynamic symbol table of the loaded module and returns a pointer to the symbol if found. If the symbol is not found, the function iteratively searches in the dynamic symbol table of each parent module in turn until the symbol is found.
The handle parameter can be the handle of any currently loaded module, or the handle of the main module.
If the symbol is not defined in the loaded module or one of its ancestors, NULL is returned. It is not generally an error for this function to return NULL.
The name parameter must be the mangled symbol name, as for the rl_sym() function.
rl_foreach_segment
Iterate over all the segments of a loaded module and call the supplied function
Definition:
typedef struct rl_segment_info_t_ rl_segment_info_t;
typedef int rl_segment_func_t (
    rl_handle_t *handle,
    rl_segment_info_t *seg_info,
    void *cookie);
int rl_foreach_segment(
    rl_handle_t *handle,
    rl_segment_func_t *callback_fn,
    void *callback_cookie);
Arguments:
handle
The handle for the module.
callback_fn
The user specified callback function.
callback_cookie
The argument to pass to the function.
Returns:
Returns 0 for success, -1 for failure.
Description:
The rl_foreach_segment() function iterates over all the segments of the loaded module handle and calls back the user supplied function. For each segment, the function callback_fn is called with the following parameters.
handle
The handle passed to the function.
seg_info
The segment information pointer filled with the current segment information.
cookie
The argument passed to the function.
The segment information returned in seg_info is a pointer to the following structure:
typedef unsigned int rl_segment_flag_t;
struct rl_segment_info_t_ {
    const char *seg_addr;
    unsigned int seg_size;
    rl_segment_flag_t seg_flags;
};
The user callback function must return 0 on success or -1 on error.
In the case where the callback function returns an error, the rl_foreach_segment() function returns -1 and the error code returned by rl_errno() is set to RL_ERR_SEGMENTF. Otherwise the function returns 0.
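A segment callback typically accumulates information in the object passed as its cookie. The following sketch counts segments and sums their sizes. The type declarations mirror the definitions above so that the sketch compiles without rl_lib.h; the struct tag rl_handle_t_ and the names seg_stats_t and seg_stats_cb are assumptions made for illustration.

```c
#include <stddef.h>

/* Stand-in declarations mirroring the manual; real code would include
   rl_lib.h instead. The tag rl_handle_t_ is an assumption. */
typedef struct rl_handle_t_ rl_handle_t;
typedef unsigned int rl_segment_flag_t;
typedef struct rl_segment_info_t_ {
    const char *seg_addr;
    unsigned int seg_size;
    rl_segment_flag_t seg_flags;
} rl_segment_info_t;
typedef int rl_segment_func_t(rl_handle_t *handle,
                              rl_segment_info_t *seg_info,
                              void *cookie);

/* Hypothetical accumulator passed through the cookie. */
typedef struct {
    unsigned int count;
    unsigned int total_size;
} seg_stats_t;

/* Count the segment and add its size; return -1 to report an error. */
static int seg_stats_cb(rl_handle_t *handle, rl_segment_info_t *seg_info,
                        void *cookie)
{
    seg_stats_t *stats = (seg_stats_t *)cookie;

    (void)handle;   /* not needed for plain accounting */
    if (stats == NULL || seg_info == NULL)
        return -1;
    stats->count++;
    stats->total_size += seg_info->seg_size;
    return 0;
}
```

With the loader library available, the callback would be invoked through rl_foreach_segment(handle, seg_stats_cb, &stats), after which stats holds the totals for the module.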
rl_add_action_callback
Add a user action callback function to the user action callback list
Definition:
typedef unsigned int rl_action_t;
#define RL_ACTION_UNLOAD 2
typedef int rl_action_func_t (
    rl_handle_t *handle,
    rl_action_t action,
    void *cookie);
int rl_add_action_callback(
    rl_action_t action_mask,
    rl_action_func_t *callback_fn,
    void *callback_cookie);
Arguments:
action_mask
The set of actions for which the callback function must be called.
callback_fn
The user specified callback function.
callback_cookie
The argument to pass to the function.
Returns:
Returns 0 for success, -1 for failure.
Description:
The rl_add_action_callback() function adds a user action callback function to the user action callback list. It can be called multiple times with different callback functions. The same callback function cannot be added more than once.
For each defined action, each callback function is called in the order it was added into the callback list. The callback functions are not attached to a particular module and are called for any subsequently loaded or unloaded module.
This function returns 0 on success and -1 on failure. It does not set any error codes.
This function can fail if a callback function is already in the callback list or if the program runs out of memory.
The rl_action_t type defines the action flags for module loading/unloading and is passed to the action function callback. The action flags can be OR-ed to create an action mask that can be passed to the function rl_add_action_callback(). The actions defined are:
RL_ACTION_LOAD
The callback is called just after the module has been loaded in memory and the cache has been synchronized. No module code has been executed.
RL_ACTION_UNLOAD
The callback is called just before the module is unloaded from memory. No module code will be executed after this point.
RL_ACTION_ALL
The callback will be called for any action.
The type for the user action callback function is rl_action_func_t. The parameters passed to the callback function when it is called are:
handle
The handle that performed the action.
action
The action performed.
cookie
The parameter passed to rl_add_action_callback().
The callback function returns 0 on success and -1 on failure. In the case of failure, the loading (or unloading) of the module is undone and the error code returned by rl_errno() is set to RL_ERR_ACTIONF.
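As a small illustration of the callback contract, the sketch below keeps a count of resident modules in its cookie. It is stand-alone: the declarations mirror the manual so that it compiles without rl_lib.h, the value of RL_ACTION_LOAD is an assumption (only RL_ACTION_UNLOAD is shown with a value in this section), and module_counter_t and count_modules_cb are hypothetical names.

```c
#include <stddef.h>

/* Stand-in declarations; real code would include rl_lib.h instead. */
typedef struct rl_handle_t_ rl_handle_t;   /* tag name is an assumption */
typedef unsigned int rl_action_t;
#define RL_ACTION_LOAD   1u   /* ASSUMED value, not given in this section */
#define RL_ACTION_UNLOAD 2u   /* value given in the manual */
typedef int rl_action_func_t(rl_handle_t *handle, rl_action_t action,
                             void *cookie);

/* Hypothetical state passed through the cookie. */
typedef struct {
    int resident;   /* number of modules currently loaded */
} module_counter_t;

/* Track loads and unloads. Returning -1 reports a failure, which makes
   the loader undo the load or unload in progress. */
static int count_modules_cb(rl_handle_t *handle, rl_action_t action,
                            void *cookie)
{
    module_counter_t *counter = (module_counter_t *)cookie;

    (void)handle;
    if (counter == NULL)
        return -1;
    if (action == RL_ACTION_LOAD) {
        counter->resident++;
    } else if (action == RL_ACTION_UNLOAD) {
        if (counter->resident == 0)
            return -1;   /* unload without a matching load */
        counter->resident--;
    }
    return 0;
}
```

With the loader library available, the callback would be registered once with rl_add_action_callback(RL_ACTION_ALL, count_modules_cb, &counter).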
rl_delete_action_callback
Remove the given function from the action callback list
Definition:
int rl_delete_action_callback(
    rl_action_func_t *callback_fn);
Arguments:
callback_fn
The user specified callback function.
Returns:
Returns 0 for success, -1 if the callback was not present in the callback list.
Description:
The rl_delete_action_callback() function removes the specified callback function from the action callback list. This function returns 0 if the callback was removed, or -1 if it was not present in the callback list. No error code is set.
rl_errno
Return the error code for the last failed function
Definition:
int rl_errno(
    rl_handle_t *handle);
Arguments:
handle
The handle for the module.
Returns:
The error code for the last failed function.
Description:
The rl_errno() function returns the error code for the last failed function.
Table 31 lists the possible codes.
Table 31. Errors returned by rl_errno()
Error code | Diagnostic | Possible error causing function
RL_ERR_NONE | No previous call has failed. |
RL_ERR_MEM | Ran out of memory (rl_memalign(), rl_text_memalign() or rl_data_memalign() failed). | rl_load_buffer(), rl_load_file(), rl_load_stream(), rl_set_file_name()
RL_ERR_ELF | The load module is not a valid ELF file. | rl_load_buffer(), rl_load_file(), rl_load_stream(), rl_set_file_name()
RL_ERR_DYN | The load module is not a dynamic library. | rl_load_buffer(), rl_load_file(), rl_load_stream(), rl_set_file_name()
RL_ERR_SEG | The load module has invalid segment information. | rl_load_buffer(), rl_load_file(), rl_load_stream(), rl_set_file_name()
RL_ERR_REL | The load module contains invalid relocations. | rl_load_buffer(), rl_load_file(), rl_load_stream(), rl_set_file_name()
RL_ERR_RELSYM | A symbol was not found at load time. rl_errarg() returns the symbol name. | rl_load_buffer(), rl_load_file(), rl_load_stream(), rl_set_file_name()
RL_ERR_SYM | The symbol is not defined in the module. rl_errarg() returns the symbol name. | rl_sym(), rl_sym_rec()
RL_ERR_FOPEN | The file cannot be opened by rl_fopen(). | rl_load_file()
RL_ERR_FREAD | Error while reading the file in rl_fread(). | rl_load_file()
RL_ERR_STREAM | Error while loading the file from a stream. | rl_load_stream()
RL_ERR_LINKED | Module handle is already linked. | rl_load_file(), rl_load_buffer(), rl_load_stream(), rl_handle_delete()
RL_ERR_NLINKED | Module handle is not linked. | rl_unload(), rl_sym(), rl_sym_rec(), rl_foreach_segment()
RL_ERR_SEGMENTF | Error in segment function callback. | rl_foreach_segment()
RL_ERR_ACTIONF | Error in action function callback. | rl_load_file(), rl_load_buffer(), rl_load_stream()
rl_errarg
Return the name of the symbol that could not be resolved
Definition:
const char *rl_errarg(
    rl_handle_t *handle);
Arguments:
handle
The handle for the module.
Returns:
The name of the symbol that could not be resolved.
Description:
If rl_errno() returns either RL_ERR_RELSYM or RL_ERR_SYM, the rl_errarg() function returns the name of the symbol that could not be resolved.
rl_errstr
Return a string for an error code
Definition:
const char *rl_errstr(
    rl_handle_t *handle);
Arguments:
handle
The handle for the module.
Returns:
A string for the error code.
Description:
The rl_errstr() function returns a readable string for the error code reported by rl_errno(). For example:
...
void *sym = rl_sym(handle, "symbol");
if (sym == NULL)
    fprintf(stderr, "failed: %s\n", rl_errstr(handle));
...
If symbol is not defined in the module referenced by handle then the following message is displayed:
failed: symbol not found: symbol
9.5 Customization
The relocatable loader library defines a number of functions that it uses internally for providing services such as heap memory management and file access. To provide custom implementation of these functions, the application in the main module can override these functions.
These functions allocate and free space for the load module image and for the handle objects:
void *rl_malloc(int size);
void *rl_memalign(int align, int size);
void *rl_text_memalign(int align, int size);
void *rl_data_memalign(int align, int size);
void rl_free(void *ptr);
Where:
• rl_memalign is valid only in RL_ONE_CHUNK_MODE
• rl_text_memalign is valid only in RL_MULTIPLE_CHUNK_MODE for text segments
• rl_data_memalign is valid only in RL_MULTIPLE_CHUNK_MODE for data segments
The default behavior for these functions is to call the standard C library functions malloc(), memalign() and free() respectively.
Note: If providing a custom implementation, override all three functions.
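As a rough illustration of such an override for RL_ONE_CHUNK_MODE, the sketch below serves all requests from a fixed static pool using a simple bump allocator. The function signatures are those listed above; the pool size, the default alignment and the decision never to reuse freed space are assumptions chosen to keep the sketch short. In RL_MULTIPLE_CHUNK_MODE, rl_text_memalign and rl_data_memalign would need equivalent implementations.

```c
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE    ((size_t)4096)   /* hypothetical pool size */
#define DEFAULT_ALIGN 16               /* hypothetical alignment for rl_malloc */

static unsigned char arena[ARENA_SIZE];
static size_t arena_used;

/* Return `size` bytes aligned on `align` (a power of two), or NULL if the
   request is invalid or the pool is exhausted. */
void *rl_memalign(int align, int size)
{
    uintptr_t base = (uintptr_t)arena;
    uintptr_t start;
    size_t offset;

    if (align <= 0 || size < 0 || (align & (align - 1)) != 0)
        return NULL;
    /* Round the next free address up to the requested alignment. */
    start = (base + arena_used + (uintptr_t)align - 1)
            & ~((uintptr_t)align - 1);
    offset = (size_t)(start - base);
    if (offset > ARENA_SIZE || (size_t)size > ARENA_SIZE - offset)
        return NULL;
    arena_used = offset + (size_t)size;
    return (void *)start;
}

void *rl_malloc(int size)
{
    return rl_memalign(DEFAULT_ALIGN, size);
}

/* A bump allocator cannot release individual blocks, so this is a no-op. */
void rl_free(void *ptr)
{
    (void)ptr;
}
```

A pool of this kind suits a system where modules are loaded once and kept resident; an application that loads and unloads modules repeatedly would need an allocator that can reclaim freed blocks.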
The rl_load_file() function uses these functions to open, read and close a file handle:
void *rl_fopen(const char *f_name, const char *mode);
int rl_fclose(void *file);
int rl_fread(char *buffer, int eltsize, int nelts, void *file);
The default behavior for these functions is to call the standard C library functions fopen(), fread() and fclose() respectively.
Note: If providing a custom implementation, override all three functions and link them with the main program.
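On a system without a file system, one possible override serves module images from a table held in memory. The sketch below assumes that rl_fread() returns the number of elements read, as fread() does, and that rl_fclose() returns 0 on success; the table, its placeholder contents and the mem_file_t type are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a named module image stored in memory. */
typedef struct {
    const char *name;
    const char *data;
    int size;
    int pos;
} mem_file_t;

/* Placeholder contents; a real table would reference linked-in images. */
static mem_file_t mem_files[] = {
    { "rl_hello.rl", "0123456789ABCDEF", 16, 0 },
};

/* Look the name up in the table; the mode argument is ignored. */
void *rl_fopen(const char *f_name, const char *mode)
{
    size_t i;

    (void)mode;
    for (i = 0; i < sizeof mem_files / sizeof mem_files[0]; ++i) {
        if (strcmp(mem_files[i].name, f_name) == 0) {
            mem_files[i].pos = 0;   /* rewind on each open */
            return &mem_files[i];
        }
    }
    return NULL;
}

/* Copy up to `nelts` whole elements and return the number copied. */
int rl_fread(char *buffer, int eltsize, int nelts, void *file)
{
    mem_file_t *f = (mem_file_t *)file;
    int avail, n;

    if (f == NULL || eltsize <= 0 || nelts < 0)
        return 0;
    avail = (f->size - f->pos) / eltsize;
    n = (nelts < avail) ? nelts : avail;
    memcpy(buffer, f->data + f->pos, (size_t)n * (size_t)eltsize);
    f->pos += n * eltsize;
    return n;
}

int rl_fclose(void *file)
{
    return (file != NULL) ? 0 : -1;
}
```

Because the table entry itself serves as the file handle, only one open instance per image is supported at a time, which is enough for a loader that reads each file sequentially.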
To build a relocatable library that can be loaded by the RL_LIB loader, additional compile-time and link-time options must be used.
The following is a simple example of building a hello world loadable module:
stxp70cc -o rl_hello.o -fpic -Mgot=small -c rl_hello.c
stxp70cc -o rl_hello.rl --rlib rl_hello.o
Alternatively, the compile and link phases can be carried out with a single command:
stxp70cc -o rl_hello.rl -fpic -Mgot=small --rlib rl_hello.c
To build a main module suitable for loading a relocatable library, specific link-time options are required. No specific compile-time options are required for the main module.
The following is an example of building a main module:
stxp70cc -o prog.o -c prog.c
stxp70cc -o prog.exe --rmain prog.o
The compile and link phases can be carried out with a single command:
stxp70cc -o prog.exe --rmain prog.c
9.6.1 Importing and exporting symbols
For the relocatable loader system to function, the main module (or a loaded module) must provide services to the other load modules. To avoid a load error when loading a module, it is usual for the referenced symbols to be linked into the main module.
When the services are present in a library, the main module imports the corresponding symbols at link time. However, to import symbols, the linker requires an import script.
stxp70-rltool generates a list of symbols in the form of an import or export script from the specified input files, where the input files are either load modules (relocatable libraries) or a text file containing a list of symbols:
• An import script is generated from a list of symbols specified in the file symbol_list (where symbol_list must have only one symbol on each line), or from one or more load module files. In the latter case, the stxp70-rltool utility generates an import script from the set of symbols that the load modules require.
• An export script can be generated to reduce the size of the dynamic symbol table in the main module or load modules. An export script is not mandatory as all global symbols are exported by default.
The export script defines the set of symbols (and only these) that must be exported to the other modules through the dynamic symbol table. These symbols are then accessible by the load-time symbol binding process and by the calls to rl_sym() and rl_sym_rec().
This utility has both a generic driver, stxp70-rltool, and version-specific commands to invoke it: stxp70v3-rltool and stxp70v4-rltool. All versions of the utility are documented in the STxP70 utilities reference manual (8210925).
Note: stxp70v3-rltool and stxp70v4-rltool are identical in terms of options and arguments.
Using the relocatable loader import/export utility
This section provides some examples of using the relocatable loader import/export utility.
Two common scenarios where an import script might be generated are:
• When the required services are well defined and a list of symbols can be passed to the stxp70-rltool utility.
• When the list of services is not defined but the load modules are available and can be passed to the stxp70-rltool utility. The stxp70-rltool utility generates an import script from the set of symbols that the load modules require.
The following command generates an import script from a list of symbols specified in the file prog_import.lst (one symbol per line):
stxp70-rltool -mcore=[stxp70v3|stxp70v4] -i -s -o prog_import.ld prog_import.lst
The following command generates an import script for the main module from the load modules that it can load, liba.rl and libb.rl:
stxp70-rltool -mcore=[stxp70v3|stxp70v4] -i -o prog_import.ld liba.rl libb.rl
Use the import script to link the main module, for example:
stxp70cc -o prog.exe --rmain object_files.o prog_import.ld
Two common scenarios where an export script might be generated are:
• When an import script is required for the module, the export script can be generated at the same time. This is because the symbols to export are generally those that are imported.
• For a load module that has a well known external interface, the export script can be generated from a list of symbols to export.
The following example shows how to generate an export script and import script for a list of modules that is then used when linking the main module. Only the symbols from liba.rl and libb.rl are imported into the main module and exported by it.
stxp70-rltool -mcore=[stxp70v3|stxp70v4] -i -e -o prog_import_export.ld liba.rl libb.rl
stxp70cc -o prog.exe --rmain object_files.o prog_import_export.ld
To generate an export script for a load module with a well defined interface specified in the file liba_export.lst (one symbol per line):
stxp70-rltool -mcore=[stxp70v3|stxp70v4] -e -s -o liba_export.ld liba_export.lst
stxp70cc -o liba.rl --rlib *.o liba_export.ld
When compiling a load module with the -fpic -Mgot=small options, some overhead occurs in the generated code to access functions and data objects. Compiler options and C language extensions can be used to reduce this overhead.
Relocatable libraries are not subject to symbol preemption; therefore, when generating position independent code, the -fvisibility=protected option can be used in addition to -fpic -Mgot. The -fvisibility=protected option enables the inlining of global functions and can be used as a default option for compiling relocatable libraries. For example:
stxp70cc -o a.o -fpic -Mgot=small -fvisibility=protected -c a.c
In addition to this option, fine-grained visibility can be specified at the source code level with the __attribute__((visibility(...))) GNU C extension.
For example, if the external interface of a load module is well defined in a header file, __attribute__((visibility("protected"))) can be attached to each function of the external interface. To specify that all other defined functions are internal to the load module, use the -fvisibility=hidden option on the command line. This combination of options optimizes references from the same file to global objects that are not part of the interface.
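The source-level annotation can be sketched as follows. The function names codec_init and codec_scale are hypothetical. The helper is marked hidden explicitly so that the sketch does not depend on the command line; compiling with -fvisibility=hidden would make that the default for unannotated functions.

```c
/* External interface of the load module: exported through the dynamic
   symbol table, but always bound to this definition within the module. */
__attribute__((visibility("protected")))
int codec_init(int rate);

/* Internal helper: not exported to other modules. */
__attribute__((visibility("hidden")))
int codec_scale(int rate);

int codec_scale(int rate)
{
    return rate * 2;
}

int codec_init(int rate)
{
    /* A call to a hidden function can be resolved within the module. */
    return codec_scale(rate) + 1;
}
```

In a real module the protected declarations would normally live in the header file that defines the external interface, so that every source file sees the same visibility.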
To specify the visibility of each symbol externally, in a given <file>, use the -mvisibility-decl=<file> option. In the case where the external services required by a module (default visibility) and the external services provided by the module (protected visibility) are known, all other functions or data objects can be declared as internal (hidden visibility). This option can be used to specify these visibility declarations. In this case, only the functions that are external have an associated overhead. The other internal functions have a very reduced overhead.
For a full inter-procedural optimization of the relocatable library, use the -ipa option. In this case, when combined with the declaration of external functions, the library is generated with a minimal overhead for the dynamic linking support.
For detailed information on the visibility specification, refer to the compiler options documentation and to the ELF System V Dynamic Linking ABI.
The debugging of dynamically loaded modules is possible in the same way as for System V dynamic shared objects. The main module debugging information loads at load time of the application. The load modules debugging information loads at load time of the load modules.
To update debugging information, the loader maintains a list of loaded modules together with their filenames (the file contains the debugging information) and the load address of the module. Each time a new module loads, the loader calls a specific function. The debugger has to set a breakpoint on this specific function and, when the breakpoint is hit, traverse the list to find new loaded modules and load the debugging information.
For the STxP70 toolset, the debugger implements the required mechanism for the automatic debugging of loaded modules.
To find the file that contains the debug information, the loader must know the path to the load module. This is automatic in the case of rl_load_file() as the filename is specified in the interface. For the rl_load_buffer() and rl_load_stream() functions, the user must set the filename with a call to the rl_set_file_name() function.
For example, the following code enables automatic debugging of a load module loaded with rl_load_buffer():
{
    int status;
    rl_handle_t *handle = rl_handle_new(rl_this(), RL_ONE_CHUNK_MODE);
    if (handle == NULL) { /* error */ }
#ifdef DEBUG_ENABLED
    /* Record the file that holds the debug information for this module. */
    rl_set_file_name(handle, "path_to_the_file_for_the_module");
#endif
    status = rl_load_buffer(handle, module_image);
    if (status == -1) { /* error */ }
    ...
}
The action callbacks may be used with a profiling support library, or alternatively, a user defined package can be informed that a segment has just been loaded or is on the point of being unloaded by using the user action callback interface.
Below is an example that iterates over the segment list and declares the executable segments to a profiling support library on the loading/unloading of a module.
/* Called for each segment of a module: declares executable segments to the profiler. */
static int segment_profile(rl_handle_t *handle, rl_segment_info_t *info,
                           void *cookie)
{
  rl_action_t action = *((rl_action_t *)cookie);
  const char *file_name = rl_file_name(handle);
  if (file_name != NULL && (info->seg_flags & RL_SEG_EXEC)) {
    if (action == RL_ACTION_LOAD) {
      /* Call profiling interface for adding a code region. */
      profiler_add_region(file_name, info->seg_addr, info->seg_size);
    }
    if (action == RL_ACTION_UNLOAD) {
      /* Call profiling interface for removing a code region. */
      ... info->seg_size);
    }
  }
  return 0;
}

/* Action callback: invoked on each load or unload of a module. */
static int module_profile(rl_handle_t *handle, rl_action_t action,
                          void *cookie)
{
  rl_foreach_segment(handle, segment_profile, (void *)&action);
  return 0;
}

int main()
{
  ...
  if (rl_add_action_callback(RL_ACTION_ALL, module_profile, NULL) == -1) {
    fprintf(stderr, "rl_add_action_callback failed\n");
    exit(1);
  }
  ...
  status = rl_load_file(handle, file_name);
  ...
  return 0;
}
9.9 Memory protection support
When a new library segment has loaded into memory or is on the point of being unloaded from memory, a system library (or the user) can use the user-action callback interface to install a memory protection scheme.
To set up user protection support, use the user-action callback (see Section 9.8: Profiling support).
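As a rough illustration only, the sketch below shows the kind of decision such a callback might make for each segment. Everything in it is an assumption: the demo_* types and constants are stand-ins defined here, not the RL_LIB types, and no platform protection unit is programmed. A real callback would take the segment information supplied by the loader and apply the chosen mode through whatever protection interface the platform offers.

```c
#include <stddef.h>

/* Stand-in segment flags and descriptor (hypothetical, not the RL_LIB types). */
enum { DEMO_SEG_EXEC = 1, DEMO_SEG_WRITE = 2 };

typedef struct {
    unsigned flags;  /* combination of DEMO_SEG_* flags */
    void *addr;      /* start address of the segment */
    size_t size;     /* size of the segment in bytes */
} demo_segment_t;

/* Stand-in load/unload actions and protection modes (hypothetical). */
typedef enum { DEMO_LOAD, DEMO_UNLOAD } demo_action_t;
typedef enum { DEMO_PROT_OFF, DEMO_PROT_RO, DEMO_PROT_RW, DEMO_PROT_RX } demo_prot_t;

/* Choose the protection mode to install for one segment. */
static demo_prot_t protection_for(const demo_segment_t *seg, demo_action_t action)
{
    if (action == DEMO_UNLOAD)
        return DEMO_PROT_OFF;   /* release protection before the memory is reused */
    if (seg->flags & DEMO_SEG_EXEC)
        return DEMO_PROT_RX;    /* code: readable and executable, not writable */
    if (seg->flags & DEMO_SEG_WRITE)
        return DEMO_PROT_RW;    /* writable data */
    return DEMO_PROT_RO;        /* constant data */
}
```

Keeping the policy in a pure function such as protection_for() makes it easy to check on a host before it is wired into a loader callback.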
A basic MUTEX implementation is provided in the STxP70 targeting of the pre-compiled RL_LIB delivered with the toolset. In addition, because no cache is activated on the STxP70, specific functions such as bsp_cache_purge_data and bsp_cache_invalidate_instruction (which respectively purge the data cache and invalidate the instruction cache) are not implemented.
It is the programmer’s responsibility to implement those functions depending on the platform and STxP70 architecture used.
Table 32 provides details of the files’ location in the toolset distribution.
Table 32. RL_LIB source file location
Functionality                     Source file
STxP70 v3 MUTEX implementation    <RL_LIB_root>/librl/config/stxp70v3/sys_mutex.[c|h]
STxP70 v3 Cache management        <RL_LIB_root>/librl/config/stxp70v3/targ_elf.[c|h]
STxP70 v4 MUTEX implementation    <RL_LIB_root>/librl/config/stxp70v4/sys_mutex.[c|h]
STxP70 v4 Cache management        <RL_LIB_root>/librl/config/stxp70v4/targ_elf.[c|h]
10 Compiler bugs
This chapter describes the different categories of compiler bugs and how they should be reported to STMicroelectronics.
10.1 Identifying a compiler bug
The following cases are compiler or toolset bugs:
• the compilation phase ends with an assertion message
• the compilation phase ends with a system error message (core dump, bus error)
• the compilation phase produces an output that cannot be assembled
• the compilation phase never ends, or at least does not end in a reasonable amount of time
• the compiler produces an error message for code that is valid input
• the compiler produces code that does not compute the expected results (but see
The following case is possibly not a compiler or toolset bug:
• The code is functional under a specific optimization level, but not under another. This may be due to an existing code bug that is only exposed by aggressive optimization.
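A typical instance of such a latent code bug is a test that relies on signed integer overflow. The sketch below is illustrative only, and the function names are hypothetical. The first function depends on undefined behavior, so an optimizing compiler is permitted to assume that the overflow never happens and to simplify the test away. The second function expresses the same intent with fully defined behavior.

```c
#include <limits.h>

/* Buggy: for x == INT_MAX, x + 1 overflows, which is undefined in C.
   A compiler may assume x + 1 > x always holds and return 0 unconditionally,
   so the result can differ between optimization levels. */
static int will_overflow_buggy(int x)
{
    return x + 1 < x;
}

/* Well defined: compare against INT_MAX before doing any arithmetic. */
static int will_overflow(int x)
{
    return x == INT_MAX;
}
```

If code that behaves like will_overflow_buggy() changes its result when the optimization level changes, the fault lies in the source code rather than in the compiler.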
10.2 Checks performed by user
The following checks should be performed on your code before reporting a bug:
• check that the code works correctly on at least one other compiler, on another host
• check that the code does not access out-of-bounds memory
• check that the source code does not raise any warning when compiled with the -Wall option
• check that the source code does not make assumptions that may be false: specifically check restrict annotations and optimization pragmas
• check that the code does not rely on language edge cases and does not violate language standards: an example of undefined behavior is to assume a specific behavior of shift operators when the shift amount is negative, or greater than or equal to the width of the type shifted
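The shift case in the last check can be made safe with a small wrapper. The following is a minimal sketch, assuming a 32-bit unsigned operand. The helper name shl32 is hypothetical, and returning 0 for an out-of-range amount is a design choice of this sketch, not something required by the C standard.

```c
#include <stdint.h>

/* Left shift with a defined result for every shift amount.
   Amounts outside the range 0..31 would be undefined behavior for a
   32-bit operand, so they are mapped to 0 instead of being shifted. */
static uint32_t shl32(uint32_t value, int amount)
{
    if (amount < 0 || amount >= 32)
        return 0;
    return value << amount;
}
```

Code that needs a different result for out-of-range amounts (for example saturation, or an error return) can change the early return without affecting the in-range behavior.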
10.3 Workaround
The following steps can be taken to work around a compiler bug temporarily.
1. Demote the optimization level to -O1 or -O0 when compiling the specific file that causes the problem, in either category 1 or 2. (See .)
2. Remove the optimization pragmas or restrict annotations.
3. Finally, check that you have an up-to-date compiler release.
10.4 Reporting a compiler bug
Carry out the following if a compiler bug is encountered.
1. Obtain your compiler version by running the command stxp70cc -version.
2. If the compiler bug is in category 1 (see ), prepare a pre-processed input file that can reproduce the problem.
3. If the compiler bug is in category 2 (see
Makefile that can reproduce the problem.
4. Supply the full command line that generates the problem.
5. Report the result of the following command in the shell that you use: uname -a.
6. Prepare a description of the expected result and the actual result.
7. Report all the above information through your local ST Field Applications Engineer (FAE).
Finally, when in doubt, it is better to report a possible bug than to ignore it.
10.5 Known bugs and limitations
Please refer to the Release note supplied with the toolset for an up-to-date list of bugs and limitations.
Revision history
Table 33. Document revision history
Date          Revision   Changes
Earlier revision history entries deleted as they are no longer pertinent.
05-Mar-2012   11         Update for STxP70 toolset 2012.1.
                         Updated to remove references to STxP70 assembler documents. The assembler as is documented in the GNU documents, supplied with the toolset.
                         Updated to change -INLINE:none to -INLINE:off.
                         Added the option -INLINE:size_static and updated the description of -INLINE:all.
                         Added Inlining static functions on page 57.
17-May-2012   12         Update for STxP70 toolset 2012.1 patch 001.
                         Table 15: Code generation options on page 31 and added -mlib-nofloat.
                         Table 19: C99 support in stxp70cc on page 42 updated throughout.
19-Sep-2012   13         Update for STxP70 toolset 2012.2.
                         Updated to add config options bypass and bhb.
                         Updated to add -o4 optimization option.
                         Updated , --deadcode and -f[no]unroll-loops options.
                         Updated to add -maggressive_unroll option.
                         Updated optimization levels.
                         Updated , -INLINE:size_static to add -o4 optimization level.
                         Added Section 4.2: Loop unrolling on page 63.
                         Updated , -IPA:mem_placement to include -o4 optimization level.
                         Updated Section 6.4: Restrictions on page 112
                         Updated Section 8.2.1: Compiler options on page 122
28-Jan-2013   14         Update for STxP70 toolset 2012.2. Update 01.
                         Updated to add mode argument.
                         Updated to expand description of RL_ERR_MEM error code.
                         Updated Section 9.5.1: Memory allocation on page 157
08-May-2013   15         Update for STxP70 toolset 2013.1.
                         Corrected syntax for FPx registers in Chapter 6: GNU ASM on page 109
                         Added options to control warnings generated for -fpack-struct in and updated description of -fpack-struct in .
                         Updated the description of -f[no-]math-errno in reflect its changed behavior in this toolset release.
                         Added GNU assembly parsing options at the end of .
                         Added Section 6.8: Parsing and optimization of GNU assembly statement on page 114.
Please Read Carefully:
Information in this document is provided solely in connection with ST products. STMicroelectronics NV and its subsidiaries (“ST”) reserve the right to make changes, corrections, modifications or improvements, to this document, and the products and services described herein at any time, without notice.
All ST products are sold pursuant to ST’s terms and conditions of sale.
Purchasers are solely responsible for the choice, selection and use of the ST products and services described herein, and ST assumes no liability whatsoever relating to the choice, selection or use of the ST products and services described herein.
No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted under this document. If any part of this document refers to any third party products or services it shall not be deemed a license grant by ST for the use of such third party products or services, or any intellectual property contained therein or considered as a warranty covering the use in any manner whatsoever of such third party products or services or any intellectual property contained therein.
UNLESS OTHERWISE SET FORTH IN ST’S TERMS AND CONDITIONS OF SALE ST DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY WITH RESPECT TO THE USE AND/OR SALE OF ST PRODUCTS INCLUDING WITHOUT LIMITATION IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE (AND THEIR EQUIVALENTS UNDER THE LAWS OF ANY JURISDICTION), OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
ST PRODUCTS ARE NOT AUTHORIZED FOR USE IN WEAPONS. NOR ARE ST PRODUCTS DESIGNED OR AUTHORIZED FOR USE IN: (A) SAFETY CRITICAL APPLICATIONS SUCH AS LIFE SUPPORTING, ACTIVE IMPLANTED DEVICES OR SYSTEMS WITH PRODUCT FUNCTIONAL SAFETY REQUIREMENTS; (B) AERONAUTIC APPLICATIONS; (C) AUTOMOTIVE APPLICATIONS OR ENVIRONMENTS, AND/OR (D) AEROSPACE APPLICATIONS OR ENVIRONMENTS. WHERE ST PRODUCTS ARE NOT DESIGNED FOR SUCH USE, THE PURCHASER SHALL USE PRODUCTS AT PURCHASER’S SOLE RISK, EVEN IF ST HAS BEEN INFORMED IN WRITING OF SUCH USAGE, UNLESS A PRODUCT IS EXPRESSLY DESIGNATED BY ST AS BEING INTENDED FOR “AUTOMOTIVE, AUTOMOTIVE SAFETY OR MEDICAL” INDUSTRY DOMAINS ACCORDING TO ST PRODUCT DESIGN SPECIFICATIONS.
PRODUCTS FORMALLY ESCC, QML OR JAN QUALIFIED ARE DEEMED SUITABLE FOR USE IN AEROSPACE BY THE CORRESPONDING GOVERNMENTAL AGENCY.
Resale of ST products with provisions different from the statements and/or technical features set forth in this document shall immediately void any warranty granted by ST for the ST product or service described herein and shall not create or extend in any manner whatsoever, any liability of ST.
ST and the ST logo are trademarks or registered trademarks of ST in various countries.
Information in this document supersedes and replaces all information previously supplied.
The ST logo is a registered trademark of STMicroelectronics. All other names are the property of their respective owners.
© 2013 STMicroelectronics - All rights reserved
STMicroelectronics group of companies
Australia - Belgium - Brazil - Canada - China - Czech Republic - Finland - France - Germany - Hong Kong - India - Israel - Italy - Japan -
Malaysia - Malta - Morocco - Philippines - Singapore - Spain - Sweden - Switzerland - United Kingdom - United States of America
www.st.com