STxP70 compiler - STMicroelectronics

UM1237

User manual

STxP70 compiler

Overview

The purpose of the STxP70 compilation driver (stxp70cc) is to manage the stages of the compilation process: preprocessing, compiling into assembly language, assembling and linking. The assembly file is assembled using stxp70-as and linked using stxp70-ld to provide an STxP70 binary image. All these phases are hidden behind the stxp70cc driver tool.

This user manual provides detailed information to enable users to write efficient code optimized to run on the STxP70 processors and to compile and link it ready for execution by sxrun. The manual covers:

• stxp70cc driver options
• pragmas supported by stxp70cc
• compiler optimization techniques
• GNU C language extensions
• GNU asm construct
• built-in functions

The load/run tool sxrun and the STxP70 debugger sxgdb are described in the STxP70 Professional Toolset user manual (7833754).

May 2013 8027948 Rev 15 1/166

www.st.com

Contents

Overview
Preface
    Documentation suite
    Conventions used in this guide

1 STxP70 development system
    1.1 Toolset overview
    1.2 Toolset software requirements
        1.2.1 Example command-lines
        1.2.2 Compiling code for STxP70-3 or STxP70-4

2 stxp70cc
    2.1 Invoking the compiler
        2.1.1 Input and output
    2.2 Command-line options
        2.2.1 Getting help
        2.2.2 Overall options
        2.2.3 stxp70cc core selection option
        2.2.4 stxp70cc compiler generic options
        2.2.5 C preprocessor options
        2.2.6 C dialect options
        2.2.7 Warning options
        2.2.8 Debugging options
        2.2.9 Profiling options
        2.2.10 Code coverage options
        2.2.11 Call trace instrumentation options
        2.2.12 Optimization options
        2.2.13 Code generation options
        2.2.14 -OPT options
        2.2.15 Inlining options
        2.2.16 Interprocedural analysis
        2.2.17 Position independent code generation (PIC)
        2.2.18 Sending options to a specific phase
        2.2.19 Directory and library options
        2.2.20 Environment variables
    2.3 Predefined macros
    2.4 C99 support

3 Pragmas
    3.1 Pragmas short description and syntax
    3.2 Loop optimization pragmas
        3.2.1 #pragma unroll (n)
        3.2.2 #pragma ivdep
        3.2.3 #pragma loopdep
        3.2.4 #pragma loopmod
        3.2.5 #pragma looptrip (n)
        3.2.6 #pragma hwloop
        3.2.7 #pragma loopmin<itercount> (minc)
        3.2.8 #pragma loopmax<itercount> (maxc)
        3.2.9 Code generation pragmas
        3.2.10 Heuristic pragmas
    3.3 Miscellaneous pragmas
        3.3.1 #pragma ident “string”
        3.3.2 #pragma weak
        3.3.3 #pragma disable_extgen ( fct1, fct2, ... )
        3.3.4 #pragma force_extgen ( fct1, fct2, ... )
        3.3.5 #pragma disable_specific_extgen ( extname[, fct1, fct2, ... ] )
        3.3.6 #pragma force_specific_extgen ( extname[, fct1, fct2, ... ] )

4 Optimization guide
    4.1 Inlining
        4.1.1 Single file inlining
        4.1.2 stxp70cc inlining options
        4.1.3 Extern inline functions
        4.1.4 Inlining pragmas
    4.2 Loop unrolling
        4.2.1 Default unrolling policy
        4.2.2 Advanced control of the unroller
        4.2.3 Precedence rules
        4.2.4 Built-in assume and pragma loopmod
    4.3 Memory dependences in C programs
    4.4 Aliasing rules in C/C++ programs
    4.5 Profiling
        4.5.1 Profiling data generation
        4.5.2 Using profiling data
        4.5.3 Special case of programs that never exit
        4.5.4 Amount of heap required for profiling
    4.6 Code coverage
    4.7 Call trace
        4.7.1 Instrumenting functions: -finstrument-functions
        4.7.2 Instrumenting calls to functions: -minstrument-calls
    4.8 Interprocedural analysis optimization (IPA)
        4.8.1 Using IPA
        4.8.2 IPA command line options
        4.8.3 Limitations and special cautions
    4.9 Floating-point code generation
        4.9.1 Precision of floating-point arithmetic in programs
        4.9.2 Controlling the precision of floating-point
        4.9.3 Use of STxP70 with FPx
        4.9.4 Examples of floating-point arithmetic on the STxP70
    4.10 Application configuration files
        4.10.1 General description and purpose
        4.10.2 Description and syntax of an ACF
        4.10.3 ACF grammar description
        4.10.4 Using the ACF
        4.10.5 Behavior of -macf-template option
        4.10.6 Scope and known limitations

5 GNU C extensions supported by stxp70cc
    5.1 Extensions to the C language family
        5.1.1 Statements and declarations in expressions
        5.1.2 Locally declared labels
        5.1.3 Labels as values
        5.1.4 Naming an expression's type
        5.1.5 Referring to a type with typeof
        5.1.6 Generalized Lvalues
        5.1.7 Conditionals with omitted operands
        5.1.8 Double-word integers
        5.1.9 Hexadecimal floats
        5.1.10 Specifying a register for a local variable
        5.1.11 Array of length zero
        5.1.12 Array of variable length
        5.1.13 Macro with variable number of arguments
        5.1.14 Strings literals with embedded newlines
        5.1.15 Non-Lvalue arrays may have subscripts
        5.1.16 Arithmetic on void and function pointers
        5.1.17 Non-constant initializers
        5.1.18 Compound literals
        5.1.19 Designated initializers
        5.1.20 Case ranges
        5.1.21 Cast to a union type
        5.1.22 Dollar signs in identifier names
        5.1.23 Prototypes and old-style function definitions
        5.1.24 C++ comments
        5.1.25 Character ESC in constants
        5.1.26 Inquiring on alignment of types or variables
        5.1.27 Incomplete enum type
        5.1.28 Function names as strings
    5.2 Attributes
        5.2.1 Placement and layout
        5.2.2 Optimization
        5.2.3 Visibility attributes
        5.2.4 Miscellaneous attributes
        5.2.5 Built-ins

6 GNU ASM
    6.1 Syntax
    6.2 Assumptions
    6.3 Volatile
    6.4 Restrictions
    6.5 Differences between the STxP70 core versions
    6.6 GNU ASM optimization
    6.7 Example
    6.8 Parsing and optimization of GNU assembly statement

7 Built-in functions
    7.1 Header files and C-models files
    7.2 Naming built-ins
        7.2.1 General naming scheme, relevant files
        7.2.2 Types and special built-ins for audio scalar/SIMD extensions
    7.3 Using built-ins from C
        7.3.1 Using built-ins on an STxP70 platform
        7.3.2 Standard use of built-in C-models
        7.3.3 Use of built-in C-models on STxP70 target

8 MPx native support
    8.1 Goal of the MPx scalar support
    8.2 Control of the MPx native support
        8.2.1 Compiler options
        8.2.2 Function pragmas
    8.3 Scope of the MPx native support
        8.3.1 Built-in based support with MPx_Vx type
        8.3.2 Support of type equivalence between long long and MPx_Vx
        8.3.3 Automatic MPx code generation on long long arithmetic
        8.3.4 Pattern recognition for integer and fractional data types
    8.4 Type equivalence
    8.5 Automatic code generation
        8.5.1 Scope and principle
        8.5.2 Operations mapped to single MPx instructions
        8.5.3 Operations mapped to meta-instructions
    8.6 Important remarks and known limitations
        8.6.1 Avoid mixing MPx and long long
        8.6.2 Long long passed as function parameters
        8.6.3 Long long life span crossing function call
        8.6.4 Efficiency of code in meta-instructions
        8.6.5 Mapping exact conversions and single statement expressions
        8.6.6 Limitations regarding mapping of fractional instructions
        8.6.7 Unsupported mapping
    8.7 Examples
        8.7.1 Direct mapping of long long arithmetic
        8.7.2 Meta-instruction, case of a long long max
        8.7.3 Case of the 32-bit multiplication

9 Relocatable loader library
    9.1 Introduction to dynamic linking
        9.1.1 Position-independent code
        9.1.2 Import stubs
        9.1.3 The dynamic loader
        9.1.4 Rationale
    9.2 Calling sequence
        9.2.1 Direct calls
        9.2.2 Indirect calls
    9.3 Introduction to the relocatable loader library
        9.3.1 Run-time model overview
        9.3.2 Relocatable run-time model
    9.4 Relocatable loader library API
        9.4.1 rl_handle_t type
        9.4.2 Function descriptions
    9.5 Customization
        9.5.1 Memory allocation
        9.5.2 File management
    9.6 Building a relocatable library or main module
        9.6.1 Importing and exporting symbols
        9.6.2 Optimization options
    9.7 Debugging support
    9.8 Profiling support
    9.9 Memory protection support
    9.10 STxP70 targeting of RL_LIB

10 Compiler bugs
    10.1 Identifying a compiler bug
        10.1.1 Category 1
        10.1.2 Category 2
    10.2 Checks performed by user
    10.3 Workaround
    10.4 Reporting a compiler bug
    10.5 Known bugs and limitations

11 Revision history

Preface

This document is part of the documentation suite detailed below. Comments on this or other manuals in the documentation suite should be made by contacting your local STMicroelectronics sales office or distributor.

Documentation suite

STxP70 compiler user manual (8027948)

This manual describes the C compiler for STMicroelectronics STxP70 cores.

STxP70 Professional Toolset user manual (7833754)

This document explains the toolset architecture and provides information about how to develop and debug applications running on STxP70 systems.

Advanced debugging with the STxP70-4 instruction-accurate simulator (Doc ID 024404)

This document describes the commands implemented in the instruction-accurate simulator for debugging applications.

STxP70 utilities reference manual (8210925)

This document provides, in a single volume, a command-line reference for each of the generic, STxP70-3 and STxP70-4 utilities provided with the STxP70 toolset that are not documented elsewhere. For each utility, the manual provides a command-line synopsis, a brief description of the utility, the complete list of available options, and its return value.

Building STxP70 libraries application note (8226669)

This document explains how to produce a set of standard libraries for the STxP70 compilation tools optimized for the user’s specific purposes.

Conventions used in this guide

General notation

The notation in this document uses the following conventions:

• sample code, keyboard input and file names,
• variables, code variables and code comments,
• equations and math,
• screens, windows, dialog boxes and tool names,
• instructions.


Software notation

Syntax definitions are presented in a modified Backus-Naur Form (BNF).

• Terminal strings of the language, that is those not built up by rules of the language, are printed in teletype font. For example, void.
• Non-terminal strings of the language, that is those built up by rules of the language, are printed in italic teletype font. For example, name.
• If a non-terminal string of the language starts with a non-italicized part, it is equivalent to the same non-terminal string without that non-italicized part. For example, vspace-name.
• Each phrase definition is built up using a double colon and an equals sign to separate the two sides (‘::=’).
• Alternatives are separated by vertical bars (‘|’).
• Optional sequences are enclosed in square brackets (‘[’ and ‘]’).
• Items which may be repeated appear in braces (‘{’ and ‘}’).


1 STxP70 development system


The purpose of the stxp70cc compilation driver is to translate a program written in the C language into the STxP70 assembly language so that it is suitable for assembly, linking, and execution. The assembly file is assembled using stxp70-as and linked using stxp70-ld (a) to provide an STxP70 binary image. All these phases are hidden behind the stxp70cc driver tool.

The stxp70cc compilation driver and core compiler are common to both STxP70 versions 3 and 4. A specific command line and GUI option can be used to generate code for either target. See Section 1.2.2: Compiling code for STxP70-3 or STxP70-4.

The stxp70cc compiler uses the GNU C language parser and implements state-of-the-art compiler optimizations. Thanks to this GNU C language parser, the stxp70cc compiler is closely compatible with the GNU C compiler, both at the driver level and in its C language extensions (GNU Compiler Collection project; see http://www.gnu.org/software/gcc/gcc.html). The processor-independent compiler optimizations available in the stxp70cc compiler are mostly inherited from the Open64 project hosted on SourceForge; see http://open64.sourceforge.net. Other compiler optimizations that are specific to the STxP70 family of processors have been developed by STMicroelectronics.

These include:

• use of hardware loop mechanisms of the STxP70 core (hardware loops and JRGTUDEC instructions)
• use of the special addressing modes of the STxP70 core
• use of the memory space defined in the STxP70 ABI in order to increase memory access efficiency
• aggressive instruction selection including mapping of the user boolean variables to the branch registers
• instruction scheduling
• aggressive transformation of loops
• compiler intrinsics and built-ins support
• compiler support for the X3, FPx and MPx extensions

The binary image can be executed on an STxP70 hardware target or by using the sxrun simulator or the sxgdb debugger. The binary format used for the image is ELF and the debug format is DWARF2.

Where applicable, the available options are accessible through a command-line interface similar to the UNIX style. This will be familiar to most gcc and cc users. The toolset is installed in a directory structure which also follows the UNIX structure, that is bin and lib.

Wherever possible, compatibility with the options of the former sxcc compiler has been preserved.

The compiler supports the ANSI C89 standard and partially supports the ANSI C99 standard; see Section 2.4: C99 support.

a. For usage information see the GNU linker document “Using ld” that is supplied with the toolset.

1.1 Toolset overview

The STxP70 Professional Toolset is a set of tools that allow C programs compiled for an STxP70 target to be simulated on a host workstation or executed on an STxP70 target.

The STxP70 Professional Toolset is mainly intended for tool developers, for operating system development and for applications that require modeling interrupts and real-time behavior. It includes the whole set of tools that manipulate STxP70 object files, including the STxP70 assembler, compiler, linker, load/run tool, debugger and archiver. Here, STxP70 assembler files are translated to STxP70 object files that the linker merges to produce an STxP70 executable image. This image file does not run natively on the host workstation and requires an interpreter to be executed. See Section 1.2.1: Example command-lines for details.

Figure 1 shows the main components of the STxP70 Professional Toolset (when IPA is not used).

Figure 1. Components of the STxP70 Professional Toolset interfaces

[Figure: components shown are .c source files, the STxP70 C compiler, STxP70 assembler files (.s), the STxP70 assembler (stxp70-as), STxP70 object files, the STxP70 linker (stxp70-ld), STxP70 libraries, target board boot and sysconf files, the STxP70 binary (.elf), the STxP70 load/run tool (sxrun) and the STxP70 debugger (sxgdb).]


1.2 Toolset software requirements


The stxp70cc compiler produces STxP70 object files in the STxP70 object file format (ELF). See the relevant chapter in the STxP70 ABI manual (7937486) for details.

Assuming that we want to compile two files file1.c and file2.c into an STxP70 executable a.elf, the set of commands to issue is:

$[1] stxp70cc -c file1.c

$[2] stxp70cc -c file2.c

$[3] stxp70cc -o a.elf file1.o file2.o

This assumes that the user has sourced the appropriate shell file in the <tools-dir>/bin folder. In most cases, the one needed is STxP70.csh. This ensures that all needed configuration environment variables are properly set.

Command [1] causes the following steps to be executed:

<tools-dir>/bin/stxp70cc # stxp70cc driver

<tools-dir>/lib/cpp <cpp_flags> file1.c file1.i # C preprocessor

<tools-dir>/lib/cmplrs/<C compiler> <C Compiler flags> file1.i file1.s # C compiler

<tools-dir>/bin/stxp70-as <stxp70-as_flags> file1.s file1.o # STxP70 Assembler

Command [2] causes the following steps to be executed:

<tools-dir>/bin/stxp70cc # stxp70cc driver

<tools-dir>/lib/cpp <cpp_flags> file2.c file2.i # C preprocessor

<tools-dir>/lib/cmplrs/<C compiler> <C Compiler flags> file2.i file2.s # C compiler

<tools-dir>/bin/stxp70-as <stxp70-as_flags> file2.s file2.o # STxP70 Assembler
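The phase lists for commands [1] and [2] follow the same pattern: each C source passes through a .i and a .s file before becoming a .o file. The sketch below only prints that chain of intermediate file names for a given source file; it does not invoke any tool. The function name compile_phases is hypothetical and not part of the toolset.

```shell
#!/bin/sh
# Dry-run sketch: print the intermediate files that "stxp70cc -c <file>.c"
# passes through, following the phase lists in the text. Nothing is executed.
# compile_phases is a hypothetical helper, not an STxP70 tool.
compile_phases() {
    src=$1
    base=${src%.c}                          # strip the .c extension
    echo "preprocess: $src -> $base.i"      # C preprocessor
    echo "compile: $base.i -> $base.s"      # C compiler
    echo "assemble: $base.s -> $base.o"     # stxp70-as
}

compile_phases file1.c
```

Calling compile_phases with file2.c would print the same three phases with file2.i, file2.s and file2.o.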

Command [3] causes the link stage to be executed. Please refer to the STxP70 linker user manual for further details.

Once steps [1] to [3] are completed, an STxP70 executable binary a.elf is generated. This can be executed using the stand-alone driver for the load/run tool (available as sxrun) in the following way:

$[4] sxrun a.elf

This causes the a.elf STxP70 binary to be “interpreted” by the sxrun command. The simulator also provides some minimal tracing, cycle counting and statistics facilities.
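Steps [1] to [4] generalize to any number of source files: one compile command per file, one link command, then one run command. The following is a minimal sketch that prints those commands without running them, so it can be tried on a host without the toolset installed. The helper name build_commands is an assumption made for illustration; only the stxp70cc and sxrun command forms shown above are taken from this manual.

```shell
#!/bin/sh
# Dry-run sketch: print (do not run) the commands that build an executable
# from several C files and then run it with sxrun.
# build_commands is a hypothetical helper, not an STxP70 tool.
# usage: build_commands <output.elf> <file.c>...
build_commands() {
    out=$1
    shift
    objs=""
    for src in "$@"; do
        echo "stxp70cc -c $src"             # one object file per source
        objs="$objs ${src%.c}.o"            # remember the matching .o name
    done
    echo "stxp70cc -o $out$objs"            # link step
    echo "sxrun $out"                       # load/run step
}

build_commands a.elf file1.c file2.c
```

With the arguments shown, the printed lines match commands [1] to [4] in this section.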



1.2.2 Compiling code for STxP70-3 or STxP70-4


By default, the code is compiled for STxP70-3. However, a dedicated command line option can be used to compile code for STxP70-4. In the example below, lines [1] and [2] generate a version 3 executable and line [3] generates an executable for version 4:

$[1] stxp70cc file1.c

$[2] stxp70cc -mcore=stxp70v3 file2.c

$[3] stxp70cc -mcore=stxp70v4 file3.c
Except for a few instructions, the STxP70-3 and STxP70-4 are assembly compatible. They are not binary compatible. More details are provided in following sections.

Warning: The assembly code examples in this document use the STxP70-3 assembly syntax. On STxP70-4, it is now possible to form bundles of one or two instructions. Two successive bundles must be separated by a “;;” pattern. Two successive lines not separated by a “;;” are considered as a single bundle, meaning the two instructions will be emitted in the same cycle.


2 stxp70cc


The stxp70cc compiler is similar to any command-line compiler. It is either invoked from a command line interpreter or from a Makefile and implicitly recognizes files by their extension.

2.1 Invoking the compiler

The C compiler is invoked using the stxp70cc command:

stxp70cc {<argument>}

where:

<argument> = <option> | <input_file>

Examples:

stxp70cc -S file.c # produces file.s

stxp70cc -c file.c # produces file.o

Conflicting options are resolved by using the last option on the command line.

2.1.1 Input and output

File extension naming conventions are summarized in Table 1 and Table 2.

Table 1. Input names conventions

Extension  Convention
.c         C language source file to be pre-processed and compiled
.h         C language header file
.i         C language source file already pre-processed
.s         Assembly language source file to be assembled
.S         Assembly language source file to be pre-processed and assembled

Table 2. Output names conventions

Extension  Convention                     Produced by option(s)
.s         Assembly language output file  -S
.o         Object file                    -c

The final executable file does not need to have a specific file extension. If no output file name is specified through the -o option, the executable generated is named a.out.

Examples:

stxp70cc file.c # generates the executable a.out

stxp70cc file.c -o file.u # generates the executable file.u


This section provides information on the command line options of stxp70cc.

If the compiler driver is given the -help option, it displays the list of available options, and then terminates.

Additionally, the -help option can be followed by a keyword separated from the option by a colon. All entries matching the keyword are displayed on the standard output, for example:

stxp70cc -help:W

This command displays all options containing the -W string. In this example, all options related to the emission of compiler warnings are listed.

The options in Table 3 control the type of processing performed by stxp70cc and the output it generates, for example: an executable, an object file, an assembler file, a pre-processed file, an archive or a dependency list.

Output files produced by these options default to <original_file_name>.<output_extension> and can be renamed using the -o option.

Table 3. Overall options

Option / Description

-c
    Compile or assemble the source file, but do not link.

-S
    Stop after the compilation phase.

-E
    Stop after the preprocessing phase. Output is sent to stdout.

-v
    Print on stderr the commands executed to run the compilation phases. The message generated indicates the release identity.

--version
    Display the version numbers of the invoked compiler and stop.

-dumpversion
    Print the compiler front-end version (for example, 3.3.3) and stop.

-keep
    Keep intermediary files produced by the compilation phases in the current folder.

-keep_dir
    Used in combination with -keep or -Mkeepasm, this option specifies the location to be used to store intermediate files.

2.2.3 stxp70cc core selection option

The STxP70 tools delivered in the STxP70 toolset R4.0.0 and higher support both STxP70-3 and STxP70-4. The STxP70-4 is different from STxP70-3 in three ways:

• it implements a variable length encoding of the instruction set (VLIS)

• it implements dual issue

• it supports dual arithmetic and logic unit (ALU) configuration

The -mcore option must be used to select the version of the core. By default, the code is compiled for STxP70-3. The STxP70-3 and STxP70-4 are assembly compatible, except for a few instructions.

In the examples below, lines [1] and [2] generate a version 3 executable and line [3] generates an executable for version 4:

$[1] stxp70cc file1.c

$[2] stxp70cc -mcore=stxp70v3 file2.c

$[3] stxp70cc -mcore=stxp70v4 file3.c

Table 4. The core selection -mcore option

Value     Description
stxp70v3  Assembly, object and binary files are generated for single issue, fixed length encoding STxP70-3
stxp70v4  Assembly, object and binary files are generated for single/dual issue, variable length encoding STxP70-4

Note: The set of options that must or can be set is strongly dependent on the core selected. This is especially true for the configuration and code generation options presented in the tables of the next section. In particular, the STxP70-4 can be configured for single or dual issue, as well as single or dual ALU. Each of those choices corresponds to specific compiler options.

2.2.4 stxp70cc compiler generic options

Prefixes of generic options

The options in Table 5 provide generic means to pass fine-grained options to any phase of the compiler.

Table 5. Generic options

Option / Description

-Msxflag
    stxp70cc interprets the -Msxflag option as an extra code generation or environment option. The possible sxflags are listed in Table 6. Note that, due to the GNU front-end, the -M prefix is also used for dependency handling options.

-W<phase>,<arg>
    This option is used to pass arguments to a specific phase. The phase names are p, f, b, a, l for pre-processor, front-end, back-end, assembler and linker respectively.

-Y<phase>,<path>
    This option is used to change the path to one of the phases. The phase names are p, f, b, a, l, I, S, L for pre-processor, front-end, back-end, assembler, linker, include, startup, libraries respectively.


Code generation/configuration and environment options with -M prefix

Table 6 lists the options that can be used with the -M flag. These options have a special status, as they ensure backward compatibility with the sxcc compiler. Due to the differences in compiler internals, some options have been adapted or removed.

The options that accept further controls are described in the following pages.

Several of the options are able to place certain data items into specific areas of memory called special data areas. See Section 5.2.1: Placement and layout on page 101 for information about the special data areas.

The options in Table 6 correspond to code generation and environment options that can be set using the generic -M flag.

Table 6. Generic options with -M flag

Option / Description

config[=context:<n>|regbank:<n>|mult:<n>|bypass:<n>|bhb:<n>|efuif:<n>|mfuif:<n>|extmemif:<n>|itcnodes:<n>|noevc|evcglobal:<n>|evclocal:<n>|hwloop:<n>|dcache:<n>|dmsize:<n>|pcache:<n>|pmsize:<n>|pixel:<n>|pixelsize:<n>|rompatch:<n>|maxszmis:<n>|minadmis:<n>|vliw:<n>]
    Defines the processor configuration. Further information on these controls can be found in Code generation and configuration controls on page 19. The assembler performs some consistency checking based on this configuration option.
    The last option (vliw) is only available on STxP70-4.
    It is possible to combine several suboptions in a single -Mconfig option bundle. In this case, suboptions must be separated by a “,”. For instance: -Mconfig=vliw:no,noevc

da[={<n>|all}]
    Places certain data items in the data area (GP-based on 32 Kbytes).

sda[={<n>|all}]
    Places certain data items in the small data area (GP-based on 4 Kobjects).

tda[={<n>|all}]
    Places certain data items in the low memory or tiny data area (32-Kbyte size).

enablefractgen
    Deprecated, and replaced by extoption. Allows the compiler to generate fractional instructions of the MPx. Refer to Chapter 8: MPx native support on page 122.

extension[=fpx|MP1x][:novliw|single|dual]
    Connects extension (MP1x, fpx), using the specified VLIW configuration. When STxP70-3 is used, only the novliw suboption can be specified.
    MP1x has been supported through intrinsics and specific types since compiler version 3.2.0. Version 3.3.0 introduces so-called “native support”, which provides automatic code generation from pure C language. Refer to Chapter 8: MPx native support on page 122.

extoption=extension:option
    Pass a given option to the extension.

extrcdir=directory_path
    Use the stxp70extrc user-defined extension definition file from the directory found on directory_path.

farcall
    Specifies that all calls and jumps are far (with absolute addresses).

got[=small|standard|large]
    Select the global offset table (GOT) model for position independent code and data (PIC and PID) generation. See Chapter 9: Relocatable loader library on page 136.

hwloop[=option]
    Controls use of the hardware loops feature. Further information can be found in Code generation and configuration controls on page 19.

itstackalign=<n>
    This option instructs the compiler to align the stack of interruption routines to the specified boundary, as a number of bytes. Default is 8 (that is, 64 bits).

keepasm
    Preserve intermediate files. Files are located in the local folder by default. The -keep_dir option can be used in combination to specify a different folder where intermediate files must be stored.

mode16
    Instructs the compiler to use the 16 register set (instead of the default 32 register set). Notice that, contrary to the -Mconfig option, this option is not a configuration option. No checking is made at assembler level regarding the register indices.

lib16
    Instructs the compiler to link in a version of the C library that uses the 16 register set.

lib32
    Instructs the compiler to link in a version of the C library that uses the 32 register set.

nostartup
    Instructs the linker not to use the standard boot sequence but one provided by the user.

The controls for the options listed in Table 6 are described in Code generation and configuration controls and Environment controls on page 25.

Code generation and configuration controls

The code generation and configuration controls are listed below.

-Mconfig[=context:<n>|regbank:<n>|mult:<n>|bypass:<n>|bhb:<n>|efuif:<n>|mfuif:<n>|extmemif:<n>|itcnodes:<n>|noevc|evcglobal:<n>|evclocal:<n>|hwloop:<n>|dcache:<n>|dmsize:<n>|pcache:<n>|pmsize:<n>|pixel:<n>|pixelsize:<n>|rompatch:<n>|maxszmis:<n>|minadmis:<n>|vliw:<n>]

Use -Mconfig to specify the configuration of the STxP70 core IP. The subflags to this option are listed in Table 7.


Table 7. Subflags allowed in the -Mconfig option

Subflag / Description

context:<n>
    Defines context number. Where n can be: 1 | 2 | 4 | 8.

regbank:<n>
    Defines register bank number. Where n can be: 1 | 2.

mult:<n>
    Defines multiplier implementation. Where n can be: yes | no. Note that using the FPX enables the multiplier as well.

bypass:<n>
    Defines memory bypass configuration. Where n can be: no | mem2_exe. (mem2_exe indicates a bypass is implemented between the memory2 and execution stages of the pipeline. When a bypass is present, the load-use penalty is one cycle instead of two cycles when the bypass is not implemented.)

bhb:<n>
    Defines branch history buffer configuration. Where n can be: yes | no.

efuif:<n>
    Defines extension functional unit interface width. Where n can be: no | 32 | 64 | 128 | 256 | 512.

mfuif:<n>
    Defines MFU interface width. Where n can be: no | 32 | 64 | 128 | 256 | 512.

extmemif:<n>
    Defines external memory interface width. Where n can be: no | 32 | 64.

itcnodes:<n>
    Defines ITC number of nodes. Where n can be: no | 8 | 16 | 32.

noevc
    Specifies that no EVC is implemented.

evcglobal:<n>
    Defines EVC number of global events. Where n can be: 4 | 8 | 16 | 32.

evclocal:<n>
    Defines EVC number of local events. Where n can be: 4 | 8 | 16 | 32.

hwloop:<n>
    Defines hardware loop implementation. Where n can be: no | bycxt | forall.

dmsize:<n>
    Defines data memory size. Where n can be: no | 512 | 1k | 2k | 4k | 8k | 16k | 32k | 64k | 128k | 256k | 512k | 1M | 2M | 4M.

dcache:<n>
    Defines data cache implementation. Where n can be: yes | no.

pmsize:<n>
    Defines program memory size. Where n can be: no | 512 | 1k | 2k | 4k | 8k | 16k | 32k | 64k | 128k | 256k | 512k | 1M | 2M | 4M.

pcache:<n>
    Defines program cache implementation. Where n can be: yes | no.

pixel:<n>
    Defines the pixel mode implementation. Where n can be: yes | no.

pixelsize:<n>
    Defines the pixel data size. Where n can be: 8 | 10 | 12 | 14.

rompatch:<n>
    Defines the ROM patch controller implementation. Where n can be: yes | no.

maxszmis:<n>
    Defines the size of the largest memory access supporting misalignment. Where n can be: no | 2 | 4 | 8 | 16 | 32 | 64.

minadmis:<n>
    Defines the minimal address alignment at which misaligned memory accesses are supported. Where n can be: no | 2 | 4 | 8 | 16 | 32.

vliw:<n>
    This STxP70-4 specific option indicates the number of issues and ALUs available on the core. The value of n can be: no | singlecoreALU | dualcoreALU. These values must be interpreted as follows:
    – no: the core is single issue, single ALU,
    – singlecoreALU: the core is dual issue, single ALU,
    – dualcoreALU: the core is dual issue, dual ALU.
    If the vliw option is not set and code is compiled for STxP70-4, then the default behavior corresponds to -Mconfig=vliw:no.

By default, -Mconfig enables four contexts, two register banks, multiplier, no memory bypass, no branch history buffer (BHB), 32-bit EFU interface, 32-bit MFU interface, 32-bit external memory interface, eight ITC nodes, EVC with 16 global and 16 local events, two hardware loops for all contexts, 4 Mbytes data memory, no data cache, 4 Mbytes program memory, no pcache, no pixel support, no ROM patch support, no misaligned memory access and single issue architecture.

-Mda[={ <n> | all }]

Place data objects of aggregate alignment <= n bytes in the region of memory called the medium data area (DA). It is possible to generate optimized (that is, shorter) addresses for data in the medium data area. (GP-based addugp is used instead of make and more.)

The parameter n can be one of (1, 2, 4, 8). Specifying all eliminates the size constraint. -Mda is equivalent to -Mda=all.

Notice that -Mda options are ignored if IPA memory placement is enabled. Refer to Section 4.8: Interprocedural analysis optimization (IPA) on page 76 for further details.

-Msda[={ <n> | all }]

Place data objects of aggregate alignment <= n bytes in the region of memory called the small data area (SDA). It is possible to generate optimized (that is, shorter) addresses for data in the small data area. (GP-based addressing mode can be used, thus constructing the address and performing the access itself in the same instruction.)

The parameter n can be one of (1, 2, 4, 8). Specifying all eliminates the size constraint. -Msda is equivalent to -Msda=all.

In the case of a structure that contains fields of different types, the decision of where to place the variable depends on the alignment of the largest data type, whereas the choice of the section to be used depends on the size of the smallest field. This means that a structure with both int and char fields is placed if the option is either -Msda=all or -Msda=4. If placement is achieved, then the structure is placed in SDA1.
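As an illustration of the alignment rule above, the following C sketch computes the aggregate alignment of a structure that mixes int and char fields and compares it with a limit n, in the way an -Msda=<n> option is described as doing. The names mixed, mixed_alignment and fits_alignment_limit are hypothetical, and the value returned is the alignment chosen by whatever compiler builds the sketch; the STxP70 ABI manual remains the reference for the actual target values.

```c
#include <stddef.h>

/* Hypothetical structure mixing an int field and a char field. */
struct mixed {
    int  counter;
    char flag;
};

/* Alignment of the whole aggregate, which is that of its most strictly
   aligned field. __alignof__ is a GNU C extension. */
size_t mixed_alignment(void)
{
    return __alignof__(struct mixed);
}

/* Mirrors the "aggregate alignment <= n bytes" test described for
   -Msda=<n>. Illustration only, not the compiler's implementation. */
int fits_alignment_limit(size_t alignment, size_t n)
{
    return alignment <= n;
}
```

With a 4-byte int, fits_alignment_limit(mixed_alignment(), 4) is true, which corresponds to the -Msda=4 case mentioned above.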

Notice that -Msda options are ignored if IPA memory placement is enabled. Refer to Section 4.8: Interprocedural analysis optimization (IPA) on page 76 for further details.

8027948 Rev 15 21/166

stxp70cc UM1237

-Mtda[={ <n> | all }]

Place data objects of aggregate alignment <= n bytes in the region of memory called the low memory data area (TDA). It is possible to generate optimized (that is, shorter) addresses for data in the low memory area. Addresses in the TDA area are encoded using a maximum of 15 bits and therefore may be constructed using a single make instruction.

The parameter n can be one of (1, 2, 4, 8). Specifying all eliminates the size constraint. -Mtda is equivalent to -Mtda=all.
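A 15-bit field can express 2^15 = 32768 distinct byte addresses, which matches the 32-Kbyte size quoted for the tiny data area. The check below is a minimal sketch of that arithmetic. The function and macro names are hypothetical, and treating the field as an unsigned byte address is an assumption of the sketch rather than something stated here.

```c
#include <stdint.h>

#define TDA_ADDRESS_BITS 15u  /* "maximum of 15 bits", from the text above */

/* Returns 1 when addr is representable in TDA_ADDRESS_BITS bits,
   assuming an unsigned byte address (an assumption of this sketch). */
int fits_tda_address(uint32_t addr)
{
    return addr < (UINT32_C(1) << TDA_ADDRESS_BITS);
}
```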

-Mdarange=[minSize],maxSize

Use data area (DA) addressing mode on selected variables with a size between minSize and maxSize bytes.

-Msdarange=[minSize],maxSize

Use small data area (SDA) addressing mode on selected variables with a size between minSize and maxSize bytes.

-Mtdarange=[minSize],maxSize

Use tiny data area (TDA) addressing mode on selected variables with a size between minSize and maxSize bytes.
-Menablefractgen

Enables generation of the fractional instructions when MP1x is present. This option was formerly named -Mfractsupport. These two options are now deprecated, and replaced by the suboption -Mextoption. Refer to Chapter 8: MPx native support on page 122 for further details on this option.

-Mextension[=fpx|MP1x][:novliw|single|dual]

Only the X3 extension is connected by default. (This means that the corresponding option x3 is no longer available.)

Connect extension fpx to the core to enable floating point arithmetic. Activating fpx allows the compiler to generate floating point extension specific instructions, which include native floating point (32-bit) arithmetic instructions and some integer instructions (such as multiply and divide) that complete the core integer support.

MP1x has been supported in the compiler since version 3.2.0 using built-in functions and specific data types. Version 3.3.0 introduces the so-called “native support” of the MPx extension. This means that the compiler can generate code that makes use of MPx registers and instructions from pure C code (that is, even if no MPx built-in functions and types are present). More details can be found in Chapter 8: MPx native support on page 122.

The vliw configuration can be specified for the extension. On extension for STxP70-3, only the novliw configuration can be used.

-Mextoption

Used to pass different options to the extensions. Refer to


for further details on this option.


-Mextrcdir=directory_path

Specifies where to find a particular extension package, which may be a location outside the user workspace. The -Mextrcdir option enables the user to switch between different extensions, stored in different locations. Full directory paths are recommended but are not mandatory.

The directory path specified to -Mextrcdir must include the sub-directory _STxP70-Extension_ where the stxp70extrc file is located. (This is the directory/file structure used by sximport when the extension is imported. sximport creates or updates an extension configuration file called stxp70extrc and puts it in the subdirectory _STxP70-Extension_. stxp70extrc indicates where different files relating to the extension are located, for example header files and libraries.)

For example:

stxp70cc -Mextension=MP1x -Mextrcdir=My_Extrcdir/_STxP70-Extension_

This command sets the directory path to find the extension package in My_Extrcdir/_STxP70-Extension_.

The compiler checks that the location specified by -Mextrcdir contains the file stxp70extrc.

If the -Mextrcdir option is not specified, the {SX}/sxext/_STxP70-Extension_ directory is used by default.

The STxP70 Utilities manual (8210925) documents several utilities that interact with the extension package, for example sximport, stxp70-elfdump, stxp70objcopy.

The STxP70 User-defined extension methodology guide (8175272), “How to integrate an Extension in an application” chapter, gives further information about extension libraries.

-Mfarcall

Specify that all calls are far. The compiler generates a calling sequence composed of a make/more/calla sequence instead of callr.

-Mhwloop[=option]

Controls hardware loop code generation. The default (-Mhwloop specified with no suboptions) is equivalent to:

• -Mhwloop=all if the core configuration includes hardware loops

• -Mhwloop=jrgtudeconly if the core configuration does not include hardware loops

option can be any of the values listed in Table 8.


Table 8. List of options for -Mhwloop

Option / Description

none
    Disables hardware loop and special jump code generation. By hardware loop, we mean setle/ls/lc structures; by special jump, we mean jrgtudec special jumps. However, hardware loops forced by means of pragmas are still generated if supported by the core configuration.

jrgtudeconly
    Disables setle/ls/lc hardware loop code generation. However, hardware loops forced by means of pragmas are still generated if supported by the core configuration.

hwlooponly
    Disables jrgtudec special jumps loop code generation. A warning is generated if the core configuration does not have hardware loops.

all
    Enables hardware loops for all loops wherever possible. A warning is generated if the core configuration does not have hardware loops.

Hwloops are discarded in -O0 and -O1.
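A loop whose trip count is known on entry is typically the kind of code that hardware-loop generation targets. The function below is such a loop, written in plain C. Whether stxp70cc maps it to a setle/ls/lc structure or to a jrgtudec jump depends on the core configuration, the -Mhwloop suboption and the optimization level, so it should be read as an assumed candidate only. The function name is hypothetical.

```c
/* Sums n 16-bit samples. The trip count n is fixed before the loop
   starts, and the body contains no call and no early exit. */
int sum_samples(const short *samples, int n)
{
    int total = 0;
    int i;

    for (i = 0; i < n; i++) {
        total += samples[i];
    }
    return total;
}
```

Keeping the generated assembly (for example with -S) is one way to check which loop form was actually emitted.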

-Mitstackalign=<n>

By default, the stack of interruption routines (IT) is aligned to an 8-byte/64-bit boundary. As a consequence, extra instructions are added to the IT prolog and epilog to handle this realignment. Since IT are often speed-critical parts of code, this may be a severe drawback.

This option instructs the compiler to align the stack of IT to a smaller boundary (typically: 4 bytes/32 bits) to avoid the overhead in the prolog and epilog of those routines.

Several methods are provided for controlling the alignment of the stack. For interruption routines, the precedence is as follows, in decreasing order:

– aligned_stack attribute, which specifies the alignment of the stack of a given interruption routine

– interrupt_nostackalign attribute, which indicates that the stack of a given interruption routine is to be aligned on a 4-byte/32-bit boundary

– -Mitstackalign option

– default (8 bytes/64 bits)

For any other function (not an interruption routine), the precedence is as follows, in decreasing order:

– aligned_stack attribute

– default (8 bytes/64 bits)

Refer to Section 5.2: Attributes on page 101 for further details on the attributes which control the alignment of the stack of functions and interruption routines.

-Mmode16

The STxP70 compiler generates code for a context with 32 registers. Selecting the -Mmode16 option switches to a context with 16 registers. Note that the impact of this option is slightly different from that of -Mconfig=regbank:1. Namely, no assumption is made on the core configuration regarding register banks, and no checking is performed at assembly level to ensure that only the lower bank is used.

-Mnoextgen[=ext1,ext2,...]

Disables code generation for the specified extensions. This option only has an effect when MPx extensions are used. It has no effect with fpx.

Environment controls

The environment controls are listed below.

-Mlib16

Instructs the compiler to link with a version of the C library that uses 16 registers of the core. This is the default behavior when using 16 registers contexts.

-Mlib32

Instructs the compiler to link with a version of the C library that uses the 32 register set of the core.

-Mnostartup

Instructs the linker not to use the standard boot.o file at link time. It is then the user’s responsibility to provide a boot object file at link time.

2.2.5 C preprocessor options

The preprocessor is run on each C source file before actual compilation. The options in Table 9 control how the sources are preprocessed.

Table 9. Preprocessor options

Option / Description

-E
    Only the preprocessor is run.

-C
    The preprocessor does not discard comments.

-CC
    The preprocessor copies comments inside macros to the output file when the macro is expanded. This is intended for use by applications which place metadata or directives inside comments. Use with the -E option.

-P
    The preprocessor discards #line information. Use with the -E option.

-Ddef
    Define the macro def with the string 1 as its definition.

-Ddef=defn
    Define the macro def as defn.

-M
    Generates a list of object file dependencies suitable for a makefile.

-MM
    Similar to -M, but ignores system header files, that is, header files included with #include <header.h>.

-MG
    Along with -M or -MM, treat missing files as generated in the local directory.

-H
    Display the name and path of the header in use.

-dM
    Print a list of macro definitions in use after preprocessing. Use with the -E option.

-dD
    Print a list of macro definitions in use while preprocessing. Use with the -E option.

-dN
    Same as -dD, except that the macro arguments are not shown. Use with the -E option.

-fpreprocessed
    Indicate to the preprocessor that the input file has already been preprocessed.
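The -Ddef and -Ddef=defn forms are usually consumed in the source through #ifdef and #ifndef tests. In the sketch below, TRACE and BUFFER_SIZE are hypothetical macro names chosen for illustration: building the file with -DTRACE -DBUFFER_SIZE=64 would turn tracing on and override the default size of 16.

```c
/* Default used when the command line does not supply -DBUFFER_SIZE=<n>. */
#ifndef BUFFER_SIZE
#define BUFFER_SIZE 16
#endif

/* Returns the buffer size selected at preprocessing time. */
int buffer_size(void)
{
    return BUFFER_SIZE;
}

/* Returns 1 when the file was built with -DTRACE, 0 otherwise. */
int trace_enabled(void)
{
#ifdef TRACE
    return 1;
#else
    return 0;
#endif
}
```

Running the preprocessor alone with -E -dM on such a file is one way to confirm which definitions are in effect.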

2.2.6 C dialect options

The option -std=value instructs the compiler front-end to select the C language dialect to use. For instance, the C99 restrict keyword is only recognized with the -std=c99 option. However, this keyword also exists as a GNU extension keyword, spelled either __restrict or __restrict__, and both spellings are recognized by default. Possible values for -std are listed in Table 10.
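As a short sketch of the GNU spelling: because __restrict__ is recognized by default, the function below is expected to compile without selecting -std=c99. The function name is hypothetical. The qualifier is a promise made by the programmer that the three arrays do not overlap, which leaves the compiler free to reorder the loads and stores.

```c
/* Adds two arrays element by element. The caller guarantees that dst,
   a and b do not overlap. */
void add_arrays(int *__restrict__ dst,
                const int *__restrict__ a,
                const int *__restrict__ b,
                int n)
{
    int i;

    for (i = 0; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```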

Table 10. C dialect options

Option               Description
-std=iso9899:1990    Same as -ansi
-std=iso9899:199409  ISO C as modified in amendment 1
-std=iso9899:1999    ISO C 99
-std=c89             Same as -std=iso9899:1990
-std=c99             Same as -std=iso9899:1999
-std=gnu89           This is the default, iso9899:1990 + gnu extensions
-std=gnu99           iso9899:1999 + gnu extensions

Note:

Diagnostic messages can be requested from the compiler to flag potentially erroneous or dangerous C program constructions. Table 11 lists a subset of the GCC options.

Table 11. General warning options

Option           Description
-Wall            Enables all warnings.
-w               Disables all warnings.
-Werror          Turns warnings into errors.
-pedantic        Issues all warnings needed for strict ANSI C compliance.
-pedantic-error  Turn all pedantic warnings into errors.

All the options in Table 12 give the positive form of the option. The negative form of each option can be constructed by replacing the -W prefix with a -Wno- prefix, for example -Wno-format disables the printing of warning messages associated with calls to the printf and scanf family of library functions.

The online help and “man” page of the stxp70cc driver list the full set of possible warning options.


Table 12. Detailed warning options

Option / Description

-m[no-]warn-packstruct
    -mwarn-packstruct: this option enables the emission of warnings/errors when option -fpack-struct is set (see Table 15 on page 31). The warnings emitted are the most conservative ones, and are based on the evaluation of a risk that a misalignment occurs.
    -mno-warn-packstruct: this option disables the emission of warnings/errors when option -fpack-struct is set. This is the default behavior.

-m[no-]warn-smartpackstruct
    -mwarn-smart-packstruct: this option enables only the emission of smarter warnings/errors when option -fpack-struct is set (see Table 15 on page 31). The warnings are more accurate: some of them are filtered if the compiler can assess that a misalignment cannot occur, due to the layout of the structure.
    -mno-warn-smart-packstruct: this option disables the emission of smarter warnings/errors when option -fpack-struct is set. This is the default behavior.

-Waggregate-return
    Warn if any functions that return structures or unions are defined or called.

-Wbad-function-cast
    Warn whenever a function call is cast to a non-matching type.

-Wcast-align
    Warn whenever a pointer is cast such that the required alignment of the target is increased.

-Wcast-qual
    Warn whenever a pointer is cast so as to remove a type qualifier from the target type.

-Wchar-subscripts
    Warn if an array subscript has type char.

-Wcomment
    Warn if nested comments are detected.

-Wconversion
    Warn if a prototype causes a type conversion that is different from what would happen in the absence of a prototype.

-Werror-implicit-function-declaration
    Output an error when a function is used but not declared.

-Wformat
    Check calls to the printf and scanf family of library functions.

-Wimplicit
    Same as -Wimplicit-int and -Wimplicit-function-declaration.

-Wimplicit-function-declaration
    Warn when a function is used but not declared.

-Wimplicit-int
    Check that all declarations specify a type, which is int by default in C89.

-Wlarger-than-number
    Warn if an object is larger than number bytes.

-Wlong-long
    Warn if long long type is used. Only active along with -pedantic.

-Wmissing-braces
    Warn if an aggregate or union initializer is not fully bracketed.

-Wmissing-declarations
    Warn if a global function is defined without a previous declaration.

Table 12. Detailed warning options (continued)

Option / Description

-Wmissing-noreturn
    Warn about functions which might be candidates for attribute noreturn.

-Wmissing-prototypes
    Warn if a global function is defined without a previous prototype declaration.

-Wmultichar
    Warn if a multi-character constant is used.

-Wnested-externs
    Warn if an extern declaration is encountered within a function.

-Wpacked
    Warn if a structure is given the packed attribute, but the packed attribute has no effect on the layout or size of the structure.

-Wpadded
    Warn if padding is included in a structure, either to align an element of the structure or to align the whole structure.

-Wparentheses
    Warn if parentheses are omitted in certain contexts.

-Wpointer-arith
    Warn about anything that depends on the “size of” a function type or of void.

-Wredundant-decls
    Warn if anything is declared more than once in the same scope.

-Wreturn-type
    Warn when a function is defined with a return-type that defaults to int.

-Wshadow
    Warn whenever a local variable shadows another variable.

-Wsign-compare
    Warn when a comparison between signed and unsigned values could produce an incorrect result.

-Wstrict-prototypes
    Warn if a function is declared or defined without specifying the argument types.

-Wswitch
    Warn whenever a switch statement may be incomplete.

-Wtrigraph
    Warn if any trigraphs are encountered that might change the meaning of the program.

-W[no-]uninitialized
    Warn if an uninitialized automatic variable is detected. Optimization must be enabled (see Section 2.2.12 on page 29) in order for -Wuninitialized or -Wall to report uninitialized variables. See also the entries for the -trapuv and -zerouv options in Section 2.2.13 on page 31.
    -Wno-uninitialized instructs the compiler not to warn about uninitialized variables.

-Wunknown-pragmas
    Warn when a #pragma is encountered which is not understood by stxp70cc.

-Wunused
    Warn whenever a static function, a label, a parameter or a value is not used.

-Wwrite-strings
    Warn when trying to write to a string constant.
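As an example of what these options are meant to catch, -Wsign-compare targets comparisons such as the one in less_naive below, where the signed operand is converted to unsigned before the comparison takes place. Both function names are hypothetical, and the sketch assumes that int and unsigned int have the same width.

```c
/* Comparison of the kind -Wsign-compare is meant to flag: a is converted
   to unsigned, so a negative a becomes a large positive value. */
int less_naive(int a, unsigned int b)
{
    return a < b;
}

/* Form that avoids the mixed comparison: the negative case is handled
   before the conversion. */
int less_checked(int a, unsigned int b)
{
    return a < 0 || (unsigned int)a < b;
}
```

Under that assumption, less_naive(-1, 1u) yields 0 although -1 is mathematically smaller than 1, while less_checked(-1, 1u) yields 1.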


Note:

The -g option instructs stxp70cc to generate symbolic information for debugging. DWARF2 format is used.

The -g option may be used with optimization up to level -O2 and with -Os (see Section 2.2.12: Optimization options).

Minimal debug information (that is, call frames) is generated whatever options are selected.

The STxP70 compiler (version 3.4.0 and higher) supports profiling options. The dedicated -pg option instructs the compiler to generate gprof profiling information. See Section 4.5: Profiling on page 70 for more information on this topic.

2.2.10 Code coverage options

The stxp70cc compiler (version 3.4.0 and higher) supports code coverage options. Two options are provided:

• The -ftest-coverage option instructs the compiler to generate a code coverage file for the GNU gcov code coverage utility.

• The -fprofile-arcs option instructs the compiler to generate information that allows gcov to reconstruct the program flow graph.

See Section 4.6: Code coverage on page 72 for further details on this topic.

2.2.11 Call trace instrumentation options

The options -finstrument-functions and -minstrument-calls instruct stxp70cc to generate instrumentation calls. See Section 4.7: Call trace on page 74 for further details on call trace instrumentation.

2.2.12 Optimization options

The options in Table 13 control optimization levels.

Table 13. Optimize options

Option | Description
-O0 | No optimization.
-O1 | Minimal optimization.
-Os | Optimize for code size.
-O2 | Global optimization, speed orientated.
-O3 | Aggressive optimization, speed orientated.
-O4 | Aggressive optimization, speed orientated. Enables aggressive loop unrolling when compiling code for STxP70-4 in dual issue/dual ALU core configuration.


Note: 1 -O optimization is equivalent to -O2.

2 -Os optimization applies the optimizations of -O2, except for those that increase the code size (such as unrolling).

The options in Table 14 enable finer control of the optimization level.

Table 14. Advanced options

--deadcode
    This option forces the binary optimizations (binopt) performed after the link stage. For instance, enabling this option removes non-static functions that are never called in the executable binary file. This is the default behavior when the highest optimization levels are set (-Os and -O4).

--no-deadcode
    This option disables the binary optimizations (binopt) performed after the link stage.

-f[no-]strict-aliasing
    -fstrict-aliasing enables the compiler to assume the strictest aliasing rules applicable to the language being compiled. For C and C++ this activates optimizations based on the type of expressions. In particular, an object of one type is assumed never to reside at the same address as an object of a different type, unless the types are almost the same (the aliasing rules are stated in the ANSI C standard, in clause 6.5 (7) Expressions). For example, an unsigned int can alias an int, but not a void * or a double. The type char and types with the may_alias attribute can alias any other type.
    The default is -fstrict-aliasing. If this causes problems in legacy code, use -fno-strict-aliasing to disable it.

-f[no-]unroll-loops
    -funroll-loops forces loop unrolling. This is the default at -O2, -O3 and -O4.
    -fno-unroll-loops disables loop unrolling. This is the default at -Os.
    Loops with a #pragma unroll directive are not affected by these two options.
    See Section 4.2: Loop unrolling on page 63 for details of the unrolling policy.
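The effect of the strict aliasing rules on legacy type-punning code can be illustrated with a short sketch. The function name float_bits is hypothetical, and the code assumes that float and uint32_t are both 32 bits wide and that float uses the IEEE 754 single-precision layout. Reading a float object through a uint32_t pointer would break the rules described for -fstrict-aliasing; copying the bytes with memcpy does not.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the bit pattern of a float.
 * Assumes sizeof(float) == sizeof(uint32_t).
 *
 * The legacy form
 *     return *(uint32_t *)&value;
 * accesses a float object through an incompatible type, which the
 * compiler may assume never happens when -fstrict-aliasing is active. */
uint32_t float_bits(float value)
{
    uint32_t bits;

    /* memcpy copies the object representation byte by byte, so no
     * object is accessed through an incompatible type. */
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Code that cannot easily be rewritten this way is the kind of legacy code for which -fno-strict-aliasing is intended.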


The options in Table 15 control various aspects of the code generation.

Table 15. Code generation options

-gnu3
    The GCC front-end version 3.3.3 is used.

-gnu4
    The GCC front-end version 4.2.0 is used. This is the default for toolset 2011.1 and higher.

-macf-decl <acf_filename.acf>
    Reads acf_filename.acf as an ACF, using the default configuration declared in the file as the active configuration. See Section 4.10: Application configuration files on page 82 for details.

-macf-active "string"
    Use in conjunction with -macf-decl <acf_filename.acf>. Enables a configuration named string to be specified as the active configuration. string must be defined in <acf_filename.acf>.

-macf-template {source_filename1 ...}
    Generates the ACF template for the application implemented by the source files specified. The source files must be linkable, and the compilation must include a link stage to ensure that the template is complete.

-fb <name>
    Not yet supported.

-fb_create <name>
    Not yet supported.

-fsigned-char, -funsigned-char
    -fsigned-char implements type char as signed.
    -funsigned-char implements char as unsigned.
    Note that when the -funsigned-char option is used, the __CHAR_UNSIGNED__ preprocessor symbol is defined.
    The compiler default is signed.

-fsigned-bitfields, -funsigned-bitfields, -fno-signed-bitfields, -fno-unsigned-bitfields
    These options control whether a bitfield is signed or unsigned, when the declaration does not use either 'signed' or 'unsigned'.
    The compiler default is signed.


Table 15. Code generation options (continued)

-ffixed-reg=<register-list>
    <register-list> is a list of one or several comma-separated register names or dash-separated register ranges, either general purpose registers or boolean registers. The syntax used for registers is:
    – rn for core GPR, where n can be 0 to 31,
    – gn for core guards registers, where n can be 0 to 7,
    – fn for fpx extension registers, where n can be 0 to 15.
    This option makes the given registers fixed registers; that is, the code generated by the compiler never uses them.
    There are, however, some registers that are used by the compiler for ABI register conventions. See the table of general registers in the STxP70 ABI manual. The registers with a specified use must not be reserved with this option.
    Note that specific care must be taken when using this option, since low-level library and run-time support code are not specifically built to support non-ABI register usage. For instance, reserving the r5 register does not prevent already compiled library code from using it. Using this option generally requires rebuilding a set of libraries either with the same option (for C/C++ code) or to take into account that this option has been used.
    Examples:
        stxp70cc -ffixed-reg=r6,g0
        stxp70cc -ffixed-reg=f12-f15

-mdisabled-reg=<register-list>
    This option is similar to -ffixed-reg described above, except that the corresponding registers cannot be used by the register GNU extension or asm statement clobber list.
    The syntax of the <register-list> is the same as for option -ffixed-reg above.
    Note that the -MMode16 configuration option is based on this option.


Table 15. Code generation options (continued)

-fshort-double
    By default, the compiler assumes double precision floating point. This means that floating point constants with implicit type declaration are promoted to double precision. This promotion is propagated in the expression where the constant is used. For example, the expression used to compute C is performed in double precision because of the implicit constant type:
        float A;
        float B;
        float C=A*B*3.45;
    If the constant is explicitly declared as single precision, the expression remains in single precision:
        float A;
        float B;
        float C=A*B*3.45F;
    The option -fshort-double instructs the compiler to assume single precision instead.
    When the FPx floating point extension is used, this option is required to ensure efficient code generation. A warning is emitted if FPx is used without this option.
    More details can be found in Section 4.9: Floating-point code generation on page 79.

-mlib-short-double
    This option instructs the compiler to use single precision floating point libraries.
    This option is forced as soon as the -fshort-double option is set. On the STxP70, this option is deprecated, since it is forced to fit the default code generation setting.
    It is preserved mainly for legacy reasons.

-mlib-nofloat
    Instructs the compiler to use the C-library without floating-point support. Leads to a much smaller C-library (nearly half the size of the default library).


Table 15. Code generation options (continued)

-fpack-struct
    Instructs the compiler to pack structures. The goal of this option is to reduce the memory footprint of the data sections of the objects and binary files. Note that this may induce a need for misaligned accesses, which usually increases the size of the code in the text section. Gains in size are more significant if large arrays of structures are used.
    This option should be used by advanced users only. It may conflict with the assumptions or semantics of the source code. For instance:
    – if the source code performs some verifications based on the size of a structure, then enabling this option may cause the check to fail
    – in some cases, some alignment constraints may no longer hold when the option is set
    Some warnings and errors are emitted to prevent the compiler from silently performing non-conservative code generation. See the options -m[no-]warn-packstruct and -m[no-]warn-smart-packstruct for controlling warnings.
    If you encounter a problem with this option, it is advised to disable it, and check if the issue is still present.

-fshort-enums
    Instructs the compiler to use the shortest integer type required to represent the values of an enumeration. The goal of this option is to reduce the memory footprint of the data sections of the objects and binary files. This option is more likely to have a real impact if it is used in combination with -fpack-struct.
    This option should be used by advanced users only. It may conflict with the assumptions or semantics of the source code. For instance:
    – if the source code performs some verifications based on the size of a structure, then enabling this option may cause the check to fail
    – in some cases, some alignment constraints may no longer hold when the option is set
    Some warnings and errors are emitted to prevent the compiler from silently performing non-conservative code generation. If you encounter a problem with this option, it is advised to disable it, and check if the issue is still present.

-fverbose-asm, -fno-verbose-asm
    The -fno-verbose-asm option removes extra commentary information in the generated assembly code.
    The default is to have verbose asm output.

-falign-functions, -falign-functions=n
    Align the start of functions to the next power of two greater than n (if n is specified), skipping up to n bytes.

-falign-loops, -falign-loops=n
    Align the first address of loops to the next power of two greater than n (if n is specified), skipping up to n bytes.


Table 15. Code generation options (continued)

-falign-jumps, -falign-jumps=n
    Align the target address of jumps to the next power of two greater than n (if n is specified), skipping up to n bytes.

-falign-labels, -falign-labels=n
    Align the labels to the next power of two greater than n (if n is specified), skipping up to n bytes.

-falign-instructions, -falign-instructions=n
    Align the instructions to the next power of two greater than n (if n is specified), skipping up to n bytes.

-ffast-math
    Defines the preprocessor macro __FAST_MATH__ and invokes -fno-math-errno.

-f[no-]math-errno
    -fmath-errno causes the compiler to generate code to set the mathematical error flag in floating point code. The compiler also makes use of the slower libm from Newlib, with errno setting. This is the default behavior when the FPx floating point extension is not used.
    -fno-math-errno causes the compiler not to generate code to set the mathematical error flag in floating point code. The compiler also makes use of fast libm overrides, for example sqrtf from the FLIP library, with no errno setting. This is the default behavior when the FPx floating point extension is used.

-mreassoc=0
    No re-associations, folding or simplifications. This is the default.

-mreassoc=1
    Accurate simplifications that are correct for finite arithmetic are allowed, for instance, a/a -> 1.0, recip(recip(a)) -> a.
    Such transformations are not always valid. For example, the transformation a/a -> 1.0 is not valid when a is 0.0 because in this case 0.0/0.0 -> NaN.

-mreassoc=2
    Aggressive re-association of expressions is performed to favor the selection of fused multiply-add routines. Such changes in the evaluation order can lead to slightly different results, compared to the original evaluation order.

-fpic
    Generate position independent code (data accesses only).

--rlib
    Build a relocatable library that can be loaded by RL_LIB.

--rmain
    Build a main program suitable for loading relocatable libraries.

-maggressive_unroll=n
    Modify the aggressiveness of the default unrolling policy. n is a value in the range [0, 6]. The higher it is, the more aggressive the unrolling.


Table 15. Code generation options (continued)

-trapuv
    Initialize uninitialized local variables to pre-defined values.
    -trapuv helps to find issues that are due to uninitialized variables. This option has a slight performance impact. It affects local scalar variables, local array variables and memory returned by alloca. It does not affect the behavior of globals or memory allocated with malloc.
    Integer variables are initialized to 0xdeaddead.
    Float variables are initialized to 0xfffa5a5a (a floating-point NaN).
    Pointer variables are initialized to 0x0.
    A sub-type is given a sub part of the pattern of its original type:
    – char is initialized to 0xad.
    – short is initialized to 0xdead.
    – long long is initialized to 0xdeaddeaddeaddeadLL.
    – double is initialized to 0xfffa5ffffa5a5a5a (NaN).
    Default values of patterns can be controlled as follows:
    – -DEBUG:trapuv_int_value=0xffffffff to change the integer pattern to 0xffffffff.
    – -DEBUG:trapuv_float_value=0xeeeeeeee to change the float pattern to 0xeeeeeeee.
    – -DEBUG:trapuv_pointer_value=0xdddddddd to change the pointer pattern to 0xdddddddd.
    Note: Using -trapuv removes the possibility of using -Wuninitialized.

-zerouv
    Sets uninitialized variables to zero at runtime. This option has a slight performance impact. It affects local scalar variables, local array variables and memory returned by alloca. It does not affect the behavior of globals or memory allocated with malloc.
    Note: Using -zerouv removes the possibility of using -Wuninitialized.


Table 15. Code generation options (continued)

-m[no-]parse-asmstmts
    -mparse-asmstmts causes the compiler to parse and optimize user defined GNU assembly statements. When set, the compiler analyzes the content of each GNU assembly statement, and optimizes it if possible.
    -mno-parse-asmstmts causes the compiler not to parse and optimize user defined GNU assembly statements. The compiler leaves the instructions of the GNU assembly statement unchanged, except regarding register allocation. This is the default behavior.
    See Section 6.8: Parsing and optimization of GNU assembly statement on page 114 for details.

-m[no-]parse-meta-asmstmts
    -mparse-meta-asmstmts is similar to -mparse-asmstmts, but applies only to the GNU assembly statements used internally by the compiler to automatically map the instructions of the extensions. This is the default behavior.
    -mno-parse-meta-asmstmts is similar to -mno-parse-asmstmts, but applies only to the GNU assembly statements used internally by the compiler to automatically map the instructions of the extensions.
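The constant promotion described for -fshort-double in Table 15 can be observed from the type of the expression itself. The sketch below is illustrative only and uses hypothetical function names. It relies on standard C typing rules under the default setting (double wider than float), and says nothing about what stxp70cc reports once -fshort-double is in effect.

```c
#include <stddef.h>

/* 3.45 has type double, so the whole product is evaluated as a double
 * expression. sizeof inspects the type without evaluating the operand. */
size_t size_with_double_constant(float a, float b)
{
    return sizeof(a * b * 3.45);
}

/* 3.45F has type float, so the product stays in single precision. */
size_t size_with_float_constant(float a, float b)
{
    return sizeof(a * b * 3.45F);
}
```

With the default setting the first function returns sizeof(double) and the second sizeof(float), which is why adding the F suffix to constants keeps float expressions in single precision.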

The options -OPT:unroll_size, -OPT:cray_ivdep and -OPT:liberal_ivdep modify the behavior of pragmas and are documented in Section 3.2.1: #pragma unroll (n) on page 45 and Section 3.2.2: #pragma ivdep on page 46.

The -OPT:alias option is documented in Section 4.3: Memory dependences in C programs on page 65.

The -inline, -noinline and -INLINE options are provided to control inlining of functions. They are described in Section 4.1.1: Single file inlining on page 55.

Only functions marked with the inline keyword are subject to inlining unless specified otherwise.

The -ipa option enables interprocedural analysis, and is described in Section 4.8: Interprocedural analysis optimization (IPA) on page 76. That section documents a range of advanced -IPA options that provide control over the optimizations performed.


Note: The STxP70 compiler now provides some support for position independent code (PIC) generation and dynamic loading of shared components. This support is partial, since only data accesses are position independent.

2.2.18 Sending options to a specific phase

The -W<phase>,<arg> option passes the specified argument <arg> to a specific processing phase <phase> of stxp70cc. Table 16 lists the different values of <phase>.

Table 16. Possible value for phase

Value of phase | Description
p | Preprocessor cpp
f | Compiler front-end
a | Assembler stxp70-as
l | Linker stxp70-ld
o | Binary optimizer tool binopt - not yet used by stxp70cc

There must be a comma, and no spaces, between the option -W<phase> and the argument. Anything occurring after a space is treated as the next option to stxp70cc. Also, the argument is only passed to <phase> if <phase> is normally run from the specified command.

For example:

    stxp70cc -O3 -Wl,-strict_warn a.out

This command causes the linker to emit strict warnings regarding link files.


Table 17 lists the options that select header files, libraries and compiler executables.

Table 17. Directory options

-Idirectory
    Add the directory to the beginning of the search list for include files.

-nostdinc
    No predefined include search path.

-l<library>
    Search the library named lib<library>.a when linking. The linker looks for the library in the directories specified by the -L options and then in a standard list of directories.
    The position of this option on the command line makes a difference. The linker processes object files and libraries in the order that they are specified on the command line. For example, if the following is specified:
        stxp70cc file1.o file2.o -lmylib
    then the files are processed in the order file1.o, file2.o, libmylib.a.
    However, if the following is specified:
        stxp70cc file1.o -lmylib file2.o
    then the files are processed in the order file1.o, libmylib.a, file2.o. In this case, file2.o should not refer to any symbols defined in libmylib.a.

-L<directory>
    Add the directory to the beginning of the search list for library files.

-nostdlib
    No predefined libraries search path.

The search path for the various phases of the compiler can be overridden by using the option -Y<phase>,<path>, where <phase> can take the values listed in Table 16 and <path> is the path of the required tool. There must be a comma and no spaces separating -Y<phase> and <path>.

Currently there are no special environment variables that affect stxp70cc.


Predefined macros are described in Table 18.

Note: 1 The list of macros currently defined can be obtained by typing:

    stxp70cc -E -dM filename.c

where filename.c can be any .c file including an empty file.

2 Do not rely on a macro that is not documented, even if it is currently defined.

3 Some macro values are subject to change because of evolution of compiler design. This may affect, for instance, front-end identification values.

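Table 18. Predefined macros

__open64__
    Default definition: Defined
    Description: Compiler technology identification

__GNUC__
    Default definition: 3
    Description: Front end major release identification
    Related option: -no-gcc

__GNUC_MINOR__
    Default definition: 3
    Description: Front end minor release identification
    Related option: -no-gcc

__stxp70cc__
    Default definition: Defined, value depends on major compiler version
    Description: Compiler identification

__STXP70CC_MINOR__
    Default definition: Defined, value depends on minor compiler version
    Description: Compiler identification

__STXP70CC_PATCHLEVEL__
    Default definition: Defined, value depends on compiler patch level
    Description: Compiler identification

__STXP70CC_DATE__
    Default definition: Defined, value depends on compiler release date
    Description: Compiler identification

__STXP70CC_VERSION__
    Default definition: Defined, value is an identification string
    Description: Compiler identification

__LITTLE_ENDIAN__
    Default definition: Defined by default
    Description: Endianness identification

_LANGUAGE_C
    Default definition: Defined for C source
    Description: Language currently processed is C language.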


Table 18. Predefined macros (continued)

_LANGUAGE_ASSEMBLY
    Default definition: Defined for ASM source
    Description: Language currently processed is assembly language.

__STRICT_ANSI__
    Default definition: Defined when -std=c89 or -std=c99 or -ansi is used
    Description: Compiler is in strict ansi mode.
    Related option: -std

__STDC_VERSION__
    Default definition: Defined when -std=c99 is used, with value 199901L
    Description: Compiler is in C99 ansi mode.
    Related option: -std

__OPTIMIZE__
    Default definition: Defined as soon as optimization is on.
    Description: Optimization mode detection.
    Related option: -O

__OPTIMIZE_SIZE__
    Default definition: Defined when -Os is used
    Description: Optimization size detection.
    Related option: -Os

__INLINE_INTRINSICS
    Default definition: Defined
    Description: Intrinsics inlining mode detection.
    Related option: -OPT:inline_intrinsics

__STDC_HOSTED__
    Default definition: Defined by default.
    Description: Hosting mode.
    Related options: -f[no-]hosted, -f[no-]freestanding

__FAST_MATH__
    Default definition: Defined when the -ffast-math option is used.
    Description: Libraries or user code can take advantage of this definition to define alternative sequences of floating point code.
    Related option: -ffast-math

Note: The C standard guarantees that the __cplusplus symbol is never defined when compiling C source code.

The stxp70cc compiler supports a subset of the C99 standard. Most features are implicitly available through default compiler command line options, with the notable exception of the restrict keyword, which requires the -std=c99 command line option to be specified.

It is recommended that any code fragment that depends upon C99 specific behavior be guarded by the following preprocessing definitions, which are correctly triggered when the -std=c99 command line option is used:

#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)

// Your C99 dependent code here

#else

#error "This source file depends upon C99 features not available with this compiler."

#endif
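Where failing the build is too strict, the same test can select a fallback instead. The following sketch is one possible arrangement, not a recommendation from the toolset: the RESTRICT macro and the scale_copy function are hypothetical names introduced for illustration. The restrict qualifier is used when C99 mode is active and silently dropped otherwise.

```c
/* Hypothetical portability macro: expands to the C99 restrict keyword
 * when the compiler is in C99 mode, and to nothing otherwise. */
#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
#define RESTRICT restrict
#else
#define RESTRICT
#endif

/* Copies count elements from src to dst, multiplying each by factor.
 * The RESTRICT qualifiers state that dst and src never overlap. */
void scale_copy(int *RESTRICT dst, const int *RESTRICT src,
                int count, int factor)
{
    int i;

    for (i = 0; i < count; i++) {
        dst[i] = src[i] * factor;
    }
}
```

The function behaves identically in both modes; only the aliasing information available to the optimizer differs.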


Table 19 summarizes the status of the stxp70cc compiler C99 support.

Table 19. C99 support in stxp70cc

Feature as described in the C99 standard | Status
Restricted character set support via digraphs | YES
iso646.h included | YES
Wide character library support | NO: type not supported and library not provided.
More precise aliasing rules via effective type | YES: provided that the -fno-strict-aliasing option is not used
Restricted pointer | YES: provided that the -fno-strict-aliasing option is not used
Variable length arrays | PARTIAL: only local allocation, but no other features
Flexible array members | YES
Static and type qualifiers in parameter array declarators | YES
Complex support (<complex.h>) | NO
Type generic math macros (<tgmath.h>) | NO: include file not provided
The long long int type and library functions | YES
Increased minimum translation unit | YES
Additional floating-point characteristics (<float.h>) | NO
Remove implicit int | YES
Reliable integer division | YES
Universal character names | NO
Extended identifiers | NO
Hexadecimal floating-point constants | YES
Compound literals | YES
Designated initializers | YES
// comments | YES
Extended integer type and library functions in <inttypes.h> and <stdint.h> | YES
Remove implicit function declaration | NO: can get warning
Preprocessor arithmetic done in intmax_t/uintmax_t | YES
Mixed declaration and code | YES
New block scope for selection and iteration statements | YES
Integer constant type rules | YES
Integer promotion rules | YES
vararg macro | YES


Table 19. C99 support in stxp70cc (continued)

Feature as described in the C99 standard | Status
The vscanf family of function in <stdio.h> | YES
Additional math library functions in <math.h> | NO
Floating point environment access in <fenv.h> | NO
ISO 60559 Arithmetic support | NO
Trailing comma allowed in enum declaration | YES
%lf conversion allowed in printf | NO
Inline functions | YES: but not fully ansi compliant in the extern inline case
The snprintf family of functions in <stdio.h> | YES
Boolean type in <stdbool.h> | NO bool native type but <stdbool.h> header provided
Idempotent type qualifiers | YES: but still emits warnings
Empty macro arguments | YES
New struct type compatibility rules (tag compatibility) | YES
Additional predefined macro names | MOST
_Pragma preprocessing operator | YES
Standard pragmas | NO
__func__ predefined identifier | YES
VA_COPY macro | NO
Additional strftime conversion specifiers | NO: library support not provided
LIA compatibility annex | NO
Deprecate ungetc at the beginning of a binary file | NO
Remove deprecation of aliased array parameters | YES
Conversion of array to pointer not limited to lvalues | YES
Relaxed constraints on aggregate and union initialization | YES
Relaxed restrictions on portable header names | YES
Return without expression not permitted in function that returns a value (and vice versa) | YES

3 Pragmas

This chapter provides details of the #pragma directives that are recognized by stxp70cc.

3.1 Pragmas short description and syntax

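Table 20. stxp70cc pragmas

#pragma unroll (unroll_amount)
    Scope: Start of a loop body
    Description: Unrolls the loop unroll_amount times
    Optimization level (1): -O2

#pragma ivdep
    Scope: Start of a loop body
    Description: Liberalizes dependence analysis
    Optimization level (1): -O3

#pragma loopdep PARALLEL | VECTOR | LIBERAL
    Scope: Start of a loop body
    Description: Liberalizes dependence analysis
    Optimization level (1): -O3

#pragma loopmod(q,r)
    Scope: Start of a loop body
    Description: Provides trip count modularity information
    Optimization level (1): -O2

#pragma looptrip(n)
    Scope: Start of a loop body
    Description: Provides trip count estimation information
    Optimization level (1): -O2

#pragma loopseq READ | WRITE
    Scope: Start of a loop body
    Description: Ordering of the READ (or WRITE) accesses
    Optimization level (1): -O2

#pragma hwloop none | forcehwloop<loopid> | forcejrgtudec
    Scope: Start of a loop body
    Description: Controls mapping of HW loops and JRGTUDEC
    Optimization level (1): -O2

#pragma loopmin<itercount> (minc)
    Scope: Start of a loop body
    Description: Controls the guards to be placed around loops
    Optimization level (1): -O2

#pragma loopmax<itercount> (maxc)
    Scope: Start of a loop body
    Description: Controls specific cases of HW loop mapping
    Optimization level (1): -O2

#pragma frequency_hint NEVER|FREQUENT
    Scope: Applies to the function or statement that follows the pragma
    Description: Execution frequency hint
    Optimization level (1): -O1

#pragma ident "string"
    Scope: -
    Description: Adds a .comment section to an assembly file.
    Optimization level (1): -O0

#pragma weak symbol
    Scope: -
    Description: Marks a symbol as weak
    Optimization level (1): -O0

#pragma disable_extgen (fct1,fct2,...)
    Scope: -
    Description: Disables native code generation for all extensions in the given functions.
    Optimization level (1): -O2

#pragma force_extgen (fct1,fct2,...)
    Scope: -
    Description: Enables native code generation for all extensions in the given functions.
    Optimization level (1): -O2

#pragma disable_specific_extgen (extname[,fct1,fct2,...])
    Scope: -
    Description: Disables native code generation for specified extensions in the given functions.
    Optimization level (1): -O2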


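Table 20. stxp70cc pragmas (continued)

#pragma force_specific_extgen (extname[,fct1,fct2,...])
    Scope: -
    Description: Enables native code generation for specified extensions in the given functions.
    Optimization level (1): -O2

#pragma inline_next (function)
    Scope: Function call site
    Description: Inlining (2)
    Optimization level (1): -O1

#pragma noinline_next (function)
    Scope: Function call site
    Description: Inlining (2)
    Optimization level (1): -O1

#pragma inline_function (function)
    Scope: Function
    Description: Inlining (2)
    Optimization level (1): -O1

#pragma noinline_function (function)
    Scope: Function
    Description: Inlining (2)
    Optimization level (1): -O1

#pragma inline_file (function)
    Scope: File
    Description: Inlining (2)
    Optimization level (1): -O1

#pragma noinline_file (function)
    Scope: File
    Description: Inlining (2)
    Optimization level (1): -O1

#pragma defaultinline (function)
    Scope: -
    Description: Inlining (2)
    Optimization level (1): -O1

1. This column denotes the lowest optimization level for which the pragma has an effect. For example, -O0 means the pragma is applicable even when optimization is switched off. A list of optimization levels is given in Section 2.2.12: Optimization options on page 29.

2. All inlining pragmas are described in Section 4.1.4: Inlining pragmas on page 58.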

3.2 Loop optimization pragmas

3.2.1 #pragma unroll (n)

This pragma suggests to the compiler the type of loop unrolling that should be done. The pragma is a recommendation to the compiler to add n-1 copies of the loop body to the inner loop. The value of n must be at least 1. If it is 1, then unrolling is not performed.

If the loop that this pragma immediately precedes is an inner loop, then it implies standard inner loop unrolling. See Figure 2.

Figure 2. Inner loop unrolling example

    for (i=0; i < 10; i++)
    #pragma unroll (2)
        for (j=0; j < 10; j++)
            a[i][j] = a[i][j]+b[i][j];

becomes:

    for (i=0; i < 10; i++)
        for (j=0; j < 10; j+=2) {
            a[i][j] = a[i][j] +b[i][j];
            a[i][j+1] = a[i][j+1]+b[i][j+1];
        }
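The equivalence of the two forms in Figure 2 can be checked on a host machine with a small hand-written harness. The sketch below is illustrative only: the function names, array contents and the ROWS and COLS constants are assumptions, and the unrolling is written by hand rather than produced by the pragma.

```c
#define ROWS 10
#define COLS 10

/* Original form: one element per inner iteration. */
static void add_rolled(int a[ROWS][COLS], int b[ROWS][COLS])
{
    int i, j;

    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            a[i][j] = a[i][j] + b[i][j];
}

/* Hand-unrolled by 2: valid here because COLS is a multiple of 2. */
static void add_unrolled(int a[ROWS][COLS], int b[ROWS][COLS])
{
    int i, j;

    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j += 2) {
            a[i][j] = a[i][j] + b[i][j];
            a[i][j + 1] = a[i][j + 1] + b[i][j + 1];
        }
}

/* Returns 1 when both forms produce identical arrays, 0 otherwise. */
int unroll_forms_agree(void)
{
    int a1[ROWS][COLS], a2[ROWS][COLS], b[ROWS][COLS];
    int i, j;

    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++) {
            a1[i][j] = a2[i][j] = i * COLS + j;
            b[i][j] = 2 * i - j;
        }

    add_rolled(a1, b);
    add_unrolled(a2, b);

    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            if (a1[i][j] != a2[i][j])
                return 0;
    return 1;
}
```

Because the trip count (10) is a multiple of the unroll amount (2), no residual iterations are needed in this case.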


If the loop that this pragma immediately precedes is an outer loop that contains only an inner loop, then the compiler attempts to unroll the outer loop and perform loop fusion on the resulting inner loops. This transformation, known as "unroll-and-jam", is especially useful to create parallel execution opportunities when the innermost loop alone does not present such opportunities. See Figure 3.

Figure 3. Unroll-and-jam example

    // Ensure ad[] and sd[] do not alias.
    #pragma unroll(2)
    for (i=0; i<16; i++) {
        int sum = 0;
        for (k=M; k<8+M; k++) {
            sum += sd[k]*sd[k-i];
        }
        ad[i] = sum;
    }

becomes:

    for (i=0; i<16; i+=2) {
        int sum0 = 0;
        int sum1 = 0;
        for (k=M; k<8+M; k++) {
            sum0 += sd[k]*sd[k-i];
            sum1 += sd[k]*sd[k-i-1];
        }
        ad[i] = sum0;
        ad[i+1] = sum1;
    }
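The unroll-and-jam form in Figure 3 can be compared against the original loop nest with a host-side sketch such as the one below. The value chosen for M, the array sizes, the test data and all function names are assumptions made for illustration; the manual does not define them. M is set to 16 so that every index k-i-1 stays inside the array.

```c
#define M 16                 /* assumed offset; must be at least 15 */
#define OUT 16               /* number of outputs, as in Figure 3 */
#define TAPS 8               /* inner trip count, as in Figure 3 */
#define SD_LEN (M + TAPS)    /* highest index read is M + TAPS - 1 */

/* Original loop nest. */
static void correlate_rolled(int ad[OUT], const int sd[SD_LEN])
{
    int i, k;

    for (i = 0; i < OUT; i++) {
        int sum = 0;
        for (k = M; k < TAPS + M; k++)
            sum += sd[k] * sd[k - i];
        ad[i] = sum;
    }
}

/* Outer loop unrolled by 2, with the two inner loops fused ("jammed"). */
static void correlate_jammed(int ad[OUT], const int sd[SD_LEN])
{
    int i, k;

    for (i = 0; i < OUT; i += 2) {
        int sum0 = 0;
        int sum1 = 0;
        for (k = M; k < TAPS + M; k++) {
            sum0 += sd[k] * sd[k - i];
            sum1 += sd[k] * sd[k - i - 1];
        }
        ad[i] = sum0;
        ad[i + 1] = sum1;
    }
}

/* Returns 1 when both forms produce identical outputs, 0 otherwise. */
int unroll_and_jam_agrees(void)
{
    int sd[SD_LEN], ad1[OUT], ad2[OUT];
    int i;

    for (i = 0; i < SD_LEN; i++)
        sd[i] = (i * 7) % 11 - 5;

    correlate_rolled(ad1, sd);
    correlate_jammed(ad2, sd);

    for (i = 0; i < OUT; i++)
        if (ad1[i] != ad2[i])
            return 0;
    return 1;
}
```

In the jammed form both accumulators share each load of sd[k], which is the source of the extra parallelism mentioned above.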

The following tips provide information on how to control the desired inner loop unrolling with the pragma unroll value.

• A counted loop with a compile-time constant trip count is always fully unrolled if a pragma unroll with a value greater than or equal to the loop trip count is specified.

• When a counted loop is not fully unrolled, the pragma unroll value is rounded to the greatest power of two lower than the specified unrolling value.

• The maximum size of a loop after unrolling is controlled by the command line option -OPT:unroll_size=<n>.

3.2.2 #pragma ivdep

This pragma instructs the compiler to liberalize dependence analysis between memory accesses. The #pragma ivdep applies only to the innermost loops in a set of nested loops; therefore, if it is used on a loop that has an inner loop, the compiler ignores it. By default, this pragma allows the compiler to assume there are no memory dependences between loop iterations.

The following command line options modify the ivdep semantic.

• -OPT:cray_ivdep=TRUE
  Only ignore backward memory dependences (Cray semantics).

• -OPT:liberal_ivdep=TRUE
  Also ignore all memory dependences in the same loop iteration.

For example:

    #pragma ivdep
    for (i = 0; i < n; i++) {
        a[b[i]] = a[b[i]]+3; // These dependencies cannot be computed by the compiler
    }

This pragma instructs the compiler to liberalize dependence analysis between memory accesses, based on the specified type of loop dependences. Unlike the pragma ivdep described above, its semantics cannot be modified by command line options.

The loopdep pragma takes an argument to tell the compiler which kind of loop dependencies it can ignore: VECTOR, PARALLEL or LIBERAL.

#pragma loopdep VECTOR

#pragma loopdep VECTOR allows the compiler to assume there are no backward memory dependences between loop iterations. This pragma is equivalent to #pragma ivdep with -OPT:cray_ivdep=TRUE.

Example:

    #pragma loopdep VECTOR
    for (i = 0; i < n; i++) {
        a[i] = a[i+k]+3;
    }

In this example, the compiler cannot tell when a[i+k] does not depend on a[i], but this is in fact the case if k is always > 0 in the program. The pragma allows the compiler to assume there are no dependences between the read of a[i+k] in the current loop iteration, and the write of a[i] in the following loop iterations. The compiler could rewrite the loop as:

    for (i = 0; i < n; i+=2) {
        t0 = a[i+k]+3;
        t1 = a[i+1+k]+3;
        a[i] = t0;
        a[i+1] = t1;
    }
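The condition on k can be made concrete with a host-side sketch that runs both forms of the loop and compares the results. Everything here is an assumption made for illustration: the function names, the buffer layout (a padded buffer so that negative k can be tried), and the test data. The rewritten loop is transcribed by hand from the example above.

```c
#define N 8                    /* even trip count */
#define PAD 4                  /* slack on both sides of the array */
#define LEN (N + 2 * PAD)

/* Original loop. */
static void shift_add_rolled(int *a, int n, int k)
{
    int i;

    for (i = 0; i < n; i++)
        a[i] = a[i + k] + 3;
}

/* Rewritten loop: both reads are hoisted above both writes. */
static void shift_add_unrolled(int *a, int n, int k)
{
    int i;

    for (i = 0; i < n; i += 2) {
        int t0 = a[i + k] + 3;
        int t1 = a[i + 1 + k] + 3;
        a[i] = t0;
        a[i + 1] = t1;
    }
}

/* Returns 1 when both forms agree for the given k, 0 otherwise.
 * k must lie in the range [-PAD, PAD] to stay inside the buffers. */
int vector_rewrite_agrees(int k)
{
    int buf1[LEN], buf2[LEN];
    int i;

    for (i = 0; i < LEN; i++)
        buf1[i] = buf2[i] = i * i;

    shift_add_rolled(buf1 + PAD, N, k);
    shift_add_unrolled(buf2 + PAD, N, k);

    for (i = 0; i < LEN; i++)
        if (buf1[i] != buf2[i])
            return 0;
    return 1;
}
```

With k = 3 the two forms agree. With k = -1 each iteration reads the value written by the previous one, so hoisting the reads changes the result; this is the situation in which the pragma must not be used.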

#pragma loopdep PARALLEL

#pragma loopdep PARALLEL allows the compiler to assume there are no dependences between any two memory accesses that are in different loop iterations. This pragma is equivalent to #pragma ivdep with -OPT:cray_ivdep=FALSE -OPT:liberal_ivdep=FALSE.

For example:

    #pragma loopdep PARALLEL
    for (i = 0; i < n; i++)
        a[b[i]] = a[b[i]] + 3;

In this example, the compiler cannot tell that either the load or store of a[b[i]] in the current loop iteration does not depend on the load or store of a[b[i]] in a following loop iteration. This is in fact the case if b[i] != b[j] for all i != j. The compiler could rewrite the loop as:

    for (i = 0; i < n; i+=2) {
        t1 = a[b[i+1]] + 3;
        t0 = a[b[i]] + 3;
        a[b[i+1]] = t1;
        a[b[i]] = t0;
    }
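The requirement that b[i] != b[j] for all i != j can be demonstrated with a host-side sketch. The function names, the array size and the index tables below are assumptions chosen for illustration; the rewritten loop is transcribed by hand from the example above.

```c
#define COUNT 4    /* even trip count */

/* Original loop. */
static void update_rolled(int *a, const int *b, int n)
{
    int i;

    for (i = 0; i < n; i++)
        a[b[i]] = a[b[i]] + 3;
}

/* Rewritten loop: loads from two iterations precede both stores. */
static void update_unrolled(int *a, const int *b, int n)
{
    int i;

    for (i = 0; i < n; i += 2) {
        int t1 = a[b[i + 1]] + 3;
        int t0 = a[b[i]] + 3;
        a[b[i + 1]] = t1;
        a[b[i]] = t0;
    }
}

/* Runs both forms on identical data; returns 1 when they agree. */
static int forms_agree(const int *b)
{
    int a1[COUNT] = {10, 20, 30, 40};
    int a2[COUNT] = {10, 20, 30, 40};
    int i;

    update_rolled(a1, b, COUNT);
    update_unrolled(a2, b, COUNT);

    for (i = 0; i < COUNT; i++)
        if (a1[i] != a2[i])
            return 0;
    return 1;
}

/* All indices distinct: the assertion made by the pragma holds. */
int agrees_with_distinct_indices(void)
{
    static const int b[COUNT] = {2, 0, 3, 1};
    return forms_agree(b);
}

/* Index 1 appears twice: the assertion made by the pragma is false. */
int agrees_with_repeated_index(void)
{
    static const int b[COUNT] = {1, 1, 2, 3};
    return forms_agree(b);
}
```

With distinct indices the two forms agree. With a repeated index the original loop adds 3 twice to the same element, while the rewritten loop adds it only once, so the results differ.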

#pragma loopdep LIBERAL

#pragma loopdep LIBERAL

allows the compiler to assume there are no dependences between any two memory accesses that are either in the same, or different, loop iterations.

This pragma is equivalent to:

#pragma ivdep, -OPT:liberal_ivdep=TRUE

Example:

#pragma loopdep LIBERAL
for (i = 0; i < n; i++) {
    a[j] = b[i];
    c[i] = a[i] + 3;
}

In this example, the compiler cannot tell that the load of a[i] does not depend on the store of a[j]. This is in fact the case if i != j for all values of i and j in the loop iterations.

3.2.4 #pragma loopmod

This pragma tells the compiler the number of times a loop is taken in terms of a multiple q and a residual r.

The syntax of this pragma is:

#pragma loopmod(q,r)

where q is a strictly positive integer and r is a non-negative integer such that r < q.

For example:

#pragma loopmod (4,0)

This tells the compiler that the loop is taken 0, 4, 8, 12 .... times.

#pragma loopmod (4,1)

This tells the compiler that the loop is taken 1, 5, 9, 13 .... times.

When applied to an inner loop, this pragma indicates that the trip count tc, that is, the number of iterations executed by any execution of the loop, can be written as:

tc = p * q + r, with q > 0 and 0 <= r < q

where p is a non-negative integer. This information helps the compiler in loop unrolling optimization and in software pipelining.

When unrolling loops, the compiler creates multiple loop bodies (the unrolling factor specifies the number of loop bodies created). However, the compiler cannot always statically determine the trip count. When it cannot determine the trip count, the compiler must also create residual code in case the unrolling factor is not a divisor of the loop trip count.

However, application writers often know the modular properties of some of the loops in their own code. When this accurate information is given to the compiler, the residual code can be largely removed or better optimized.

Note: Giving inexact information on the trip count may lead to incorrect code. Make sure that the asserted property is valid in all cases.

The following example shows the use of the #pragma loopmod.

void copychar(unsigned char* __restrict p, unsigned char * q, int sz)
{
    int i ;
    assert(sz % 4 == 0) ;
#pragma loopmod(4,0)
    for(i=0; i<sz; i++)
        p[i] = q[i];
}

The function copychar duplicates a byte stream, whose size must be a multiple of 4.

During unrolling, and without the pragma, the compiler would create a residual loop. This is totally removed when the pragma information is asserted. In this example, the pragma does not provide the compiler with any information about the memory alignment of p or q, which the compiler would need to generate word accesses after unrolling.
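To make the role of the residual loop concrete, here is a hand-written sketch of the two shapes an unrolled-by-4 copy can take. It is only an illustration of the idea, not the code the compiler actually emits; the function names are hypothetical:

```c
#include <assert.h>

/* Unrolled by 4 with no residual loop: only valid when sz % 4 == 0,
   which is the property asserted by #pragma loopmod(4,0). */
void copychar_unrolled(unsigned char *p, const unsigned char *q, int sz)
{
    int i;
    assert(sz % 4 == 0);
    for (i = 0; i < sz; i += 4) {
        p[i]     = q[i];
        p[i + 1] = q[i + 1];
        p[i + 2] = q[i + 2];
        p[i + 3] = q[i + 3];
    }
}

/* Unrolled by 4 for any sz: a residual loop copies the last sz % 4 bytes. */
void copychar_residual(unsigned char *p, const unsigned char *q, int sz)
{
    int i;
    for (i = 0; i + 3 < sz; i += 4) {
        p[i]     = q[i];
        p[i + 1] = q[i + 1];
        p[i + 2] = q[i + 2];
        p[i + 3] = q[i + 3];
    }
    for (; i < sz; i++)      /* residual code */
        p[i] = q[i];
}
```

The second function is what has to be generated when nothing is known about sz; the first is possible only because the trip count is known to be a multiple of 4.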

This pragma instructs the compiler that the estimate of the number of iterations of the loop (the loop trip count estimate) is n. This is not an assertion that the loop effectively iterates n times.

A number of optimizations are affected by the #pragma looptrip (n), when the compiler has not already determined the exact trip count:

• basic block frequency estimation uses this information as an approximation of the loop trip count
• unrolling and cross-iteration optimizations are reduced if the given loop trip count estimate is low
• software pipelining is limited if the estimate is low
• automatic data prefetch generation is limited if the estimate is low

One usage scenario is for ‘for’ loops with trip counts of unknown values where the user knows that the approximate effective value is low:

#pragma looptrip(4)
for (i=0; i<n; i++) a[i] = b[i] ;

This example avoids non-beneficial optimizations. On such loops the compiler trip count estimate without the pragma is 100.


A second scenario is for ‘while’ loops where the user knows that the approximate effective trip count is high:

#pragma looptrip(100)
while (*p++=*s++);

This example gives a better approximation of the weight of the loop. Generally the compiler trip count estimate for a while loop is very low.

Possible error messages are:

• Warning : pragma ‘LOOPTRIP’ : inconsistent with computed value, ignored
• Warning : pragma ‘LOOPTRIP’ : not followed by a loop, ignored
• Warning : malformed ‘#pragma looptrip (n)’

#pragma hwloop none

#pragma hwloop forcehwloop <loopid>

#pragma hwloop forcejrgtudec

The hwloop pragmas allow fine control of the special looping mechanisms available on the STxP70 processor. They must all be placed before the loop statement. They respectively allow:

hwloop none
Block the mapping of both hardware loops and JRGTUDEC special instructions.

hwloop forcehwloop <loopid>
Force a given loop to make use of a hardware loop. Note that the mapping is performed by the compiler only if it is legal to do so. The loopid argument is optional. It allows the user to force the use of either of the two hardware loop registers; the possible values are 0 and 1. The main interest is to force the use of the saved loop register L0 when a call is present in the loop body but the callee is known to have no side effect on the hardware loop registers (that is, the callee is hardware loop free), thus avoiding the need to save and restore the loop register. It is the user's responsibility to ensure that using the specified register is legal.

hwloop forcejrgtudec
Force a given loop to make use of the JRGTUDEC special instruction.

The hardware loop pragmas must be placed before the loop statement:

#pragma hwloop forcejrgtudec
for(i=0; i<n; i++) {
    a[i] = ...;
}


3.2.8

The hardware loop register of the STxP70 core that holds the tripcount is named LC and has a 32-bit range. The zero value, however, is not legal from a hardware standpoint. Furthermore, no special instruction is available to indicate that the hardware loop must be skipped. Therefore, if the value used to set the LC register can be less than or equal to zero, a guard is needed.

Use the loopmin pragma to instruct the compiler that the loop tripcount is at least minc. If minc is 1 or more, then the compiler is allowed to remove the guard that is needed otherwise. This saves both cycles and bytes because the comparison and branching instructions are removed.

The loopmin and loopminitercount syntaxes are equivalent. The second one is for legacy code that formerly used the sxcc compiler.

Use this pragma as follows:

#pragma loopmin (1) // loopminitercount can be used as well
for(i=0; i<n; i++) {
    a[i] = ...;
}
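The guard discussed for loopmin can be modelled in plain C. This is only a sketch under stated assumptions: fill_hwloop is a hypothetical stand-in for a hardware loop (it always runs its body at least once because a zero counter is not legal), and the two callers show the code shape with and without the guard:

```c
#include <assert.h>

/* Model of a hardware loop: the body runs exactly lc times, and lc == 0 is
   not a legal counter value, so the loop always executes at least once. */
void fill_hwloop(int *a, unsigned lc)
{
    unsigned i = 0;
    assert(lc != 0);
    do {
        a[i] = 1;
        i++;
    } while (--lc != 0);
}

/* Without loopmin: a compare-and-branch guard protects the hardware loop. */
void fill_guarded(int *a, int n)
{
    if (n > 0)
        fill_hwloop(a, (unsigned)n);
}

/* With #pragma loopmin (1): the guard is dropped, so the caller must
   guarantee that n >= 1. */
void fill_unguarded(int *a, int n)
{
    fill_hwloop(a, (unsigned)n);
}
```

Calling fill_unguarded with n <= 0 would violate the assertion made by the pragma, which is why the property must hold in all cases.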

#pragma loopmax<itercount> (maxc)

Use the loopmax pragma to instruct the compiler that a loop tripcount is at most maxc. This pragma is not generally useful on an STxP70 core. In a few cases, it is useful as a workaround for hardware problems that appear when the actual tripcount exceeds a given range (for instance, a 16-bit integer).

Use this pragma as follows:

#pragma loopmaxitercount (1)
for(i=0; i<n; i++) {
    a[i] = ...;
}

#pragma loopseq READ

#pragma loopseq WRITE

This pragma instructs the compiler that the memory READ accesses (or respectively the memory WRITE accesses) should be sequenced as they appear in the loop. This is not an assertion that the accesses must be kept in sequence. For instance, it is not a replacement for volatile accesses, where it is mandatory to keep them in order.

The effect of this pragma is that the scheduler serializes all load and prefetch operations (or respectively all stores) in the loop. Therefore the memory read (or write) accesses, as written in the C code, are kept in order, as long as no aggressive transformation occurs in the loop.


The following scenario can occur when the user wants to keep memory writes in order, so as to take advantage of a combining write buffer:

#pragma loopseq WRITE
for(i=0; i<n; i++) {
    a[i] = ...;
    a[i+1] = ... ;
    a[i+2] = ... ;
    a[i+4] = ... ;
}

The pragma hints that the compiler should keep writes to the array in order. If the loop is unrolled, generating a large number of stores, this improves locality and may take advantage of combining write buffers. By default the compiler does not put restrictions on the ordering of non-overlapping store operations.

A second scenario is when the user has scheduled prefetch and load operations by hand, and wants to ensure that the compiler does not reorder them.

#pragma loopseq READ
for(i=0; i<n; i+=S) {
    ... = a[i] ;
    __builtin_prefetch(&a[i+S]) ;
}

The pragma hints that the compiler should keep the load and prefetch in order. In this example, the prefetch is not placed before it is effectively used in the next iteration by the load.
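A complete version of the READ scenario might look like the sketch below. The function name, the stride value S and the bounds check before the prefetch are assumptions made for illustration; __builtin_prefetch is the GNU built-in used in the fragment above, and a compiler that does not know the loopseq pragma ignores it:

```c
#include <assert.h>

enum { S = 4 };   /* hypothetical stride, in elements */

/* Sums every S-th element of a, prefetching the element that the next
   iteration will load. */
long strided_sum(const int *a, int n)
{
    long sum = 0;
    int i;
#pragma loopseq READ
    for (i = 0; i < n; i += S) {
        sum += a[i];
        if (i + S < n)                      /* stay inside the array */
            __builtin_prefetch(&a[i + S]);
    }
    return sum;
}
```

The prefetch is only a hint and does not change the result; the pragma is what asks the scheduler to keep the load and the prefetch in the written order.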

#pragma frequency_hint

This pragma allows the user to specify information about the execution frequency for certain regions of code with the following frequency specifications:

NEVER

This region of code is never or rarely executed. The compiler might move this region of the code away from the normal path. This movement might either be at the end of the procedure or at some point to an entirely separate section.

FREQUENT

This region of code is frequently executed. The compiler might try to put this region in the fall through path.

Example:

if (debug) {
#pragma frequency_hint NEVER
    trace();
}


3.3.1 #pragma ident “string”

Adds a .comment section in an assembly file.

Marks a symbol as weak.

This pragma instructs the link editor not to issue a warning if it does not find a defining declaration of the specified weak symbol, in which case the symbol is set to 0. It also allows the current definition to be overridden by a non-weak definition. See Figure 4.

Figure 4. #pragma weak example

#pragma weak opt_handler
extern void opt_handler (void);
int main(int argc, char *argv[])
{
    /* If opt_handler has not been defined, the linker does not
       complain and the condition is false.*/
    /* If opt_handler has been defined, the opt_handler is
       invoked.*/
    if (opt_handler) opt_handler();
}

This pragma can be used only when the MPx extension is used. It disables native code generation for all extensions. Refer to



This pragma can be used only when the MPx extension is used. It forces native code generation for all extensions. Refer to



This pragma can be used only when the MPx extension is used. It disables native code generation for the specified extension. Refer to


for further details. The typical use will be:

#pragma disable_specific_extgen ( MP1x, fct1, fct2).


This pragma can be used only when the MPx extension is used. It forces native code generation for the specified extension. The typical use will be:

#pragma force_specific_extgen ( MP1x, fct1, fct2)

Refer to



4 Optimization guide

This chapter describes specific compiler options and techniques that can be used to gain maximum performance in your application.

4.1 Inlining

Inline function expansion is performed for function calls that the compiler estimates to be frequently executed. These estimations are based on a set of heuristics. The compiler might decide to replace the instructions of the call with code for the function itself (inline the call).

The current version of the compiler only supports the single file inlining mode as described in Section 4.1.1. The compiler supports both the single file inlining mode as described in Section 5.2.1: Placement and layout on page 101 and cross file inlining through the IPA optimization described in


.

The purpose of this section is to make users aware of the underlying algorithms used to select functions to inline. First, it describes how possible candidates are selected for inlining, and how the selection is finalized, taking size conditions into account. Then, user-level compiler switches are listed, to show how the inlining process can be controlled.

The inlining decisions of the compiler can be observed with the -INLINE:list option. We recommend that this option should be used when tuning inlining decisions. The exact scope and syntax of the -INLINE option are described throughout this section.

There are two kinds of candidates for inlining: may-inline and must-inline functions.

May-inline functions are selected by the compiler according to the following conditions:

• the function is declared with the inline C keyword
• functions not declared inline are may-inline candidates only if the -INLINE:only_inline=off option is specified. In this case, a function is a may-inline candidate if:
  – it is declared with the static C keyword
  – its name is not weak
  – its address is neither passed nor saved

Must-inline functions are specified by the user, through the command line option:

-INLINE:must=fn1,fn2,...

May-inline and must-inline functions are then checked against several criteria to decide whether to inline them or not.


Inlining criteria

Each candidate function is checked against inlining-exclusion cases, which include:

• the user requires no inlining (-INLINE:never=fn, -INLINE:off command line options)
• recursive function
• vararg function
• exception handler

After this preliminary test, each candidate function is inlined regardless of cost if it is marked must-inline, or if the -INLINE:all option has been specified by the user.

Otherwise, cost evaluation is used to decide whether to inline or not, and the candidate function is rejected if its estimated cost is above a given threshold set by the compiler. The -INLINE:list=on option can be used to list what is inlined. Changing the compiler limits is not recommended, since this can lead to longer compilation times or increased memory usage or both, with no noticeable performance benefit.

Finally:

• the function to be inlined must be defined and visible in the same source file as the function using it
• a static function that is inlined can, in specific circumstances, be considered “dead” and removed from the final object file (b)

Table 21 specifies the options to control the stand-alone inlining.

More than one sub-option can be specified to the -INLINE: option, either by using colons to separate each sub-option or by specifying multiple options on the command line. Some -INLINE: options are specified with a setting that either enables or disables the feature. To disable a feature, specify the sub-option with either =OFF, =FALSE or =0 (all these strings are case insensitive, for example -INLINE:list=OFF). To enable a feature, either use the option name alone (for example -INLINE:list) or use any other string on the right of the “=” sign (as in -INLINE:list=all). It is generally recommended to use =ON, =TRUE or =1 for the sake of clarity (for example -INLINE:list=ON).
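The enable/disable rule described above can be summarized as a small decision function. This is a sketch of the rule as stated in the text, not the driver's actual option parser; both function names are hypothetical:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Case-insensitive string equality. */
int equals_ignore_case(const char *s, const char *t)
{
    while (*s != '\0' && *t != '\0') {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*t))
            return 0;
        s++;
        t++;
    }
    return *s == '\0' && *t == '\0';
}

/* Returns 1 if a sub-option such as "list", "list=ON" or "list=all"
   enables the feature, and 0 if it disables it ("list=OFF", "list=false",
   "list=0"). */
int suboption_enabled(const char *subopt)
{
    const char *value = strchr(subopt, '=');
    if (value == NULL)
        return 1;                 /* option name alone enables the feature */
    value++;                      /* skip the '=' sign */
    if (equals_ignore_case(value, "OFF") ||
        equals_ignore_case(value, "FALSE") ||
        equals_ignore_case(value, "0"))
        return 0;
    return 1;                     /* any other string enables the feature */
}
```

For instance, suboption_enabled("list=all") returns 1, which matches the statement that any string other than OFF, FALSE or 0 enables the feature.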

Table 21. Standalone inlining options

Option                         Description
-inline                        Enable inlining on inline functions. This is activated by default at optimization levels > 1.
-noinline                      Disable inlining.
-INLINE:(on|off)               Enable/disable inlining. Use of other -INLINE options implicitly sets this to on.
-INLINE:aggressive=(on|off)    Inline even non-leaf, out-of-loop calls. Default is off.

b. Note that this dead code removal was not performed in earlier versions of the stxp70cc compiler (that is, the compiler provided in toolset 3.1.0 and earlier). With those versions, inlining usually causes an increase in size, because the original (not inlined) instance is preserved in the final executable code, even if it is never called.


Table 21. Standalone inlining options (continued)

Option                            Description
-INLINE:all                       Forces may-inline functions to be inlined, bypassing cost evaluation. This option conflicts with -INLINE:off, and takes precedence if both are specified. Default is off.
-INLINE:all_inline                Inline all functions marked by the C language inline keyword.
-INLINE:dfe                       Allow dead function elimination. Default is on.
-INLINE:list=(on|off)             List compiler actions. Default is off.
-INLINE:must=name1[,name2...]     Always attempt to inline the named subroutines in addition to the default heuristic.
-INLINE:never=name1[,name2...]    Never attempt to inline the named subroutines.
-INLINE:only_inline=(on|off)      Default is on. Inline only functions marked by the C language inline keyword. The -INLINE:only_inline=off option is mandatory to allow inlining of non-inline functions.
-INLINE:size_static=(on|off)      Set to on, this option limits the inlining of static functions. Set to off, this option allows more aggressive inlining of static functions. See Inlining static functions. When code is optimized for size (-Os) and for optimization levels -O0, -O1 and -O2, the default is on; when code is optimized for speed (-O3, -O4), the default is off.
-INLINE:specfile=filename         Specifies a filename containing inlining options. Default is none.
-INLINE:static=(on|off)           Default is off. Allow static functions to be candidates for inlining.

In addition to these options, the option given in Table 22 may be of interest when building a large body of inline functions (which is not recommended and may adversely affect performance).

Table 22. Option changing inlining behavior

Option                Description
-OPT:0limit=[0..n]    Functions larger than size n are not optimized. Default is 3000. Specifying 0 removes any limit but may lead to a very long compile time.

Inlining static functions

When the option -INLINE:size_static=on is set, the compiler assesses the total size increase that would result from the inlining of all the calls to the static callee function in the current caller. If this increase is above a given threshold, none of the calls to this callee function in the current caller are inlined.

When the option -INLINE:size_static=off is set, the compiler assesses the size increase that would result from the inlining of the calls to the static callee function incrementally. The first calls to the callee are inlined until the size increase becomes greater than the threshold. Inlining any further calls is suspended when the size increase becomes greater than the threshold.

4.1.3

Extern inline functions

If both inline and extern are specified in a function definition, then the definition is used only for inlining. The function is never compiled on its own, not even if its address is referred to explicitly. The address becomes an external reference, as if the function had only been declared but not defined.

This combination of inline and extern has almost the same effect as a macro. The way to use it is to put a function definition in a header file with these keywords, and put another copy of the definition (lacking inline and extern) in a library file. The definition in the header file will cause most calls to the function to be inlined. If any instances of the function remain, they will refer to the single copy in the library.

The inlining process can be controlled within the C source code using #pragmas.

The stxp70cc compiler already supports several command-line options to configure its inlining behavior, but these are not flexible enough. For instance, with the option -INLINE:never=foo the user can disable the inlining of foo everywhere it is called; conversely, with -INLINE:must=foo the user can force inlining of foo everywhere.

The user has the ability to force inlining or non-inlining at call sites through the use of pragmas. In addition, the noinline and always_inline attributes can be used at function declaration.

Pragmas

To force inlining or non-inlining of a function in the scope of a call site, the following two pragmas are introduced:

• #pragma inline_next (foo,...) forces inlining of function foo in the next statement
• #pragma noinline_next (foo,...) prevents inlining of function foo in the next statement

The ... denotes that it is possible to provide several function names with the same pragma. It is equivalent to several pragma lines.

Two similar pragmas are provided that can be used within the scope of a function:

• #pragma inline_function (foo,...) forces inlining of function foo every time it is called until the end of the current function
• #pragma noinline_function (foo,...) prevents inlining of function foo every time it is called until the end of the current function

The two call site scope pragmas take precedence over these two function scope pragmas.

Two lower priority pragmas are provided, with file scope:

• #pragma inline_file (foo,...) forces inlining of function foo every time it is called until the end of the current source file
• #pragma noinline_file (foo,...) prevents inlining of function foo every time it is called until the end of the current source file


Finally, to revert inlining policy to the default one (that is, rely on the inliner’s evaluation of callee weight), the following pragma is introduced:

#pragma defaultinline (foo,...)

Function naming

As a special case, if the user does not provide any function name, the corresponding pragma applies to all functions called in the scope of the pragma. In this case, parentheses around the function names are optional.

User diagnostics

Several warning messages are provided to the user to help track errors.

If two conflicting pragmas are provided, only the later one is taken into account. For instance:

#pragma inline_next (foo)
#pragma noinline_next (foo)
foo();

This generates the following warning:

warning: #pragma noinline_next (foo) overrides previous #pragma inline_next (foo)

If pragmas are provided at an invalid scope (that is, outside of a function), the following message is displayed:

warning: #pragma noinline_function (foo) ignored (incorrect scope)

To help track misspelling, a warning is also displayed if a pragma could not be applied to any function call.

#pragma noinline_next (bar)
foo(i);

This generates the following warning:

warning: #pragma noinline_next (bar) matched no call

noinline and always_inline attributes

To enable the user to inhibit inlining of one function wherever it is called, the noinline attribute is introduced, and is used at the function declaration level.

Conversely, to enable the user to force inlining of one function wherever it is called, the always_inline attribute is introduced.
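As an illustration, the two attributes can be written as in the sketch below. The noinline spelling is the one used elsewhere in this manual; the always_inline spelling follows the usual GNU C syntax and is assumed to be accepted in the same form. The function names are hypothetical:

```c
#include <assert.h>

/* Never inlined, whatever the inliner heuristics decide. */
__attribute__((noinline)) int add_slow(int a, int b)
{
    return a + b;
}

/* Inlined at every call site. In GNU C the attribute is normally combined
   with the inline keyword. */
static inline __attribute__((always_inline)) int add_fast(int a, int b)
{
    return a + b;
}

/* Calls both functions: add_slow stays a real call, add_fast is expanded
   in place. The computed value is the same either way. */
int use_both(int x)
{
    return add_slow(x, 1) + add_fast(x, 2);
}
```

The attributes change only how the calls are compiled, not the result of the program.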

Precedence

Command-line options -INLINE:must=foo and -INLINE:never=foo take precedence over both pragmas and attributes.

Attributes take precedence over pragmas. That is, a function declared with __attribute__((noinline)) is never inlined, regardless of pragma inline_xxx statements. However, the user can override this behavior with the -INLINE:must=foo command-line option.

If several contradictory pragmas with the same scope apply to the same function, the last one overrides the earlier ones.



Examples

Example one (Figure 5) illustrates the use of the #pragma noinline_next directive. All calls to f1() are candidates for inlining, except the one directly following #pragma noinline_next.

Figure 5. #pragma noinline_next example

int ig = 0;
inline void f1(int i) {ig += i;}
void main()
{
    f1(1); // f1 is candidate for inlining
#pragma noinline_next (f1)
    f1(2); // f1 is not marked for inlining
    f1(3); // f1 is candidate for inlining
    printf("result is %d\n", ig);
}

Example two (Figure 6) illustrates the use of the #pragma inline_function directive. All calls to f1() following the #pragma inline_function (f1) directive are forced to be inlined, except the one directly following #pragma noinline_next (f1). The call to f2() following the #pragma inline_next (f2) is also forced to be inlined, while the first call to f2() is only a candidate for inlining (inlining depends on the respective weights of f2() and its caller).

Figure 6. #pragma inline_function example

int ig = 0;
int jg = 0;
inline void f1(int i) {ig += i ;}
inline void f2(int i) {jg += i ;}
void main()
{
#pragma inline_function (f1)
    f1(1); // f1 is forced to be inlined
    f2(1); // f2 is candidate for inlining

#pragma inline_next (f2)

    printf("result is %d %d\n", ig, jg);
}


Example three (Figure 7) illustrates the use of the #pragma defaultinline directive.

Figure 7. #pragma defaultinline example

int ig = 0;
int jg = 0;
inline void f1(int i) {ig += i ;}
inline void f2(int i) {jg += i ;}
void main()
{
#pragma noinline_function (f1)

#pragma inline_next (f1)

#pragma defaultinline (f1)

    printf("result is %d %d\n", ig, jg );
}

Example four (Figure 8) illustrates the use of several function names or an empty name list with #pragma directives.

Figure 8. Empty or multiple function name example

#pragma noinline_file ()
int f(int i) { return i+1; }
int g(int i) {
#pragma inline_next (f,g)
    // ignored
    j += f(i) + f(i); // f is not marked for inlining
}
int h(int i) {
#pragma noinline_next ()
    int j=i + f(i) + g(i); // f and g are not marked for inlining
#pragma inline_next (f,g)
    j+=i + f(i) + g(i); // f and g are forced to be inlined
}
void main()
{
    // g and h are not marked for inlining
    printf("result is %d %d\n", g(0), h(0));
}


Example five (Figure 9) illustrates the use of the noinline attribute and shows how the attribute has precedence over #pragma.

Figure 9. noinline attribute example

#pragma inline_file(f3)
int ig = 0;
void __attribute__ ((noinline)) f3(int i) { ig += i ; }
int main()
{

#pragma inline_next(f3)

#pragma defaultinline (f3)

    printf("result is %d\n", ig);
}


This section describes how the stxp70cc compiler implements loop unrolling.

4.2.1 Default unrolling policy

The way loops are unrolled depends on the optimization level and on the version and configuration of the core (single or dual ALU/dual issue, and the number of general purpose registers (GPRs)).

Two main parameters are controlled:

• the maximum unrolling factor to be applied
• the maximum size of the loop after unrolling (this size corresponds to the number of instructions in the internal representation rendered by the compiler when unrolling is applied)

The exact parameters used to control unrolling are listed in Table 23.

Table 23. Loop unrolling parameters

Optimization level    Core                               Maximum unrolling factor    Maximum unroll size
-O0, -O1, -Os         All                                No unrolling                No unrolling
-O2                   All                                2                           32
-O3                   All                                2                           64
-O4                   STxP70-3, STxP70-4-single issue    2                           64
-O4                   STxP70-4-dual issue, 16 GPRs       4                           64
-O4                   STxP70-4-dual issue, 32 GPRs       4                           128

Note: 1 Depending on the internal analysis, the compiler is free to apply an actual unrolling factor which is smaller than the maximum specified for the optimization level and core. This is especially the case if a smaller unrolling factor enables the compiler to avoid the generation of a remainder loop.

2 The #pragma unroll directive takes precedence over the default behavior of the loop unroller.


4.2.2 Advanced control of the unroller


The following facilities are provided to fine tune loop unrolling:

• the loop unroll pragma #pragma unroll
  This pragma can be used to apply a precise unrolling factor to a given loop. This pragma is described in


.

• the stxp70cc -maggressive_unroll=n option
  This option sets the aggressiveness of the unroller. It takes an integer in the range [0, 6] as an argument and applies the unrolling parameters described in Table 24.

Table 24. -maggressive_unroll option: values of n

Level    Maximum unroll factor    Maximum unroll size
0        No effect                No effect
1        2                        64
2        2                        128
3        4                        64
4        4                        128
5        8                        64
6        8                        128

The precedence order is as follows:

• #pragma unroll takes precedence over both the default unroller behavior and the -maggressive_unroll option
• the -maggressive_unroll option takes precedence over the default unroller behavior

4.2.4 Built-in assume and pragma loopmod

The built-in __builtin_assume can be used to instruct the compiler that the loop count is a multiple of a given integer. This allows the compiler to apply an unrolling factor which does not cause the generation of a remainder loop. This saves code size and often gives more efficient final code.

The following code provides an example where the loop count is stated to be a multiple of 4:

__builtin_assume((lcount&3)==0);

for(i=0; i<lcount; i++) {

*dest=*src;

dest++; src++;

}

The built-in can be easier to integrate in the code than the #pragma loopmod described in Section 3.2.4: #pragma loopmod on page 48 and may be more precise.
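When the same source also has to build with a host compiler, the built-in can be wrapped in a macro. The sketch below is an assumption-laden illustration: LOOP_ASSUME and copy_words are hypothetical names, and the fallback simply drops the hint on compilers that do not provide __builtin_assume:

```c
#include <assert.h>

/* Hypothetical wrapper: expands to __builtin_assume when the compiler
   reports that it is available, and to nothing otherwise. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_assume)
#    define LOOP_ASSUME(cond) __builtin_assume(cond)
#  endif
#endif
#ifndef LOOP_ASSUME
#  define LOOP_ASSUME(cond) ((void)0)
#endif

/* Copies lcount words; lcount must be a non-negative multiple of 4. */
void copy_words(int *dest, const int *src, int lcount)
{
    int i;
    assert((lcount & 3) == 0);       /* run-time check of the property */
    LOOP_ASSUME((lcount & 3) == 0);  /* the same property, as a hint   */
    for (i = 0; i < lcount; i++) {
        *dest = *src;
        dest++; src++;
    }
}
```

Keeping an assert next to the hint is a cheap way to catch callers that break the stated property during testing.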

4.3 Memory dependences in C programs

Precise analysis of memory dependences is key to compilation optimization, since it enables the compiler to more freely schedule load instructions above store instructions. By default, a C compiler assumes that any pair of memory accesses that reference distinct types are not aliased (that is, memory dependent). However, real world cases almost always involve pointers to the same types that are actually un-aliased: the compiler cannot generally deduce this property and must rely on additional information. This effect can be achieved either through the C language restrict keyword, or with the compiler option:

-OPT:alias=value

where possible values are listed in Table 25.

Table 25. Possible values of the -OPT:alias option

Value       Description
any         The default. Any pair of memory accesses may be aliased.
typed       Any pair of memory accesses that reference distinct types are not aliased.
unnamed     Assume pointers never point to global objects.
restrict    Assume that different pointers never point to the same area.
disjoint    Assume multiple pointer indirections never overlap.

Although the compiler is able to compute precise memory dependences in many cases, this is not possible when complex memory accesses are involved, such as in the following example:

for (i = 1; i < n; i ++) {
    a[i-1] = a[i] + b[i];
}
for (i = 1; i < n; i ++) {
    c[d[i]] = c[i] + 1;
}

On the first loop, the compiler can fully determine the dependences between memory accesses, provided that it knows that a and b point to distinct memory locations (see the C language restrict qualifier). On the second loop, however, without information on values in d, the compiler assumes that all memory accesses in the loop are dependent. In particular, the sequence of load and store memory accesses in the iterations of the loop must be strictly respected, resulting in a poor instruction schedule if the loop is unrolled or software pipelined.
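For the first loop, the missing information can be supplied with the restrict qualifier, as in this minimal sketch. The function name is hypothetical, and the __restrict spelling is the one used in the copychar example of this manual:

```c
#include <assert.h>

/* The qualifiers promise that a and b never point to overlapping memory,
   so the only dependences left are those between accesses to a itself,
   which the compiler can analyze exactly. */
void shift_add(int *__restrict a, const int *__restrict b, int n)
{
    int i;
    for (i = 1; i < n; i++) {
        a[i - 1] = a[i] + b[i];
    }
}
```

Calling shift_add with overlapping arrays would break the promise made by the qualifiers, so the caller is responsible for keeping a and b distinct.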

A useful property for loop optimizations is when a loop is vectorizable. This property can be enforced on a loop by using the #pragma loopdep VECTOR. A vectorizable loop is such that it can be decomposed into a sequence of loops, one per statement of the original loop, without changing the program results. Moreover, for each loop resulting from that decomposition (that contains only one statement), all load memory accesses can be performed before all store memory accesses, which means that a vector version of the loop can be written. In practice, unless the target processor is a real vector processor, the compiler does not decompose vectorizable loops as described. Rather, it uses the vectorizable property of the original loop to remove dependences between memory accesses.

In the example above, the first loop is vectorizable, provided that a and b do not overlap. The second loop is also vectorizable if the assertion (d[i]<=i) holds for all i.

Another useful property for loop optimizations is when a loop can be parallelized. This property can be enforced on a loop by using the #pragma loopdep PARALLEL. A parallelized loop is one where memory accesses that reference a given memory location may occur only in the same iteration of the loop. As a result, the sequence of memory accesses of the original loop can be changed in any way that preserves the relative order of memory accesses originating from the same loop iteration. Note that a parallelized loop is always vectorizable, so the #pragma loopdep PARALLEL is stronger (but less generally applicable) than the #pragma loopdep VECTOR.

In the example above, the first loop cannot be parallelized. The second loop can be parallelized if the assertion (d[i]==i) holds for all values of i.

The last useful property for loop optimizations is when a loop is liberal. This property can be enforced on a loop by using the #pragma loopdep LIBERAL. A liberal loop is one where all its memory accesses reference unique memory locations. As a result, all the memory accesses in the loop can be freely reordered. Note that a liberal loop can always be parallelized, so the #pragma loopdep LIBERAL is stronger (but less generally applicable) than the #pragma loopdep PARALLEL.

In the example above, the second loop is liberal if the assertion: (d[i]<1 || d[i]>=n) holds for all i. (For clarity, we omitted this case for the VECTOR and PARALLEL pragmas.)
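As a minimal sketch of how such an assertion can be expressed in source code, the function below marks a loop with #pragma loopdep PARALLEL. It assumes that the caller guarantees d[i]==i for all i and that the arrays a, b and d do not overlap, so each element of a is accessed in one iteration only. The function name, the array names and the placement of the pragma immediately before the for statement are assumptions made for this illustration; a compiler that does not know the pragma simply ignores it.

```c
#include <stddef.h>

/* Hypothetical example: update a through an index array d.
 * Assumption: d[i] == i for all i, and a, b, d do not overlap, so a
 * given memory location is only accessed within a single iteration. */
void scatter_add(int *a, const int *b, const int *d, size_t n)
{
    size_t i;

    /* Assumption: the pragma applies to the loop that follows it. */
#pragma loopdep PARALLEL
    for (i = 0; i < n; i++) {
        a[d[i]] = a[i] + b[i];
    }
}
```

If the guarantee on d does not hold, the pragma makes a false promise to the compiler and the generated schedule may no longer preserve the program results.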

The restrict qualifier, which applies to pointers or arrays in a C program, is also highly useful for removing dependences between memory accesses inside and outside loops. The restrict property states that two memory accesses originating from different pointers or arrays cannot reference the same memory location when at least one of the pointers or arrays has the restrict qualifier. Note that all memory accesses based on a given restrict pointer or array are still assumed to be dependent, unless it is obvious to the compiler that they are not, or a #pragma loopdep on the loop applies to these dependences.
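The following sketch shows the restrict qualifier on two pointer parameters. The function and parameter names are hypothetical, and the code assumes a C99-capable compilation mode. By declaring both pointers restrict, the caller promises that the two arrays do not overlap, which lets the compiler treat the loads from src as independent of the stores to dst.

```c
#include <stddef.h>

/* dst and src are restrict-qualified: the caller guarantees that the
 * two arrays do not overlap, so no store to dst can change a value
 * that is later loaded from src. */
void add_offset(int *restrict dst, const int *restrict src,
                int offset, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        dst[i] = src[i] + offset;
    }
}
```

Calling add_offset() with overlapping arrays breaks the promise made by restrict and results in undefined behavior.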

4.4 Aliasing rules in C/C++ programs

The -fstrict-aliasing option is enabled by default and allows the compiler to assume the strictest aliasing rules applicable to the language being compiled (the aliasing rules are stated in clause 6.5 (7) of the ISO/IEC Standard (Expressions)).

For C and C++, this activates optimizations based on the type of expressions. In particular, an object of one type is assumed never to reside at the same address as an object of a different type, unless the types are almost the same. For example, an unsigned int can alias an int, but not a void* or a double. A character type may alias any other type.

The type attribute may_alias is also available so that accesses to objects with types with this attribute are not subject to type-based alias analysis. Instead they are assumed to be able to alias any other type of object.
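A minimal sketch of the may_alias attribute is given below, using the GCC attribute syntax. The typedef and function names are hypothetical, and the code assumes a 32-bit int and an IEEE754 single precision float. Because the write goes through a type carrying the attribute, it is not subject to type-based alias analysis.

```c
/* Assumption: int is 32 bits wide and float is IEEE754 single precision. */
typedef int __attribute__((__may_alias__)) aliasing_int;

/* Store a bit pattern into a float through a may_alias integer type. */
float set_float_bits(int bits)
{
    float a = 0.0f;
    aliasing_int *pa = (aliasing_int *)&a;

    *pa = bits; /* allowed: aliasing_int may alias any other type */
    return a;
}
```

With these assumptions, set_float_bits(0x40000000) returns 2.0f whether or not -fstrict-aliasing is in effect.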

The -fno-strict-aliasing option can be used to disable the default option if required.

Particular attention is required before reporting a compiler issue related to aliasing, specifically when code runs correctly with the -fno-strict-aliasing option, but diverges when the default aliasing option is used. This is often caused by a violation of aliasing rules, which are part of the ISO C/C++ standard. These rules say that a program is invalid if you try to access a variable through a pointer of an incompatible type.

The example shown in Figure 10 demonstrates this violation, where a float is accessed through a pointer to integer.

Figure 10. Aliasing example, using a cast

#include <stdio.h>

int main(int argc, char *argv[])
{
    float a = 0.0f;
    int *pa = (int *)&a;

    *pa = 0x40000000; /* violation of aliasing rules */
    if (a != 0.0f)
        puts("LEGACY BEHAVIOR");
    else
        puts("STRICT ALIASING BEHAVIOR");
    return 0;
}

The aliasing rules were designed to allow compilers to perform more aggressive optimization. Basically, a compiler can assume that all changes to variables happen through pointers or references to variables of a type compatible with the accessed variable. Dereferencing a pointer that violates the aliasing rules results in undefined behavior.

In the case above, the compiler may assume that no access through an integer pointer can change the float a. Therefore, the actual value of a may be unaffected by the write through pa. What really happens is up to the compiler and may change with architecture and optimization level.

To disable optimizations based on alias analysis for ‘faulty legacy code’, the option -fno-strict-aliasing must be used as a work-around.

Because the practice of reading from a union member other than the one most recently written to (called “type-punning”) is common, type-punning is allowed even with -fstrict-aliasing, provided the memory is accessed through the union type.

To fix the code in Figure 10 above, you can use a union instead of a cast, as shown in Figure 11. This is a GCC extension which might not work with other compilers.


Figure 11. Aliasing example, using a union

#include <stdio.h>

/*
   According to GNU documentation, this code should work in
   both strict and non-strict aliasing rules
*/

int main(int argc, char *argv[])
{
    union {
        float f;
        int i;
    } u;

    u.f = 0.0f;
    u.i = 0x40000000; /* is 2.0f */
    if (u.f != 2.0f)
        puts("NON-GNU BEHAVIOR");
    else
        puts("GNU ALIASING BEHAVIOR");
    return 0;
}

Now the result is always GNU ALIASING BEHAVIOR.

Finally, to fully respect the ANSI C/C++ aliasing rules, it is necessary to write the data through a character type before reading it again. See Figure 12. The drawback of this standard-conforming solution is that it has to account for endianness, and that it is less efficient than simply writing through an integer.

Figure 12. Aliasing example, writing through a character type

#include <stdio.h>

/*
   According to ANSI standard, this code should work in
   both strict and non-strict aliasing rules
*/

#define EXTRACTBYTE(val, pos) (((val) >> ((pos) * 8)) & 0xff)

int main(int argc, char *argv[])
{
    union {
        float f;
        char c[4];
    } u;
    const unsigned int twoasint = 0x40000000;

    u.f = 0.0f;
#if defined(__BIG_ENDIAN__)
    /* most significant byte first */
    u.c[0] = EXTRACTBYTE(twoasint, 3);
    u.c[1] = EXTRACTBYTE(twoasint, 2);
    u.c[2] = EXTRACTBYTE(twoasint, 1);
    u.c[3] = EXTRACTBYTE(twoasint, 0);
#elif defined(__LITTLE_ENDIAN__)
    /* least significant byte first */
    u.c[0] = EXTRACTBYTE(twoasint, 0);
    u.c[1] = EXTRACTBYTE(twoasint, 1);
    u.c[2] = EXTRACTBYTE(twoasint, 2);
    u.c[3] = EXTRACTBYTE(twoasint, 3);
#else
#error "Unknown endianness : please define either __BIG_ENDIAN__ or __LITTLE_ENDIAN__"
#endif
    if (u.f != 2.0f)
        puts("UNEXPECTED BEHAVIOR");
    else
        puts("ANSI ALIASING BEHAVIOR");
    return 0;
}

In this case, the program always prints “ANSI ALIASING BEHAVIOR” regardless of the compiler and its optimization options.



4.5 Profiling

Before optimizing any application, we recommend that you analyze the critical areas of your code to identify where optimization will have the most effect.

Profiling creates an instrumented program from your source code. Whenever this instrumented code is executed, the program generates an information file that can be displayed using the stxp70-gprof utility, supplied with the toolset.

Warning: The functions in the toolset libraries (in particular, the standard C library) are not instrumented for intrusive profiling. Therefore, the time and cycles spent in the library functions are assigned to the caller functions in the application.

This section is not a complete guide to profiling, but a brief refresher on how to proceed with the compiler.

4.5.1 Profiling data generation

Profiling is enabled by the -pg compiler option. For example:

stxp70cc -O2 -pg *.c -o myexe

4.5.2 Using profiling data

The first run of a program compiled using the -pg option generates a file called gmon.out.000. This file can be viewed with the stxp70-gprof utility.

After each run in the same directory, the numerical suffix of gmon.out.000 is incremented. The profile information for the next run is therefore gmon.out.001, and so on.

Note that a second file named stprof.out.xxx is also created. This file provides timing measurements related to the call tree. The following data are available:

• basic_time: only the time spent within a function
• callcost_time: time spent in the function and its children
• count: number of function calls

The symbolic information available in the profile information can be augmented by using the -g option when compiling the source code.

Users who are familiar with the standard gprof tool may use gprof to read the profiling output file. In this case, it is necessary to pass the option --graph to the tool:

gprof --graph myexe gmon.out.000

4.5.3 Special case of programs that never exit

Usually, profiling data are generated at program exit. Many embedded applications, however, are built as infinite loops and thus never exit. To enable profiling of such applications, the toolset provides a dedicated function named UserProfilingWrite().

When this function is called, it updates the following profiling output files:

• call-graph file gmon.out.xxx
• time profiling file stprof.out.xxx

In those file names, xxx stands for a magic number that is incremented each time this profiling function is called. It is only possible to use the function UserProfilingWrite() if the correct toolset header file gprof.h is included in the source code:

#include <gprof.h>

Warning: We recommend that you call UserProfilingWrite() outside critical or frequently executed loops. It should be called only a few times in a program. Be aware that a call to this function may have side effects on compiler optimizations, and may therefore bias results if placed in critical parts of the code.
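The sketch below shows one possible way to call UserProfilingWrite() occasionally from a long-running processing loop. It is an illustration under stated assumptions, not toolset code: the prototype of UserProfilingWrite() is assumed to take no argument and return nothing, the header is only included when __LIBGPROF_CYCLE_PROFILING is defined, a do-nothing stand-in is used otherwise so that the code also builds without -pg, and run_frames(), process_frame() and dump_period are hypothetical names. The loop is bounded here only to keep the example finite.

```c
#include <assert.h>

#ifdef __LIBGPROF_CYCLE_PROFILING
#include <gprof.h> /* toolset header that provides UserProfilingWrite() */
#else
/* Assumption: stand-in so that the sketch also builds without -pg. */
static void UserProfilingWrite(void) { }
#endif

static volatile unsigned work_done;

/* Hypothetical unit of work performed on each loop iteration. */
static void process_frame(unsigned frame)
{
    work_done += frame;
}

/* Process frames and dump the profiling data every dump_period frames.
 * Returns the number of dumps requested. */
unsigned run_frames(unsigned frames, unsigned dump_period)
{
    unsigned dumps = 0;
    unsigned f;

    assert(dump_period != 0);
    for (f = 1; f <= frames; f++) {
        process_frame(f);
        if (f % dump_period == 0) {
            UserProfilingWrite(); /* kept out of the inner work */
            dumps++;
        }
    }
    return dumps;
}
```

Choosing a large dump_period keeps the number of calls small, in line with the warning above.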

The profiling functions make use of the 64-bit cycle counters of the STxP70 core, and the value of the counter is read each time a function is entered and exited. Therefore, using those counters must be avoided when profiling is enabled. The predefined profiling macro __LIBGPROF_CYCLE_PROFILING (which is automatically defined when the -pg option is set) can be used to protect user-defined instrumentation code based on cycle counters.

The small code sample below illustrates how to use this macro to avoid conflicts between profiling and user instrumentation involving cycle counters:

#ifndef __LIBGPROF_CYCLE_PROFILING
clrcc();
startcc();
#endif


Usually, program instrumentation dedicated to profiling does not require any more heap bytes than specified in the standard link script. However, in some specific applications – in particular when involving a large number of routines – the standard heap size may be too small. If this happens, the following message can appear at application run-time:

ERROR : profiling : cannot malloc profiling stack of XXX bytes: please increase heap!

To overcome this problem, edit the link script file associated with your application and increase the padding of the .heap section. By default, the .heap section contribution line is:

.heap ALIGN(16) PAD(64K) NOINIT : { } > EXTSM

This means that the .heap section base is aligned on an address that is a multiple of 16, is 64 Kbytes in size and is not zero-initialized at startup. Moreover, this section is located in the EXTSM memory region. To increase the padding of this contribution, replace 64K with a larger value, depending on the XXX amount required, as shown in the error message above.

Please note that if you do not specify a link script on your link command-line, the sx_valid.ld file used by default is the one located in the folder:

<Toolset_Root>/arch_v3/stxp70cc/<stxp70cc_version>/lib/ldscript

Copy this file into your application project, modify its content according to statements above and add it to your link command.

The toolset provides several options to generate test coverage data that can be used with the GNU gcov test coverage program. Both the -ftest-coverage and -fprofile-arcs options produce data files that can then be input to gcov. See the Using the GNU Compiler Collection (GCC) manual provided with this product for a description of how to apply code coverage techniques.

Table 26.

-fbranch-probabilities
Re-compile a program that has already been compiled with the -fprofile-arcs option. The -fbranch-probabilities option instructs the compiler to optimize using estimated branch probabilities generated by -fprofile-arcs.

-fcoverage-counter64
Instruct the compiler to use a 64-bit edge counter instead of the default 32-bit counter. Each counter is saved as 64 bits and so the output can still be used with any gcov utility. Use this option if you think a statement is executed more than 2^32 times.

-fprofile-arcs
Instrument the "arcs" of the program flow during compilation. For each function of your program, stxp70cc creates a program flow graph, then finds a spanning tree for the graph. Only arcs that are not on the spanning tree have to be instrumented; the compiler adds code to count the number of times that these arcs are executed.
-fprofile-arcs also makes it possible to estimate branch probabilities, and to calculate basic block execution counts. In general, basic block execution counts alone do not give enough information to estimate all branch probabilities.
When the program exits, -fprofile-arcs saves a list of arcs in the program flow graph to a file called sourcename.gcda. gcov can reconstruct the program flow graph and compute all basic block and arc execution counts from the information in this file.
Use the compiler option -fbranch-probabilities when recompiling to apply further optimizations.

-ftest-coverage
Create a data file for the GNU gcov code coverage utility. The name of the data file begins with the name of your source file: sourcename.gcno. It contains a mapping from basic blocks to line numbers, which gcov uses to associate basic block execution counts with line numbers.

Note: When recompiling, you must use the same code generation and optimization options for both compilations. The only difference allowed is to replace -fprofile-arcs with -fbranch-probabilities.

When running interprocedural analysis, all the sources are merged into a unique file (or several files for large programs). Therefore, the compiler is unable to know which procedure belongs to which .c or .cxx file, and a .c or .cxx file can no longer be matched with a .gcno or .gcda file. The name of the .gcda and .gcno files is the name of the final executable, plus “_”, plus the number of the .s file that IPA has created. Since all the original .c or .cxx filenames are saved in the .gcno file, gcov is able to associate each procedure with a source file.

You will need a copy of gcov with a version number higher than or equal to 3.4.4.



This section describes the use of the options -finstrument-functions and -minstrument-calls.

The -finstrument-functions option provides standard GCC functionality. Using this option generates instrumentation calls for entry and exit to functions.

Just after function entry and just before function exit, the following profiling functions are called with the address of the current function and its call site:

void __cyg_profile_func_enter (void *this_fn, void *call_site);
void __cyg_profile_func_exit (void *this_fn, void *call_site);

The first argument is the address of the start of the current function. This may be looked up specifically in the symbol table.

The second argument is the address of the call site from where the current function was invoked. It corresponds to an address in the range of the caller function addresses that may be found in the symbol table of the executable.

Functions that are inlined by the compiler are not instrumented. To force instrumentation of all functions, use the -fno-inline option to disable inlining.

A function may be given the attribute no_instrument_function, in which case instrumentation is not done for this function. This can be used, for example, for the profiling functions listed above, high-priority interrupt routines, and any functions from which the profiling functions cannot safely be called (perhaps signal handlers, if the profiling routines generate output or allocate memory).

The program must be linked with an object file that implements the two functions above to link correctly.
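A minimal sketch of such an object file is shown below. It implements the two hooks with the prototypes given above and simply tracks the call nesting depth. The counters and the two accessor functions are hypothetical helpers added for illustration; every function in the file carries the no_instrument_function attribute so that the hooks are never instrumented themselves.

```c
/* Nesting depth of instrumented functions currently being executed. */
static int call_depth;
/* Deepest nesting level observed so far. */
static int max_depth;

__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)this_fn;   /* address of the function being entered */
    (void)call_site; /* address from which it was called */
    call_depth++;
    if (call_depth > max_depth) {
        max_depth = call_depth;
    }
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    call_depth--;
}

/* Hypothetical accessors, used to read back the statistics. */
__attribute__((no_instrument_function))
int profile_current_depth(void)
{
    return call_depth;
}

__attribute__((no_instrument_function))
int profile_max_depth(void)
{
    return max_depth;
}
```

A real implementation would typically record this_fn and call_site as well, so that the addresses can be looked up later in the symbol table of the executable.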

Note: The option -minstrument-calls is not a standard GCC option.

Use this option to generate instrumentation calls just before, and just after each function call.



The following profiling function is called with the address of the caller function and the address of the callee function:

void __profile_cal(void *caller_fn, void *callee_fn, const char *caller_name, const char *callee_name, int event);

The arguments to this function are as follows:

caller_fn
This is the address of the start of the current function (the caller function), which can be looked up specifically in the symbol table.

callee_fn
This is the address of the start of the called function (the callee function), which can be looked up specifically in the symbol table.

caller_name
This is the name of the caller function.

callee_name
This is the name of the callee function, or NULL if the call is an indirect call.
The function names passed in the third and fourth arguments are pointers to static strings that have the lifetime of the instrumented executable or shared object.
The function names are the mangled names in C++.

event
This is 0 when this function is invoked just before a call, instrumenting a function entry. It is 1 when this function is invoked just after a call, instrumenting a function exit.

Function calls that are inlined by the compiler are not instrumented. To force instrumentation of all function calls, use the -fno-inline option to disable inlining.

A function may be given the attribute no_instrument_function. A call is not instrumented if either the caller or the callee function has this attribute.

The program must be linked with an object file that implements the function above to link correctly.
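The sketch below is one possible implementation of this function, using the prototype given above. It only counts events; the counters and the accessor functions are hypothetical helpers introduced for this illustration. The no_instrument_function attribute is applied throughout so that calls made from within the profiling code are not instrumented.

```c
#include <stddef.h>

static unsigned long calls_started;  /* events with event == 0 */
static unsigned long calls_finished; /* events with event == 1 */
static unsigned long indirect_calls; /* calls with no callee name */

__attribute__((no_instrument_function))
void __profile_cal(void *caller_fn, void *callee_fn,
                   const char *caller_name, const char *callee_name,
                   int event)
{
    (void)caller_fn;
    (void)callee_fn;
    (void)caller_name;
    if (event == 0) {
        /* invoked just before the call */
        calls_started++;
        if (callee_name == NULL) {
            indirect_calls++; /* NULL name: indirect call */
        }
    } else {
        /* invoked just after the call */
        calls_finished++;
    }
}

/* Hypothetical accessors, used to read back the statistics. */
__attribute__((no_instrument_function))
unsigned long profile_calls_started(void) { return calls_started; }

__attribute__((no_instrument_function))
unsigned long profile_calls_finished(void) { return calls_finished; }

__attribute__((no_instrument_function))
unsigned long profile_indirect_calls(void) { return indirect_calls; }
```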

The main differences with the -finstrument-functions option are listed below.

• This instrumentation tracks (caller, callee) address pairs instead of (call_site, callee) address pairs. If the call site information is required, use the -finstrument-functions option.
• This instrumentation provides the caller and callee names when available, which avoids a specific post-processing pass to retrieve the function names.
• This instrumentation is at the call site and not in the callee. Therefore, for instance, calls to top-level library functions (which are not instrumented) are seen, whereas the -finstrument-functions option does not see them. To disable the instrumentation of the call to a particular library routine, you must declare it with the no_instrument_function attribute.
• This instrumentation is not standard GCC functionality.


4.8 Interprocedural analysis optimization (IPA)


The -ipa option enables interprocedural analysis. With this option enabled, the compiler identifies opportunities for optimization across module boundaries. It does this by extending its scope for optimization and inlining from a single module to multiple modules.

Warning: The -ipa option in addition to the required optimization level must be included in both the compiler and linker phases.

The major benefits of IPA are:

• interprocedural constant propagation
• interprocedural alias analysis
• inter-module inlining
• interprocedural placement of data in specific memory spaces. On the STxP70, the possible spaces are DA and SDA. These can be controlled manually using options and attributes already described (see also Table 6: Generic options with -M flag on page 18 and the section entitled memory on page 102). The command line options that control manual memory placement (such as -Mda and -Msda) are ignored when automatic placement is enabled.

A more advanced use of IPA is function specialization (also known as cloning).
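To illustrate the idea of function specialization, the sketch below shows a generic function next to a hand-written equivalent of a clone specialized for one constant argument. This is only a conceptual illustration written by hand, not compiler output, and the function names are hypothetical.

```c
/* Generic routine: the shift amount is only known at run time. */
unsigned scale(unsigned x, unsigned shift)
{
    return x << shift;
}

/* Hand-written equivalent of a clone specialized for shift == 3.
 * If every call site in the program passed the constant 3, such a
 * clone could replace the generic routine and drop the parameter. */
unsigned scale_shift3(unsigned x)
{
    return x << 3;
}
```

Both functions return the same value for calls such as scale(5, 3) and scale_shift3(5); the specialized version simply no longer needs the second argument.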

The only mandatory option to trigger IPA compilation is -ipa.

The compilation and link time is longer because much of the optimization work is driven from the linker. This can be observed by using the -v compiler option.

The following steps are performed when building an executable in IPA mode:

• the .c files are translated into special .o files
• the .o files are merged together (code, symbol table)
• the .o files are analyzed and optimized
• the final link is performed

Because IPA mode optimizations are carried out by the linker as well as the compiler, the optimization is carried out only if the appropriate command line options are passed to both the linker and the compiler. It may, therefore, be necessary to modify the Makefile accordingly.

4.8.2 IPA command line options

Table 27 describes advanced IPA options.

Table 27. Advanced IPA options

-dryipa
The -dryipa option replaces the -dryrun option, which is no longer relevant for IPA. The -dryipa option dumps details of the different steps invoked by the driver.

-IPA:aggr_cprop=ON|OFF
Enable or disable aggressive inter-procedural constant propagation. This option attempts to avoid passing constant parameters, replacing formal parameters by their corresponding constant values. The default is ON.

-IPA:cgi=ON|OFF
Enable or disable constant global variable identification. This option marks non-scalar global variables that are never modified as constants, and propagates their constant values to all files. The default is ON.

-IPA:cprop=ON|OFF
Enable or disable inter-procedural constant propagation. This option identifies formal parameters which always have a specific constant value. The default is ON. See also -IPA:aggr_cprop.

-IPA:depth=n
This option is identical to -IPA:maxdepth=n.

-IPA:dfe=ON|OFF
Enable or disable dead function elimination. This option removes subprograms which are never called from the program. The default is ON.

-IPA:dve=ON|OFF
Enable or disable dead variable elimination. This option removes variables which are never referenced from the program. The default is ON.

-IPA:forcedepth=n
Set inline depths. Instead of the default inlining heuristics, this option directs IPA to attempt to inline all functions at a depth of (at most) n in the call graph, where functions which make no call are at depth 0, those which call only depth 0 functions are at depth 1, and so on. This ignores the default heuristic limits on inlining.

-IPA:inline=ON|OFF
Perform inter-file subprogram inlining during main IPA processing. The default is ON.

-IPA:keeplight=ON|OFF
Direct IPA not to send -keep to the compiler, in order to save disk space. The default is ON. Setting it to OFF leaves intermediate files in a directory which has the name of the final executable but suffixed with .ipakeep.

-IPA:maxdepth=n
Direct IPA not to attempt to inline functions at a depth of more than n in the call graph, where functions which make no call are at depth 0, those which call only depth 0 functions are at depth 1, and so on. Inlining remains subject to overriding limits on code expansion. See also forcedepth, space and plimit.

-IPA:mem_placement=ON|OFF
Enable or disable automatic placement of variables into the special SDA and DA memory spaces. This STxP70 specific optimization results in a more efficient address construction in the use of GP-based instructions. Default is ON when optimization level is O2 or higher (O2, O3, O4 and Os), OFF otherwise. Command line options that control the manual memory placement are ignored when automatic memory placement is enabled.

-IPA:mem_array=ON|OFF
Enable or disable automatic placement of array variables into special memory spaces.

-IPA:mem_struct=ON|OFF
Enable or disable automatic placement of structure variables into special memory spaces.

-IPA::SDAspace=n
Set the size of the SDA memory space to n bytes (the default is 4096).

-IPA::DAspace=n
Set the size of the DA memory space to n bytes (the default is 32768).

-IPA:multi_clone=n
Specify the maximum number of clones that can be created from a single procedure. By default, this value is 0. Aggressive procedure cloning may provide opportunities for interprocedural optimization, but it also may significantly increase the code size.

-IPA:node_bloat=n
When used in conjunction with -IPA:multi_clone, this option specifies the maximum percentage growth (n) of the total number of procedures relative to the original program.

-IPA:plimit=n
Stop inlining in a particular subprogram when it reaches a size of n bytes in the intermediate representation. The default is 2500.

-IPA:space=n
Stop inlining when the program size has increased by n%. For example, space=20 limits code expansion due to inlining to approximately 20%. The default is 100%.

-IPA:specfile=filename
Open filename to read more options. A spec file contains zero or more IPA options.

4.8.3 Limitations and special cautions

IPA and debug options

IPA optimization is not compatible with the -g compiler option. If both options are passed to stxp70cc, then the -ipa option is automatically disabled by the driver, and debugging information is generated.

IPA and compilation stages

The full benefit of IPA optimization is obtained only if both the compilation and the link stages receive the -ipa option and the optimization level on the command line. This is particularly important when existing makefiles have separate stages and flags for the compilation and link stages.


IPA memory placement versus options and attributes

The manual placement of variables in the special memory spaces takes precedence over the automatic placement. The automatic placement takes precedence over the command line options that control manual memory placements. For instance:

• the automatic placement does not operate on a variable if an attribute instructs the compiler to place it manually in a specific memory space
• if the memory spaces are already filled with variables placed manually as a consequence of either attribute, the automatic placement has no effect
• if manual memory placement and automatic memory placement options are passed to the compiler, then the options that control the manual memory placement are ignored

This section describes the stxp70cc command line options for controlling floating-point.

The IEEE754 standard defines two types of floating-point representation:

• The "single precision" is a 32-bit representation. It corresponds to the float data type in C.
• The "double precision" is a 64-bit representation. It corresponds to the double data type in C.

By default, a C compiler considers that floating-point calculations must be performed with double precision, unless explicitly specified by the programmer. Furthermore, if any 32-bit floating-point data is encountered in a floating-point calculation, it is promoted to 64-bit precision. This aims at ensuring that the maximum precision is preserved.

Syntax

In a program which must only use 32-bit floating-point arithmetic, a programmer should:

• declare all floating-point variables as 32-bit variables, that is "float"
• use only 32-bit floating-point constants, that is, use the "F" suffix (for example, "5.3F" is interpreted as a 32-bit constant, whereas "5.3" is considered as a 64-bit constant).
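The effect of the "F" suffix can be observed with sizeof, as in the sketch below. The function names are hypothetical, and the comments describe the default behavior, that is, compilation without any option that changes the size of double.

```c
#include <stddef.h>

/* "5.3" has type double, so A is promoted and the product has type
 * double. sizeof does not evaluate its operand; it only reports the
 * size of the type of the expression. */
size_t product_size_without_suffix(float A)
{
    return sizeof(A * 5.3);
}

/* "5.3F" has type float, so the product keeps type float. */
size_t product_size_with_suffix(float A)
{
    return sizeof(A * 5.3F);
}
```

With default options, the first function returns sizeof(double) and the second returns sizeof(float).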

Limitation and options

When the mechanism for controlling floating-point precision is only implemented by syntax, this can cause problems:

• many programmers are not aware that floating-point constants without the F suffix are interpreted as 64-bit constants
• if the whole precision of a program needs to be modified, then all types and constants may have to be changed, which may be tedious

The option -fshort-double changes the default behavior of the compiler, so that it assumes that floating-point arithmetic must be carried out in 32-bit arithmetic, even if "double" types or constants without the F suffix are used.


The option -mlib-short-double is to be used when specific libraries are provided to support short double code generation. On the STxP70, this option is deprecated, since it is forced to fit the default code generation setting. It is preserved mainly for legacy reasons.

4.9.3 Use of STxP70 with FPx

On any core without specific floating-point support, performing floating-point calculations in 32-bit or 64-bit arithmetic mainly results in calling different runtimes, or in different expansion of floating-point operations. This has a limited impact on performance.

On cores with 32-bit floating-point support, the problem is different. A program with 64-bit floating-point arithmetic cannot use the floating-point support of the core, which means that it will call the runtime instead. This is the case for the STxP70 with the FPx floating-point extension.

In other words, the FPx can be used efficiently only when floating-point arithmetic is 32-bit. This is why it is highly recommended to use the option -fshort-double when the FPx is used, because it ensures that all floating-point computations are performed using 32-bit precision.

From the STxP70 toolset 4.1.0 onwards, a warning is emitted if the FPx is used without this option.

On the STxP70, -mlib-short-double is deprecated and no longer has any effect. It is still recognized for legacy reasons.

4.9.4 Examples of floating-point arithmetic on the STxP70

Example 1: effect

Consider the following function:

float fct (float A)
{
    return A * 5.3;
}

If this code is compiled with the option -O3 -Mextension=fpx, then the compiler generates the following code:

        .global fct
fct:
        pushrl LK ;;
        subu R15, R15, 4 ;;
.LEH_post_adjust_sp_fct_1:
        callr __stod ;;
L_BB2_fct:
        make R2, 13107 ;;
        more R2, 13107 ;;
        make R3, 16405 ;;
        more R3, 13107 ;;
        callr __muld ;;
L_BB3_fct:
        addu R15, R15, 4 ;;
        poprl LK ;;
        jr __dtos ;;

Because the default compiler behavior is 64-bit floating-point, the constant is considered 64-bit, and the whole calculation is promoted to 64-bit. As a consequence, the multiplication is performed by the 64-bit runtime (the call to __muld). The FPx cannot be used although this was specified on the command line.

Example 2: adding -fshort-double

Adding the option -fshort-double to the command line modifies the default behavior of the compiler and the floating-point calculations are all performed in 32-bit. The resulting code now makes use of the FPx:

        .global fct
fct:
L_BB1_fct:
        make R0, 16553 ;;
        more R0, 39322 ;;
        fmvr2f F1, R0 ;;
        fmul F0, F0, F1 ;;
        rts ;;

Example 3: specifying 32-bit floating-point using only syntax

Alternatively, the same result could be reached by modifying the source code as follows, and compiling without the -fshort-double option:

float fct_float (float A)
{
    return A * 5.3F;
}

When -fshort-double is used, "double" data types are interpreted as 32-bit floating-point. This means that the following function, compiled with -O3 -Mextension=fpx -fshort-double, will lead to the same result as in Example 2: adding -fshort-double, and thus effectively makes use of the FPx:

double fct (double A)
{
    return A * 5.3;
}


This section introduces and describes application configuration files (ACF), which facilitate the fine tuning of compiler options in files and functions.

4.10.1 General description and purpose

Open64 based compilers do not allow fine grain option settings. This means that, except for pragmas and attributes (such as inlining) that are already implemented, compiler options apply to all functions in a file, and to all files on a command line.

When IPA is not enabled, this limitation can be partly worked around:

• by using different command lines to generate object files
• by splitting code into different files if particular functions must be compiled with different options

On STxP70-v4, some optimizations are performed at linker or post-link level. Those optimizations can depend on compilation options. Applying different options at compiler level and at linker or post-link level must be done with caution.

In any case, when IPA is enabled, this workaround cannot be applied. This may be problematic for the debugging and fine tuning of large applications. The workaround is also not easy to implement in the context of the STWorkbench.

The application configuration files (ACF) have thus been implemented to apply specific compiler options to the different files and functions. The full set of options to be applied to the files and functions of the same application is called a configuration. An application configuration file can define several configurations, corresponding to different tuning scenarios. Those configurations can then be selected by a dedicated compiler option.

Principles and overview of the implementation

The implementation of application configuration files takes place directly at compiler driver level. It allows fine-grained control of options at global level, file level and function level.

An application configuration file contains structured information to be attached to the corresponding functions or files.

It is read by the compiler if specified by a dedicated option. Then it is parsed by the driver, which applies the options at the requested level.

An ACF reproduces part or all of the structure of the application it is designed for, by listing file and function names in a configuration. It can contain several configurations, of which only one is active during a compilation phase.

Figure 13 shows an example of an application configuration file.

Figure 13. Example application configuration file

configuration "c1" { // Starts the definition of a configuration called c1
    -Os // Option defined for all the application
    file "f1" { // Configuration specification for file f1
        -O3 // In file f1, use speed optimization level
        function "foo" { // Configuration specification for function foo
            -O2
            -CG:if_conv=false // In function foo, disable if-conversion
        }
    }
}
configuration "c2" { // Other configuration
    -O3
}
active configuration "c1"

In the example in Figure 13, notice the definition of two possible configurations "c1" and "c2".

• If configuration "c1" is applied, then all files are compiled with the -Os option, except file "f1", which is compiled with the option -O3. Furthermore, function "foo" in file "f1" is compiled with the option -O2, and if-conversion is disabled.
• If configuration "c2" is applied, then all files are compiled with option -O3, without any exception.
• By default, configuration "c1" is applied as the active configuration. The configuration "c2" can be activated by a dedicated compiler option (see Section 4.10.4: Using the ACF on page 85).

Listing files or functions

It is possible to use a list of files or functions in a configuration, if several files (or functions) have to be compiled with the same set of options. The wildcard character asterisk "*" can be used in the names of files (or functions) to match several names with a single pattern. For example, an ACF could contain a section such as the one shown in Figure 14.

Figure 14. Listing files and functions

file "f*" { // Configuration specification for all files with a name starting with 'f'
    -Os // In those files, use size optimization level
    function "foo1" "foo2" "foo3" { // Configuration specification for functions
                                    // foo1, foo2 and foo3
        -O3
    }
}

In this case, all files whose name is prefixed by an "f" are compiled with the option -Os.

Functions "foo1", "foo2", "foo3" are compiled with the option -O3.



configuration_file   ::= configuration_file configuration
                       | configuration_file active_configuration

active_configuration ::= active configuration string

configuration        ::= configuration string { one_configuration }
                       | configuration string { }

one_configuration    ::= one_configuration file_conf
                       | one_configuration global_option
                       | file_conf
                       | global_option

global_option        ::= options

options              ::= <list of compiler options>

file_conf            ::= file files_name { one_file_conf }
                       | file files_name { }

files_name           ::= files_name string
                       | string
                       | <nothing>

one_file_conf        ::= one_file_conf func_conf
                       | one_file_conf file_option
                       | func_conf
                       | file_option

file_option          ::= options

func_conf            ::= function files_name { one_func_conf }

one_func_conf        ::= one_func_conf option
                       | <nothing>

string               ::= " <characters> "



4.10.4 Using the ACF

Compiling with an ACF

The option -macf-decl can be used to instruct the compiler to read and use an ACF:

stxp70cc -macf-decl my_acf.acf

The driver then parses the given file and applies the defined options at the requested level, provided that a default configuration is defined in the file.

Note: Options defined in a configuration file take precedence over options defined on the command line (or in an STWorkbench session).

Specifying the active configuration

The active configuration can be specified by two different means:

• Using the dedicated keyword in the ACF: active configuration "string"
  For example: active configuration "c1"
• Using the compiler option: -macf-active string
  For example: stxp70cc -macf-decl my_acf.acf -macf-active c1

The -macf-active option takes precedence over the active configuration keyword in the ACF.

Note: Some warnings are emitted if no active configuration can actually be selected and applied. In this case the ACF is ignored.

Creation of the ACF template

Even though the syntax is quite simple, writing the ACF for a large application can be tedious. It is therefore possible to automatically create a template of the ACF for a given application by using the dedicated option -macf-template.

For example, the following command creates the template of the ACF needed to compile an application implemented in four source files; the template is always created with the fixed name template.acf:

stxp70cc -macf-template file1.c file2.c file3.c main.c

This file lists all files and functions present in the application in a single configuration, with no specific option. It also defines this configuration as the default one and names it “c1”.

The file template.acf is created locally, in the compilation directory. If a file with this name already exists in this folder, the new content may be appended.

The template file remains incomplete until the link stage is run. This allows subsequent compilation steps to append to it. Only when the link stage is run is the template closed, after which it cannot be appended to. The mechanism for appending to and closing the template file is described further below.


Summary

There are three ways to handle an ACF, demonstrated by the following examples:

• stxp70cc -macf-decl acf_filename.acf
  Reads acf_filename.acf as an ACF, using the default configuration declared in the file as the active configuration.
• stxp70cc -macf-decl acf_filename.acf -macf-active c1
  Reads acf_filename.acf as an ACF, and uses the command line option to define the active configuration as c1. Configuration "c1" must be defined in the ACF acf_filename.acf.
• stxp70cc -macf-template source_file1.c source_file2.c source_file3.c source_main.c
  Generates the ACF template for the application implemented by the source files specified. The source files must be linkable, and the compilation must include a link stage to ensure that the template is complete. For example:
  stxp70cc -macf-template source_file1.o source_file2.o source_file3.o source_main.o

4.10.5 Behavior of -macf-template option

The use of the -macf-template option is introduced in Creation of the ACF template on page 85.

Note: The configuration defined and considered as the default in the template file is always named "c1".

The behavior of the -macf-template option depends on whether a template file already exists and also on whether it is considered complete and closed.

If the template.acf file is generated by one or more compilations without a link stage, the template file remains incomplete (and unusable) until the link stage is run.

Case 1: template.acf does not exist

1. The following command is issued to create a file template.acf:

   stxp70cc -c -macf-template foo1.c

   – this template contains the definition of a configuration "c1" for file foo1.c and all functions therein
   – the closing bracket for "c1" is missing, and the default configuration is not declared

2. The following command is now used to create a template for the file foo2.c:

   stxp70cc -c -macf-template foo2.c

   – the file template.acf created by the command in step 1 already exists
   – this new command appends the information related to file foo2 and all functions therein to the configuration "c1" of the pre-existing file template.acf
   – the closing bracket for "c1" and the declaration of the default configuration are still missing

3. Finally the following command is used to run the link stage, which closes the template:

   stxp70cc -macf-template foo1.o foo2.o

   – this last command only invokes the link stage. The file template.acf is closed, with “c1” declared as the default configuration

Steps 1 to 3 above generate the same file template.acf as the equivalent single command:

stxp70cc -macf-template foo1.c foo2.c

Case 2: template.acf exists and is closed

If the creation of a template is run with an existing, complete and closed template.acf file in the current folder, the syntax of the resulting file is invalid, and the parser rejects the resulting configuration file with an error message.

Makefiles

Compilation through a makefile performs independent calls to the compiler to generate object files before linking. In this context, the generation of an ACF template requires an incremental behavior. The template generation mechanism tests whether the template file template.acf exists in the compilation directory. If it exists, it is opened in append mode. Otherwise, it is created. At the linker or archive creation stage, the following actions are performed:

• the template file is closed from a syntactical point of view (the last '}' is written, and the active configuration lines are written)
• the buffer and file are closed from a file system point of view

If the compilation does not end with a linker or archive creation stage (only use of the -S or -c option), then the buffer is flushed and the file is closed, but the file is not closed from a syntactical point of view. Since it does not end with the expected pattern, the corresponding template is not usable.

4.10.6 Scope and known limitations

Compiler options

Most stxp70cc compiler options, both external and internal, can be used in the ACF.

Nevertheless, it would not make any sense to apply some of the options to only a subset of the files or functions. This is especially true for the compiler options which describe the hardware configuration.

The following options are restricted at file or function level:

• -Mconfig options: These options describe the hardware setup used to run the binary file to be generated by the compiler. Since this hardware is the same for all the parts of the code, those options should be the same in all files and functions. They are taken into account if they are defined at the global level of an ACF. They are ignored if they are defined only for some files or functions.
• -Mextension options: These options describe which extensions are available on the hardware, and can be used to generate the code. Like the -Mconfig options, they are accepted at global level, but discarded at file or function level.
• -Mmode16 or -Mmode32: This option does not describe the hardware configuration, but rather the registers to be used during code generation. This option is accepted at global and file levels, but not at function level. This is for technical reasons related to ABI handling (register saving at entry and exit of functions), which must be consistent over the whole application.

Inliner

The inliner operates on a full compilation unit and therefore takes into consideration the optimization level specified at global or file level, but not at function level. As a result, when using ACFs, a given function can produce different assembly code in two scenarios even though it is apparently compiled at the same optimization level in both.

For instance, consider the file f1.c:

int foo1() { return 1; }
int foo2() { return 2; }
int foo3() { return foo1() + foo2(); }

1. First scenario

   The file is compiled by the following command line, based on a global -Os option:

   stxp70cc -Os -c f1.c

   Here foo3() is compiled using -Os. Assembly code for foo3() contains calls to foo1() and foo2(), which are not inlined because of -Os.

2. Second scenario

   An ACF acf1.acf is defined with the following directives:

   file "f1" {
       function "foo3" { -Os }
   }

   Code is compiled using this ACF:

   stxp70cc -O3 -c -macf-decl acf1.acf f1.c

   Here foo3() is compiled using -Os. Assembly code for foo3() does not contain calls to foo1() and foo2(), which are inlined because of -O3, which is visible to the inliner.

3. Third scenario

   Code is compiled with option -O3:

   stxp70cc -O3 -c f1.c

   Here foo3() is compiled using -O3. Assembly code for foo3() does not contain calls to foo1() and foo2().



4. Fourth scenario

   An ACF acf1.acf is defined with the following directives:

   file "f1" {
       function "foo3" { -O3 }
   }

   Code is compiled using this ACF:

   stxp70cc -Os -c -macf-decl acf1.acf f1.c

   Here foo3() is compiled using -O3. Assembly code for foo3() contains calls to foo1() and foo2(), which are not inlined because -Os is visible to the inliner.

Intuitively, the user might expect to have the same code for scenarios 1 and 2, as well as for scenarios 3 and 4, but this is not the case because of the implementation of inlining.


5 GNU C extensions supported by stxp70cc

GNU cc provides a large set of extensions that are widely used in the GNU/Linux community. These extensions can be used to:

• describe embedded features, for example, data section placement
• provide guidance to the compiler for optimization, for example, the noreturn function attribute
• provide language extensions, for example, conditional lvalue or C99 features

The GNU extensions are sometimes the only way to access ELF features that are not directly available in the C language; for example, to declare a symbol as weak.

5.1.1

stxp70cc provides several language features not found in ANSI standard C. (The -pedantic option directs stxp70cc to print a warning message if any of these features are used.) To test for the availability of these features in conditional compilation, check for the predefined macro __GNUC__, which is always defined under stxp70cc.

It is recommended to always guard code containing stxp70cc extensions with the C preprocessor macro __GNUC__.

#if __GNUC__

/* Original GNU code */

#else

/* Work-around code */

#endif

Statements and declarations in expressions

Statements and declarations in expressions allow complicated C statements to be written and used as if they were a simple C expression, optionally returning a result value. Local declarations and labels may be embedded.

This provides a way to construct a safe preprocessor macro that comprises several statements, without using the do { } while(0) trick that swallows the semi-colon.

#define cfoo() \
({ int y = foo (); int z; \
if (y > 0) z = y; \
else z = - y; \
z; })
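As an illustration of statement expressions, the following minimal sketch wraps an absolute-value computation in a macro whose argument is evaluated only once. The names ABS_ONCE, next_sample, abs_of_next_sample and evaluation_count are hypothetical and are not part of the toolset; the counter exists only to make the single evaluation observable.

```c
/* Hypothetical counter, used only to observe how often the
   macro argument is evaluated. */
static int evaluations = 0;

/* Hypothetical data source with a visible side effect. */
static int next_sample(void)
{
    evaluations++;
    return -7;
}

/* Statement expression: the argument is evaluated once and stored
   in _v; the last expression gives the value of the construct. */
#define ABS_ONCE(expr) \
    ({ int _v = (expr); \
       _v < 0 ? -_v : _v; })

/* Returns the absolute value of one sample. */
int abs_of_next_sample(void)
{
    return ABS_ONCE(next_sample());
}

/* Returns how many times next_sample() has been called so far. */
int evaluation_count(void)
{
    return evaluations;
}
```

A plain macro such as ((x) < 0 ? -(x) : (x)) would call next_sample() twice here; the statement expression avoids this by binding the argument to a local variable.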

When GNU extensions are used in conjunction with expression statements and macros, they enable service labels to be used, that is, labels whose scope is limited to the current statement. See Figure 15.

Figure 15. Locally declared labels example

#define SEARCH(array, max, target) \
({ \
    __label__ found; \
    typeof (target) _SEARCH_target = (target); \
    typeof (*(array)) *_SEARCH_array = (array); \
    int i, j; \
    int value; \
    for (i = 0; i < max; i++) \
        for (j = 0; j < max; j++) \
            if (_SEARCH_array[i][j] == _SEARCH_target) \
                { value = i; goto found; } \
    value = -1; \
found: \
    value; \
})
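The benefit of a locally declared label is that the same macro can be expanded several times in one function without label clashes. The sketch below shows this with a simpler one-dimensional search; FIRST_INDEX, index_gap and index_gap_demo are hypothetical names chosen for the illustration.

```c
/* Returns the index of the first element equal to target, or -1.
   The label "done" is local to each expansion of the macro. */
#define FIRST_INDEX(arr, len, target)       \
    ({                                      \
        __label__ done;                     \
        int _idx = -1;                      \
        int _i;                             \
        for (_i = 0; _i < (len); _i++)      \
            if ((arr)[_i] == (target)) {    \
                _idx = _i;                  \
                goto done;                  \
            }                               \
    done:                                   \
        _idx;                               \
    })

/* The macro is expanded twice in the same function; without
   __label__ the two "done" labels would conflict. */
int index_gap(const int *values, int count, int first, int second)
{
    int a = FIRST_INDEX(values, count, first);
    int b = FIRST_INDEX(values, count, second);
    return b - a;
}

/* Example data: 8 is at index 1 and 23 is at index 4. */
int index_gap_demo(void)
{
    static const int values[] = { 4, 8, 15, 16, 23, 42 };
    return index_gap(values, 6, 8, 23);
}
```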

5.1.4

The address of a label defined in the current function, or a containing function, can be obtained with the extended unary operator &&; the resulting value has type void *. See Figure 16.

Figure 16. Labels as values example

const char * cgoto(int i)
{
    void *ptr = &&foo;
    static void *array[] = { &&foo, &&bar, &&hack };
    goto *array[i];
foo:
bar:
hack:
    return "hack" ;
}
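A common use of labels as values is a dispatch table. The following sketch is an illustration only; the function dispatch and its operation codes are hypothetical. It selects one of three code paths through a static table of label addresses, with a bounds check before the computed goto.

```c
/* op selects the operation: 0 = increment, 1 = decrement,
   2 = negate. Any other value returns the input unchanged. */
int dispatch(int op, int value)
{
    static void *table[] = { &&op_inc, &&op_dec, &&op_neg };

    if (op < 0 || op > 2)
        return value;       /* never jump through an invalid index */

    goto *table[op];

op_inc:
    return value + 1;
op_dec:
    return value - 1;
op_neg:
    return -value;
}
```

The table is declared static so that it is built once; the label addresses are only meaningful inside the function that defines the labels.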

Naming an expression's type

A name can be given to the type of an expression using a typedef declaration with an initializer. To define name as a type name for the type of expression, do:

typedef name = expression;

This can be used in conjunction with the statements-within-expressions feature described in Section 5.1.1. For example, to define a safe “maximum” macro that operates on any arithmetic type:

#define max(a,b) \
({typedef _ta = (a), _tb = (b); \
_ta _a = (a); _tb _b = (b); \
_a > _b ? _a : _b; })

The reason for using names that start with underscores for the local variables is to avoid conflicts with variable names that occur within the expressions that are substituted for a and b.

Note: In the future the GNU C language may include a new form of declaration syntax that allows the declaration of variables whose scopes start only after their initializers; this will be a more reliable way to prevent such conflicts.

typeof allows you to refer to an object data type by referring to an object of that type. It is particularly useful for writing generic and safe macro definitions, which can then be applied to various primitive types or user-defined data types. Without this extension, it is necessary to define as many specific macros as the number of different types used in calls to the generic macro.

#define max(a,b) ({ \
    typeof (a) _a = (a); \
    typeof (b) _b = (b); \
    _a > _b? _a: _b; \
})
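The sketch below illustrates two typical uses of this extension: a maximum macro that evaluates each argument once, and a type-independent swap. It uses the alternate spelling __typeof__, which GCC also accepts when strict ISO modes are selected; MAX_OF, SWAP and the two functions are hypothetical names introduced for the example.

```c
/* Each argument is copied once into a local of the same type. */
#define MAX_OF(a, b) \
    ({ __typeof__ (a) _a = (a); \
       __typeof__ (b) _b = (b); \
       _a > _b ? _a : _b; })

/* Swaps two lvalues of the same type, whatever that type is. */
#define SWAP(x, y) \
    do { __typeof__ (x) _tmp = (x); (x) = (y); (y) = _tmp; } while (0)

/* i++ is evaluated once, so m is 3 and i ends up as 4. */
int max_after_increment(void)
{
    int i = 3, j = 2;
    int m = MAX_OF(i++, j);
    return m * 10 + i;
}

/* The same SWAP macro works on double without any change. */
double swapped_difference(double a, double b)
{
    SWAP(a, b);
    return a - b;
}
```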

Compound expressions, conditional expressions and casts are allowed as lvalues provided their operands are lvalues. For example:

(a, b) += 5;

The middle operand in a conditional expression may be omitted, for example:

z = x? : y;

long long (64-bit integer) is supported by the stxp70cc compiler. It is now also an ISO C99 feature.

long long x;

Floating-point numbers can be written in hexadecimal format:

float f = 0x1.fp3;
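Two of the extensions above can be checked with a short sketch. With the middle operand omitted, the first operand is evaluated only once and is also the result when it is non-zero. The hexadecimal constant 0x1.fp3 denotes (1 + 15/16) × 2^3 = 15.5. The function names below are hypothetical.

```c
/* Hypothetical counter, used only to show the single evaluation. */
static int reads = 0;

/* Hypothetical source that reports "no value" as 0. */
static int read_setting(void)
{
    reads++;
    return 0;
}

/* Same result as read_setting() ? read_setting() : fallback,
   but read_setting() is called only once. */
int setting_or_default(int fallback)
{
    return read_setting() ?: fallback;
}

/* Number of calls made to read_setting() so far. */
int setting_reads(void)
{
    return reads;
}

/* 0x1.f is 1.9375; the exponent p3 multiplies it by 2^3. */
float fifteen_and_a_half(void)
{
    return 0x1.fp3f;
}
```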


5.1.10 Specifying a register for a local variable

A register in either the core or an extension may be specified for a local variable, for example:

// R6 core register allocated to the myvar long variable
register long myvar asm ("r6") = name;

// The part number 1 of 128-bit width in the register 2
// of the register class D of the user defined extension MP2x
// is allocated to the variable myvarext
register MP2x_DP myvarext asm ("D2_P1");

Note: The syntax for extension register specification is described in detail in Syntax of scalar/SIMD audio extension register lists on page 110.

The extension multi-level register must always be specified using the smallest subpart syntax. It is however possible to allocate a top level register. In this case, the specified sub register must be the first one of the group composing the full register. For instance:

// declare a variable at level P allocated to D2_P1
register MP2x_DP var64 asm ("D2_P1");

// declare a variable at level X allocated to D1_P0 and D1_P1
register MP2x_DX var128 asm ("D1_P0");

5.1.11 Array of length zero

Zero length arrays are allowed in GNU C. They are very useful as the last element of a structure which is really a header for a variable length object. See Figure 17.

Figure 17. Zero length array example

#include <stdio.h>
#include <stdlib.h>

struct line {
    int length;
    char contents[0];
};

struct line *newline( unsigned int this_length)
{
    struct line *thisline = (struct line *)
        malloc (sizeof (struct line) + this_length);
    thisline->length = this_length;
    return thisline ;
}

void delline(struct line *thisline)
{
    free(thisline) ;
}

int main(int argc, char *argv[])
{
    enum { __MAXL = 128 } ;
    enum { __L = 16 } ;
    struct line *lines[__MAXL] ;
    int i ;
    printf("sizeof(line) : %d\n", (int) sizeof(struct line)) ;
    for(i=0; i< __MAXL; i++) {
        lines[i] = newline(__L) ;
    }
    for(i=0; i< __MAXL; i++) {
        delline(lines[i]) ;
    }
    puts("Done.") ;
    return 0 ;
}


5.1.12 Array of variable length

An array of variable length is an automatic array defined with a length that is not a constant expression. This type of array is also known as a VLA. See Figure 18.

Figure 18. Variable length array example

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void sadcat(char *s1, char *s2)
{
    char str[strlen (s1) + strlen (s2) + 1];
    strcpy (str, s1);
    strcat (str, s2);
    printf("%s + %s == %s\n", s1, s2, str) ;
    printf ("sizeof(str) = %d\n", (int) sizeof(str));
}

void tester (int len, char buffer[len][len]) {
    int i=0, j=0;
    char tt[len][len];
    for (i=0; i<len; i++)
        for (j=0; j<len; j++)
            buffer [i][j] = i*j;
    printf ("sizeof(tt) = %d\n", (int) sizeof(tt));
    printf ("sizeof(buffer) = %d\n", (int) sizeof(buffer));
}

char data[10][10];

int main(int argc, char *argv[])
{
    sadcat("Foo", "Bar") ;
    tester (4, data);
    tester (10, data);
    return 0 ;
}

This extension enables a macro to be defined that can safely be expanded into a call to a function with a variable number of arguments. These macros are also called CPP vararg macros.

For example, the following C program:

#define eprintf(format, args...) fprintf (stderr, format, ##args)
eprintf ("success!\n");
eprintf ("%s%d: ", input_file_name, line_number);

is expanded to:

fprintf ((&__iob[2]), "success!\n");
fprintf ((&__iob[2]), "%s%d: ", input_file_name, line_number);

Note: GNU C supports two types of “variable number of arguments” syntax: the ISO C99 format, which uses __VA_ARGS__, and the GNU format, which uses ##args. The ISO C99 format does not support the case where the number of parameters passed as part of the ellipsis is zero. GNU C reuses the ## trick to absorb the comma in this case. See Figure 19.

Figure 19. Variable number of arguments example

#include <stdio.h>

#define gnu_eprintf(format, args...) \
    fprintf (stdout, "gnu_eprintf " format, ## args)

#define isoc99_eprintf(format, ...) \
    fprintf (stdout, "isoc99_eprintf " format, __VA_ARGS__)

#define extended_isoc99_eprintf(format, ...) \
    fprintf (stdout, "extended_isoc99_eprintf " format, \
             ## __VA_ARGS__)

#define errprintf(args...) \
    gnu_eprintf ("errprintf " "%s\n", ## args)

int main(int argc, char *argv[]) {
    /* Try 1, 2, 3 arguments */
    gnu_eprintf ("One argument: %s. Done.\n", __FILE__);
    gnu_eprintf ("Two arguments: %s:%d. Done.\n", __FILE__, \
                 __LINE__);
    isoc99_eprintf ("One argument: %s. Done.\n", __FILE__);
    isoc99_eprintf ("Two arguments: %s:%d. Done.\n", __FILE__, \
                    __LINE__);
    extended_isoc99_eprintf ("One argument: %s. Done.\n", __FILE__);
    extended_isoc99_eprintf ("Two arguments: %s:%d. Done.\n", \
                             __FILE__, __LINE__);
    extended_isoc99_eprintf ("Three arguments: %s:%s:%d. Done.\n", \
                             __FUNCTION__, __FILE__, __LINE__);
    /* The case with no arguments ... */
    gnu_eprintf ("No arguments. Done.\n");
    /* The line below causes a syntax error */
    isoc99_eprintf ("No arguments. Done.\n");
    extended_isoc99_eprintf ("No arguments. Done.\n");
    /* Cascade of macros with variable number of arguments */
    errprintf (__FILE__);
    return 0 ;
}
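The comma-absorbing behavior of ## can also be checked without looking at the program output, by formatting into a buffer. The following sketch is an illustration under that assumption; FORMAT_TAGGED and tagged_matches are hypothetical names, and snprintf and strcmp are the standard C library functions.

```c
#include <stdio.h>
#include <string.h>

/* The ## before __VA_ARGS__ removes the preceding comma when no
   variable argument is passed (GNU extension). */
#define FORMAT_TAGGED(buf, size, format, ...) \
    snprintf ((buf), (size), "tag: " format, ## __VA_ARGS__)

/* Returns 1 when both the zero-argument and the two-argument
   expansions produce the expected text. */
int tagged_matches(void)
{
    char plain[32];
    char with_args[32];

    FORMAT_TAGGED (plain, sizeof (plain), "ready");
    FORMAT_TAGGED (with_args, sizeof (with_args), "%s=%d", "count", 3);

    return strcmp (plain, "tag: ready") == 0
        && strcmp (with_args, "tag: count=3") == 0;
}
```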


GNU cpp permits string literals to cross multiple lines without escaping the embedded newlines. Each embedded newline is replaced with a single newline character in the resulting string literal, regardless of what form the newline took originally.

The macro definition:

#define MESSAGE \
"Hello,
good brave new World!
"

would be written under ISO:

#define MESSAGE \
"Hello,\n" \
"good brave new World!\n"

In ISO C99, arrays that are not lvalues still decay to pointers, and may be subscripted. However, they may not be modified or used after the next sequence point, and the unary operator “&” may not be applied to them. See Figure 20.

Figure 20. Non-lvalue arrays example

#include <stdio.h>

struct foo {int a[4];};

struct foo f() {
    static const struct foo f = { 2, 4, 8, 16 };
    return f ;
}

void bar (void)
{
    int i;
    for (i=0; i<4; i++)
        printf ("f().a[%d] == %d\n", i, f().a[i]) ;
}

int main(int argc, char *argv[])
{
    bar ();
    f().a[0] = 15;
    bar ();
    return 0 ;
}


5.1.16 Arithmetic on void and function pointers

In GNU C, addition and subtraction are supported on pointers to void and on pointers to functions. This is done by treating the size of a void or of a function as 1. As a consequence, sizeof is also allowed on void and on function types, and always returns 1. See Figure 21.

Figure 21. Arithmetic on void and function pointers example

#include <stdio.h>

void f0(void) {}
void *p = 0;
void (*pf)(void) = 0;

void bar (void) {
    p++;
    pf++;
    printf ("sizeof(void) = %d\n", (int) sizeof(void));
    printf ("sizeof(func) = %d\n", (int) sizeof(f0));
}
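The practical effect is that a void pointer can be advanced by a byte count without first casting it to char *. The following sketch illustrates this; skip_bytes and the two checking functions are hypothetical names introduced for the example.

```c
#include <stddef.h>

/* GNU C: arithmetic on void * uses an element size of 1 byte. */
void *skip_bytes(void *base, size_t count)
{
    return base + count;
}

/* Returns 1 when void * arithmetic agrees with char * arithmetic. */
int skip_matches_char_arithmetic(void)
{
    char buffer[16];
    return skip_bytes (buffer, 5) == (void *) (buffer + 5);
}

/* GNU C: sizeof (void) is accepted and evaluates to 1. */
int size_of_void(void)
{
    return (int) sizeof (void);
}
```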

As in standard C++ and ISO C99, the elements of an aggregate initializer for an automatic variable are not required to be constant expressions. For example:

int foo (int f, int g)
{
    int beat_freqs[2] = { f-g, f+g };
    return beat_freqs[0] * beat_freqs[1] ;
}

Compound literals used to be called “Constructor Expressions” before ISO C99 standardized them under the term “Compound Literals”. A compound literal looks like a cast containing an initializer. See Figure 22.

Figure 22. Compound literal example

#include <stdio.h>
#include <malloc.h>

struct foo {int a; char b[2];} ;

struct foo * givefoo(int x, int y, char a, char b) {
    struct foo * sfoo = (struct foo *) malloc(sizeof (struct foo));
    /* Fill in the anonymous struct at once with a Compound Literal */
    *sfoo = (struct foo) {x + y, a, b};
    return sfoo;
}

GNU C allows initialization of objects with static storage duration by compound literals, whereas ISO C99 does not.



This extension was called “GNU Style Labeled Elements in Initializers”. It is now an ISO C99 feature. It allows the initialization of particular elements of an aggregate (a structure or an array) by specifying the member name or the indices of the elements to initialize, in any order. See Figure 23.

Figure 23. Designated initializers example

const int widths[] = { [0 ... 9] = 1, [10 ... 99] = 2, [100] = 3 };
int a[6] = { [4] 29, [2] = 15 } ;
enum { v1 = 1, v2 = 2 , v4 = 4 } ;
int b[6] = { [1] = v1, v2, [4] = v4 } ;
struct point { int x, y; };

struct point makep(int xvalue, int yvalue )
{
    struct point p = { y: yvalue, x: xvalue };
    return p ;
}

struct point makepp(int xvalue, int yvalue )
{
    struct point p = { .y = yvalue, .x = xvalue };
    return p ;
}

With GNU C the = character can be omitted after the [index] indication.
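As a small illustration of range designators, the sketch below builds a lookup table giving the number of decimal digits of a value between 0 and 100, and initializes a structure with its members named in reverse order. The names decimal_width, width_of and point_x_minus_y are hypothetical.

```c
/* [first ... last] = value initializes a whole range (GNU C). */
static const int decimal_width[101] = {
    [0 ... 9] = 1, [10 ... 99] = 2, [100] = 3
};

struct point { int x, y; };

/* Returns the number of decimal digits, or 0 when out of range. */
int width_of(int value)
{
    return (value >= 0 && value <= 100) ? decimal_width[value] : 0;
}

/* Members are designated by name, so their order does not matter. */
int point_x_minus_y(int xvalue, int yvalue)
{
    struct point p = { .y = yvalue, .x = xvalue };
    return p.x - p.y;
}
```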

Case ranges may be specified with integer value intervals in switch statements.

const char * which (int v) {

switch (v) {

case 0 ... 31: return "Control";

case 'A' ... 'Z': return "Upper";

case 'a' ... 'z': return "Lower";

default: return "None";

}

}

5.1.21 Cast to a union type

A cast to union type is similar to other casts, except that the type specified is a union type.

The type is specified either with the union tag or with a typedef name.

union foo { int i; double d; } u, v;

void makefoo (int i, double f) {
    u = (union foo) i;
    v = (union foo) f;
}
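The cast selects the union member whose type matches the type of the expression being cast. The sketch below illustrates this by reading back the member that was written; union number and the two functions are hypothetical names.

```c
union number { int i; double d; };

/* The int expression initializes member i. */
int int_through_union(int value)
{
    union number n = (union number) value;
    return n.i;
}

/* The double expression initializes member d. */
double double_through_union(double value)
{
    union number n = (union number) value;
    return n.d;
}
```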


5.1.22 Dollar signs in identifier names

Dollar signs are allowed in identifier names.

int $a;

5.1.23 Prototypes and old-style function definitions

GNU C extends ISO C to allow a function prototype to override a later old-style nonprototype definition.

int isroot (uid_t);

int isroot (x) /* ??? lossage here ??? */
uid_t x;
{
    return x == 0;
}

// C++ comment

C++ comments are not recognized when the stxp70cc option -ansi is used. This is to avoid problems with constructs that contain the character sequence “//”. For example:

x = a //**/b;

5.1.25 Character ESC in constants

The sequence “\e” is recognized in string or character constants as an ASCII <escape> character.

char escape = '\e';
char s[] = "\e\e";

5.1.26 Inquiring on alignment of types or variables

__alignof__ allows enquiries about how an object is aligned, or the minimum alignment required by a type or variable.

struct foo { int x; char y; } f;
int x = __alignof__ (double);
int b = __alignof__ (f.y);
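The following sketch shows the two forms of the operator, applied to an lvalue and to a type. It assumes only that a char member needs no alignment beyond one byte and that a structure is aligned like its most strictly aligned member; struct sample and the function names are hypothetical.

```c
struct sample { int x; char y; };

static struct sample s;

/* Applied to an lvalue: a char member is byte aligned. */
int alignment_of_char_member(void)
{
    return (int) __alignof__ (s.y);
}

/* Applied to types: the structure takes the alignment of its
   most strictly aligned member, here the int. */
int struct_alignment_matches_int(void)
{
    return __alignof__ (struct sample) == __alignof__ (int);
}
```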

Warning: The STxP70 ABI states that the stack is aligned to a 64 bit boundary. However, for wider extension data types, it is necessary to increase this value. A dedicated attribute aligned_stack is defined for this purpose.

An enum type can be defined without specifying its possible values.

typedef enum _e e;

struct _s {
    e* p;
} s;

enum _e { red, green, blue, black };
e x;

5.1.28 Function names as strings

GNU cc predefines two magic identifiers to hold the name of the current function. The identifier __FUNCTION__ holds the name of the function as it appears in the source. The identifier __PRETTY_FUNCTION__ holds the name of the function printed in a language specific fashion.

char here[] = "Function " __FUNCTION__ " in file " __FILE__;
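When string concatenation is not needed, __FUNCTION__ can also be used like an ordinary character array holding the function name. The sketch below relies on that behavior, which is how GCC treats the identifier; report_position and name_matches are hypothetical names.

```c
#include <string.h>

/* Returns the name of this function as seen in the source. */
const char *report_position(void)
{
    return __FUNCTION__;
}

/* Returns 1 when the reported name is the expected one. */
int name_matches(void)
{
    return strcmp (report_position (), "report_position") == 0;
}
```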

5.2 Attributes

Attributes are generally a much better design than a #pragma directive for several reasons. Firstly, an attribute specification is a piece of C language that can be generated by a cpp macro definition, whereas generating a #pragma directive from a macro is generally not supported by non-GNU C preprocessors. Secondly, it avoids the scoping issues of the #pragma directive.

Several attributes can be applied to the same object by using a comma to separate them. For example, to declare a symbol that is both weak and aliased:

void useful (void) __attribute__ ((weak, alias("useful_func")));

5.2.1 Placement and layout

section

When applied to a function, places the function in a user-defined section.

void myfunc (void) __attribute__ ((section(".mytext")));

void myfunc (void) {
    printf ("From myfunc in .mytext section.\n");
}

When applied to a data object, places the data in a user-defined section.

struct duart a __attribute__ ((section ("DUART_A"))) = { 0 };

Support must be explicitly added in the startup file or system loader to load the newly created section.


memory

The STxP70 processor provides several special memory spaces that allow less costly accesses.

• Tiny Data Area (TDA)
  Data in the TDA is accessed using a single instruction of the form baseaddress+offset, where offset is expressed in elements. The TDA is based at address 0 (which is byte 4 as accessing address 0 is not possible in C). Due to the way it is accessed, only 32 Kbytes can be placed in the TDA.
• Small Data Area (SDA)
  Data in the SDA is accessed using a single instruction of the form baseaddress+offset, where offset is expressed in elements. An element can be a byte, 16-bit word, or 32-bit word depending on the type of the data object. An aggregate of 4,096 elements can be placed in the SDA. This can be a mixture of scalars, arrays, and structures of various sizes and with element sizes of byte, 16-bit word, or 32-bit word, but the aggregate number of elements over all entries cannot exceed 4,096.
• Data Area (DA)
  The addresses of data in the DA are built using a single instruction of the form addugp Ri, offset, where offset is expressed in bytes. An aggregate of 32,768 bytes can be placed in the DA. This can be a mixture of scalars, arrays, and structures of various sizes and with element sizes of byte, 16-bit word or 32-bit word.

The memory attribute accepts three values that instruct the compiler to place a variable in these spaces:

int __attribute__ ((memory ("tda"))) x; // x is placed in TDA
int __attribute__ ((memory ("sda"))) y; // y is placed in SDA
int __attribute__ ((memory ("da"))) z; // z is placed in DA

aligned

When applied to a variable or a structure field, specifies a minimum alignment for the variable or structure field, measured in bytes. The aligned attribute can only increase the alignment; the alignment can be decreased by specifying packed as well.

int x __attribute__ ((aligned (16))) = 0;
struct _s { int x[2] __attribute__ ((aligned (8))); };
short array [3] __attribute__ ((aligned));

When applied to a type:

typedef int more_aligned_int __attribute__ ((aligned(8)));
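The effect of the attribute can be verified at run time by inspecting an address, and at compile time with __alignof__. The following sketch is an illustration only; sample_buffer, int_aligned8 and the two functions are hypothetical names.

```c
#include <stdint.h>

/* Variable form: the array starts on a 16-byte boundary. */
static int sample_buffer[4] __attribute__ ((aligned (16)));

/* Type form: every object of this type is 8-byte aligned. */
typedef int int_aligned8 __attribute__ ((aligned (8)));

/* Returns 1 when the address is a multiple of 16. */
int buffer_is_16_byte_aligned(void)
{
    return ((uintptr_t) sample_buffer % 16) == 0;
}

/* Returns the alignment recorded for the typedef. */
int typedef_alignment(void)
{
    return (int) __alignof__ (int_aligned8);
}
```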

Warning: It is also possible to make use of a specific syntax for aligned data types. This is based on the addition of the _aligned suffix to the type name. This syntax can be applied to any data type, but is especially recommended on the SIMD audio extension (see also Aligned data types on page 118).


aligned_stack

When applied to a function, this attribute specifies that the head of the stack must be aligned to a given boundary. The value provided as an argument corresponds to the number of bytes to which the stack must be aligned. The argument must be a power of 2, strictly greater than 8 and lower than or equal to 256.

For instance the attribute below specifies that the stack of function fct() must be aligned to a 128-bit boundary:

void fct() __attribute__ ((aligned_stack(16)));
void fct()
{
    ...
}

Warning: Several means are provided to control the alignment of the stack. It is recommended to refer to Table 6: Generic options with -M flag on page 18 for the description of the related option and precedence rules. Please note that the compiler is also able to perform self-alignment of the stack on many occasions, taking the size of local variables into account.

weak

When applied to a function, causes the function to be emitted as a weak symbol. The symbol is set to 0 if it is not defined at link time. This is primarily of use in defining library functions that can be overridden in user code:

void d_stub (void) __attribute__ ((weak));
if (d_stub) {
    d_stub();
}

When applied to data, causes the declaration to be emitted as a weak symbol rather than a global symbol. This is primarily of use in defining variables that can be overridden in user code:

int debug __attribute__ ((weak)) = 0;

alias

Applies only to functions: provides an alias name for a given function. It is often used in conjunction with the weak attribute to define an alternate weak name for a given function.

void useful_func (void) {
    /* ... Do something ... */
}
void useful (void) __attribute__ ((alias("useful_func")));



Note:


packed

Applies only to data: Specifies that a variable or structure should have the smallest possible alignment - one byte for a variable, and one bit for a field, unless a larger value is specified with the aligned attribute.

The specified data alignment is applied during data layout, and the code generator emits a safe sequence of instructions to avoid causing a misalignment trap.

struct foo { char a; int x __attribute__ ((packed)); };

used

The GCC manual specifies that the used attribute may only apply to functions. For stxp70cc it may also apply to variables.

• The used attribute, attached to a function, means that the code must be emitted for this function, even if this function appears never to be referenced.
• This attribute, attached to a variable, means that the definition must be emitted for the variable even if it appears that the variable is not referenced.

The used attribute follows the same syntax as any GCC attribute.

For a procedure:

static int Foo() __attribute__ ((used)) ;

For uninitialized data:

static int foo __attribute__((used)) ;

For initialized data:

static int foo __attribute__((used)) = 2 ;

The assembly has been specifically extended to support this attribute:

.type Foo, @function, used

.type foo, @object, used

A motivation for using this attribute is to avoid the deletion of an unreferenced symbol by the dead code, dead data or IPA optimization. This can be useful for debugging purposes (for instance, a function dumping a specific data structure that is only called interactively from debugging sessions is removed if not marked as ‘used’, since the compiler does not find any reference to it).

constructor and destructor

Applies only to functions: The constructor attribute causes the function to be called automatically before execution enters main(). Similarly, the destructor attribute causes the function to be called automatically after exit().

void initdata (void) __attribute__ ((constructor));
void terminatedata (void) __attribute__ ((destructor));



5.2.2 Optimization

This section only applies to functions.

noreturn

Declares that a function cannot return, such as abort or exit. It is a useful indication to optimizers.

void byebye () __attribute__ ((noreturn));

malloc

Used to tell the compiler that a function returns a pointer that cannot alias anything. It is a useful indication to optimizers.

void * get_block (int) __attribute__ ((malloc));

Note:

The visibility attributes are supported as follows:

__attribute__((__visibility__("visibility-type")))

__attribute__((visibility("visibility-type")))

where visibility-type can be default, hidden, protected, or internal.

default
Default visibility is the normal case for ELF. This value is available for the visibility attribute to override other options that may change the assumed visibility of symbols.

hidden
Hidden visibility indicates that the symbol is not placed into the dynamic symbol table. This means that no other module (executable or shared library) can reference it directly.

protected
Protected visibility indicates that the symbol is placed in the dynamic symbol table, but that references within the defining module bind the local symbol. This means that the symbol cannot be overridden by another module.

internal
Internal visibility is similar to hidden visibility, but has additional processor-specific semantics. For the STxP70, this means that the function is never called from another module.

Hidden symbols cannot be referenced directly by other modules but they can be referenced indirectly by function pointers. By indicating that a symbol cannot be called from outside the module, the compiler may for instance omit the load of a PIC register since it is known that the calling function has already defined the correct value.



interrupt and interrupt_nostkaln

The interrupt attribute specifies that a function is an interrupt routine. This imposes:
• a save/restore of all registers at entry/exit of the function
• an rte instruction is used to return from the routine (instead of an rts)
• a proper stack alignment at entry/exit of the routine

The interrupt_nostkaln attribute has the same effect, except that it does not perform any stack realignment.

void __attribute__ ((interrupt)) it_routine_1(...)
{
    ...
}

format_arg

The format_arg attribute specifies that a function takes a format string for a printf, scanf, strftime or strfmon style function and modifies it, so that the result can be passed to a printf, scanf, strftime or strfmon style function.

extern char * my_dgettextprintf (void *my_domaint,

const char *my_format) __attribute__ ((format_arg(2)));

mode

This attribute specifies the data type for the declaration: whichever type corresponds to the given mode. Refer to the GNU Compiler Collection Internals document for the definitions of modes, http://gcc.gnu.org/onlinedocs/gccint .

Use the keywords __byte__, __word__ and __pointer__ to indicate the mode corresponding to these quantities.

unsigned int qi __attribute__ ((mode (QI)));
unsigned int w __attribute__ ((mode (__word__)));



5.2.5 Built-ins

A built-in is used in the same way as a function call, but is expanded by the compiler very early in the intermediate representation instead of generating a function call. On STxP70, most machine and extension instructions can also be addressed using built-ins. Please refer to Chapter 7: Built-in functions on page 115 for further information.

__builtin_constant_p

This built-in tests if a value is a constant at compile time.

#include <stdio.h>

int x;
#define C 1

int main () {
    if (__builtin_constant_p (C) == 1)
        printf ("C is proved to be a constant\n");
    if (__builtin_constant_p (x) == 0)
        printf ("x is not proved to be a constant\n");
    return 0;
}

__builtin_return_address

__builtin_return_address gets the return address of the currently executing function.

void bar () {
    printf ("RA = 0x%08x\n", (int)__builtin_return_address (0));
}

__builtin_expect

long __builtin_expect (long exp, long c)

__builtin_expect provides the compiler with branch prediction information.

The return value is the value of exp, which should be an integral expression. The value of c must be a compile-time constant. The semantics of the built-in are that it is expected that exp == c.

For example:

if (__builtin_expect (exp, 0))
    foo ();

indicates that a call to foo() is not expected as exp should be 0.

__builtin_classify_type

__builtin_classify_type(object) ignores the value of the object and considers only its data type. It returns an enum describing what kind of type object is. See Figure 24.



Figure 24. __builtin_classify_type example

enum type_class __builtin_classify_type(object)

enum type_class

{

no_type_class = -1,

void_type_class, integer_type_class, char_type_class,

enumeral_type_class, boolean_type_class,

pointer_type_class, reference_type_class, offset_type_class,

real_type_class, complex_type_class,

function_type_class, method_type_class,

record_type_class, union_type_class,

array_type_class, string_type_class, set_type_class,

file_type_class, lang_type_class

};

6 GNU ASM

The stxp70cc compiler accepts “extended inline assembly” asm, as part of C programs.

This chapter only summarizes the main features of the asm implementation and describes its limitations. It is not a substitute for the GNU documentation.

6.1 Syntax

Note:

General syntax

asm(template : output operands : input operands : clobber list);

or

__asm__(template : output operands : input operands : clobber list);

Where:
• template is the assembler instruction, defined as a string constant
• output operands is a list of comma separated output operands
• input operands is a list of comma separated input operands
• clobber list is a list of comma separated clobbered operands

The template section contains plain assembler, and uses ordinary STxP70 assembler syntax, with the notable exception of the %i (i is a positive integer) notation that refers to the ith output or input operand.

Multiple consecutive strings are automatically concatenated to enable a readable and correct template input. Multiple assembler instructions can be put together in a single asm template, separated by explicit newline characters ‘\n’.

If there are no output operands but there are input operands, two consecutive colons must be used in place of the output operands.

In the output and input list:
• each operand is described by an “operand constraint string” followed by a C expression in parentheses
• the available constraints are the following:
– r general purpose register operand
– b boolean register operand
– i immediate integer operand, including symbolic constants only known at assembly time
– n immediate integer operand, known at compile time
– g guard register
– fpx_FX FPx register (STxP70-4 only)
– the type attached to a scalar or SIMD audio extension (for instance, MP2x_VP or MP2x_VX)


• an operand constraint can be prefixed by the following modifiers:
– = write-only operand, used for output operands
– & early clobber operand, does not prevent the use of =
– + operand is used for both input and output

In the clobber list:
• general registers are referred to by ri (where i has the range [0,31]), they map to the corresponding Ri hardware registers [0,31] (c)
• FPx extension registers are referred to by fi (where i has range [0,15]), they map to the corresponding Fi hardware registers
• guard registers are referred to by gi (where i has the range [0,7]), they map to the corresponding Gi hardware guard registers
• scalar or SIMD audio extension registers are referenced by a name determined by the extension and level

Syntax of scalar/SIMD audio extension register lists

The STxP70 core accepts scalar and SIMD audio extensions with multi-level register files.

The syntax has been extended to support such extension registers.

For non-SIMD registers (that is, registers with level “X” only), a register name is constructed using the following template:

<registerfile_name><register_id>

Where:
• <registerfile_name> is the name of the extension register file
• <register_id> is the number of the register

For example, when considering a register file T with a single level hierarchy, the registers are referenced as "T0", "T1", "T2" and so forth.

For SIMD register files, register names are constructed according to the following template:

<regfile_name><reg_id_max_level>_<regfile_min_level><reg_subid_min_level>

Where:
• <regfile_name> is the name of the scalar or SIMD audio extension register file
• <reg_id_max_level> is the number of the register at the highest hierarchy level (level “X”)
• <regfile_min_level> is a letter specifying the smallest level accessible for the register file:
– "X" for a single level register file
– "P" for a 2-level register file
– "Q" for a 4-level register file
• <reg_subid_min_level> is the offset of the register at the smallest hierarchy level.

c. If the configuration only includes 1 bank (16 registers), then the range is only [0,15]

Note:


For example, when considering the register file V of the MP2x extension, with a two level hierarchy, registers are referenced as "MP2x_V0_P0", "MP2x_V0_P1", "MP2x_V1_P0", "MP2x_V1_P1" and so forth.

Registers are always specified at the smallest hierarchy level. Therefore, to disable the full V0 register, both subparts "V0_P0" and "V0_P1" must be specified in the clobber list.

Register file disambiguation

Due to the limited length of register file names, different register files may have similar names. To distinguish between the different register files, the register file name can be prefixed by an optional string, if necessary. The prefix has the following syntax:

%<registerfile_name><registerfile_smallest_level>%

Where:
• <registerfile_name> is the name of the scalar or SIMD audio extension register file
• <registerfile_smallest_level> is a letter specifying the smallest level accessible for the register file:
– “X” for a single level register file
– “P” for a 2-level register file
– “Q” for a 4-level register file.

6.2 Assumptions

The following assumptions apply.
• Output operand expressions must be lvalues.
• The compiler assumes that the input is consumed before the outputs are produced, unless an output operand has the ‘&’ constraint modifier (also called “early clobber”).

The compiler does not assign the same register to an input operand and an early clobber operand. However, the compiler may assign the same register to an input operand and to a non-early clobber output operand.

6.3 Volatile

The volatile syntax is either:

asm volatile (template : output operands : input operands : clobber list);

or:

__asm__ volatile (template : output operands : input operands : clobber list);

The volatile keyword indicates that an instruction has side effects. A volatile statement is not deleted if it is reachable. The order of volatile asm statements and/or other volatile accesses is preserved. A consecutive sequence of volatile asm statements may not stay perfectly consecutive, since some other instructions may be scheduled in between. To keep instructions perfectly consecutive, use a single asm statement.

An asm statement without any operands or clobbers is treated identically to a volatile asm statement, as is an asm statement without an output operand.


6.4 Restrictions

The following restrictions apply.
• The compiler does not parse the assembler instruction template; this means that it does not check if it is valid assembler input.
• Up to 10 operands, results and clobbered registers are allowed.
• Multiple alternative constraints are not supported.
• At -O3 and -O4 optimization levels, the loop nest optimizer is disabled for loops containing asm statements.

6.5 Differences between the STxP70 core versions

The VLIW/VLIS STxP70-4 is designed to be assembly compatible with STxP70-3, except for a few instructions. This means that assembly statements written for STxP70-3 should work on STxP70-4. The main exceptions are related to:
• the MAKE and MORE instructions, which should be replaced by a unique MAKE32 one on STxP70-4
• the SIMD comparisons, which are no longer supported on STxP70-4
• the “;;” pattern to be used to separate bundles of instructions. For compatibility reasons, this pattern becomes mandatory on both STxP70-3 and STxP70-4. Code without “;;” is still accepted on STxP70-3, but this deprecated syntax is strongly discouraged.

6.6 GNU ASM optimization

The compiler unrolls loops containing GNU asm statements. The compiler is not aware of the resource requirements introduced by the opaque asm statement, therefore the unrolling decision may be less precise compared with other situations.

It is possible to prevent the compiler from unrolling by using either an option or a #pragma.

If the asm statement contains any control-flow, it must be contained completely within the asm statement.

See


for information on #pragma unroll.


6.7 Example

The code example in Figure 25 illustrates a typical use of an asm statement on the STxP70 core.

Figure 25. Example of an asm statement

unsigned int foo(unsigned int * ptr)

{

unsigned int res;

unsigned int count;

unsigned int val;

asm (

" setls L1, L_2 ;;\n\t"

" setlc L1, 8 ;;\n\t"

" setle L1, L_3+-4 ;;\n\t"

" make %0, 0 ;;\n\t"

" make %1, 0 ;;\n\t"

"L_2:\n\t"

" lw %2, @(%3 !+ 4) ;;\n\t"

" cmpneu g0, %2, 0 ;;\n\t"

"g0? bset %0, %0, %1 ;;\n\t"

" add %1, %1, 1 ;;\n\t"

"L_3:\n\t"
: "=&r" (res), "=&r" (count), "=&r" (val), "+r" (ptr)
:
);
return res;

}

The example in Figure 25 delivers the assembly code given in Figure 26.

Figure 26. Example output of an asm statement

.entry
.global foo
foo:
L_BB1_foo:

or R4, R0, 0 ;;

setls L1, L_2 ;;

setlc L1, 8 ;;

setle L1, L_3+-4 ;;

make R1, 0 ;;

make R2, 0 ;;

L_2:

lw R3, @(R4 !+ 4) ;;

cmpneu g0, R3, 0 ;;

g0? bset R1, R1, R2 ;;

add R2, R2, 1 ;;

L_3:

L_BB2_foo:

or R0, R1, 0 ;;
rts

6.8 Parsing and optimization of GNU assembly statement

The STxP70 compiler is capable of parsing, analyzing and optimizing the content of the GNU assembly statements. The main optimizations it can achieve are those carried out at the lowest level of the compiler, for example scheduling, removal of useless instructions, constant propagation.

By default, the compiler does not perform any parsing and optimization of user defined assembly statements. This parsing and optimization feature can be enabled with the option -mparse-asmstmts.

Some GNU assembly statements are used internally by the compiler to map extension instructions from C code. By default, those specific internal assembly statements are parsed and optimized by the compiler. This parsing and optimization feature can be disabled with the option -mparse-meta-asmstmts.

7 Built-in functions

The stxp70cc compiler recognizes a number of built-ins. These are used to generate assembly language statements that cannot otherwise be expressed through standard ANSI C/C++.

The built-ins are specified and called just like standard ANSI C/C++ functions and procedures, using standard types. However, they are treated in a special way by the compiler. The built-ins apply to the STxP70 core instructions, X3 instructions, floating point FPx extension instructions, as well as scalar and SIMD audio extension (MPx) instructions.

On the core, FPx and MPx extension, built-ins may be needed to make use of instructions that the compiler cannot capture automatically, or to work around a missing optimization.

For technical reasons the set of core/X3 built-ins does not currently cover the full set of instructions. For instance, the load/store instructions are not available as built-ins. This also includes specific load/store instructions such as the lsetub instruction. Instructions that do not exist as built-ins can still be mapped by using the GNU assembly statements, see Chapter 6: GNU ASM on page 109.

7.1 Header files and C-models files

Several header and source files are provided to use built-ins for the core and for the X3, FPx and MPx extensions.
• A header file named builtins_<extension>.h contains the definitions of the built-ins themselves, as described in Section 7.2: Naming built-ins on page 116.
• A header and a source file named builtins_model_<extension>.h and builtins_model_<extension>.c respectively. These files contain the declaration and the definition of the STxP70 built-ins, modelled as C functions, and acting as executable specifications. This has the benefit that models can be used to develop specialized algorithms (DSP, video, and so on) on a workstation, and these can be immediately and safely ported to the STxP70 core and extensions.
• Finally, a generic header file named <extension>.h facilitates the use of built-ins or C-models, as explained in Section 7.3: Using built-ins from C on page 120. It includes the two headers mentioned above, plus the definition of some macros providing a unified view of built-ins and C-models. Only the generic header file for a given extension needs to be included in the application source code.

The <extension> suffix is one of:
• sx for STxP70 core
• x3 for STxP70 X3 extension
• fpx for FPx floating point and integer arithmetics extension
• the alias of the audio scalar or SIMD extension, for instance MP2x

The header and source files mentioned above are delivered with the current compiler distribution (except for the audio scalar and SIMD extensions).

7.2 Naming built-ins

Note:

The STxP70 built-ins make use of a flexible common naming scheme. The names of intrinsic built-ins and the corresponding C-models are complementary, and either are invoked (depending upon context) by using a dedicated simplified macro.
• The basic built-ins defined in file builtins_<extension>.h all have names in the form: __builtin_<extension>_<mnemonic>[_<operand_type>].
• Similarly, the names of the C-models found in the files builtins_model_<extension>.[c|h] are: __cmodel_<extension>_<mnemonic>[_<operand_type>].
• Finally, the generic macro defined in file <extension>.h gives a unified view of built-ins and C-models. Its simplified name is built as: <extension>_<mnemonic>[_<operand_type>].

<extension> is the alias of the core or the extension and is one of the following:
• sx for core
• x3 for X3 extension
• fpx for floating point extension
• the alias of a SIMD extension, for instance MP1x

<mnemonic> is the actual mnemonic of the instruction as it appears in the instruction set of either the core or the extension.

<operand_type> is optional. It appears only in built-ins for the core, and for X3 and FPx extensions. It is necessary when the given instruction may accept different types of operands; for instance, either a register or a literal. In such cases, this part of the name denotes the type of the operand, and may be one of the following:
• this element is absent if the instruction exists with only one type of operand
• r denotes an operand in a general purpose register
• iN denotes a literal operand of size N bits
• g denotes the instruction is guarded (used in X3 built-in names)

The operand types may appear in the name of the built-in in an order that differs from the order of the corresponding operands in the assembly instruction. For instance, writing the following built-in:

x3_cancelg_i8_i2_g(0x1, 0x5);

leads to the emission of the following assembly code:

cancelg b1, 5 ;;

The header files are located in the directory <toolsdir>/stxp70cc/4.1/include/models. This directory is pointed to by default when the code is compiled using stxp70cc. The <toolsdir> denotes the root folder of the toolset.

The C-models source files are located in the directory <toolsdir>/stxp70cc/4.1/src/models.

Example:

The core instruction addbp exists with a second operand that is either a register or a literal.

The corresponding built-ins are named as follows:
• int __builtin_sx_addbp_r(int, unsigned short) for register operand
• int __builtin_sx_addbp_i8(int, unsigned short) for u8 operand

The C-models have similar names:
• int __cmodel_sx_addbp_r(int, unsigned short) for register operand
• int __cmodel_sx_addbp_i8(int, unsigned short) for u8 operand

Finally, the unified macros for these built-ins and C-models are:
• sx_addbp_r when used for a register operand
• sx_addbp_i8 when used for a u8 operand

Note: The presence of the two leading underscores on each name denotes (according to the ISO/IEC 9899 C Standard) that no such name should be defined by the user. More specifically:

“All identifiers that begin with an underscore and either an upper case letter or another underscore are always reserved for any use.”

7.2.2 Types and special built-ins for audio scalar/SIMD extensions

The built-ins for audio scalar or SIMD extensions may require data types that cannot be mapped to C native types. Vector operations may also be present on those extensions. This means that the naming scheme is slightly different from the scheme used for the core or on the other extensions.

The naming convention for data type names reflects this scheme. The naming convention uses an alias for the MPx that is dedicated to audio applications, which is currently either a scalar (MP1x) or an SIMD (MP2x) extension.

Note: The instructions for these extensions are not currently mapped automatically by the compiler. They can only be invoked by using built-ins.

Data types

Scalar and SIMD audio extensions include two register banks at most. Each bank may have up to three consecutive “levels”, numbered from 0 to 2:
• level 0 corresponds to the full width of the register bank
• level 1 corresponds to the two halves of the register
• level 2 corresponds to the four quarters of the register

Furthermore, the register width is 2^n bits, ranging from 8 bits to 512 bits inclusive.

The names of the data types that can be allocated to such banks take this structure into account. They are built using the following template:

<extension>_<registerfile_name><register_level>


Where:
• <extension> is the alias of the SIMD extension
• <registerfile_name> is the name of the SIMD extension register file
• <register_level> is a letter denoting the type that can be allocated to this level:
– X stands for the full register width at level 0
– P stands for the sub-parts at level 1 (two halves)
– Q stands for the sub-parts at level 2 (four quarters). It is not instantiated on the current MPx.

Aligned data types

Since the data types of those extensions are likely to be larger than the default alignment of the stack (64 bits), some variants are also provided which impose a consistent alignment. Those aligned types have the special suffix _aligned appended to their names.

Example:

The MP2x extension contains a register bank called V with data accesses of 128 bits or 64 bits that supports two vector data types:
• MP2x_VX is a 128-bit data type
• MP2x_VP is a 64-bit data type
• MP2x_VQ is not instantiated
• MP2x_VX_aligned is a 128-bit data type, aligned to a 128-bit boundary

Special macros

The MP1x and MP2x extensions are all provided with a set of dedicated memory access and register move instructions. The latter can be invoked using dedicated macros that allow easy accesses to the register bank of the extension.

Example:

In the lines below, _part_ denotes the subpart of the wider register that can be represented by either a literal or a variable. _word_i_ denotes a 32-bit word to be assigned to the subpart i of the corresponding register.
• Make macro builds a constant in extension register:
– MP2x_make_VX(_VX_, _word_3_, _word_2_, _word_1_, _word_0_);
– MP2x_make_VP(_VP_, _word_1_, _word_0_);
– MP2x_make_VQ -> not instantiated
• Compose macro composes register subparts into a wider one:
– MP2x_compose_2xVP(_VX_, _VP_1_, _VP_0_);
– MP2x_compose_4xVQ -> not instantiated
– MP2x_compose_2xVQ -> not instantiated
• Split macro decomposes a register subpart into narrower ones:
– MP2x_split_2xVP(_VX_, _VP_1_, _VP_0_);
– MP2x_split_4xVQ -> not instantiated
– MP2x_split_2xVQ -> not instantiated


• Insert macro inserts a register subpart into a wider one:
– MP2x_insert_VP_into_VX(_VP_, _VX_, _part_);
– MP2x_insert_VQ_into_VX -> not instantiated
– MP2x_insert_VQ_into_VP -> not instantiated
• Extract macro extracts a register subpart from a wider one:
– MP2x_extract_VP_from_VX(_VP_, _VX_, _part_);
– MP2x_extract_VQ_from_VX -> not instantiated
– MP2x_extract_VQ_from_VP -> not instantiated

Specialized macros

Specialized versions of the insertion and extraction macros are provided to handle cases where the subpart of the wider register can be hard coded in the built-in name itself.

In the lines below, the macros do not accept an explicit _part_ parameter. The syntax of the name implicitly corresponds to a given subpart (for instance MP2x_insert_VP_into_VX0 takes the complete 64-bit register _VP_ and inserts it in the lowest half of the 128-bit register _VX_).
• Insert macro inserts a register subpart into a wider one:
– MP2x_insert_VP_into_VX0(_VP_, _VX_);
– MP2x_insert_VP_into_VX1(_VP_, _VX_);
– MP2x_insert_VQ_into_VX0 -> not instantiated
– MP2x_insert_VQ_into_VP0 -> not instantiated
– MP2x_insert_VQ_into_VP1 -> not instantiated
• Extract macro extracts a register subpart from a wider one:
– MP2x_extract_VP_from_VX0(_VP_, _VX_);
– MP2x_extract_VP_from_VX1(_VP_, _VX_);
– MP2x_extract_VQ_from_VX0 -> not instantiated
– MP2x_extract_VQ_from_VP0 -> not instantiated
– MP2x_extract_VQ_from_VP1 -> not instantiated



7.3 Using built-ins from C


This section explains the usage of the include files that are particular to built-ins and C-models.

All STxP70 built-in prototypes are available in the include files presented in Section 7.2: Naming built-ins on page 116.

To make use of the built-ins of the core, X3 extension, FPx extension or SIMD extensions in an application, the relevant header files (as listed below) must be included in the application sources.

#include <sx.h> // for the core,

#include <x3.h> // for the X3 extension,

#include <fpx.h> // for the FPx arithmetic extension,

#include <MP2x.h> // for the MP2x SIMD audio extension.

By default, the stxp70cc compiler generates machine instructions corresponding to the built-in functions found in the source code.

Example:

#include <sx.h>
...
int fct(int a, int b)
{
    int c;
    c=sx_lzc(a); // leading zero count
    return c;
}

The above code produces the following assembly code, where the lzc instruction of the core has been properly mapped.

.global fct
fct: // 0x0
L_BB1_fct: // 0x0
lzc R0, R0
rts

In this case, it is equivalent to write the source code as:

#include <builtins_sx.h>

int fct(int a, int b)
{
    int c;
    c=__builtin_sx_lzc(a);
    return c;
}

This is because the macro sx_lzc is just mapped on the full built-in __builtin_sx_lzc by default as soon as the code is compiled for an STxP70 target.

120/166 8027948 Rev 15

UM1237

7.3.2

7.3.3


Standard use of built-in C-models

•

•

By default, the C-model files are designed to permit the use of the C-model on any host machine except the STxP70. There is no need to modify the source code. However, it is necessary to: add the path of the inc directory of the compiler in the toolset installation to the list of include paths add the file containing the source of the C-models to the list of source files to be compiled

Example:

Assuming that the toolset is installed in a directory <tools_dir> (for example, /home/myfolder), and that a small file containing calls to core built-ins is to be compiled with C-models using a GCC compiler, the command line should contain the -I directive and the source file shown below.

gcc -I<tools_dir>/stxp70cc/4.1/include/models \
    <tools_dir>/stxp70cc/4.1/src/models/builtins_model_sx.c ...

7.3.3 Use of built-in C-models on STxP70 target

In a few cases, it may be necessary to compile application code using C-models, rather than actual machine instructions, even on the STxP70 target. This may be useful, for example, for testing or debugging purposes.

This can be done either by calling the C-model explicitly, or by using the macro instead (thereby avoiding having to make any change to the source code). In the example given in Section 7.3.3, the following lines should appear:

#ifdef __SX__   // code is compiled for a STxP70 target
#undef __SX__   // hide the target and use the non STxP70 settings
#include <sx.h>
#define __SX__  // return to the regular settings for STxP70
#endif
...
int fct(int a, int b)
{
    int c;
    c = sx_lzc(a); // leading zero count C model is used
    return c;
}


8 MPx native support

8.1 Goal of the MPx scalar support

The goal of the MPx native support is to generate MPx code automatically from standard C code. The compiler:

• detects variables that can beneficially be allocated to the MPx register file

• inserts required type conversions in the internal representation (also called “alien type conversion”)

• detects some patterns of instructions that can beneficially be replaced by MPx integer or fractional instructions.

Legacy source code that already contains variables explicitly allocated to the MPx register file and calls to MPx built-ins is not affected by these changes. It is compiled as before and the generated assembly remains the same.

Note: The SIMD variants of the MPx benefit from the same level of (scalar) support as the scalar variant. This means that the SIMD aspects of those variants are not dealt with by the compiler.

These new features allow the porting of applications to the MPx with less effort than previously, because:

• the extension type is no longer required, except in specific cases

• the use of intrinsics is more limited

In addition to pure audio applications, long long arithmetic also benefits from this support.

This chapter describes the scope of the MPx support, and explains how it can be used.

Examples are provided to help with comprehension.

8.2 Control of the MPx native support

By default, native support of the MPx is enabled in the compiler when:

• the code is compiled for the MPx, that is, when the option -Mextension=MP1x is set

• the optimization level is equal to either -O2, -O3, -O4 or -Os

• the mapping of fractional instructions is enabled using the option -Mextoption=MP1x:enablefractgen (formerly called -Menablefractgen or -Mfractsupport, see Section 8.3.4: Pattern recognition for integer and fractional data types on page 125).

It is possible to disable this native support by using the option: -Mnoextgen

Pragmas provide fine-grained control of MPx support. They allow the developer to enable or disable MPx support in a given set of functions, declared as arguments to the pragmas, overriding the option passed to the compiler.

The syntax is as follows:

• #pragma disable_extgen (foo1, foo2)
disables MPx scalar support in functions foo1 and foo2 in the file where it is placed, even if option -Mextension=MP1x is set and the optimization level is higher than -O1.

• #pragma force_extgen (foo1, foo2)
forces MPx scalar support in functions foo1 and foo2 in the file where it is placed, even if option -Mnoextgen is set.

These file-scope pragmas must be placed at the beginning of a file. They affect all variants of the MPx (that is, both the scalar and SIMD variants).

A more focused version is also provided:

• #pragma disable_specific_extgen (extname, foo1, foo2)
disables scalar support on the specified extension in functions foo1 and foo2 in the file where it is placed, even if option -Mextension=MP1x is set and the optimization level is higher than -O1.

• #pragma force_specific_extgen (extname, foo1, foo2)
forces scalar support on the specified extension in functions foo1 and foo2 in the file where it is placed, even if option -Mnoextgen is set.

These file-scope pragmas must be placed at the beginning of a file. They affect all variants of the MPx (that is, both the scalar and SIMD variants).
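As an illustration, a source file using the first pair of pragmas might be laid out as in the sketch below. The function names control_path and accumulate are hypothetical; only the pragma spellings come from this section. Compilers other than stxp70cc are expected to ignore the pragmas, possibly with a warning.

```c
/* File-scope pragmas come first, before any function definition. */
#pragma disable_extgen (control_path)
#pragma force_extgen (accumulate)

/* Hypothetical function where MPx support is switched off. */
long long control_path(long long state)
{
    return state + 1;
}

/* Hypothetical function where MPx support is forced on. */
long long accumulate(const int *samples, int n)
{
    long long acc = 0;
    for (int i = 0; i < n; i++)
        acc += (long long)samples[i] * samples[i];
    return acc;
}
```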

8.3 Scope of the MPx native support

This section presents an overview of the features available in MPx native support. It consists of three main levels:

• built-in based support (already present in toolset 3.2.0)

• support of type equivalence between long long integer and MPx data types (new in toolset 3.3.0)

• automatic MPx code generation on MPx instructions and long long integer arithmetic (new in toolset 3.3.0)

Besides the overview presented in this section, the latter two levels are documented in detail in Section 8.4: Type equivalence and Section 8.5: Automatic code generation.

The native support now includes a limited pattern recognition facility, which can detect more complex patterns like mac for both integer and fractional data types.

8.3.1 Built-in based support with MPx_Vx type

This feature has been available since toolset release 3.2.0. With this level of support, the developer explicitly uses MPx built-ins and MPx types to write an application for the MPx, as in the following example C code:

MPx_Vx a, b, c;

MPx_ADDD(c, a, b);

This code places three 64-bit variables, a, b and c, in the MPx_Vx register set. It uses the MPx addition instruction to add a and b, storing the result in c. Since it uses built-ins and specific data types, this code is neither generic nor portable to another processor.

8.3.2 Support of type equivalence between long long and MPx_Vx

The MPx_Vx type matches the MPx registers, and is therefore semantically equivalent to the long long native type of the C language. In order to limit the work needed to port applications to the MPx, the compiler handles the semantic equivalence between MPx_Vx and long long. This means that the user can declare variables as long long type instead of MPx_Vx. The compiler is responsible for placing them in the MPx registers, if there is a benefit to be gained.

With this support, the C code in the example above can be simplified as follows:

long long a, b, c;
MPx_ADDD(c, a, b);

This C code is more portable, as it does not involve any specific type. Only the intrinsic (MPx_ADDD) is still specific. The code generated by the compiler is the same as the code generated with MPx types.

Warning: The heuristics currently used to place variables into MPx registers are based on a quite systematic behavior: as soon as a variable appears as an MPx_Vx parameter in an MPx built-in, it is placed in an MPx register. The explicit use of the MPx_Vx type in new code should be avoided and the long long data type used instead. More details can be found in Section 8.6: Important remarks and known limitations on page 129.

8.3.3 Automatic MPx code generation on long long arithmetic

The MPx instruction set includes long long integer arithmetic instructions (add, sub, shift, and so forth). In previous versions of the toolset, it was necessary to use built-in functions to map those instructions. In order to limit the effort when porting applications to the MPx, the current version of the compiler automatically maps these operations to MPx instructions.

The above example (Section 8.3.2) can now be written in standard C:

long long a, b, c;
c = a + b;

The compiler now ensures both:

• the placement of the variables a, b and c in the MPx registers

• the mapping of the MPx_ADDD instructions

In addition to pure arithmetic operations, the MPx also provides instructions that:

• clear the contents of an MPx register

• copy the contents of one MPx register into another

The compiler also maps the following instructions when dealing with either an assignment to zero or a copy operation:

long long a = 0; // mapped to an MPx register clear instruction
long long b = c; // mapped to an MPx register copy instruction
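Put together, the three cases (clear, addition, copy) fit in a function that contains no MPx type and no built-in. This is only a sketch: the function name is hypothetical, and whether each statement is really mapped to an MPx instruction depends on the options described in Section 8.2.

```c
/* Plain C: no MPx type and no built-in appears in the source. */
long long sum_and_copy(long long a, long long b)
{
    long long acc = 0;     /* assignment to zero */
    long long c = a + b;   /* 64-bit addition */
    acc = c;               /* copy from one variable to another */
    return acc;
}
```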

8.3.4 Pattern recognition for integer and fractional data types

The compiler provides pattern recognition capabilities to detect a set of complex patterns and map them to their equivalent MPx instructions. These capabilities address both integer and fractional instructions.

The list of recognized instructions is provided in Table 28.

Table 28. Pattern recognition

Mnemonic  Equivalent source code               Comment
mafw      ll1+((long long)i1*i2)<<1            Requires -Mextoption=MP1x:enablefractgen
msfw      ll1-((long long)i1*i2)<<1            Requires -Mextoption=MP1x:enablefractgen
mpfw      ((long long)i1*i2)<<1                Requires -Mextoption=MP1x:enablefractgen
mahll     (long long)((int)ll1+(int)ss1*ss2)   32b MAC with 16b multiplicands
mshll     (long long)((int)ll1-(int)ss1*ss2)   32b MAC with 16b multiplicands
shlrr2x   (long long)i1<<i2                    -
shrr2x    (int)(ll1<<i2)                       -
andcd     (ll1 & (!ll2)                        -
mph       (long long)ss1*ss2                   -
mpw       i1*i2                                32b multiplier when no X3/FPx
maw       ll1+(long long)i1*i2                 -
msw       ll1-(long long)i1*i2                 -

Note: The first three rows correspond to fractional instructions, which are subject to specific limitations (Section 8.6.6: Limitations regarding mapping of fractional instructions on page 131). Their mapping is therefore only performed if the dedicated flag -Mextoption=MP1x:enablefractgen is set.
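For reference, three of the integer rows of Table 28 can be written as ordinary C functions, each keeping the expression in a single statement with the cast exactly as listed. The function names are hypothetical, and the sketch makes no claim about the instructions a given compiler release actually emits.

```c
/* mph row: 16-bit operands, 64-bit product. */
long long mul16(short ss1, short ss2)
{
    return (long long)ss1 * ss2;
}

/* maw row: 64-bit accumulator plus a 32x32 product. */
long long mac32(long long ll1, int i1, int i2)
{
    return ll1 + (long long)i1 * i2;
}

/* msw row: 64-bit accumulator minus a 32x32 product. */
long long msu32(long long ll1, int i1, int i2)
{
    return ll1 - (long long)i1 * i2;
}
```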


The example code listed here summarizes the equivalences that are accepted or rejected by the compiler front-end when MPx support is enabled.

Figure 27. Summary of type equivalence with MPx support

// declaration of variables
MPx_Vx gvx;     // forced to MPx
long long gll;  // candidate to placement in MPx registers
int gi;         // to be placed in GPR

// Initialisation of global variables
MPx_Vx gvx_2 = 1234LL;                  // Accepted
MPx_Vx gvx_3 = (long long) 11.3f;       // Accepted
MPx_Vx gvx_array[4] = {1, 10, -1, -10};

void foo(long long In) {
    ...
    // Assignments of local variable using function parameters
    MPx_Vx A = In;

    // Assignment of local variable using a constant
    MPx_Vx B = 12LL;

    // Constant assignment of global variables
    gvx = 0LL;     // Accepted
    gvx = 1234LL;  // Accepted
    gvx = 0x12LL;  // Accepted
    gvx = 0;       // Accepted
    gvx = 1234;    // Accepted
    gvx = 0x12;    // Accepted

    // Variable assignment of global variables
    gvx = gll;                     // Accepted
    gll = gvx;                     // Accepted
    gvx = (unsigned long long)gi;  // Accepted
    gvx = (long long)gi;           // Accepted
    gi = (int)gvx;                 // Accepted
    gi = (unsigned int)gvx;        // Accepted

    // Unary/binary operator (not planned to be supported, use long long var instead)
    gvx = gvx + gvx;               // Not supported (error msg from front-end)

    // Usage of long long variable in builtin calls
    MPx_ADDD(gll, gll, gll);       // Accepted

    // Usage of long long variable in builtin calls (in/out param)
    MPx_MAFW(gvx, 1, 2);           // Accepted

    // Usage of long long constant in builtin calls
    MPx_ADDD(gll, 1234LL, 123LL);  // Accepted
}

The result of instructions and built-ins in their functional form is always considered unsigned by convention. However, the actual type might be signed, and not explicitly visible to the compiler. This must be taken into account, especially when writing comparisons.

For example, the following code is incorrect:

if (MP1x_SUBS_f(a, b) < 0) {

Because the MP1x_SUBS_f() result is unsigned, the compiler considers the comparison to be always false, and the corresponding block is therefore eliminated as dead code.

The main recommendation for built-ins usage is to avoid the functional form and use only the procedural version, in which the type of the result is given explicitly by the developer, for example:

int res = MP1x_SUBS_f(a, b);
if (res < 0) {

Alternatively, it is also possible to explicitly cast the built-in result to the proper type:

if ((int)MP1x_SUBS_f(a, b) < 0) {

However, the first method described using the procedural version is the preferred method.
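The effect can be reproduced on any host with a stand-in function. In the sketch below, subs_f is a hypothetical replacement for a built-in in functional form (it is not MP1x_SUBS_f); it returns an unsigned value whose bit pattern represents a signed difference.

```c
#include <stdint.h>

/* Hypothetical stand-in for a built-in in functional form. */
static uint32_t subs_f(int32_t a, int32_t b)
{
    return (uint32_t)a - (uint32_t)b;
}

int is_negative_wrong(int32_t a, int32_t b)
{
    return subs_f(a, b) < 0;   /* unsigned < 0: never true */
}

int is_negative_right(int32_t a, int32_t b)
{
    /* Give the result its real type first. The conversion of an
     * out-of-range value is implementation-defined in ISO C; common
     * compilers reinterpret the bits. */
    int32_t res = (int32_t)subs_f(a, b);
    return res < 0;
}
```

A compiler may warn that the comparison in is_negative_wrong is always false, which is exactly the situation described above.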

8.5 Automatic code generation

8.5.1 Scope and principle

Some of the instructions available on the MPx map operations from C code. This limits the need for intrinsics, and contributes to performance enhancements. Two cases are possible.

• The operation derived from the C code matches one of the instructions of the MPx. For instance, this is the case with 64-bit addition, which can be mapped on the MPx ADDD instruction.

• The operation derived from the C code fits a sequence of instructions which may belong to either the core or the MPx instruction set. For instance, a 64-bit “min” operation does not exist on the MPx, but it can be emulated using a sequence of instructions involving both core and MPx instructions (MPx and core comparisons). These sequences are called “meta-instructions”.

The second case is especially useful, because it makes more extensive use of the MPx instructions with lower effort at developer level. In addition to the pure audio applications for which it is designed, MPx support can also bring significant gains in applications that handle long long arithmetic.

In the current release of the compiler, the following C operations are directly mapped to individual MPx instructions:

• 64-bit signed and unsigned addition mapped to ADDD

• 64-bit signed and unsigned subtraction mapped to SUBD

• 64-bit left shift signed and unsigned mapped to SHLRD

• 64-bit arithmetic right shift signed mapped to SHRRD

• 64-bit arithmetic right shift unsigned mapped to SHRURD

• 64-bit logical right shift signed and unsigned mapped to SHRURD

• 64-bit negate signed and unsigned mapped to NEGD

• 64-bit bitwise NOT signed and unsigned mapped to NOTD

• 64-bit bitwise OR signed and unsigned mapped to ORD

• 64-bit bitwise AND signed and unsigned mapped to ANDD

• 64-bit bitwise exclusive OR (XOR) signed and unsigned mapped to XORD

• 64-bit bitwise negate OR (NOR) signed and unsigned mapped to NORD
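In source form, a few of these operations look as follows. The function names are hypothetical; the mnemonic in each comment is the one listed above for that operation, and the actual mapping still depends on the compiler options in use.

```c
#include <stdint.h>

int64_t  add64(int64_t a, int64_t b)   { return a + b; }     /* listed as ADDD   */
int64_t  sub64(int64_t a, int64_t b)   { return a - b; }     /* listed as SUBD   */
uint64_t shl64(uint64_t a, int n)      { return a << n; }    /* listed as SHLRD  */
uint64_t shr64(uint64_t a, int n)      { return a >> n; }    /* listed as SHRURD */
uint64_t not64(uint64_t a)             { return ~a; }        /* listed as NOTD   */
uint64_t nor64(uint64_t a, uint64_t b) { return ~(a | b); }  /* listed as NORD   */
```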

8.5.3 Operations mapped to meta-instructions

The following operations of the C language are mapped to or emulated by meta-instructions:

• the ten 64-bit signed and unsigned comparisons (equal to, not equal to, greater than, less than, greater than or equal, less than or equal)

• the 64-bit signed and unsigned min

• the 64-bit signed and unsigned max

• the 64-bit absolute value

• the 64-bit signed and unsigned multiplication (takes two 64-bit operands and returns a 64-bit result)

• the 32-bit signed and unsigned multiplication (takes two 32-bit operands and returns a 32-bit result) (d)

• the 32-bit to 64-bit conversions

The number of actual instructions present in each meta-instruction depends on the complexity of the computation: for instance, comparisons are implemented in two instructions at most, whereas the 64-bit multiplication takes about 25 instructions.

d. This mapping allows 32-bit multiplications to be mapped to the MPx multiplier in case the X3 or FPx 32-bit multiplier is not present in the configuration. Note however that in this case the resulting code is less efficient than with the 32-bit multiplier, since it requires one more instruction to extract the lower 32-bit part of the result.
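The min, max and absolute value operations in this list correspond to C code of the following shape. The function names are hypothetical and the sketch only shows the source side; the instruction sequence chosen for each one is up to the compiler.

```c
/* 64-bit min and max written with a comparison and a selection. */
long long min64(long long a, long long b) { return (a < b) ? a : b; }
long long max64(long long a, long long b) { return (a > b) ? a : b; }

/* 64-bit absolute value; undefined in C for the most negative value. */
long long abs64(long long a) { return (a < 0) ? -a : a; }
```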


8.6 Important remarks and known limitations


As already mentioned in the warning in Section 8.3.2, the MPx_Vx type should be avoided when writing new code. The following combinations are especially discouraged:

• simultaneous use of long long and MPx_Vx types in the same function

• C long long arithmetic applied to variables declared as MPx_Vx

For instance, the compiler considers the following code illegal:

MPx_Vx a;
long long b, c;
c = a + b;

Note: These restrictions do not affect legacy code, as this is only based on a combination of MPx types and built-ins.

8.6.2 Long long passed as function parameters

The ABI of the STxP70 core specifies that function arguments are passed in the core registers. This applies to long long variables as well, and this must be taken into account when making the choice to declare a variable as either MPx or long long type.

Consider the following code:

extern int bar ( long long );
int foo ( long long a ) {
    return ( bar ( a ) );
}

In this example, it makes no sense to store the long long variables in MPx_Vx registers, as the core registers are used for the function call in any case.

8.6.3 Long long life span crossing function call

The STxP70 ABI states that MPx registers are all considered to be scratch registers. This means that they do not retain their values across a function call.

Consider the following code:

int foo() {
    long long a;
    a = 0L;
    bar();
    a = a + b;
    [...]
}

In this example, if a is promoted to MPx_Vx for its full life span, it may be spilled (e) by the register allocator, which is extremely costly. A developer must bear this in mind when writing MPx code. Note that the cost is neither assessed nor handled by the compiler, so it is the developer’s responsibility to use the most efficient placement.

e. “Spilled” means that the contents of the register are temporarily stored in memory and then restored when needed.
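One way to act on this remark, where the algorithm allows it, is to reorder the code so that no 64-bit value is live across the call. The sketch below is an assumption about how a developer might restructure such code, not a documented compiler behavior; bar, b and both function names are hypothetical.

```c
/* Hypothetical callee and global operand. */
static int calls;
static long long b = 10;
static void bar(void) { calls++; }

/* 'a' is live across the call to bar(): if it sits in a scratch
 * register, it has to be saved and restored around the call. */
long long crossing(void)
{
    long long a = 5;
    bar();
    return a + b;
}

/* Same result, but 'a' is only created after the call, so no local
 * 64-bit value needs to survive it. */
long long not_crossing(void)
{
    bar();
    long long a = 5;
    return a + b;
}
```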

8.6.4 Efficiency of code in meta-instructions

Currently, the compiler does not optimize the code in the meta-instructions. In those parts of code, the compiler performs register allocation, but it does not schedule the instructions, nor does it perform any advanced optimizations. Even if the code has been designed for efficiency, it is possible that sub-optimal patterns may exist in the final code if MPx native support is enabled.

This limitation might be overcome in future versions of the tools.

The current pattern recognition algorithm is limited, and only able to recognize an expression if:

• the conversions are made explicit by casts, and correspond to the exact model of the instruction to be recognized

• the expression is located in a single C statement

Exact type conversions

For example, in the following code, the maw instruction is not recognized because of implicit type conversions:

long long mac;
int a, b;
mac += a*b; // multiplication result is 32bit

However, the maw instruction is recognized in the following code:

long long mac;
int a, b;
mac += (long long)a*b; // multiplication result is 64bit

Single statement expressions

A pattern is more likely to be recognized if it occurs within a single statement. For example, avoid code that resembles the following, as it may result in missed opportunities to map the maw instruction:

long long mac;
int a, b;
long long tmp;
tmp = (long long)a*b;
mac += tmp;

On the other hand, the maw instruction is always recognized in the code below:

long long mac;
int a, b;
mac += (long long)a*b;
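The cast is not only a hint for pattern recognition: it changes the value that is computed. The following sketch shows the difference on a host compiler. It uses unsigned types so that the 32-bit wrap-around is well defined in C, and it assumes a platform where int is 32 bits wide; the function names are hypothetical.

```c
#include <stdint.h>

/* Product computed in 32 bits, then widened: the high bits are
 * already lost before the addition. */
uint64_t mac_narrow(uint64_t mac, uint32_t a, uint32_t b)
{
    return mac + a * b;
}

/* Operand widened first: full 64-bit product. */
uint64_t mac_wide(uint64_t mac, uint32_t a, uint32_t b)
{
    return mac + (uint64_t)a * b;
}
```

For a = b = 100000, the first function accumulates 1410065408 (the product modulo 2^32) whereas the second accumulates 10000000000.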

8.6.6 Limitations regarding mapping of fractional instructions

The automatic mapping of fractional instructions is disabled by default. It is enabled only if the flag -Mextoption=MP1x:enablefractgen (f) is set.

Take care when enabling the automatic mapping of fractional instructions. It may induce two changes to the behavior:

1. The fractional instructions of the MPx are likely to modify the value of the saturation flag. Consequently it is not safe to enable these instructions if the code contains built-ins that use saturation. This change is clearly a non-conservative one.

2. The use of fractional instructions modifies the behavior of overflow. The wrap-around performed in the scope of integer arithmetic is changed into clamping. Notice that this change is still conservative, as it remains compliant with the C standard. However, it introduces discrepancies between the core and the MPx with regard to the result of arithmetic overflow. For example, the multiplication of 0x7FFFFFFFFFFFFFFF with 0x7FFFFFFFFFFFFFFF provides the following results:

– without mapping fractional instructions: 0x0000000000000001,

– with mapping of fractional instructions: 0x7FFFFFFC00000001.

Warning: The automatic recognition and mapping of fractional instructions should be enabled only if the following conditions are met:

- source code does not already contain built-ins that may read the saturation flag (otherwise, the semantics may not be preserved)

- clamping is acceptable for handling arithmetic overflow
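The difference between clamping and wrap-around can be illustrated with a generic model of a 32x32 fractional multiply, (a*b) << 1. This is a sketch of the two overflow policies only; it is not a bit-exact model of any MPx instruction, and the function names are hypothetical.

```c
#include <stdint.h>

/* Doubling of the 64-bit product with the result clamped to the
 * signed 64-bit range. The only overflowing input pair is
 * INT32_MIN * INT32_MIN. */
int64_t fract_mul_clamped(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;      /* always fits in 64 bits */
    if (p > INT64_MAX / 2)           /* doubling would overflow */
        return INT64_MAX;
    return p * 2;
}

/* Same computation with wrap-around, done in unsigned arithmetic so
 * that the overflow is well defined in C. */
uint64_t fract_mul_wrapped(int32_t a, int32_t b)
{
    uint64_t p = (uint64_t)((int64_t)a * b);
    return p << 1;
}
```

Code that relies on the wrapped value gives a different result once clamping is introduced, which is why the conditions in the warning above matter.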

The mapping of saturated arithmetic and the mapping of the cross register left shift instructions are not supported by the compiler.

f. The name of this option has changed: it was formerly named -Menablefractgen or -Mfractsupport, which was not accurate enough. The former name is still recognized, but its use is strongly discouraged.

8.7 Examples

Consider a simple function that performs the addition and shift of two long long input parameters, and returns the result as a long long integer:

long long fct(long long a, long long b)
{
    long long tmp;
    tmp = a + b;
    tmp = tmp << 2;
    return tmp;
}

No MPx support

When MPx is not present and MPx support is not enabled (stxp70cc -O3 test.c), then the code generated relies solely on core instructions and runtimes:

.global fct
fct:

L_BB1_fct:

make R4, 0 ;;

addcu R4, R4, R4 ;;

addcu R0, R0, R2 ;;

make R2, 2 ;;

addcu R1, R1, R3 ;;

.global __shll

.type __shll, @function

jr __shll ;;

MPx support

When MPx is present and MPx support is enabled (stxp70cc -O3 -Mextension=MP1x test.c), then MPx instructions are mapped where needed:

.global fct
fct:

L_BB1_fct:

XRF0RR2X V0, R1, R0 ;;

XRF0RR2X V1, R3, R2 ;;

ADDD V0, V0, V1 ;;

SHLID V0, V0, 2 ;;

XRF0CSX2R R0, V0, V0 ;;

XRF0CSX2R R1, V0, V0 ;;

rts ;;

Note: 1 The moves between the core and the MPx registers are introduced to deal with ABI constraints. Those instructions are necessary only because the addition is isolated in a function. They are not present in successive long long arithmetic operations, and do not represent any extra cost. (Consequently, they are shown here in italic.)

2 The MPx instructions are mapped automatically (ADDD, SHLID) to perform long long operations.



Consider a piece of code that involves long long operations that do not fit a single MPx instruction. The following example is a function to find the maximum value between two alternatives, a and b.

long long fct(long long a, long long b)

{

long long tmp;

if(a>b) tmp=a;

else tmp=b;

return(tmp);

}

No MPx support

When MPx is not present and MPx support is not enabled (stxp70cc -Os test.c), the code generated relies only on core instructions and runtimes:

.global fct
fct:

L_BB1_fct:

cmpeq G0, R1, R3 ;;

cmpgtu G1, R0, R2 ;;

andg G0, G0, G1 ;;

cmpgt G1, R1, R3 ;;

org G0, G0, G1 ;;

G4? or R4, R2, 0 ;;

G0? or R4, R0, 0 ;;

G0? or R3, R1, 0 ;;

or R1, R3, 0 ;;

or R0, R4, 0 ;;

rts ;;

The core of the computation consists of those instructions that are not in italic. The sequence contains three comparisons and two boolean operations (GMI).

MPx support

When MPx is present and MPx support is enabled (stxp70cc -Os -Mextension=MP1x test.c), only two comparisons are needed. (The instructions in italic are not taken into account, as they are mainly needed because of the encapsulation of the code in a function.)

.global fct
fct:

L_BB1_fct:

XRF0RR2X V3, R1, R0 ;;

XRF0RR2X V2, R3, R2 ;;

cmpgtx2r R0, V3, V2 ;;

cmpne G0, R0, 0 ;;

L__0_4:

G4? XRF0CSX2R R0, V0, V2 ;;

G0? XRF0CSX2R R2, V1, V3 ;;

G4? XRF0CSX2R R1, V0, V0 ;;

G0? or R0, R2, 0 ;;

G0? XRF0CSX2R R2, V1, V1 ;;

G0? or R1, R2, 0 ;;

rts ;;

8.7.3 Case of the 32-bit multiplication

Consider the function below, which performs the multiplication of two 32-bit integers and returns the result as a 32-bit integer:

int fct(int a, int b)
{
    return (a*b);
}

The resulting assembly code depends on compiler options and core configuration.

No X3 multiplier, no MPx support

If code is compiled without the X3 32-bit multiplier and without the MPx native support (stxp70cc -O3 -Mconfig=mult:no test.c), then a runtime is called:

.global fct
fct:

L_BB1_fct:

.global __mulw

.type __mulw, @function

jr __mulw ;;

X3 multiplier, no MPx support

If code is compiled with the X3 32-bit multiplier, and without the MP1x support (stxp70cc -O3 -Mconfig=mult:yes test.c), then the 32-bit multiplication available in X3 is mapped:

.global fct
fct:

L_BB1_fct:

mp R0, R0, R1 ;;

rts ;;

No X3 multiplier, MPx support

If code is compiled without the X3 32-bit multiplier, but with the MPx support enabled (stxp70cc -O3 -Mextension=MP1x test.c), then the MPx 64-bit multiplier emulates a 32-bit multiplication. This requires one more instruction to extract the proper 32-bit result:

.global fct
fct:

L_BB1_fct:

mpw V2, R0, R1 ;;

xrf0csx2r R5, V2, V2 ;;

L__0_2:

or R0, R5, 0 ;;

rts ;;

Note: If both the X3 32-bit multiplier and the 64-bit MPx multiplier can be used to map a 32-bit multiplication, then the X3 multiplier is preferred.
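The extra extraction step of the MPx variant has a direct C equivalent: compute a 64-bit product and keep its lower half. The sketch below models this on a host; the function name is hypothetical, and unsigned types are used so that the truncation is well defined in C.

```c
#include <stdint.h>

/* 32-bit multiply through a 64-bit product, keeping only the low
 * 32 bits of the result. */
uint32_t mul32_via64(uint32_t a, uint32_t b)
{
    uint64_t wide = (uint64_t)a * b;   /* 64-bit product */
    return (uint32_t)wide;             /* extract the lower 32 bits */
}
```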


9 Relocatable loader library


This chapter describes how dynamic loading is implemented using the relocatable loader library RL_LIB for the STxP70.

Table 29 and Table 30 list a number of acronyms and definitions used within this chapter.

Table 29. Acronyms

Acronym  Term
DLL      Dynamic link library
DSO      Dynamic shared object
GOT      Global offset table
GP       Global pointer – alias of R13 register in STxP70 ABI
PC       Program counter register
PIC      Position independent code
PID      Position independent data

Table 30. Definitions

Term        Definition

Preemption  Sometimes you may need to use some of the functions or data items from a shareable object, but may wish to replace others with your own definitions. For example, you may want to use the standard C runtime library shareable object, libc.so, but to use your own definitions for the heap management routines malloc() and free(). In this case it is important that calls to malloc() and free() within libc.so call your definition of the routines and not the definitions present in libc.so. Your definition should override, or preempt, the definition within the shareable object. This feature of shareable objects is called symbol preemption.

Relocation  Relocation is the process of connecting symbolic references with symbolic definitions. For example, when a program calls a function, the associated call instruction must transfer control to the proper destination address at execution. In other words, relocatable files must have information that describes how to modify their section contents, thus allowing executable and shared object files to hold the right information for a process's program image.


This section provides an introduction to the concepts used for dynamic linking.

All code within a dynamic link library (DLL) should be position independent (PIC). This allows the text segment of the DLL to remain pure so that it can be shared among many processes. Position-independence imposes two requirements on generated code:

• Code that forms an absolute address referring to any address in the DLL’s text or data segments is not allowed, because the code would have to be relocated at load time, making it non-sharable. All branches must be PC-relative, and references to the data segment and to constants and literals in the text segment must be relative to a base pointer (typically GP).

• Code that references symbols that are or may be imported from other loaded modules must use indirect addressing through a global offset table (GOT). The linker is expected to resolve procedure calls by creating import stubs, and the compilers must generate indirect loads and stores for data items that may be dynamically bound. In both cases, the indirection is made through the global offset table, allocated by the linker, and initialized by the dynamic loader. The global offset table is described in Procedure calls and long branch stubs through to Materializing function pointers on page 138.

Procedure calls and long branch stubs

Normal procedure calls can be prepared with the call instructions, which use PC-relative addressing. There are three possible cases at link time:

• If the target is not within the same module, or if it is subject to preemption by an earlier definition from another loaded module, the linker must allocate an import stub and resolve the relocation of the call instruction to the stub.

• If the target is known to be within the same module and the displacement is small enough, the call instruction can be statically resolved to the call target.

• If the target is within the same load module, but the displacement is too far for the call instruction, the linker must allocate a long branch stub. The long branch stub itself must satisfy the PIC requirements. If the target is within range of the stub, the stub may use a PC-relative goto instruction; otherwise, it must load the address of the target from the global offset table.
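The three cases can be summarized as a small decision function. This is only a model of the rule stated above, not linker code: the type, the function and the parameter max_disp (the reach of the call instruction, in bytes) are hypothetical, and the real reach depends on the instruction encoding.

```c
#include <stdint.h>

typedef enum { DIRECT_CALL, LONG_BRANCH_STUB, IMPORT_STUB } call_kind;

/* Decide how a call relocation is resolved at link time. */
call_kind resolve_call(int same_module, int preemptible,
                       int64_t displacement, int64_t max_disp)
{
    if (!same_module || preemptible)
        return IMPORT_STUB;        /* target bound at load time */
    if (displacement >= -max_disp && displacement <= max_disp)
        return DIRECT_CALL;        /* resolved statically */
    return LONG_BRANCH_STUB;       /* same module, out of reach */
}
```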

Access to the data segment

The DLL’s data segment must be accessed through the GP value that must be set by a DLL procedure before any use. The GP value is used to access both global offset tables and statically allocated data.

There are several cases:

• Global variables that are imported from another load module, or that are subject to preemption by an earlier definition in another load module, must be accessed indirectly through the global offset table. The compiler must generate code to load a pointer from the global offset table, using GP-relative addressing mode, and then access the data item using that pointer. The compiler does not have to allocate the global offset table; there are relocations defined in the object file format that instruct the linker to allocate a global offset table slot and to supply the GP-relative address of that slot.

• Statically allocated variables of local scope, or global variables whose definitions are not subject to preemption, may be accessed directly with GP-relative addressing mode.
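The two access paths can be modelled in plain C, with a structure standing for the data area addressed through GP. This is a toy model under stated assumptions: all names are hypothetical, and a real GOT is laid out by the linker and filled in by the dynamic loader, not by application code.

```c
/* 'gp' points at the module's data area; 'got' plays the role of the
 * global offset table inside it. */
struct module_data {
    int *got[2];         /* slots filled in by the "loader" */
    int  local_counter;  /* not preemptible: direct GP-relative access */
};

static int imported_value = 7;   /* stands for data in another module */

void load_module(struct module_data *gp)
{
    gp->got[0] = &imported_value;  /* loader binds the symbol */
    gp->local_counter = 0;
}

int read_imported(const struct module_data *gp)
{
    int *p = gp->got[0];           /* load the pointer from the GOT slot */
    return *p;                     /* then access the data through it */
}

int bump_local(struct module_data *gp)
{
    return ++gp->local_counter;    /* direct access relative to gp */
}
```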

Access to constants and literals in the text segment

Constants and literals allocated in the text segment may be accessed with GP-relative addressing, or with indirect addressing through the global offset table.

Materializing function pointers

Function pointers may be materialized by indirect addressing through the global offset table.

Pointers to functions that are not subject to preemption may be materialized using GP-relative addressing. Function pointers may not be materialized from immediate operands.

When the linker determines that a procedure call refers to an entry point in a different load module, it resolves the reference locally by building an import stub with the same name as the intended target. The import stub contains code that points to an entry point inside the global offset table, and transfers control, as described in Section Calling sequence.

If the compiler gets enough information to know that a particular entry point is in a different load module, it may generate a calling sequence that obviates the need for the linker to build an import stub. However, this calling sequence is ABI specific, and is not specified in this document.

The dynamic loader is a component of the operating system software that locates all load modules belonging to an application, loads them into memory, and binds the symbolic references among them. Most of the operation of the dynamic loader is specific to the particular operating system environment, and is further described in the ABIs for those environments. The common run-time architecture has been designed to minimize the amount of work involved in the binding process, by concentrating most of the relocation required in the global offset tables, and by prohibiting any items in the text segment that may require dynamic relocation.

9.1.4 Rationale

Code in main programs may be absolute or position independent. If an absolute program imports data from a DLL, the linker is forced to allocate the data in the main program’s data segment statically (this is commonly called the “.dynbss hack”). When data imported from DLLs is allocated in the main program’s data segment, the program may be subject to future compatibility problems when the DLL is replaced with a newer version. This issue may be avoided by requiring main programs to be position independent, at the cost of some efficiency in the main program. This compatibility/performance trade-off is not made in the common run-time architecture; it is left to the specific ABI.



Direct and indirect procedure calls are described in the following sections.

Direct procedure calls follow the sequence of steps shown in Figure 28. The following paragraphs describe these steps in detail.

1. Preparation for call. Values in scratch registers that must be kept alive across the call must be saved. They can be saved by either copying them into preserved registers or by saving them onto the memory stack.

The parameters must be set up in registers and memory as described in the Subroutine linkage and parameter passing chapter of the STxP70 Application binary interface manual (7937486).

2. Procedure call. All direct calls are made with a call relative instruction, which writes the link register (also known as LK) for the return link.

For direct local calls the PC-relative displacement to the target is computed at link time.

Compilers may assume that the standard displacement field in the call instruction is sufficiently wide to reach the target of the call. If the displacement is too large, the linker must supply a branch stub at some convenient point in the code; compilers must guarantee the existence of such a point by ensuring that code sections in the relocatable object files are no larger than the maximum reach of the call instruction.

Direct calls to other load modules cannot be statically bound at link time, so the linker must supply an import stub for the target procedure; this import stub obtains the address of the target procedure from the global offset table. The call instruction can then be statically bound using the PC-relative displacement to the import stub.

The call instruction saves the return link address in the link register, which is aliased to general purpose register R14.

3. Import stub (direct external calls only). The import stub is allocated in the load module of the caller, so that the call instruction may be statically bound to the address of the import stub. The import stub obtains the address of the target procedure’s entry point from the global offset table. In position-independent code (PIC), it must access the global offset table using the current GP (which means that the GP must be valid at the point of call). In absolute code, it can access the global offset table using an absolute reference, so the GP does not need to be valid at the point of call. The import stub then branches to the target entry point.

The detailed operation of an import stub is ABI specific.

When the target of a call is in the same load module, an import stub is not used.

However, for position-independent code, the GP value must still be valid for the caller at the point of call, so that if the target is an internal function, it can assume that the GP value is already correctly set.

The compiler may choose to generate calling code that performs the functions of the import stub. This saves a branch compared to using the import stub, but is less efficient than a direct call within the same load module. Therefore, the compiler should only do this if it deduces that the call target is in a separate load module, or that there is a high probability of this.



4. Procedure entry. The prologue code in the target procedure is responsible for allocating a frame on the memory stack, if necessary.

If it is a non-leaf procedure, it must save the link register in the memory stack frame.

The prologue must also save any preserved registers that will be used in this procedure.

If it is a position-independent procedure that makes calls or accesses global data, then it must establish the GP value in the GP register. The GP register (R13) is a preserved register, and therefore must be saved before being modified. A position-independent internal function may assume that the GP register already contains the correct value.

A position-independent leaf procedure that accesses global data is not required to put the GP value in R13; it may use a scratch register instead, thus avoiding the need to save and restore register R13.

5. Procedure exit. The epilogue code is responsible for restoring the link register and any preserved registers that were saved.

If a memory stack frame was allocated, the epilogue code must deallocate it. Finally, the procedure exits by branching through the link register with the return instruction.

6. After the call. Any saved values should be restored.

Figure 28. Direct procedure calls

[Figure: control flow between caller and callee. Caller: prepare the call (set up arguments, save registers), call, and after the call restore registers. Import stub (direct external calls only): load entry address, go to the target. Callee: entry (allocate memory frame, save return link, save registers), procedure body, exit (restore registers, restore return link, destroy memory frame, return).]


Indirect procedure calls follow nearly the same sequence, except that the branch target is set indirectly. This sequence is best shown in Figure 29.

1. Preparation for call. Indirect calls are built by loading the entry point address into the link register. Values in scratch registers that must be kept alive across the call must be saved, which can be done by either copying them into preserved registers or by saving them on the memory stack. The parameters must be set up in registers and memory as described in the Subroutine linkage and parameter passing chapter of the STxP70 Application binary interface manual (7937486).



2. Procedure call. All indirect calls are made with the call indirect instruction, which reads and writes the link register. The call instruction saves the return link address in the link register.

3. Procedure entry, exit, and return. The remainder of the calling sequence is the same as for direct calls.
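At the C source level, the difference between the two calling sequences comes down to whether the target is named or reached through a pointer. The following is a minimal sketch; add_one, call_direct and call_indirect are hypothetical names used only for illustration, and the instruction selection described in the comments is what the text above leads one to expect, not a guarantee (an optimizing compiler may inline either call).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callee used only for illustration. */
int add_one(int x)
{
    return x + 1;
}

/* Direct call: the target is named, so the displacement to add_one
   (or to a stub) can be fixed at link time. */
int call_direct(int x)
{
    return add_one(x);
}

/* Indirect call: the entry address is only known at run time and is
   supplied as a function pointer. */
int call_indirect(int (*target)(int), int x)
{
    assert(target != NULL);
    return target(x);
}
```

Both functions return the same value for the same callee; only the way the entry address is obtained differs.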

Figure 29. Indirect procedure calls

[Figure: control flow between caller and callee. Caller: prepare the call (load entry address, set up arguments, save registers), call, and after the call restore registers. Callee: entry (allocate memory frame, save return link, save registers), procedure body, exit (restore registers, restore return link, destroy memory frame, return).]



9.3

Note:

Introduction to the relocatable loader library


The relocatable loader library (RL_LIB) supports the creation and loading of DSOs (dynamic shared objects, also known as load modules) in an embedded environment.

RL_LIB implements DSOs as defined in the standard for supporting ELF System V Dynamic Linking.

For applications that do not rely on advanced OS features (such as file systems, virtual memory management and multi process segment sharing), use RL_LIB as an alternative to the standard ELF System V Dynamic Loader (libdl.so).

9.3.2 Relocatable run-time model

The ELF System V ABI supports several run-time models. Only some run-time models are suitable for embedded systems without the support of traditional operating system services.

The run-time model for an application dictates the method used for linking and loading.

RL_LIB implements the R_Relocatable run-time model. The application has a main module and several load modules. The main module is statically linked and loaded. The load modules are loaded on demand (by explicit calls to the loader) at run-time. The load modules are loaded at an arbitrary address and dynamic symbol binding is applied by the loader for symbols undefined in the load modules. In the hierarchy of loaded modules, the dynamic symbol binding traverses the modules from the bottom up.

The R_Relocatable run-time model, as implemented by RL_LIB, has the following features:

• one main module loaded at application startup by the system

• several load modules that can load at run-time and unload after use

• several modules can be resident at the same time

• a loaded module can load and unload other load modules (as for the main module)

• load modules can be loaded anywhere

• access to symbols in loaded modules from the loader through a call to the loader library

• the loader performs dynamic symbol binding when loading a module and symbols are searched in the load modules hierarchy bottom-up (to the main module)

• sharing of code and data objects between modules is achieved by linking to the objects in a common ancestor

• the loader library is statically linked with the main module

• the system support archive library should be linked with the main module

Figure 30 shows an example of an application that has four load modules A, B, C and D.



Figure 30. Example of an application with four load modules

[Figure: module tree. The main module (printf, malloc) holds references *exec_A and *exec_D; Module A (malloc) holds references *exec_B and *exec_C; Modules B, C and D refer to printf and malloc.]

Note: In Figure 30, curved arrows (from load modules to parent module) represent load time symbol-binding performed while the load module loads. Straight arrows (from loader module to loaded module) represent explicit symbol address resolution performed through the loader library API.

The following describes a possible scenario.

1. At run-time, the main module loads the module A into memory through the rl_load_file() function.

2. The loader, in the process of loading A into memory, binds the symbol printf (undefined in A) to the printf function defined in main.

3. The main program uses the rl_sym() function to retrieve a pointer to the function symbol exec_A in A.

4. As with A, the main program loads the module D and references to printf are resolved to the printf in main. In addition, references to malloc in D are also resolved to the malloc in main.

5. The main program retrieves a pointer to exec_D in D using the rl_sym() function.

6. The main program (at some point) invokes the function exec_A.

7. The function exec_A loads the two modules B and C.

8. The undefined reference to printf in B is resolved to the printf in main (the loader searches first in A, and then in main).

9. The undefined reference to malloc in C is resolved to the malloc in A (the loader searches for and finds it in A). Note that the malloc function called from D (the malloc of main) is then different from the malloc function called from B (or C, or A), which is the malloc of A.

10. After retrieving symbol addresses using the rl_sym() function, module A can indirectly call functions or reference data in B and C.

At any time, the main module or the module A can unload one of the loaded modules.
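The bottom-up binding in this scenario can be modeled in a few lines of C. The sketch below is not RL_LIB code: struct module, defines() and bind_undefined() are hypothetical names, and the symbol tables simply restate which module defines printf, malloc and the exec_ functions in the scenario above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 4

/* Illustrative model of a module: its name, its parent in the module
   tree, and the symbols it defines (unused slots are NULL). */
struct module {
    const char *name;
    const struct module *parent;
    const char *syms[MAX_SYMS];
};

/* Return 1 if module m defines symbol sym, 0 otherwise. */
int defines(const struct module *m, const char *sym)
{
    for (int i = 0; i < MAX_SYMS && m->syms[i] != NULL; i++)
        if (strcmp(m->syms[i], sym) == 0)
            return 1;
    return 0;
}

/* Bind a reference that is undefined in m: walk from the parent up to
   the main module and return the first ancestor that defines sym. */
const struct module *bind_undefined(const struct module *m, const char *sym)
{
    assert(m != NULL && sym != NULL);
    for (const struct module *p = m->parent; p != NULL; p = p->parent)
        if (defines(p, sym))
            return p;
    return NULL;
}

/* Module tree of the scenario: main loads A and D, A loads B and C. */
const struct module mod_main = { "main", NULL, { "printf", "malloc" } };
const struct module mod_a = { "A", &mod_main, { "malloc", "exec_A" } };
const struct module mod_b = { "B", &mod_a, { "exec_B" } };
const struct module mod_c = { "C", &mod_a, { "exec_C" } };
const struct module mod_d = { "D", &mod_main, { "exec_D" } };
```

With this model, bind_undefined(&mod_c, "malloc") yields module A while bind_undefined(&mod_d, "malloc") yields main, which is the difference pointed out in step 9.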



The relocatable code generation model

The relocatable code generation model is the same as the code generation model for the System V model with the following differences.

• No symbol can be preempted. Dynamic symbol binding always searches the current module first. This has the effect that a module containing a symbol definition can be sure that it will use this definition. For example, this enables inlining in load modules.

• Weak references are treated the same way as undefined references in load modules. Therefore, when traversing the module tree bottom-up, the first definition found is taken.

9.4 Relocatable loader library API

The relocatable loader library supports loading and unloading a module, and accessing a symbol address in a module by name. The relocatable loader library is provided as a library, librl.a, and its associated header file, rl_lib.h.

The functions defined in this API are explained in the following sections.

All the functions manipulating a load module use a pointer to the rl_handle_t type. This is an abstract type for a load module handle.

A load module handle is allocated by the rl_handle_new() function and deallocated by the rl_handle_delete() function.

The main module handle is statically allocated and initialized in the startup code of the main module.

A module handle references one loaded module at a time. To load another module from the same handle, the previous module must first be unloaded.



rl_handle_new

Allocate and initialize a new handle

Definition:

rl_handle_t *rl_handle_new(
    const rl_handle_t *parent,
    int mode);

Arguments:

parent
The handle of the parent module.

mode
Determines the RL_LIB chunk mode. Valid values for mode are:
RL_ONE_CHUNK_MODE (defined to be 0)
RL_MULTIPLE_CHUNK_MODE (defined to be 1)

Returns:

The newly initialized handle.

Description:

The rl_handle_new() function allocates and initializes a new handle that can be used for loading and unloading a load module.

The handle of the parent module to which the loaded module will be connected is specified by the parent argument.

In RL_MULTIPLE_CHUNK_MODE, the mode argument activates two separate memory allocators: rl_text_memalign for text segments and rl_data_memalign for data segments. In RL_ONE_CHUNK_MODE, the mode argument activates one global memory allocator, rl_memalign, for any segment type.

Generally, a load module is attached to the module that calls this function, so a handle is typically allocated as follows:

rl_handle_t *new_handle = rl_handle_new(rl_this(), RL_ONE_CHUNK_MODE);

rl_handle_delete

Finalize and deallocate a module handle

Definition:

int rl_handle_delete(
    rl_handle_t *handle);

Arguments:

handle
The handle to deallocate.

Returns:

Returns 0 for success, -1 for failure.

Description:

The rl_handle_delete() function finalizes and deallocates a module handle.

The handle must not hold a loaded module. The loaded module must have been first unloaded by rl_unload() before calling this function. If successful, the value returned is 0. Otherwise the value returned is -1 and the error code returned by rl_errno() is set accordingly.
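The ordering rule for handles (allocate, load, unload, then delete) can be sketched as follows. So that the fragment builds without librl.a, it supplies small stand-in definitions of the loader functions; these stand-ins, the rl_handle_t layout, the rl_handle_t * type of the first parameter of rl_load_file() and rl_unload(), the NULL check on rl_handle_new(), and the helper name run_with_module are all assumptions made for illustration, not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* --- Stand-ins for rl_lib.h and librl.a (assumptions, off-target only) --- */
typedef struct rl_handle {
    const struct rl_handle *parent;
    int loaded;
} rl_handle_t;
#define RL_ONE_CHUNK_MODE 0

static rl_handle_t main_handle;

rl_handle_t *rl_this(void) { return &main_handle; }

rl_handle_t *rl_handle_new(const rl_handle_t *parent, int mode)
{
    rl_handle_t *h = calloc(1, sizeof *h);
    (void)mode;
    if (h != NULL)
        h->parent = parent;
    return h;
}

int rl_load_file(rl_handle_t *h, const char *f_name)
{
    if (h->loaded || f_name == NULL)
        return -1;
    h->loaded = 1;
    return 0;
}

int rl_unload(rl_handle_t *h)
{
    if (!h->loaded)
        return -1;
    h->loaded = 0;
    return 0;
}

int rl_handle_delete(rl_handle_t *h)
{
    if (h->loaded)
        return -1;          /* a handle holding a module cannot be deleted */
    free(h);
    return 0;
}
/* --- End of stand-ins --- */

/* Load a module, use it, and release everything in the required order.
   Returns 0 on success, -1 on failure. */
int run_with_module(const char *f_name)
{
    assert(f_name != NULL);
    rl_handle_t *h = rl_handle_new(rl_this(), RL_ONE_CHUNK_MODE);
    if (h == NULL)
        return -1;
    if (rl_load_file(h, f_name) == -1) {
        rl_handle_delete(h);    /* nothing is loaded, so delete is allowed */
        return -1;
    }
    /* ... look up symbols with rl_sym() and call into the module ... */
    if (rl_unload(h) == -1)     /* must unload before deleting */
        return -1;
    return rl_handle_delete(h);
}
```

On the target, the stand-in section would be replaced by including rl_lib.h and linking with librl.a.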



rl_this

Return the handle for the current module

Definition:

rl_handle_t *rl_this(void);

Arguments:

None.

Returns:

The handle for the current module.

Description:

The rl_this() function returns the handle for the current module. If called from the main module, it returns the handle of the main module. If called from a loaded module, it returns the handle that holds the loaded module.

This function is used when allocating a handle with rl_handle_new(). It can also be used, for example, to retrieve a symbol in the current module:

void *symbol_ptr = rl_sym(rl_this(), "symbol");

rl_parent

Return the handle for the parent of the current handle

Definition:

rl_handle_t *rl_parent(void);

Arguments:

None.

Returns:

The handle for the parent of the current handle.

Description:

The rl_parent() function returns the handle for the parent of the current handle (as returned by rl_this()).

It may be used, for example, to find a symbol in one of the parent modules:

void *symbol_in_parents = rl_sym_rec(rl_parent(), "symbol");

rl_load_addr

Return the memory load address of a loaded module

Definition:

const char *rl_load_addr(

Arguments:

handle
The handle for the loaded module.

Returns:

The memory load address of the loaded module, or NULL.

Description:

The rl_load_addr() function returns the memory load address of a loaded module. It returns NULL if the handle does not hold a loaded module or if the handle passed is the main program handle.

rl_load_size

Return the memory load size of a loaded module

Definition:

unsigned int rl_load_size(

Arguments:

handle

Returns:

The memory load size of the loaded module, or 0.

Description:

The rl_load_size() function returns the memory load size of a loaded module. It returns 0 if the handle does not hold a loaded module or if the handle passed is the main program handle.



rl_file_name

Return the filename associated with the loaded module handle

Definition:

const char *rl_file_name(

Arguments:

handle

Returns:

The filename associated with the loaded module handle, or NULL.

Description:

The rl_file_name() function returns the filename associated with the loaded module handle. It returns NULL if no filename is associated with the current loaded module, if the handle does not hold a loaded module or if the handle passed is the main program handle.

rl_set_file_name

Specify a filename for the handle

Definition:

int rl_set_file_name(
    rl_handle_t *handle,
    const char *f_name);

Arguments:

handle
The handle for the module.

f_name
The filename to specify for the handle.

Returns:

Returns 0 for success, -1 for failure.

Description:

The rl_set_file_name() function is used to specify a filename for a handle. This filename is attached to the next module that will be loaded. It can be used to specify a filename for modules loaded from memory or to force a different filename for a module loaded from a file.

This function returns 0 if the filename was successfully set. If a module is already loaded or if the application runs out of memory, it returns -1 and the error code returned by rl_errno() is set accordingly.



rl_load_buffer

Load a relocatable module into memory

Definition:

int rl_load_buffer(
    const char *image);

Arguments:

handle

image
The image of the load module.

Returns:

Returns 0 for success, -1 for failure.

Description:

The rl_load_buffer() function loads a relocatable module into memory from the image referenced by image.

It allocates the space for the loaded module in the heap, loads the segments from the memory image of the loadable module, links the module to the parent module of the handle and relocates and initializes the loaded module.

This function calls the action callback functions for RL_ACTION_LOAD after loading and before executing any code in the loaded module.

The value 0 is returned if the loading was successful. The value -1 is returned on failure and the error code returned by rl_errno() is set accordingly.

rl_load_file

Load a relocatable module into memory from a file

Definition:

int rl_load_file(
    const char *f_name);

Arguments:

handle

f_name
The file from which to load the relocatable module.

Returns:

Returns 0 for success, -1 for failure.

Description:

The rl_load_file() function loads a relocatable module into memory from the file specified by f_name.

It opens the specified file with an fopen() call, allocates the space for the loaded module in the heap, loads the segments from the file, links the module to the parent module of the handle, and relocates and initializes the loaded module. The file is closed with fclose() before returning. This function calls the action callback functions for RL_ACTION_LOAD after loading and before executing any code in the loaded module.

The value 0 is returned if the load was successful. The value -1 is returned on failure and the error code returned by rl_errno() is set accordingly.



rl_load_stream

Load a relocatable module into memory from a byte stream

Definition:

typedef int rl_stream_func_t (
    void *cookie,
    char *buffer,
    int length);

int rl_load_stream(
    rl_stream_func_t *stream_func,
    void *stream_cookie);

Arguments:

handle

stream_func
The user specified callback function.

stream_cookie
The user specified state.

Returns:

Returns 0 for success, -1 for failure.

Description:

The rl_load_stream() function loads a relocatable module into memory from a byte stream provided through a user specified callback function stream_func and the user specified state stream_cookie.

The callback function must be of type rl_stream_func_t. It is called multiple times by the loader to retrieve the load module data in the buffer buffer of length length until the module is loaded into memory. The loader always calls the callback function with a buffer length strictly greater than 0. The stream_cookie argument passed to rl_load_stream() is passed to the callback function in its cookie parameter. The cookie parameter is intended to be used by the callback function to update a private state.

The callback function must return the number of bytes transferred. If the returned value is less than the given buffer length or is -1, rl_load_stream() will in turn return an error and the error code returned by rl_errno() is set accordingly.

The rl_load_stream() function allocates the space for the loaded module from the heap, loads the segments by calling the callback function, links the module to the parent module of the handle, and relocates and initializes the loaded module. This function calls the action callback functions for RL_ACTION_LOAD after loading and before executing any code in the loaded module.

The value 0 is returned if the load was successful. The value -1 is returned on failure and the error code returned by rl_errno() is set accordingly.

This function can be used as an alternative to rl_load_buffer() or rl_load_file() to allow any loading method to be implemented.
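As one illustration of a custom loading method, a callback can serve the module image from a memory buffer and keep its read position in the cookie. This is a sketch only: struct mem_stream and mem_stream_read are hypothetical names, and the typedef is repeated from the definition so that the fragment builds without rl_lib.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Callback type, repeated from the rl_load_stream definition. */
typedef int rl_stream_func_t(void *cookie, char *buffer, int length);

/* Hypothetical private state: an in-memory image and a read position. */
struct mem_stream {
    const char *data;
    int size;
    int pos;
};

/* Copy up to length bytes of the image into buffer and advance the
   position. Returns the number of bytes transferred; a count shorter
   than length makes the loader report an error, as described above. */
int mem_stream_read(void *cookie, char *buffer, int length)
{
    struct mem_stream *s = cookie;
    assert(s != NULL && buffer != NULL && length > 0);
    int avail = s->size - s->pos;
    int n = length < avail ? length : avail;
    memcpy(buffer, s->data + s->pos, (size_t)n);
    s->pos += n;
    return n;
}

/* Compile-time check that the callback matches rl_stream_func_t. */
rl_stream_func_t *const mem_stream_callback = mem_stream_read;
```

The callback would then be passed to rl_load_stream() together with a pointer to an initialized struct mem_stream as the stream_cookie argument.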



The following example illustrates how the rl_load_file() function may be implemented using the rl_load_stream() function:

#include <stdio.h>

/* User implementation of the callback function that reads from a file.
   The cookie is the FILE pointer passed to rl_load_stream(). */
static int rl_stream_read(void *cookie, char *buffer, int length)
{
    FILE *file = cookie;
    int nbytes;

    nbytes = fread(buffer, 1, length, file);
    return nbytes;
}

...

{
    /* Loads the module from a file. */
    FILE *file;
    int status;

    file = fopen(f_name, "rb");
    if (file == NULL) { /*... error... */ }

    status = rl_load_stream(handle, rl_stream_read, file);
    if (status == -1) { /*... error... */ }

    fclose(file);
}

...

rl_unload

Unload a previously loaded relocatable module

Definition:

int rl_unload(

Arguments:

handle

Returns:

Returns 0 for success, -1 for failure.

Description:

The rl_unload() function unloads a previously loaded relocatable module. It finalizes, unlinks, and frees allocated memory for the loaded module. This function calls the action callback functions for RL_ACTION_UNLOAD before unloading and after having executed finalization code in the module.

The return value is 0 if the unloading is successful, otherwise the return value is -1 and the error code returned by rl_errno() is set accordingly.



rl_sym

Return a pointer reference to the symbol in the loaded module

Definition:

void *rl_sym(
    const char *name);

Arguments:

handle

name
The symbol in the loaded module.

Returns:

The pointer reference to the symbol.

Description:

The rl_sym() function returns a pointer reference to the symbol named name in the loaded module specified by handle. It searches the dynamic symbol table of the loaded module and returns a pointer to the symbol. The handle parameter can be the handle of any currently loaded module, or the handle of the main module.

If the symbol is not defined in the loaded module, NULL is returned. It is not generally an error for this function to return NULL. For example, the user may conditionally call a specific function only if it is defined in the module.

In this function, as well as in the rl_sym_rec() function, the name parameter must be the mangled symbol name. For instance, on some targets, C names are mangled by prefixing the name with an underscore (_). For example, to return a reference to the printf() function, the symbol name passed to rl_sym() will be “_printf”.
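Treating a NULL result as "not provided" rather than as an error can look like the following sketch, which reads an optional integer exported by a module. The rl_sym() body here is a stand-in so that the fragment builds without librl.a; the stand-in, the rl_handle_t layout, the rl_handle_t * type of the first parameter, the unmangled symbol name module_version and the helper read_optional_int are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* --- Stand-ins for rl_lib.h and librl.a (assumptions, off-target only) --- */
typedef struct rl_handle { int unused; } rl_handle_t;

int module_version = 3;     /* pretend this lives in the loaded module */

void *rl_sym(rl_handle_t *handle, const char *name)
{
    (void)handle;
    return strcmp(name, "module_version") == 0 ? &module_version : NULL;
}
/* --- End of stand-ins --- */

/* Read an optional int exported by a module. When the module does not
   define the symbol, rl_sym() returns NULL and the fallback is used. */
int read_optional_int(rl_handle_t *handle, const char *name, int fallback)
{
    assert(name != NULL);
    const int *value = rl_sym(handle, name);
    return value != NULL ? *value : fallback;
}
```

On a target that mangles C names, the name argument would need the mangled form described above.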

rl_sym_rec

Return a pointer reference to the symbol in the loaded module or one of its ancestors

Definition:

void *rl_sym_rec(
    const char *name);

Arguments:

handle

name
The symbol in the loaded module.

Returns:

The pointer reference to the symbol.

Description:

The rl_sym_rec() function returns a pointer reference to the symbol named name in the loaded module specified by handle or one of its ancestors.

This function searches the dynamic symbol table of the loaded module and returns a pointer to the symbol if found. If the symbol is not found, the function iteratively searches in the dynamic symbol table of the parent module until the symbol is found.

The handle parameter can be the handle of any currently loaded module, or the handle of the main module.

If the symbol is not defined in the loaded module or one of its ancestors, NULL is returned. It is not generally an error for this function to return NULL.

The name parameter must be the mangled symbol name as for the rl_sym() function.



rl_foreach_segment

Iterate over all the segments of a loaded module and call the supplied function

Definition:

typedef struct rl_segment_info_t_ rl_segment_info_t;

typedef int rl_segment_func_t (
    rl_segment_info_t *seg_info,
    void *cookie);

int rl_foreach_segment(
    rl_segment_func_t *callback_fn,
    void *callback_cookie);

Arguments:

handle

callback_fn

callback_cookie
The argument to pass to the function.

Returns:

Returns 0 for success, -1 for failure.

Description:

The rl_foreach_segment() function iterates over all the segments of the loaded module handle and calls back the user supplied function. For each segment, the function callback_fn is called with the following parameters.

handle
The handle passed to the function.

seg_info
The segment information pointer filled with the current segment information.

cookie
The argument passed to the function.

The segment information returned in seg_info is a pointer to the following structure:

typedef unsigned int rl_segment_flag_t;

struct rl_segment_info_t_ {
    const char *seg_addr;
    unsigned int seg_size;
    rl_segment_flag_t seg_flags;
};

The user callback function must return 0 on success or -1 on error.

In the case where the callback function returns an error, the rl_foreach_segment() function returns -1 and the error code returned by rl_errno() is set to RL_ERR_SEGMENTF. Otherwise the function returns 0.
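A typical use of the segment callback is to accumulate a value over all segments through the cookie. The sketch below sums the segment sizes. The types are repeated so that the fragment builds without rl_lib.h; the callback name sum_segment_size is hypothetical, and the leading rl_handle_t * parameter is an assumption based on the parameter list given in the description.

```c
#include <assert.h>
#include <stddef.h>

/* Types repeated from the description; rl_handle_t is kept opaque. */
typedef struct rl_handle rl_handle_t;
typedef unsigned int rl_segment_flag_t;
typedef struct rl_segment_info_t_ {
    const char *seg_addr;
    unsigned int seg_size;
    rl_segment_flag_t seg_flags;
} rl_segment_info_t;

/* Segment callback: add the size of the current segment to the
   unsigned int that cookie points to. Returns 0 on success, or -1 if
   no accumulator was supplied (reported as RL_ERR_SEGMENTF). */
int sum_segment_size(rl_handle_t *handle, rl_segment_info_t *seg_info,
                     void *cookie)
{
    (void)handle;
    assert(seg_info != NULL);
    if (cookie == NULL)
        return -1;
    *(unsigned int *)cookie += seg_info->seg_size;
    return 0;
}
```

The callback would be registered for one pass with a call of the form rl_foreach_segment(handle, sum_segment_size, &total), where total is an unsigned int initialized to 0.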



rl_add_action_callback

Add a user action callback function to the user action callback list

Definition:

typedef unsigned int rl_action_t;

#define RL_ACTION_UNLOAD 2

typedef int rl_action_func_t (
    rl_action_t action,
    void *cookie);

int rl_add_action_callback(
    rl_action_t action_mask,
    rl_action_func_t *callback_fn,
    void *callback_cookie);

Arguments:

action_mask
The set of actions for which the callback function must be called.

callback_fn

callback_cookie
The argument to pass to the function.

Returns:

Returns 0 for success, -1 for failure.

Description:

The rl_add_action_callback() function adds a user action callback function to the user action callback list. It can be called multiple times with different callback functions. The same callback function cannot be added more than once.

For each defined action, each callback function is called in the order it was added into the callback list. The callback functions are not attached to a particular module and are called for any further loaded/unloaded modules.

This function returns 0 on success and -1 on failure. It does not set any error codes.

This function can fail if a callback function is already in the callback list or if the program runs out of memory.

The rl_action_t type defines the action flags for module loading/unloading and is passed to the action function callback. The action flags can be OR-ed to create an action mask that can be passed to the function rl_add_action_callback(). The actions defined are:

RL_ACTION_LOAD
The callback is called just after the module has been loaded in memory and cache has been synchronized. No module code has been executed.

RL_ACTION_UNLOAD
The callback is called just before the module is unloaded from memory. No module code will be executed after this point.

RL_ACTION_ALL
The callback will be called for any action.

The type for the user action callback function is rl_action_func_t. The parameters passed to the callback function when it is called are:

handle
The handle that performed the action.

action
The action performed.

cookie
The parameter passed to rl_add_action_callback().

The callback function returns 0 on success and -1 on failure. In the case of failure, the loading (or unloading) of the module is undone and the error code returned by rl_errno() is set to RL_ERR_ACTIONF.
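An action callback that only observes loads and unloads might look like the sketch below. The types are repeated so that the fragment builds without rl_lib.h. The value used for RL_ACTION_LOAD is an assumption (only RL_ACTION_UNLOAD is given a value in the definition), as are the leading rl_handle_t * parameter, struct module_stats and the callback name count_actions.

```c
#include <assert.h>
#include <stddef.h>

/* Types repeated from the definition; rl_handle_t is kept opaque. */
typedef struct rl_handle rl_handle_t;
typedef unsigned int rl_action_t;
#define RL_ACTION_LOAD   1  /* assumed value, not taken from the manual */
#define RL_ACTION_UNLOAD 2

/* Hypothetical statistics kept by the application. */
struct module_stats {
    int loads;
    int unloads;
};

/* Action callback: count load and unload events in the module_stats
   that cookie points to. It always returns 0, so it never causes a
   load or unload to be undone. */
int count_actions(rl_handle_t *handle, rl_action_t action, void *cookie)
{
    struct module_stats *stats = cookie;
    (void)handle;
    assert(stats != NULL);
    if (action & RL_ACTION_LOAD)
        stats->loads++;
    if (action & RL_ACTION_UNLOAD)
        stats->unloads++;
    return 0;
}
```

It would be registered once, for example with rl_add_action_callback(RL_ACTION_ALL, count_actions, &stats), before any module is loaded.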

rl_delete_action_callback

Remove the given function from the action callback list

Definition:

int rl_delete_action_callback(
    rl_action_func_t *callback_fn);

Arguments:

callback_fn

Returns:

Returns 0 for success, -1 if the callback was not present in the callback list.

Description:

The rl_delete_action_callback() function removes the specified callback function from the action callback list. This function returns 0 if the callback was removed, or -1 if it was not present in the callback list. No error code is set.

rl_errno

Return the error code for the last failed function

Definition:

int rl_errno(

Arguments:

handle

Returns:

The error code for the last failed function.

Description:

The rl_errno() function returns the error code for the last failed function. Table 31 lists the possible codes.

Table 31. Errors returned by rl_errno()

Error code | Diagnostic | Possible error causing function
RL_ERR_NONE | No previous call has failed. | -
RL_ERR_MEM | Ran out of memory (rl_memalign(), rl_text_memalign() or rl_data_memalign() failed). | rl_load_buffer(), rl_load_file(), rl_load_stream(), rl_set_file_name()
RL_ERR_ELF | The load module is not a valid ELF file. | rl_load_buffer(), rl_load_file(), rl_load_stream(), rl_set_file_name()
RL_ERR_DYN | The load module is not a dynamic library. | rl_load_buffer(), rl_load_file(), rl_load_stream()
RL_ERR_SEG | The load module has invalid segment information. | rl_load_buffer(), rl_load_file(), rl_load_stream(), rl_set_file_name()
RL_ERR_REL | The load module contains invalid relocations. | rl_load_buffer(), rl_load_file(), rl_load_stream()
RL_ERR_RELSYM | A symbol was not found at load time. rl_errarg() returns the symbol name. | rl_load_buffer(), rl_load_file(), rl_load_stream()
RL_ERR_SYM | The symbol is not defined in the module. rl_errarg() returns the symbol name. | rl_sym(), rl_sym_rec()
RL_ERR_FOPEN | The file cannot be opened by rl_fopen(). | rl_load_file()
RL_ERR_FREAD | Error while reading the file in rl_fread(). | rl_load_file()
RL_ERR_STREAM | Error while loading the file from a stream. | rl_load_stream()
RL_ERR_LINKED | Module handle is already linked. | rl_load_file(), rl_load_buffer(), rl_load_stream(), rl_handle_delete()
RL_ERR_NLINKED | Module handle is not linked. | rl_unload(), rl_sym(), rl_sym_rec(), rl_foreach_segment()
RL_ERR_SEGMENTF | Error in segment function callback. | rl_foreach_segment()
RL_ERR_ACTIONF | Error in action function callback. | rl_load_file(), rl_load_buffer(), rl_load_stream()



rl_errarg

Return the name of the symbol that could not be resolved

Definition:

const char *rl_errarg(

Arguments:

handle

Returns:

The name of the symbol that could not be resolved.

Description:

If rl_errno() returns either RL_ERR_RELSYM or RL_ERR_SYM, the rl_errarg() function returns the name of the symbol that could not be resolved.

rl_errstr

Return a string for an error code

Definition:

const char *rl_errstr(

Arguments:

handle

Returns:

A string for the error code.

Description:

The rl_errstr() function returns a readable string for the error code reported by rl_errno(). For example:

...
void *sym = rl_sym(handle, "symbol");
if (sym == NULL)
    fprintf(stderr, "failed: %s\n", rl_errstr(handle));
...

If symbol is not defined in the module referenced by handle then the following message is displayed:

failed: symbol not found: symbol



9.5 Customization

The relocatable loader library defines a number of functions that it uses internally for providing services such as heap memory management and file access. To provide a custom implementation, the application in the main module can override these functions.

9.5.1 Memory allocation

These functions allocate free space for the load module image and for the handle objects:

void *rl_malloc(int size);
void *rl_memalign(int align, int size);
void *rl_text_memalign(int align, int size);
void *rl_data_memalign(int align, int size);
void rl_free(void *ptr);

Where:

• rl_memalign is valid only in RL_ONE_CHUNK_MODE
• rl_text_memalign is valid only in RL_MULTIPLE_CHUNK_MODE for text segments
• rl_data_memalign is valid only in RL_MULTIPLE_CHUNK_MODE for data segments

The default behavior for these functions is to call the standard C library functions malloc(), memalign() and free() respectively.

Note: If providing a custom implementation, override all three functions.
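As a rough illustration of such an override, the sketch below serves rl_malloc(), rl_memalign() and rl_free() from a static pool. Only the three signatures come from the declarations above. The pool size, the default alignment of 8, the bump-pointer policy and the no-op rl_free() are assumptions made for the example and do not describe the library's own behavior. rl_text_memalign() and rl_data_memalign() are left out; an application using RL_MULTIPLE_CHUNK_MODE would presumably need them as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values: size them for the load modules actually in use. */
#define RL_POOL_SIZE     4096
#define RL_DEFAULT_ALIGN 8

static unsigned char rl_pool[RL_POOL_SIZE];
static size_t rl_pool_used = 0;

/* Bump allocation: return the next 'align'-aligned address in the pool. */
void *rl_memalign(int align, int size)
{
    uintptr_t base, start;
    size_t offset;

    /* The alignment must be a power of two and the size non-negative. */
    if (align <= 0 || (align & (align - 1)) != 0 || size < 0)
        return NULL;

    base = (uintptr_t)rl_pool;
    start = (base + rl_pool_used + (uintptr_t)align - 1)
            & ~((uintptr_t)align - 1);
    offset = (size_t)(start - base);
    if (offset > RL_POOL_SIZE || (size_t)size > RL_POOL_SIZE - offset)
        return NULL;                       /* pool exhausted */

    rl_pool_used = offset + (size_t)size;
    assert(rl_pool_used <= RL_POOL_SIZE);
    return (void *)start;
}

void *rl_malloc(int size)
{
    return rl_memalign(RL_DEFAULT_ALIGN, size);
}

void rl_free(void *ptr)
{
    /* A bump allocator never reuses memory, so freeing is a no-op here. */
    (void)ptr;
}
```

A pool of this kind suits systems where modules are loaded once and never unloaded; if modules are unloaded and reloaded, rl_free() would need to return memory to the pool.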

The rl_load_file() function uses these functions to open, read and close a file handle:

void *rl_fopen(const char *f_name, const char *mode);
int rl_fclose(void *file);
int rl_fread(char *buffer, int eltsize, int nelts, void *file);

The default behavior for these functions is to call the standard C library functions fopen(), fread() and fclose() respectively.

Note: If providing a custom implementation, override all three functions and link them with the main program.
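One possible use of this override is to serve a module image held in memory on a platform without a file system. The sketch below is written under several assumptions: the image contents, the mem_file_t type and the single-open-file restriction are invented for the example, and the return conventions (element count from rl_fread(), 0 or -1 from rl_fclose()) are modeled on fread() and fclose() rather than taken from the library documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory "file": stands in for a real module image. */
static const char image_data[] = "example module image bytes";

typedef struct {
    const char *data;  /* start of the image */
    size_t size;       /* image size in bytes */
    size_t pos;        /* current read position */
    int in_use;        /* non-zero while the file is open */
} mem_file_t;

static mem_file_t mem_file;

void *rl_fopen(const char *f_name, const char *mode)
{
    /* Only reading is supported, and only one open file at a time. */
    if (f_name == NULL || mode == NULL || mode[0] != 'r' || mem_file.in_use)
        return NULL;
    mem_file.data = image_data;
    mem_file.size = sizeof(image_data) - 1;
    mem_file.pos = 0;
    mem_file.in_use = 1;
    return &mem_file;
}

int rl_fread(char *buffer, int eltsize, int nelts, void *file)
{
    mem_file_t *f = (mem_file_t *)file;
    size_t avail, count;

    if (f == NULL || !f->in_use || buffer == NULL || eltsize <= 0 || nelts <= 0)
        return 0;
    assert(f->pos <= f->size);
    /* Copy whole elements only, as fread() does. */
    avail = (f->size - f->pos) / (size_t)eltsize;
    count = (size_t)nelts < avail ? (size_t)nelts : avail;
    memcpy(buffer, f->data + f->pos, count * (size_t)eltsize);
    f->pos += count * (size_t)eltsize;
    return (int)count;
}

int rl_fclose(void *file)
{
    mem_file_t *f = (mem_file_t *)file;

    if (f == NULL || !f->in_use)
        return -1;
    f->in_use = 0;
    return 0;
}
```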

To build a relocatable library that can be loaded by the RL_LIB loader, additional compile time and link time options must be used.

The following is a simple example of building a hello world loadable module:

stxp70cc -o rl_hello.o -fpic -Mgot=small -c rl_hello.c
stxp70cc -o rl_hello.rl --rlib rl_hello.o

Alternatively, the compile and link phases can be carried out with a single command:

stxp70cc -o rl_hello.rl -fpic -Mgot=small --rlib rl_hello.c

To build a main module suitable for loading a relocatable library, specific link time options are required. No specific compile time options are required for the main module.

The following is an example of building a main module:

stxp70cc -o prog.o -c prog.c
stxp70cc -o prog.exe --rmain prog.o

The compile and link phases can be carried out with a single command:

stxp70cc -o prog.exe --rmain prog.c

9.6.1 Importing and exporting symbols

For the relocatable loader system to function, the main module (or a loaded module) must provide services to the other load modules. To avoid a load error when loading a module, it is usual for the referenced symbols to be linked into the main module.

When the services are present in a library, the main module imports the corresponding symbols at link time. However, to import symbols, the linker requires an import script.

stxp70-rltool generates a list of symbols in the form of an import or export script from the specified input files. The input files are either load modules (relocatable libraries) or a text file containing a list of symbols:

• An import script is generated from a list of symbols specified in the file symbol_list (where symbol_list must have only one symbol on each line), or from one or more load module files. In the latter case, the stxp70-rltool utility generates an import script from the set of symbols that the load modules require.

• An export script can be generated to reduce the size of the dynamic symbol table in the main module or load modules. An export script is not mandatory as all global symbols are exported by default.

  The export script defines the set of symbols (and only these) that must be exported to the other modules through the dynamic symbol table. These symbols are then accessible by the load time symbol binding process and by the calls to rl_sym() and rl_sym_rec().

This utility has both a generic driver stxp70-rltool as well as version specific commands to invoke it: stxp70v3-rltool and stxp70v4-rltool. All versions of the utility are documented in the STxP70 utilities reference manual (8210925).

stxp70v3-rltool and stxp70v4-rltool are identical in terms of options and arguments.

Using the relocatable loader import/export utility

This section provides some examples of using the relocatable loader import/export utility.

Two common scenarios where an import script might be generated are:

• When the required services are well defined and a list of symbols can be passed to the stxp70-rltool utility.

• When the list of services is not defined but the load modules are available and can be passed to the stxp70-rltool utility. The stxp70-rltool utility generates an import script from the set of symbols that the load modules require.

The following command generates an import script from a list of symbols specified in the file prog_import.lst (one symbol per line):

stxp70-rltool -mcore=[stxp70v3|stxp70v4] -i -s -o prog_import.ld prog_import.lst

The following command generates an import script for the main module from a list of load modules, liba.rl and libb.rl:

stxp70-rltool -mcore=[stxp70v3|stxp70v4] -i -o prog_import.ld liba.rl libb.rl

Use the import script to link the main module, for example:

stxp70cc -o prog.exe --rmain object_files.o prog_import.ld

Two common scenarios where an export script might be generated are:

• When an import script is required for the module, the export script can be generated at the same time. This is because the symbols to export are generally those that are imported.

• For a load module that has a well known external interface, the export script can be generated from a list of symbols to export.

The following example shows how to generate an export script and import script for a list of modules that is then used when linking the main module. Only the symbols from liba.rl and libb.rl are imported into the main module and exported by it.

stxp70-rltool -mcore=[stxp70v3|stxp70v4] -i -e -o prog_import_export.ld liba.rl libb.rl
stxp70cc -o prog.exe --rmain object_files.o prog_import_export.ld

To generate an export script for a load module with a well defined interface specified in the file liba_export.lst (one symbol per line):

stxp70-rltool -mcore=[stxp70v3|stxp70v4] -e -s -o liba_export.ld liba_export.lst
stxp70cc -o liba.rl --rlib *.o liba_export.ld

When compiling a load module with the -fpic -Mgot=small options, some overhead occurs in the generated code to access functions and data objects. Compiler options and C language extensions can be used to reduce this overhead.

Relocatable libraries are not subject to symbol preemption, therefore, when generating position independent code, the -fvisibility=protected option can be used in addition to -fpic -Mgot. The -fvisibility=protected option enables the inlining of global functions and can be used as a default option for compiling relocatable libraries. For example:

stxp70cc -o a.o -fpic -Mgot=small -fvisibility=protected -c a.c

In addition to this option, fine grain visibility can be specified with the __attribute__((visibility(...))) GNU C extension at the source code level.

For example, if the external interface of a load module is well defined in a header file, the __attribute__((visibility("protected"))) attribute can be attached to each function of the external interface. To specify that all other defined functions are internal to the load module, use the -fvisibility=hidden option on the command line. This combination of options optimizes references from the same file to global objects that are not part of the interface.
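As an illustration of this combination, the following sketch marks one interface function as protected and one helper as hidden. The names liba_scale and scale_by_two are invented for the example. The attribute syntax is the standard GNU C one; the hidden attribute is written out here only to make the intent visible, since compiling with -fvisibility=hidden would imply it.

```c
#include <assert.h>

/* Internal helper: hidden visibility, so it is not exported from the
   load module and calls to it can be bound directly. */
__attribute__((visibility("hidden")))
int scale_by_two(int value)
{
    return value * 2;
}

/* External interface: protected visibility, so the symbol is exported
   but references from inside the module cannot be preempted. */
__attribute__((visibility("protected")))
int liba_scale(int value)
{
    assert(value >= 0);
    return scale_by_two(value) + 1;
}
```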

To specify the visibility of each symbol externally, in the given <file>, use the -mvisibility-decl=<file> option. In the case where the external services required by a module (default visibility) and the external services provided by the module (protected visibility) are known, all other functions or data objects can be declared as internal (hidden visibility). This option can be used to specify these visibility declarations. In this case, only the functions that are external have an associated overhead. The other internal functions have a very reduced overhead.

For a full inter-procedural optimization of the relocatable library, use the -ipa option. In this case, when combined with the declaration of external functions, the library is generated with a minimal overhead for the dynamic linking support.

For detailed information on the visibility specification, refer to the compiler options documentation and to the ELF System V Dynamic Linking ABI.

The debugging of dynamically loaded modules is possible in the same way as for System V dynamic shared objects. The main module debugging information loads at load time of the application. The load modules debugging information loads at load time of the load modules.

To update debugging information, the loader maintains a list of loaded modules together with their filenames (the file contains the debugging information) and the load address of the module. Each time a new module loads, the loader calls a specific function. The debugger has to set a breakpoint on this specific function and, when the breakpoint is hit, traverse the list to find new loaded modules and load the debugging information.
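The mechanism described above can be pictured with the following sketch, which is not the RL_LIB implementation. Every name in it (module_entry_t, loaded_modules, module_list_changed, module_list_add) is hypothetical; it only shows the general shape of a module list paired with a notification function on which a debugger could set a breakpoint.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record describing one loaded module. */
typedef struct module_entry {
    const char *file_name;      /* file that holds the debugging information */
    void *load_address;         /* address at which the module was loaded */
    struct module_entry *next;  /* next (older) entry in the list */
} module_entry_t;

/* Head of the list that a debugger would traverse. */
module_entry_t *loaded_modules = NULL;

/* Counts notifications so that the sketch has an observable effect. */
unsigned module_list_changes = 0;

/* Notification hook: a debugger would place its breakpoint here. It is
   kept out of line so that the breakpoint address always exists. */
__attribute__((noinline)) void module_list_changed(void)
{
    module_list_changes++;
}

/* Record a newly loaded module, then signal the change. */
void module_list_add(module_entry_t *entry, const char *file_name,
                     void *load_address)
{
    assert(entry != NULL && file_name != NULL);
    entry->file_name = file_name;
    entry->load_address = load_address;
    entry->next = loaded_modules;
    loaded_modules = entry;
    module_list_changed();
}
```

The list is updated before the hook is called, so a debugger stopped at the breakpoint always sees the new entry.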

For the STxP70 toolset, the debugger implements the required mechanism for the automatic debugging of loaded modules.

To find the file that contains the debug information, the loader must know the path to the load module. This is automatic in the case of rl_load_file() as the filename is specified in the interface. For the rl_load_buffer() and rl_load_stream() functions, the user must set the filename with a call to the rl_set_file_name() function.

For example, the following code enables automatic debugging of a load module loaded with rl_load_buffer():

{
  int status;
  rl_handle_t *handle = rl_handle_new(rl_this(), 0);
  if (handle == NULL) { /* error */ }
#ifdef DEBUG_ENABLED
  rl_set_filename(handle, "path_to_the_file_for_the_module");
#endif
  status = rl_load_buffer(handle, module_image);
  if (status == -1) { /* error */ }
  ...
}



9.8 Profiling support

The action callbacks may be used with a profiling support library, or alternatively, a user defined package can be informed that a segment has just been loaded or is on the point of being unloaded by using the user action callback interface.

Below is an example that iterates over the segment list and declares the executable segments to a profiling support library on the loading/unloading of a module.

static int segment_profile(rl_handle_t *handle, rl_segment_info_t *info,
                           void *cookie)
{
  rl_action_t action = *((rl_action_t *)cookie);
  const char *file_name = handle_file_name(handle);
  if (file_name != NULL && (info->seg_flags & RL_SEG_EXEC)) {
    if (action == RL_ACTION_LOAD) {
      /* Call profiling interface for adding a code region. */
      profiler_add_region(file_name, info->seg_addr, info->seg_size);
    }
    if (action == RL_ACTION_UNLOAD) {
      /* Call profiling interface for removing a code region
         (profiler_remove_region is an assumed name). */
      profiler_remove_region(file_name, info->seg_addr, info->seg_size);
    }
  }
  return 0;
}

/* The third parameter (user data, NULL in main() below) is assumed. */
static int module_profile(rl_handle_t *handle, rl_action_t action,
                          void *cookie)
{
  rl_foreach_segment(handle, segment_profile, (void *)&action);
  return 0;
}

int main()
{
  ...
  if (rl_add_action_callback(RL_ACTION_ALL, module_profile, NULL) == -1) {
    fprintf(stderr, "rl_add_action_callback failed\n");
    exit(1);
  }
  ...
  status = rl_load_file(handle, file_name);
  ...
  return 0;
}



9.9 Memory protection support


When a new library segment has loaded into memory or is on the point of being unloaded from memory, a system library (or the user) can use the user-action callback interface to install a memory protection scheme.

To set user protection support, use the user-action callback, see Section 9.8: Profiling support.

A basic MUTEX implementation is provided in the STxP70 targeting of the pre-compiled RL_LIB, delivered with the toolset. In addition, because there is no cache activated on the STxP70, specific functions such as bsp_cache_purge_data and bsp_cache_invalidate_instruction (which respectively purge the data cache and handle instruction cache invalidation) are not implemented.

It is the programmer’s responsibility to implement those functions depending on the platform and STxP70 architecture used. Table 32 provides details of the files’ location in the toolset distribution.

Table 32. RL_LIB source file location

Functionality                     Source file
STxP70 v3 MUTEX implementation    <RL_LIB_root>/librl/config/stxp70v3/sys_mutex.[c|h]
STxP70 v3 Cache management        <RL_LIB_root>/librl/config/stxp70v3/targ_elf.[c|h]
STxP70 v4 MUTEX implementation    <RL_LIB_root>/librl/config/stxp70v4/sys_mutex.[c|h]
STxP70 v4 Cache management        <RL_LIB_root>/librl/config/stxp70v4/targ_elf.[c|h]

10 Compiler bugs

This chapter describes the different categories of compiler bugs and how they should be reported to STMicroelectronics.

10.1 Identifying a compiler bug

The following cases are compiler or toolset bugs:

• the compilation phase ends with an assertion message
• the compilation phase ends with a system error message (core dump, bus error)
• the compilation phase produces an output that cannot be assembled
• the compilation phase never ends, or at least does not end in a reasonable amount of time
• the compiler produces an error message for code that is valid input
• the compiler produces code that does not compute the expected results (but see Section 10.1.2)

The following case is possibly not a compiler or toolset bug:

• The code is functional under a specific optimization level, but not under another. This may be due to an existing code bug that is only exposed by aggressive optimization.
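A typical instance of such a latent code bug is type punning through a pointer cast, which breaks the C aliasing rules and may appear to work only at low optimization levels. The sketch below shows the well-defined alternative using memcpy(). The function name float_bits is made up for the illustration, and the code assumes that float and uint32_t have the same size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the object representation of a float as a 32-bit integer.
   Writing *(uint32_t *)&value instead would violate the aliasing rules,
   and an optimizing compiler may then reorder or drop the access. */
uint32_t float_bits(float value)
{
    uint32_t bits;

    /* This sketch assumes float and uint32_t have the same size. */
    assert(sizeof(bits) == sizeof(value));
    memcpy(&bits, &value, sizeof(bits));
    return bits;
}
```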

10.2 Checks performed by user

The following checks should be performed on your code before reporting a bug:

• check that the code works correctly on at least one other compiler, on another host
• check that the code does not access out-of-bounds memory
• check that the source code does not raise any warning when compiled with the -Wall option
• check that the source code does not make assumptions that may be false: specifically check restrict annotations, and optimization pragmas
• check that the code does not exercise language edges and does not violate language standards: an example of undefined behavior is to assume a specific behavior of shift operators when the shift amount is negative or greater than or equal to the width of the type shifted
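The shift case in the last check can be avoided with an explicit range test. The following is a minimal sketch assuming a 32-bit int, as the helper name shl32_checked and the choice to return 0 for out-of-range amounts are both decisions made for this example, not something the toolset prescribes.

```c
#include <assert.h>
#include <stdint.h>

/* Left shift that is defined for every shift amount: amounts outside
   0..31 yield 0 instead of relying on undefined behavior. */
uint32_t shl32_checked(uint32_t value, int amount)
{
    if (amount < 0 || amount >= 32)
        return 0;
    return value << amount;
}
```

Code that previously relied on a particular result for an oversized shift gets the same answer at every optimization level once the range test is explicit.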


10.3 Workaround

The following can be carried out to temporarily work around a compiler bug.

1. Demote the optimization level to -O1 or -O0 when compiling the specific file creating the problem, either in category 1 or 2. (See Section 10.1.1 and Section 10.1.2.)

2. Remove the optimization pragmas or restrict annotations.

3. Finally, check that you have an up-to-date compiler release.

10.4 Reporting a compiler bug

Carry out the following if a compiler bug is encountered.

1. Obtain your compiler version by running the command stxp70cc -version.

2. If the compiler bug is in category 1 (see Section 10.1.1), prepare a pre-processed input file that can reproduce the problem.

3. If the compiler bug is in category 2 (see Section 10.1.2), prepare a source set and Makefile that can reproduce the problem.

4. Supply the full command line that generates the problem.

5. Report the result of the following command in the shell that you use: uname -a.

6. Prepare a description of the expected result and the actual result.

7. Report all the above information through your local ST Field Applications Engineer (FAE).

Finally, when in doubt, it is preferable that a possible bug is reported than ignored.

10.5 Known bugs and limitations

Please refer to the Release note supplied with the toolset for an up-to-date list of bugs and limitations.

Revision history

Table 33. Document revision history

Date          Revision   Changes

Earlier revision history entries deleted as they are no longer pertinent.

05-Mar-2012   11         Update for STxP70 toolset 2012.1.
                         Updated Documentation suite on page 9 to remove references to STxP70 assembler documents. The assembler as is documented in the GNU documents, supplied with the toolset.
                         Updated Inlining criteria on page 56 to change -INLINE:none to -INLINE:off.
                         Added the option -INLINE:size_static and updated the description of -INLINE:all.
                         Added Inlining static functions on page 57.

17-May-2012   12         Update for STxP70 toolset 2012.1 patch 001.
                         Table 15: Code generation options on page 31 updated -mlib-short-double and added -mlib-nofloat.
                         Table 19: C99 support in stxp70cc on page 42 updated throughout.

19-Sep-2012   13         Updated Table 6 and Table 7 to add config options bypass and bhb.
                         Updated Table 13 to add -o4 optimization option.
                         Updated Table 14, --deadcode and -f[no]unroll-loops options.
                         Updated Table 15 to add -maggressive_unroll option.
                         Updated Table 20 optimization levels.
                         Updated Table 21, -INLINE:size_static to add -o4 optimization level.
                         Updated Table 27, -IPA:mem_placement to include -o4 optimization level.
                         Updated Section 6.4: Restrictions on page 112.
                         Updated Section 8.2.1: Compiler options on page 122.

28-Jan-2013   14         Update for STxP70 toolset 2012.2. Update 01.
                         Updated rl_handle_new on page 145 to add mode argument.
                         Updated rl_errno on page 154 to expand description of RL_ERR_MEM error code.
                         Updated Section 9.5.1: Memory allocation on page 157.

08-May-2013   15         Corrected syntax for FPx registers in Chapter 6: GNU ASM on page 109.
                         Added options to control warnings generated for -fpack-struct and updated description of -fpack-struct.
                         Updated the description of -f[no-]math-errno to reflect its changed behavior in this toolset release.
                         Added GNU assembly parsing options.


Please Read Carefully:

Information in this document is provided solely in connection with ST products. STMicroelectronics NV and its subsidiaries (“ST”) reserve the right to make changes, corrections, modifications or improvements, to this document, and the products and services described herein at any time, without notice.

All ST products are sold pursuant to ST’s terms and conditions of sale.

Purchasers are solely responsible for the choice, selection and use of the ST products and services described herein, and ST assumes no liability whatsoever relating to the choice, selection or use of the ST products and services described herein.

No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted under this document. If any part of this document refers to any third party products or services it shall not be deemed a license grant by ST for the use of such third party products or services, or any intellectual property contained therein or considered as a warranty covering the use in any manner whatsoever of such third party products or services or any intellectual property contained therein.

UNLESS OTHERWISE SET FORTH IN ST’S TERMS AND CONDITIONS OF SALE ST DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY WITH RESPECT TO THE USE AND/OR SALE OF ST PRODUCTS INCLUDING WITHOUT LIMITATION IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE (AND THEIR EQUIVALENTS UNDER THE LAWS OF ANY JURISDICTION), OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

ST PRODUCTS ARE NOT AUTHORIZED FOR USE IN WEAPONS. NOR ARE ST PRODUCTS DESIGNED OR AUTHORIZED FOR USE IN: (A) SAFETY CRITICAL APPLICATIONS SUCH AS LIFE SUPPORTING, ACTIVE IMPLANTED DEVICES OR SYSTEMS WITH PRODUCT FUNCTIONAL SAFETY REQUIREMENTS; (B) AERONAUTIC APPLICATIONS; (C) AUTOMOTIVE APPLICATIONS OR ENVIRONMENTS, AND/OR (D) AEROSPACE APPLICATIONS OR ENVIRONMENTS. WHERE ST PRODUCTS ARE NOT DESIGNED FOR SUCH USE, THE PURCHASER SHALL USE PRODUCTS AT PURCHASER’S SOLE RISK, EVEN IF ST HAS BEEN INFORMED IN WRITING OF SUCH USAGE, UNLESS A PRODUCT IS EXPRESSLY DESIGNATED BY ST AS BEING INTENDED FOR “AUTOMOTIVE, AUTOMOTIVE SAFETY OR MEDICAL” INDUSTRY DOMAINS ACCORDING TO ST PRODUCT DESIGN SPECIFICATIONS. PRODUCTS FORMALLY ESCC, QML OR JAN QUALIFIED ARE DEEMED SUITABLE FOR USE IN AEROSPACE BY THE CORRESPONDING GOVERNMENTAL AGENCY.

Resale of ST products with provisions different from the statements and/or technical features set forth in this document shall immediately void any warranty granted by ST for the ST product or service described herein and shall not create or extend in any manner whatsoever, any liability of ST.

ST and the ST logo are trademarks or registered trademarks of ST in various countries.

Information in this document supersedes and replaces all information previously supplied.

The ST logo is a registered trademark of STMicroelectronics. All other names are the property of their respective owners.

© 2013 STMicroelectronics - All rights reserved

STMicroelectronics group of companies

Australia - Belgium - Brazil - Canada - China - Czech Republic - Finland - France - Germany - Hong Kong - India - Israel - Italy - Japan - Malaysia - Malta - Morocco - Philippines - Singapore - Spain - Sweden - Switzerland - United Kingdom - United States of America

www.st.com
