Universal Binary Programming Guidelines

Preliminary 2005-06-06
Apple Computer, Inc.
© 2005 Apple Computer, Inc.
All rights reserved.
No part of this publication may be
reproduced, stored in a retrieval
system, or transmitted, in any form or
by any means, mechanical, electronic,
photocopying, recording, or
otherwise, without prior written
permission of Apple Computer, Inc.,
with the following exceptions: Any
person is hereby authorized to store
documentation on a single computer
for personal use only and to print
copies of documentation for personal
use provided that the documentation
contains Apple’s copyright notice.
The Apple logo is a trademark of
Apple Computer, Inc.
Use of the “keyboard” Apple logo
(Option-Shift-K) for commercial
purposes without the prior written
consent of Apple may constitute
trademark infringement and unfair
competition in violation of federal
and state laws.
No licenses, express or implied, are
granted with respect to any of the
technology described in this book.
Apple retains all intellectual property
rights associated with the technology
described in this book. This book is
intended to assist application
developers to develop applications
only for Apple-labeled or
Apple-licensed computers.
Every effort has been made to ensure
that the information in this document
is accurate. Apple is not responsible
for typographical errors.
Apple Computer, Inc.
1 Infinite Loop
Cupertino, CA 95014
408-996-1010
Apple, the Apple logo, AppleScript,
Carbon, Cocoa, ColorSync, iTunes,
Logic, Mac, Macintosh, Quartz,
QuickDraw, and QuickTime are
trademarks of Apple Computer, Inc.,
registered in the United States and
other countries.
eMac, Finder, Pages, and Xcode are
trademarks of Apple Computer, Inc.
Objective-C is a trademark of NeXT
Software, Inc.
Java and all Java-based trademarks
are trademarks or registered
trademarks of Sun Microsystems, Inc.
in the U.S. and other countries.
OpenGL is a trademark of Silicon
Graphics, Inc.
PowerPC and the PowerPC logo are
trademarks of International Business
Machines Corporation, used under
license therefrom.
Intel® and Pentium® are registered
trademarks of Intel Corporation or its
subsidiaries in the United States and
other countries.
MMX™ is a trademark of Intel
Corporation or its subsidiaries in the
United States and other countries.
Simultaneously published in the
United States and Canada
Even though Apple has reviewed this
manual, APPLE MAKES NO
WARRANTY OR REPRESENTATION,
EITHER EXPRESS OR IMPLIED, WITH
RESPECT TO THIS MANUAL, ITS
QUALITY, ACCURACY,
MERCHANTABILITY, OR FITNESS
FOR A PARTICULAR PURPOSE. AS A
RESULT, THIS MANUAL IS SOLD “AS
IS,” AND YOU, THE PURCHASER, ARE
ASSUMING THE ENTIRE RISK AS TO
ITS QUALITY AND ACCURACY.
IN NO EVENT WILL APPLE BE LIABLE
FOR DIRECT, INDIRECT, SPECIAL,
INCIDENTAL, OR CONSEQUENTIAL
DAMAGES RESULTING FROM ANY
DEFECT OR INACCURACY IN THIS
MANUAL, even if advised of the
possibility of such damages.
THE WARRANTY AND REMEDIES SET
FORTH ABOVE ARE EXCLUSIVE AND
IN LIEU OF ALL OTHERS, ORAL OR
WRITTEN, EXPRESS OR IMPLIED. No
Apple dealer, agent, or employee is
authorized to make any modification,
extension, or addition to this warranty.
Some states do not allow the exclusion or
limitation of implied warranties or
liability for incidental or consequential
damages, so the above limitation or
exclusion may not apply to you. This
warranty gives you specific legal rights,
and you may also have other rights which
vary from state to state.
Contents
Introduction
Introduction to Universal Binary Programming Guidelines 9
Who Should Read This Document 9
Assumptions 9
Conventions 10
Organization of This Document 10
Chapter 1
Building a Universal Binary 13
Build Assumptions 13
Building Your Code 14
Troubleshooting 17
Determining Whether a Binary is Universal 18
Build Options 18
Default Compiler Options 18
Architecture-Specific Options 19
Autoconf Macros 19
See Also 19
Chapter 2
Architectural Differences 21
Alignment 21
Bit Fields 21
Byte Order 21
Calling Conventions 22
Data Type Conversions 22
Data Types 23
Divide-By-Zero Operations 23
Floating-Point Equality Comparisons 23
Structures and Unions 24
See Also 24
Chapter 3
Swapping Bytes 25
Why Byte Ordering Matters 25
Guidelines for Swapping Bytes 27
Byte Swapping Routines 28
Byte Swapping Strategies 28
2005-06-06 | Preliminary © 2005 Apple Computer, Inc. All Rights Reserved.
Constants 29
Custom Apple Event Data 29
Custom Resource Data 30
Floating-Point Values 30
Integers 31
Network-Related Data 32
OSType-to-String Conversions 32
Unicode Text Files 33
Writing and Installing a Callback to Byte Swap Data 34
See Also 40
Chapter 4
Guidelines for Specific Scenarios 41
Aliases 41
Archived Bit Fields 41
Bit Shifting 42
Bit Test, Set, and Clear Functions: Carbon and POSIX 42
Deprecated Functions 43
Disk Partitions 43
Double-Precision Values: Bit-by-Bit Sensitivity 43
Finder Information and Low-Level File System Operations 43
Font-Related Resources 44
GWorlds 44
Java I/O API (NIO) 45
Machine Location Data Structure 45
Metrowerks PowerPlant 45
Multithreading 46
Objective-C: Messages to nil 46
Objective-C Runtime: Low-Level Operations 46
Open Firmware 47
OpenGL 47
OSAtomic Functions 49
Pixel Data 49
QuickDraw Routines 50
QuickTime Components 50
Runtime Code Generation 52
System-Specific Predefined Macros 52
See Also 52
Chapter 5
Preparing Vector-Based Code 53
Accelerate Framework 53
Rewriting AltiVec Instructions 54
Differences Between Instruction Set Architectures 54
The Programming Model 56
Building x86 ISA Code 58
Aligning Data 59
Detecting Vector Unit Availability 63
See Also 66
Appendix A
Rosetta 67
What Can Be Translated? 67
How It Works 68
Special Considerations 68
Forcing an Application to Run Translated 69
Troubleshooting 70
Appendix B
x86 Equivalent Instructions for AltiVec Instructions 73
Intrinsics 73
vec_floor Routine 78
Appendix C
Fast Matrix Multiplication 79
Platform-Specific Code 79
Architecture-Independent Matrix Multiplication 83
Appendix D
Application Binary Interface 85
Data Types and Alignment 85
Stack Structure 86
Parameter Passing 86
Return Values 87
Appendix E
Flipping PowerPlant Resources 89
Document Revision History 105
Tables, Figures, and Listings
Chapter 1
Building a Universal Binary 13
Figure 1-1 The Build pane 15
Figure 1-2 Architectures settings 16
Figure 1-3 The Architecture entry for the Chess application is Intel, PowerPC 18
Table 1-1 Default values for compiler flags on a Macintosh using an Intel microprocessor 19
Chapter 2
Architectural Differences 21
Listing 2-1 Code that illustrates byte-ordering differences 22
Listing 2-2 Architecture-dependent code 22
Listing 2-3 A union whose components can be affected by byte order 24
Chapter 3
Swapping Bytes 25
Figure 3-1 Big-endian byte ordering compared to little-endian byte ordering 26
Listing 3-1 A data structure that contains multibyte and single-byte data 25
Listing 3-2 Encoding a floating-point value 31
Listing 3-3 Decoding a floating-point value 31
Listing 3-4 Swapping a 16-bit integer from big-endian to host-endian 31
Listing 3-5 Swapping integers from little-endian to host-endian 32
Listing 3-6 A declaration for a custom resource 36
Listing 3-7 A flipper function for RGBColor data 36
Listing 3-8 A flipper for the custom 'PREF' resource 37
Table 3-1 Byte order marks 33
Chapter 4
Guidelines for Specific Scenarios 41
Listing 4-1 A structure that swaps bit fields 41
Listing 4-2 Statements to include in the .r file for a component 51
Table 4-1 Quartz constants that specify byte ordering 50
Chapter 5
Preparing Vector-Based Code 53
Figure 5-1 Vector elements in memory order compared to register order 55
Figure 5-2 Misaligned data 59
Figure 5-3 Bytes that extend off the array 60
Figure 5-4 Misaligned data with unknown data at each end 60
Figure 5-5 The back-step method for handling unaligned data 61
Figure 5-6 Loading a scalar into a known location 61
Figure 5-7 Loading a single-byte or word (16 bits) 62
Figure 5-8 Using MASKMOVDQU for partial vector stores 63
Listing 5-1 Code that turns on the FZ and DAZ bits 58
Listing 5-2 Code that checks for processor-specific features 64
Listing 5-3 Code that detects vector unit types 64
Table 5-1 Suffixes and the corresponding data type for Intel intrinsics 56
Table 5-2 Equivalent SSE2/SSE and AltiVec data types 57
Table 5-3 Apple and Intel names for vector data types 57
Table 5-4 Header file and instruction set 58
Table 5-5 Selectors used to obtain features of a PowerPC processor 63
Table 5-6 Selectors used to obtain features of an Intel processor 64
Appendix A
Rosetta 67
Figure A-1 The Info pane for the Calculator application 70
Figure A-2 Rosetta listens for a port connection 71
Figure A-3 Terminal windows with the commands for debugging a PowerPC binary on a Macintosh using an Intel microprocessor 72
Listing A-1 A structure whose endian format depends on the architecture 69
Appendix B
x86 Equivalent Instructions for AltiVec Instructions 73
Listing B-1 An equivalent routine for vec_floor 78
Table B-1 AltiVec intrinsics and x86 equivalent instructions 73
Appendix C
Fast Matrix Multiplication 79
Listing C-1 Platform-specific code needed to support matrix multiplication 79
Listing C-2 Architecture-independent code that performs matrix multiplication 83
Appendix D
Application Binary Interface 85
Figure D-1 Stack frame layout 87
Table D-1 Data types, sizes, and alignment for IA-32 85
Appendix E
Flipping PowerPlant Resources 89
Listing E-1 Code that flips PPob resources 89
I N T R O D U C T I O N
Introduction to Universal Binary
Programming Guidelines
Universal Binary Programming Guidelines will assist experienced developers in building and modifying
their Mac OS X applications to run as universal binaries. Universal binaries run natively on Macintosh
computers using PowerPC or Intel® microprocessors and deliver optimal performance for both
architectures in a single package.
This document is designed to help developers determine exactly how much work needs to be done
and provides useful tips for general as well as specific code modification scenarios. It is intended to
be used as a reference and not to be read cover to cover. It describes the prerequisites for building
code as a universal binary and shows how to do so using Xcode. It also discusses the differences
between the Intel and PowerPC architectures that can affect code behavior and provides guidelines
for ensuring that your universal binary code builds correctly.
Important: This is a preliminary document for an application binary interface (ABI) in development.
Although this document has been reviewed for technical accuracy, it is not final. Apple Computer is
supplying this information to help developers plan for the adoption of the technologies and
programming interfaces described herein. This information is subject to change, and software
implemented according to this document should be tested with final operating system software and
final documentation. Newer versions of this document may be provided with future seeds of the ABI.
For information about updates to this and other developer documentation, view the New & Updated
sidebars in subsequent seeds of the ADC Reference Library.
Who Should Read This Document
Any developer who currently has an application that runs in Mac OS X will want to read this document
to learn how to modify their code so that it runs natively on all current Apple hardware. Developers
who have not yet written an application for the Macintosh, but are planning to do so, will want to
follow the guidelines in the document to ensure that their code can run as a universal binary.
Assumptions
The document assumes the following:
■
Your application runs in Mac OS X.
Your application can use any of the Mac OS X development environments: Carbon, Cocoa, Java,
or BSD UNIX.
If your application runs in a version of the Mac OS that is earlier than Mac OS X version 10.0, you
should first read Carbon Porting Guide and Technical Note TN2003 Moving Your Code to Mac OS
X.
If your application runs in the UNIX operating system but not specifically in Mac OS X, you
should first read Porting UNIX/Linux Applications to Mac OS X.
If your application runs only in the Windows operating system, you should first read Porting to
Mac OS X from Windows Win32 API.
■
You know how to use Xcode.
Currently Xcode is the only GUI tool available that compiles code to run universally.
If you are unfamiliar with Xcode, you might want to take a look at Xcode 2.1 User Guide.
If you have been using CodeWarrior, you should read Moving Projects from CodeWarrior to Xcode.
Conventions
The term x86 is a generic term used throughout this book to refer to the class of microprocessors
manufactured by Intel. This book uses the term x86 as a synonym for IA-32 (Intel Architecture 32-bit).
Organization of This Document
This document is organized into the following chapters:
■
“Building a Universal Binary” (page 13) shows how to use Xcode to build native and universal
binaries, describes build options, and provides troubleshooting information for code that doesn’t
run properly on the x86 architecture.
■
“Architectural Differences” (page 21) outlines the major differences between the x86 and the
PowerPC architectures. Understanding the differences will help you to write portable code.
■
“Swapping Bytes” (page 25) describes byte ordering differences in detail, provides a list of byte
swapping routines, and discusses strategies for a number of scenarios that require byte swapping.
This is a must-read chapter for all Mac OS X developers. It will help you understand how to avoid
byte-ordering issues when transferring data and data files between architectures.
■
“Guidelines for Specific Scenarios” (page 41) contains tips for a variety of situations that are not
common to most applications.
■
“Preparing Vector-Based Code” (page 53) describes the Accelerate framework and provides
guidelines for rewriting AltiVec instructions for the Intel instruction set architecture.
■
“Rosetta” (page 67) describes the translation process that allows PowerPC binaries to run on a
Macintosh that uses the x86 architecture.
■
“x86 Equivalent Instructions for AltiVec Instructions” (page 73) lists C intrinsic routines that are
equivalent between the two architectures.
■
“Fast Matrix Multiplication” (page 79) uses matrix multiplication as an example to show how to
write vector code with a minimum amount of architecture-specific coding.
■
“Application Binary Interface” (page 85) describes those portions of the Macintosh version of
the IA-32 ABI that are different from the System V IA-32 ABI.
■
“Flipping PowerPlant Resources” (page 89) lists code that implements byte swapping for
PPob resources.
C H A P T E R
1
Building a Universal Binary
Architectural differences between Macintosh computers using Intel and PowerPC microprocessors
can cause existing PowerPC code to behave differently when built and run natively on an Intel
microprocessor. The extent to which architectural differences affect your code depends on the level
of your source code. Most existing code is high-level source code that is not specific to the processor.
If your application falls into this category, you’ll find that creating a universal binary involves adjusting
code in a few places. Cocoa developers may need to make fewer adjustments than Carbon developers
whose code was ported from Mac OS 9 to Mac OS X.
Most code that uses high-level frameworks and builds with GCC 4.0 in Mac OS X v10.4 will build
with few, if any, changes on a Macintosh using an Intel microprocessor. The best approach for any
developer in that situation is to build the existing code on a Macintosh using an Intel microprocessor,
run the native x86 binary, and see how the application runs. Find the places where the code doesn’t
behave as expected and then consult the sections in this document that cover those issues.
Developers who use AltiVec instructions in their code, or who intentionally exploit architectural
differences for optimization or other purposes, will need to make the most code adjustments. These
developers will probably want to consult the rest of this document before building a universal binary.
AltiVec programmers should read “Preparing Vector-Based Code” (page 53) and consult “x86
Equivalent Instructions for AltiVec Instructions” (page 73).
This chapter describes how to use Xcode version 2.1 to create a universal binary, provides
troubleshooting information, and lists relevant build options. You’ll find that the software development
workflow on a Macintosh using an Intel microprocessor is exactly the same as the software
development workflow on a Macintosh using a PowerPC microprocessor.
Build Assumptions
Before you build your code as a universal binary, you must ensure that:
■
Your application already builds for Mac OS X. Your application can use any of the Mac OS X
development environments: Carbon, Cocoa, Java, or BSD UNIX.
■
Your application uses the Mach-O executable format. Mach-O binaries are the only type of binary
that run natively on a Macintosh using an Intel microprocessor. If you are already using the Xcode
compilers and linkers, your application is a Mach-O binary. Carbon applications based on the
Code Fragment Manager Preferred Executable Format (PEF) must be changed to Mach-O.
■
Your code project is ported to GCC 4.0. Xcode uses GCC 4.0 for targeting x86. You may want to
look at the document Porting to GCC 4.0 Release Notes to assess whether you need to make any
changes to your code to allow it to use GCC 4.0.
■
You installed the Mac OS X v10.4 universal SDK. The installer places the SDK in this location:
/Developer/SDKs/MacOSX10.4u.sdk
Building Your Code
If you have already been using Xcode to build applications on a Macintosh using a PowerPC
microprocessor, you’ll see that building your code on a Macintosh using an Intel microprocessor is
accomplished in the same way. By default, Xcode 2.1 compiles code to run on the architecture on
which you build your Xcode project.
When you are in the process of developing your project, you’ll want to use the following settings for
the Default and Debug configurations:
■
Keep the Architectures settings set to $(NATIVE_ARCH).
■
Change the Mac OS X Deployment Target settings to Mac OS X 10.4.
■
Make sure the SDKROOT setting is /Developer/SDKs/MacOSX10.4u.sdk.
You can set the SDK root for the project by following these steps:
1.
Open your project in Xcode version 2.1 or later.
2.
In the Groups & Files list, click the project name.
3.
Click the Info button to open the Info window.
4.
In the General pane, under the Cross-Develop Using Target SDK setting, click Choose.
5.
Click Change in the sheet that appears.
6.
Choose MacOSX10.4u.sdk and click Choose.
The Debug build configuration turns on ZeroLink, Fix and Continue, and debug-symbol generation,
among other settings, and turns off code optimization. Keep in mind that you can’t run a debug binary
built on one architecture on the other architecture.
When you are ready to test your application on both architectures, you’ll want to use the Release
configuration. This configuration turns off ZeroLink and Fix and Continue. It also sets the
code-optimization level to its highest setting by default. As with the Default and Debug configurations,
you’ll want to set the Mac OS X Deployment Target to Mac OS X 10.4 and the SDK root to
MacOSX10.4u.sdk. To build a universal binary, the Architectures setting for the Release configuration
must be set to build on Intel and PowerPC.
You can change the Architectures setting by following these steps:
1.
Open your project in Xcode version 2.1 or later.
2.
In the Groups & Files list, click the project name.
3.
Click the Info button to open the Info window.
4.
In the Build pane (see Figure 1-1), choose Release from the Configuration pop-up menu.
Figure 1-1
The Build pane
5.
Scroll until you see the Architectures setting. Select it and click Edit.
6.
In the sheet that appears, select the PowerPC and Intel options, as shown in Figure 1-2.
Figure 1-2
Architectures settings
7.
Close the Info window.
8.
Build and run the project.
If your application does not behave as expected when you run it as a native binary on a Macintosh
using an Intel microprocessor, see “Troubleshooting” (page 17).
If your application behaves as expected, don’t assume that it also works on the other architecture.
You need to test your application on Macintoshes using both PowerPC and Intel microprocessors. If
your application reads data from and writes data to disk, you should make sure that you can save
files on one architecture and open them on the other.
Note: Xcode has per-architecture SDK support. For example, you can target Mac OS X v10.3 for
PowerPC while also targeting Mac OS X v10.4.1 for Intel.
For information on default compiler settings, architecture-specific options, and autoconf macros, see
“Build Options” (page 18).
For information on building with version-specific SDKs for PowerPC (Mac OS X v10.3, v10.2, and so
forth) while still building for a Macintosh using an Intel microprocessor, see Using Cross Development
in Xcode, in the Xcode 2.0 User Guide.
Troubleshooting
The most typical behavior problems you’ll observe when your application runs natively on a Macintosh
using an Intel microprocessor are:
■
The application crashes.
■
Unexpected numerical results.
■
Incorrectly displayed color.
■
Text is not displayed properly—characters from the Last Resort font or unexpected Chinese or
Japanese characters appear.
■
Files are not read or written correctly.
■
Network communication does not work properly.
The first two problems in the list are typically caused by architecture-dependent code. For example,
on a Macintosh using an Intel microprocessor, an integer divide by zero raises an exception that
crashes the application, but on a Macintosh using a PowerPC microprocessor the same operation
returns zero. In these cases, the code must be rewritten in an
architecture-independent manner. “Architectural Differences” (page 21) discusses the major differences
in architecture between Macintosh computers using Intel microprocessors and those using PowerPC
microprocessors. That chapter can help you determine which code is causing the crash or the
unexpected numerical results.
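One architecture-independent way to handle the divide-by-zero case is to test the divisor before dividing, so the result no longer depends on what the processor does. The following is a minimal sketch only; the helper name safe_div and its fallback parameter are hypothetical and are not part of any Apple API.

```c
#include <limits.h>

/* Hypothetical helper (not an Apple API): divide only when the result is
 * well defined; otherwise return the caller-supplied fallback value. */
int safe_div(int numerator, int denominator, int fallback)
{
    if (denominator == 0) {
        return fallback;    /* never reaches the hardware divide */
    }
    if (numerator == INT_MIN && denominator == -1) {
        return fallback;    /* quotient would not fit in an int */
    }
    return numerator / denominator;
}
```

A call such as safe_div(total, count, 0) then behaves the same on both architectures when count is zero. Which fallback value makes sense depends on your application.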
The last four problems in the list are most often caused by byte-ordering differences between
architectures. These problems are easily remedied by taking the byte order into account when you
read and write data. The strategies available for handling byte ordering, as well as an in-depth
discussion of byte-ordering differences, are provided in “Swapping Bytes” (page 25). Keep in mind
that Mac OS X ensures that byte-ordering is correct for anything it is responsible for. Apple-defined
resources (such as menus) won’t result in problem behavior. Custom resources provided by your
application can result in problem behavior.
Apple engineers prepared a lot of code to run natively on Macintosh computers using Intel
microprocessors—including the operating system, most Apple applications, and Apple tools. The
guidelines in this book are the result of their work. In addition to the more common issues discussed
in “Architectural Differences” (page 21) and “Swapping Bytes” (page 25), the engineers identified
a number of narrowly-focused issues. These are described in “Guidelines for Specific
Scenarios” (page 41). You will want to at least glance at this chapter to see if your code can benefit
from any of the information.
Determining Whether a Binary is Universal
You can determine whether an application has a universal binary by clicking the application icon and
pressing Cmd-I. In the More Info section of the Info pane for the application, the Architecture entry
(see Figure 1-3) lists whether the binary is Intel, PowerPC, or both Intel and PowerPC.
Figure 1-3
The Architecture entry for the Chess application is Intel, PowerPC
On a Macintosh using an Intel microprocessor, when you double-click an application that doesn’t
have an executable for the native architecture, it might launch. Whether or not it launches depends
on how compatible the application is with Rosetta. For more information, see “Rosetta” (page 67).
Build Options
This section contains information on the build options that you need to be aware of when using Xcode
2.1 on a Macintosh using an Intel microprocessor. It lists the default compiler options, discusses how
to set architecture-specific options, and provides information on using autoconf macros.
Default Compiler Options
In Xcode 2.1 on a Macintosh using an Intel microprocessor, the defaults for compiler flags that differ
from standard GCC distributions are listed in Table 1-1.
Table 1-1 Default values for compiler flags on a Macintosh using an Intel microprocessor
Compiler Flag    Default Value    Specifies to
-mfpmath         sse              Use SSE instructions for floating-point math.
-msse2           On by default    Enable the MMX™, SSE, and SSE2 extensions in the Intel instruction set architecture.
Architecture-Specific Options
Most developers don’t need to use architecture-specific settings for their projects.
In Xcode, to set one flag for x86 and another for PowerPC, you use the PER_ARCH_CFLAGS_i386 and
PER_ARCH_CFLAGS_ppc build settings variables to supply the architecture-specific settings.
For example, to supply -faltivec for PowerPC and -msse3 for x86, you would add the following build settings:
PER_ARCH_CFLAGS_i386 = -msse3
PER_ARCH_CFLAGS_ppc = -faltivec
Similarly, you can supply architecture-specific linker flags using the OTHER_LDFLAGS_i386 and
OTHER_LDFLAGS_ppc build settings variables.
You can pass the -arch flag to gcc, ld, and as. The allowable values are i386 and ppc. You can specify
both values as follows:
-arch ppc -arch i386
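When both -arch values are supplied, each source file is compiled once per architecture, so the source can tell at compile time which slice it is being built for. The sketch below illustrates this with the __ppc__ and __i386__ macros that GCC predefines for the matching architecture. The function name build_arch_name is hypothetical, and the "other" branch is an assumption added so that the sketch also compiles elsewhere.

```c
/* Sketch: report which architecture slice this code was compiled for.
 * build_arch_name is a hypothetical name used only for illustration. */
const char *build_arch_name(void)
{
#if defined(__ppc__)
    return "ppc";       /* compiled with -arch ppc */
#elif defined(__i386__)
    return "i386";      /* compiled with -arch i386 */
#else
    return "other";     /* an architecture this guide does not cover */
#endif
}
```

Because the choice is made by the preprocessor, each slice of the universal binary contains only the branch that applies to it.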
Autoconf Macros
If you are compiling a project that uses autoconf and trying to build it for both the PowerPC and x86
architectures, you need to make sure that when the project configures itself, it doesn't use autoconf
macros to determine the endian type of the runtime system. For example, if your project uses the
autoconf AC_C_BIGENDIAN macro, the program won't work correctly when it is run on the opposite
architecture from the one you are targeting when you configure the project. To correctly build for
both PowerPC and x86 architectures, use the compiler-defined __BIG_ENDIAN__ and
__LITTLE_ENDIAN__ macros in your code.
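As a hedged illustration, the sketch below selects byte-swapping behavior from the compiler-defined macros rather than from a configure-time test, so each architecture slice gets the right answer. The name big_to_host32 and the HOST_IS_BIG_ENDIAN macro are hypothetical. The last branch, which relies on __BYTE_ORDER__, is an assumption added for GCC-compatible compilers that do not define the two macros named above.

```c
#include <stdint.h>

/* Decide the host byte order at compile time. */
#if defined(__BIG_ENDIAN__)
#define HOST_IS_BIG_ENDIAN 1
#elif defined(__LITTLE_ENDIAN__)
#define HOST_IS_BIG_ENDIAN 0
#else
/* Assumed fallback for compilers without the macros above. */
#define HOST_IS_BIG_ENDIAN (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#endif

/* Hypothetical helper: convert a 32-bit value that was stored in
 * big-endian order to the byte order of the current architecture. */
uint32_t big_to_host32(uint32_t v)
{
#if HOST_IS_BIG_ENDIAN
    return v;                           /* already in host order */
#else
    return ((v & 0x000000FFu) << 24) |  /* reverse the four bytes */
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
#endif
}
```

For production code, the byte swapping routines described in “Swapping Bytes” (page 25) are the better choice. The sketch only shows where the macros fit.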
See Also
These resources provide information related to compiling and building applications, and measuring
performance:
■
Xcode 2.0 User Guide contains all the instructions needed to compile and debug any type of Xcode
project (C, C++, Objective C, Java, AppleScript, resource, nib files, and so forth).
■
GNU C/C++/Objective-C 3.3 Compiler provides details about the GCC implementation. Xcode uses
the GNU compiler collection (GCC) to compile code.
The assembler (as) used by Xcode supports AT&T System V/386 assembler syntax in order to
maintain compatibility with the output from GCC. The AT&T syntax is quite different from Intel
syntax. The major differences are discussed in the GNU documentation.
■
Performance tools. Shark, MallocDebug, ObjectAlloc, Sampler, Quartz Debug, Thread Viewer,
and other Apple-developed tools (some command-line, others with a GUI) are in the /Developer
directory. Command-line performance tools are in the /usr/bin directory.
■
Code Size Performance Guidelines and Code Speed Performance Guidelines discuss optimization
strategies for a Mach-O executable.
C H A P T E R
2
Architectural Differences
The PowerPC and the x86 architectures have some fundamental differences that can prevent code
compiled for one architecture from running properly on the other architecture. The extent to which
you need to change your PowerPC code so that it runs natively on a Macintosh using an Intel
microprocessor depends on how much of your code is processor specific. This chapter describes the
major differences between architectures, organized alphabetically by topic. You can use the information
to identify the parts of your code that are likely to be problematic.
Alignment
All PowerPC instructions are 4 bytes in size and must be 4-byte aligned. x86 instructions vary in
size (from 1 to more than 10 bytes) and, as a consequence, do not need to be aligned.
Bit Fields
The value of a signed, 1-bit bit field is either 0, 1, or –1, depending on the compiler, architecture,
optimization level, and so forth. Code that compares the value of a bit field to 1 may not work if the
bit field is signed, so you will want to use unsigned 1-bit bit fields. Keep in mind that the order of bit
fields in memory can be reversed between architectures.
For more information on issues related to endian format, see “Swapping Bytes” (page 25). See also
“Archived Bit Fields” (page 41) and “Structures and Unions” (page 24).
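The recommendation to use unsigned 1-bit bit fields can be sketched as follows. The structure and function names are hypothetical and serve only as an illustration.

```c
/* Sketch: declare 1-bit flags as unsigned so they hold exactly 0 or 1. */
struct window_flags {           /* hypothetical structure */
    unsigned int visible : 1;
    unsigned int dirty   : 1;
};

/* Build a flags value; any nonzero argument sets the corresponding bit. */
struct window_flags make_flags(int visible, int dirty)
{
    struct window_flags f;
    f.visible = visible ? 1u : 0u;
    f.dirty   = dirty   ? 1u : 0u;
    return f;
}

/* Comparing against 1 is safe here because an unsigned 1-bit field
 * can never hold -1. */
int is_visible(struct window_flags f)
{
    return f.visible == 1;
}
```

If the fields were declared as plain signed int, the comparison in is_visible could fail on some compiler and architecture combinations. Testing for a nonzero value instead of comparing with 1 is another way to sidestep the issue.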
Byte Order
Microprocessor architectures commonly use two different byte ordering methods (little-endian and
big-endian) to store the individual bytes of multibyte data formats in memory. This difference becomes
critically important if you try to read data from files that were created on a computer that uses a
different byte ordering than yours. You also need to consider byte ordering when you send and
receive data through a network connection and handle networking data. The difference in byte
ordering can produce incorrect results if you do not account for this difference. For example, the order
of bytes in memory of a scalar type is architecture-dependent, as shown in Listing 2-1 (page 22).
Listing 2-1
Code that illustrates byte-ordering differences
unsigned char charVal;
unsigned long value = 0x12345678;
unsigned long *ptr = &value;
charVal = *(unsigned char*)ptr;
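On a processor that uses little-endian addressing, the variable charVal takes on the value 0x78. On a
processor that uses big-endian addressing, the variable charVal takes on the value 0x12. To make this code
architecture-independent, cast the dereferenced value rather than the pointer, which yields the low-order
byte (0x78) on both architectures:
charVal = (unsigned char)*ptr;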
For a detailed discussion of byte ordering and strategies that you can use to account for byte ordering
differences, see “Swapping Bytes” (page 25).
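The difference between reading a value and reading its first byte in memory can be sketched as follows. The helper names are hypothetical. The first function depends only on the value, so it returns the same result everywhere. The second inspects memory directly, so its result depends on the byte order of the host.

```c
#include <stdint.h>

/* Returns the least significant byte of value on any architecture. */
static uint8_t low_byte(uint32_t value)
{
    return (uint8_t)(value & 0xFFu);
}

/* Returns the byte stored at the lowest address of value.
   The result depends on the byte order of the host. */
static uint8_t first_byte_in_memory(uint32_t value)
{
    const unsigned char *bytes = (const unsigned char *)&value;
    return bytes[0];
}
```

For the value 0x12345678, low_byte always returns 0x78, while first_byte_in_memory returns 0x78 on a little-endian host and 0x12 on a big-endian host.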
Calling Conventions
The x86 C-language calling convention (application binary interface, or ABI) specifies that arguments
to functions are passed on the stack. The PowerPC ABI specifies that arguments to functions are
passed in registers. Also, x86 has far fewer registers, so many local variables use the stack for their
storage. Thus, programming errors, or other operations that access past the end of a local variable
array or otherwise incorrectly manipulate values on the stack may be more likely to crash applications
on x86 systems than on PowerPC.
For information on the IA-32 ABI, see “Application Binary Interface” (page 85).
Data Type Conversions
The PowerPC and x86 architectures perform some data type conversions differently, such as casting
a string to a long and converting a floating-point type to an integer type. When the system converts
a floating-point type to an integer type, it discards the fractional part of the value. The behavior is
undefined if the value of the integral part cannot be represented by the integer type.
Listing 2-2 shows an example of the sort of code that is architecture-dependent. You would need to
modify this code to make it architecture-independent. On a PowerPC microprocessor, the variable x
shown in the listing is equal to 0x7fffffff (INT_MAX). On an x86 microprocessor, the variable x is equal
to 0x80000000 (INT_MIN).
Listing 2-2
Architecture-dependent code
#include <stdio.h>

int main (int argc, const char * argv[])
{
    double a;
    int    x;

    a = 5000000.0 * 6709000.5;  // or any really big value
    x = a;                      // value does not fit in an int
    printf("x = %08x \n",x);
    return 0;
}
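One way to make such a conversion architecture-independent is to check the range before converting. The following is a minimal sketch, not a prescribed fix. The function name is hypothetical, and clamping to the nearest representable value (and mapping NaN to 0) is an assumed policy that may not suit every application.

```c
#include <limits.h>

/* Hypothetical helper: converts a double to an int, clamping values
   that an int cannot represent instead of relying on undefined behavior. */
static int clamp_double_to_int(double value)
{
    if (value != value) {            /* NaN compares unequal to itself */
        return 0;
    }
    if (value >= (double)INT_MAX) {
        return INT_MAX;
    }
    if (value <= (double)INT_MIN) {
        return INT_MIN;
    }
    return (int)value;               /* in range: fraction is discarded */
}
```

With this helper, the out-of-range product in the listing converts to INT_MAX on both architectures.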
Data Types
A long double is 16 bytes on both architectures, but on x86, only 80 bits are significant.
A bool data type is a single byte on an x86 system, but four bytes on PowerPC. This size difference
can cause alignment problems. You should use fixed-size data types to avoid alignment problems.
(The bool data type is not the Carbon Boolean type, which is a fixed size of 1 byte.)
Existing document formats that include the bool data type as part of a data structure that is written
directly to disk can be problematic because the data structure might not be laid out the same on both
architectures. If the data structure definition is updated to use the UInt32 data type or another
fixed-size four-byte data type, the structure should then be portable, although values need to be
byte-swapped as appropriate.
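As an illustration, a record that is written directly to disk might be declared with fixed-size fields as shown below. The record and field names are hypothetical, and uint32_t from stdint.h stands in for the UInt32 data type so that the sketch compiles without the Carbon headers.

```c
#include <stdint.h>

/* Hypothetical on-disk record. The flag was previously declared as bool,
   whose size differs between architectures. */
typedef struct {
    uint32_t isEnabled;   /* always four bytes */
    uint32_t itemCount;   /* always four bytes */
} PrefsRecord;

/* Compile-time check: compilation fails if the layout is not 8 bytes. */
typedef char PrefsRecordSizeCheck[(sizeof(PrefsRecord) == 8) ? 1 : -1];
```

The fields still need to be byte swapped when the record is read on an architecture whose byte order differs from that of the file.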
Divide-By-Zero Operations
An integer divide-by-zero is fatal on an x86 system, and continues on a PowerPC system, where it
returns zero. (A floating point divide-by-zero behaves the same on both architectures.) If you get a
crash log that mentions EXC_I386_DIV (divide by zero), your program divided by zero. Mod
operations perform a divide, so a mod-by-zero operation produces a divide-by-zero exception. To
fix a divide-by-zero exception, find the place in your program corresponding to that operation. Then
add code that checks for a denominator of zero before performing the divide operation.
For example, change this:
int a = b % c; // Divide by zero can happen here
to this:
int a;
if (c != 0) {
a = b % c;
} else {
a = 0;
}
Floating-Point Equality Comparisons
The results of a floating-point equality comparison are architecture-dependent. Whether the comparison
works depends on a number of things, including the compiler, the surrounding code, all compiler
flags in use (particularly optimization flags), and the current floating-point mode for the thread. If
your floating point comparison is currently working on PowerPC, you may need to inspect it on x86.
You can use the GCC flag -Wfloat-equal to receive a warning for floating-point equality comparisons.
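A common alternative to an exact equality test is to compare within a tolerance. The sketch below is one such approach. The function names and the choice of tolerances are assumptions, and suitable tolerance values depend on your application.

```c
#include <stdbool.h>

/* Returns the absolute value of v without requiring the math library. */
static double abs_double(double v)
{
    return (v < 0.0) ? -v : v;
}

/* Hypothetical helper: returns true when a and b differ by no more than
   absTol, or by no more than relTol times the larger magnitude. */
static bool nearly_equal(double a, double b, double relTol, double absTol)
{
    double diff  = abs_double(a - b);
    double magA  = abs_double(a);
    double magB  = abs_double(b);
    double scale = (magA > magB) ? magA : magB;
    double limit = relTol * scale;

    if (limit < absTol) {
        limit = absTol;
    }
    return diff <= limit;
}
```

For example, nearly_equal(0.1 + 0.2, 0.3, 1e-9, 1e-12) is true even though the two values are not bit-for-bit identical.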
Structures and Unions
The fields in a structure can be sensitive to their defined order. Structures must either be properly
ordered or accessed by the field name directly.
When a union has components that could be affected by byte order, use a form similar to that shown
in Listing 2-3. Code that sets wch and then reads hi and lo as the high and low bytes of wch will work
correctly. The same is true for the reverse direction. Code that sets hi and lo and then reads wch
will get the same value on both architectures. For another example, see the WideChar union that’s
defined in the IntlResources.h header file.
Listing 2-3
A union whose components can be affected by byte order
union WChar {
    unsigned short wch;
    struct {
#if __BIG_ENDIAN__
        unsigned char hi;
        unsigned char lo;
#else
        unsigned char lo;
        unsigned char hi;
#endif
    } s;
};
See Also
The ISO standard for the C programming language—ISO/IEC 9899—is a valuable reference that you
can use to investigate code portability issues, many of which may not be immediately obvious. You
can find this reference in a number of locations on the web, including:
http://www.iso.org/
Chapter 3
Swapping Bytes
Two different byte-ordering methods (or endian formats) exist in the world of computing. An endian
format specifies how to store the individual bytes of multibyte numerical data in memory. Big-endian
byte ordering stores multibyte data with its most significant byte first. Little-endian byte
ordering stores multibyte data with its least significant byte first. The PowerPC processor
uses big-endian byte ordering. The x86 processor family uses little-endian byte ordering. By convention,
multibyte data sent over the network uses big-endian byte ordering.
If your application assumes that data is in one endian format, but the data is actually in another, then
it will interpret the data incorrectly. You will want to analyze your code for routines that read multibyte
data (16 bits, 32 bits, or 64 bits) from, or write multibyte data to, disk or to the network, as these
routines are sensitive to byte ordering format. There are two general approaches for handling byte
ordering differences: swap bytes when necessary or use XML or another byte-order-independent
data format such as those offered by Core Foundation (CFPreferences, CFPropertyList, CFXMLParser).
Whether you should swap bytes or use a byte-order-independent data format depends on how you
use the data in your application. If you have an existing file format to support, the binary-compatible
solution is to accept the big-endian file format you have been using in your application, and write
byte swapping code to use when the file is read or written on x86. If you don’t have legacy files to
support, you could consider redesigning your file format to use XML (Extensible Markup Language),
XDR (External Data Representation), or NSCoding (Objective-C) to represent data.
This chapter describes why byte ordering matters, gives guidelines for swapping bytes, describes the
byte swapping APIs available in Mac OS X, and provides solutions for most of the situations where
byte ordering matters.
Why Byte Ordering Matters
The example in this section is designed to show you, in more detail, why byte ordering matters. Take
a look at the C data structure defined in Listing 3-1. It contains a four-byte integer, a character string,
and a two-byte integer. The listing also initializes the structure.
Listing 3-1
A data structure that contains multibyte and single-byte data
typedef struct {
    uint32_t myOptions;
    char     myStringArray[7];
    short    myVariable;
} myDataStructure;
myDataStructure aStruct;
aStruct.myOptions = 0xfeedface;
strcpy(aStruct.myStringArray, "safari");
aStruct.myVariable = 0x1234;
Compare (see Figure 3-1) how this data structure is stored in memory on big-endian and little-endian
systems. In a big-endian system, memory is organized with the address of each data byte increasing
from most significant to least significant. In a little-endian system, memory is organized with the
address of each data byte increasing from the least significant to the most significant.
Figure 3-1
Big-endian byte ordering compared to little-endian byte ordering
Address       Big-endian data    Little-endian data
0x00000000    fe                 ce
0x00000001    ed                 fa
0x00000002    fa                 ed
0x00000003    ce                 fe
0x00000004    's'                's'
0x00000005    'a'                'a'
0x00000006    'f'                'f'
0x00000007    'a'                'a'
0x00000008    'r'                'r'
0x00000009    'i'                'i'
0x0000000A    \0                 \0
0x0000000B    *                  *
0x0000000C    12                 34
0x0000000D    34                 12
0x0000000E    *                  *
0x0000000F    *                  *
* Padding bytes used to maintain alignment
As you look at Figure 3-1, note the following:
■
Multibyte data, such as the 32-bit and 16-bit variables shown in the figure, are stored differently
between big-endian and little-endian systems. As you can see in the figure, big-endian systems
store data in memory so that the most significant byte of the data is stored in the address with
the lowest value. Little-endian systems store data in memory so that the most significant byte of
the data is in the address with the highest value. Hence, the least significant byte of the myOptions
variable (0xce) is stored in memory location 0x00000003 on the big-endian system while it is
stored in memory location 0x00000000 on the little-endian system.
■
Single-byte data, such as the char values in the myStringArray character array, are stored in the
same memory location on either system regardless of the byte ordering format of the system.
■
Each system pads bytes to maintain four-byte data alignment. Padded bytes in the figure are
designated by a shaded box that contains an asterisk.
The byte ordering of multibyte data in memory matters if you are reading data written on one
architecture from a system that uses a different architecture and you access the data on a byte-by-byte
basis. For example, if your application is written to access the second byte of the myOptions variable,
then when you read the data from a system that uses the opposite byte ordering scheme, you'll end
up retrieving the first byte of the myOptions variable instead of the second one.
Suppose the example data values that are initialized by the code shown in Listing 3-1 (page 25) are
generated on a little-endian system and saved to disk. Assume that the data is written to disk in
byte-address order. When read from disk by a big-endian system, the data is again laid out in memory
as shown in Figure 3-1 (page 26). The problem is that the data is still in little-endian byte order even
though it is interpreted on a big-endian system. This difference causes the values to be evaluated
incorrectly. In this example, the value of the field myOptions should be 0xfeedface, but because of
the incorrect byte ordering it is evaluated as 0xcefaedfe.
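The misinterpretation described above amounts to reversing the four bytes of the value. The following sketch reproduces it with a hypothetical helper written in portable C. It is for illustration only and is not one of the system byte swapping routines.

```c
#include <stdint.h>

/* Reverses the order of the four bytes in a 32-bit value. */
static uint32_t swap_uint32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}
```

Applied to 0xfeedface, the function returns 0xcefaedfe, which is the incorrect value from the example. Applying it a second time restores the original value, which is why a single byte swap at read time is enough to repair the data.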
Note: The terms big-endian and little-endian come from Jonathan Swift’s eighteenth-century satire
Gulliver’s Travels, in which the empires of Lilliput and Blefuscu are in conflict over whether eggs
should be broken at the big end or at the little end.
Guidelines for Swapping Bytes
The following guidelines, along with the strategies provided later in this chapter, will help ensure
optimal byte swapping code in your application.
■
Keep data structures in native byte-order while in memory. Only byte swap when you read data
from disk or write it to disk.
■
When possible, let the compiler do the work for you. For example, when you use function calls
such as the Core Foundation function CFSwapInt16BigToHost, the compiler determines whether
the function call does something for the processor you are targeting. If the code does nothing,
the compiler won’t call the function. Letting the compiler do the work is more efficient than
using #ifdef statements yourself.
■
If you must access a large file, consider arranging the data in such a way that limits the byte
swapping that you must perform. For example, you can arrange the most frequently accessed
data contiguously in the file. Then, you need to read and swap bytes only for that chunk of data
instead of for the entire data file.
■
Use the __BIG_ENDIAN__ and __LITTLE_ENDIAN__ macros only if you must. Do not use macros
that check for a specific processor type, such as __i386__ and __ppc__.
■
Choose a consistent byte-order approach and stick with it. That is, if you are reading and writing
data from disk on a regular basis, choose the endian format you want to use. This eliminates the
need for you to check the byte ordering of the data and then possibly swap the byte
order.
■
Be aware of which functions return big-endian data, and treat the data appropriately. These
include some of the DNSServiceDiscovery functions (port is in network byte order), and the
ColorSync profile functions (all data is big-endian). The IconFamilyElement and
IconFamilyResource data types (which also include the data types IconFamilyPtr and
IconFamilyHandle) are always big-endian. There may be other functions and data types that
are not listed here. Consult the appropriate API reference for information on data returned by a
function.
■
Keep in mind that byte swapping comes at a performance cost so swap bytes only when absolutely
necessary.
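The first and fifth guidelines (keep data native in memory, and pick one byte order for stored data) can be sketched as a pair of routines that sit at the file boundary. This is an alternative technique written in portable C rather than a use of the system routines. The function names are hypothetical, and big-endian is assumed as the chosen file format.

```c
#include <stdint.h>

/* Decodes a 32-bit value from four bytes stored in big-endian order.
   The result is in host byte order on any architecture. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] <<  8) |
            (uint32_t)p[3];
}

/* Encodes a host-order 32-bit value as four bytes in big-endian order. */
static void write_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >>  8);
    p[3] = (unsigned char)v;
}
```

Because the routines work with shifts on values rather than with memory layout, they need no conditional compilation, and the rest of the application never sees anything other than native-endian data.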
Byte Swapping Routines
The APIs that provide byte swapping routines are listed below. For most situations it’s best to use
the routines that match the framework you’re programming in. The Core Foundation and Foundation
APIs have functions for swapping floating-point values, while the other APIs listed do not.
■
POSIX (Portable Operating System Interface) byte ordering functions (ntohl, htonl, ntohs, and
htons) are documented in man pages, which can be viewed using Terminal or in Xcode.
■
Darwin byte ordering functions and macros are defined in the header file
<libkern/OSByteOrder.h>. Even though this header is in libkern, it is acceptable to use it from
high-level applications.
■
Core Foundation byte-order functions are defined in the header file
<CoreFoundation/CFByteOrder.h> and described in the Byte-Order Utilities Reference. For details
on using these functions, see the Byte Swapping article in Memory Management.
■
Foundation byte-order functions are defined in the <Foundation/NSByteOrder.h> header file
and described in Foundation Reference for Objective-C.
■
The Core Endian API is defined in the <CarbonCore/Endian.h> header file. The byte swapping
functions available in this header are described in the QuickTime reference documentation. The
function names all begin with the prefix Endian. You can locate the descriptions of the functions
by using the Alphabetical Index of Functions available in the QuickTime API Reference area.
Note: When you use byte swapping routines, the compiler optimizes your code so that the routines
are executed only if they are needed for the architecture on which your code is running.
Byte Swapping Strategies
The strategy for swapping bytes depends on the format of the data; there is no universal routine that
can take care of all byte ordering differences. Single-byte character strings don’t get swapped at all,
long words get swapped four bytes end-for-end, and words get swapped two bytes end-for-end. Any
program that needs to swap data must know the data type, the source data endian order, and the
host endian order.
This section lists byte swapping strategies, organized alphabetically, for the following data:
■
“Constants” (page 29)
■
“Custom Apple Event Data” (page 29)
■
“Custom Resource Data” (page 30)
■
“Floating-Point Values” (page 30)
■
“Integers” (page 31)
■
“Network-Related Data” (page 32)
■
“OSType-to-String Conversions” (page 32)
■
“Unicode Text Files” (page 33)
Constants
Constants that are part of a compiled executable are in host byte order. You need to swap constants
only if they are part of data that is not maintained natively or travels between hosts. In most cases
you can either swap bytes ahead of time or let the preprocessor perform any needed math by using
shifts or other simple operators.
If you are defining and populating a structure that must use data of a specific endian format in
memory, use the OSSwapConst macros defined in the libkern/OSByteOrder.h header file. These
macros can be used from high-level applications.
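To illustrate what it means to let the preprocessor perform the math, the sketch below defines constant-swapping macros from shifts and masks. The macro names are hypothetical stand-ins written for this example. They are not the OSSwapConst macros themselves, which you should prefer when the header is available.

```c
#include <stdint.h>

/* Hypothetical macros: swap the bytes of a constant at compile time. */
#define MY_SWAP_CONST_16(x) \
    ((uint16_t)((((uint16_t)(x) & 0xFF00u) >> 8) | \
                (((uint16_t)(x) & 0x00FFu) << 8)))

#define MY_SWAP_CONST_32(x) \
    ((uint32_t)((((uint32_t)(x) & 0xFF000000u) >> 24) | \
                (((uint32_t)(x) & 0x00FF0000u) >>  8) | \
                (((uint32_t)(x) & 0x0000FF00u) <<  8) | \
                (((uint32_t)(x) & 0x000000FFu) << 24)))

/* The swapped value is a constant expression, so it can be used in a
   static initializer. */
static const uint32_t kSwappedMagic = MY_SWAP_CONST_32(0xfeedface);
```

Because the argument is a constant, the compiler evaluates the whole expression at build time and no swapping code runs in the application.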
Custom Apple Event Data
An Apple event is a high-level event that conforms to the Apple Event Interprocess Messaging
Protocol. The Apple Event Manager sends Apple events between applications on the same computer
or between applications on remote computers. You can define your own Apple event data types, and
send and receive Apple events using the Apple Event Manager API.
Mac OS X manages system-defined Apple event data types for you, handling them appropriately for
the currently executing code. You don't need to perform any special tasks. When the data that your
application extracts from an Apple event is system-defined, the system swaps the data for you before
giving the event to your application to process. You will want to treat system-defined data types from
Apple events as native endian. Similarly, if you put native-endian data into an Apple Event that you
are sending, and it is a system-defined data type, the receiver will be able to interpret the data in its
own native endian format.
However, you must account for byte ordering differences for the custom Apple event data types that
you define. You can accomplish this in one of the following ways:
■
Write a byte swapping callback routine (also known as a flipper) and provide it to the system.
Whenever the system determines that your Apple event data needs to be byte swapped it invokes
your flipper to ensure that the recipient of the data gets the data in the correct endian format. For
details, see “Writing and Installing a Callback to Byte Swap Data” (page 34).
■
Choose one endian format to use, regardless of architecture. Then, when you read or write your
custom Apple event data, use big-to-host and host-to-big routines, such as the Core Foundation
Byte Order Utilities CFSwapInt16BigToHost and CFSwapInt16HostToBig.
Custom Resource Data
In Mac OS X, the preferred way to supply resources is to provide files in your application bundle that
define resources such as image files, sounds, localized text, and archived user-interface definitions.
The resource data types discussed in this section are those defined in Resource Manager-style files
supported by Carbon. The Resource Manager was created prior to Mac OS X. If your application uses
Resource Manager-style resource files, you should consider moving towards Mac OS X-style resources
in your application bundle instead.
Resources typically include data that describes menus, windows, controls, dialogs, sounds, fonts,
and icons. Although the system defines a number of standard resource types (such as 'moov', used
to specify a QuickTime movie, and 'MENU', used to define menus), you can also create your own private
resource types for use in your application. You use the Resource Manager API to define resource data
types and to get and set resource data.
Mac OS X keeps track of resources in memory and allows your application to read or write resources.
Applications and system software interpret the data for a resource according to its resource type.
Although you'll typically let the operating system read resources for you (such as your application
icon), you can also call Resource Manager functions directly to read and write resources.
Mac OS X manages the system-defined resources for you, handling them appropriately for the currently
executing code. That is, if your application runs on an x86 system, Mac OS X swaps bytes so that your
application icon, menus, and other standard resources appear correctly. You don't need to perform
any special tasks. But if you define your own private resource data types for use in your application,
you need to account for byte ordering differences between architectures when you read or write
resource data from disk.
You can use either of the following strategies to handle custom Resource Manager-style resource
data. Notice that these are the same strategies used to handle custom Apple event data:
■
Provide a byte swapping callback routine for the system to invoke whenever the system determines
your resource data must be byte swapped. For details, see “Writing and Installing a Callback to
Byte Swap Data” (page 34).
■
Always write your data using the same endian format, regardless of the architecture. Then, when
you read or write your custom resource data, use big-to-host and host-to-big routines, such as
the Core Foundation Byte Order Utilities CFSwapInt16BigToHost and CFSwapInt16HostToBig.
Note: If you are revising old code that marks resources with a preload bit, you should remove the
preload bit from any resources that must be byte swapped. In Mac OS X, the preload bit is almost
always unnecessary. If you cannot remove the preload bit, you should swap the resource data after
you read the resource. You will not be able to use a flipper callback to byte swap automatically because
in Mac OS X a preload bit causes the resources to be read before any of the application code runs.
Floating-Point Values
Core Foundation defines a set of functions and two special data types to help you work with
floating-point values. These functions allow you to encode 32- and 64-bit floating-point values in such
a way that they can later be decoded and byte swapped if necessary. Listing 3-2 shows you how to
encode a 64-bit floating-point number and Listing 3-3 shows how to decode it.
Listing 3-2
Encoding a floating-point value
Float64          myFloat64;
CFSwappedFloat64 swappedFloat;
// Encode the floating-point value.
swappedFloat = CFConvertFloat64HostToSwapped(myFloat64);
The data types CFSwappedFloat32 and CFSwappedFloat64 contain floating-point values in a canonical
representation. A CFSwappedFloat data type is not itself a floating-point value, and should not be
directly used as one. You can, however, send one to another process, save it to disk, or send it over a
network. Because the format is converted to and from the canonical format by the conversion functions,
there is no need for explicit swapping. Byte swapping is taken care of for you during the format
conversion if necessary.
Listing 3-3
Decoding a floating-point value
Float64          myFloat64;
CFSwappedFloat64 swappedFloat;
// Decode the floating-point value.
myFloat64 = CFConvertFloat64SwappedToHost(swappedFloat);
The NSByteOrder.h header file defines functions that are comparable to the Core Foundation functions
discussed here.
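The idea of a canonical representation can be sketched in portable C for code that cannot use Core Foundation. This is not how the Core Foundation functions are implemented, and the byte layout it produces is not claimed to match CFSwappedFloat64. The function names are hypothetical, and the sketch assumes that double is a 64-bit IEEE 754 value and that big-endian is chosen as the canonical order.

```c
#include <stdint.h>
#include <string.h>

/* Encodes a double as eight bytes in big-endian (canonical) order. */
static void encode_double_be(double value, unsigned char out[8])
{
    uint64_t bits;
    int i;

    memcpy(&bits, &value, sizeof bits);   /* copy the bit pattern safely */
    for (i = 0; i < 8; i++) {
        out[i] = (unsigned char)(bits >> (56 - 8 * i));
    }
}

/* Decodes eight big-endian bytes back into a host-order double. */
static double decode_double_be(const unsigned char in[8])
{
    uint64_t bits = 0;
    double value;
    int i;

    for (i = 0; i < 8; i++) {
        bits = (bits << 8) | in[i];
    }
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

As with the Core Foundation types, the encoded bytes are not themselves a floating-point value. They are meant only for storage or transmission and must be decoded before use.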
Integers
The system library byte-access functions, such as OSReadLittleInt16 and OSWriteLittleInt16,
provide generic byte swapping. These functions perform byte swapping if the native endian format
is different from the endian format of the destination. They are defined in the libkern/OSByteOrder.h
header file.
Note: The OSReadXXX and OSWriteXXX functions provide higher performance than the OSSwapXXX
functions or any other functions in the higher-level frameworks.
Core Foundation provides three optimized primitive functions for byte swapping— CFSwapInt16,
CFSwapInt32, and CFSwapInt64. All of the other swapping functions use these primitives to accomplish
their work. In general you don’t need to use these primitives directly.
Although the primitive swapping functions swap unconditionally, the higher-level swapping functions
are defined in such a way that they do nothing when a byte swap is not required—in other words,
when the source and host byte orders are the same. For the integer types, these functions take the
forms CFSwapXXXBigToHost, CFSwapXXXLittleToHost, CFSwapXXXHostToBig, and
CFSwapXXXHostToLittle, where XXX is a data type such as Int32. For example, on a little-endian
machine you use the function CFSwapInt16BigToHost to read a 16-bit integer value from a network
whose data is in network byte order (big-endian). Listing 3-4 demonstrates this process.
Listing 3-4
Swapping a 16-bit integer from big-endian to host-endian
SInt16 bigEndian16;
SInt16 swapped16;
// Swap a 16-bit value read from the network.
swapped16 = CFSwapInt16BigToHost(bigEndian16);
Suppose the integers are in the fields of a data structure. Listing 3-5 demonstrates how to accomplish
the byte swapping.
Listing 3-5
Swapping integers from little-endian to host-endian
// Byte swap the values if necessary.
aStruct.int1 = CFSwapInt32LittleToHost(aStruct.int1);
aStruct.int2 = CFSwapInt32LittleToHost(aStruct.int2);
The byte swapping code swaps bytes only if necessary. If the host is a big-endian architecture, the
functions used in the code sample swap the bytes in each field. The byte swapping code does nothing
when run on a little-endian machine—the compiler optimizes out the code.
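The behavior of a conditional swapping function can be sketched in portable C as shown below. The function names are hypothetical, and this sketch decides at run time whether to swap, whereas the Core Foundation functions are described above as being resolved by the compiler. It is meant only to show the logic.

```c
#include <stdint.h>
#include <string.h>

/* Unconditionally reverses the four bytes of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >>  8) | ((v & 0xFF000000u) >> 24);
}

/* Returns nonzero when the host stores the most significant byte first. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 0x01;
}

/* Converts a value that was read from little-endian data to host order.
   Swaps on a big-endian host and does nothing on a little-endian host. */
static uint32_t little_to_host32(uint32_t v)
{
    return host_is_big_endian() ? swap32(v) : v;
}
```

Calling code looks the same on both architectures, which is the property that makes the higher-level swapping functions preferable to unconditional swaps.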
Network-Related Data
Network-related data (IP addresses, port numbers, and so forth) typically uses big-endian format
(also known as network byte order) so you may need to swap bytes when communicating between
the network and an x86 system. You probably never had to adjust your PowerPC code when you
transmitted data to, or received data from, the network. On x86, you must look closely at your
networking code and ensure that you always send network-related data in the appropriate byte order.
You must also handle data received from the network appropriately, byte swapping values to the
endian format appropriate to the host microprocessor.
You can use the following POSIX functions to convert between network byte order and host byte
order. (Other byte swapping functions, such as those defined in the OSByteOrder.h and CFByteOrder.h
header files, can also be useful for handling network data.)
■
network to host:
uint32_t ntohl (uint32_t netlong);
uint16_t ntohs (uint16_t netshort);
■
host to network:
uint32_t htonl (uint32_t hostlong);
uint16_t htons (uint16_t hostshort);
These functions are documented in the man pages, which can be viewed using Terminal or in Xcode.
The sin_addr.s_addr and sin_port fields of a sockaddr_in structure should always be in network
byte order. You can find out the appropriate endian format of any argument to a BSD networking
function by reading the man page documentation.
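As a short sketch of these functions in use, the following helper fills in a sockaddr_in structure from values held in host byte order. The helper name and its parameters are hypothetical. The POSIX functions and structure fields are the ones named above.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: builds an IPv4 socket address. The hostAddr and
   hostPort arguments are in host byte order; the structure fields are
   stored in network byte order. */
static void make_ipv4_address(struct sockaddr_in *addr,
                              uint32_t hostAddr, uint16_t hostPort)
{
    memset(addr, 0, sizeof *addr);
    addr->sin_family      = AF_INET;
    addr->sin_port        = htons(hostPort);   /* host to network, 16 bit */
    addr->sin_addr.s_addr = htonl(hostAddr);   /* host to network, 32 bit */
}
```

When you read these fields back, for example to log the port of a connected peer, convert them with ntohs and ntohl before using them as ordinary integers.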
OSType-to-String Conversions
You can use the functions UTCreateStringForOSType and UTGetOSTypeFromString to convert an
OSType to or from a CFString object (CFStringRef data type). These functions are discussed in
Uniform Type Identifiers Overview and defined in the UTType.h header file, which is part of the Launch
Services framework.
When you use four-character literals, keep in mind that "abcd" != 'abcd'. Rather, 'abcd' == 0x61626364.
You must treat 'abcd' as an integer and not string data, as 'abcd' is a shortcut for a 32-bit integer. (A
FourCharCode data type is a UInt32 data type.) The compiler does not swap this for you. You can
use the shift operator if you need to deal with individual characters.
For example, if you currently print an OSType or FourCharCode type using the standard C printf-style
semantics, use
printf("%c%c%c%c", (char) (val >> 24), (char) (val >> 16),
       (char) (val >> 8), (char) val);
instead of the following:
printf("%4.4s", (const char*) &val);
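The same shifts can be packaged as a small helper that produces a C string. The function name is hypothetical, and uint32_t stands in for the FourCharCode data type so that the sketch compiles without the system headers.

```c
#include <stdint.h>

/* Hypothetical helper: writes the four characters of a four-character
   code into out, followed by a terminating null character. The result
   is the same on big-endian and little-endian hosts. */
static void fourcc_to_string(uint32_t code, char out[5])
{
    out[0] = (char)(code >> 24);
    out[1] = (char)(code >> 16);
    out[2] = (char)(code >>  8);
    out[3] = (char)code;
    out[4] = '\0';
}
```

Passing 0x61626364 fills the buffer with the string "abcd" on either architecture, because the shifts operate on the integer value and not on its layout in memory.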
Unicode Text Files
Mac OS X often uses UTF-16 to encode Unicode; a UniChar data type is a double-byte value. As with
any multibyte data, Unicode characters are sensitive to the byte ordering method used by the
microprocessor. The Unicode standard states that in the absence of a byte order mark (BOM) the data
in a Unicode data file is to be taken as big-endian. Although a BOM is not mandatory, you should
make use of it to ensure that a file written on one architecture can be read from the other architecture.
A byte order mark written to the beginning of a file informs the program reading the data which byte
ordering method was used to write the data. The program can then act accordingly to make sure the
byte ordering of the Unicode text is compatible with the host.
Table 3-1 lists the standard byte order marks for UTF-8, UTF-16, and UTF-32. (Note that the UTF-8
BOM is not used for endian issues, but only as a tag to indicate that the file is UTF-8.)
Table 3-1
Byte order marks
Byte order mark    Encoding form
EF BB BF           UTF-8
FF FE              UTF-16/UCS-2, little endian
FE FF              UTF-16/UCS-2, big endian
FF FE 00 00        UTF-32/UCS-4, little endian
00 00 FE FF        UTF-32/UCS-4, big endian
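If you do need to inspect a byte order mark yourself, for example in code that cannot use Core Foundation, the table translates into a short lookup. The following is a sketch only. The enumeration and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical result codes, one per row of the table. */
typedef enum {
    kBOMNone,
    kBOMUTF8,
    kBOMUTF16LE,
    kBOMUTF16BE,
    kBOMUTF32LE,
    kBOMUTF32BE
} BOMKind;

/* Identifies the byte order mark, if any, at the start of a buffer. */
static BOMKind detect_bom(const unsigned char *data, size_t length)
{
    /* Test the 4-byte marks first, because the UTF-32 little-endian
       mark begins with the UTF-16 little-endian mark. */
    if (length >= 4) {
        if (data[0] == 0xFF && data[1] == 0xFE &&
            data[2] == 0x00 && data[3] == 0x00) return kBOMUTF32LE;
        if (data[0] == 0x00 && data[1] == 0x00 &&
            data[2] == 0xFE && data[3] == 0xFF) return kBOMUTF32BE;
    }
    if (length >= 3 &&
        data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) return kBOMUTF8;
    if (length >= 2) {
        if (data[0] == 0xFF && data[1] == 0xFE) return kBOMUTF16LE;
        if (data[0] == 0xFE && data[1] == 0xFF) return kBOMUTF16BE;
    }
    return kBOMNone;
}
```

One limitation of this ordering is that a UTF-16 little-endian file whose first character after the mark is U+0000 is reported as UTF-32 little-endian. Text files rarely begin that way, but the sketch does not attempt to resolve the ambiguity.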
In practice, when your application reads a file, it does not need to look for a byte order mark nor does
it need to swap bytes as long as you follow these steps to read a file:
1.
Map the file using mmap to get a pointer to the contents of the file (or string).
Reading the entire file into memory ensures the best performance and is a prerequisite for the
next step.
2.
Generate a CFString by calling the function CFStringCreateWithBytes with the
isExternalRepresentation parameter set to true or call the function
CFStringCreateWithExternalRepresentation to generate a CFString, passing in an encoding
of kCFStringEncodingUnicode (for UTF-16) or kCFStringEncodingUTF8 (for UTF-8).
Either function interprets a BOM and performs any necessary byte swapping. Note that a BOM
should not be used in memory; its use is solely for data transmission (files, Clipboard, and so
forth).
In summary, with respect to Unicode files, your application performs best when you follow these
guidelines:
■
Accept the BOM when taking UTF-16 or UTF-8 encoded files from outside the application.
■
Use native-endian UniChar data types internally.
■
Generate a BOM when writing UTF-16 to a file. Ideally, you only need to generate a BOM for an
architecture that uses little-endian format, but it is also acceptable to generate a BOM for an
architecture that uses big-endian format.
■
When you put data on the Clipboard, make sure that 'utxt' data does not have a BOM. Only 'ut16'
data should have a BOM. If you use Cocoa to put an NSString on the pasteboard, you don’t need
to concern yourself with a BOM.
For more information, see “UTF & BOM,” available from the Unicode website:
http://www.unicode.org/faq/utf_bom.html
The Apple Event Manager provides text constants that you can use to specify the type of your data.
As of Mac OS X v10.4, only two text constants are recommended:
■
typeUTF16ExternalRepresentation, which specifies Unicode text in 16-bit external representation
with optional byte order mark (BOM). The presence of this constant guarantees that either there
is a BOM or the data is in UTF-16 big-endian format.
■
typeUTF8Text, which specifies 8-bit Unicode (UTF-8 encoding).
The constant typeUnicodeText indicates utxt text data, in native byte ordering format, with an
optional BOM. This constant does not specify an explicit Unicode encoding or byte order definition.
The Scrap Manager provides the flavor type constant kScrapFlavorTypeUTF16External, which
specifies Unicode text in 16-bit external representation with optional byte order mark (BOM).
Writing and Installing a Callback to Byte Swap Data
You can provide a byte swapping callback routine, also referred to as a flipper, to the system for
custom resource data, custom pasteboard data, and custom Apple event data. When you install a
byte swapping callback, you specify the domain to which the data type belongs. There are two data
domains—Apple event and resource. The resource data domain specifies custom pasteboard data or
custom resource data. If the callback can be applied to either domain (Apple event and resource), you
can specify that as well.
The Core Endian API defines a callback that you provide to byte swap custom resource and Apple
event data. You must provide one callback for each type of data you want to byte swap. The prototype
for the CoreEndianFlipProc callback is:
typedef CALLBACK_API (OSStatus, CoreEndianFlipProc)
    (OSType dataDomain,
     OSType dataType,
     short id,
     void *dataPtr,
     UInt32 dataSize,
     Boolean currentlyNative,
     void *refcon
    );
The callback takes the following parameters:
■
dataDomain—An OSType value that specifies the domain to which the flipper callback applies.
The value kCoreEndianResourceManagerDomain signifies that the domain is resource or
pasteboard data. The value kCoreEndianAppleEventManagerDomain signifies that the domain
is Apple event data.
■
dataType—The type of data to be byte swapped by the callback. This is the four-character code
of the resource type, pasteboard type, or Apple event.
■
id—The resource id of the data type. This field is ignored if the dataDomain parameter is not
kCoreEndianResourceManagerDomain.
■
dataPtr —On input, points to the data to be flipped. On output, points to the byte swapped data.
■
dataSize—The size of the data pointed to by the dataPtr parameter.
■
currentlyNative—A Boolean value that indicates the direction to byte swap. The value true
specifies the data pointed to by the dataPtr parameter uses the byte ordering of the currently
executing code. On a PowerPC system, true specifies that the data is in big-endian format. On
an x86 system, true specifies that the data is in little-endian format.
■
refcon—A 32-bit value that contains, or refers to, data needed by the callback.
The callback returns a result code that indicates whether the byte swapping is successful. Your callback
should return noErr if the data is byte swapped without error and the appropriate result code to
indicate an error condition—errCoreEndianDataTooShortForFormat,
errCoreEndianDataTooLongForFormat, or errCoreEndianDataDoesNotMatchFormat. The result
code you return is propagated through the appropriate manager (Resource Manager (ResError) or
Apple Event Manager) to the caller.
You do not need to byte swap nonnumerical quantities (such as strings, byte streams, and so forth).
You need to provide a callback only to byte swap data types for which the order of bytes in a word
or long word is important. (For the preferred way to handle Unicode strings, see “Unicode Text
Files” (page 33).)
Your callback should traverse the data structure that contains the data and byte swap:
■
All counts and lengths so that array indexes are associated with the appropriate value
■
All integers and longs so that when you read them into variables of a compatible type, you can
operate correctly on the values (such as numerical, offset, and shift operations)
The Core Endian API provides these functions for working with your callback:
■
CoreEndianInstallFlipper installs your callback for the specified data type (custom resource
or custom Apple event). After you install a byte swapping callback for an application-defined
resource data type, any time you call a Resource Manager function that operates on that
resource type, the system invokes your callback if it is appropriate to do so. (If your callback
operates on pasteboard data, the system also invokes the callback at the appropriate time.)
Similarly, if you specify Apple event as the domain for your callback, then any time you call an
Apple Event Manager function that operates on that data type, your callback is invoked if it is
appropriate to do so.
■
CoreEndianGetFlipper obtains the callback that is installed for the specified data type. You can
call this function to determine whether a flipper is available for a given data type.
■
CoreEndianFlipData invokes the callback associated with the specified data type. You shouldn’t
need to call this function, because the system invokes your callback whenever it’s needed.
As an example, look at a callback for the custom resource type ('PREF') defined in Listing 3-6. The
MyPreferences structure is used to store preferences data on disk. The structure contains a number
of values and includes two instances of the RGBColor data type and an array of RGBColor values.
Listing 3-6  A declaration for a custom resource
#define kMyPreferencesType 'PREF'

struct MyPreferences {
    SInt32   fPrefsVersion;
    Boolean  fHighlightLinks;
    Boolean  fUnderlineLinks;
    RGBColor fHighlightColor;
    RGBColor fUnderlineColor;
    SInt16   fZoomValue;
    char     fCString[32];
    SInt16   fCount;
    RGBColor fPalette[];
};
You can handle the RGBColor data type by writing a function that swaps bytes in an RGBColor data
structure. See the function MyRGBSwap, shown in Listing 3-7. This function calls the Core Endian
macro Endian16_Swap to swap bytes for each of the values in the RGBColor data structure. The
function doesn’t need to check for the currently executing system because the function is never called
unless the values in the RGBColor data type need to be byte swapped. The MyRGBSwap function is
called by the byte swapping callback routine (shown in Listing 3-8 (page 37)) that’s provided to
handle the custom 'PREF' resource (that is defined in Listing 3-6 (page 36)).
Listing 3-7  A flipper function for RGBColor data
static void MyRGBSwap (RGBColor *p)
{
    p->red = Endian16_Swap(p->red);
    p->blue = Endian16_Swap(p->blue);
    p->green = Endian16_Swap(p->green);
}
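The reason a function like the one in Listing 3-7 needs no direction check is that a byte swap is its own inverse. The portable sketch below models this with stand-ins: swap16 is assumed to behave like the Core Endian macro Endian16_Swap (it exchanges the two bytes of a 16-bit value), and RGB16 is a local substitute for the QuickDraw RGBColor structure. Neither name comes from the Core Endian API.

```c
#include <stdint.h>

/* Stand-in for Endian16_Swap (assumed behavior: exchange the two
 * bytes of a 16-bit value). */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Local stand-in for the QuickDraw RGBColor structure. */
typedef struct {
    uint16_t red, green, blue;
} RGB16;

/* Swaps every component; calling it twice restores the original. */
void rgb_swap(RGB16 *p)
{
    p->red   = swap16(p->red);
    p->green = swap16(p->green);
    p->blue  = swap16(p->blue);
}
```

Applying rgb_swap once converts between byte orders in either direction, and applying it a second time returns the original values.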
Listing 3-8 shows a byte swapping callback for the custom 'PREF' resource. An explanation for each
numbered line of code appears following the listing. Note that the flipper checks for data that is
malformed or is of an unexpected length. If the data passed into the flipper routine is shorter than
the type being flipped normally is, or (for example) contains garbage data instead of an array count,
the flipper must be careful not to read or write data beyond the end of the passed-in data. Instead,
the routine returns an error.
Listing 3-8  A flipper for the custom 'PREF' resource
#define kCurrentVersion 0x00010400

static OSStatus MyFlipPreferences (OSType dataDomain,          // 1
                                   OSType dataType,            // 2
                                   short id,                   // 3
                                   void * dataPtr,             // 4
                                   UInt32 dataSize,            // 5
                                   Boolean currentlyNative,    // 6
                                   void* refcon)               // 7
{
    UInt32 versionNumber;
    OSStatus status = noErr;
    MyPreferences* toFlip = (MyPreferences*) dataPtr;           // 8
    int count, i;

    if (dataSize < sizeof(MyPreferences))
        return errCoreEndianDataTooShortForFormat;              // 9

    if (currentlyNative)                                        // 10
    {
        count = toFlip->fCount;
        versionNumber = toFlip->fPrefsVersion;
        toFlip->fPrefsVersion = Endian32_Swap (toFlip->fPrefsVersion);
        toFlip->fCount = Endian16_Swap (toFlip->fCount);
        toFlip->fZoomValue = Endian16_Swap (toFlip->fZoomValue);
    }
    else                                                        // 11
    {
        toFlip->fPrefsVersion = Endian32_Swap (toFlip->fPrefsVersion);
        versionNumber = toFlip->fPrefsVersion;
        toFlip->fCount = Endian16_Swap (toFlip->fCount);
        toFlip->fZoomValue = Endian16_Swap (toFlip->fZoomValue);
        count = toFlip->fCount;
    }

    if (versionNumber != kCurrentVersion)                       // 12
        return errCoreEndianDataDoesNotMatchFormat;

    MyRGBSwap (&toFlip->fHighlightColor);                       // 13
    MyRGBSwap (&toFlip->fUnderlineColor);                       // 14

    if (dataSize < sizeof(MyPreferences) + count * sizeof(RGBColor))
        return errCoreEndianDataTooShortForFormat;              // 15

    for(i = 0; i < count; i++)                                  // 16
    {
        MyRGBSwap (&toFlip->fPalette[i]);
    }

    return status;                                              // 17
}
Here’s what the code does:
1.
The system passes your callback the domain to which the callback applies. You define the domain
when you install the callback using the function CoreEndianInstallFlipper.
2.
The system passes your callback the resource type you defined for the data. In this example, the
resource type is 'PREF'.
3.
The system passes your callback the resource ID of the data type. If the data is not a resource,
this value is 0.
4.
The system passes your callback a pointer to the resource data that must be byte swapped. In this
case, the pointer refers to a MyPreferences data structure.
5.
The system passes your callback the size of the data pointed to by the pointer described in the
previous step.
6.
The system passes your callback true if the data in the buffer passed to the callback is in the byte
ordering of the currently executing code. On a Macintosh using a PowerPC microprocessor, when
currentlyNative is true, the data is in big-endian order. On a Macintosh using an Intel
microprocessor, when currentlyNative is true, the data is in little-endian order. Your callback
needs to know this value, because if your callback uses a value in the data buffer to decide how
to process other data in the buffer (for example, the count variable shown in the code) you must
know whether that value needs to be flipped before the value can be used by the callback.
7.
The system passes your callback a pointer that refers to application-specific data. In this example,
the callback doesn’t require any application-specific data.
8.
Defines a variable for the MyPreferences data type and assigns the contents of the data pointer
to the newly-defined toFlip variable.
9.
Checks the static-length portion of the structure. If the size is less than it should be, the routine
returns the error errCoreEndianDataTooShortForFormat.
10. If currentlyNative is true, saves the count value to a local variable and then byte swaps the
other values in the MyPreferences data structure. You must save the count value before you
swap because you need it for an iteration later in the function. The fact that currentlyNative is
true indicates that the value does not need to be byte swapped if it is used in the currently
executing code. However, the value does need to be swapped to be stored to disk.
The values are swapped using the appropriate Core Endian macros.
11. If currentlyNative is false, flips the values in the MyPreferences data structure before it saves
the count value to a local variable. The fact that currentlyNative is false indicates that the
count value must be byte swapped before it can be used in the callback.
12. Checks to make sure the version of the data structure is supported by the application. If the
version is not supported, then your callback would not byte swap the data and would return the
result errCoreEndianDataDoesNotMatchFormat.
13. Calls the MyRGBSwap function (shown in Listing 3-7 (page 36)) to byte swap the fHighlightColor
field of the data structure.
14. Calls the MyRGBSwap function to byte swap the fUnderlineColor field of the data structure.
15. Checks that the data is large enough to hold the number of palette entries given by the count
value. If the data is too short, the routine returns the error errCoreEndianDataTooShortForFormat.
16. Iterates through the elements in the fPalette array, calling the MyRGBSwap function to byte swap
the data in the array.
17. Returns noErr to indicate that the data is flipped without error.
Although the sample performs some error checking, it does not include all the error-handling
code that it could. When you write a flipper, you may want to include such code.
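One check a flipper might add, sketched here under stated assumptions, is a stricter validation of an array count before it is used. The helper name trailing_array_fits and its parameters are hypothetical. It rejects negative counts and avoids multiplying an untrusted count, so a garbage value cannot overflow the size calculation.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if a buffer of `dataSize` bytes can hold
 * a fixed header of `headerSize` bytes followed by `count` elements of
 * `elemSize` bytes each, and 0 otherwise. The division keeps the
 * arithmetic from overflowing when `count` is garbage. */
int trailing_array_fits(size_t dataSize, size_t headerSize,
                        long count, size_t elemSize)
{
    if (count < 0 || elemSize == 0 || dataSize < headerSize)
        return 0;
    return (size_t)count <= (dataSize - headerSize) / elemSize;
}
```

In a flipper like the sample, such a check would run after the count is in native byte order and before the loop over the array; on failure the flipper would return one of the Core Endian error codes.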
Note: The callback does not flip any of the Boolean values in the MyPreferences data structure
because these are single character values. The callback also ignores the C string.
You install a byte swapping callback routine by calling the function CoreEndianInstallFlipper.
You should install the callback when your application calls its initialization routine or when you open
your resources. For example, you would install the flipper callback shown in Listing 3-8 (page 37)
using the following code:
OSStatus status = noErr;
status = CoreEndianInstallFlipper (kCoreEndianResourceManagerDomain,
kMyPreferencesType,
MyFlipPreferences,
NULL);
The system invokes the callback for the specified resource type and data domain with
currentlyNative set to false at the time a resource is loaded and set to true at the time the resource
is about to be written. For example, the sample byte swapping callback gets invoked any time the following
line of code is executed in your application:
MyPreferences** hPrefs = (MyPreferences**) GetResource ('PREF', 128);
After the data is byte swapped, you can modify it as much as you’d like.
When the Resource Manager reads a resource from disk, it looks up the resource type (for example,
'PREF') in a table of byte swapping routines. If a callback is installed for that resource type, the
Resource Manager invokes the callback. Similar actions are taken when the Resource Manager writes
a resource to disk. It finds the appropriate routine and invokes the callback to byte swap the resource
to big-endian byte ordering.
When you copy or drag custom data from an application that has a callback installed for pasteboard
data, the system invokes your callback at the appropriate time. If you copy or drag custom data to a
native application, the data callback is not invoked. If you copy or drag custom data to a nonnative
application, the system invokes your callback to byte swap the custom data. If you paste or drop
custom data into your application from a nonnative application, and a callback exists for that custom
data, the system invokes the callback at the time of the paste or drop. If the custom data is copied or
dragged from another native application, the callback is not invoked.
Note that different pasteboard APIs use different type specifiers. The Scrap Manager and Drag
Manager use OSTypes. The Pasteboard Manager uses Uniform Type Identifiers (UTI) and NSPasteboard
uses its own type mechanism. In each case, the type is converted by the system to an OSType to
discover if there is a byte swapping callback for that type.
Apple event data types are typically swapped to network byte order when sent over a network. The
callback you install is called only if a custom data type that you define is sent to another machine, or
if another machine sends Apple event data to your application. The byte ordering of Apple events
on the network is big-endian.
For cases in which the system would not normally invoke your byte swapping callback, you can call
the function CoreEndianFlipData to invoke the callback function installed for the specified data type
and domain.
See Also
The following resources are available in the ADC Reference Library:
■
Writing PCI Drivers, see the section Endianess and Addressing.
■
Byte-Order Utilities Reference describes the Core Foundation byte order utilities API.
■
Byte Swapping, in Core Foundation Memory Management, shows how to swap integers and
floating-point values using Core Foundation byte-order utilities.
■
Resource Endian Flippers discusses writing a flipper.
http://developer.apple.com/quicktime/icefloe/dispatch025.html
■
File-System Performance Guidelines provides information useful for mapping Unicode files to
memory.
C H A P T E R
4
Guidelines for Specific Scenarios
This chapter lists an assortment of scenarios that relate to a specific technology or API. Although
many of these scenarios are uncommon, you will want to at least glance at the topics to determine
whether anything applies to your application. The topics are organized alphabetically.
Aliases
Aliases are big-endian on all systems. Applications that add extra information to the end of an
AliasHandle must ensure that the extra data is always endian-neutral or of a defined endian type,
preferably big-endian.
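One way to give such extra data a defined endian type is to store and load multibyte values one byte at a time. The following is a minimal sketch: the helper names put_be32 and get_be32 are hypothetical, and the code shows only the byte layout, not how the data is attached to an AliasHandle.

```c
#include <stdint.h>

/* Hypothetical helper: stores a 32-bit value in big-endian order,
 * producing the same bytes on every host. */
void put_be32(unsigned char *dst, uint32_t v)
{
    dst[0] = (unsigned char)(v >> 24);
    dst[1] = (unsigned char)(v >> 16);
    dst[2] = (unsigned char)(v >> 8);
    dst[3] = (unsigned char)v;
}

/* Hypothetical helper: loads a 32-bit big-endian value. */
uint32_t get_be32(const unsigned char *src)
{
    return ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
           ((uint32_t)src[2] << 8)  |  (uint32_t)src[3];
}
```

Because the shifts operate on values rather than on memory, neither helper needs to know the byte order of the machine it runs on.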
The AliasRecord data structure is opaque when building your application with the Mac OS X v10.4.1
SDK. Code that formerly accessed the userType field of an AliasRecord must use the Alias Manager
functions GetAliasUserType, GetAliasUserTypeFromPtr, SetAliasUserType, or
SetAliasUserTypeFromPtr. Code that formerly accessed the aliasSize field of an AliasRecord
must use the functions GetAliasSize or GetAliasSizeFromPtr.
These Alias Manager functions are available in Mac OS X v10.4 and later. For more information, see
Alias Manager Reference.
Archived Bit Fields
It’s best not to archive to NSArchives any structures that contain bit fields as integers. Individual
values are stored in the archives in an architecture and compiler dependent manner. In cases where
archives already contain such structures, you can read a structure correctly by changing its declaration
so that the bit fields are swapped appropriately, as shown in Listing 4-1.
You might want to examine your code to make sure such changes don’t affect other code. Note that
this workaround is specific to GCC and should not be used in any new code that you write.
Listing 4-1  A structure that swaps bit fields
typedef struct {
#ifdef __BIG_ENDIAN__
    unsigned int rotatedFromBase:1;
    unsigned int aboutToResize:1;
    unsigned int stuff:30;
#else
    unsigned int stuff:30;
    unsigned int aboutToResize:1;
    unsigned int rotatedFromBase:1;
#endif
} _flags;
Bit Shifting
When you shift a value by the width of its type or more, the result is undefined regardless of the
architecture. In fact, two different compilers on the same architecture could give different results for
the following:
uint32_t x = 0xDEADBEEF;
uint32_t y = x >> 32;
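If a shift count can reach the width of the type, one option is to handle that case explicitly. This is a minimal sketch; the helper name shr32 is hypothetical, and returning 0 for large counts is a choice made here for a logical right shift.

```c
#include <stdint.h>

/* Hypothetical helper: logical right shift that is defined for any count.
 * Counts of 32 or more yield 0 instead of relying on undefined behavior. */
uint32_t shr32(uint32_t x, unsigned count)
{
    return count < 32 ? x >> count : 0;
}
```

With this helper, shr32(0xDEADBEEF, 32) gives the same answer on every compiler and architecture.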
Bit Test, Set, and Clear Functions: Carbon and POSIX
Don’t mix using the Carbon functions BitTst, BitSet, and BitClr and the POSIX macros setbit,
clrbit, isset, and isclr with the C bitwise operators. If you consistently use the Carbon and POSIX
functions and avoid the C bitwise operators, your code will work correctly.
The Carbon and POSIX functions perform a byte-by-byte traversal, which causes problems on the
x86 architecture when they operate on data types that are larger than 1 byte. You can use these
functions only on a pointer to a string of endian-neutral bytes.
You need to change any code that uses these bit testing functions to perform bit manipulation on
integer values. For example, instead of BitTst(&int32, 5L), use (int32 & (1 << 26)).
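The relationship between the two forms can be modeled in portable C. In the sketch below, model_bit_tst is a hypothetical model of the byte-by-byte traversal described above, written from that description rather than from the Carbon source: bit 0 is taken to be the most significant bit of the first byte in memory. The function int_bit_tst is the endian-neutral replacement for 32-bit integers.

```c
#include <stdint.h>

/* Hypothetical model of a byte-by-byte bit test: bit 0 is the most
 * significant bit of the first byte in memory. */
int model_bit_tst(const void *bytes, long bitNum)
{
    const unsigned char *p = (const unsigned char *)bytes;
    return (p[bitNum / 8] >> (7 - bitNum % 8)) & 1;
}

/* Endian-neutral replacement for integers: with the numbering above,
 * bit n of a big-endian 32-bit value is bit (31 - n) of the integer. */
int int_bit_tst(uint32_t value, unsigned bitNum)
{
    return (int)((value >> (31u - bitNum)) & 1u);
}
```

The two functions agree only when the bytes passed to model_bit_tst hold the integer in big-endian order, which is why the byte-oriented form gives different answers for integers on a little-endian system.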
You’ll encounter problems when you use the function BitTst to test for 24-bit mode. For example,
the following bit test returns false, which indicates that the process is running in 24-bit mode, or at
least that the code is not running in 32-bit mode. The POSIX equivalents perform similarly:
Gestalt(gestaltAddressingModeAttr, &gestaltResult);
if (!(BitTst(&gestaltResult,31L)) )
/*If 24 bit
You can use any of the bit testing, setting, and clearing functions if you pass a pointer to data whose
byte order is fixed. Used in this way, these functions behave the same on both architectures.
For more information, see the ToolUtils.h header file in the Core Services framework and Mathematical
and Logical Utilities Reference.
Deprecated Functions
Many deprecated functions, such as those that use PICT + PS data, have byte swapping issues. You
may want to replace deprecated functions at the same time you prepare your code to run as a universal
binary. You’ll not only solve byte swapping issues, but your code will use functions that ultimately
benefit future development.
A function that is deprecated has an availability statement in its header file that states the version of
Mac OS X in which the function is deprecated. Many API reference documents provide a list of
deprecated functions. In addition, compiler warnings for deprecated functions are on by default in
Xcode version 2.1.
Disk Partitions
The partition format of the disk on a Macintosh using an Intel microprocessor differs from that using
a PowerPC microprocessor. If your application depends on the partitioning details of the disk, it may
not behave as expected. Partitioning details can affect tools that examine the hard disk at a low level.
Double-Precision Values: Bit-by-Bit Sensitivity
Although both architectures are IEEE 754 compliant, there are differences in the rounding procedure
used by each when operating on double-precision numbers. If your application is sensitive to bit-by-bit
values in double-precision numbers, be aware that the same computation performed on each
architecture may produce a different numerical result.
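Where an application can tolerate small differences, one common way to reduce this sensitivity is to compare results within a tolerance instead of bit for bit. The sketch below is an assumption-laden example, not a recommendation from the Intel manuals: the helper name nearly_equal is hypothetical, and a suitable tolerance depends entirely on the computation.

```c
/* Hypothetical helper: compares two doubles using a relative tolerance
 * instead of requiring bit-for-bit equality. Returns 1 if the values
 * differ by no more than relTol times the larger magnitude. */
int nearly_equal(double a, double b, double relTol)
{
    double diff  = a > b ? a - b : b - a;
    double absA  = a < 0 ? -a : a;
    double absB  = b < 0 ? -b : b;
    double scale = absA > absB ? absA : absB;
    return diff <= relTol * scale;
}
```

Code that hashes, serializes, or compares double-precision results exactly cannot use this approach and needs to be tested on both architectures.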
For more information, see Volume 1 of the Intel developer software manuals, available from the
following website:
http://developer.intel.com/design/Pentium4/documentation.htm
Finder Information and Low-Level File System Operations
If your code operates on the file system at a low level and handles Finder information, keep in mind
that the file system does not swap bytes for the following information:
■
The finderInfo field in the HFSPlus data structures HFSCatalogFolder, HFSPlusCatalogFolder,
HFSCatalogFile, HFSPlusCatalogFile, and HFSPlusVolumeHeader.
■
The FSPermissionInfo data structure, which is used when the constant kFSCatInfoPermissions
is passed to the HFSPlus functions GetCatalogInfo and GetCatalogInfoBulk.
The value of multibyte fields on disk always uses big-endian format. When running on a little-endian
system, you must byte swap any multibyte fields.
The getxattr function, added in Mac OS X v10.4, retrieves extended attributes. When using this
function to access the legacy attribute "com.apple.FinderInfo", note that as with getattrlist, the
information returned by this call is not byte swapped. See the getxattr man page for more information
on this function.
Note: This issue pertains only to code that operates below CarbonCore. Calls to Carbon functions
such as FSGetCatalogInfo are not affected.
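As an illustration of swapping such fields yourself, the sketch below decodes the first fields of a raw Finder information block. It assumes the classic layout in which the block begins with a 4-byte file type, a 4-byte creator, and 2 bytes of Finder flags. The type FinderBasics and the function decode_finder_basics are hypothetical names, and the code operates on a plain byte buffer rather than calling getxattr.

```c
#include <stdint.h>

/* Hypothetical container for the first three Finder information fields. */
typedef struct {
    uint32_t fileType;
    uint32_t fileCreator;
    uint16_t finderFlags;
} FinderBasics;

/* Decodes big-endian fields arithmetically, so the same code is correct
 * on big-endian and little-endian hosts. `raw` must hold at least
 * 10 bytes (assumed layout: type, creator, flags). */
FinderBasics decode_finder_basics(const unsigned char *raw)
{
    FinderBasics out;
    out.fileType    = ((uint32_t)raw[0] << 24) | ((uint32_t)raw[1] << 16) |
                      ((uint32_t)raw[2] << 8)  |  (uint32_t)raw[3];
    out.fileCreator = ((uint32_t)raw[4] << 24) | ((uint32_t)raw[5] << 16) |
                      ((uint32_t)raw[6] << 8)  |  (uint32_t)raw[7];
    out.finderFlags = (uint16_t)((raw[8] << 8) | raw[9]);
    return out;
}
```

Decoding from bytes avoids both a host-order check and any alignment assumptions about the buffer.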
Font-Related Resources
Font-related resource types (FOND, NFNT, sfnt, and so forth) are in big-endian format on a Macintosh
using an Intel microprocessor. If your application accesses font-related resource types directly, you
need to change your code to use the Apple Type Services (ATS) for Fonts API. The functions in this
API read font data and swap it appropriately for you.
If for any reason your application can't use ATS for Fonts functions, you must swap the fields of
font-related resource types yourself.
For more information, see Apple Type Services for Fonts Reference.
GWorlds
When the QuickDraw function NewGWorld allocates storage for the pixel buffer, and the depth
parameter is 16 or 32 bits, the byte ordering within each pixel matters. The pixelFormat field of the
PixMap data structure can have the values k16BE555PixelFormat or k16LE555PixelFormat for 2-byte
pixels, and k32ARGBPixelFormat or k32BGRAPixelFormat for 4-byte pixels. (These constants are
defined in the Quickdraw.h header file.) By default, NewGWorld always creates big-endian pixel
formats (k16BE555PixelFormat or k32ARGBPixelFormat), regardless of the endian format of the
system.
For performance reasons it is generally preferable for you to use a pixel format that corresponds to
the native byte ordering of the system. When you pass kNativeEndianPixMap in the flags parameter
to NewGWorld, the byte ordering of the pixel format is big-endian on big-endian systems, and
little-endian on little-endian systems.
Note: QuickDraw does not support little-endian pixel formats on big-endian systems.
You can use the GWorld pixel storage as input to the Quartz function CGBitmapContextCreate or
as a data provider for the Quartz function CGImageCreate. The byte ordering of the source pixel
format needs to be communicated to Quartz through additional flags in the bitmapInfo parameter,
which are defined in the CGImage.h header file. Assuming that your bitmapInfo parameter is already
set up, you now need to combine it (by using a bitwise OR operator) with kCGBitmapByteOrder16Host
or kCGBitmapByteOrder32Host if you created the GWorld with a kNativeEndianPixMap flag. Similarly,
you should use kCGBitmapByteOrder16Big or kCGBitmapByteOrder32Big when you know that
your pixel byte order is big-endian.
Java I/O API (NIO)
The I/O API (NIO) that was introduced in JDK 1.4 allows the use of native memory buffers. If you
are a Java programmer who uses this API, you may need to revise your code. NIO byte buffers have
a byte ordering which by default is big-endian. If you have Java code originally written for Mac OS
X on PowerPC, when you create java.nio.ByteBuffers you should call the function
ByteBuffer.order(ByteOrder.nativeOrder()) to set the byte order of the buffers to the native
byte order for the current architecture. If you fail to do this, you will obtain flipped data when you
read multibyte data from the buffer using JNI.
Machine Location Data Structure
The MachineLocation data type defines the format for the geographic location record. If your code
uses the MachineLocation data structure, you need to change it to use the
MachineLocation.u.dls.Delta field that was added to the structure in Mac OS X version 10.0.
To be endian-safe, change code that uses the old field:
MachineLocation.u.dlsDelta = 1;
to use the new field:
MachineLocation.u.dls.Delta = 1;
The gmtDelta field remains the same—the low 24 bits are used. The order of assignment is important.
The following is incorrect because it overwrites results:
MachineLocation.u.dls.Delta = 0xAA;        // u = 0xAAGGGGGG; G=Garbage
MachineLocation.u.gmtDelta = 0xBBBBBB;     // u = 0x00BBBBBB;
This is the correct way to assign the values:
MachineLocation.u.gmtDelta = 0xBBBBBB;     // u = 0x00BBBBBB;
MachineLocation.u.dls.Delta = 0xAA;        // u = 0xAABBBBBB;
For more details see Memory Management Utilities Reference.
Metrowerks PowerPlant
Applications that use Metrowerks PowerPlant framework and its PPob resources need to supply a
resource flipper for PPob resources. Use the sample code in “Flipping PowerPlant Resources” (page 89)
as a model for writing a PPob flipper that is customized for your code.
Multithreading
Multithreading is a technique used to improve performance and enhance the perceived responsiveness
of applications. On computers with one processor, this technique can allow a program to execute
multiple pieces of code independently. On computers with more than one processor, multithreading
can allow a program to execute multiple pieces of code simultaneously.
Intel processors support multiprocessor systems similar to the G5. The Intel architecture also supports
hardware multithreading within a single processor package to execute two or more separate threads
of execution simultaneously. If your application is single-threaded, consider threading your application
to take advantage of hardware multithreading processor capabilities. If your application is
multithreaded, you’ll want to ensure that the number of threads is not hard coded to a fixed number
of processors.
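One way to avoid a hard-coded thread count is to ask the system how many processors are online and size the thread pool from the answer. The sketch below uses sysconf with _SC_NPROCESSORS_ONLN, which is an assumption of this example rather than an API named in this document; the function name online_processor_count is hypothetical.

```c
#include <unistd.h>

/* Returns the number of processors currently online, or 1 if the
 * query fails. _SC_NPROCESSORS_ONLN is a widely supported extension
 * (available on Mac OS X and Linux) rather than strict POSIX. */
long online_processor_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}
```

The count includes logical processors, so on hardware with simultaneous multithreading it can be larger than the number of physical cores.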
Hyper-Threading Technology (HT Technology) is the Intel implementation of a technique referred
to as simultaneous multithreading (SMT). Processors that are enabled for HT Technology duplicate
the architectural state to support two logical processors while sharing execution resources within a
single processor core. Dual core technology further improves performance by providing two physical
cores within a single physical processor package. Multiprocessor, Hyper-Threading, and dual core
technology all exploit thread-level parallelism (TLP) to improve application and system responsiveness
and to boost processor throughput.
When you prepare code to run as a universal binary, the multithreading capabilities of the
microprocessor are transparent to you. This is true whether your application is threaded or not.
However, you can optimize your code to take advantage of the specific way hardware multithreading
is implemented for each architecture. For a Macintosh using an Intel microprocessor, refer to the Intel
processor manuals and optimization guides for the targeted processor.
Objective-C: Messages to nil
On a Macintosh using an Intel microprocessor, Objective-C messages sent to nil return garbage for
return values that are typed as float or double. On a Macintosh using a PowerPC microprocessor
these messages return 0.0.
Objective-C Runtime: Low-Level Operations
If your application directly calls the Objective-C runtime function objc_msgSend_stret, you need
to change your code to have it work correctly on the x86 architecture.
The x86 ABI for struct-return functions differs from the ABI for struct-address-as-first-parameter
functions, but the two ABIs are identical on PowerPC. When you call objc_msgSend_stret you must
cast the function to a function pointer type that uses the expected struct return type. The same applies
for calls to objc_msgSendSuper_stret.
For other details on the ABI, see “Application Binary Interface” (page 85).
Open Firmware
Macintosh computers using Intel microprocessors do not use Open Firmware. Although many parts
of the IO registry are present and work as expected, information that is provided by Open Firmware
on a Macintosh using a PowerPC microprocessor (such as a complete device tree) is not available in
the IO registry on a Macintosh using an Intel microprocessor. You can obtain some of the information
from IODeviceTree by using the sysctlbyname or sysctl commands.
OpenGL
When defining an OpenGL image or texture, you need to provide a type that specifies to OpenGL
which format the texture is in. Most of these functions (for example, glTexImage2D) take format and
type parameters that specify how the texture is laid out on disk or in memory. OpenGL supports a
number of different image types; some are endian-neutral but others are not.
For example, a common image format is GL_RGBA with a type of GL_UNSIGNED_BYTE. This means that
the image has a byte that specifies the red color data followed by a byte that specifies the green color
data, and so forth. This format is not endian-specific; the bytes are in the same order on all architectures.
Another common image format is GL_BGRA, often specified by the type
GL_UNSIGNED_INT_8_8_8_8_REV. This type means that every four bytes of image data are interpreted
as an unsigned int, with the most significant 8 bits representing the alpha data, the next most
significant 8 bits representing the red color data, and so forth. Because this format is specific to the
integer format of the host, the format is interpreted differently on little-endian systems than on
big-endian systems. When using GL_UNSIGNED_INT_8_8_8_8_REV, the OpenGL implementation
expects to find data in byte order ARGB on big-endian systems, but BGRA on little-endian systems.
Because there is no explicit way in OpenGL to specify a byte order of ARGB with 32-bit or 16-bit
packed pixels (which are common image formats on Macintosh PowerPC computers), many
applications specify GL_BGRA with GL_UNSIGNED_INT_8_8_8_8_REV. This practice works on a
big-endian system such as PowerPC, but the format is interpreted differently on a little-endian system,
and causes images to be rendered with incorrect colors.
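The byte-order dependence described above can be checked without OpenGL. The following is a minimal sketch, not code from this guide: the function names are hypothetical, and it assumes a pixel packed as 0xAARRGGBB (alpha in the most significant 8 bits), which is how GL_BGRA with GL_UNSIGNED_INT_8_8_8_8_REV is interpreted.

```c
#include <stdint.h>
#include <string.h>

/* Copy a packed 0xAARRGGBB pixel into four bytes exactly as it sits in memory. */
static void pixel_bytes_in_memory(uint32_t argb, uint8_t out[4])
{
    memcpy(out, &argb, sizeof argb);
}

/* Returns 1 when the host stores the most significant byte first. */
static int host_is_big_endian(void)
{
    const uint32_t probe = 0x01020304u;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 0x01;
}
```

For the pixel 0xFF112233, a big-endian host stores the bytes as FF 11 22 33 (ARGB), while a little-endian host stores them as 33 22 11 FF (BGRA). Code that fills the buffer byte by byte in ARGB order therefore matches the packed type only on big-endian systems.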
Applications that have this problem are those that use the OpenGL host-order format types, but
assume that the data referred to is always big endian. These types include, but are not limited to the
following:
GL_SHORT
GL_UNSIGNED_SHORT
GL_INT
GL_UNSIGNED_INT
GL_FLOAT
GL_DOUBLE
GL_UNSIGNED_BYTE_3_3_2
GL_UNSIGNED_SHORT_4_4_4_4
GL_UNSIGNED_SHORT_5_5_5_1
GL_UNSIGNED_INT_8_8_8_8
GL_UNSIGNED_INT_10_10_10_2
GL_UNSIGNED_SHORT_5_6_5
GL_UNSIGNED_BYTE_2_3_3_REV
GL_UNSIGNED_SHORT_5_6_5_REV
GL_UNSIGNED_SHORT_4_4_4_4_REV
GL_UNSIGNED_SHORT_1_5_5_5_REV
GL_UNSIGNED_INT_8_8_8_8_REV
GL_UNSIGNED_INT_2_10_10_10_REV
If your application does not use any of these types, it is unlikely to have any problems with OpenGL.
Note that an application is not necessarily incorrect to use one of these types. Many applications might
already present host-order data tagged with one of these formats, especially with existing
cross-platform code, because the Mac OS X implementation behaves the same way as a Windows
implementation.
If an application incorrectly uses one of these types, its OpenGL textures and images are rendered
with incorrect colors. For example, red might appear green, or the image might appear to be tinted
purple.
You can fix this problem in one of the following ways:
1.
If the images are generated or loaded algorithmically, change the code to generate the textures
in host-order format that matches what OpenGL expects. For example, a JPEG decoder can be
modified to store its output in 32-bit integers instead of four 8-bit bytes. The resulting data is
identical on big-endian systems, but on a little-endian system, the bytes are in a different order.
This matches the OpenGL expectation, and the existing OpenGL code continues to work on both
architectures. This is the preferred approach.
In many cases, rewriting the algorithms may prove a significant amount of work to implement
and debug. If that’s the case, asking OpenGL to interpret the texture data differently might be
a better approach for you to take.
2.
If the application uses GL_UNSIGNED_INT_8_8_8_8_REV or GL_UNSIGNED_INT_8_8_8_8, it can
switch between them based on the architecture. Because these two types are exactly byte-swapped
versions of the same format, using GL_UNSIGNED_INT_8_8_8_8_REV on a big-endian system is
equivalent to using GL_UNSIGNED_INT_8_8_8_8 on a little-endian system and vice versa. Code
might look as follows:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0, GL_BGRA_EXT,
#if __BIG_ENDIAN__
             GL_UNSIGNED_INT_8_8_8_8_REV,
#else
             GL_UNSIGNED_INT_8_8_8_8,
#endif
             data);
If this is a common idiom, it might be easiest to define it as a macro that can be used multiple
times:
#if __BIG_ENDIAN__
#define ARGB_IMAGE_TYPE GL_UNSIGNED_INT_8_8_8_8_REV
#else
#define ARGB_IMAGE_TYPE GL_UNSIGNED_INT_8_8_8_8
#endif
/* later on, use it like this */
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB,
             width, height, 0, GL_BGRA_EXT,
             ARGB_IMAGE_TYPE, data);
This approach, however, only works for this particular 32-bit packed-pixel data type. It is common
to store 16-bit ARGB data using GL_UNSIGNED_SHORT_1_5_5_5_REV, but there is no corresponding
byte swapped type. Avoid using GL_UNSIGNED_SHORT_5_5_5_1, because these are not the same.
The format is interpreted as bit-order arrrrrbbbbbggggg on a big-endian system, and as bit order
ggrrrrrabbbbbggg on a little-endian system.
3.
The final method is to instruct OpenGL to byte swap any texture loaded on a little-endian system.
You can accomplish this using the GL_UNPACK_SWAP_BYTES pixel store setting. This setting applies
to all texture or image calls made with the current OpenGL context, so it needs to be set only once
per OpenGL context, for example:
#if __LITTLE_ENDIAN__
glPixelStorei(GL_UNPACK_SWAP_BYTES, 1);
#endif
This method causes images that use the problematic formats to be loaded as they would be on
PowerPC. You should consider this option only if no other option is available. Enabling this
option causes OpenGL to use a slower rendering path than it would normally.
Performance-sensitive OpenGL applications may be significantly slower with this option enabled
than with it off. Although this method can get an OpenGL-based program up and running in as
little time as possible, it is highly recommended that you use one of the other two methods.
Note: Using the GL_UNSIGNED_INT_8_8_8_8 format for GL_BGRA data is not necessarily faster than
using GL_UNPACK_SWAP_BYTES. In some cases, performance decreases for rendering textures that use
either of those two methods compared to using a data type such as GL_UNSIGNED_INT_8_8_8_8_REV.
It’s advisable that you use Shark or other tools to analyze the performance of your OpenGL code and
make sure that you are not encountering particularly bad cases.
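The first fix in the list above (generating texture data in host order) can be sketched as follows. This is an illustrative example under stated assumptions, not the guide's own code: pack_argb and rgba_bytes_to_host_argb are hypothetical names, and the source data is assumed to be tightly packed R, G, B, A bytes such as a decoder might produce.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack four components into one 32-bit value with alpha in the most
   significant 8 bits, then red, green, and blue. Storing this value as an
   integer lets the host choose the byte order in memory. */
static uint32_t pack_argb(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}

/* Convert R,G,B,A byte quadruplets into host-order packed pixels. */
static void rgba_bytes_to_host_argb(const uint8_t *src, uint32_t *dst,
                                    size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; ++i) {
        const uint8_t *p = src + 4 * i;
        dst[i] = pack_argb(p[3], p[0], p[1], p[2]);
    }
}
```

Because each pixel is written as a 32-bit integer rather than as four separate bytes, the same code produces the layout that a host-order packed type expects on both big-endian and little-endian systems.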
OSAtomic Functions
The kernel extension functions OSDequeueAtomic and OSEnqueueAtomic are not available on x86.
For more information on these functions, see Kernel Extensions (Kernel/libkern) Reference.
Pixel Data
Applications that store pixel data in memory using ARGB format must take care in how they read
data. If the code is not written correctly, it’s possible to misread the data, which leads to colors or
alpha that appear wrong.
The Quartz constants shown in Table 4-1 specify the byte ordering of pixel formats. These constants,
which are defined in the CGImage.h header file, are used in the bitmapInfo parameter. To specify
byte ordering to Quartz use a bitwise OR operator to combine the appropriate constant with the
bitmapInfo parameter.
Table 4-1
Quartz constants that specify byte ordering
Constant                      Specifies
kCGBitmapByteOrderMask
kCGBitmapByteOrder16Big       16-bit, big endian format
kCGBitmapByteOrder32Big       32-bit, big endian format
kCGBitmapByteOrder16Little    16-bit, little endian format
kCGBitmapByteOrder32Little    32-bit, little endian format
kCGBitmapByteOrder16Host      16-bit, host endian format
kCGBitmapByteOrder32Host      32-bit, host endian format
QuickDraw Routines
If you have existing code that directly accesses the picFrame field of the QuickDraw Picture data
structure, you should use the QuickDraw function QDGetPictureBounds to get the
appropriately-swapped bounds for a Picture. This function is available in Mac OS X version 10.3
and later. Its prototype is as follows:
Rect * QDGetPictureBounds(
    PicHandle picH,
    Rect *outRect);
If you have existing code that uses the QuickDraw DeltaPoint function, make sure that you do not
cast the function result to a Point data structure. The horizontal difference is returned in the low 16
bits and the vertical difference is returned in the high 16 bits. You can obtain the horizontal and
vertical values by using code similar to the following:
Point pointDiff;
SInt32 difference = DeltaPoint (p1, p2);
pointDiff.h = LoWord (difference);
pointDiff.v = HiWord (difference);
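To see why the cast is unsafe, the following sketch compares the two approaches in portable C. It is an illustration only: DemoPoint mirrors the QuickDraw Point layout (v followed by h), and the demo_ helpers are hypothetical stand-ins for HiWord, LoWord, and a pointer cast, so that the example does not depend on the QuickDraw headers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the QuickDraw Point structure (v first, then h). */
typedef struct { int16_t v; int16_t h; } DemoPoint;

/* Stand-ins for the HiWord and LoWord macros. */
static int16_t demo_hi_word(int32_t x) { return (int16_t)(((uint32_t)x >> 16) & 0xFFFFu); }
static int16_t demo_lo_word(int32_t x) { return (int16_t)((uint32_t)x & 0xFFFFu); }

/* Endian-neutral: extract each half by value, as recommended above. */
static DemoPoint demo_point_from_delta(int32_t delta)
{
    DemoPoint p;
    p.h = demo_lo_word(delta);
    p.v = demo_hi_word(delta);
    return p;
}

/* Reinterpret the 32-bit result as a point (same effect as a pointer cast).
   The outcome depends on the byte order of the host. */
static DemoPoint demo_point_by_cast(int32_t delta)
{
    DemoPoint p;
    memcpy(&p, &delta, sizeof p);
    return p;
}
```

On a big-endian host both functions agree. On a little-endian host the reinterpreting version returns the horizontal and vertical values swapped, because the low 16 bits are stored first in memory.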
Tip: The best solution is to convert your QuickDraw code to Quartz 2D. QuickDraw was deprecated
starting in Mac OS X v10.4. For help with converting to Quartz 2D, see Quartz Programming Guide for
QuickDraw Developers.
QuickTime Components
The Component Manager recognizes which architectures are supported by a component by looking
at the 'thng' resource for the component, not the architecture of the file. You must specify the
appropriate architectures in the 'thng' resource. To accomplish this, follow these steps:
1.
Include in the .r file for your component (or a .h file included by the .r file), something similar to
what’s shown in Listing 4-2.
2.
In the .r file, where you define the 'thng' resource, modify your ComponentPlatformInfo array
to look similar to the following:
kMyComponentFlags, kMyCodeType, kMyCodeID, Target_PlatformType,
#if TARGET_REZ_UNIVERSAL_COMPONENTS
kMyComponentFlags, kMyCodeType, kMyCodeID, Target_SecondPlatformType
#endif
3.
In the target settings of your Xcode project, add this to your OTHER_REZFLAGS:
-d ppc_$(ppc) -d i386_$(i386)
4.
Rebuild your component. For details, see “Building a Universal Binary” (page 13).
Listing 4-2
Statements to include in the .r file for a component
#if !defined(ppc_YES)
    #define ppc_YES 0
#endif
#if ppc_YES
    #define TARGET_REZ_MAC_PPC 1
#endif
#if !defined(i386_YES)
    #define i386_YES 0
#endif
#if i386_YES
    #define TARGET_REZ_MAC_X86 1
#endif
#if !defined(TARGET_REZ_MAC_X86)
    #define TARGET_REZ_MAC_X86 0
#endif
#if !(TARGET_REZ_MAC_PPC || TARGET_REZ_MAC_X86)
    #if TARGET_CPU_X86
        #undef TARGET_REZ_MAC_X86
        #define TARGET_REZ_MAC_X86 1
    #elif TARGET_CPU_PPC
        #undef TARGET_REZ_MAC_PPC
        #define TARGET_REZ_MAC_PPC 1
    #endif
#endif
#if TARGET_REZ_MAC_PPC && TARGET_REZ_MAC_X86
    #define TARGET_REZ_UNIVERSAL_COMPONENTS 1
    #define Target_PlatformType platformPowerPCNativeEntryPoint
    #define Target_SecondPlatformType platformIA32NativeEntryPoint
#elif TARGET_REZ_MAC_X86
    #define Target_PlatformType platformIA32NativeEntryPoint
#else
    #define Target_PlatformType platformPowerPCNativeEntryPoint
#endif
Runtime Code Generation
If your application generates code at runtime, keep in mind that the compiler assumes that the stack
must be 16-byte aligned when calling into Mac OS X libraries or frameworks.
System-Specific Predefined Macros
The C preprocessor has several predefined macros whose purpose is to indicate the type of system
and machine in use. If your code uses system-specific predefined macros, evaluate whether you really
need to use them. In most cases applications need to know the capabilities available on a computer
and not the specific system or machine on which the application is running. For example, if your
application needs to know whether it is running on a little-endian or big-endian microprocessor, you
should use the __BIG_ENDIAN__ or __LITTLE_ENDIAN__ macros or the Core Foundation function
CFByteOrderGetCurrent. Do not use the __i386__ and __ppc__ macros for this purpose.
See GNU C 4.0 Preprocessor User Guide for additional information.
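A capability check of this kind might look like the following sketch. It is an example under assumptions rather than a prescribed pattern: the DemoByteOrder type and the demo_ function names are hypothetical, and the runtime probe is included only as a fallback for compilers that define neither macro.

```c
#include <stdint.h>
#include <string.h>

typedef enum { DEMO_LITTLE_ENDIAN, DEMO_BIG_ENDIAN } DemoByteOrder;

/* Runtime probe: inspect the first byte of a known 32-bit value. */
static DemoByteOrder demo_runtime_byte_order(void)
{
    const uint32_t probe = 0x01020304u;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 0x01 ? DEMO_BIG_ENDIAN : DEMO_LITTLE_ENDIAN;
}

/* Prefer the endian macros; never infer byte order from __i386__ or __ppc__. */
static DemoByteOrder demo_byte_order(void)
{
#if defined(__BIG_ENDIAN__)
    return DEMO_BIG_ENDIAN;
#elif defined(__LITTLE_ENDIAN__)
    return DEMO_LITTLE_ENDIAN;
#else
    return demo_runtime_byte_order();
#endif
}
```

Keying the decision to byte order rather than to a processor name means the code keeps working if it is later built for another architecture with the same byte order.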
See Also
In addition to the following resources, check the ADC website periodically for updates and technical
notes that might address other specific situations:
■
Quartz Programming Guide for QuickDraw Developers provides information on moving code from
the deprecated QuickDraw API to Quartz.
■
IA-32 Intel Architecture Optimization Reference Manual, available from:
http://developer.intel.com/design/pentium4/manuals/index_new.htm
■
Hyper-Threading Technology Architecture and Microarchitecture, available from:
http://www.intel.com/technology/itj/2002/volume06issue01/art01_hyper/p01_abstract.htm
Chapter 5
Preparing Vector-Based Code
This chapter is relevant only for those developers whose applications already directly use the AltiVec
extension to the PowerPC instruction set or who want to start writing vector-based code. AltiVec
instructions, because they are processor-specific, must be replaced on Macintosh computers using Intel
microprocessors. You can choose from these two options:
■
Use the Accelerate framework. The Accelerate framework, introduced in Mac OS X v10.3 and
expanded in v10.4, is a set of high-performance vector-accelerated libraries. It provides a layer
of abstraction that lets you access vector-based code without needing to use vector instructions
yourself or to be concerned with the architecture of the target machine. The system automatically
invokes the appropriate instruction set.
■
Port AltiVec code to the Intel instruction set architecture (ISA). The MMX™, SSE, SSE2, and SSE3
extensions provide analogous functionality to AltiVec. Like the AltiVec unit, these extensions are
fixed-sized SIMD (Single Instruction Multiple Data) vector units, capable of a high degree of
parallelism. Just as for AltiVec, code that is written to use the Intel ISA typically performs many
times faster than scalar code.
“Accelerate Framework” (page 53) describes each of the libraries in the framework and tells where
to find additional information on using them. The rest of the chapter is devoted to rewriting AltiVec
code. It outlines the key differences between architectures in terms of vector-based programming,
gives an overview of the SIMD extensions on x86, lists what you need to do to build your code, and
provides an in-depth discussion on alignment.
Accelerate Framework
The Accelerate framework is an umbrella framework that wraps the existing vecLib and vImage
frameworks. It contains the following libraries:
■
vImage is the Apple image processing framework that includes high-level functions for image
manipulation—convolutions, geometric transformations, histogram operations, morphological
transformations, and alpha compositing—as well as utility functions that convert formats and
perform other operations.
■
vDSP provides mathematical functions that perform digital signal processing (DSP) for applications
such as speech, sound, audio, and video processing, diagnostic medical imaging, radar signal
processing, seismic analysis, and scientific data processing. The vDSP functions operate on real
and complex data types and include data type conversions, fast Fourier transforms (FFTs), and
vector-to-vector and vector-to-scalar operations.
■
vMathLib contains vector-accelerated versions of all routines in the standard math library.
■
LAPACK is a linear algebra package that solves simultaneous sets of linear equations, tackles
eigenvalue and singular solution problems, and determines least squares solutions for linear
systems.
■
BLAS (Basic Linear Algebra Subroutines) performs basic vector and matrix computations.
■
vForce contains routines that take matrices as input and output arguments, rather than single
variables.
The following documents describe libraries in the Accelerate framework:
■
Optimizing Image Processing With vImage is a programming guide that also includes a reference
for the vImage API.
http://developer.apple.com/documentation/Performance/Conceptual/vImage/index.html
■
Vector Libraries is a web page that describes the libraries available with the Accelerate framework
and provides links to sample and source code.
http://developer.apple.com/hardware/ve/vector_libraries.html
Rewriting AltiVec Instructions
Most of the tasks required to vectorize for AltiVec—restructuring data structures, designing parallel
algorithms, eliminating branches, and so forth— are the same as those you’d need to perform for the
Intel architecture. If you already have AltiVec code, you’ve already completed the fundamental
vectorization work needed to rewrite your application for the Intel architecture. In many cases the
translation process will be smooth, involving direct or nearly direct substitution of AltiVec intrinsics
with Intel equivalents.
However, there are a number of important differences between the two architectures. Before you start
rewriting AltiVec instructions for the Intel instruction set architecture, read “Differences Between
Instruction Set Architectures” (page 54). If the differences impact your code a great deal, you may
want to consider simply rewriting your code to use the Accelerate framework.
If you determine that rewriting your code for the Intel ISA is the best course of action, you will want
to read “The Programming Model” (page 56) which discusses the overall programming approach
for vector code on Intel and provides general strategies. When you are ready to build your code, see
“Building x86 ISA Code” (page 58) for information on compiler and other important settings. Read
“Aligning Data” (page 59) to become familiar with how alignment is accomplished on the Intel
architecture and for strategies that you can apply to your code.
Differences Between Instruction Set Architectures
Differences between the AltiVec instruction set architecture and the Intel instruction set architecture
will determine how much effort is required to move code from one architecture to the other. These
are the key differences:
■
Integer multiplication algorithms are not equivalent.
AltiVec has 13 or so different flavors of integer multiplication with variations. The x86 architecture
has 3 with almost no variations. In certain cases, algorithms designed to showcase the AltiVec
multiply-accumulate facility need to be rewritten to showcase the x86 multipliers.
■
There is no direct translation for vec_perm.
There is no way to perform a permute operation on x86 for which the permute map is unknown
at compile time. Some byte permutes are also not possible. Operations like byte swapping in an
SIMD register, using the permute unit as a lookup table, and using the permute unit to handle
alignment simply don't work, or require a prohibitive amount of computation.
Vectorization may not be possible for AltiVec code such as small lookup tables that rely heavily
on vec_perm(). There is a permute-like shuffle facility (SHUFPS, SHUFPD, PSHUFD, PSHUFLW,
PSHUFHW) available. However, the permute map must be determined at compile time, meaning
that no run time decisions can be made about how to shuffle the data.
■
Misaligned vector loads are handled differently in each architecture.
AltiVec doesn't have misaligned loads and stores. The x86 architecture doesn't have a left shift
that can be used during runtime. (It has left shifts, but they are all left shifted by a value that must
be set at compile time, which is not useful for the purpose of aligning misaligned vector loads.)
This means that if you have AltiVec code for misaligned data structures, you must modify the
code. There is no acceptable way in SSE/SSE2 to make runtime decisions about how far to perform
a 128-bit left or right shift on a vector to account for misalignment. So, even though both
architectures have 16-byte aligned vector loads, you cannot speedily extract the misaligned vector
from the middle of two adjacent loads in the classic AltiVec style on x86. All such shift instructions
take immediate arguments for the shift count that must be provided at compile time. Misaligned
vector loads and stores must be done explicitly using MOVDQU, MOVUPS, MOVUPD, and MASKMOVDQU.
With SSE3, you may want to use LDDQU for misaligned integer vector loads. All other loads and
stores, including immediate memory operands, must be 16-byte aligned. This may also mean
substantial changes to your loop structure to avoid reading off the end of an array whose length
is not a multiple of 16 bytes. See “Aligning Data” (page 59) for more information.
■
When a vector load or store occurs the entire vector is byte swapped in a full 16-byte swap.
The order of the elements in the vector is reversed relative to memory, as shown in Figure 5-1.
Not only are the bytes reversed, but the element order is too. You must take this reordering and
byte swapping into account when performing permute and 128-bit shift operations.
Figure 5-1
Vector elements in memory order compared to register order
Memory:        00 01 02 03   04 05 06 07   08 09 0A 0B   0C 0D 0E 0F
xmm Register:  0F 0E 0D 0C   0B 0A 09 08   07 06 05 04   03 02 01 00
■
The ANDN instruction, which performs an and-with-complement operation, is backwards from
vec_andc().
The complement is taken of the first argument, not the second. This usually means your mask is
destroyed. The operation does the following:
A = ~A & B
■
There are no fused multiply-add operations.
■
There are no x86 counterparts to vec_splat_u8() and vec_lvsl() for generating vector constants.
Most vector constants must be loaded from storage. A few such as 0 and –1 can be created with
clever application of XOR and the vector compare instructions.
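Two of the differences listed above, the reversed operand order of ANDN and the generation of simple constants, can be sketched with SSE2 intrinsics. This is a minimal illustration, assuming the emmintrin.h header is available; the demo_ function names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Counterpart of AltiVec vec_andc(a, b), which computes a & ~b.
   _mm_andnot_si128(x, y) computes ~x & y, so the operands are swapped. */
static __m128i demo_andc(__m128i a, __m128i b)
{
    return _mm_andnot_si128(b, a);
}

/* All-zero vector created in a register (an XOR of a register with itself). */
static __m128i demo_zero(void)
{
    return _mm_setzero_si128();
}

/* All bits set (-1 in every element): comparing a value with itself for
   equality yields all ones in every lane. */
static __m128i demo_all_ones(void)
{
    __m128i z = _mm_setzero_si128();
    return _mm_cmpeq_epi32(z, z);
}
```

Wrapping the intrinsic in a helper with the AltiVec operand order is one way to avoid accidentally complementing the wrong argument while porting.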
The Programming Model
You don’t need to write in assembly language when you move your code to the Intel architecture.
Intel provides a C programming model for MMX, SSE, SSE2, and SSE3. The compiler provides x86
intrinsic functions that are equivalent to AltiVec intrinsic functions. The intrinsic functions whose names
begin with _mm_ are similar to the vec_* functions found in AltiVec.
For example, the following addition operation in AltiVec:
vector float a, b, c;
a = vec_add( b, c );
is accomplished in SSE, as follows:
__m128 a, b, c;
a = _mm_add_ps(b,c);
Even though the ADDPS instruction is destructive (b and a must be the same register in practice) this
works in the C programming language. The compiler issues a MOVAPS instruction to perform a
register-to-register copy first, to preserve the value of b if it is needed later.
The usual assortment of operators normally found in AltiVec is available to a large degree. SSE and
SSE2 also have full precision divide and square root instructions. SSE and SSE2 lack the multiply-add
fused core and most of the special purpose multiply-add and multiply-sum integer instructions in
the AltiVec vector complex integer unit. Something similar to vec_sel is also missing, but you can
easily replace it using Boolean operators as follows:
result = a & mask | b & ~mask
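The Boolean replacement above can be written with SSE intrinsics as in the following sketch. The demo_select and demo_max names are hypothetical. Note that vec_sel(x, y, m) takes bits from y where m is set, so the expression as written corresponds to vec_sel(b, a, mask).

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* result = (a & mask) | (b & ~mask), applied bitwise across the vector. */
static __m128 demo_select(__m128 a, __m128 b, __m128 mask)
{
    return _mm_or_ps(_mm_and_ps(a, mask), _mm_andnot_ps(mask, b));
}

/* Example use: a branch-free per-element maximum built from a compare mask. */
static __m128 demo_max(__m128 x, __m128 y)
{
    __m128 mask = _mm_cmpgt_ps(x, y);  /* all ones in lanes where x > y */
    return demo_select(x, y, mask);
}
```

The compare intrinsics produce all ones or all zeros in each lane, which is the mask form this pattern expects.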
Floating-point code translates especially well. Performance is usually better if you reroll long, unrolled
AltiVec loops a bit.
The Intel intrinsics use a suffix to indicate the data type they operate on. These are listed in Table
5-1.
Table 5-1
Suffixes and the corresponding data type for Intel intrinsics
Suffix   Data Type
epi8     signed 8-bit integer
epi16    signed 16-bit integer
epi32    signed 32-bit integer
epu8     unsigned 8-bit integer
epu16    unsigned 16-bit integer
si128    128-bit integer
ps       packed single-precision floating-point
ss       scalar single-precision floating-point
pd       packed double-precision floating-point
sd       scalar double-precision floating-point
Similar to AltiVec, there is a list of dedicated vector data types, as shown in Table 5-2. All of the
integer types share a single __m128i type.
Table 5-2
Equivalent SSE2/SSE and AltiVec data types
SSE/SSE2   AltiVec Equivalent
__m128     vector float
__m128d    vector double (if AltiVec had these)
__m128i    vector (signed/unsigned/bool) (char/short/long)
Although you can use the SSE/SSE2 data types that are listed in Table 5-2, it’s preferable for you to
use the data types listed in Table 5-3. You can use these types on PowerPC as well as x86. Using these
types allows you to write a single piece of vector code that works on both processors. The 64-bit and
double-precision types don’t have a hardware equivalent under AltiVec. The names exist under
AltiVec so you can use them, but you can’t perform any arithmetic on these types.
Before you use any of the types listed in Table 5-3 you must include the Accelerate framework header.
You do not need to link against the framework; simply include the following statement in your code:
#include <Accelerate/Accelerate.h>
You can use the data types in Table 5-3 with the Intel-style intrinsics, such as _mm_slli_epi32(
__m128i v, imm ). The compiler issues the appropriate warnings if you mix floats, doubles, and
integer types, but does not issue a warning if you mix any of the “integer flavor” types from Table
5-3—as long as you mix only the integer types.
Table 5-3
Apple and Intel names for vector data types
Vector of Packed      Apple Name   Intel Name
unsigned char         vUInt8       __m128i
signed char           vSInt8       __m128i
unsigned short        vUInt16      __m128i
signed short          vSInt16      __m128i
unsigned int          vUInt32      __m128i
signed int            vSInt32      __m128i
unsigned long long    vUInt64      __m128i
signed long long      vSInt64      __m128i
32-bit boolean        vBool32      __m128i
float                 vFloat       __m128
double                vDouble      __m128d
See “x86 Equivalent Instructions for AltiVec Instructions” (page 73) for an extensive list of equivalents.
You can find a full listing of the Intel intrinsic functions in the Intel Architecture Software Developers
Manual, Volume 2: Instruction Set Reference, Appendix C. Note that some MMX and SSE2 instructions
share the same name.
Building x86 ISA Code
You need to use Xcode 2.1 to build x86 ISA code for a Macintosh using an Intel microprocessor. The
compiler enables the use of MMX, SSE, and SSE2 instructions by default. But it also supports SSE3.
To turn on SSE3 support, pass the -msse3 flag to the compiler. To use the Intel SIMD C intrinsics,
you must also include the appropriate header file, as shown in Table 5-4 (page 58).
Table 5-4
Header file and instruction set
Header File    Instruction Set
pmmintrin.h    SSE3, SSE2, SSE, and MMX
emmintrin.h    SSE2, SSE, and MMX
In Mac OS X, AltiVec flushes denormals to zero by default. This is not true for denormals on the x86
architecture. If you expect your code to encounter denormals frequently, to get the best performance
on a Macintosh using an Intel microprocessor, you’ll want to turn on the FZ and DAZ bits in the MXCSR
register as shown in Listing 5-1.
A typical vector add (ADDPS) with normalized inputs takes from 2 to 5 cycles depending on the method
you use to count cycles. Performing an ADDPS operation on a denormalized input takes about 1500
cycles. Turning on the FZ and DAZ bits has the effect of converting all denormals to zero prior to
performing the calculation. Denormals produced as results are flushed to zero.
Listing 5-1
Code that turns on the FZ and DAZ bits
// Read the MXCSR register.
int oldmxcsr = _mm_getcsr();
// Make a copy with the FZ (0x8000) and DAZ (0x0040) bits turned on.
int newmxcsr = oldmxcsr | 0x8040;
// Set the MXCSR register with the new value.
_mm_setcsr( newmxcsr );
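Because flushing denormals changes numerical results, some code may prefer to enable the bits only around a performance-critical region and then put the register back. The following sketch shows one way to do that, assuming the xmmintrin.h header; the demo_ function names are hypothetical and are not part of any Apple or Intel API.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr */

/* Turn on the FZ (0x8000) and DAZ (0x0040) bits and return the previous
   MXCSR value so that the caller can restore it later. */
static unsigned int demo_enable_flush_to_zero(void)
{
    unsigned int saved = _mm_getcsr();
    _mm_setcsr(saved | 0x8040u);
    return saved;
}

/* Restore an MXCSR value saved by demo_enable_flush_to_zero. */
static void demo_restore_mxcsr(unsigned int saved)
{
    _mm_setcsr(saved);
}
```

A caller would wrap the vector loop between the two calls, passing the value returned by the first call to the second.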
Aligning Data
Data alignment is important for getting the best performance from your code. Code that works on misaligned
data runs more slowly (sometimes more than ten times slower) than code that works on aligned data. AltiVec
and x86 take very different approaches to SIMD alignment. You’ll want to read this section
to understand the differences and for guidance on how to solve alignment problems.
AltiVec only performs aligned loads and relies on a patented permute crossbar to extract a misaligned
vector from the two bracketing aligned vectors. While x86 provides aligned vector loads, it doesn’t
have a permute operation. The long vector left and right shifts take immediate arguments that must
be determined at compile time, as do the shuffle instructions. As a result, it is not easy to perform
software alignment. The x86 architecture instead provides hardware support for misaligned vector
loads and stores.
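As a rough illustration of the hardware support, the sketch below adds two float arrays of arbitrary alignment with unaligned loads and stores, and finishes the last few elements in scalar code so that no vector access extends past the end of the arrays. The function name demo_add_floats is hypothetical, and the example assumes the SSE header xmmintrin.h.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* dst[i] = a[i] + b[i] for arrays with arbitrary alignment and length. */
static void demo_add_floats(float *dst, const float *a, const float *b,
                            size_t count)
{
    size_t i = 0;

    /* Whole vectors: unaligned loads and stores (MOVUPS). */
    for (; i + 4 <= count; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }

    /* Remaining elements: scalar code, so nothing is read or written
       beyond the end of the arrays. */
    for (; i < count; ++i)
        dst[i] = a[i] + b[i];
}
```

When the data is known to be 16-byte aligned, the aligned forms (_mm_load_ps and _mm_store_ps) can be used in the main loop instead.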
Note: Memory allocated by the malloc routine is always 16-byte aligned, as are __m128, __m128i,
and __m128d data types on the stack or passed as function arguments.
Hardware support for misaligned loads and stores is not free of alignment problems. It introduces
its own special difficulties. In particular, you must take additional care to avoid reading off the end
of an array onto unmapped memory. When writing code for AltiVec, since all vectors are 16-byte
aligned, as long as you know that at least one byte in the vector is a valid byte that is mapped in
memory space, the entire vector is safe to load.
Memory is mapped as aligned 4 KB pages. An aligned 16-byte vector never crosses a page boundary.
If one byte in the vector is known to exist, the entire 16-byte aligned vector must also be on the same
page and therefore must also exist. This is not true of misaligned loads, for which it is possible for a
vector to span a page boundary. You must be careful to make sure that all bytes in the vector exist
before you load it or write to it. Figure 5-2 illustrates the problem of misaligned data. Note that the
last unaligned vector in the figure contains some bytes that might not be mapped into memory. Trying
to perform an unaligned load or store of this vector causes a crash if some of the unknown bytes are
on an unmapped page.
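The page-boundary condition can be expressed as a small address test. This is a sketch only, assuming the 4 KB page size and 16-byte vector size stated above; the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { DEMO_PAGE_SIZE = 4096, DEMO_VECTOR_SIZE = 16 };

/* Returns 1 if a 16-byte load starting at p would touch two pages. */
static int demo_load_crosses_page(const void *p)
{
    uintptr_t offset = (uintptr_t)p & (DEMO_PAGE_SIZE - 1);
    return offset > DEMO_PAGE_SIZE - DEMO_VECTOR_SIZE;
}
```

An aligned address always has an in-page offset that is a multiple of 16, at most 4080, so the test never reports a crossing for aligned vectors. A misaligned address with an offset of 4081 or more does cross into the next page.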
Figure 5-2
Misaligned data
[Figure: data that ends near a page boundary. The last unaligned vector contains bytes that might not be mapped to memory.]
This restriction can cause the following problems at the ends of arrays:
1.
If the length of the misaligned array is not an integer multiple of 16 bytes, the last vector load
and store from a naive implementation will include bytes that extend off the end of the array.
You can solve this in one of the following ways:
■
Perform the calculation for the last few bytes in scalar code. For example, the data in Figure
5-3 that extends beyond the vector can be handled in scalar code.
Figure 5-3
Bytes that extend off the array
[Figure: data overlaid with 16-byte aligned vectors. The bytes outside the aligned vectors are labeled for scalar calculation.]
■
Load the data into the register using scalar loads and stores and perform the calculation in
the vector unit. This might have negligible performance advantages.
■
Write the calculation to use only aligned loads, even if the data is misaligned, as shown in
Figure 5-4. Use the data in whatever alignment you are given, since there is no way to fix it
in the register. This means that some of the data that you load at the ends of the arrays is
mystery data and you must discard it appropriately such that the discarded data do not affect
the results. Use MASKMOVDQU at the array edges to store only to bytes that exist (see Figure
5-8 (page 63) and the associated discussion). As with AltiVec, be careful not to load vectors
that are entirely empty.
Figure 5-4
Misaligned data with unknown data at each end
[Figure: data overlaid with 16-byte aligned vectors.]
■
Exit the loop one iteration early, then back-step the pointers by an appropriate number of
bytes such that the last loads and stores in the loop terminate on the last byte in the array,
and then perform the last loop iteration. This yields some overlap between the first part of
the results of the last loop iteration with the last part of the previous loop iteration, as shown
in Figure 5-5. As long as the calculation always produces the same results given the same
inputs, you are in effect overwriting those values with the same result. Overwriting should
not be a problem. You need to provide code to handle the special case in which the entire
array is smaller than one loop iteration.
Figure 5-5 The back-step method for handling unaligned data
[Figure: unaligned vectors laid over data that ends before a page boundary. The last vector overlaps the previous one, forming an area of recalculation.]
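The back-step method can be sketched as follows. This is a minimal illustration, not code from this guide: the function name scaleFloats and the choice of a multiply as the calculation are assumptions, and the sketch writes to a separate destination array so that recalculating the overlapped elements produces the same values.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical example: dst[i] = src[i] * scale for count floats.
   src and dst must not overlap, so the recalculated elements in the
   back-stepped iteration receive the same values as before. */
void scaleFloats(const float *src, float *dst, size_t count, float scale)
{
    const __m128 s = _mm_set1_ps(scale);
    size_t i;

    if (count < 4) {
        /* Special case: the array is smaller than one loop iteration. */
        for (i = 0; i < count; i++)
            dst[i] = src[i] * scale;
        return;
    }

    /* Main loop: exit before the final (possibly partial) vector. */
    for (i = 0; i + 4 < count; i += 4)
        _mm_storeu_ps(dst + i, _mm_mul_ps(_mm_loadu_ps(src + i), s));

    /* Back-step so that the last load and store end on the last element. */
    i = count - 4;
    _mm_storeu_ps(dst + i, _mm_mul_ps(_mm_loadu_ps(src + i), s));
}
```

For an in-place calculation the overlapped elements would be processed twice, so this form of the method applies only when the inputs are left unmodified.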
2.
You cannot safely load scalars or other data quantities that are smaller than 16 bytes using a full
vector unaligned load. If you use an aligned vector that contains the scalar, it may be loaded
safely, but in general there is no good way to move the scalar to a known position where it is
useful. The best way to accomplish this task is to load the scalar to a known alignment in the
register. There are several methods for moving partial vectors into an xmm register.
■
You can load scalars into a known location in the low-order bytes of the vector using the
scalar load and store instructions, as shown in Figure 5-6. These do not load the scalar into
a relatively aligned location like vec_lde. Instead they always go to the low-order edge, so
that you can maneuver them using permute maps or shift values that can be determined at
compile time.
Figure 5-6 Loading a scalar into a known location
[Figure: _mm_load_ss moves scalar data into the low-order element of an xmm register.]
■
All the scalar loads are for 4- or 8-byte quantities. If you need to load a short or byte, load it
in the integer unit using normal integer byte or word loads and then pass it to the vector unit
using MOVD.
Figure 5-7 Loading a single-byte or word (16 bits)
[Figure: single-byte data is sign extended into a 32-bit integer register, then moved (mov) into an xmm register.]
■
If you know what data surrounds the scalar, you can load it using a misaligned load.
3.
For partial vector stores, use MASKMOVDQU as shown in Figure 5-8. This stores an unaligned vector
according to a mask. Only bytes that survive the mask are stored. The MASKMOVDQU mask for
misaligned data regions at the end of an array can be quickly created by performing an aligned
store of a vector of all 0xFF values next to a vector of all zeros and then performing a misaligned
load of the desired mask from the middle region.
The following code shows how to create a mask that is 0xFF for the first N bytes, and 0x00
afterward.
#include <emmintrin.h>

__m128i CreateTrailingStoreMask( int N )
{
    __m128i mask[2];

    // Set second vector to 0
    mask[1] = _mm_setzero_si128();
    // Set first vector to -1 (all bits set)
    mask[0] = _mm_cmpeq_epi8( mask[1], mask[1] );
    // Load the 16 bytes that start N bytes before the zero vector
    return _mm_loadu_si128( (__m128i*) ((char*) &mask[1] - N) );
}
If the array length is a multiple of 16 bytes, you can use the complement of this mask to perform
the leading partial vector store.
Figure 5-8 Using MASKMOVDQU for partial vector stores
[Figure: MASKMOVDQU combines a mask with an unaligned vector so that only the bytes selected by the mask are stored.]
4.
There is no vec_splat analog for 8- and 16-bit types. Use PUNPACK* to splat out the 8- or 16-bit
int to a 32-bit quantity and PSHUFD/SHUFPS/SHUFPD instructions to splat to the rest of the vector.
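The splat technique from the last item can be sketched with SSE2 intrinsics as follows. The function name splat16 is hypothetical. The sketch spells out the individual steps (MOVD, PUNPCKLWD, PSHUFD) for illustration; in practice, the _mm_set1_epi16 intrinsic produces the same result in a single call.

```c
#include <emmintrin.h>

/* Replicate the 16-bit value x into all eight 16-bit elements. */
__m128i splat16(short x)
{
    /* MOVD: pass the scalar from the integer unit to element 0. */
    __m128i v = _mm_cvtsi32_si128((int) (unsigned short) x);
    /* PUNPCKLWD: duplicate the 16-bit value so that the low
       32-bit element holds two copies of x. */
    v = _mm_unpacklo_epi16(v, v);
    /* PSHUFD: copy the low 32-bit element to the rest of the vector. */
    return _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 0, 0, 0));
}
```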
Detecting Vector Unit Availability
To detect the presence of the vector unit at run time, you can use the sysctlbyname function to obtain
information about the features supported by specific processors. The selectors you use to obtain
PowerPC features are listed in Table 5-5. The selectors for Intel processor features are listed in Table
5-6. The sysctlbyname function itself returns 0 on success. The value it retrieves for a selector is 1 if
the feature is supported and recommended, and 0 if the feature is supported but not expected to help
performance. When you check the retrieved value, it’s best to test for a nonzero value because future
versions of the selectors might return values larger than 1.
Table 5-5 Selectors used to obtain features of a PowerPC processor
Selector                     Specifies
hw.optional.floatingpoint    Floating-point instructions
hw.optional.altivec          AltiVec instructions
hw.optional.graphicsops      Graphics operations
hw.optional.64bitops         64-bit instructions
hw.optional.fsqrt            Hardware floating-point square root instructions
hw.optional.stfiwx           Floating-point values stored as integer word indexed instructions
hw.optional.dcba             Data cache block allocate instruction
hw.optional.datastreams      Data streams instructions
hw.optional.dcbtstreams      Data cache block touch streams instruction form
Table 5-6 Selectors used to obtain features of an Intel processor
Selector                     Specifies
hw.optional.floatingpoint    Floating-point instructions
hw.optional.mmx              Original MMX vector instructions
hw.optional.sse              Streaming SIMD extensions
hw.optional.sse2             Streaming SIMD extensions 2
hw.optional.sse3             Streaming SIMD extensions 3
Listing 5-2 shows how to use the sysctlbyname function to check for the availability of SSE3.
Listing 5-2 Code that checks for processor-specific features
#include <sys/types.h>
#include <sys/sysctl.h>

int hasSSE3( void )
{
    int value = 0;
    size_t length = sizeof(value);
    int error = sysctlbyname( "hw.optional.sse3", &value, &length, NULL, 0 );

    // Report the feature only if the query succeeded
    if ( 0 == error )
        return value;
    return 0;
}
On either architecture, you can use the HW_VECTORUNIT selector and the sysctl function to determine
whether or not a vector unit is available. The associated value for the selector is an int data type that
can be 0 or 1.
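A minimal sketch of the HW_VECTORUNIT query might look like the following. The function name hasVectorUnit is hypothetical, and the fallback that reports 0 on systems that do not define the selector is an assumption added so that the sketch compiles elsewhere.

```c
#include <stddef.h>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Returns 1 if the system reports a vector unit, 0 otherwise. */
int hasVectorUnit(void)
{
#if defined(__APPLE__) && defined(HW_VECTORUNIT)
    int mib[2] = { CTL_HW, HW_VECTORUNIT };
    int value = 0;
    size_t length = sizeof(value);

    /* sysctl returns 0 on success and fills in value. */
    if (sysctl(mib, 2, &value, &length, NULL, 0) == 0)
        return value != 0;
#endif
    /* Assumption: treat a failed or unavailable query as "no vector unit". */
    return 0;
}
```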
You have two options that you can use to determine specifically which vector units are available. You
can check the values of the hw.optional selectors (see Table 5-6 (page 64)), or you can use the sysctl
function to check the machdep.cpu.feature_bits and machdep.cpu.extfeature_bits values,
which directly reflect the results of the cpuid instruction.
Listing 5-3 shows how to detect the availability of specific Intel vector unit types by checking the
hw.optional selectors.
Listing 5-3
Code that detects vector unit types
#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/sysctl.h>

// Read a sysctl value that is 1, 2, 4, or 8 bytes wide and widen it to 64 bits
static int sysctlbynameuint64( const char* name, uint64_t* value )
{
    int result;
    size_t size = sizeof( *value );

    result = sysctlbyname( name, value, &size, NULL, 0 );
    if ( result == 0 )
    {
        if ( size == sizeof( uint64_t ) )
            ;
        else if ( size == sizeof( uint32_t ) )
            *value = *( uint32_t* ) value;
        else if ( size == sizeof( uint16_t ) )
            *value = *( uint16_t* ) value;
        else if ( size == sizeof( uint8_t ) )
            *value = *( uint8_t* ) value;
        else
        {
#if DEBUG
            fprintf( stderr, "** ERROR: sysctlbyname() returned something "
                     "other than a known sized item when "
                     "called for %s, size=%lu\n",
                     name ? name : "NULL", (unsigned long) size );
#endif
        }
    }
    return result;
}
int hasNamedVectorUnit( const char* name )
{
    char buffer[1024];
    uint64_t val = 0;

    snprintf( buffer, sizeof(buffer), "hw.optional.%s", name );
    // Test for nonzero because future selectors might return values larger than 1
    return sysctlbynameuint64( buffer, &val ) == 0 ? val != 0 : 0;
}
enum
{
    kScalarOnly = 0,
    kAltiVec = 1,
    kMMXPresent = 2,
    kMMXandSSEPresent = 3,
    kMMXandSSEandSSE2Present = 4,
    kMMXandSSEandSSE2andSSE3Present = 5
    /* larger values reserved for future expansion */
};

int GetVectorTypeAvailable( void )
{
#if __i386__
    if ( hasNamedVectorUnit("sse3") )
        return kMMXandSSEandSSE2andSSE3Present;
    if ( hasNamedVectorUnit("sse2") )
        return kMMXandSSEandSSE2Present;
    if ( hasNamedVectorUnit("sse") )
        return kMMXandSSEPresent;
    if ( hasNamedVectorUnit("mmx") )
        return kMMXPresent;
#endif
#if __ppc__
    if ( hasNamedVectorUnit("altivec") )
        return kAltiVec;
#endif
    return kScalarOnly;
}
For more information, see the man pages for sysctl and sysctlbyname, and the sysctl.h header file
available in the Kernel framework (sys/sysctl.h).
See Also
The following resources are relevant for rewriting AltiVec instructions for the Intel architecture:
■
“Fast Matrix Multiplication” (page 79) shows how to write a fast matrix multiplication function
with a minimum of architecture-specific coding.
■
The Apple AltiVec website:
http://developer.apple.com/hardware/ve/
■
Intel software manuals, which describe the x86 vector extensions:
http://developer.intel.com/design/Pentium4/documentation.htm
■
Perf-Optimization-dev is a list for discussions on analyzing and optimizing performance in Mac
OS X. You can subscribe at:
http://lists.apple.com/mailman/listinfo/perfoptimization-devlists.apple.com
■
The SIMD website for developers who use SIMD microprocessor instructions:
http://www.simdtech.org/home
Appendix A
Rosetta
Rosetta is a translation process that runs a PowerPC binary on a Macintosh using an Intel
microprocessor; it allows applications to run as nonnative binaries. Many, but not all, applications
can run translated. Applications that run translated will never run as fast as they run as a native
binary because the translation process itself incurs a processing cost.
How compatible your application is with Rosetta depends on the type of application it is. Applications
that have a lot of user interaction and low computational needs, such as a word processor, are quite
compatible. Those that have a moderate amount of user interaction and some high computational needs,
or that use OpenGL, are in most cases also quite compatible. Those that have intense computing needs
aren’t compatible. This includes applications that need to repeatedly compute fast Fourier transforms
(FFTs), that compute complex models for 3-D modeling, or that compute ray tracing.
To the user, Rosetta is transparent. Unlike Classic, when the user launches an application, there
aren’t any visual cues to indicate that the application is translated. The user may perceive that the
application is slow to start up or that the performance is slower than it is on a Macintosh using a
PowerPC microprocessor. The user can discover whether an application has only a PowerPC binary by
looking at the Finder information for the application. (See “Determining Whether a Binary is
Universal” (page 18).)
The purpose of this appendix is to discuss the sorts of applications that can run translated, describe
how Rosetta works, point out special considerations for translated applications, and provide
troubleshooting information if your application won’t run translated but you think that it should.
What Can Be Translated?
Rosetta is designed to translate currently shipping applications that run on a PowerPC with a G3
processor and that are built for Mac OS X.
Rosetta does not run the following:
■
Applications built for Mac OS 8 or 9
■
Code written specifically for AltiVec
■
Code that inserts preferences in the System Preferences pane
■
Applications that require a G4 or G5 processor
■
Applications that depend on one or more kernel extensions
■
Kernel extensions
■
Bundled Java applications or Java applications with JNI libraries that can’t be translated
How It Works
When an application launches on a Macintosh using an Intel microprocessor, the kernel detects
whether the application has a native binary. If the binary is not native, the kernel launches the binary
using Rosetta. If the application is one of those that can be translated, it launches and runs, although
not as fast as it would if run as a native binary. Behind the scenes, Rosetta translates and executes the
PowerPC binary code.
Rosetta runs in the same thread of control as the application. When Rosetta starts an application, it
translates a block of application code and executes that block. As Rosetta encounters a call to a routine
that it has not yet translated, it translates the needed routine and continues the execution. The result
is a smooth and continual transitioning between translation and execution. In essence, Rosetta and
your application work together in a kind of symbiotic relationship.
Rosetta optimizes translated code to deliver the best possible performance on the nonnative
architecture. It uses a large translation buffer and it caches code for reuse. Code that gets reused
repeatedly in your application benefits the most because it needs to be translated only once. The
system uses the cached translation, which is faster than translating the code again.
Special Considerations
Rosetta must run the entire process when it translates. This has implications for applications that use
third-party plug-ins or any other component that must be loaded at the time your application launches.
All parts (application, plug-ins, or other components needed at launch time) must run either
nonnatively or natively. For example, if your application has both an x86 binary and a PowerPC
binary, but it uses a plug-in that has only a PowerPC binary, then your application needs to run
nonnatively on a Macintosh using an Intel microprocessor in order to use the nonnative plug-in.
Rosetta takes endian issues into account when it translates your application. Multibyte data that
moves between your application and any system process is automatically handled for you—you don’t
need to concern yourself with the endian format of the data.
The following kinds of multibyte data can have endian issues if the data moves between:
■
Your translated application and a native process that's not a system process.
■
A custom pasteboard provided by your translated application and a custom pasteboard provided
by a native application.
■
Data files or caches provided by your translated application and a native application.
You might encounter this scenario while developing a universal binary. For example, if you’ve created
a universal binary for a server process that your application relies on, and then test that process by
running your application as a PowerPC binary, the endian format of the data passed from the server
to your application would be wrong. You encounter the same problem if you create a universal binary
for your application, but have not yet done so for a server process needed by the application.
Structures that the system defines and that are written using system routines will work correctly. But
consider the code in Listing A-1.
Listing A-1 A structure whose endian format depends on the architecture
#include <unistd.h>

typedef struct
{
    int x;
    int y;
} data_t;

void savefile( data_t data, int filehandle )
{
    // Writes the structure in the byte order of the running architecture
    write( filehandle, &data, sizeof(data) );
}
When run under Rosetta, the application will write a big-endian structure; x and y are both written
as big-endian integers. When the application runs natively on a Macintosh using an Intel
microprocessor, it will write out a little-endian structure; x and y are written as little-endian integers.
It is up to you to define data formats on disk to be of a canonical endian format. Endian-specific data
formats are fine as long as any application that reads or writes the data understands what the endian
format of the data is and treats the data appropriately.
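One way to give the structure a canonical on-disk format is to serialize each field byte by byte. The following is a minimal sketch that assumes a big-endian file format and 32-bit fields; the helper names (putBE32, getBE32, encodeData, decodeData) are hypothetical and are not part of any system API.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct
{
    int32_t x;
    int32_t y;
} data_t;

/* Store a 32-bit value as four bytes, most significant byte first. */
static void putBE32(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t) (v >> 24);
    out[1] = (uint8_t) (v >> 16);
    out[2] = (uint8_t) (v >> 8);
    out[3] = (uint8_t) v;
}

/* Read four bytes, most significant byte first, into a 32-bit value. */
static uint32_t getBE32(const uint8_t *in)
{
    return ((uint32_t) in[0] << 24) | ((uint32_t) in[1] << 16) |
           ((uint32_t) in[2] << 8)  |  (uint32_t) in[3];
}

/* Encode data into an 8-byte big-endian record. */
void encodeData(const data_t *data, uint8_t out[8])
{
    putBE32(out, (uint32_t) data->x);
    putBE32(out + 4, (uint32_t) data->y);
}

/* Decode an 8-byte big-endian record into host byte order. */
void decodeData(const uint8_t in[8], data_t *data)
{
    data->x = (int32_t) getBE32(in);
    data->y = (int32_t) getBE32(in + 4);
}
```

Because the shifts operate on values rather than on memory layout, the same code produces identical bytes whether it runs natively or translated. The encoded buffer can then be passed to write in place of the raw structure.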
Keep in mind that private frameworks and plug-ins can also encounter these sorts of endian issues.
If a private framework creates a cache or data file, and the framework is a universal binary, then it
will try to access the cache from both native and PPC processes. The framework either needs to account
for the endian format of the cache when reading or writing data or have two separate caches.
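As one possible way to account for the endian format of a shared cache, a framework could write a known tag at the start of the cache and have every reader check it. The sketch below is illustrative only: the header layout, the CACHE_MAGIC value, and the function names are all assumptions.

```c
#include <stdint.h>

/* Hypothetical tag written in the byte order of the process that
   created the cache. */
#define CACHE_MAGIC 0x43414348u

typedef struct
{
    uint32_t magic;
    uint32_t entryCount;
} cache_header_t;

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Converts the header to host byte order if it was written by a
   process of the opposite endianness. Returns 1 if the header is
   usable and 0 if the tag is not recognized. */
int normalizeCacheHeader(cache_header_t *header)
{
    if (header->magic == CACHE_MAGIC)
        return 1;                      /* written in host byte order */
    if (header->magic == swap32(CACHE_MAGIC)) {
        header->magic = CACHE_MAGIC;   /* written in the other order */
        header->entryCount = swap32(header->entryCount);
        return 1;
    }
    return 0;
}
```

Every multibyte field that follows the header would need the same treatment. Keeping two separate caches, as the text notes, avoids the swapping entirely.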
Forcing an Application to Run Translated
Applications that have only a PowerPC binary automatically run as translated on a Macintosh using
an Intel microprocessor, provided that the application meets the criteria described in “What
Can Be Translated?” (page 67). Applications that have a universal binary can be forced to launch as
a PowerPC binary on a Macintosh using an Intel microprocessor by selecting the “Open using Rosetta”
option shown in Figure A-1. To set the option, click the application icon, then press Cmd-I to open
the Info pane for the application.
You can force a command line tool to run translated by entering the following in Terminal:
ditto -arch ppc <toolname> /tmp/<toolname>
/tmp/<toolname>
You can set the default setting for the “Open using Rosetta” option by adding the following key to
the Info.plist of the application bundle.
<key>LSPrefersPPC</key>
<true/>
This key informs the system that the application should launch as a PowerPC binary and causes the
“Open using Rosetta” checkbox to be selected. You might find this useful if you ship an application
that has plug-ins that are not native at the time of shipping.
Figure A-1
The Info pane for the Calculator application
Troubleshooting
If you are convinced that your application falls into the category of those that should be able to run
through Rosetta, but it doesn’t run or it has unexpected behavior, you can follow the procedure in
this section to debug your application. This procedure works only for PowerPC binaries—not for a
universal binary—and is the only way you can debug a PowerPC binary on a Macintosh using an
Intel microprocessor. Xcode debugging does not work for translated applications.
To debug a PowerPC binary on a Macintosh using an Intel microprocessor, follow these steps:
1.
Open Terminal.
2.
Enter the following two lines:
For tcsh:
setenv OAH750_GDB YES
/<path>/<your_application>.app/Contents/MacOS/<your_application>
For bash:
export OAH750_GDB=YES
/<path>/<your_application>.app/Contents/MacOS/<your_application>
You’ll see the Rosetta process launch and wait for a port connection.
Figure A-2
Rosetta listens for a port connection
3.
Launch your application.
4.
Open a second Terminal window and start up GDB with the following command:
gdb --oah750
Using GDB on a Macintosh using an Intel microprocessor is just like using GDB on a Macintosh
using a PowerPC microprocessor.
5.
Attach your application.
attach <your_application>
6.
Press Tab.
GDB automatically appends the process id (pid) to your application name.
7.
Press Return.
8.
Type c (continue) to execute your application.
Important: Do not type run. Typing run will not execute your code. It will leave your application
in a state that requires you to start over from the first step.
Figure A-3 shows the commands for debugging a PowerPC binary.
You can debug in much the same way as you would debug a native process except that you can’t call
functions—either explicitly or implicitly—from within GDB. For example, you can’t inspect CF objects
by calling CFShow. You also can’t set conditional breakpoints.
Keep in mind that symbol files aren’t loaded at the start of the debugging session. They are loaded
after your application is up and running. This means that any breakpoints you set are “pending
breakpoints” until the executable and libraries are actually loaded.
Figure A-3
Terminal windows with the commands for debugging a PowerPC binary on a Macintosh using an
Intel microprocessor
Appendix B
x86 Equivalent Instructions for AltiVec Instructions
This section is for developers who need to move code from the AltiVec instruction set architecture
to the x86 vector instruction set architecture. It provides Intel intrinsic routines that are equivalent to
their AltiVec counterparts.
Intrinsics
The equivalent routines assume that the vector element order is not reversed in the register as a result
of a load operation—element 0 is on the left. This means that some of the operations that use
permutations on x86 (such as vec_unpack) are a bit different from what appears in the x86 manual.
Table B-1 lists AltiVec intrinsics and the nearest equivalent _mm_ intrinsics that are used by the GCC
4.0 compiler. When an equivalent routine is listed in the table, it should function properly when you
use it in your revised code. If you don’t find an equivalent for an AltiVec instruction, you need to
look elsewhere. The table isn’t complete, so it is possible that an equivalent exists but it simply isn’t
listed. See the Velocity Engine website for updates.
Table B-1 AltiVec intrinsics and x86 equivalent instructions
C Intrinsic for AltiVec   x86 Equivalent
vec_abs( A )              _mm_min_epu8( A, _mm_sub_epi8( 0, A ) )
                          _mm_max_epi16( A, _mm_sub_epi16( 0, A ) )
                          _mm_and_ps( A, 0x7FFFFFFF )
vec_abss( A )             -
vec_add( A, B )           _mm_add_epi8( A, B )
                          _mm_add_epi16( A, B )
                          _mm_add_epi32( A, B )
                          _mm_add_ps( A, B )
vec_addc( A, B )          See PADDQ.
vec_adds( A, B )          _mm_adds_epi8( A, B )
                          _mm_adds_epi16( A, B )
vec_adds( A, B )          _mm_adds_epu8( A, B )
                          _mm_adds_epu16( A, B )
vec_and( A, B )           _mm_and_si128( A, B )
                          _mm_and_ps( A, B )
vec_andc( A, B )          _mm_andnot_si128( B, A )
                          _mm_andnot_ps( B, A )
vec_avg( A, B )           -
vec_avg( A, B )           _mm_avg_epu8( A, B )
                          _mm_avg_epu16( A, B )
vec_ceil( A, B )          No easy translation. For an algorithm, see
                          http://developer.apple.com/hardware/ve/algorithms.html#fast_floor
vec_cmpb( A, B )          Nothing direct. See _mm_movemask_ps.
vec_cmpeq( A, B )         _mm_cmpeq_epi8( A, B )
                          _mm_cmpeq_epi16( A, B )
                          _mm_cmpeq_epi32( A, B )
                          _mm_cmpeq_ps( A, B )
vec_cmpge( A, B )         _mm_cmpge_ps( A, B )
vec_cmpgt( A, B )         _mm_cmpgt_epi8( A, B )
                          _mm_cmpgt_epi16( A, B )
                          _mm_cmpgt_epi32( A, B )
                          _mm_cmpgt_ps( A, B )
vec_cmpgt( A, B )         !_mm_cmpeq_epi8( A, _mm_min_epu8( A, B ) )
                          !_mm_cmpeq_epi16( A, _mm_min_epu16( A, B ) )
vec_cmple( A, B )         !_mm_cmpgt_epi8( A, B )
                          !_mm_cmpgt_epi16( A, B )
                          !_mm_cmpgt_epi32( A, B )
                          _mm_cmple_ps( A, B )
vec_cmplt( A, B )         _mm_cmpgt_epi8( B, A )
                          _mm_cmpgt_epi16( B, A )
                          _mm_cmpgt_epi32( B, A )
                          _mm_cmplt_ps( A, B )
vec_cmplt( A, B )         !_mm_cmpeq_epi8( A, _mm_max_epu8( A, B ) )
                          !_mm_cmpeq_epi16( A, _mm_max_epu16( A, B ) )
vec_ctf( A )              _mm_cvtepi32_ps( A )
vec_ctf( A )              -
vec_cts( A, B )           _mm_cvttps_epi32( A )
                          _mm_cvtps_epi32( A )
vec_ctu( A, B )           See vec_cts.
vec_dss( A )              -
vec_dss_all( )            -
vec_dst( A, B, C )        -
vec_dstst( A, B, C )      -
vec_dststt( A, B, C )     -
vec_dstt( A, B, C )       -
vec_expte( A )            -
vec_floor( A, B )         See “vec_floor Routine” (page 78).
vec_ld( A, B )            _mm_load_si128( B + A )
                          _mm_load_si128( (short*)((char*)B + A) )
                          _mm_load_si128( (int*)((char*)B + A) )
                          _mm_load_ps( (float*)((char*)B + A) )
vec_ld( A, B )            _mm_loadu_si128( B + A )
                          _mm_loadu_si128( (short*)((char*)B + A) )
                          _mm_loadu_si128( (int*)((char*)B + A) )
                          _mm_loadu_ps( (float*)((char*)B + A) )
vec_lde( A, B )           See “Aligning Data” (page 59).
                          _mm_load_ss( (float*)((char*)B + A) )
vec_ldl                   -
vec_loge( A )             -
vec_lvsl( A, B )          -
vec_lvsr( A, B )          -
vec_madd( A, B, C )       _mm_add_ps( _mm_mul_ps( A, B ), C )
vec_madds( A, B, C )      -
vec_max( A, B )           _mm_max_epi16( A, B )
                          _mm_max_ps( A, B )
vec_max( A, B )           _mm_max_epu8( A, B )
vec_mergeh( A, B )        _mm_unpacklo_epi8( A, B )
                          _mm_unpacklo_epi16( A, B )
                          _mm_unpacklo_epi32( A, B )
                          _mm_unpacklo_ps( A, B )
vec_mergel( A, B )        _mm_unpackhi_epi8( A, B )
                          _mm_unpackhi_epi16( A, B )
                          _mm_unpackhi_epi32( A, B )
                          _mm_unpackhi_ps( A, B )
vec_mfvscr                -
vec_min( A, B )           _mm_min_epi16( A, B )
                          _mm_min_ps( A, B )
vec_min( A, B )           _mm_min_epu8( A, B )
vec_mladd( A, B, C )      _mm_add_epi16( _mm_mullo_epi16( A, B ), C )
vec_mradds( A, B, C )     -
vec_msum( A, B, C )       _mm_add_epi32( _mm_madd_epi16( A, B ), C )
vec_msum( A, B, C )       -
vec_msums( A, B, C )      _mm_adds_epi32( _mm_madd_epi16( A, B ), C )
vec_msums( A, B, C )      -
vec_mtvscr( A )           -
vec_mule( A, B )          See PMULHUW, PMULHW, PMULLW, and PUNPCKLWD, PUNPCKHWD.
vec_mulo( A, B )          See PMULHUW, PMULHW, PMULLW, and PUNPCKLWD, PUNPCKHWD.
vec_nmsub( A, B, C )      _mm_sub_ps( C, _mm_mul_ps( A, B ) )
vec_nor( A, B )           _mm_xor_si128( _mm_or_si128( A, B ), -1 )
                          _mm_xor_ps( _mm_or_ps( A, B ), -1L )
vec_or( A, B )            _mm_or_si128( A, B )
                          _mm_or_ps( A, B )
vec_pack( A, B )          -
vec_packpx( A, B )        -
vec_packs( A, B )         _mm_packs_epi16( A, B )
                          _mm_packs_epi32( A, B )
vec_packsu( A, B )        _mm_packus_epi16( A, B )
vec_perm( A, B, C )       See PSHUFHW and PSHUFLW. See PSHUFD. See SHUFPS.
vec_re( A )               _mm_rcp_ps( A )
vec_rl( A, B )            -
vec_round( A, B )         -
vec_rsqrte( A )           _mm_rsqrt_ps( A )
vec_sel( A, B, MASK )     _mm_or_si128( _mm_and_si128( B, MASK ),
                                        _mm_andnot_si128( MASK, A ) )
                          _mm_or_ps( _mm_and_ps( B, MASK ),
                                     _mm_andnot_ps( MASK, A ) )
vec_sl( A, B )            _mm_slli_epi16( A, B ) or _mm_sll_epi16( A, B )
                          _mm_slli_epi32( A, B ) or _mm_sll_epi32( A, B )
vec_sld( A, A, i*4 )      _mm_shuffle_ps( A, A,
                              _MM_SHUFFLE( ((3-i)&3), ((2-i)&3), ((1-i)&3), ((-i)&3) ) )
vec_sld( A, B, i )        _mm_or_si128( _mm_slli_si128( A, i ),
                                        _mm_srli_si128( B, 16-i ) )
                          _mm_or_ps( _mm_slli_si128( A, i ),
                                     _mm_srli_si128( B, 16-i ) )
vec_sll( A, B )           -
vec_slo( A, i )           _mm_slli_si128( A, i )
vec_splat( A, i )         See PSHUFHW and PSHUFLW.
                          _mm_shuffle_ps( A, A, _MM_SHUFFLE( i, i, i, i ) )
vec_splat_s8( A )         -
vec_splat_s16( A )        -
vec_splat_s32( A )        -
vec_splat_u8( A )         -
vec_splat_u16( A )        -
vec_splat_u32( A )        -
vec_sr( A, i )            _mm_srli_epi16( A, i )
                          _mm_srli_epi32( A, i )
vec_sr( A, B )            _mm_srl_epi16( A, B )
                          _mm_srl_epi32( A, B )
vec_sra( A, i )           _mm_srai_epi16( A, i )
                          _mm_srai_epi32( A, i )
vec_sra( A, B )           _mm_sra_epi16( A, B )
                          _mm_sra_epi32( A, B )
vec_srl( A, B )           -
vec_sro( A, i )           _mm_srli_si128( A, i )
vec_sro( A, B )           _mm_srl_si128( A, B )
vec_st( A, i, p )         _mm_store_si128( p + i, A )
                          _mm_store_si128( (short*)((char*)p + i), A )
                          _mm_store_si128( (int*)((char*)p + i), A )
                          _mm_store_ps( (float*)((char*)p + i), A )
vec_st( A, i, p )         _mm_storeu_si128( p + i, A )
                          _mm_storeu_si128( (short*)((char*)p + i), A )
                          _mm_storeu_si128( (int*)((char*)p + i), A )
                          _mm_storeu_ps( (float*)((char*)p + i), A )
vec_ste( A, i, p )        See MASKMOVDQU. (Maybe also MOVSS.)
                          _mm_store_ss( (float*)((char*)p + i), A )
vec_stl( A, B, C )        -
vec_sub( A, B )           _mm_sub_ps( A, B )
vec_subc( A, B )          See PSUBQ.
vec_subs( A, B )          _mm_subs_epi8( A, B )
                          _mm_subs_epi16( A, B )
vec_subs( A, B )          _mm_subs_epu8( A, B )
                          _mm_subs_epu16( A, B )
vec_sums( A, B )          -
vec_sum2s( A, B )         -
vec_sum4s( A, B )         _mm_adds_epi32( _mm_madd_epi16( A, 1 ), B )
vec_trunc( A )            _mm_cvttps_epi32( A )
                          This performs truncation on the way to an int. (SSE2)
vec_unpackh( A )          _mm_srai_epi16( _mm_unpacklo_epi8( A, 0 ), 8 )
vec_unpackl( A )          _mm_srai_epi16( _mm_unpackhi_epi8( A, 0 ), 8 )
vec_xor( A, B )           _mm_xor_si128( A, B )
                          _mm_xor_ps( A, B )
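To illustrate how a table entry maps to working code, the following sketch implements a vec_sel-style select with SSE2 intrinsics. The function name select128 is hypothetical. Note that _mm_andnot_si128 complements its first operand, so the mask is passed first.

```c
#include <emmintrin.h>

/* For each bit, take the bit from b where mask is 1 and from a
   where mask is 0. */
__m128i select128(__m128i a, __m128i b, __m128i mask)
{
    return _mm_or_si128(_mm_and_si128(b, mask),
                        _mm_andnot_si128(mask, a));  /* (~mask) & a */
}
```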
vec_floor Routine
Listing B-1 lists an equivalent routine for vec_floor. Note that the routine is slightly incorrect because
the sign of –0.0f is not preserved. Instead, the routine returns 0.0f. All other results are correct. This
routine is much faster than floor.
Listing B-1 An equivalent routine for vec_floor
#include <emmintrin.h>

static inline __m128 vec_floor_ps( __m128 v ) __attribute__ ((always_inline));

static inline __m128 vec_floor_ps( __m128 v )
{
    static const __m128 twoTo23 = { 0x1.0p23f, 0x1.0p23f, 0x1.0p23f, 0x1.0p23f };

    // b = fabs(v)
    __m128 b = (__m128) _mm_srli_epi32( _mm_slli_epi32( (__m128i) v, 1 ), 1 );
    // The essence of the floor routine
    __m128 d = _mm_sub_ps( _mm_add_ps( _mm_add_ps( _mm_sub_ps( v, twoTo23 ),
                                                   twoTo23 ),
                                       twoTo23 ),
                           twoTo23 );
    // -1 if v >= 2**23
    __m128 largeMaskE = _mm_cmpgt_ps( b, twoTo23 );
    // Check for possible off by one error
    __m128 g = _mm_cmplt_ps( v, d );
    // Convert positive check result to -1.0, negative to 0.0
    __m128 h = _mm_cvtepi32_ps( (__m128i) g );
    // Add in the error if there is one
    __m128 t = _mm_add_ps( d, h );
    // Select between output result and input value based on v >= 2**23
    v = _mm_and_ps( v, largeMaskE );
    t = _mm_andnot_ps( largeMaskE, t );
    return _mm_or_ps( t, v );
}
Appendix C
Fast Matrix Multiplication
This section shows how to write a fast matrix multiplication function that works for both the PowerPC
and x86 architectures, with a minimum of platform-specific coding. Matrix multiplication is ideal for
this example because the following basic operations are available on both platforms:
■
vector loads and stores
■
multiplication
■
addition
■
an instruction to splat a float across a vector
For other types of calculations, you may need to write separate versions of code. Because of the
differences in the number of registers and the pipeline depths between the two architectures, it is
often advantageous to provide separate versions.
Platform-Specific Code
Listing C-1 (page 79) shows the platform-specific code you need to support matrix multiplication.
The code calls the architecture-independent function MyMatrixMultiply, which is shown in Listing
C-2 (page 83). The code shown in Listing C-1 works properly for both instruction set architectures
only if you build the code as a universal binary. For more information, see “Building a Universal
Binary” (page 13).
Note: The sample code makes use of a GCC extension to return a result from a code block ({}). The
code may not compile correctly on other compilers. The extension is necessary because you cannot
pass immediate values to an inline function, meaning that you must use a macro.
Listing C-1
Platform-specific code needed to support matrix multiplication
#include <stdlib.h>
#include <stdio.h>
#include <math.h>

// For each vector architecture...
#if defined( __VEC__ )
// AltiVec
// Set up a vector type for a float[4] array for each vector type
typedef vector float vFloat;

// Define some macros to map a virtual SIMD language to
// each actual SIMD language. For matrix multiplication, the tasks
// you need to perform are essentially the same between the two
// instruction set architectures (ISA).
#define vSplat( v, i )   ({ vFloat z = vec_splat( v, i ); /* return */ z; })
#define vMADD            vec_madd
#define vLoad( ptr )     vec_ld( 0, ptr )
#define vStore( v, ptr ) vec_st( v, 0, ptr )
#define vZero()          (vector float) vec_splat_u32(0)
#elif defined( __SSE__ )
// SSE
// The header file xmmintrin.h defines C functions for using
// SSE and SSE2 according to the Intel C programming interface
#include <xmmintrin.h>
// Set up a vector type for a float[4] array for each vector type
typedef __m128 vFloat;
// Also define some macros to map a virtual SIMD language to
// each actual SIMD language.
// Note that because i MUST be an immediate, it is incorrect here
// to alias i to a stack-based copy and replicate that 4 times.
#define vSplat( v, i )({ __m128 a = v; a = _mm_shuffle_ps( a, a, \
_MM_SHUFFLE(i,i,i,i) ); /* return */ a; })
static inline __m128 vMADD( __m128 a, __m128 b, __m128 c )
{
return _mm_add_ps( c, _mm_mul_ps( a, b ) );
}
#define vLoad( ptr ) _mm_load_ps( (float*) (ptr) )
#define vStore( v, ptr ) _mm_store_ps( (float*) (ptr), v )
#define vZero() _mm_setzero_ps()
#else
// Scalar
#warning To compile vector code, you must specify -faltivec, -msse, or both -faltivec and -msse
#warning Compiling for scalar code.
// Some scalar equivalents to show what the above vector
// versions accomplish
// A vector, declared as a struct with 4 scalars
typedef struct
{
float a;
float b;
float c;
float d;
} vFloat;
// Splat element i across the whole vector and return it
#define vSplat( v, i ) ({ vFloat z; z.a = z.b = z.c = z.d = ((float*) &v)[i]; /* return */ z; })
// Perform a fused multiply-add operation on platforms that support it:
// result = X * Y + Z
static inline vFloat vMADD( vFloat X, vFloat Y, vFloat Z )
{
vFloat result;
result.a = X.a * Y.a + Z.a;
result.b = X.b * Y.b + Z.b;
result.c = X.c * Y.c + Z.c;
result.d = X.d * Y.d + Z.d;
return result;
}
// Return a vector that starts at the given address
#define vLoad( ptr ) ( (vFloat*) (ptr) )[0]
// Write a vector to the given address
#define vStore( v, ptr ) ( (vFloat*) (ptr) )[0] = v
// Return a vector full of zeros
#define vZero() ({ vFloat z; z.a = z.b = z.c = z.d = 0.0f; /* return */ z; })
#endif
// Prototype for a vector matrix multiply function
void MyMatrixMultiply( vFloat A[4], vFloat B[4], vFloat C[4] );
int main( void )
{
// The vFloat type (defined previously) is a vector or scalar array
// that contains 4 floats
// Thus each one of these is a 4x4 matrix, stored in the C storage order.
vFloat A[4];
vFloat B[4];
vFloat C1[4];
vFloat C2[4];
int i, j, k;
// Pointers to the elements in A, B, C1 and C2
float *a = (float*) &A;
float *b = (float*) &B;
float *c1 = (float*) &C1;
float *c2 = (float*) &C2;
// Initialize the data
for( i = 0; i < 16; i++ )
{
a[i] = (double) (rand() - RAND_MAX/2) / (double) (RAND_MAX );
b[i] = (double) (rand() - RAND_MAX/2) / (double) (RAND_MAX );
c1[i] = c2[i] = 0.0;
}
// Perform the brute-force version of matrix multiplication
// and use this later to check for correctness
printf( "Doing simple matrix multiply...\n" );
for( i = 0; i < 4; i++ )
for( j = 0; j < 4; j++ )
{
float result = 0.0f;
for( k = 0; k < 4; k++ )
result += a[ i * 4 + k] * b[ k * 4 + j ];
c1[ i * 4 + j ] = result;
}
// The vector version
printf( "Doing vector matrix multiply...\n" );
MyMatrixMultiply( A, B, C2 );
// Make sure that the results are correct
// Allow for some rounding error here
printf( "Verifying results..." );
for( i = 0 ; i < 16; i++ )
if( fabs( c1[i] - c2[i] ) > 1e-6 )
printf( "failed at %i,%i: %8.17g %8.17g\n", i/4,
i&3, c1[i], c2[i] );
printf( "done.\n" );
return 0;
}
The 4x4 matrix multiplication algorithm shown in Listing C-2 (page 83) is a simple matrix
multiplication algorithm performed with 4 columns in parallel. The basic calculation is as follows:
C[i][j] = sum( A[i][k] * B[k][j], k = 0... width of A )
It can be rewritten in mathematical vector notation for rows of C as the following:
C[i][] = sum( A[i][k] * B[k][], k = 0... width of A )
Where:
C[i][] is the ith row of C
A[i][k] is the element of A at row i and column k
B[k][] is the kth row of B
An example calculation for C[0][] is as follows:
C[0][] = A[0][0] * B[0][] + A[0][1] * B[1][] + A[0][2] * B[2][] + A[0][3] * B[3][]
This calculation is simply a multiplication of a scalar times a vector, followed by addition of similar
elements between two vectors, repeated four times, to get a vector that contains four sums of products.
Performing the calculation in this way saves you from transposing B to obtain the B columns, and
also saves you from adding across vectors, which is inefficient. All operations occur between similar
elements of two different vectors.
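The row-oriented formulation can also be written out in plain scalar C. The following is a minimal sketch, assuming 4x4 matrices stored as 16 floats in C storage order; the function name multiply_by_rows is hypothetical and is not part of either listing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar sketch of the row-oriented formulation:
 *   C[i][] = sum over k of A[i][k] * B[k][]
 * Each matrix is 4x4, stored as 16 floats in C (row-major) order. */
static void multiply_by_rows( const float A[16], const float B[16], float C[16] )
{
    assert( A != NULL && B != NULL && C != NULL );
    for ( int i = 0; i < 4; i++ )
    {
        float row[4] = { 0.0f, 0.0f, 0.0f, 0.0f };  /* accumulates C[i][] */
        for ( int k = 0; k < 4; k++ )
        {
            float scalar = A[ i * 4 + k ];          /* the value that gets splatted */
            for ( int j = 0; j < 4; j++ )           /* one vector multiply-add */
                row[j] += scalar * B[ k * 4 + j ];
        }
        for ( int j = 0; j < 4; j++ )
            C[ i * 4 + j ] = row[j];
    }
}
```

The innermost loop over j corresponds to a single vector multiply-add: every element of row k of B is scaled by the same element of A, and no sum is ever taken across the elements of one vector.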
For more details, see Matrix Multiplication in the Algorithms and Special Topics section, available at
the following website:
http://developer.apple.com/hardware/ve/algorithms.html
Architecture-Independent Matrix Multiplication
Listing C-2 (page 83) shows architecture-independent vector code that performs matrix multiplication.
This code compiles as scalar if you do not set up the appropriate compiler flags for PowerPC
(-faltivec) or x86 (-msse), or if AltiVec is unavailable on the PowerPC. The matrices used in the
MyMatrixMultiply function assume the C storage order for 2D arrays, not the FORTRAN storage
order.
Listing C-2  Architecture-independent code that performs matrix multiplication
void MyMatrixMultiply( vFloat A[4], vFloat B[4], vFloat C[4] )
{
vFloat A1 = vLoad( A ); //Row 1 of A
vFloat A2 = vLoad( A + 1 ); //Row 2 of A
vFloat A3 = vLoad( A + 2 ); //Row 3 of A
vFloat A4 = vLoad( A + 3); //Row 4 of A
vFloat C1 = vZero(); //Row 1 of C, initialized to zero
vFloat C2 = vZero(); //Row 2 of C, initialized to zero
vFloat C3 = vZero(); //Row 3 of C, initialized to zero
vFloat C4 = vZero(); //Row 4 of C, initialized to zero
vFloat B1 = vLoad( B ); //Row 1 of B
vFloat B2 = vLoad( B + 1 ); //Row 2 of B
vFloat B3 = vLoad( B + 2 ); //Row 3 of B
vFloat B4 = vLoad( B + 3); //Row 4 of B
//Multiply the first row of B by the first column of A (do not sum across)
C1 = vMADD( vSplat( A1, 0 ), B1, C1 );
C2 = vMADD( vSplat( A2, 0 ), B1, C2 );
C3 = vMADD( vSplat( A3, 0 ), B1, C3 );
C4 = vMADD( vSplat( A4, 0 ), B1, C4 );
// Multiply the second row of B by the second column of A and
// add to the previous result (do not sum across)
C1 = vMADD( vSplat( A1, 1 ), B2, C1 );
C2 = vMADD( vSplat( A2, 1 ), B2, C2 );
C3 = vMADD( vSplat( A3, 1 ), B2, C3 );
C4 = vMADD( vSplat( A4, 1 ), B2, C4 );
// Multiply the third row of B by the third column of A and
// add to the previous result (do not sum across)
C1 = vMADD( vSplat( A1, 2 ), B3, C1 );
C2 = vMADD( vSplat( A2, 2 ), B3, C2 );
C3 = vMADD( vSplat( A3, 2 ), B3, C3 );
C4 = vMADD( vSplat( A4, 2 ), B3, C4 );
// Multiply the fourth row of B by the fourth column of A and
// add to the previous result (do not sum across)
C1 = vMADD( vSplat( A1, 3 ), B4, C1 );
C2 = vMADD( vSplat( A2, 3 ), B4, C2 );
C3 = vMADD( vSplat( A3, 3 ), B4, C3 );
C4 = vMADD( vSplat( A4, 3 ), B4, C4 );
// Write out the result to the destination
vStore( C1, C );
vStore( C2, C + 1 );
vStore( C3, C + 2 );
vStore( C4, C + 3 );
}
Note: It is not necessary for you to write your own matrix multiplication function, as fast versions of
matrix multiplication already exist. There is a function for 4x4 matrix multiplication in the Accelerate
framework (vecLib) that is tuned for both architectures. You can also call sgemm from Basic Linear
Algebra Subprograms (BLAS) (also available in the Accelerate framework) to operate on larger
matrices. The MyMatrixMultiply function shown in the listing is provided only to illustrate how to
write architecture-independent code.
Appendix D
Application Binary Interface
The IA-32 Application Binary Interface (ABI) on a Macintosh using an Intel microprocessor is the
same as the System V IA-32 ABI with the following changes:
■ Small structs are returned in registers.
■ The stack is kept 16-byte aligned.
■ Large types are kept at their natural alignment.
This appendix lists the differences between the Mac OS X and System V IA-32 ABIs. You should use
the information here in conjunction with the System V information that’s detailed in System V
Application Binary Interface: Intel386 Architecture Processor Supplement, Fourth Edition, available from:
http://www.caldera.com/developers/devspecs/abi386-4.pdf
Important: This is preliminary documentation for an application binary interface (ABI) in development.
This information is subject to change, and software implemented according to this document should
be tested with final operating system software and final documentation.
Data Types and Alignment
For IA-32, the built-in data types have the sizes and alignment listed in Table D-1.
Table D-1  Data types, sizes, and alignment for IA-32
Data Type      Size (bytes)   Alignment (bytes)
bool           1              1
char           1              1
short          2              2
int            4              4
long           4              4
long long      8              8*
float          4              4
double         8              4
long double    16*            16*
* differs from the System V IA-32 ABI
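To see what the alignment column implies for data layout, the following minimal sketch computes where a member would be placed after a single char. It assumes the conventional rule that a member is placed at the next offset that is a multiple of its alignment, which this appendix does not state explicitly; the helper names align_up and offset_after_char are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of alignment.
 * alignment must be a nonzero power of two. */
static size_t align_up( size_t offset, size_t alignment )
{
    assert( alignment != 0 && ( alignment & ( alignment - 1 ) ) == 0 );
    return ( offset + alignment - 1 ) & ~( alignment - 1 );
}

/* Hypothetical helper: offset of a member with the given alignment
 * when it immediately follows a single char (1 byte) in a struct. */
static size_t offset_after_char( size_t member_alignment )
{
    return align_up( 1, member_alignment );
}
```

With the alignments from Table D-1, a double (alignment 4) that follows a char would land at offset 4, a long long (alignment 8) at offset 8, and a long double (alignment 16) at offset 16.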
Stack Structure
Similar to PowerPC, the IA-32 stack grows downward. That is, the stack pointer decreases as items
are pushed on the stack. The stack pointer (ESP) is kept 16-byte aligned at call site boundaries. (This
differs from the System V IA-32 ABI which keeps the stack pointer 4-byte aligned.) Unlike PowerPC,
the stack pointer is not atomically moved. A 16-byte aligned ESP ensures that EBP is 16-byte aligned
(plus 8) which enables the compiler to locate local variables at addresses that match their natural
alignment.
Parameter Passing
When a function is called, parameters are pushed on the stack in right-to-left order. The rightmost
argument in a C function is pushed on the stack first, and thus has the highest address in the stack.
All parameters are pushed in 32-bit multiples, so 8-bit and 16-bit integral types are promoted to 32-bit
before being pushed, and structs are tail-padded to 32-bit multiples. Parameters on the stack are only
guaranteed to be 4-byte aligned, thus long long and long double may not be naturally aligned.
Figure D-1 shows an example stack frame layout.
Figure D-1  Stack frame layout
Position         Contents                    Frame
4n + 8 (%EBP)    argument word n             Previous (high addresses)
                 ....                        Previous
8 (%EBP)         argument word 0             Previous (16-byte aligned)
4 (%EBP)         return address              Current
0 (%EBP)         previous %EBP (optional)    Current
-4 (%EBP)        unspecified                 Current
                 ....                        Current
                 variable size               Current (low addresses)
Because of the 16-byte alignment constraint for the stack pointer, there are two ways that the compiler
places parameters onto the stack. (This differs from System V IA-32 ABI which simply pushes
parameters.) The first way is that the compiler does not actually issue PUSH instructions when passing
parameters. Instead the stack frame is moved down sufficiently with a SUB instruction, then arguments
passed to the function are stored into this space between the EBP and ESP with MOV instructions.
The second way is for the compiler to total up the sizes of all parameters for a call site and calculate
the pad needed to increase that to a multiple of 16. Then, subtract the pad amount from the stack
pointer (for example, SUB 4,ESP), followed by a PUSH of each parameter. The end result is that the
stack pointer is 16-byte aligned at the time of the CALL instruction.
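The arithmetic behind the second approach can be sketched as follows. This is a minimal illustration rather than the compiler's actual implementation; the function names stack_slot_size and call_site_pad are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes one parameter occupies on the stack: parameters are pushed in
 * 32-bit multiples, so the size is rounded up to a multiple of 4. */
static size_t stack_slot_size( size_t param_size )
{
    assert( param_size > 0 );
    return ( param_size + 3 ) & ~(size_t) 3;
}

/* Pad that brings the total parameter size for a call site up to a
 * multiple of 16. The pad is subtracted from the stack pointer before
 * the parameters are pushed. */
static size_t call_site_pad( size_t total_param_bytes )
{
    return ( 16 - ( total_param_bytes % 16 ) ) % 16;
}
```

For example, three int parameters occupy 12 bytes, so the pad is 4 bytes, which matches the SUB 4,ESP example in the text.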
Return Values
Scalars (integers and pointers) are returned in register EAX.
Floating point values (float, double, and long double) are returned on the top of the 387 stack.
Structures and unions are handled in one of two ways:
■ The general case is that a function stores the return value into space provided by the caller. The
address of this space is specified through a new, “hidden” leftmost parameter to the function.
The function also returns the address of the space in register EAX.
■ For structs that are 8 bytes in size or less, an optimization is made to not have a hidden parameter
and instead directly return these small structs in the EAX register and (if size is greater than 4 bytes)
in the EDX register. (This differs from the System V IA-32 ABI which always returns structs through
a hidden parameter.) For example, the C99 type _Complex float is 8 bytes and is returned in
EAX/EDX, while _Complex double is 16 bytes and is returned through a hidden parameter.
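The size rule for structures and unions can be summarized in a few lines of C. This is a minimal sketch of the rule as described above, not code taken from any compiler; the StructReturn type and the classify_struct_return function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of how a struct or union is returned. */
typedef enum
{
    RETURN_IN_EAX,              /* 4 bytes or less */
    RETURN_IN_EAX_EDX,          /* more than 4 bytes, up to 8 bytes */
    RETURN_VIA_HIDDEN_POINTER   /* more than 8 bytes */
} StructReturn;

/* Apply the size rule from this section to a struct of the given size. */
static StructReturn classify_struct_return( size_t size_in_bytes )
{
    assert( size_in_bytes > 0 );
    if ( size_in_bytes <= 4 )
        return RETURN_IN_EAX;
    if ( size_in_bytes <= 8 )
        return RETURN_IN_EAX_EDX;
    return RETURN_VIA_HIDDEN_POINTER;
}
```

Using the sizes given in the text, an 8-byte _Complex float falls in the EAX/EDX case and a 16-byte _Complex double falls in the hidden-parameter case.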
Appendix E
Flipping PowerPlant Resources
PowerPlant has specific resources—PPob resources—that you must flip (that is, byte swap) to ensure
that your code runs correctly after you create a universal build. You can use the code in Listing E-1
as a starting point for flipping PPob resources. You can write a flipper using the language of your
choice. The more complete the flipper, the more it is a good idea for you to leverage the static type
checking that the compiler provides. For that reason, and to provide a clean example, the code in
Listing E-1 uses C++. The following idiom refers to a function that swaps a 16-bit integer and advances
a pointer:
SInt16 SWAP_COUNT_S16(char*& p, Boolean currentlyNative);
The return value is the swapped value in a native endian format and the pointer is advanced past the
16-bit value. While this is effectively the same as passing a pointer to the pointer, the syntax of C++
makes it slightly cleaner. Note that endian format does not take the sign byte into consideration—an
unsigned value swaps identically to a signed value. However, it is useful to have signed and unsigned
variants of the swapping routines to promote type safety in your flippers. Listing E-1 includes an
InstallPowerplantFlippers routine that installs the flippers. You need to call this routine early in
your code execution, before your application initializes the PowerPlant framework. This example
does not ensure that the length of the buffer passed in is correct. It is the responsibility of the flipper
routine to ensure that it does not write beyond the (ptr + length) passed to it.
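For comparison, the pointer-to-pointer form of the swap-and-advance idiom might look like the following in plain C. This is a minimal sketch under stated assumptions: it uses the standard uint16_t type instead of the Carbon UInt16 type, a hand-written swap16 instead of Endian16_Swap, and the hypothetical name swap_count_u16. Like the example in Listing E-1, it does not check the buffer length.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reverse the two bytes of a 16-bit value (stand-in for Endian16_Swap). */
static uint16_t swap16( uint16_t value )
{
    return (uint16_t) ( ( value << 8 ) | ( value >> 8 ) );
}

/* Hypothetical C counterpart of SWAP_COUNT_U16: swaps the 16-bit value
 * at *p in place, advances *p past it, and returns the value in native
 * byte order. */
static uint16_t swap_count_u16( unsigned char **p, int currentlyNative )
{
    uint16_t stored, swapped;
    assert( p != NULL && *p != NULL );
    memcpy( &stored, *p, sizeof stored );   /* memcpy avoids unaligned access */
    swapped = swap16( stored );
    memcpy( *p, &swapped, sizeof swapped );
    *p += sizeof stored;
    /* The native value is the one read before the swap if the data
     * started out native, and the one produced by the swap otherwise. */
    return currentlyNative ? stored : swapped;
}
```

The caller passes the address of its cursor, for example swap_count_u16( &cursor, currentlyNative ), which is the extra indirection that the C++ reference parameter hides.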
Listing E-1  Code that flips PPob resources
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/*
* Utilities for flipping and advancing a pointer
*/
static inline void SWAP_16(char*& p)
{
UInt16* p16 = (UInt16*) p;
*p16 = Endian16_Swap(*p16);
p += sizeof(UInt16);
}
static inline void SWAP_S16(char*& p) { return SWAP_16(p); }
static inline void SWAP_U16(char*& p) { return SWAP_16(p); }
static inline void SWAP_32(char*& p)
{
UInt32* p32 = (UInt32*) p;
*p32 = Endian32_Swap(*p32);
p += sizeof(UInt32);
}
static inline void SWAP_S32(char*& p) { return SWAP_32(p); }
static inline void SWAP_U32(char*& p) { return SWAP_32(p); }
static inline UInt16 SWAP_COUNT_U16(char*& p, Boolean currentlyNative)
{
UInt16 count;
if (currentlyNative) {
UInt16* p16 = (UInt16*) p;
count = *p16;
*p16 = Endian16_Swap(*p16);
} else {
UInt16* p16 = (UInt16*) p;
*p16 = Endian16_Swap(*p16);
count = *p16;
}
p += sizeof(UInt16);
return count;
}
static inline SInt16 SWAP_COUNT_S16(char*& p, Boolean currentlyNative)
{ return (SInt16) SWAP_COUNT_U16(p, currentlyNative);}
static inline UInt32 SWAP_COUNT_U32(char*& p, Boolean currentlyNative)
{
UInt32 count;
if (currentlyNative) {
UInt32* p32 = (UInt32*) p;
count = *p32;
*p32 = Endian32_Swap(*p32);
} else {
UInt32* p32 = (UInt32*) p;
*p32 = Endian32_Swap(*p32);
count = *p32;
}
p += sizeof(UInt32);
return count;
}
static inline SInt32 SWAP_COUNT_S32(char*& p, Boolean currentlyNative)
{ return (SInt32) SWAP_COUNT_U32(p, currentlyNative);}
/*
* Note that the code doesn’t really swap byte or string values; these
* functions merely advance the pointer an appropriate amount.
*/
static inline void SWAP_BYTE(char*& p)
{
++p;
}
static inline void SWAP_PSTRING(char*& p)
{
// Advance p by the length of the string, plus the length byte
p += p[0] + 1;
}
/*
* Not used in this example, but if a flipper must align to an even
* byte boundary this function will do so. Note that the alignment is
* based on the *start* of the byte array. It is possible that the
* data pointer passed to the flipper will not be aligned on a 16-bit
* boundary.
*/
static inline void ALIGN_16(char*& p, char* base)
{
if ((UInt32(p - base) & 1) != 0)
++p;
}
/*
* Other utilities that swap specific well-known data types
*/
static inline void SWAP_POINT(char*& p)
{
SWAP_S16(p);
// .v
SWAP_S16(p);
// .h
}
static inline void SWAP_RECT(char*& p)
{
SWAP_S16(p);
// .top
SWAP_S16(p);
// .left
SWAP_S16(p);
// .bottom
SWAP_S16(p);
// .right
}
/*
* Prototypes for functions called by the PPob flipper
*/
static void FlipPaneData(char*& p);
static void FlipViewData(char*& p);
static void FlipCaptionData(char*& p);
static void FlipScrollerData(char*& p);
static void FlipControlData(char*& p);
static void FlipStandardControlData(char*& p);
static void FlipWindowData(char*& p);
static void FlipAttachmentData(char*& p);
static void FlipAppearanceControlData(char*& p);
static void FlipAppearanceViewData(char*& p);
static void FlipBevelButtonData(char*& p);
/*
* The main flipper
*/
OSStatus _FlipPPob(OSType dataDomain, OSType dataType,
short id, void* dataPtr, UInt32 dataSize,
Boolean currentlyNative, void* refcon)
{
char *p = (char *) dataPtr; // A pointer to the data (as a char* so
// it can advance)
SWAP_S16(p);
// Version (always 2)
// Loop until there are no more items (leaving one UInt32 at
// the end for the end marker.)
while ((UInt32) (p - (char *) dataPtr) < dataSize - sizeof(UInt32)) {
UInt32 ppobDataType = SWAP_COUNT_U32(p, currentlyNative);
switch (ppobDataType) {
case 'objd': {
// longint = (ObjectDataEnd[$$ArrayIndex(TagArray)] -
// ObjectDataStart[$$ArrayIndex(TagArray)]) / 8 - 4;
SWAP_S32(p);
UInt32 objectDataType = SWAP_COUNT_U32(p, currentlyNative);
switch (objectDataType) {
// Pane classes
case 'pane':
FlipPaneData(p);
break;
case 'view':
FlipViewData(p);
break;
case 'ascr':
FlipScrollerData(p);
break;
case 'butn':
FlipControlData(p);
SWAP_S32(p);
// Graphics resource
SWAP_S16(p);
// Normal graphic ID
SWAP_S16(p);
// Pushed graphic ID
break;
case 'capt':
FlipCaptionData(p);
break;
case 'cicn':
FlipControlData(p);
SWAP_S16(p);
// Normal 'cicn' ID
SWAP_S16(p);
// Pushed 'cicn' ID
break;
case 'cntl':
FlipControlData(p);
break;
case 'colv':
FlipViewData(p);
SWAP_S16(p);
// Column width
SWAP_S16(p);
// Row height
SWAP_BYTE(p);
// Single selector
SWAP_BYTE(p);
// Drag select
SWAP_S32(p);
// Data size
SWAP_S32(p);
// Double-click message
SWAP_S32(p);
// Selection message
break;
case 'dlog':
FlipWindowData(p);
SWAP_S32(p);
// Default button ID
SWAP_S32(p);
// Cancel button ID
break;
case 'edit':
FlipPaneData(p);
SWAP_PSTRING(p); // Initial text
SWAP_S16(p); // Text traits ID
SWAP_S16(p); // Maximum characters
SWAP_BYTE(p); // Box around field
// Wrap text to frame
// Autoscroll text
// Text buffering of keyboard input
// Outline highlighting when inactive
// Allow inline input
// Use Text Services
// Reserved bit 1
SWAP_BYTE(p); // Keystroke filter
break;
case 'gpvw':
FlipViewData(p);
break;
case 'gbox':
FlipCaptionData(p);
break;
case 'icnp':
FlipPaneData(p);
SWAP_S16(p);
// Icon resource ID
break;
case 'lbox': {
FlipPaneData(p);
SWAP_BYTE(p);
// Horizontal scroll
SWAP_BYTE(p);
// Vertical scroll
SWAP_BYTE(p);
// Grow box
SWAP_BYTE(p);
// Focus box
SWAP_S32(p);
// Double-click message
SWAP_S16(p);
// Text traits ID
SWAP_S16(p);
// LDEF ID
UInt16 count = SWAP_COUNT_U16(p, currentlyNative);
for(int i = 0; i < count; i++) {
SWAP_PSTRING(p);
// Text of list item
}
}
break;
case 'mpvw': {
FlipViewData(p);
UInt16 count = SWAP_COUNT_U16(p, currentlyNative);
for(int i = 0; i < count; i++) {
SWAP_S16(p);
// PPob ID
}
SWAP_S16(p);
// Initial panel
SWAP_S32(p);
// Switch message
SWAP_BYTE(p);
// Listen to superview
}
break;
case 'offv':
FlipViewData(p);
break;
case 'ovlv':
FlipViewData(p);
break;
case 'pict':
FlipViewData(p);
SWAP_S16(p);
// PICT resource ID
break;
case 'plac':
FlipViewData(p);
SWAP_S16(p);
// Alignment of occupant
break;
case 'prnt':
SWAP_POINT(p); // Width, height
SWAP_BYTE(p);
// Active
SWAP_BYTE(p);
// Enabled
SWAP_S32(p);
// Refcon
SWAP_S16(p);
// Page numbering order
// and bitstring[15]
SWAP_S16(p);
// Reserved bytes
break;
case 'rgpv':
FlipViewData(p);
break;
case 'scrl':
FlipScrollerData(p);
break;
case 'sclv':
FlipPaneData(p);
SWAP_S16(p);
// Filler
SWAP_S16(p);
// Thickness of scroll bars
SWAP_S32(p);
// Image height
SWAP_S32(p);
// Horizontal scroll position
SWAP_S32(p);
// Vertical scroll position
SWAP_S32(p);
// Horizontal scroll unit
SWAP_S32(p);
// Vertical scroll unit
SWAP_S16(p);
// Reconcile after resize
SWAP_S16(p);
// Horiz. scroll bar left indent
SWAP_S16(p);
// Horiz. scroll bar right indent
SWAP_S16(p);
// Vertical scroll bar top indent
SWAP_S16(p);
// Vert. scroll bar bottom indent
SWAP_S32(p);
// Scrolling view ID
SWAP_BYTE(p);
// Live scrolling
break;
case 'pbut':
FlipStandardControlData(p);
break;
case 'cbox':
FlipStandardControlData(p);
break;
case 'sctl':
FlipStandardControlData(p);
break;
case 'popm':
FlipPaneData(p);
SWAP_S32(p);
// Message
SWAP_S32(p);
// Title position and style
SWAP_S16(p);
// Filler
SWAP_S16(p);
// MENU resource ID
SWAP_S16(p);
// Filler
SWAP_S16(p);
// Pixel width of title
SWAP_S16(p);
// Variation
SWAP_S16(p);
// Text traits ID
SWAP_PSTRING(p);
// Title
SWAP_S32(p);
// ResType for menu items
SWAP_S16(p);
// Initial item
break;
case 'rbut':
FlipStandardControlData(p);
break;
case 'solv':
FlipViewData(p);
break;
case 'tbgv':
FlipViewData(p);
break;
case 'tabl':
FlipViewData(p);
SWAP_S32(p);
// Number of rows
SWAP_S32(p);
// Number of columns
SWAP_S32(p);
// Row height
SWAP_S32(p);
// Column width
SWAP_S32(p);
// Cell data size
break;
case 'htab':
case 'tabv':
FlipViewData(p);
break;
case 'txbt':
FlipControlData(p);
SWAP_PSTRING(p);
// Title
SWAP_S16(p);
// Text traits ID
SWAP_S16(p);
// Selected text style
break;
case 'txcl':
FlipViewData(p);
SWAP_S16(p);
// Column width
SWAP_S16(p);
// Row height
SWAP_BYTE(p);
// Single selector
SWAP_BYTE(p);
// Drag select
SWAP_S32(p);
// Data size
SWAP_S32(p);
// Double-click message
SWAP_S32(p);
// Selection message
SWAP_S16(p);
// Text traits ID
SWAP_S16(p);
// STR# resource ID
break;
case 'text':
FlipViewData(p);
SWAP_S16(p);
// Single- or multi-style
// Read-only
// Allow selection
// Wrap text to frame
// Reserved bits
SWAP_S16(p);
// Text traits ID
SWAP_S16(p);
// TEXT resource ID
break;
case 'txtv':
FlipViewData(p);
SWAP_S16(p); // Single- or multi-style
// Read-only
// Allow selection
// Wrap text to frame
// Autoscroll text
// Outline hilite when inactive
// Reserved bits
SWAP_S16(p); // Text traits ID
SWAP_S16(p); // TEXT resource ID
break;
case 'tbut':
FlipControlData(p);
SWAP_S32(p);
// Graphics resource type
SWAP_S16(p);
// On graphic ID
SWAP_S16(p);
// On click graphic ID
SWAP_S16(p);
// Off graphic ID
SWAP_S16(p);
// Off click graphic ID
SWAP_S16(p);
// Transition graphic ID
break;
case 'twin':
case 'wind':
FlipWindowData(p);
break;
// Support classes
case 'tabg':
break;
case 'radg': {
UInt16 count = SWAP_COUNT_U16(p, currentlyNative);
for(int i = 0; i < count; i++) {
SWAP_S32(p);
// Radio button pane ID
}
}
break;
// Attachments
case 'atch':
FlipAttachmentData(p);
break;
case 'beep':
FlipAttachmentData(p);
break;
case 'brda':
FlipAttachmentData(p);
SWAP_POINT(p); // Pen size
SWAP_S16(p); // Pen mode
SWAP_S16(p); // Pen pattern
SWAP_U16(p); // Foreground color: red
SWAP_U16(p); // Foreground color: green
SWAP_U16(p); // Foreground color: blue
SWAP_U16(p); // Background color: red
SWAP_U16(p); // Background color: green
SWAP_U16(p); // Background color: blue
break;
case 'cena':
FlipAttachmentData(p);
SWAP_S32(p); // Command to enable
break;
case 'cers':
FlipAttachmentData(p);
SWAP_U16(p); // Foreground color: red
SWAP_U16(p); // Foreground color: green
SWAP_U16(p); // Foreground color: blue
SWAP_U16(p); // Background color: red
SWAP_U16(p); // Background color: green
SWAP_U16(p); // Background color: blue
break;
case 'cmat': {
FlipAttachmentData(p);
SWAP_S16(p);
// Menu ID
SWAP_S16(p);
// Cursor ID
SWAP_S32(p);
// Help type
SWAP_PSTRING(p);
// Help item text
SInt16 count = SWAP_COUNT_S16(p, currentlyNative);
for(int i = 0; i < count; i++) {
SWAP_S32(p);
// Command number
}
SWAP_S32(p);
// Command target pane ID
}
break;
case 'eras':
FlipAttachmentData(p);
break;
case 'ksca':
FlipAttachmentData(p);
break;
case 'pnta':
FlipAttachmentData(p);
SWAP_POINT(p); // Pen size
SWAP_S16(p);
// Pen mode
SWAP_S16(p);
// Pen pattern
SWAP_U16(p);
// Foreground color: red
SWAP_U16(p);
// Foreground color: green
SWAP_U16(p);
// Foreground color: blue
SWAP_U16(p);
// Background color: red
SWAP_U16(p);
// Background color: green
SWAP_U16(p);
// Background color: blue
break;
case 'wtha':
FlipAttachmentData(p);
SWAP_S16(p);
// Active background brush
SWAP_S16(p);
// Inactive background brush
SWAP_S16(p);
// Active text color
SWAP_S16(p);
// Inactive text color
break;
// Appearance controls
case 'bbut':
FlipBevelButtonData(p);
break;
case 'carr':
FlipAppearanceControlData(p);
break;
case 'chbx':
FlipAppearanceControlData(p);
break;
case 'cbgb':
FlipAppearanceViewData(p);
break;
case 'clck':
FlipAppearanceControlData(p);
break;
case 'cbbt':
FlipBevelButtonData(p);
SWAP_S32(p);
// Command
break;
case 'dtri':
FlipAppearanceControlData(p);
break;
case 'etxt':
FlipAppearanceControlData(p);
SWAP_S16(p);
// Maximum characters
SWAP_BYTE(p);
// Box around field
// Wrap text to frame
// Autoscroll text
// Text buffering of
// keyboard input
// Outline highlighting when
// inactive
// Allow inline input
// Use Text Services
// Reserved bit
SWAP_BYTE(p);
// Keystroke filter
break;
case 'ictl':
FlipAppearanceControlData(p);
SWAP_S16(p);
// Icon alignment
break;
case 'iwel':
FlipAppearanceViewData(p);
break;
case 'larr':
FlipAppearanceControlData(p);
break;
case 'picd':
FlipAppearanceViewData(p);
break;
case 'plcd':
FlipAppearanceViewData(p);
break;
case 'popb':
FlipAppearanceControlData(p);
SWAP_S32(p);
// Resource type for menu items
SWAP_S16(p);
// Initial item choice
break;
case 'pgbx':
FlipAppearanceViewData(p);
SWAP_S16(p);
// Initial item choice
break;
case 'pbar':
FlipAppearanceControlData(p);
SWAP_BYTE(p);
// Determinate
break;
case 'push':
FlipAppearanceControlData(p);
SWAP_BYTE(p);
// Default button
break;
case 'rdbt':
FlipAppearanceControlData(p);
break;
case 'sbar':
FlipAppearanceControlData(p);
break;
case 'sepl':
FlipAppearanceControlData(p);
break;
case 'slid':
FlipControlData(p);
SWAP_S16(p);
// Proc ID (12 bits)
// Directional indicator
// Direction of indicator
// Tick marks
// Live feedback
SWAP_S16(p);
// Number of tick marks
break;
case 'stxt':
FlipAppearanceControlData(p);
break;
case 'tabs':
FlipAppearanceViewData(p);
SWAP_S16(p);
// Initial tab choice
break;
case 'tgbx':
FlipAppearanceViewData(p);
break;
case 'winh':
FlipAppearanceViewData(p);
break;
// Grayscale classes (not supported)
// Null object
case 'null':
break;
// User-defined objects.
// You *MUST* add your types here.
default: {
// Abort. We're hosed.
fprintf(stderr,
"Unknown PPob object data type encountered (0x%x)! Aborting.\n",
(unsigned int) objectDataType);
assert(0);
}
break;
}
}
break;
case 'begs':
// Beginning of sub-object list
break;
case 'ends':
// End of sub-object list
break;
case 'user':
// User object
SWAP_S32(p);
// Superclass ID for next object
break;
case 'dopl':
// Class alias
SWAP_S32(p);
// Class ID for next object
break;
case 'comm':
// Comment
// longint = (CommentEnd[$$ArrayIndex(TagArray)] -
// CommentStart[$$ArrayIndex(TagArray)]) / 8 - 4;
SWAP_S32(p);
// hex string[$$Long(CommentStart
// [$$ArrayIndex(TagArray)])];
SWAP_PSTRING(p);
break;
default:
break;
}
}
SWAP_S32(p);
// End of tags marker ('end.')
return noErr;
}
static void FlipPaneData(char*& p)
{
SWAP_S32(p);
// Pane ID
SWAP_POINT(p); // Width, height
SWAP_BYTE(p);
// Visible
SWAP_BYTE(p);
// Enabled
SWAP_BYTE(p);
// Left binding
SWAP_BYTE(p);
// Top binding
SWAP_BYTE(p);
// Right binding
SWAP_BYTE(p);
// Bottom binding
SWAP_S32(p);
// Left location
SWAP_S32(p);
// Top location
SWAP_S32(p);
// UserCon
SWAP_S32(p);
// Superview
}
static void FlipViewData(char*& p)
{
FlipPaneData(p);
SWAP_S32(p);
// Image width
SWAP_S32(p);
// Image height
SWAP_S32(p);
// Horizontal scroll position
SWAP_S32(p);
// Vertical scroll position
SWAP_S32(p);
// Horizontal scroll unit
SWAP_S32(p);
// Vertical scroll unit
SWAP_S16(p);
// Reconcile after resize
}
static void FlipCaptionData(char*& p) {
FlipPaneData(p);
SWAP_PSTRING(p);
// Caption text
SWAP_S16(p);
// Text traits ID
}
static void FlipScrollerData(char*& p)
{
FlipViewData(p);
SWAP_S16(p);
// Horizontal scroll bar left indent
SWAP_S16(p);
// Horizontal scroll bar right indent
SWAP_S16(p);
// Vertical scroll bar top indent
SWAP_S16(p);
// Vertical scroll bar bottom indent
SWAP_S32(p);
// Scrolling view ID
}
static void FlipControlData(char*& p)
{
FlipPaneData(p);
SWAP_S32(p);
// Message
SWAP_S32(p);
// Initial value
SWAP_S32(p);
// Minimum value
SWAP_S32(p);
// Maximum value
}
static void FlipStandardControlData(char*& p)
{
FlipControlData(p);
SWAP_S16(p);
// Control proc
SWAP_S16(p);
// Text traits ID
SWAP_PSTRING(p);
// Title
SWAP_S32(p);
// Refcon
}
static void FlipWindowData(char*& p)
{
SWAP_S16(p);
// WIND resource ID
SWAP_S16(p);
// Window layer
SWAP_S32(p);
// Placeholder for layer (bitstring[3])
// Close box
// Title bar
// Resizable
// Draw size box
// Zoomable
// Visible after creating
// Enabled
// Targetable
// Get select click
// Hide when suspended
// Delayed selection
// Erase when updating
// Reserved bytes (16 bits)
SWAP_S16(p);
// Minimum width
SWAP_S16(p);
// Minimum height
SWAP_S16(p);
// Maximum width
SWAP_S16(p);
// Maximum height
SWAP_S16(p);
// Standard width
SWAP_S16(p);
// Standard height
SWAP_S32(p);
// Refcon
}
static void FlipAttachmentData(char*& p)
{
SWAP_S32(p);
// Message type
SWAP_BYTE(p);
// Execute host
SWAP_BYTE(p);
// Host is owner
}
static void FlipAppearanceControlData(char*& p)
{
    FlipControlData(p);
    SWAP_S16(p);        // Control-specific data
    SWAP_S16(p);        // Text traits ID
    SWAP_PSTRING(p);    // Title
}
static void FlipAppearanceViewData(char*& p)
{
    FlipViewData(p);
    SWAP_S32(p);        // Message
    SWAP_S32(p);        // Initial value
    SWAP_S32(p);        // Minimum value
    SWAP_S32(p);        // Maximum value
    SWAP_S16(p);        // Control-specific data
    SWAP_S16(p);        // Text traits ID
    SWAP_PSTRING(p);    // Title
}
static void FlipBevelButtonData(char*& p)
{
    FlipAppearanceControlData(p);
    SWAP_S16(p);      // Initial value
    SWAP_S16(p);      // Title placement
    SWAP_S16(p);      // Title alignment
    SWAP_S16(p);      // Title offset
    SWAP_S16(p);      // Graphic alignment
    SWAP_POINT(p);    // Graphic offset
    SWAP_BYTE(p);     // Center popup arrow
}
OSStatus _FlipMcmd(OSType dataDomain, OSType dataType,
                   short id, void* dataPtr, UInt32 dataSize,
                   Boolean currentlyNative, void* refcon)
{
    char* p = (char*) dataPtr;
    SInt16 count = SWAP_COUNT_S16(p, currentlyNative);
    for (int i = 0; i < count; i++) {
        SWAP_S32(p);
    }
    return noErr;
}
OSStatus _FlipRidL(OSType dataDomain, OSType dataType,
                   short id, void* dataPtr, UInt32 dataSize,
                   Boolean currentlyNative, void* refcon)
{
    char* p = (char*) dataPtr;
    SInt16 count = SWAP_COUNT_S16(p, currentlyNative);
    for (int i = 0; i < count; i++) {
        SWAP_S32(p);
    }
    return noErr;
}
OSStatus _FlipRID_(OSType dataDomain, OSType dataType, short id,
                   void* dataPtr, UInt32 dataSize,
                   Boolean currentlyNative, void* refcon)
{
    char* p = (char*) dataPtr;
    SInt16 count = SWAP_COUNT_S16(p, currentlyNative);
    for (int i = 0; i < count; i++) {
        SWAP_S16(p);
    }
    return noErr;
}
OSStatus _FlipTxtr(OSType dataDomain, OSType dataType, short id,
                   void* dataPtr, UInt32 dataSize,
                   Boolean currentlyNative, void* refcon)
{
    char* p = (char*) dataPtr;
    SWAP_S16(p);        // Size
    SWAP_S16(p);        // Style
    SWAP_S16(p);        // Justification
    SWAP_S16(p);        // Transfer mode
    SWAP_U16(p);        // Color: red
    SWAP_U16(p);        // Color: green
    SWAP_U16(p);        // Color: blue
    SWAP_S16(p);        // Font number
    SWAP_PSTRING(p);    // Font name
    return noErr;
}
OSStatus InstallPowerplantFlippers()
{
    struct PerFlipper {
        OSType resType;
        CoreEndianFlipProc proc;
    };
    PerFlipper theFlippers[] = {
        { 'PPob', _FlipPPob },
        { 'Mcmd', _FlipMcmd },
        { 'RidL', _FlipRidL },
        { 'RID#', _FlipRID_ },
        { 'Txtr', _FlipTxtr },
    };
    OSStatus result = noErr;
    for (int i = 0; i < sizeof(theFlippers) / sizeof(theFlippers[0]); i++) {
        OSStatus err =
            CoreEndianInstallFlipper(kCoreEndianResourceManagerDomain,
                                     theFlippers[i].resType,
                                     theFlippers[i].proc, nil);
        if (err != noErr) {
            result = err;
            break;
        }
    }
    return result;
}
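The count-prefixed flippers in this listing (_FlipMcmd, _FlipRidL, and _FlipRID_) depend on reading the element count in native byte order no matter which direction the data is being flipped. The following is a minimal, self-contained sketch of that idea in portable C++. The helper names SwapS16, SwapS32, SwapCountS16, and FlipCountedS32List are hypothetical stand-ins written for this illustration; they are not the SWAP_* macros used in the listing, whose actual definitions may differ. The sketch also assumes that currentlyNative is true when the buffer holds native-endian data at the time of the call.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reverse a 2-byte field in place and advance the cursor.
static void SwapS16(char*& p)
{
    char t = p[0]; p[0] = p[1]; p[1] = t;
    p += 2;
}

// Hypothetical helper: reverse a 4-byte field in place and advance the cursor.
static void SwapS32(char*& p)
{
    char t = p[0]; p[0] = p[3]; p[3] = t;
    t = p[1]; p[1] = p[2]; p[2] = t;
    p += 4;
}

// Hypothetical helper: swap a 16-bit count in place and return its value in
// native byte order. If the data is native now, the count is read before the
// swap; otherwise it is read after the swap has made it native.
static int16_t SwapCountS16(char*& p, bool currentlyNative)
{
    int16_t count;
    if (currentlyNative) {
        std::memcpy(&count, p, sizeof count);
        SwapS16(p);
    } else {
        SwapS16(p);                                // cursor is now past the count
        std::memcpy(&count, p - 2, sizeof count);
    }
    return count;
}

// Flip a buffer laid out as a 16-bit count followed by that many 32-bit values.
static void FlipCountedS32List(void* data, bool currentlyNative)
{
    char* p = static_cast<char*>(data);
    int16_t count = SwapCountS16(p, currentlyNative);
    for (int i = 0; i < count; i++) {
        SwapS32(p);
    }
}
```

Under these assumptions, calling FlipCountedS32List on a native buffer with currentlyNative set to true, and then again with currentlyNative set to false, restores the original bytes. Reading the count on the wrong side of the swap would make the loop run for a byte-reversed number of elements.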
Document Revision History
This table describes the changes to Universal Binary Programming Guidelines.
Date          Notes
2005-06-06    First version.