Streaming SIMD Extensions 2 Instructions

Microsoft Specific
This section describes the C/C++ language-level features supporting the Streaming SIMD Extensions 2 (SSE2) instructions:
Floating-Point Intrinsics Using Streaming SIMD Extensions 2 Instructions, which describes the intrinsic operations for the double-precision floating-point data type (__m128d).
Integer Intrinsics Using Streaming SIMD Extensions 2, which describes the intrinsics for the 128-bit packed-integer data type (__m128i).
Other topics discussed in this section include:
Floating-Point Memory and Initialization Operations Using Streaming SIMD Extensions 2
Cache Support for Streaming SIMD Extensions 2 Floating-Point Operations
Integer Memory and Initialization Using Streaming SIMD Extensions 2
Cache Support for Streaming SIMD Extensions 2 Integer Operations
Macro Function for Shuffle Using Streaming SIMD Extensions 2
The emmintrin.h header file contains the declarations for the SSE2 intrinsics. The dvec.h header file contains operator overloads for some of the SSE2 intrinsics, which are available for use in C++ programs.
SSE2 intrinsics use the __m128, __m128i, and __m128d data types, which are not supported on Itanium Processor Family (IPF) processors. Any SSE2 intrinsics that use the __m64 data type are not supported on x64 processors.
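As a minimal sketch, assuming an x86 or x64 compiler that provides emmintrin.h (Visual C++, GCC, and Clang all do), the following shows one operation on each SSE2 data type. The wrapper names add_doubles and add_int32x4 are hypothetical; _mm_loadu_pd, _mm_add_pd, _mm_storeu_pd, _mm_loadu_si128, _mm_add_epi32, and _mm_storeu_si128 are standard SSE2 intrinsics. The unaligned load and store forms are used so the arrays need no special alignment.

```cpp
// Sketch only: assumes an x86/x64 target where <emmintrin.h> provides SSE2.
#include <emmintrin.h>
#include <cassert>
#include <cstdint>

// Adds two pairs of doubles with one packed double-precision add (ADDPD).
void add_doubles(const double* a, const double* b, double* out) {
    assert(a && b && out);
    __m128d va = _mm_loadu_pd(a);            // load a[0], a[1]; no alignment needed
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_add_pd(va, vb));  // out[i] = a[i] + b[i]
}

// Adds four pairs of 32-bit integers with one packed integer add (PADDD).
void add_int32x4(const std::int32_t* a, const std::int32_t* b, std::int32_t* out) {
    assert(a && b && out);
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_add_epi32(va, vb));
}
```

Note that _mm_add_epi32 wraps on overflow; SSE2 offers saturating adds (for example, _mm_adds_epi16) only for 8-bit and 16-bit lanes.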
END Microsoft Specific

Streaming SIMD Extensions (SSE)

Microsoft Specific
This section describes the C/C++ language-level features supporting SSE. The following topics explain the intrinsics:
Streaming SIMD Extensions Supported by 3DNow!
Floating-Point Intrinsics Using Streaming SIMD Extensions
Miscellaneous Intrinsics Using Streaming SIMD Extensions
Memory and Initialization Using Streaming SIMD Extensions
Integer Intrinsics Using Streaming SIMD Extensions
Cache Support Using Streaming SIMD Extensions
In addition, the following macro functions are described:
Macro Function for Shuffle Using Streaming SIMD Extensions
Macro Functions to Read and Write the Control Registers
Macro Function for Matrix Transposition
The header file xmmintrin.h contains the declarations for the SSE intrinsics. The file fvec.h contains operator overloads for some of the SSE intrinsics, which are available for use in C++ programs.
SSE intrinsics use the __m128 data type, which is not supported on Itanium Processor Family (IPF) processors. Any SSE intrinsics that use the __m64 data type are not supported on x64 processors.
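A minimal sketch, assuming an x86 or x64 target, of three of the facilities listed above, all declared in xmmintrin.h: packed single-precision arithmetic (_mm_mul_ps, _mm_add_ps), the _MM_SHUFFLE macro used with _mm_shuffle_ps, and the _MM_TRANSPOSE4_PS matrix-transposition macro. The wrapper names mul_add4, reverse4, and transpose4x4 are hypothetical.

```cpp
// Sketch only: assumes an x86/x64 target where <xmmintrin.h> provides SSE.
#include <xmmintrin.h>
#include <cassert>

// out[i] = a[i] * b[i] + c[i] for four floats, using MULPS then ADDPS.
void mul_add4(const float* a, const float* b, const float* c, float* out) {
    assert(a && b && c && out);
    __m128 prod = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    _mm_storeu_ps(out, _mm_add_ps(prod, _mm_loadu_ps(c)));
}

// Reverses four floats with SHUFPS. _MM_SHUFFLE(z, y, x, w) packs four 2-bit
// lane indices: w and x select result lanes 0 and 1 from the first operand,
// y and z select result lanes 2 and 3 from the second operand.
void reverse4(const float* in, float* out) {
    assert(in && out);
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_ps(out, _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3)));
}

// Transposes a row-major 4x4 matrix in place. _MM_TRANSPOSE4_PS rewrites
// its four __m128 arguments so that row i becomes column i.
void transpose4x4(float* m) {
    assert(m);
    __m128 r0 = _mm_loadu_ps(m);
    __m128 r1 = _mm_loadu_ps(m + 4);
    __m128 r2 = _mm_loadu_ps(m + 8);
    __m128 r3 = _mm_loadu_ps(m + 12);
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);
    _mm_storeu_ps(m, r0);
    _mm_storeu_ps(m + 4, r1);
    _mm_storeu_ps(m + 8, r2);
    _mm_storeu_ps(m + 12, r3);
}
```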
END Microsoft Specific

AMD 3DNow! Technology Overview and Intrinsics

Microsoft Specific
The AMD 3DNow! technology is a group of instructions that removes traditional processing bottlenecks for multimedia and floating-point-intensive applications. The 3DNow! technology enables faster frame rates on high-resolution scenes, much better physical modeling of real-world environments, sharper and more detailed 3D imaging, smoother video playback, and near-theater–quality audio.
The 3DNow! technology is compatible with existing x86 software and requires no operating system support, so 3DNow! applications work with all existing operating systems. The technology is implemented in AMD processors beginning with the AMD-K6-2, and is also present in the AMD-K6-III and AMD Athlon processors.
Beginning with the AMD Athlon processor, 3DNow! technology was enhanced with five new 3DNow! digital signal processing (DSP) instructions and 19 MMX extensions, including streaming functionality.
This overview of AMD 3DNow! technology contains the following sections:
Key Functionality
Feature Detection
Register Set
Data Types
3DNow! Instruction Formats
Intrinsics Overview
Task Switching
Exceptions
Prefixes
3DNow! Intrinsics
END Microsoft Specific

Intel Overview of New Instructions and Extensions

Microsoft Specific
The Pentium III processor, as well as earlier processors such as the Pentium processor with MMX technology and the Pentium II processor, provides instructions that enable development of optimized multimedia applications. These instructions are implemented as extensions to the previously existing instruction set. This technology uses the single-instruction, multiple-data (SIMD) technique: by processing data elements in parallel, applications with media-rich bitstreams can significantly improve performance.
You can access the Intel performance libraries at http://developer.intel.com/.
The most direct way to use these instructions is to write them as inline assembly in your source code, but this can be time-consuming and tedious. Instead, Intel provides API extension sets, referred to as intrinsics, that make the instructions easier to use.
Intrinsics Availability on Intel Processors
| Processors | MMX technology intrinsics | Streaming SIMD Extensions (SSE) | Streaming SIMD Extensions 2 (SSE2) instructions |
| --- | --- | --- | --- |
| Processors that support SSE2 | Yes | Yes | Yes |
| Pentium III | Yes | Yes | Not available |
| Pentium II | Yes | Not available | Not available |
| Pentium with MMX technology | Yes | Not available | Not available |
| Pentium Pro | Not available | Not available | Not available |
| Pentium | Not available | Not available | Not available |
The following topics are covered:
Benefits of Using Intrinsics
Intrinsic Conventions
Intrinsic Categories and Supporting Extensions
END Microsoft Specific

Compiler Support for the MMX, SSE, and SSE2 Intrinsics

Microsoft Specific
To support the use of MMX, SSE, and SSE2 intrinsics, the compiler includes the following features:
Data alignment
Inline assembly
Data Alignment
Previously, alignment issues in programs were addressed either by the compiler or directly in hardware, and any alignment changes a program needed to run correctly were made automatically. With the advent of intrinsic support, however, the user must take a more active role in ensuring that alignment issues are appropriately addressed.
Many of the new intrinsics have data alignment requirements. If such an intrinsic is used on data that is not appropriately aligned, an exception is raised that the program must handle; otherwise, the program faults.
The new intrinsics require aligned data to allow better performance. Because the registers that support the new, enhanced instruction sets are larger, new alignment requirements were defined to make the best use of recent cache architectures. The specific alignment requirements for each intrinsic are given in the documentation for that intrinsic.
Several tools are available for specifying the alignment of data. For user-declared variables, such as static or automatic data, see the align documentation. For data dynamically allocated from the heap, see the data alignment functions (for example, _aligned_malloc).
Note: The new __m64, __m128, __m128i, and __m128d data types already have an alignment value.
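The following sketch illustrates both cases, assuming a compiler that supports C++11 alignas (similar in intent to Visual C++'s __declspec(align(16))) and provides _mm_malloc and _mm_free through xmmintrin.h, as Visual C++, GCC, and Clang do. _mm_malloc is used here instead of the CRT's _aligned_malloc only so that the sketch also builds outside Visual C++. The names g_buffer, sum_aligned, and alloc_aligned_floats are hypothetical.

```cpp
// Sketch only: 16-byte alignment for SSE aligned loads and stores.
#include <xmmintrin.h>   // SSE intrinsics; also makes _mm_malloc/_mm_free available
#include <cassert>
#include <cstddef>
#include <cstdint>

// Static data: alignas(16) requests 16-byte alignment for the array.
alignas(16) float g_buffer[8] = {1, 2, 3, 4, 5, 6, 7, 8};

// Sums n floats (n a multiple of 4) from a 16-byte-aligned buffer.
// _mm_load_ps requires a 16-byte-aligned address; _mm_loadu_ps does not.
float sum_aligned(const float* p, std::size_t n) {
    assert(reinterpret_cast<std::uintptr_t>(p) % 16 == 0 && n % 4 == 0);
    __m128 acc = _mm_setzero_ps();
    for (std::size_t i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_load_ps(p + i));
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, acc);   // aligned store into an aligned local array
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

// Heap data: memory from _mm_malloc has the requested alignment and must be
// released with _mm_free, not free().
float* alloc_aligned_floats(std::size_t n) {
    return static_cast<float*>(_mm_malloc(n * sizeof(float), 16));
}
```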
See also: align, __alignof
Inline Assembly
The compiler supports the use of the MMX, SSE, and SSE2 instructions in inline assembly (__asm) blocks. The compiler also accepts the new MMWORD PTR and XMMWORD PTR syntax to refer to 64-bit and 128-bit data.
END Microsoft Specific
For information on how to detect the capabilities of a CPU, see CPUID.

MMX, SSE, and SSE2 Intrinsics

This section discusses intrinsic support for the enhanced instruction sets supported by Intel and Advanced Micro Devices (AMD) processors.
Microsoft Specific
Compiler Support for the MMX, SSE, and SSE2 Intrinsics
Intel Technology Overview of New Instructions and Extensions
AMD 3DNow! Technology Overview and Intrinsics
MMX Technology
Streaming SIMD Extensions (SSE)
Streaming SIMD Extensions 2 (SSE2) Instructions
An intrinsic is a function known by the compiler that directly maps to a sequence of one or more assembly language instructions. Intrinsic functions are inherently more efficient than called functions because no calling linkage is required.
Intrinsics make the use of processor-specific enhancements easier because they provide a C/C++ language interface to assembly instructions. In doing so, the compiler manages things that the user would normally have to be concerned with, such as register names, register allocations, and memory locations of data.
For information on how to detect the capabilities of a CPU, see CPUID Sample: Determines CPU Capabilities.
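A minimal detection sketch, assuming an x86 or x64 target. It reads CPUID leaf 1 (EDX bit 23 = MMX, bit 25 = SSE, bit 26 = SSE2) and extended leaf 0x80000001 (EDX bit 31 = 3DNow!, bit 30 = extended 3DNow!), checking first that each leaf is supported. It calls __cpuid from intrin.h under Visual C++ and the __cpuid macro from cpuid.h under GCC or Clang; CpuFeatures, run_cpuid, and detect_features are hypothetical names, not part of the CPUID sample.

```cpp
// Sketch only: runtime detection of MMX, SSE, SSE2, and 3DNow! via CPUID.
#include <cassert>
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>   // __cpuid(int[4], int)
#else
#include <cpuid.h>    // __cpuid(level, a, b, c, d) macro
#endif

// Executes CPUID for `leaf` and stores EAX, EBX, ECX, EDX in regs[0..3].
static void run_cpuid(std::uint32_t leaf, std::uint32_t regs[4]) {
    assert(regs != nullptr);
#if defined(_MSC_VER)
    int r[4];
    __cpuid(r, static_cast<int>(leaf));
    for (int i = 0; i < 4; ++i) regs[i] = static_cast<std::uint32_t>(r[i]);
#else
    __cpuid(leaf, regs[0], regs[1], regs[2], regs[3]);
#endif
}

struct CpuFeatures {
    bool mmx = false, sse = false, sse2 = false;  // leaf 1, EDX
    bool amd3dnow = false, amd3dnowExt = false;   // leaf 0x80000001, EDX
};

CpuFeatures detect_features() {
    CpuFeatures f;
    std::uint32_t r[4];
    run_cpuid(0, r);                 // EAX = highest standard leaf
    if (r[0] >= 1) {
        run_cpuid(1, r);
        f.mmx  = (r[3] >> 23) & 1u;
        f.sse  = (r[3] >> 25) & 1u;
        f.sse2 = (r[3] >> 26) & 1u;
    }
    run_cpuid(0x80000000u, r);       // EAX = highest extended leaf
    if (r[0] >= 0x80000001u) {
        run_cpuid(0x80000001u, r);
        f.amd3dnow    = (r[3] >> 31) & 1u;
        f.amd3dnowExt = (r[3] >> 30) & 1u;
    }
    return f;
}
```

A program would typically call detect_features once at startup and select an SSE2, SSE, MMX, or plain C/C++ code path accordingly; on x64 processors MMX, SSE, and SSE2 are always present.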
The MMX, SSE, and SSE2 intrinsics are available only in intrinsic form. Therefore, they are not affected by the /Oi setting, and #pragma function cannot be used on them.
END Microsoft Specific

Exception Handling in Visual C++

Robust code anticipates and handles exceptions. Exceptions occur when a program executes abnormally because of conditions outside the program's control. Certain operations, including object creation and file input/output, are subject to failures that go beyond errors. Out-of-memory conditions, for example, can occur even when your program is running correctly.
Abnormal situations should be handled by throwing and catching exceptions. Such situations are not the same as normal error conditions, such as a function that executes correctly but returns a result code indicating an error. For example, a file status function reporting that a file does not exist is a normal error condition. For normal error conditions, the program should examine the error code and respond appropriately.
Abnormal situations are also not the same as erroneous execution, in which, for example, the caller makes a mistake in passing arguments to a function or calls it in an inappropriate context. For erroneous execution, test your inputs and other assumptions with an assertion (see Using Assertions).
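To make the three cases concrete, here is an illustrative sketch; the function names can_open_file, try_allocate, and count_char are hypothetical. A file that cannot be opened is reported through a return value, an allocation failure is reported by std::bad_alloc and caught, and a caller bug (a null pointer argument) is caught by assert.

```cpp
// Sketch only: error code vs. exception vs. assertion.
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <new>
#include <vector>

// Normal error condition: a file that cannot be opened (for example, because
// it does not exist) is an expected outcome reported through the return value.
bool can_open_file(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr) return false;
    std::fclose(f);
    return true;
}

// Abnormal situation: running out of memory is signaled by std::bad_alloc,
// which is caught here so the program can recover.
bool try_allocate(std::size_t count) {
    try {
        std::vector<char> buffer(count);
        return buffer.size() == count;
    } catch (const std::bad_alloc&) {
        return false;
    }
}

// Erroneous execution: a null pointer is a caller bug, so it is checked with
// an assertion (active when NDEBUG is not defined) rather than handled.
std::size_t count_char(const char* s, char c) {
    assert(s != nullptr && "caller must pass a valid string");
    std::size_t n = 0;
    for (; *s != '\0'; ++s)
        if (*s == c) ++n;
    return n;
}
```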
Visual C++ supports three kinds of exception handling:
C++ exception handling
Although structured exception handling works with C and C++ source files, it is not specifically designed for C++. For C++ programs, you should use C++ exception handling.
Structured exception handling
Windows supplies its own exception mechanism, called structured exception handling (SEH). SEH is not recommended for C++ or MFC programming; use it only in non-MFC C programs.
MFC exceptions
Since version 3.0, MFC has used C++ exceptions but still supports its older exception handling macros, which are similar to C++ exceptions in form. The older MFC exception handling macros have been supported since version 1.0. Although these macros are not recommended for new programming, they are still supported for backward compatibility. In programs that already use the macros, you can freely use C++ exceptions as well. During preprocessing, the macros evaluate to the exception handling keywords defined in the Visual C++ implementation of the C++ language as of Visual C++ version 2.0. You can leave existing exception macros in place while you begin to use C++ exceptions.
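A minimal sketch of C++ exception handling, the mechanism recommended above for C++ programs. A hypothetical ConfigError type derived from std::runtime_error is thrown by parse_port and caught by const reference in port_or_default; both function names are illustrative only.

```cpp
// Sketch only: throwing and catching a custom C++ exception type.
#include <cassert>
#include <stdexcept>
#include <string>

// Exception type for one failure category, derived from std::runtime_error.
class ConfigError : public std::runtime_error {
public:
    explicit ConfigError(const std::string& msg) : std::runtime_error(msg) {}
};

// Parses a TCP port number; throws ConfigError if the text is not valid.
int parse_port(const std::string& text) {
    std::size_t used = 0;
    int value = 0;
    try {
        value = std::stoi(text, &used);  // throws invalid_argument or out_of_range
    } catch (const std::logic_error&) {  // both derive from std::logic_error
        throw ConfigError("not a number: " + text);
    }
    if (used != text.size() || value < 1 || value > 65535)
        throw ConfigError("invalid port: " + text);
    return value;
}

// Catches by const reference; local objects in parse_port (such as the
// temporary strings) are destroyed during stack unwinding.
int port_or_default(const std::string& text, int fallback) {
    try {
        return parse_port(text);
    } catch (const ConfigError&) {
        return fallback;
    }
}
```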
Do not mix the error handling mechanisms; for example, do not use C++ exceptions with SEH. For advice about mixing MFC macros and C++ exceptions, see Exceptions: Using MFC Macros and C++ Exceptions.
For information on handling exceptions in CLR applications, see Exception Handling under /clr.
For information about exception handling on x64 processors, see Exception Handling (x64).