SseMath_sin_ps

Evaluates 4 sines at once, using only SSE1+MMX intrinsics, so it also runs on old Athlon XPs and the Pentium III of your grandmother.

Windows
MacOS
Linux

References

Module

Core

Header

/Engine/Source/Runtime/Core/Public/Math/sse_mathfun.h

Include

#include "Math/sse_mathfun.h"

Syntax

v4sf SseMath_sin_ps
(
    v4sf x
)
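
A minimal usage sketch follows. It assumes v4sf is an alias for __m128, as in the original sse_mathfun.h, and uses a hypothetical stand-in (SinPsStandIn) so that it compiles outside the engine; SinOfFour is likewise an illustrative name. In engine code you would include "Math/sse_mathfun.h" and call SseMath_sin_ps in its place.

```cpp
#include <xmmintrin.h>  // SSE1 intrinsics: __m128, _mm_loadu_ps, _mm_storeu_ps
#include <cmath>

// Hypothetical stand-in for SseMath_sin_ps so this sketch builds on its own.
// It computes each lane with the scalar std::sin, so it is NOT the SSE routine.
static __m128 SinPsStandIn(__m128 x)
{
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, x);
    for (float& v : lanes)
    {
        v = std::sin(v);
    }
    return _mm_load_ps(lanes);
}

// Computes the sine of four floats in one call.
void SinOfFour(const float in[4], float out[4])
{
    const __m128 x = _mm_loadu_ps(in);   // unaligned load of 4 packed floats
    const __m128 s = SinPsStandIn(x);    // in engine code: SseMath_sin_ps(x)
    _mm_storeu_ps(out, s);               // unaligned store of the 4 results
}
```

The unaligned load and store are used here only so the caller does not need 16-byte aligned buffers; aligned variants work equally well when alignment is guaranteed.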

Remarks

The code is an exact rewrite of the Cephes sinf function. Precision is excellent as long as |x| < 8192. I did not bother to take into account the special handling Cephes has for greater values; the function does not return garbage for arguments over 8192, but the extra precision is missing.
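
To make the structure of the algorithm concrete, here is a scalar sketch of a Cephes-style single-precision sine. It is an illustration only, not the SSE code in the header: the namespace and function names are hypothetical, the constants are the commonly published Cephes sinf values (assumed, not read from the engine source), and there is no special handling for large arguments, NaN or Inf.

```cpp
#include <cmath>

namespace CephesSketch  // hypothetical name, for illustration only
{
    constexpr float kFourOverPi = 1.27323954473516f;

    // pi/4 split into three parts for extended-precision range reduction.
    constexpr float kDP1 = 0.78515625f;
    constexpr float kDP2 = 2.4187564849853515625e-4f;
    constexpr float kDP3 = 3.77489497744594108e-8f;

    // Scalar sine for moderate |x|; huge inputs are not handled.
    float Sin(float x)
    {
        float sign = 1.0f;
        if (x < 0.0f)
        {
            sign = -1.0f;
            x = -x;
        }

        // Octant index, rounded up to the next even integer.
        int j = static_cast<int>(x * kFourOverPi);
        j = (j + 1) & ~1;
        const float y = static_cast<float>(j);

        // Fold the octant into 0 or 2, flipping the sign for the second half-period.
        j &= 7;
        if (j > 3)
        {
            sign = -sign;
            j -= 4;
        }

        // Reduce x to roughly [-pi/4, pi/4] by subtracting y * pi/4 in three steps.
        x = ((x - y * kDP1) - y * kDP2) - y * kDP3;
        const float z = x * x;

        float result;
        if (j == 2)
        {
            // Cosine polynomial: 1 - z/2 + z^2 * P(z)
            result = ((2.443315711809948e-5f * z - 1.388731625493765e-3f) * z
                      + 4.166664568298827e-2f) * z * z;
            result = result - 0.5f * z + 1.0f;
        }
        else
        {
            // Sine polynomial: x + x * z * Q(z)
            result = ((-1.9515295891e-4f * z + 8.3321608736e-3f) * z
                      - 1.6666654611e-1f) * z * x;
            result += x;
        }
        return sign * result;
    }
}
```

The SSE version follows the same steps but evaluates both polynomials for all four lanes and selects between them with bit masks instead of branching.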

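Note that sinf((float)M_PI) evaluates to about -8.74e-8 rather than 0, which is surprising but correct: (float)M_PI is not exactly π but slightly larger, by about 8.74e-8, and the sine of π plus a small offset is approximately minus that offset.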
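
The magnitude 8.74e-8 can be checked independently of this function. The sketch below (helper names are illustrative, and kPi is a locally defined constant rather than M_PI) rounds π to single precision and evaluates the sine of that rounded value in double precision. The rounding error is positive, so the sine comes out slightly negative.

```cpp
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

// How far the single-precision value of pi lies above the true value.
double FloatPiRoundingError()
{
    return static_cast<double>(static_cast<float>(kPi)) - kPi;
}

// Sine of pi after rounding pi to single precision, evaluated in double.
double SinOfFloatPi()
{
    const float piAsFloat = static_cast<float>(kPi);
    return std::sin(static_cast<double>(piAsFloat));
}
```

Both helpers give values of magnitude about 8.74e-8, with opposite signs.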

Performance is also surprisingly good: 1.33 times faster than the macOS vsinf SSE2 function, and 1.5 times faster than __vrs4_sinf from AMD's ACML (which is only available in 64-bit). Not too bad for an SSE1 function with no special tuning! However, the latter libraries probably handle NaN, Inf, denormalized and other special arguments much better.

On my Core 1 Duo, the execution of this function takes approximately 95 cycles.

From what I have observed in experiments with the Intel AMath lib, switching to an SSE2 version would improve performance by only 10%.

Since it is based on SSE intrinsics, it has to be compiled at -O2 to deliver full speed.
