Different type of FastSin #109

Open
GlebKonstPub wants to merge 1 commit into logicomacorp:master from GlebKonstPub:temp

GlebKonstPub commented Mar 12, 2026

Different fastSin implementation, without LUT.

Here are a couple of illustrations. Green is the error of linear interpolation on a LUT with 512 entries. Red is the error of this implementation. Both are scaled by 2048 for visibility.
[image: 01]
[image: 02]
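
For reference, the LUT baseline in this comparison can be sketched roughly as follows. This is a minimal sketch, assuming a full-period table of 512 segments plus one guard entry; `LutSinSketch` and `MaxAbsError` are hypothetical names, and the actual FastSin in the repository may lay out and index its table differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical LUT sine with linear interpolation (not the repository's FastSin).
class LutSinSketch
{
public:
    explicit LutSinSketch(int size = 512)
        : size_(size), table_(static_cast<std::size_t>(size) + 1)
    {
        assert(size > 0);
        const double kTwoPi = 6.28318530717958647692;
        // One extra entry so that table_[i + 1] is always valid.
        for (int i = 0; i <= size_; ++i)
            table_[static_cast<std::size_t>(i)] = std::sin(kTwoPi * i / size_);
    }

    double operator()(double x) const
    {
        const double kTwoPi = 6.28318530717958647692;
        double t = x / kTwoPi;       // phase in turns
        t -= std::floor(t);          // wrap into [0, 1]
        const double pos = t * size_;
        // Clamp the index: rounding can make t exactly 1.0 for tiny negative x.
        const int i = std::min(static_cast<int>(pos), size_ - 1);
        const double frac = pos - i;
        const std::size_t k = static_cast<std::size_t>(i);
        return table_[k] + frac * (table_[k + 1] - table_[k]);
    }

private:
    int size_;
    std::vector<double> table_;
};

// Largest absolute deviation from std::sin over evenly spaced samples.
template <typename F>
double MaxAbsError(const F& f, double lo, double hi, int steps)
{
    double worst = 0.0;
    for (int i = 0; i <= steps; ++i)
    {
        const double x = lo + (hi - lo) * i / steps;
        worst = std::max(worst, std::fabs(f(x) - std::sin(x)));
    }
    return worst;
}
```

Calling `MaxAbsError(LutSinSketch(512), -20.0, 20.0, 100000)` gives the worst-case error of this sketch. For linear interpolation with step h = 2π/512 the error is bounded by h²/8, roughly 1.9e-5, and every lookup reads two table entries from memory.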

This method has a much smaller compiled code size, since it needs no table preparation. It is also considerably faster:

  • up to 5.5 times faster on SSE2
  • up to 2 times faster on AVX2

In the benchmark results, PolySin is the new implementation.
[image: 03]

Both speedups come essentially without any actual vectorization. They follow from doing plain arithmetic without fetching anything from memory.

In essence, it generates a triangle wave and then passes it through a sine shaper. The sine shaper is a simple 7th-degree polynomial. The idea for constructing the polynomial was taken from the NI Reaktor Core library, and the coefficients were recalculated with better precision.
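
The triangle-plus-shaper structure can be sketched as below. This is only an illustration under stated assumptions: `PolySinSketch` and `PolySinSketchMaxError` are hypothetical names, the coefficients are plain Taylor coefficients used as a stand-in (the PR's recalculated coefficients are not reproduced here), and `std::floor` and `std::fabs` are used for brevity even though the real code may avoid them.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical sketch: triangle wave followed by a 7th-degree odd polynomial.
inline double PolySinSketch(double x)
{
    assert(std::isfinite(x));
    const double kPi = 3.14159265358979323846;

    // Phase in quarter periods: v = 1 corresponds to x = pi/2.
    const double v = x * (2.0 / kPi);

    // Triangle wave with period 4 and range [-1, 1], chosen so that
    // sin(x) == sin(pi/2 * u) for every x.
    const double w = v - 1.0;
    const double m = w - 4.0 * std::floor(w * 0.25); // wrapped into [0, 4)
    const double u = std::fabs(m - 2.0) - 1.0;

    // Sine shaper: odd 7th-degree polynomial in Horner form.
    // ASSUMPTION: Taylor coefficients, not the coefficients from the PR.
    const double y = u * (kPi / 2.0);
    const double y2 = y * y;
    return y * (1.0 + y2 * (-1.0 / 6.0 + y2 * (1.0 / 120.0 + y2 * (-1.0 / 5040.0))));
}

// Largest absolute deviation from std::sin over evenly spaced samples.
inline double PolySinSketchMaxError(double lo, double hi, int steps)
{
    double worst = 0.0;
    for (int i = 0; i <= steps; ++i)
    {
        const double x = lo + (hi - lo) * i / steps;
        worst = std::max(worst, std::fabs(PolySinSketch(x) - std::sin(x)));
    }
    return worst;
}
```

The whole evaluation is a short chain of multiplies and adds with no table reads. With Taylor coefficients the worst-case error sits at the peaks and is roughly 1.6e-4; a least-squares or minimax fit of the same degree would lower it.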

@GlebKonstPub
Author

IMHO, this version is precise enough. However, if necessary, its precision can be improved dramatically at the cost of a couple more multiply-accumulate instructions: approximately one multiply-accumulate per order of magnitude of error.
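
The trade-off between extra terms and error can be illustrated with a small sketch. It assumes Taylor coefficients as a stand-in for the PR's fitted ones, and `OddTaylorSin` and `MaxErrorOnQuarterWave` are hypothetical helper names. Each additional term adds one multiply-add to the Horner chain.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Odd Taylor polynomial for sin(y) with `terms` terms (degree 2 * terms - 1),
// evaluated in Horner form on y^2. ASSUMPTION: Taylor stand-in coefficients.
inline double OddTaylorSin(double y, int terms)
{
    assert(terms >= 1);
    const double y2 = y * y;
    double acc = 0.0;
    for (int k = terms - 1; k >= 0; --k)
    {
        // Coefficient c_k = (-1)^k / (2k + 1)!
        double factorial = 1.0;
        for (int n = 2; n <= 2 * k + 1; ++n)
            factorial *= n;
        const double c = ((k % 2) ? -1.0 : 1.0) / factorial;
        acc = acc * y2 + c; // one multiply-add per term
    }
    return y * acc;
}

// Worst-case error on [-pi/2, pi/2], the range the triangle wave maps onto.
inline double MaxErrorOnQuarterWave(int terms)
{
    const double kHalfPi = 1.57079632679489661923;
    const int steps = 1000;
    double worst = 0.0;
    for (int i = 0; i <= steps; ++i)
    {
        const double y = -kHalfPi + (2.0 * kHalfPi) * i / steps;
        worst = std::max(worst, std::fabs(OddTaylorSin(y, terms) - std::sin(y)));
    }
    return worst;
}
```

In this stand-in, `MaxErrorOnQuarterWave(4)` corresponds to degree 7, `MaxErrorOnQuarterWave(5)` to degree 9, and `MaxErrorOnQuarterWave(6)` to degree 11, and each step shrinks the worst-case error by more than a factor of ten.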

GlebKonstPub commented Mar 13, 2026

[image: 04]
Just for fun, I tried benchmarking this on the x87 FPU. The results still stand.

[image: 05]
Same with 32-bit x86. This target actually shows the biggest difference in performance. I assume this is connected with the x86 calling convention: double results are returned in ST(0), so they have to be stored from the XMM registers to memory and then loaded onto the FPU stack. PolySin simply interacts with memory less.

@GlebKonstPub
Author

I recompiled the benchmark with /fp:fast. I understand that the original FastSin was there mostly to avoid calls to std, but I hope this further demonstrates that PolySin not only has a smaller footprint but is also so well optimized that even the stricter default floating-point mode doesn't affect it in any detectable way.

[image: 01]
