Evidently, a working example would go a long way toward making our responses more productive than the wild speculation you are calling for. Why not compare a single path vectorized code against a single...
Is that a VML library function? Is this issue consistent for every argument being passed to the vectorized sincos() function? How do you call the functions? Do you have any variable interdependencies? I...
>>..."__svml_sincos4_e9" function which is apparently the vectorized version of trigonometric functions...
Yes, you're correct, and SVML stands for Short Vector Math Library. There is a description...
And one more thing.
>>..."__svml_sincos4_e9" function
Since the function __svml_sincos4_e9 is optimized for processors with the Intel AVX instruction set on 64-bit platforms ( function with e9 code is...
Thanks for your replies.
Basically, the reason I'm using "#pragma omp simd" is portability: in the future we may move to an AMD platform, other co-processors, or even other compilers, so using...
Another point of slowness in the code is the call to __svml_exp4_e9, which comes from the "exp" function I'm using in another part of my code. According to VTune analysis, in the non-vectorized code the exp function...
>>...Do I need to do some tuning before calling the math functions?..
It looks like no, because the code is portable ( you've mentioned that ) and implemented without any intrinsic functions ( is that...
As was already mentioned, your code could have AVX-to-SSE transition penalties. Your program is single-threaded, so there are no execution port stalls. But I am thinking about the possibility that...
Well, as you can see in the attachment, there are no AVX-to-SSE or SSE-to-AVX conversions in the __svml_ functions, so I'm still wondering what the reason for the __svml_ slowness is, as the non-vec...
>>...I'm still wondering what the reason of __svml_ slowness is, as the non-vec functions are fast.
A test case is needed in order to understand what is going on and to answer your questions.