SIMD instruction
B.5 SIMD Instructions (MMX, SSE)
B.5.1 ADDPS: Add Packed Single-Precision FP Values
ADDPS xmm1,xmm2/mem128 ; 0F 58 /r [KATMAI,SSE]
ADDPS performs addition on each of four packed single-precision FP value pairs:
dst[0-31] := dst[0-31] + src[0-31],
dst[32-63] := dst[32-63] + src[32-63],
dst[64-95] := dst[64-95] + src[64-95],
dst[96-127] := dst[96-127] + src[96-127].
The destination is an XMM register. The source operand can be either an XMM register or a 128-bit memory location.
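The lane-wise behaviour described above can be sketched as a small Python model. This is an illustration only, not the instruction itself: the helper names f32 and addps are hypothetical, an XMM register is represented as a list of four floats with lane 0 holding bits 0-31, and overflow to infinity is not modelled.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def addps(dst, src):
    """Model of ADDPS: add the four single-precision lanes pairwise.

    dst and src are lists of four floats; the new dst value is returned.
    """
    assert len(dst) == len(src) == 4
    return [f32(f32(d) + f32(s)) for d, s in zip(dst, src)]
```

Each result lane depends only on the two input lanes at the same position, which is what distinguishes the packed form from the scalar ADDSS.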
B.5.2 ADDSS: Add Scalar Single-Precision FP Values
ADDSS xmm1,xmm2/mem32 ; F3 0F 58 /r [KATMAI,SSE]
ADDSS adds the low single-precision FP values from the source and destination operands and stores the single-precision FP result in the destination operand.
dst[0-31] := dst[0-31] + src[0-31],
dst[32-127] remains unchanged.
The destination is an XMM register. The source operand can be either an XMM register or a 32-bit memory location.
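To contrast the scalar form with the packed one, here is a minimal Python sketch of the ADDSS behaviour. The names f32 and addss are hypothetical, and the register is again modelled as a list of four floats with lane 0 as bits 0-31.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def addss(dst, src_low):
    """Model of ADDSS: add only the low lane; lanes 1-3 of dst pass through."""
    assert len(dst) == 4
    return [f32(f32(dst[0]) + f32(src_low))] + list(dst[1:])
```

Only the low doubleword changes, so the upper three lanes of the destination can carry unrelated data across the operation.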
B.5.3 ANDNPS: Bitwise Logical AND NOT of Packed Single-Precision FP Values
ANDNPS xmm1,xmm2/mem128 ; 0F 55 /r [KATMAI,SSE]
ANDNPS inverts the bits of the four single-precision floating-point values in the destination register, and then performs a logical AND between the four single-precision floating-point values in the source operand and the temporary inverted result, storing the result in the destination register.
dst[0-31] := src[0-31] AND NOT dst[0-31],
dst[32-63] := src[32-63] AND NOT dst[32-63],
dst[64-95] := src[64-95] AND NOT dst[64-95],
dst[96-127] := src[96-127] AND NOT dst[96-127].
The destination is an XMM register. The source operand can be either an XMM register or a 128-bit memory location.
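Because it is the destination that gets inverted, a common use is to load a mask into the destination and clear those bits from the source. The Python sketch below models this at the bit level; bits, from_bits and andnps are hypothetical helper names, and using a sign-bit mask to take absolute values is an illustrative choice rather than something the text above prescribes.

```python
import struct

def bits(x):
    """Return the 32-bit pattern of x as a single-precision value."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def from_bits(b):
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

def andnps(dst, src):
    """Model of ANDNPS: each lane becomes src AND (NOT dst)."""
    return [from_bits(~bits(d) & bits(s)) for d, s in zip(dst, src)]

# -0.0 has only the sign bit set (pattern 0x80000000).
SIGN_MASK = [-0.0] * 4
absolute = andnps(SIGN_MASK, [-1.5, 2.0, -0.0, 3.25])
```

With the sign mask in the destination, every bit except the sign bit of each source lane survives, so absolute holds the magnitudes of the inputs.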
B.5.4 ANDPS: Bitwise Logical AND For Single FP
ANDPS xmm1,xmm2/mem128 ; 0F 54 /r [KATMAI,SSE]
ANDPS performs a bitwise logical AND of the four single-precision floating-point values in the source and destination operands, and stores the result in the destination register.
dst[0-31] := src[0-31] AND dst[0-31],
dst[32-63] := src[32-63] AND dst[32-63],
dst[64-95] := src[64-95] AND dst[64-95],
dst[96-127] := src[96-127] AND dst[96-127].
The destination is an XMM register. The source operand can be either an XMM register or a 128-bit memory location.
B.5.5 CMPccPS: Packed Single-Precision FP Compare
CMPPS xmm1,xmm2/mem128,imm8 ; 0F C2 /r ib [KATMAI,SSE]
CMPEQPS xmm1,xmm2/mem128 ; 0F C2 /r 00 [KATMAI,SSE]
CMPLTPS xmm1,xmm2/mem128 ; 0F C2 /r 01 [KATMAI,SSE]
CMPLEPS xmm1,xmm2/mem128 ; 0F C2 /r 02 [KATMAI,SSE]
CMPUNORDPS xmm1,xmm2/mem128 ; 0F C2 /r 03 [KATMAI,SSE]
CMPNEQPS xmm1,xmm2/mem128 ; 0F C2 /r 04 [KATMAI,SSE]
CMPNLTPS xmm1,xmm2/mem128 ; 0F C2 /r 05 [KATMAI,SSE]
CMPNLEPS xmm1,xmm2/mem128 ; 0F C2 /r 06 [KATMAI,SSE]
CMPORDPS xmm1,xmm2/mem128 ; 0F C2 /r 07 [KATMAI,SSE]
The CMPccPS instructions compare the four packed single-precision FP values in the source and destination operands, and return the result of the comparison in the destination register. The result of each comparison is a doubleword mask of all 1s (comparison true) or all 0s (comparison false).
The destination is an XMM register. The source can be either an XMM register or a 128-bit memory location.
The third operand is an 8-bit immediate value, of which the low 3 bits define the type of comparison. For ease of programming, eight two-operand pseudo-instructions are provided, with the third operand already filled in. The "Condition Predicates" are:
Predicate  imm8  Meaning
EQ         0     Equal
LT         1     Less than
LE         2     Less than or equal
UNORD      3     Unordered
NEQ        4     Not equal
NLT        5     Not less than
NLE        6     Not less than or equal
ORD        7     Ordered
For more details of the comparison predicates, and details of how to emulate the "greater than" equivalents, see Section B.2.3.
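The predicate encoding and the mask results can be sketched in Python as follows. This is a model under stated assumptions: cmpps and PREDICATES are hypothetical names, lanes are plain floats, and the handling of NaN operands follows the usual IEEE rule that ordered comparisons involving a NaN are false, so the negated predicates (4, 5, 6) come out true for unordered inputs.

```python
import math

ALL_ONES = 0xFFFFFFFF  # doubleword mask for "comparison true"

def unordered(a, b):
    return math.isnan(a) or math.isnan(b)

# Keyed by the low 3 bits of the immediate operand.
PREDICATES = {
    0: lambda a, b: a == b,               # EQ
    1: lambda a, b: a < b,                # LT
    2: lambda a, b: a <= b,               # LE
    3: unordered,                         # UNORD
    4: lambda a, b: not (a == b),         # NEQ
    5: lambda a, b: not (a < b),          # NLT
    6: lambda a, b: not (a <= b),         # NLE
    7: lambda a, b: not unordered(a, b),  # ORD
}

def cmpps(dst, src, imm8):
    """Model of CMPPS: one all-ones or all-zeros doubleword per lane."""
    pred = PREDICATES[imm8 & 7]
    return [ALL_ONES if pred(d, s) else 0 for d, s in zip(dst, src)]

def cmpgtps(dst, src):
    """One way to get 'greater than': swap the operands and use LT."""
    return cmpps(src, dst, 1)
```

Note that NLT is not the same as "greater than or equal": for a lane containing a NaN, NLT yields all 1s, while a swapped LE comparison yields all 0s.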
B.5.6 COMISS: Scalar Ordered Single-Precision FP Compare and Set EFLAGS
COMISS xmm1,xmm2/mem32 ; 0F 2F /r [KATMAI,SSE]
COMISS compares the low-order single-precision FP value in the two source operands. ZF, PF, and CF are set according to the result. OF, SF, and AF are cleared. The unordered result is returned if either source is a NaN (QNaN or SNaN).
The destination operand is an XMM register. The source can be either an XMM register or a 32-bit memory location.
The flags are set according to the following rules:
Result        Flags     Values
Unordered     ZF,PF,CF  111
Greater than  ZF,PF,CF  000
Less than     ZF,PF,CF  001
Equal         ZF,PF,CF  100
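The flag table can be expressed as a short Python function. This is a sketch only: comiss_flags is a hypothetical name, the first argument stands for the low lane of the first (destination) operand and the second for the source, and the exception behaviour of the real instruction is not modelled.

```python
import math

def comiss_flags(a, b):
    """Return (ZF, PF, CF) as in the table, comparing a against b."""
    if math.isnan(a) or math.isnan(b):
        return (1, 1, 1)  # unordered
    if a > b:
        return (0, 0, 0)  # greater than
    if a < b:
        return (0, 0, 1)  # less than
    return (1, 0, 0)      # equal
```

Since the unordered case also sets ZF and CF, code that must distinguish NaN inputs would test PF before interpreting the result as "equal" or "less than".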
B.5.7 CVTPI2PS: Packed Signed INT32 to Packed Single-FP Conversion
CVTPI2PS xmm,mm/mem64 ; 0F 2A /r [KATMAI,SSE]
CVTPI2PS converts two packed signed doublewords from the source operand to two packed single-precision FP values in the low quadword of the destination operand. The high quadword of the destination remains unchanged.
The destination operand is an XMM register. The source can be either an MMX register or a 64-bit memory location.
For more details of this instruction, see the Intel Processor manuals.
B.5.8 CVTPS2PI: Packed Single-Precision FP to Packed Signed INT32 Conversion
CVTPS2PI mm,xmm/mem64 ; 0F 2D /r [KATMAI,SSE]
CVTPS2PI converts two packed single-precision FP values from the source operand to two packed signed doublewords in the destination operand.
The destination operand is an MMX register. The source can be either an XMM register or a 64-bit memory location. If the source is a register, the input values are in the low quadword.
For more details of this instruction, see the Intel Processor manuals.
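As an illustration of the CVTPI2PS description, the Python sketch below converts two signed 32-bit integers into the low two lanes and leaves the high two lanes alone. The names f32 and cvtpi2ps are hypothetical, and the rounding shown assumes the default round-to-nearest mode; a 32-bit integer does not always fit in the 24-bit significand of a single-precision value, so the conversion can be inexact.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def cvtpi2ps(dst, src):
    """Model of CVTPI2PS: src holds two signed 32-bit integers.

    Lanes 0-1 receive the converted values; lanes 2-3 of dst are kept.
    """
    assert len(dst) == 4 and len(src) == 2
    assert all(-2**31 <= i < 2**31 for i in src)
    return [f32(float(src[0])), f32(float(src[1])), dst[2], dst[3]]

# 2**24 + 1 is not representable in single precision and rounds to 2**24.
result = cvtpi2ps([9.0, 9.0, 7.0, 8.0], [16777217, -3])
```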
B.5.9 CVTSD2SS: Scalar Double-Precision FP to Scalar Single-Precision FP Conversion
CVTSD2SS xmm1,xmm2/mem64 ; F2 0F 5A /r [WILLAMETTE,SSE2]
CVTSD2SS converts a double-precision FP value from the source operand to a single-precision FP value in the low doubleword of the destination operand. The upper 3 doublewords are left unchanged.
The destination operand is an XMM register. The source can be either an XMM register or a 64-bit memory location. If the source is a register, the input value is in the low quadword.
For more details of this instruction, see the Intel Processor manuals.
B.5.10 CVTSI2SS: Signed INT32 to Scalar Single-Precision FP Conversion
CVTSI2SS xmm,r/m32 ; F3 0F 2A /r [KATMAI,SSE]
CVTSI2SS converts a signed doubleword from the source operand to a single-precision FP value in the low doubleword of the destination operand. The upper 3 doublewords are left unchanged.
The destination operand is an XMM register. The source can be either a general purpose register or a 32-bit memory location.
For more details of this instruction, see the Intel Processor manuals.
B.5.11 CVTSS2SI: Scalar Single-Precision FP to Signed INT32 Conversion
CVTSS2SI reg32,xmm/mem32 ; F3 0F 2D /r [KATMAI,SSE]
CVTSS2SI converts a single-precision FP value from the source operand to a signed doubleword in the destination operand.
The destination operand is a general purpose register. The source can be either an XMM register or a 32-bit memory location. If the source is a register, the input value is in the low doubleword.
For more details of this instruction, see the Intel Processor manuals.
B.5.12 CVTTPS2PI: Packed Single-Precision FP to Packed Signed INT32 Conversion with Truncation
CVTTPS2PI mm,xmm/mem64 ; 0F 2C /r [KATMAI,SSE]
CVTTPS2PI converts two packed single-precision FP values in the source operand to two packed signed doublewords in the destination operand. If the result is inexact, it is truncated (rounded toward zero).
The destination operand is an MMX register. The source can be either an XMM register or a 64-bit memory location. If the source is a register, the input values are in the low quadword.
For more details of this instruction, see the Intel Processor manuals.
B.5.13 CVTTSS2SI: Scalar Single-Precision FP to Signed INT32 Conversion with Truncation
CVTTSS2SI reg32,xmm/mem32 ; F3 0F 2C /r [KATMAI,SSE]
CVTTSS2SI converts a single-precision FP value in the source operand to a signed doubleword in the destination operand. If the result is inexact, it is truncated (rounded toward zero).
The destination operand is a general purpose register. The source can be either an XMM register or a 32-bit memory location. If the source is a register, the input value is in the low doubleword.
For more details of this instruction, see the Intel Processor manuals.
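The difference between the truncating conversion and CVTSS2SI (Section B.5.11) shows up only for inexact results. The Python sketch below models both under two assumptions that go beyond the text above: that the non-truncating form uses the default MXCSR rounding mode (round to nearest, ties to even), and that NaN or out-of-range inputs produce the "integer indefinite" value 80000000h as described in the Intel manuals. The function names are hypothetical.

```python
import math

INT32_INDEFINITE = -0x80000000  # bit pattern 80000000h as a signed value

def _to_int32(x, rounder):
    # NaN, infinities and out-of-range values give the indefinite value.
    if math.isnan(x) or math.isinf(x):
        return INT32_INDEFINITE
    r = rounder(x)
    return r if -2**31 <= r < 2**31 else INT32_INDEFINITE

def cvttss2si(x):
    """Model of CVTTSS2SI: always round toward zero."""
    return _to_int32(x, math.trunc)

def cvtss2si(x):
    """Model of CVTSS2SI, assuming round-to-nearest-even is selected."""
    return _to_int32(x, round)
```

For example, cvttss2si(-2.7) gives -2 while cvtss2si(-2.7) gives -3, and a tie such as 2.5 rounds to the even value 2 in the non-truncating form.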
B.5.14 DIVPS: Packed Single-Precision FP Divide
DIVPS xmm1,xmm2/mem128 ; 0F 5E /r [KATMAI,SSE]
DIVPS divides the four packed single-precision FP values in the destination operand by the four packed single-precision FP values in the source operand, and stores the packed single-precision results in the destination register.
The destination is an XMM register. The source operand can be either an XMM register or a 128-bit memory location.
dst[0-31] := dst[0-31] / src[0-31],
dst[32-63] := dst[32-63] / src[32-63],
dst[64-95] := dst[64-95] / src[64-95],
dst[96-127] := dst[96-127] / src[96-127].
B.5.15 DIVSS: Scalar Single-Precision FP Divide
DIVSS xmm1,xmm2/mem32 ; F3 0F 5E /r [KATMAI,SSE]
DIVSS divides the low-order single-precision FP value in the destination operand by the low-order single-precision FP value in the source operand, and stores the single-precision result in the destination register.
The destination is an XMM register. The source operand can be either an XMM register or a 32-bit memory location.
dst[0-31] := dst[0-31] / src[0-31],
dst[32-127] remains unchanged.
B.5.16 LDMXCSR: Load Streaming SIMD Extension Control/Status
LDMXCSR mem32 ; 0F AE /2 [KATMAI,SSE]
LDMXCSR loads 32 bits of data from the specified memory location into the MXCSR control/status register. MXCSR is used to mask or unmask exceptions, to set the rounding mode, to set flush-to-zero mode, and to view the exception status flags.
For details of the MXCSR register, see the Intel processor docs.
See also STMXCSR (Section B.5.72).
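To make the roles of the MXCSR fields concrete, here is a hedged Python sketch that decodes a 32-bit value of the kind LDMXCSR would load. The bit positions are taken from the Intel documentation rather than from the text above (exception flags in bits 0-5, exception masks in bits 7-12, rounding control in bits 13-14, flush-to-zero in bit 15), and decode_mxcsr is a hypothetical helper name.

```python
EXCEPTIONS = ["invalid", "denormal", "divide-by-zero",
              "overflow", "underflow", "precision"]
ROUNDING = {0: "nearest", 1: "down", 2: "up", 3: "toward zero"}

def decode_mxcsr(value):
    """Split an MXCSR value into its status and control fields."""
    return {
        # Sticky status flags, bits 0-5.
        "flags": [n for i, n in enumerate(EXCEPTIONS) if (value >> i) & 1],
        # Exception mask bits, bits 7-12 (1 = exception masked).
        "masked": [n for i, n in enumerate(EXCEPTIONS) if (value >> (7 + i)) & 1],
        # Rounding control, bits 13-14.
        "rounding": ROUNDING[(value >> 13) & 3],
        # Flush-to-zero, bit 15.
        "flush_to_zero": bool((value >> 15) & 1),
    }

# 1F80h is the power-on default: all exceptions masked, round to nearest.
default = decode_mxcsr(0x1F80)
```

A program that wants truncating arithmetic with flush-to-zero would set bits 13-15 in a memory doubleword and load it with LDMXCSR; decode_mxcsr(0x1F80 | 0xE000) reports that combination.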
B.5.17 MASKMOVQ: Byte Mask Write
MASKMOVQ mm1,mm2 ; 0F F7 /r [