- Uploaded: 2023-05-10
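Bicubic Interpolation and Its Optimization
1. Mathematical model
For a destination pixel, the inverse transform maps its coordinates to a floating-point position (i+u, j+v) in the source image, where i and j are non-negative integers and u and v are floating-point numbers in [0,1). Bicubic interpolation uses the 16 neighbors surrounding (i+u, j+v); the destination pixel value f(i+u, j+v) is given by the following formula: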
$$f(i+u,\,j+v) = A\,B\,C$$
$$A = \begin{bmatrix} S(u+1) & S(u) & S(u-1) & S(u-2) \end{bmatrix}$$
$$B = \begin{bmatrix} f(i-1,j-1) & f(i-1,j) & f(i-1,j+1) & f(i-1,j+2) \\ f(i,j-1) & f(i,j) & f(i,j+1) & f(i,j+2) \\ f(i+1,j-1) & f(i+1,j) & f(i+1,j+1) & f(i+1,j+2) \\ f(i+2,j-1) & f(i+2,j) & f(i+2,j+1) & f(i+2,j+2) \end{bmatrix}$$
$$C = \begin{bmatrix} S(v+1) \\ S(v) \\ S(v-1) \\ S(v-2) \end{bmatrix}$$
$$S(x) = \begin{cases} 1 - 2|x|^2 + |x|^3, & 0 \le |x| < 1 \\ 4 - 8|x| + 5|x|^2 - |x|^3, & 1 \le |x| < 2 \\ 0, & |x| \ge 2 \end{cases}$$
S(x) is the interpolation kernel. It is a piecewise-cubic approximation of sin(πx)/(πx), where π is the circle constant.
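The piecewise kernel can be written down directly. The following is a minimal sketch, not the author's code: the names `bicubic_kernel` and `kernel_weights` are hypothetical, and double precision is an assumption made for readability.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Interpolation kernel S(x), following the piecewise definition above.
double bicubic_kernel(double x) {
    const double a = std::fabs(x);
    if (a < 1.0) return 1.0 - 2.0 * a * a + a * a * a;
    if (a < 2.0) return 4.0 - 8.0 * a + 5.0 * a * a - a * a * a;
    return 0.0;
}

// Hypothetical helper: the four weights S(t+1), S(t), S(t-1), S(t-2) for a
// fractional offset t in [0, 1). With t = u this is the row vector A,
// with t = v it is the column vector C.
std::array<double, 4> kernel_weights(double t) {
    assert(t >= 0.0 && t < 1.0);
    return {bicubic_kernel(t + 1.0), bicubic_kernel(t),
            bicubic_kernel(t - 1.0), bicubic_kernel(t - 2.0)};
}
```

For t = 0.5 the weights are -0.125, 0.625, 0.625, -0.125. They sum to 1, and the two outer weights are negative, which matters later when the weights are stored as integers.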
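2. Computation flow
1. Fetch the 16 neighboring pixels P1, P2, ..., P16, numbered row by row through the matrix B.
2. Use the kernel formula S(x) to compute the kernel vectors Su and Sv for the x and y directions.
3. Carry out the matrix product to obtain the interpolated value:
iTemp1 = Su0*P1 + Su1*P5 + Su2*P9 + Su3*P13
iTemp2 = Su0*P2 + Su1*P6 + Su2*P10 + Su3*P14
iTemp3 = Su0*P3 + Su1*P7 + Su2*P11 + Su3*P15
iTemp4 = Su0*P4 + Su1*P8 + Su2*P12 + Su3*P16
iResult = Sv0*iTemp1 + Sv1*iTemp2 + Sv2*iTemp3 + Sv3*iTemp4
4. The interpolated image showed "burrs" (isolated spike artifacts), so we added a post-processing step:
Let pSrc be the value of this point in the source image. If abs(iResult - pSrc) exceeds a threshold, we treat the interpolated value as likely to corrupt the image and use the source value pSrc instead.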
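Steps 1 to 4 can be sketched in scalar form as follows. This is an illustration under stated assumptions, not the author's implementation: the function name `bicubic_pixel` is hypothetical, the neighborhood is passed as `p[r][c] = f(i-1+r, j-1+c)`, and both the source value `src` and the `threshold` are supplied by the caller because the text does not fix either.

```cpp
#include <cassert>
#include <cmath>

// Kernel S(x) from section 1.
static double S(double x) {
    const double a = std::fabs(x);
    if (a < 1.0) return 1.0 - 2.0 * a * a + a * a * a;
    if (a < 2.0) return 4.0 - 8.0 * a + 5.0 * a * a - a * a * a;
    return 0.0;
}

// Interpolates one channel value from a 4x4 neighborhood, then applies the
// step-4 fallback. `src` and `threshold` are assumptions left to the caller.
double bicubic_pixel(const double p[4][4], double u, double v,
                     double src, double threshold) {
    assert(u >= 0.0 && u < 1.0 && v >= 0.0 && v < 1.0);
    const double su[4] = {S(u + 1.0), S(u), S(u - 1.0), S(u - 2.0)};
    const double sv[4] = {S(v + 1.0), S(v), S(v - 1.0), S(v - 2.0)};

    double result = 0.0;
    for (int c = 0; c < 4; ++c) {
        double temp = 0.0;  // iTemp(c+1): column c weighted by Su
        for (int r = 0; r < 4; ++r) temp += su[r] * p[r][c];
        result += sv[c] * temp;  // 16 + 4 = 20 multiplications in total
    }

    // Step 4: reject values that stray too far from the source pixel.
    if (std::fabs(result - src) > threshold) return src;
    return result;
}
```

With u = v = 0 the weights reduce to (0, 1, 0, 0), so the function returns `p[1][1]`, the pixel at (i, j).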
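3. Algorithm optimization
Bicubic interpolation needs the 16 surrounding pixels to compute a single pixel value, at a cost of 20 multiplications and 15 additions, so the computation is heavy and has to be optimized.
We chose Intel's SSE2 instruction set, which is only available on Pentium 4 and later processors.
Whether the current CPU supports SSE2 can be determined with the CPUID instruction: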
BOOL g_bSSE2 = FALSE;
__asm
{
    mov eax, 1              ; CPUID leaf 1: feature flags
    cpuid
    test edx, 0x04000000    ; EDX bit 26 is the SSE2 flag
    jz NotSupport
    mov g_bSSE2, 1
NotSupport:
}
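The `__asm` block above is MSVC inline assembly, which 64-bit MSVC builds do not accept. As a hedged alternative, the same check can be written with the `__get_cpuid` helper from `<cpuid.h>` in GCC and Clang. The function names `edx_reports_sse2` and `cpu_has_sse2` are hypothetical, and the fallback for other compilers is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#endif

// CPUID leaf 1 reports SSE2 in bit 26 of EDX, the same mask as above.
constexpr std::uint32_t kSse2Mask = 1u << 26;
static_assert(kSse2Mask == 0x04000000u, "mask must match the assembly version");

bool edx_reports_sse2(std::uint32_t edx) {
    return (edx & kSse2Mask) != 0;
}

bool cpu_has_sse2() {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0) return false;  // leaf 1 unavailable
    return edx_reports_sse2(edx);
#else
    return false;  // assumption: report "unsupported" where CPUID cannot be queried here
#endif
}
```

Keeping the bit test in its own function makes the mask easy to check without depending on the machine that runs the check.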
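CPUs that support SSE2 provide eight 128-bit registers, so a single register can hold four pixels (RGB), which lends itself to parallel computation.
The full code is in the function Optimize_Bicubic in Transform.cpp.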
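For four pixels to share one 128-bit register, each 3-byte RGB pixel has to sit in its own 4-byte slot. A minimal scalar sketch of that repacking follows. The name `pad_pixels_to_4_bytes` is hypothetical and is not taken from Transform.cpp, and tightly packed 3-byte pixels are assumed; a vectorized version would perform the same rearrangement inside registers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Expands `count` packed 3-byte pixels into 4-byte slots. The three channel
// bytes are copied in their original order and the fourth byte is set to 0,
// so the zero lands in the highest byte of each little-endian 32-bit slot.
void pad_pixels_to_4_bytes(const std::uint8_t* packed, std::size_t count,
                           std::uint8_t* padded) {
    assert(packed != nullptr && padded != nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        padded[4 * i + 0] = packed[3 * i + 0];
        padded[4 * i + 1] = packed[3 * i + 1];
        padded[4 * i + 2] = packed[3 * i + 2];
        padded[4 * i + 3] = 0;
    }
}
```

Four pixels occupy 12 bytes packed and 16 bytes padded, which is exactly one register.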
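Problems encountered during optimization:
1. Each pixel consists of R, G and B channels. Because an SSE2 register is 16 bytes wide, 4 bytes are wasted once four pixels have been loaded, and time is spent realigning the data from BRGB|RGBR|GBRG|BRGB to 0RGB|0RGB|0RGB|0RGB;
2. When loading 16 bytes into a register, the image address is not guaranteed to be 16-byte aligned, so the slower MOVDQU instruction (6 or more clock cycles) has to be used; if the address can be made 16-byte aligned, MOVDQA (1 clock cycle) can be used instead;
3. To eliminate division and floating-point arithmetic, the weights are scaled by 256. Each kernel coefficient then needs 2 bytes, while image data uses 1 byte per channel, so half of the SSE2 register space is wasted when the operands are lined up for multiplication and the computation takes longer. Reducing the kernel precision so that a coefficient fits in 1 byte makes the accuracy drop sharply;
4. We lacked experience with the latency of individual instructions and with whether a given sequence of instructions can be pipelined in parallel.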
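The fixed-point scheme in item 3 can be illustrated with a scalar sketch. The names `fixed_weights` and `blend4` are hypothetical, and the rounding and clamping choices are assumptions; the text only states that the weights are scaled by 256.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Kernel S(x) from section 1.
static double S(double x) {
    const double a = std::fabs(x);
    if (a < 1.0) return 1.0 - 2.0 * a * a + a * a * a;
    if (a < 2.0) return 4.0 - 8.0 * a + 5.0 * a * a - a * a * a;
    return 0.0;
}

// Scales one weight by 256 and rounds it to a signed 16-bit integer.
static std::int16_t to_fixed(double w) {
    return static_cast<std::int16_t>(std::lround(256.0 * w));
}

// Integer weights for a fractional offset t in [0, 1).
std::array<std::int16_t, 4> fixed_weights(double t) {
    assert(t >= 0.0 && t < 1.0);
    return {to_fixed(S(t + 1.0)), to_fixed(S(t)),
            to_fixed(S(t - 1.0)), to_fixed(S(t - 2.0))};
}

// Weighted sum of four 8-bit samples; the division by 256 becomes a shift.
std::uint8_t blend4(const std::uint8_t p[4],
                    const std::array<std::int16_t, 4>& w) {
    int acc = 128;  // rounding bias, half of the scale factor (assumption)
    for (int k = 0; k < 4; ++k) acc += w[k] * p[k];
    if (acc < 0) return 0;  // negative weights can undershoot
    acc >>= 8;
    return static_cast<std::uint8_t>(acc > 255 ? 255 : acc);
}
```

For t = 0.5 the integer weights are -32, 160, 160, -32. Neither the negative values nor the value 256 (reached at t = 0) fit in an unsigned byte, which is why each coefficient takes 2 bytes.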
Appendix:
SSE2 instruction summary
Arithmetic instructions:
ADDPD -- Packed Double-Precision Floating-Point Add (SSE2)
Adds 2 doubles lane by lane.
ADDPD xmm0, xmm1/m128
ADDPS -- Packed Single-Precision Floating-Point Add (SSE)
Adds 4 floats lane by lane.
ADDPS xmm0, xmm1/m128
ADDSD -- Scalar Double-Precision Floating-Point Add (SSE2)
Adds 1 double (the low element only).
ADDSD xmm0, xmm1/m64
ADDSS -- Scalar Single-Precision Floating-Point Add (SSE)
Adds 1 float (the low element only).
ADDSS xmm0, xmm1/m32
PADDB/PADDW/PADDD -- Packed Add
| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F FC /r | PADDB mm, mm/m64 | Add packed byte integers from mm/m64 and mm. |
| 66 0F FC /r | PADDB xmm1, xmm2/m128 | Add packed byte integers from xmm2/m128 and xmm1. |
| 0F FD /r | PADDW mm, mm/m64 | Add packed word integers from mm/m64 and mm. |
| 66 0F FD /r | PADDW xmm1, xmm2/m128 | Add packed word integers from xmm2/m128 and xmm1. |
| 0F FE /r | PADDD mm, mm/m64 | Add packed doubleword integers from mm/m64 and mm. |
| 66 0F FE /r | PADDD xmm1, xmm2/m128 | Add packed doubleword integers from xmm2/m128 and xmm1. |
PADDQ -- Packed Quadword Add
| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F D4 /r | PADDQ mm1, mm2/m64 | Add quadword integer mm2/m64 to mm1. |
| 66 0F D4 /r | PADDQ xmm1, xmm2/m128 | Add packed quadword integers xmm2/m128 to xmm1. |
PADDSB/PADDSW -- Packed Add with Saturation
| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F EC /r | PADDSB mm, mm/m64 | Add packed signed byte integers from mm/m64 and mm and saturate the results. |
| 66 0F EC /r | PADDSB xmm1, xmm2/m128 | Add packed signed byte integers from xmm2/m128 and xmm1 and saturate the results. |
| 0F ED /r | PADDSW mm, mm/m64 | Add packed signed word integers from mm/m64 and mm and saturate the results. |
| 66 0F ED /r | PADDSW xmm1, xmm2/m128 | Add packed signed word integers from xmm2/m128 and xmm1 and saturate the results. |
PADDUSB/PADDUSW -- Packed Add Unsigned with Saturation
| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F DC /r | PADDUSB mm, mm/m64 | Add packed unsigned byte integers from mm/m64 and mm and saturate the results. |
| 66 0F DC /r | PADDUSB xmm1, xmm2/m128 | Add packed unsigned byte integers from xmm2/m128 and xmm1 and saturate the results. |
| 0F DD /r | PADDUSW mm, mm/m64 | Add packed unsigned word integers from mm/m64 and mm and saturate the results. |
| 66 0F DD /r | PADDUSW xmm1, xmm2/m128 | Add packed unsigned word integers from xmm2/m128 to xmm1 and saturate the results. |
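The difference between the plain and the saturating byte additions is easiest to see on a single lane. The functions below are scalar models written for illustration, with hypothetical names; they are not intrinsics and process one byte rather than a whole register.

```cpp
#include <cassert>
#include <cstdint>

// One lane of PADDB: the 8-bit sum wraps around modulo 256.
std::uint8_t add_wrap_u8(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a + b);
}

// One lane of PADDUSB: the unsigned sum is clamped to 255.
std::uint8_t add_sat_u8(std::uint8_t a, std::uint8_t b) {
    const int sum = a + b;
    return static_cast<std::uint8_t>(sum > 255 ? 255 : sum);
}

// One lane of PADDSB: the signed sum is clamped to [-128, 127].
std::int8_t add_sat_i8(std::int8_t a, std::int8_t b) {
    int sum = a + b;
    if (sum > 127) sum = 127;
    if (sum < -128) sum = -128;
    return static_cast<std::int8_t>(sum);
}
```

Adding 200 and 100 gives 44 with wraparound and 255 with unsigned saturation. For pixel data the saturating form is usually the one wanted, since a wrapped value turns a bright pixel dark.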
PMADDWD -- Packed Multiply and Add
| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F F5 /r | PMADDWD mm, mm/m64 | Multiply the packed words in mm by the packed words in mm/m64. Add the 32-bit pairs of results and store in mm as doubleword. |
| 66 0F F5 /r | PMADDWD xmm1, xmm2/m128 | Multiply the packed word integers in xmm1 by the packed word integers in xmm2/m128, and add the adjacent doubleword results. |
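PMADDWD maps naturally onto a weighted sum of pixels with 16-bit weights. The following scalar model of the 128-bit form is a sketch for illustration; the name `pmaddwd_model` is hypothetical, and whether Optimize_Bicubic actually uses this instruction is not stated in the text.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of the 128-bit PMADDWD: eight signed 16-bit lanes in each
// operand, four signed 32-bit lanes out, with
// out[k] = a[2k] * b[2k] + a[2k+1] * b[2k+1].
std::array<std::int32_t, 4> pmaddwd_model(const std::array<std::int16_t, 8>& a,
                                          const std::array<std::int16_t, 8>& b) {
    std::array<std::int32_t, 4> out{};
    for (int k = 0; k < 4; ++k) {
        // Sum in 64 bits, then keep the low 32 bits, so the one overflowing
        // case (all four inputs equal to -32768) wraps instead of being
        // undefined behavior.
        const std::int64_t sum =
            static_cast<std::int64_t>(a[2 * k]) * b[2 * k] +
            static_cast<std::int64_t>(a[2 * k + 1]) * b[2 * k + 1];
        out[k] = static_cast<std::int32_t>(static_cast<std::uint32_t>(sum));
    }
    return out;
}
```

With samples 10, 20, 30, 40 and weights -32, 160, 160, -32 in the first four lanes, the first two outputs are 2880 and 3520. Their sum, 6400, is the weighted sum scaled by 256.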
PSADBW -- Packed Sum of Absolute Differences
| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F F6 /r | PSADBW mm1, mm2/m64 | Absolute difference of packed unsigned byte integers from mm2/m64 and mm1; differences are then summed to produce an unsigned word integer result. |
| 66 0F F6 /r | PSADBW xmm1, xmm2/m128 | Absolute difference of packed unsigned byte integers from xmm2/m128 and xmm1; the 8 low differences and 8 high differences are then summed separately to produce two word integer results. |
PSUBB/PSUBW/PSUBD -- Packed Subtract
| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F F8 /r | PSUBB mm, mm/m64 | Subtract packed byte integers in mm/m64 from packed byte integers in mm. |
| 66 0F F8 /r | PSUBB xmm1, xmm2/m128 | Subtract packed byte integers in xmm2/m128 from packed byte integers in xmm1. |
| 0F F9 /r | PSUBW mm, mm/m64 | Subtract packed word integers in mm/m64 from packed word integers in mm. |
| 66 0F F9 /r | PSUBW xmm1, xmm2/m128 | Subtract packed word integers in xmm2/m128 from packed word integers in xmm1. |
| 0F FA /r | PSUBD mm, mm/m64 | Subtract packed doubleword integers in mm/m64 from packed doubleword integers in mm. |
| 66 0F FA /r | PSUBD xmm1, xmm2/m128 | Subtract packed doubleword integers in xmm2/m128 from packed doubleword integers in xmm1. |
PSUBQ -- Packed Subtract Quadword
| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F FB /r | PSUBQ mm1, mm2/m64 | Subtract quadword integer in mm2/m64 from mm1. |
| 66 0F FB /r | PSUBQ xmm1, xmm2/m128 | Subtract packed quadword integers in xmm2/m128 from xmm1. |
PSUBSB/PSUBSW -- Packed Subtract with Saturation
| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F E8 /r | PSUBSB mm, mm/m64 | Subtract signed packed bytes in mm/m64 from signed packed bytes in mm and saturate results. |
| 66 0F E8 /r | PSUBSB xmm1, xmm2/m128 | Subtract packed signed byte integers in xmm2/m128 from packed signed byte integers in xmm1 and saturate results. |
| 0F E9 /r | PSUBSW mm, mm/m64 | Subtract signed packed words in mm/m64 from signed packed words in mm and saturate results. |
| 66 0F E9 /r | PSUBSW xmm1, xmm2/m128 | Subtract packed signed word integers in xmm2/m128 from packed signed word integers in xmm1 and saturate results. |
PSUBUSB/PSUBUSW -- Packed Subtract Unsigned with Saturation
| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F D8 /r | PSUBUSB mm, mm/m64 | Subtract unsigned packed bytes in mm/m64 from unsigned packed bytes in mm and saturate result. |
| 66 0F D8 /r | PSUBUSB xmm1, xmm2/m128 | Subtract packed unsigned byte integers in xmm2/m128 from packed unsigned byte integers in xmm1 and saturate result. |
| 0F D9 /r | PSUBUSW mm, mm/m64 | Subtract unsigned packed words in mm/m64 from unsigned packed words in mm and saturate result. |
| 66 0F D9 /r | PSUBUSW xmm1, xmm2/m128 | Subtract packed unsigned word integers in xmm2/m128 from packed unsigned word integers in xmm1 and saturate result. |
SUBPD -- Packed Double-Precision Floating-Point Subtract
| Opcode | Instruction | Description |
| --- | --- | --- |
| 66 0F 5C /r | SUBPD xmm1, xmm2/m128 | Subtract packed double-precision floating-point values in xmm2/m128 from xmm1. |
SUBPS -- Packed Single-Precision Floating-Point Subtract
| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F 5C /r | SUBPS xmm1, xmm2/m128 | Subtract packed single-precision floating-point values in xmm2/m128 from xmm1. |
SUBSD -- Scalar Double-Precision Floating-Point Subtract
| Opcode | Instruction | Description |
| --- | --- | --- |
| F2 0F 5C /r | SUBSD xmm1, xmm2/m64 | Subtract the low double-precision floating-point value in xmm2/m64 from xmm1. |
SUBSS -- Scalar Single-Precision Floating-Point Subtract
| Opcode | Instruction | Description |
| --- | --- | --- |
| F3 0F 5C /r | SUBSS xmm1, xmm2/m32 | Subtract the low single-precision floating-point value in xmm2/m32 from xmm1. |
------------------------------------------------------------------------------------------------------
PMULHUW -- Packed Multiply High Unsigned
| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F E4 /r | PMULHUW mm1, mm2/m64 | Multiply the packed unsigned word integers in mm1 register and mm2/m64, and store the high 16 bits of the results in mm1. |
| 66 0F E4 /r | PMULHUW xmm1, xmm2/m128 | Multiply the packed unsigned word integers in xmm1 and xmm2/m128, and store the high 16 bits of the results in xmm1. |
PMULHW -- Packed Multiply High Signed
| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F E5 /r | PMULHW mm, mm/m64 | Multiply the packed signed word integers in mm1 register and mm2/m64, and store the high 16 bits of the results in mm1. |
| 66 0F E5 /r | PMULHW xmm1, xmm2/m128 | Multiply the packed signed word integers in xmm1 and xmm2/m128, and store the high 16 bits of the results in xmm1. |
PMULLW -- Packed Multiply Low Signed
| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F D5 /r | PMULLW mm, mm/m64 | Multiply the packed signed word integers in mm1 register and mm2/m64, and store the low 16 bits of the results in mm1. |
| 66 0F D5 /r | PMULLW xmm1, xmm2/m128 | Multiply the packed signed word integers in xmm1 and xmm2/m128, and store the low 16 bits of the results in xmm1. |
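A 16-bit by 16-bit multiplication produces a 32-bit product, and the three instructions above each keep one half of it. The scalar models below show a single lane; the names `mul_words` and `mul_high_u16` are hypothetical and the code is an illustration, not an intrinsic.

```cpp
#include <cassert>
#include <cstdint>

struct MulParts {
    std::int16_t high;   // bits 31..16 of the signed product (PMULHW)
    std::uint16_t low;   // bits 15..0 of the signed product (PMULLW)
};

// One lane of PMULHW and PMULLW together.
MulParts mul_words(std::int16_t a, std::int16_t b) {
    const std::int32_t product =
        static_cast<std::int32_t>(a) * static_cast<std::int32_t>(b);
    const std::uint32_t bits = static_cast<std::uint32_t>(product);
    return {static_cast<std::int16_t>(static_cast<std::uint16_t>(bits >> 16)),
            static_cast<std::uint16_t>(bits & 0xFFFFu)};
}

// One lane of PMULHUW: high 16 bits of the unsigned product. The operands
// are widened first so that 65535 * 65535 does not overflow a signed int.
std::uint16_t mul_high_u16(std::uint16_t a, std::uint16_t b) {
    const std::uint32_t product =
        static_cast<std::uint32_t>(a) * static_cast<std::uint32_t>(b);
    return static_cast<std::uint16_t>(product >> 16);
}
```

For 300 * 200 = 60000 the high half is 0 and the low half is 60000. If a weight is scaled by 256 and the sample is an 8-bit value, the product needs more than 16 bits whenever it exceeds 65535, so keeping only one half loses information unless the halves are recombined.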
PMULUDQ -- Multiply Doubleword Unsigned
| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F F4 /r | PMULUDQ mm1, mm2/m64 | Multiply unsigned doubleword integer in mm1 by unsigned doubleword integer in mm2/m64, and store the quadword result in mm1. |
| 66 0F F4 /r | PMULUDQ xmm1, xmm2/m128 | Multiply packed unsigned doubleword integers in xmm1 by packed unsigned doubleword integers in xmm2/m128, and store the quadword results in |
- Keywords: bicubic interpolation, optimization