Writing WebAssembly in C++

Z时代
2024-01-10
Category: Tech Sharing

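Previous post (environment setup and basic integration): A First Look at Writing WebAssembly in C++

This time we try using WebAssembly for some simple image processing.

We pick one of the most basic image-processing operations, Gaussian blur, as the example. For the underlying principle, see An Introduction to Gaussian Blur and Convolution Filtering.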

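Passing an array from JS to wasm

Unlike passing a number, passing an array requires JS to copy the array into wasm memory and then pass a pointer (the position of the data in that memory). The wasm code reads or modifies the array by accessing that exact location in memory.

Also, unlike in JS, wasm memory is managed by the developer: we have to allocate and free it manually.

The flow here is as follows. We first obtain the array that holds the image's pixels and copy it into wasm memory. We then call the wasm module to process the pixel data. When it is done, JS reads that block of memory back and draws the processed image onto the canvas.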

// The image to be processed
const srcImg = document.getElementById('srcImg');
srcImg.onload = () => {
  // On load, draw the image onto a canvas so that we can read its pixel data
  const { clientWidth = 0, clientHeight = 0 } = srcImg;
  const canvas = document.getElementById('drawerCanvas');
  canvas.width = clientWidth;
  canvas.height = clientHeight;
  const ctx = canvas.getContext('2d');
  ctx.drawImage(srcImg, 0, 0, clientWidth, clientHeight);
  // Read the pixel data
  const imageData = ctx.getImageData(0, 0, clientWidth, clientHeight);
  // Process the data
  const resImageData = wasmProcess(imageData, clientWidth, clientHeight);
  // Draw the processed image data back onto the canvas
  ctx.putImageData(resImageData, 0, 0);
};
// Copy a JS typed array into wasm memory
function copyToHeap(typedArray) {
  const numBytes = typedArray.byteLength;
  const ptr = Module._malloc(numBytes);
  const heapBytes = new Uint8Array(Module.HEAPU8.buffer, ptr, numBytes);
  // View exactly the bytes of typedArray, even if it does not start at offset 0 of its buffer
  heapBytes.set(new Uint8Array(typedArray.buffer, typedArray.byteOffset, numBytes));
  return heapBytes;
}
// Free a block of wasm memory
function freeHeap(heapBytes) {
  Module._free(heapBytes.byteOffset);
}
// The image-processing function
function wasmProcess(imgData, width, height) {
  const heapBytes = copyToHeap(imgData.data);
  // Call the function exposed by C++. heapBytes.byteOffset is the pointer to the array in wasm memory.
  // easyBlur returns void, so the return type is null.
  Module.ccall(
    'easyBlur',
    null,
    ['number', 'number', 'number', 'number', 'number'],
    [heapBytes.byteOffset, width, height, 3, 3]
  );
  // Copy the processed data out of wasm memory
  const newData = new Uint8ClampedArray(heapBytes);
  // Free the wasm memory
  freeHeap(heapBytes);
  const newImageData = new ImageData(newData, width, height);
  return newImageData;
}
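Seen from the C++ side, the value that JS passes as heapBytes.byteOffset arrives as an ordinary pointer into wasm linear memory. The sketch below illustrates that contract with a hypothetical function, invertRgb, which is not part of the original post: it reads and overwrites the RGBA bytes in place. It assumes the buffer holds pixelCount * 4 bytes.

```cpp
#include <cassert>
#include <cstdint>

extern "C" {

// Hypothetical example, not from the original post: invert the r, g and b
// channels of an RGBA buffer in place and leave alpha untouched.
// `ptr` is the address JS obtained from Module._malloc;
// `pixelCount` is width * height.
void invertRgb(uint8_t* ptr, int pixelCount) {
    assert(ptr != nullptr && pixelCount >= 0);
    for (int p = 0; p < pixelCount; ++p) {
        uint8_t* px = ptr + p * 4;  // start of pixel p: r, g, b, a
        px[0] = static_cast<uint8_t>(255 - px[0]);
        px[1] = static_cast<uint8_t>(255 - px[1]);
        px[2] = static_cast<uint8_t>(255 - px[2]);
        // px[3] (alpha) is intentionally left as-is
    }
}

}  // extern "C"
```

Assuming the function is exported at build time, JS could call it the same way as in wasmProcess, passing heapBytes.byteOffset and width * height as the two arguments.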

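A simple Gaussian blur implementation

Here we use the simplest possible filter: a matrix whose entries are all equal. (Strictly speaking this is a mean filter, also called a box blur, rather than a true Gaussian kernel, whose weights fall off with distance from the center.) For the entries of the filter to sum to 1, each entry must be 1 / (cw*ch), where cw and ch are the filter's width and height.

For example, a 3*3 filter is [0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11], since each entry is 1/9 ≈ 0.11. We can adjust the blur strength simply by changing cw and ch: the larger they are, the farther each pixel is spread and the stronger the blur.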
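As a small illustration of the uniform filter described above, the following sketch builds such a kernel. The helper name makeBoxKernel is an assumption made for this example and does not appear in the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build a cw x ch kernel whose entries are all equal and sum to 1.
// (makeBoxKernel is a hypothetical helper used only for illustration.)
std::vector<float> makeBoxKernel(int cw, int ch) {
    assert(cw > 0 && ch > 0);
    const float value = 1.0f / static_cast<float>(cw * ch);
    return std::vector<float>(static_cast<std::size_t>(cw) * ch, value);
}
```

With cw = ch = 3 every entry is 1/9; with cw = ch = 5 every entry is 1/25, so each output pixel averages a wider neighborhood and the blur is stronger.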

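We also need to look at the format of the array returned by ctx.getImageData(): data is a one-dimensional array that records the value of every pixel in the image, from left to right and from top to bottom. Every 4 values form a group that holds the r, g, b and a channels of one pixel. When blurring, we process each channel separately.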
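The layout just described maps a pixel position and a channel to a single array offset. A minimal sketch of that mapping follows; channelIndex and channelAt are hypothetical helper names introduced here for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Offset of channel c (0 = r, 1 = g, 2 = b, 3 = a) of the pixel at
// column x, row y, in a row-major RGBA array of the given width.
inline int channelIndex(int x, int y, int width, int c) {
    assert(x >= 0 && x < width && y >= 0 && c >= 0 && c < 4);
    return (y * width + x) * 4 + c;
}

// Read one channel value from the pixel array.
inline uint8_t channelAt(const uint8_t* data, int x, int y, int width, int c) {
    return data[channelIndex(x, y, width, c)];
}
```

For a 10-pixel-wide image, the blue channel of the first pixel in the second row is at offset (1 * 10 + 0) * 4 + 2 = 42. This is the same expression the convolution code uses to address neighboring pixels.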

My code:

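#include <cstddef>
#include <cstdint>
#include <vector>
// Convolution. Takes a pointer to the imageData pixel array, the image width and height,
// the filter, and the filter width and height.
void conv(uint8_t *ptr, int width, int height, const float *filter, int cw, int ch) {
    // Read from an untouched copy so that already-blurred pixels do not feed into later results
    const std::vector<uint8_t> src(ptr, ptr + (std::size_t)width * height * 4);
    // The window covers column offsets [-(cw/2), (cw+1)/2) and row offsets [-(ch/2), (ch+1)/2),
    // which is the integer form of the ceil((float)cw / 2) bounds
    const int dxEnd = (cw + 1) / 2;
    const int dyEnd = (ch + 1) / 2;
    // Border pixels whose window would leave the image are skipped
    for (int i = ch / 2; i < height - dyEnd + 1; i++) {
        for (int j = cw / 2; j < width - dxEnd + 1; j++) {
            // Only the first three of the RGBA channels are processed
            for (int k = 0; k < 3; k++) {
                float sum = 0;
                int count = 0;
                for (int dy = -(ch / 2); dy < dyEnd; dy++) {
                    for (int dx = -(cw / 2); dx < dxEnd; dx++) {
                        sum += filter[count] * (float)src[((i + dy) * width + (j + dx)) * 4 + k];
                        count++;
                    }
                }
                ptr[(i * width + j) * 4 + k] = (uint8_t)sum;
            }
        }
    }
}
#ifdef __cplusplus
extern "C"
{
#endif
// Function called from JS: takes the pixel-array pointer, the image width and height,
// and the filter width and height.
// For simplicity, every entry of the filter matrix has the same value, 1/(cw*ch).
// The default arguments only apply to C++ callers; JS must always pass cw and ch.
// The function must also be exported when compiling with Emscripten
// (for example via EXPORTED_FUNCTIONS or EMSCRIPTEN_KEEPALIVE).
void easyBlur(uint8_t *ptr, int width, int height, int cw = 3, int ch = 3) {
    std::vector<float> filter(cw * ch, 1 / (float)(cw * ch));
    conv(ptr, width, height, filter.data(), cw, ch);
}
#ifdef __cplusplus
}
#endif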

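Result preview

For an image about 200px wide, a filter of width and height 5 gives the following result:

[Image: A First Look at Writing WebAssembly in C++ (Part 2)]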

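Bottleneck

I reimplemented the same approach in JS and found that for small images the JS version is actually faster. For large images wasm is faster than JS, but the processing time is still unacceptably long.

The likely causes are:

Calling C from JS carries a certain execution cost.

Copying the data between JS memory and wasm memory takes a lot of time and hurts performance.

So in scenarios with very large amounts of data, wasm does speed up the computation, but the time spent transferring data grows so much that it becomes the performance bottleneck instead.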
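One possible way to trim part of the per-call overhead, offered here only as a sketch and not something the original post tried or measured, is to let the wasm side own a single reusable pixel buffer so that JS does not have to call _malloc and _free for every image. This does not remove the copy itself; it only avoids repeated allocation. The function name acquirePixelBuffer is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace {
// Lives for the lifetime of the module and is reused across calls.
std::vector<uint8_t> g_pixels;
}  // namespace

extern "C" {

// Hypothetical helper: return a buffer of at least numBytes bytes,
// growing it only when a larger image arrives. The returned pointer
// stays valid until a later call requests a larger size.
uint8_t* acquirePixelBuffer(int numBytes) {
    assert(numBytes > 0);
    if (g_pixels.size() < static_cast<std::size_t>(numBytes)) {
        g_pixels.resize(static_cast<std::size_t>(numBytes));
    }
    return g_pixels.data();
}

}  // extern "C"
```

Under this assumption JS would write the pixels at the returned address, call the blur function on that same address, and read the result back, leaving the buffer allocated for the next image. Whether this helps noticeably would need to be measured, since the two copies remain.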

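In addition, for front-end developers, a hand-written processing algorithm performs far worse than the well-optimized libraries that are already available. There are more image-processing libraries usable on the front end that are worth consulting.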


That is all for Writing WebAssembly in C++. Source link: utcz.com/a/116002.html
