Cracking a CAPTCHA on the Front End


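I recently got fed up with our internal company site: every login requires typing a username and password, plus a CAPTCHA.

If only I could skip the login page altogether.

So I got to work and decided to implement automatic login with a Tampermonkey userscript.

The hardest part was cracking the CAPTCHA. It took me a day to get it fully working, and this post writes up how.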

1. Analyzing the CAPTCHA

[Images: sample CAPTCHA images from the site]

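Analyzing the CAPTCHA is the starting point for all the work of cracking it.

  • What features does the CAPTCHA have?
  • Is it easy to crack?
  • What strategy should be used?

Summary of features

This only summarizes the features of our company site's CAPTCHA (the images above).

  1. Only letters (upper and lower case) and digits, with the hard-to-distinguish characters removed: 1iIlL0oO
  2. A given character always appears with the same size, stroke weight, and slant (so a standard sample library is easy to build)
  3. The first character always starts at the same position (so the left background is easy to crop)
  4. There are interference lines and a background color, both noticeably brighter than the characters (so a threshold can decide whether a pixel belongs to a character)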
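The first feature fixes how many templates are needed. As a small sketch (the constant names here are illustrative, not from the original code), the candidate character set can be derived by removing the confusable characters from the digits and letters:

```javascript
// Characters the CAPTCHA never uses because they are easy to confuse.
const CONFUSABLE = new Set('1iIlL0oO');

const DIGITS = '0123456789';
const LOWER = 'abcdefghijklmnopqrstuvwxyz';
const UPPER = LOWER.toUpperCase();

// Every character that still needs a template.
const CANDIDATE_CHARS = [...DIGITS, ...LOWER, ...UPPER].filter(
  ch => !CONFUSABLE.has(ch)
);

console.log(CANDIDATE_CHARS.length); // 54
console.log(CANDIDATE_CHARS.join(''));
```

That leaves 54 characters, each of which needs one sample.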

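Choosing a strategy

The strategy follows from the features found in the previous step.

  1. Build a standard sample library
  2. Use the standard samples to run a convolution comparison against the CAPTCHA image (explained below)

2. Building the sample library

  1. Request a CAPTCHA
  2. Extract the image pixels
  3. Binarize (turn each pixel into 0 or 1)
  4. Draw the binarized CAPTCHA on a canvas (black on white; it can also be scaled up proportionally to make viewing and screenshots easier)
  5. Crop suitable characters from the drawn binarized CAPTCHA
  6. Clean up the character screenshots (trim white edges, remove noise)
  7. Undo the scaling (if the image was enlarged earlier)
  8. Save as a template string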

Fetching the CAPTCHA

// Returns the image as a base64 data URL
function getVerifyCode() {
  return fetch(VERIFY_CODE_API)
    .then(rsp => rsp.json())
    .then(data => `data:image/png;base64,${data.data}`)
}

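Converting base64 data to pixels

Use a canvas.

// Accepts base64 data or a local image path
async function getImageData(imageSrc) {
  const image = new Image();
  // Wait for the image to finish loading (register handlers before setting src)
  await new Promise((resolve, reject) => {
    image.onload = resolve;
    image.onerror = reject;
    image.src = imageSrc;
  });
  // Create a canvas sized to the image; the default 300x150 would clip larger images
  const canvas = document.createElement('canvas');
  canvas.width = image.width;
  canvas.height = image.height;
  const context = canvas.getContext('2d');
  context.drawImage(image, 0, 0);
  return context.getImageData(0, 0, image.width, image.height);
}

This returns an ImageData object.

Its data property is a Uint8ClampedArray, a typed array in which every 4 elements hold the RGBA values (0-255) of one pixel.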
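To make the flat layout concrete, here is a minimal sketch that reads one pixel from an ImageData-like object. The helper name getPixel and the sample object are made up for illustration; a plain object stands in for a real ImageData so the snippet runs without a canvas.

```javascript
// Read the RGBA value of the pixel at (x, y) from an ImageData-like object.
function getPixel({ data, width }, x, y) {
  const loc = (y * width + x) * 4; // 4 array elements per pixel
  return Array.from(data.slice(loc, loc + 4));
}

// A hypothetical 2x1 image: one opaque red pixel, one opaque white pixel.
const sample = {
  width: 2,
  height: 1,
  data: Uint8ClampedArray.from([255, 0, 0, 255, 255, 255, 255, 255]),
};

console.log(getPixel(sample, 0, 0)); // [ 255, 0, 0, 255 ]
console.log(getPixel(sample, 1, 0)); // [ 255, 255, 255, 255 ]
```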

image.png

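Binarization

First set a threshold: pixels brighter than the threshold are treated as background, and pixels at or below it are provisionally treated as character pixels (they may still be noise or interference lines).

The threshold needs to be tuned against real results (adjust it repeatedly).

A reasonable starting threshold is [130, 130, 130] (RGB channel values; alpha is always 255, so it is not set), roughly the midpoint of 0-255.

const threshold = [130, 130, 130];

// Returns a 2D array whose items are all '0' or '1'
function binarization(imageData) {
  const pixel2binary = pixel =>
    pixel.every((chValue, index) => chValue > threshold[index]) ? '0' : '1';

  // Every 4 elements of data represent one pixel
  const { data, width, height } = imageData;
  const binaryData = [];
  let x, y, row, rowLoc, pixel, pixelLoc;
  for (y = 0; y < height; y++) {
    row = [];
    // Start index of the current row
    rowLoc = y * width * 4;
    for (x = 0; x < width; x++) {
      pixelLoc = rowLoc + x * 4;
      // Take the RGB values of this pixel (slice takes start and end indexes)
      pixel = data.slice(pixelLoc, pixelLoc + 3);
      row.push(pixel2binary(pixel));
    }
    binaryData.push(row);
  }
  return binaryData;
}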
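To see how the per-channel rule behaves, here is a small standalone sketch that classifies a few hand-picked colors with the same comparison. The names demoThreshold and classify, and the sample colors, are made up for illustration.

```javascript
const demoThreshold = [130, 130, 130];

// '0' = background (every channel brighter than its threshold), '1' = character.
const classify = rgb =>
  rgb.every((value, i) => value > demoThreshold[i]) ? '0' : '1';

console.log(classify([240, 240, 240])); // '0' light gray background
console.log(classify([30, 30, 30]));    // '1' dark character pixel
console.log(classify([255, 0, 0]));     // '1' pure red: bright in one channel only
console.log(classify([130, 200, 200])); // '1' 130 is not strictly greater than 130
```

One consequence worth noting: a saturated color counts as a character pixel even though it looks bright, because every channel has to exceed its threshold for a pixel to count as background.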

Drawing the binarized data (black on white)

function drawBinaryData(context, data, scale = 1) {
  const binary2pixel = binary =>
    binary === '0' ? [255, 255, 255, 255] : [0, 0, 0, 255];
  // Run an action scale times (repeats each pixel and each row)
  const repeatAction = (action) => {
    for (let i = 0; i < scale; i++) action();
  };
  const h = data.length;
  const w = data[0].length;
  let x, y;
  const pixelData = [];
  for (y = 0; y < h; y++) repeatAction(() => {
    for (x = 0; x < w; x++) repeatAction(() => {
      pixelData.push(...binary2pixel(data[y][x]));
    });
  });
  // Create an ImageData instance
  const imageData = new ImageData(
    Uint8ClampedArray.from(pixelData),
    w * scale,
    h * scale
  );
  return context.putImageData(imageData, 0, 0);
}

The CAPTCHA drawn with width and height both enlarged 4x:

image.png

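Saving samples from screenshots

Pick suitable CAPTCHAs and crop the characters out as screenshots.

image.png

The character 5 in the CAPTCHA above is not a good sample, because the crop would include pixels from another character at the lower right. Those could also be removed with a tool or with code.

image.png

Save samples for all characters. This means requesting CAPTCHA images repeatedly.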

image.png

Trimming the white edges of a character screenshot

function cutWhiteEdge(data) {
  let edge;
  // An edge is white when it exists, is non-empty, and contains only '0'
  const isWhiteEdge = () =>
    Boolean(edge) && edge.length > 0 && edge.every(binary => binary === '0');
  // Keep trimming while the current edge is white
  const cutEdgeContinuous = (resetEdge, cutEdge) => {
    const _resetEdge = () => (edge = resetEdge());
    for (_resetEdge(); isWhiteEdge(); cutEdge(), _resetEdge());
  };
  // Trim order: top, bottom, left, right
  // Top
  cutEdgeContinuous(
    () => data[0],
    () => data.shift()
  );
  // Bottom
  cutEdgeContinuous(
    () => data[data.length - 1],
    () => data.pop()
  );
  // Left
  cutEdgeContinuous(
    () => data.map(r => r[0]),
    () => data.forEach(r => r.shift())
  );
  // Right
  cutEdgeContinuous(
    () => data.map(r => r[r.length - 1]),
    () => data.forEach(r => r.pop())
  );
}
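As a quick illustration of what trimming does, here is a compact non-mutating variant applied to a tiny grid. The function trimWhiteEdges and the sample data are a sketch for illustration only, not the code used in the article.

```javascript
// Non-mutating variant: returns a new 2D array with all-'0' borders removed.
function trimWhiteEdges(data) {
  const isBlank = cells => cells.every(c => c === '0');
  const rows = data.map(r => [...r]);
  while (rows.length && isBlank(rows[0])) rows.shift();
  while (rows.length && isBlank(rows[rows.length - 1])) rows.pop();
  if (rows.length === 0) return rows; // the image was entirely white
  while (isBlank(rows.map(r => r[0]))) rows.forEach(r => r.shift());
  while (isBlank(rows.map(r => r[r.length - 1]))) rows.forEach(r => r.pop());
  return rows;
}

const padded = ['0000', '0110', '0100', '0000'].map(r => r.split(''));
console.log(trimWhiteEdges(padded).map(r => r.join('')));
// [ '11', '10' ]
```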

Undoing the scaling of the binarized data

function restoreDataScale(data, scale) {
  const scaleData = [];
  let x, y, row;
  const h = data.length;
  const w = data[0].length;
  for (y = 0; y < h; y += scale) {
    row = [];
    for (x = 0; x < w; x += scale) {
      row.push(data[y][x]);
    }
    scaleData.push(row);
  }
  return scaleData;
}
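The sampling above works because each original pixel was enlarged into a scale x scale block, so one cell per block is enough to recover it. A minimal round-trip sketch, with illustrative helper names scaleUp and scaleDown (scaleDown repeats the same sampling idea so the snippet is self-contained):

```javascript
// Nearest-neighbour upscale: every cell becomes a scale x scale block.
function scaleUp(data, scale) {
  const out = [];
  for (const row of data) {
    const wide = row.flatMap(cell => Array(scale).fill(cell));
    for (let i = 0; i < scale; i++) out.push([...wide]);
  }
  return out;
}

// Same sampling idea as restoreDataScale: keep one cell per block.
function scaleDown(data, scale) {
  const out = [];
  for (let y = 0; y < data.length; y += scale) {
    const row = [];
    for (let x = 0; x < data[0].length; x += scale) row.push(data[y][x]);
    out.push(row);
  }
  return out;
}

const glyph = [['1', '0'], ['1', '1']];
const big = scaleUp(glyph, 4);
console.log(big.length, big[0].length); // 8 8
console.log(JSON.stringify(scaleDown(big, 4)) === JSON.stringify(glyph)); // true
```

The round trip is only exact when the cropped data starts on a block boundary, which is one reason to trim the white edges before undoing the scale.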

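Saving the template string

This simply converts the processed binary array into a string, which is convenient to store (in a database, for example).

function binaryData2Template(data) {
  return data.map(r => r.join('')).join(' ');
}

image.png

The console on the right prints the template string, except that it separates rows with newlines.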
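For reference, a minimal sketch of the reverse direction and of the newline-separated form used for inspection. The helper names template2BinaryData and prettyPrint, and the tiny template, are made up for illustration.

```javascript
// Template string -> 2D array of '0'/'1' (rows are separated by a space).
const template2BinaryData = tpl => tpl.split(' ').map(row => row.split(''));

// 2D array -> multi-line string, handy for eyeballing a glyph in the console.
const prettyPrint = data => data.map(row => row.join('')).join('\n');

const tinyTemplate = '010 111 010'; // hypothetical 3x3 "plus" glyph
const parsed = template2BinaryData(tinyTemplate);
console.log(prettyPrint(parsed));
// 010
// 111
// 010
```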

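Reading the character screenshots

The sections above covered cropping and processing the character screenshots, but skipped the step of reading the screenshot files.

You could write code that reads the screenshot folder directly and processes all of them in one go.

When I did this step, I used an input[type=file] and selected one screenshot at a time by hand (I was short on time). Here is the code.

fileInput.addEventListener('change', e => {
  // Get the file
  if (fileInput.files.length === 0) return;
  const file = fileInput.files[0];
  const reader = new FileReader();
  reader.addEventListener('load', async e => {
    // e.target.result is the image as a base64 data URL
    // getImageData is async, so its result must be awaited
    const imageData = await getImageData(e.target.result);
    const binaryData = binarization(imageData);
    cutWhiteEdge(binaryData);
    // Undo the earlier 4x enlargement of the image
    const restoreData = restoreDataScale(binaryData, 4);
    const template = binaryData2Template(restoreData);
    // Write the template to the clipboard
    navigator.clipboard.writeText(template);
    // It could also be sent to an API and stored in a database...
  });
  reader.readAsDataURL(file);
});

The FileReader load event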

image.png

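Tuning the binarization threshold

After repeatedly fetching CAPTCHAs, binarizing them, and inspecting the output, I found that in some images a character was removed entirely or partly, because that character's color was also fairly bright.

03.png

For example, this CAPTCHA prints like this (the character S is fairly bright):

image.png

In that case the threshold needs adjusting (a little higher):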

const threshold = [140, 140, 140];

image.png

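3. Convolution comparison

The sections above described how to obtain character templates. Before running the convolution comparison, the templates for all characters need to be processed and saved (this is tedious work 😭).

Getting the templates

Here I simply define all the character templates as a constant.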

const CODE_TEMPLATES = {
  2: '0000001111100 0000111111110 0001110000111 0001100000011 0011100000011 0000000000011 0000000000110 0000000001110 0000000001100 0000000011000 0000000110000 0000011100000 0000111000000 0001110000000 0011100000000 0111000000000 0111111111110 1111111111110',
  3: '000001111000 000111111110 001110000110 001100000011 011100000011 000000000011 000000000110 000000001110 000011111000 000011111000 000000001100 000000001110 000000000110 110000000110 110000001100 111000011100 011111111000 001111100000',
  4: '0000000000111 0000000001110 0000000011110 0000000111110 0000000110110 0000001101110 0000011001100 0000110001100 0001110001100 0001100001100 0011000001100 0110000011100 1111111111111 1111111111111 0000000011000 0000000011000 0000000111000 0000000111000',
  5: '000111111111 000111111111 001100000000 001100000000 001100000000 001100000000 011011110000 011111111000 011100011100 000000001100 000000001110 000000001110 000000001100 110000001100 110000011100 111000111000 011111110000 001111100000',
  6: '0000001111 0000111111 0001111000 0011100000 0011000000 0110000000 0110111100 1111111110 1111000111 1110000011 1100000011 1100000011 1100000011 1100000011 1100000111 1110001110 0111111100 0011111000',
  7: '111111111111 111111111111 000000000110 000000000110 000000001100 000000011100 000000011000 000000110000 000000110000 000001100000 000011100000 000011000000 000111000000 000110000000 001100000000 011100000000 011000000000 111000000000',
  8: '000001111100 000011111110 000111000111 001110000011 001100000011 001100000011 001100000111 001110001110 000111111100 000111111100 011100001100 011000000110 110000000110 110000000110 110000001110 111000011100 011111111000 000111110000',
  9: '00001111000 00111111100 01110001110 01100000111 11100000011 11000000011 11000000011 11000000011 11100000111 01100001110 01111111110 00111100110 00000001100 00000001100 00000011000 00001110000 01111100000 01110000000',
  a: '00001111100 00111111110 01110000110 01100000111 00000000111 00011111110 01111111110 11100000110 11000000110 11000001110 11000011110 11111111100 01111101110',
  A: '000000000111000 000000000111000 000000001111000 000000001111000 000000011001100 000000011001100 000000110001100 000000110001100 000001100001100 000001100001110 000011000000110 000011111111110 000111111111110 001110000000110 001100000000111 011100000000011 011000000000011 111000000000011',
  b: '000110000000 000110000000 001110000000 001100000000 001100000000 001100000000 001101111000 011111111110 011110001110 011100000110 011000000111 011000000111 011000000111 111000000110 111000000110 111000001110 111000011100 111111111000 110111110000',
  B: '0001111111100 0011111111110 0011100000111 0011000000011 0011000000011 0011000000011 0011000000111 0111000001110 0111111111100 0111111111100 0110000001110 0110000000110 0110000000110 1110000000110 1100000001110 1100000011100 1111111111000 1111111110000',
  c: '00001111100 00011111110 00111000111 01100000011 01100000011 11100000000 11000000000 11000000000 11000000000 11100000111 01100001110 01111111100 00011110000',
  C: '000000111110000 000011111111100 000111100001110 000110000000110 001100000000110 001100000000111 011100000000000 011000000000000 011000000000000 011000000000000 011000000000000 111000000000000 011000000001100 011000000001100 011000000011000 001100000111000 001111111110000 000011111000000',
  d: '0000000000011 0000000000011 0000000000111 0000000000110 0000000000110 0000000000110 0000111100110 0011111111110 0011100011110 0110000001100 0110000001100 1110000001100 1100000001100 1100000001100 1100000011100 1110000011000 0110000111000 0111111111000 0011111011000',
  D: '00011111110000 00011111111100 00111000011110 00110000000110 00110000000111 00110000000011 00110000000011 00110000000011 01110000000011 01100000000111 01100000000110 01100000000110 01100000001110 11100000001100 11100000011100 11000001111000 11111111110000 11111111000000',
  e: '00001111100 00011111110 00110000111 01100000011 01100000011 11111111111 11111111111 11000000000 11000000000 11100000000 01110000110 01111111100 00011111000',
  E: '00011111111111 00011111111110 00111000000000 00111000000000 00110000000000 00110000000000 00110000000000 00110000000000 01111111111000 01111111111000 01100000000000 01100000000000 01100000000000 11100000000000 11100000000000 11000000000000 11111111111000 11111111111000',
  f: '000001111 000111110 000111000 001110000 001100000 001100000 111111100 111111100 001100000 011100000 011000000 011000000 011000000 011000000 011000000 111000000 110000000 110000000 110000000',
  F: '00011111111111 00011111111110 00111000000000 00111000000000 00110000000000 00110000000000 00110000000000 00110000000000 01111111111000 01111111111000 01100000000000 01100000000000 01100000000000 11100000000000 11100000000000 11000000000000 11000000000000 11000000000000',
  g: '0000011110011 0001111111111 0001110001111 0011100000111 0011000000110 0111000000110 0110000000110 0110000000110 0110000001110 0111000001100 0011000011100 0011111111100 0001111101100 0000000011100 0100000011000 1110000111000 0111111110000 0011111000000',
  G: '00000111111000 00001111111100 00011100001110 00110000000110 01110000000111 01100000000000 11100000000000 11000000000000 11000000000000 11000001111110 11000001111110 11000000000110 11000000001110 11000000001100 11100000001100 01110000011100 00111111111000 00011111100000',
  h: '000111000000 000110000000 000110000000 000110000000 000110000000 001110000000 001110111100 001101111110 001111000111 001100000111 001100000011 011100000111 011100000110 011000000110 011000000110 011000000110 011000001110 111000001110 110000001100',
  H: '0001100000000011 0001100000000011 0011100000000111 0011100000000110 0011000000000110 0011000000000110 0011000000000110 0011000000000110 0111111111111110 0111111111111100 0110000000001100 0110000000001100 0110000000001100 1110000000011100 1110000000011100 1100000000011000 1100000000011000 1100000000011000',
  j: '000000110 000000111 000000110 000000000 000000000 000000110 000001110 000001110 000001100 000001100 000001100 000001100 000011100 000011000 000011000 000011000 000011000 000011000 000111000 000110000 000110000 111110000 111100000',
  J: '0000000000011 0000000000011 0000000000011 0000000000011 0000000000111 0000000000110 0000000000110 0000000000110 0000000000110 0000000001110 0000000001110 0000000001100 0000000001100 1110000001100 1110000011100 0111000111000 0111111110000 0001111100000',
  k: '0000110000000 0001110000000 0001100000000 0001100000000 0001100000000 0001100000000 0001100001111 0011100011100 0011000111000 0011001110000 0011011100000 0011111000000 0011111000000 0111111100000 0110001100000 0110000110000 0110000111000 0110000011000 1110000011100',
  K: '0001100000001111 0001100000011100 0011100000111000 0011100001110000 0011000011100000 0011000111000000 0011001110000000 0011011100000000 0111111100000000 0111111100000000 0111101110000000 0111000110000000 0110000111000000 1110000011000000 1110000011100000 1100000001100000 1100000001110000 1100000000111000',
  m: '00111011110000111100 00111111111011111110 00111000011110000110 00110000011100000111 00110000001100000111 01110000011100000110 01110000011000000110 01100000011000000110 01100000011000000110 01100000011000000110 01100000111000001110 11100000111000001100 11000000110000001100',
  M: '00011100000000000111 00011100000000001111 00111100000000001111 00111100000000011110 00110110000000111110 00110110000000110110 00110110000001110110 00110110000001100110 01110111000011101110 01100011000011001100 01100011000110001100 01100011000110001100 01100011001100001100 11100011111100011100 11100001111000011000 11000001111000011000 11000001110000011000 11000001110000011000',
  n: '00110111110 00111111111 01111000111 01110000011 01100000011 01100000011 01100000011 01100000111 11100000110 11000000110 11000000110 11000000110 11000001110',
  N: '00011100000000111 00011100000000111 00011110000000110 00011110000000110 00011111000000110 00011011000000110 00111011100001110 00111001100001110 00110001110001100 00110000110001100 00110000111001100 00110000011001100 01110000011011100 01110000011111000 01100000001111000 01100000001111000 01100000000111000 11100000000111000',
  p: '0001101111000 0001111111110 0011110001110 0011100000110 0011000000111 0011000000111 0011000000110 0111000000110 0110000000110 0110000001110 0111000011100 0111111111000 0110111110000 1110000000000 1100000000000 1100000000000 1100000000000 1100000000000',
  P: '000111111111000 000111111111110 000110000000110 000110000000111 000110000000011 000110000000011 001110000000111 001110000000111 001100000001110 001111111111100 001111111110000 001100000000000 011100000000000 011100000000000 011000000000000 011000000000000 011000000000000 111000000000000',
  q: '000011110011 001111111111 001110001111 011100000110 011000000110 111000000110 110000000110 110000001110 110000001110 111000001100 011000011100 011111111100 001111101100 000000001100 000000011000 000000011000 000000011000 000000011000',
  Q: '00000111110000 00011111111100 00111100001110 00110000000110 01100000000110 01100000000111 11100000000111 11000000000111 11000000000111 11000000000111 11000000000110 11000000000110 11000000001110 11000000001100 11100000011100 01110000111000 01111111110000 00011111110000 00000000111000 00000000011100 00000000010000',
  r: '001110111 001111111 001110000 001100000 001100000 001100000 011100000 011000000 011000000 011000000 011000000 111000000 111000000',
  R: '00011111111000 00011111111100 00111000001110 00110000000110 00110000000111 00110000000111 00110000000110 01110000001110 01110000011100 01111111111000 01111111110000 01100000110000 01100000110000 11100000111000 11100000011000 11000000011000 11000000011100 11000000001100',
  s: '00001111100 00111111110 01110000111 01100000011 01110000000 00111110000 00011111100 00000011110 00000000110 11000000110 11100001110 01111111100 00111110000',
  S: '00000111111000 00001111111100 00011100001110 00111000000110 00110000000111 00110000000000 00110000000000 00011100000000 00001111000000 00000111110000 00000000111000 00000000001100 00000000001100 11000000001100 11000000011100 01110000111000 01111111111000 00011111100000',
  t: '0001100 0001100 0001100 1111111 1111111 0011000 0011000 0011000 0011000 0111000 0110000 0110000 0110000 0110000 0111100 0011100',
  T: '11111111111111 11111111111110 00000111000000 00000111000000 00000110000000 00000110000000 00000110000000 00000110000000 00001110000000 00001110000000 00001100000000 00001100000000 00001100000000 00001100000000 00011100000000 00011100000000 00011000000000 00011000000000',
  u: '011100000111 011000000110 011000000110 011000000110 011000000110 011000001110 111000001100 110000001100 110000001100 110000011100 111000111100 011111111100 001111011000',
  U: '000110000000011 001110000000011 001100000000111 001100000000110 001100000000110 001100000000110 011100000000110 011100000000110 011000000001110 011000000001100 011000000001100 011000000001100 011000000001100 111000000011100 011000000011000 011100001111000 001111111110000 000111111000000',
  v: '11100000011 01100000111 01100000110 01100001110 01100001100 00100011100 00110011000 00110110000 00110110000 00111100000 00011100000 00011000000 00011000000',
  V: '111000000000111 011000000000110 011000000001110 011000000001100 011000000011100 011100000011000 001100000111000 001100000110000 001100001110000 001100001100000 001100011100000 000110011000000 000110111000000 000110110000000 000111110000000 000111100000000 000011100000000 000011000000000',
  w: '111000001100000111 011000011100000110 011000011100001100 011000111100001100 011000110100011000 011001100100011000 011001100110111000 011011000110110000 011011000110110000 011110000111100000 001110000111100000 001100000011000000 001100000011000000',
  W: '111000000111000000111 111000000111000000110 011000001111000001110 011000001111000001100 011000001111000001100 011000011011000011100 011000011011000011000 011000110011000011000 011000110001000110000 011001110001100110000 011001100001101110000 011001100001101100000 011011000001101100000 011011000001111000000 011110000001111000000 001110000001111000000 001110000001110000000 001100000000110000000',
  x: '0001100000111 0001110000110 0000110001100 0000111011100 0000011111000 0000011110000 0000001100000 0000011110000 0000110110000 0001110111000 0011100011000 0111000011100 1110000001100',
  X: '00011100000000111 00001110000001110 00000110000011100 00000111000011000 00000011000111000 00000011101110000 00000011111100000 00000001111000000 00000001111000000 00000001110000000 00000011111000000 00000111011000000 00000110011100000 00001110001100000 00011100001110000 00111000000110000 00110000000111000 11110000000011000',
  y: '0001100000011 0001100000111 0001100000110 0001100001110 0001110001100 0000110011100 0000110011000 0000110111000 0000110110000 0000111100000 0000111100000 0000011000000 0000011000000 0000110000000 0000110000000 0001100000000 1111100000000 1110000000000',
  Y: '11100000000111 01100000001110 01100000001100 01110000011100 00110000111000 00110000110000 00111001110000 00011011100000 00011011000000 00011111000000 00001110000000 00001100000000 00001100000000 00001100000000 00011100000000 00011100000000 00011000000000 00011000000000',
  z: '001111111111 001111111111 000000001110 000000011100 000000111000 000001110000 000011100000 000111000000 000110000000 001100000000 011000000000 111111111100 111111111100',
  Z: '000111111111111 000111111111111 000000000001110 000000000001100 000000000011100 000000000011000 000000001110000 000000011100000 000000111000000 000000110000000 000001100000000 000011100000000 000111000000000 001110000000000 011100000000000 011000000000000 111111111111100 111111111111000', 
};

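Counting the effective pixels in a template

Counting effective pixels means counting the number of 1s in the template (0 is background, an ineffective pixel).

The count is used later when computing similarity.

This step can also be done when the template is created and the result saved to the database.

// CODE_TEMPLATES is a plain object, so iterate over its keys
const tplEffectPoints = Object.keys(CODE_TEMPLATES).reduce((calc, code) => {
  // Count the 1s in each character template
  calc[code] = CODE_TEMPLATES[code].split('').filter(c => c === '1').length;
  return calc;
}, {});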
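To show what the resulting lookup looks like, here is the same count on a miniature template map. MINI_TEMPLATES and its two glyphs are hypothetical stand-ins for CODE_TEMPLATES, and Object.fromEntries is used here only as an alternative way to build the object.

```javascript
// Hypothetical miniature template map, in the same format as CODE_TEMPLATES.
const MINI_TEMPLATES = {
  plus: '010 111 010',
  bar: '111 000 000',
};

// Spaces between rows are ignored because only '1' characters are counted.
const countOnes = tpl => [...tpl].filter(c => c === '1').length;

const miniEffectPoints = Object.fromEntries(
  Object.entries(MINI_TEMPLATES).map(([code, tpl]) => [code, countOnes(tpl)])
);

console.log(miniEffectPoints); // { plus: 5, bar: 3 }
```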

What is convolution comparison

未命名.gif

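I made a GIF to illustrate. Convolution comparison, which I used to call scan comparison, amounts to sliding the template across the image (left to right, top to bottom) and measuring how much the image's effective pixels (the points equal to 1) overlap with the template's effective pixels. That overlap ratio is the similarity.

It is worth thinking about why only the overlap of effective pixels is measured, and not the ineffective pixels.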
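One plausible answer, sketched on made-up data: interference lines, leftover noise, and neighbouring characters add extra dark pixels inside the template's bounding box. Counting only the template's effective pixels ignores them, whereas comparing every cell would lower the score. The grids and variable names below are purely illustrative.

```javascript
const toGrid = s => s.split(' ').map(row => row.split(''));

const tplGrid = toGrid('010 111 010'); // hypothetical "plus" glyph, 5 effective pixels
const patch = toGrid('011 111 010');   // same glyph with one stray dark pixel

let effective = 0;  // '1' cells in the template
let overlap = 0;    // template '1' cells that are also '1' in the patch
let equalCells = 0; // cells where template and patch agree, background included
let totalCells = 0;

tplGrid.forEach((row, i) =>
  row.forEach((cell, j) => {
    totalCells++;
    if (cell === patch[i][j]) equalCells++;
    if (cell === '1') {
      effective++;
      if (patch[i][j] === '1') overlap++;
    }
  })
);

console.log(overlap / effective);     // 1
console.log(equalCells / totalCells); // about 0.89
```

The flip side is that a character whose pixels are a superset of another's will also match the smaller template fully, so the result still needs a post-processing step.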

Implementing the convolution comparison

// Returns the matches (character, position, similarity), sorted by x
function convolution(binaryData, threshold = 1) {
  const codes = Object.keys(CODE_TEMPLATES);
  const h = binaryData.length;
  const w = binaryData[0].length;
  const matches = [];
  let code, tplData, tplH, tplW;

  function doConvolution() {
    let x, y, colLastIdx, rowLastIdx;

    // Returns the position and the overlap ratio (similarity)
    const compare = (x, y, code) => {
      let effectivePointNum = 0;
      for (let i = 0; i < tplH; i++) {
        for (let j = 0; j < tplW; j++) {
          if (tplData[i][j] === '1') {
            if (tplData[i][j] === binaryData[i + y][j + x]) {
              effectivePointNum++;
            }
          }
        }
      }
      // similarity = overlapping points / effective points in the template
      const similarity = effectivePointNum / tplEffectPoints[code];
      return { x, y, similarity };
    };

    // Scan direction: left to right, top to bottom
    for (y = 0, rowLastIdx = h - tplH; y <= rowLastIdx; y++) {
      for (x = 0, colLastIdx = w - tplW; x <= colLastIdx; x++) {
        const result = compare(x, y, code);
        if (result.similarity >= threshold) {
          matches.push({ ...result, code });
        }
      }
    }
  }

  for (let i = 0; i < codes.length; i++) {
    code = codes[i];
    // Convert the template into a 2D array
    tplData = CODE_TEMPLATES[code].split(' ').map(row => row.split(''));
    tplH = tplData.length;
    tplW = tplData[0].length;
    doConvolution();
  }
  // Sort by position (x axis)
  matches.sort((a, b) => a.x - b.x);
  return matches;
} 

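Other processing

Before the convolution comparison, the CAPTCHA must be binarized.

The binarized image may need further processing, such as removing noise points and interference lines.

Here I only do simple noise removal.

Removing noise

Noise points are dark dots placed at random on the CAPTCHA image. If we filter by a brightness threshold alone, noise points are easily mistaken for effective pixels.

image.png

Features of noise

Generally, noise points are random and not contiguous.

A simple test is used here: if an effective point (a point equal to 1) has no other effective point directly above, below, left, or right of it, it is treated as noise.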

function denoising(binData) {
  const h = binData.length;
  const w = binData[0].length;
  const isEffectivePoint = (x, y) => binData[y][x] === '1';
  const checkAround = (x, y) => {
    // Boundary checks
    const checkTop = y > 0;
    const checkBottom = y < h - 1;
    const checkLeft = x > 0;
    const checkRight = x < w - 1;
    
    return (
      (checkTop && isEffectivePoint(x, y - 1)) ||
      (checkBottom && isEffectivePoint(x, y + 1)) ||
      (checkLeft && isEffectivePoint(x - 1, y)) || 
      (checkRight && isEffectivePoint(x + 1, y))
    );
  };
  
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      if (isEffectivePoint(x, y) && !checkAround(x, y)) {
        // Mark the noise point as ineffective
        binData[y][x] = '0';
      }
    }
  } 
}
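A small worked example of the same four-neighbour rule, written as a non-mutating variant so the input can be compared with the output. The function name denoise and the sample grid are illustrative only.

```javascript
// Returns a denoised copy: a '1' with no '1' directly above, below,
// left, or right of it is replaced by '0'.
function denoise(grid) {
  const isOne = (x, y) => grid[y]?.[x] === '1'; // out-of-range reads are false
  return grid.map((row, y) =>
    row.map((cell, x) => {
      if (cell !== '1') return cell;
      const hasNeighbour =
        isOne(x, y - 1) || isOne(x, y + 1) || isOne(x - 1, y) || isOne(x + 1, y);
      return hasNeighbour ? '1' : '0';
    })
  );
}

const noisy = ['10000', '00110', '00100', '00001'].map(r => r.split(''));
console.log(denoise(noisy).map(r => r.join('')));
// [ '00000', '00110', '00100', '00000' ]
```

The two isolated points in the corners are removed, while the three connected points survive. Two noise points that happen to touch each other would also survive this rule.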

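Post-processing

The result of the convolution comparison does not always give us what we want.

image.png

Recognizing the CAPTCHA image above produces these matches:

image.png

The result not only has more than 4 matches, it also includes an extra r. This is because, in this font, the character P contains all the effective pixels of the character r.

So if an r is recognized at the position of a P, the r should be discarded.

Here are all the characters in this font that have a containment relationship:

const containMap = {
  Q: { C: -1 }, // C's x is 1 less than Q's
  E: { F: 0 },
  V: { v: 1 },
  y: { v: 2 },
  m: { r: 0 },
  p: { r: 0 },
};

Post-processing based on the containment relationships: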

function afterEffect(matches) {
  if (matches.length <= 4) return;
  // Group matches by character for easier processing: {e: [match], r: [match, match], ...}
  const codeMap = matches.reduce((map, item) => {
    const { code } = item;
    (map[code] = map[code] || []).push(item);
    return map;
  }, {});

  Object.keys(containMap).forEach(code => {
    if (!codeMap[code]) return;
    Object.keys(containMap[code]).forEach(containCode => {
      if (!codeMap[containCode]) return;
      // x offset between the containing character and the contained one
      const offset = containMap[code][containCode];
      codeMap[code].forEach(Q => {
        let idx = codeMap[containCode].findIndex(C => C.x === Q.x + offset);
        if (idx > -1) {
          // Remove the contained match from codeMap, at the index just found
          const [C] = codeMap[containCode].splice(idx, 1);
          // Remove it from matches
          idx = matches.findIndex(item => item === C);
          matches.splice(idx, 1);
        }
      });
    });
  });
}
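To illustrate the effect on a concrete result, here is a compact non-mutating sketch of the same idea applied to made-up matches. The names dropContained and demoContainMap and the x positions are hypothetical; like the function above, it compares x positions only.

```javascript
// Subset of the containment table: outer character -> { inner character: x offset }.
const demoContainMap = { p: { r: 0 }, Q: { C: -1 } };

// Returns a new array without matches that sit inside a containing character.
function dropContained(matches, containMap) {
  return matches.filter(inner =>
    !matches.some(outer => {
      const offset = containMap[outer.code]?.[inner.code];
      return offset !== undefined && inner.x === outer.x + offset;
    })
  );
}

const rawMatches = [
  { code: 'e', x: 3 },
  { code: 'p', x: 20 },
  { code: 'r', x: 20 }, // false positive inside 'p'
  { code: 'r', x: 41 }, // a genuine 'r'
  { code: '7', x: 60 },
];
console.log(dropContained(rawMatches, demoContainMap).map(m => m.code).join(''));
// epr7
```

Only the r that lines up with the p is dropped; the r at a different position is kept.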

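Post-processing can have many steps (only one is done here) and depends on the specific case; the simpler the better.

Finally, extract the CAPTCHA text from the matches.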

const verifyCodes = matches.map(item => item.code).join('');

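Replaying and verifying

Before reading the CAPTCHA text, check the number of items in matches once more. If it is clearly wrong, the processing still has problems. Save the result of every processing step so it can be replayed later and the faulty step improved.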
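A minimal sketch of that check, assuming the CAPTCHA always has 4 characters as in the example above. The names extractCode, EXPECTED_LENGTH, and failureLog, and the shape of the saved record, are assumptions for illustration; where the snapshots are actually stored is up to you.

```javascript
const EXPECTED_LENGTH = 4; // assumption: this CAPTCHA always has 4 characters

// Returns the recognised text, or null after saving a snapshot when the count is off.
function extractCode(matches, steps, failureLog) {
  if (matches.length !== EXPECTED_LENGTH) {
    // Keep everything needed to replay the failure later.
    failureLog.push({ time: Date.now(), matches, ...steps });
    return null;
  }
  return matches.map(item => item.code).join('');
}

const failureLog = [];
const good = [{ code: 'e' }, { code: 'P' }, { code: '7' }, { code: 'k' }];
const bad = [...good, { code: 'r' }];

console.log(extractCode(good, {}, failureLog));                // eP7k
console.log(extractCode(bad, { binaryData: [] }, failureLog)); // null
console.log(failureLog.length);                                // 1
```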

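Likewise, if the submitted CAPTCHA fails verification, save the results of every step together with the CAPTCHA image for later investigation and tuning.

Handling verification failures

Verification can fail for two reasons: our processing may still have problems, or the site may keep adjusting how the CAPTCHA is generated.

When recognition fails, allow a limited number of retries.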
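A bounded retry loop could look like the sketch below. The function loginWithRetry and its recognise and submit parameters are hypothetical: recognise is assumed to fetch a fresh CAPTCHA and resolve to the recognised text (or null), and submit is assumed to resolve to true when the site accepts it.

```javascript
// Try up to maxAttempts times; recognise() and submit() are caller-supplied async functions.
async function loginWithRetry(recognise, submit, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const code = await recognise(); // fetch a fresh CAPTCHA and recognise it
    if (code && (await submit(code))) return { ok: true, attempt };
  }
  return { ok: false, attempt: maxAttempts };
}

// Usage with stand-in functions: the first guess is rejected, the second accepted.
const guesses = ['abcd', 'Ef3k'];
loginWithRetry(
  async () => guesses.shift(),
  async code => code === 'Ef3k'
).then(result => console.log(result)); // { ok: true, attempt: 2 }
```

Keeping the attempt limit small avoids hammering the server when the recognition itself is broken.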