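PyCharm remote interpreter: cannot find declaration
Symptom:
"Cannot find declaration" with a remote interpreter makes debugging very inconvenient.
Fix:
- In the project interpreter settings, change the interpreter path from /bin/python3 to /bin/python. A small detail, but an annoying trap.
PyCharm then downloads the code of the remote environment. How long this takes depends on the network and the amount of data. The remote network was poor in this case, so it took a painful 30-plus minutes. Be patient.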
Error when evaluating a COCO dataset
Traceback (most recent call last):
File "/root/dxq/question-split-mask-rcnn/doc/evaluater.py", line 15, in <module>
from pycocotools.coco import COCO
File "/root/anaconda3/lib/python3.7/site-packages/pycocotools-2.0-py3.7-linux-x86_64.egg/pycocotools/coco.py", line 55, in <module>
from . import mask as maskUtils
File "/root/anaconda3/lib/python3.7/site-packages/pycocotools-2.0-py3.7-linux-x86_64.egg/pycocotools/mask.py", line 3, in <module>
import pycocotools._mask as _mask
File "__init__.pxd", line 918, in init pycocotools._mask
ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject
The environment was set up incorrectly...
pip uninstall numpy
pip install numpy==1.16.2
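This is a compatibility problem between pycocotools and numpy: the compiled pycocotools extension expects a different numpy binary layout than the installed numpy provides. Reinstalling numpy pinned to 1.16.2, as above, fixed it.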
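Annoying. There were plenty of other pitfalls besides this one, and I stepped into all of them... Without a reasonable understanding of the COCO format, this would not have been easy.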
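After reinstalling numpy, it can be worth confirming that the compiled extension imports before starting a multi-hour evaluation. The following is a minimal sketch, not part of the original setup: the helper name `import_error` is made up for illustration, and the module name `pycocotools._mask` is taken from the traceback above.

```python
import importlib


def import_error(module_name):
    """Return None if the module imports cleanly, else a short error string.

    The numpy/pycocotools mismatch above surfaces as a ValueError raised
    at import time, so ValueError is caught alongside ImportError.
    """
    try:
        importlib.import_module(module_name)
    except (ImportError, ValueError) as exc:
        return "{}: {}".format(type(exc).__name__, exc)
    return None


if __name__ == "__main__":
    # Module names come from the traceback above.
    for name in ("numpy", "pycocotools._mask"):
        problem = import_error(name)
        print(name, "OK" if problem is None else problem)
```

If the second line still reports the "numpy.ufunc size changed" message, the numpy version and the pycocotools build still do not match.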
Interpreting the pycocotools evaluation output
Average Precision (AP):
  AP            % AP at IoU=.50:.05:.95 (primary challenge metric)
  AP IoU=.50    % AP at IoU=.50 (PASCAL VOC metric)
  AP IoU=.75    % AP at IoU=.75 (strict metric)
AP Across Scales:
  AP small      % AP for small objects: area < 32**2
  AP medium     % AP for medium objects: 32**2 < area < 96**2
  AP large      % AP for large objects: area > 96**2
Average Recall (AR):
  AR max=1      % AR given 1 detection per image
  AR max=10     % AR given 10 detections per image
  AR max=100    % AR given 100 detections per image
AR Across Scales:
  AR small      % AR for small objects: area < 32**2
  AR medium     % AR for medium objects: 32**2 < area < 96**2
  AR large      % AR for large objects: area > 96**2
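The twelve metrics above are also available programmatically: after `COCOeval.summarize()`, pycocotools stores them in `COCOeval.stats` in the same order in which they are printed. The sketch below attaches names to such an array and reproduces the small/medium/large split. The short names (`AP50`, `AR_max10`, ...) and both helper functions are labels chosen here for illustration, not part of pycocotools.

```python
# Names for the 12 values in COCOeval.stats, in printed order.
# The names themselves are illustrative, not defined by pycocotools.
COCO_STAT_NAMES = [
    "AP", "AP50", "AP75", "AP_small", "AP_medium", "AP_large",
    "AR_max1", "AR_max10", "AR_max100", "AR_small", "AR_medium", "AR_large",
]

SMALL_MAX_AREA = 32 ** 2    # 1024 pixels
MEDIUM_MAX_AREA = 96 ** 2   # 9216 pixels


def size_bucket(area):
    """Classify an object by pixel area using the thresholds listed above.

    Simplification: an area exactly on a boundary is put in the larger bucket.
    """
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def label_stats(stats):
    """Turn a 12-element stats sequence into a {name: value} dict."""
    values = [float(v) for v in stats]
    if len(values) != len(COCO_STAT_NAMES):
        raise ValueError("expected {} values, got {}".format(
            len(COCO_STAT_NAMES), len(values)))
    return dict(zip(COCO_STAT_NAMES, values))
```

With a finished evaluation object, `label_stats(coco_eval.stats)["AP50"]` would give the PASCAL VOC style metric directly.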
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
DETECTION_MIN_CONFIDENCE = 0.5
-----------------------------------------------------------------------------------
20200522T1052_0601.h5
100%|█████████████████████████████████████| 2346/2346 [3:12:10<00:00, 4.57s/it]
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.384
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.626
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.420
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.148
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.291
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.386
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.184
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.436
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.452
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.183
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.364
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.449
Prediction time: 3965.8749873638153. Average 1.6904837968302708/image
Total time: 11549.180015802383
-----------------------------------------------------------------------------------
20200522T1052_0656.h5
100%|█████████████████████████████████████| 2346/2346 [3:34:07<00:00, 5.06s/it]
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.381
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.621
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.415
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.154
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.290
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.382
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.183
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.432
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.448
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.187
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.368
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.443
Prediction time: 4758.105183124542. Average 2.028177827418816/image
-----------------------------------------------------------------------------------
20200211T1100_0789.h5
100%|██████████| 2346/2346 [3:58:21<00:00, 6.23s/it]
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.288
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.494
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.302
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.132
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.215
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.288
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.144
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.351
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.364
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.149
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.284
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.361
Prediction time: 5515.610037565231. Average 2.351069922235819/image
Total time: 14321.094527959824
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
DETECTION_MIN_CONFIDENCE = 0
-----------------------------------------------------------------------------------
20200522T1052_0601.h5
100%|██████████| 2346/2346 [4:04:43<00:00, 5.23s/it]
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.397
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.653
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.430
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.156
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.302
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.399
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.191
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.454
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.471
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.192
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.381
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.468
Prediction time: 5827.201899766922. Average 2.4838882778205122/image
Total time: 14703.1005589962
-----------------------------------------------------------------------------------
20200211T1100_0789.h5
100%|██████████| 2346/2346 [3:50:37<00:00, 5.99s/it]
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.303
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.525
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.313
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.146
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.226
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.304
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.155
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.376
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.389
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.168
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.304
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.387
Prediction time: 4823.234839677811. Average 2.0559398293596804/image
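- Evaluate the model from the previous training run and the models from this run, then pick the one to deploy. Selection criteria: train loss, val loss, AP, AP50, AP75, followed by visual inspection.
  - confidence = 0.5: checkpoints 601 and 656 from this run, plus 789 from the previous run.
  - confidence = 0: checkpoints 601 and 789. (From now on this setting will mainly be used to compare different network architectures, because it is the more uniform standard.)
- Compare AP and the other metrics for the same model at different confidence thresholds.
  - confidence = 0 scores higher than confidence = 0.5 (AP 0.397 vs. 0.384 for 601, 0.303 vs. 0.288 for 789).
- Compare different models at the same confidence threshold: does this run beat the earlier model?
  - confidence = 0.5: 601 > 656 > 789 from the previous run (AP 0.384 > 0.381 > 0.288).
- In this experiment train loss was a more useful reference than val loss. For the next run, saving the checkpoint with the best train loss is enough.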
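Comparing checkpoints by eye across several log dumps is error-prone. The following is a rough sketch of how the summary lines could be parsed and ranked automatically. It assumes the logs use exactly the pycocotools summary format shown above; `parse_summary` and `rank_by_ap` are hypothetical helper names.

```python
import re

# Matches one pycocotools summary line, tolerating variable spacing.
SUMMARY_LINE = re.compile(
    r"Average (?:Precision|Recall)\s+\((AP|AR)\)\s+@\[\s*IoU=([\d.:]+)\s*\|"
    r"\s*area=\s*(\w+)\s*\|\s*maxDets=\s*(\d+)\s*\]\s*=\s*(-?[\d.]+)"
)


def parse_summary(text):
    """Map (kind, iou, area, max_dets) -> value for every summary line."""
    metrics = {}
    for match in SUMMARY_LINE.finditer(text):
        kind, iou, area, max_dets, value = match.groups()
        metrics[(kind, iou, area, int(max_dets))] = float(value)
    return metrics


def rank_by_ap(runs):
    """Sort run names by the primary COCO metric, best first.

    `runs` maps a run name to the dict returned by parse_summary.
    """
    key = ("AP", "0.50:0.95", "all", 100)
    return sorted(runs, key=lambda name: runs[name][key], reverse=True)
```

Feeding the log text of each checkpoint to `parse_summary` and the results to `rank_by_ap` gives the ordering directly. Visual inspection remains a separate step.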
Error when exporting the model
~/anaconda3/envs/dl/lib/python3.6/site-packages/keras/engine/saving.py in load_weights_from_hdf5_group_by_name(f, layers, skip_mismatch, reshape)
1147 ' has shape {}'.format(symbolic_shape) +
1148 ', but the saved weight has shape ' +
-> 1149 str(weight_values[i].shape) + '.')
1150 else:
1151 weight_value_tuples.append((symbolic_weights[i],
ValueError: Layer #391 (named "mrcnn_bbox_fc"), weight <tf.Variable 'mrcnn_bbox_fc/kernel:0' shape=(1024, 48) dtype=float32_ref> has shape (1024, 48), but the saved weight has shape (1024, 68).
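Cause: the number of output classes was configured incorrectly. The current configuration builds a bbox head for 48/4 = 12 classes, whereas the saved weights were trained with 68/4 = 17 classes. Changing the class-count parameter to match the saved model fixes it.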
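The arithmetic in the error message can be turned into a quick pre-flight check. This is only a sketch: it assumes, as the explanation above does, that the bbox head has 4 outputs per class, and both function names are hypothetical.

```python
def classes_from_bbox_units(units, coords_per_box=4):
    """Infer the class count from the width of the bbox regression layer."""
    if units % coords_per_box != 0:
        raise ValueError(
            "{} is not a multiple of {}".format(units, coords_per_box))
    return units // coords_per_box


def check_num_classes(configured_classes, saved_kernel_shape):
    """Return None if config and weights agree, else a description.

    saved_kernel_shape is the shape reported for the saved weight,
    e.g. (1024, 68) in the error above.
    """
    saved_classes = classes_from_bbox_units(saved_kernel_shape[-1])
    if saved_classes != configured_classes:
        return "config has {} classes but the saved weights have {}".format(
            configured_classes, saved_classes)
    return None
```

For the failure above, `check_num_classes(12, (1024, 68))` reports the mismatch, and `check_num_classes(17, (1024, 68))` returns `None`.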
detectron2 training interrupted
Symptom:
Traceback (most recent call last):
File "train.py", line 46, in <module>
trainer.train()
File "/root/dxq/detectron2/detectron2/engine/defaults.py", line 401, in train
super().train(self.start_iter, self.max_iter)
File "/root/dxq/detectron2/detectron2/engine/train_loop.py", line 132, in train
self.run_step()
File "/root/dxq/detectron2/detectron2/engine/train_loop.py", line 209, in run_step
data = next(self._data_loader_iter)
File "/root/dxq/detectron2/detectron2/data/common.py", line 142, in __iter__
for d in self.dataset:
File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 838, in _next_data
return self._process_data(data)
File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
data.reraise()
File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
OSError: Caught OSError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/root/dxq/detectron2/detectron2/data/common.py", line 41, in __getitem__
data = self._map_func(self._dataset[cur_idx])
File "/root/dxq/detectron2/detectron2/utils/serialize.py", line 23, in __call__
return self._obj(*args, **kwargs)
File "/root/dxq/detectron2/detectron2/data/dataset_mapper.py", line 77, in __call__
image = utils.read_image(dataset_dict["file_name"], format=self.img_format)
File "/root/dxq/detectron2/detectron2/data/detection_utils.py", line 120, in read_image
return convert_PIL_to_numpy(image, format)
File "/root/dxq/detectron2/detectron2/data/detection_utils.py", line 57, in convert_PIL_to_numpy
image = image.convert(conversion_format)
File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/PIL/Image.py", line 860, in convert
self.load()
File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/PIL/ImageFile.py", line 231, in load
"(%d bytes not processed)" % len(b))
OSError: image file is truncated (4 bytes not processed)
Fix:
Add the following lines to detection_utils.py:
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
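Setting `LOAD_TRUNCATED_IMAGES` lets Pillow load damaged files instead of raising, so the damaged images stay in the training set. An alternative is to scan the dataset once and list the files that fail to decode. The sketch below is one way to do that. `find_unreadable_images` is a hypothetical helper, and the `load` parameter exists only so the decoding step can be swapped out. It only catches `OSError`, the exception type seen in the traceback above.

```python
def find_unreadable_images(paths, load=None):
    """Return [(path, error message)] for every file that fails to decode.

    By default each file is fully decoded with Pillow. Run this with
    ImageFile.LOAD_TRUNCATED_IMAGES left at False, otherwise truncated
    files will not raise.
    """
    if load is None:
        from PIL import Image  # imported lazily; requires Pillow

        def load(path):
            with Image.open(path) as img:
                img.load()  # force a full decode, not just the header

    bad = []
    for path in paths:
        try:
            load(path)
        except OSError as exc:
            bad.append((path, str(exc)))
    return bad
```

The resulting list can then be used to re-download or drop the affected files before training.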
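Very poor detection on one class
Symptom:
- Detection looks acceptable for the other classes.
- One class alone is detected very poorly, and many instances are missed entirely.
- The loss curve is already very flat.
- With the confidence threshold lowered to 0, the bboxes for that class overlap heavily.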
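"Overlap heavily" can be made measurable by counting how many pairs of predicted boxes for the class exceed an IoU threshold. The following is a minimal sketch under the assumption that boxes are given as `(x1, y1, x2, y2)` corner coordinates. The function names and the 0.5 default threshold are illustrative choices.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_overlapping_pairs(boxes, threshold=0.5):
    """Count box pairs whose IoU is at least `threshold`."""
    count = 0
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if iou(boxes[i], boxes[j]) >= threshold:
                count += 1
    return count
```

Running this per image on the predictions of the problem class, before and after further training, gives a number to track instead of a visual impression.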
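Hypotheses:
- First suspicion: training has not actually converged. The features of this class are genuinely more complex and need more training.
- Other classes in the data are interfering.
Steps taken:
- Continue training, raising iterations from 30000 to 100000.
- Remove some less important classes to rule out interference, then retrain.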
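Verification:
It turned out that training simply had not gone far enough. Brute force works wonders: training longer was enough.
The odd part is that the loss was barely changing any more, so why did longer training help?
Guess: even a change of about 0.01 in the loss may have a considerable effect.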
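A loss curve that looks flat on a plot can still be drifting down by a few percent. One way to check is to compare the mean loss over the first and last windows of iterations instead of judging the curve by eye. The sketch below does that. The helper names and the windowing scheme are assumptions, not something taken from the training code.

```python
def window_means(losses, window):
    """Mean loss over consecutive, non-overlapping windows."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(losses[i:i + window]) / window
        for i in range(0, len(losses) - window + 1, window)
    ]


def relative_drop(losses, window):
    """Fractional decrease from the first window mean to the last one.

    Assumes the first window mean is non-zero.
    """
    means = window_means(losses, window)
    if len(means) < 2:
        raise ValueError("need at least two full windows")
    return (means[0] - means[-1]) / means[0]
```

For example, a loss moving from 0.30 to 0.29 is a drop of 0.01 in absolute terms but about 3% in relative terms, which is easy to overlook on a plot scaled for the early, much larger losses.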