Well, when I mentioned “puzzling”, I really meant it!
In this article, we will look at how the Google Vision API (and other deep learning vision APIs) handle images that have been broken into pieces and “puzzlified”.
The previous article, “Hacking Deep Learning Network”, tested how well a DNN could hold on to the correct prediction under different levels of noise, and offered some suggestions on how to pre-process the images to overcome the issues.
Let’s go further with more “messed-up” images and see how different vision APIs handle these types of images.
First, we will start with a simple rotation:
S = imread('cat.jpg');
S2 = imrotate(S, 45);
imwrite(S2, 'test.jpg');
The image “test.jpg” was then uploaded manually to different deep learning vision APIs to compare the results.
All the results are arranged in the following order:
Looks like Clarifai and Google perform better at recognizing the rotated image?
Next, we tried to confuse the network by “puzzling” the image, swapping the upper part of the image with the bottom part.
S3 = [S(151:$,:,:); S(1:150,:,:)];
imwrite(S3, 'test_updown.jpg');
All 3 APIs seem to work fine, perhaps because the face of the cat is still visible?
And then swapping the left and right parts:
S4 = [S(:,201:$,:) S(:,1:200,:)];
imwrite(S4, 'test_leftright.jpg');
Clarifai seems confused in this case, possibly because the face is split in two?
Now “puzzlify” the image into a 2×2 puzzle:
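The two swaps above hard-code the split points (row 150 and column 200), which presumably sit at the middle of this particular image. As a minimal sketch, assuming the same image toolbox functions used in this post, the split points could instead be computed from the image size so the same lines work for other images. The variable names r and c and the output file names are made up for illustration.

```scilab
// Sketch only: same swaps as above, with the split points derived from
// the image size instead of being hard-coded.
S = imread('cat.jpg');
R = size(S, 1);        // image height in pixels
C = size(S, 2);        // image width in pixels
r = floor(R / 2);      // split row (assumed: the middle of the image)
c = floor(C / 2);      // split column (assumed: the middle of the image)

// Swap the upper and lower parts
S3 = [S(r+1:$,:,:); S(1:r,:,:)];
imwrite(S3, 'test_updown_auto.jpg');      // hypothetical file name

// Swap the left and right parts
S4 = [S(:,c+1:$,:) S(:,1:c,:)];
imwrite(S4, 'test_leftright_auto.jpg');   // hypothetical file name
```

Changing r or c moves the cut, which would be one way to check whether the result depends on where the face gets split.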
// Puzzle size: n x n
n = 2;

// Initialization
S5 = uint8(zeros(S));
R = size(S, 1);                 // image height
C = size(S, 2);                 // image width
Rn = floor(R/n);                // tile height (floor keeps the indices integer)
Cn = floor(C/n);                // tile width; leftover pixels stay black in S5
ind = grand(1, "prm", 1:n^2);   // random order of the n^2 tiles

// Break the image into pieces
cnt = 1;
Sn = list();
for cntR = 1:n
    for cntC = 1:n
        Sn(cnt) = S((cntR-1)*Rn+1:cntR*Rn, (cntC-1)*Cn+1:cntC*Cn, :);
        cnt = cnt + 1;
    end
end

// Combine the pieces into the puzzle, in shuffled order
cnt = 1;
for cntR = 1:n
    for cntC = 1:n
        S5((cntR-1)*Rn+1:cntR*Rn, (cntC-1)*Cn+1:cntC*Cn, :) = Sn(ind(cnt));
        cnt = cnt + 1;
    end
end

imshow(S5);
imwrite(S5, 'test_puzzle22.jpg');
Microsoft Azure and Google are still able to “see” this as a cat.
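The larger puzzles that follow use the same steps with a bigger n. One way to avoid editing n by hand each time is to wrap the steps in a function. This is only a sketch under a few assumptions: the function name puzzlify and the output file names are made up for illustration, tile sizes are rounded down so that any leftover rows and columns are cropped, and the tile indices are passed to grand as a column vector.

```scilab
// Hypothetical helper (not part of the original experiment):
// shuffle the tiles of image S into a random n x n puzzle.
function P = puzzlify(S, n)
    R = size(S, 1);
    C = size(S, 2);
    Rn = floor(R / n);                  // tile height, rounded down
    Cn = floor(C / n);                  // tile width, rounded down
    P = S(1:n*Rn, 1:n*Cn, :);           // crop to a whole number of tiles
    ind = grand(1, "prm", (1:n^2)');    // random permutation of the tile indices
    for k = 1:n^2
        dr = floor((k - 1) / n);        // destination tile row (0-based)
        dc = modulo(k - 1, n);          // destination tile column (0-based)
        sr = floor((ind(k) - 1) / n);   // source tile row (0-based)
        sc = modulo(ind(k) - 1, n);     // source tile column (0-based)
        P(dr*Rn+1:(dr+1)*Rn, dc*Cn+1:(dc+1)*Cn, :) = S(sr*Rn+1:(sr+1)*Rn, sc*Cn+1:(sc+1)*Cn, :);
    end
endfunction

// Usage: one puzzle per size tried in this post
S = imread('cat.jpg');
for n = [2 3 5 15]
    imwrite(puzzlify(S, n), 'test_puzzle_' + string(n) + '.jpg');
end
```

Since the permutation is random, each run gives a different puzzle, so an API's answer for a given n may vary between runs.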
3 x 3
Azure starts to overtake Google in this case.
5 x 5
Only Azure can see it now.
15 x 15
ok, this is too much. 🙂
Important: Different APIs might use different methods for their networks and focus on different recognition objectives. For that reason the comparison above might not be fair; just treat this as a fun way of playing with DNNs.