Rethinking the Inception Architecture for Computer Vision

Architecture Overview (Based on the Paper)
| Layer | Filters | Filter Size | Stride | Padding | Size of Output Feature Map |
|---|---|---|---|---|---|
| Input | - | - | - | - | 299 x 299 x 3 |
| Convolution 1 | 32 | 3 x 3 | 2 | - | 149 x 149 x 32 |
| Convolution 2 | 32 | 3 x 3 | 1 | - | 147 x 147 x 32 |
| Convolution 3 | 64 | 3 x 3 | 1 | 1 | 147 x 147 x 64 |
| Max Pool 1 | - | 3 x 3 | 2 | - | 73 x 73 x 64 |
| Convolution 4 | 80 | 3 x 3 | 1 | - | 71 x 71 x 80 |
| Convolution 5 | 192 | 3 x 3 | 2 | - | 35 x 35 x 192 |
| Convolution 6 | 288 | 3 x 3 | 1 | 1 | 35 x 35 x 288 |
| Max Pool 2 | - | 3 x 3 | 1 | 1 | 35 x 35 x 288 |
| 3 x Inception | 768 | As in Figure 5 | - | - | 17 x 17 x 768 |
| 5 x Inception | 1280 | As in Figure 6 | - | - | 8 x 8 x 1280 |
| 2 x Inception | 2048 | As in Figure 7 | - | - | 8 x 8 x 2048 |
| Average Pool 1 | - | 8 x 8 | - | - | 1 x 1 x 2048 |
| Linear | - | - | - | - | 1 x 1000 |
| Softmax | - | - | - | - | 1000 |
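
The spatial sizes in the table can be checked with the usual convolution/pooling output formula, floor((n + 2p - k) / s) + 1. The snippet below is a minimal sketch of that arithmetic for the stem rows only (Convolution 1 through Max Pool 2). The helper name `conv_out` and the `layers` list are illustrative and not taken from the paper or the notebook; a "-" in the Padding column is assumed to mean zero padding.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a square conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding), copied from the stem rows of the table
layers = [
    ("Convolution 1", 3, 2, 0),
    ("Convolution 2", 3, 1, 0),
    ("Convolution 3", 3, 1, 1),
    ("Max Pool 1",    3, 2, 0),
    ("Convolution 4", 3, 1, 0),
    ("Convolution 5", 3, 2, 0),
    ("Convolution 6", 3, 1, 1),
    ("Max Pool 2",    3, 1, 1),
]

size = 299  # input is 299 x 299 x 3
sizes = []
for name, kernel, stride, padding in layers:
    size = conv_out(size, kernel, stride, padding)
    sizes.append(size)
    print(f"{name}: {size} x {size}")
```

Each printed size should match the spatial part of the corresponding table row: 149, 147, 147, 73, 71, 35, 35, 35.
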
Figure 5

Figure 6

Figure 7

Detailed Architecture (Based on PyTorch)
Notebook: BuildCNN-PyTorch/03B_GoogLeNet_V2&V3.ipynb (main branch of the CodeSensory/BuildCNN-PyTorch repository)
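
As a rough illustration of how the stem rows of the table translate to PyTorch, here is a minimal sketch. It is not the code from the notebook above. The helper `conv_block` is a hypothetical name, and placing batch normalization and ReLU after each convolution is an assumption, since the table only lists filters, kernel size, stride, and padding. The Inception modules (Figures 5 to 7) and the classifier are left out.

```python
import torch
from torch import nn


def conv_block(in_ch, out_ch, kernel_size, stride=1, padding=0):
    # Conv -> BatchNorm -> ReLU; the BN/ReLU placement is an assumption.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size,
                  stride=stride, padding=padding, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


# Stem only: rows "Convolution 1" through "Max Pool 2" of the table.
stem = nn.Sequential(
    conv_block(3, 32, 3, stride=2),              # 149 x 149 x 32
    conv_block(32, 32, 3),                       # 147 x 147 x 32
    conv_block(32, 64, 3, padding=1),            # 147 x 147 x 64
    nn.MaxPool2d(3, stride=2),                   # 73 x 73 x 64
    conv_block(64, 80, 3),                       # 71 x 71 x 80
    conv_block(80, 192, 3, stride=2),            # 35 x 35 x 192
    conv_block(192, 288, 3, padding=1),          # 35 x 35 x 288
    nn.MaxPool2d(3, stride=1, padding=1),        # 35 x 35 x 288
)

stem.eval()  # use BatchNorm running statistics for the shape check
with torch.no_grad():
    out = stem(torch.randn(1, 3, 299, 299))
print(tuple(out.shape))  # (1, 288, 35, 35) in PyTorch's N x C x H x W layout
```

Note that PyTorch orders tensors as batch x channels x height x width, so the table's 35 x 35 x 288 appears as `(1, 288, 35, 35)`.
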