Very Deep Convolutional Networks for Large-Scale Image Recognition
The input is a 224 x 224 RGB image.
Convolutional layers use 3 x 3 filters, and the stride is always fixed at 1 px.
Certain configurations also use 1 x 1 filters.
Max pooling uses a 2 x 2 window with a stride of 2 px.
The FC layers and the activation function (ReLU) are the same as in AlexNet.
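
The feature-map sizes in the configuration tables follow from the usual output-size formula, `out = floor((in + 2 * padding - kernel) / stride) + 1`. The snippet below is a minimal sketch of that arithmetic; the helper name `conv_output_size` is made up for illustration and does not come from the paper.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


# 3 x 3 convolution, stride 1, padding 1: the resolution is preserved.
print(conv_output_size(224, kernel=3, stride=1, padding=1))  # 224

# 1 x 1 convolution, stride 1, no padding: the resolution is also preserved.
print(conv_output_size(56, kernel=1, stride=1, padding=0))   # 56

# 2 x 2 max pooling, stride 2: the resolution is halved.
print(conv_output_size(224, kernel=2, stride=2, padding=0))  # 112
```

Because every convolution preserves the resolution, only the five pooling layers shrink the feature map: 224 -> 112 -> 56 -> 28 -> 14 -> 7.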

VGGNet-A

| Layer | # Filters / neurons | Filter Size | Stride | Padding | Size of Feature Map | Activation Function |
|---|---|---|---|---|---|---|
| Input | - | - | - | - | 224 x 224 x 3 | - |
| Convolution 1 | 64 | 3 x 3 | 1 | 1 | 224 x 224 x 64 | ReLU |
| Max Pool 1 | - | 2 x 2 | 2 | - | 112 x 112 x 64 | - |
| Convolution 2 | 128 | 3 x 3 | 1 | 1 | 112 x 112 x 128 | ReLU |
| Max Pool 2 | - | 2 x 2 | 2 | - | 56 x 56 x 128 | - |
| Convolution 3 | 256 | 3 x 3 | 1 | 1 | 56 x 56 x 256 | ReLU |
| Convolution 4 | 256 | 3 x 3 | 1 | 1 | 56 x 56 x 256 | ReLU |
| Max Pool 3 | - | 2 x 2 | 2 | - | 28 x 28 x 256 | - |
| Convolution 5 | 512 | 3 x 3 | 1 | 1 | 28 x 28 x 512 | ReLU |
| Convolution 6 | 512 | 3 x 3 | 1 | 1 | 28 x 28 x 512 | ReLU |
| Max Pool 4 | - | 2 x 2 | 2 | - | 14 x 14 x 512 | - |
| Convolution 7 | 512 | 3 x 3 | 1 | 1 | 14 x 14 x 512 | ReLU |
| Convolution 8 | 512 | 3 x 3 | 1 | 1 | 14 x 14 x 512 | ReLU |
| Max Pool 5 | - | 2 x 2 | 2 | - | 7 x 7 x 512 | - |
| Dense(FC) 1 | 4096 | - | - | - | 1 x 4096 | ReLU |
| Dense(FC) 2 | 4096 | - | - | - | 1 x 4096 | ReLU |
| Softmax | 1000 | - | - | - | 1 x 1000 | - |
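
The VGGNet-A rows above can be re-derived by walking through the layers in order. The following is only a sketch: the list encoding (an integer is the filter count of a 3 x 3 convolution with stride 1 and padding 1, `"M"` is a 2 x 2 max pool with stride 2) and the names `VGG_A` and `trace_shapes` are assumptions made for illustration.

```python
# Hypothetical encoding of the VGGNet-A table:
# int -> 3 x 3 convolution with that many filters, "M" -> 2 x 2 max pooling.
VGG_A = [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"]


def trace_shapes(config, size=224, channels=3):
    """Return the (height, width, channels) shape after the input and after each layer."""
    shapes = [(size, size, channels)]
    for item in config:
        if item == "M":
            # 2 x 2 window, stride 2, no padding
            size = (size - 2) // 2 + 1
        else:
            # 3 x 3 filter, stride 1, padding 1
            size = (size + 2 * 1 - 3) // 1 + 1
            channels = item
        shapes.append((size, size, channels))
    return shapes


shapes = trace_shapes(VGG_A)
for shape in shapes:
    print(" x ".join(str(dim) for dim in shape))

height, width, depth = shapes[-1]
print(height * width * depth)  # 25088 values are flattened into the first FC layer
```

The last shape is 7 x 7 x 512, which matches the Max Pool 5 row.
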

VGGNet-B

| Layer | # Filters / neurons | Filter Size | Stride | Padding | Size of Feature Map | Activation Function |
|---|---|---|---|---|---|---|
| Input | - | - | - | - | 224 x 224 x 3 | - |
| Convolution 1 | 64 | 3 x 3 | 1 | 1 | 224 x 224 x 64 | ReLU |
| Convolution 2 | 64 | 3 x 3 | 1 | 1 | 224 x 224 x 64 | ReLU |
| Max Pool 1 | - | 2 x 2 | 2 | - | 112 x 112 x 64 | - |
| Convolution 3 | 128 | 3 x 3 | 1 | 1 | 112 x 112 x 128 | ReLU |
| Convolution 4 | 128 | 3 x 3 | 1 | 1 | 112 x 112 x 128 | ReLU |
| Max Pool 2 | - | 2 x 2 | 2 | - | 56 x 56 x 128 | - |
| Convolution 5 | 256 | 3 x 3 | 1 | 1 | 56 x 56 x 256 | ReLU |
| Convolution 6 | 256 | 3 x 3 | 1 | 1 | 56 x 56 x 256 | ReLU |
| Max Pool 3 | - | 2 x 2 | 2 | - | 28 x 28 x 256 | - |
| Convolution 7 | 512 | 3 x 3 | 1 | 1 | 28 x 28 x 512 | ReLU |
| Convolution 8 | 512 | 3 x 3 | 1 | 1 | 28 x 28 x 512 | ReLU |
| Max Pool 4 | - | 2 x 2 | 2 | - | 14 x 14 x 512 | - |
| Convolution 9 | 512 | 3 x 3 | 1 | 1 | 14 x 14 x 512 | ReLU |
| Convolution 10 | 512 | 3 x 3 | 1 | 1 | 14 x 14 x 512 | ReLU |
| Max Pool 5 | - | 2 x 2 | 2 | - | 7 x 7 x 512 | - |
| Dense(FC) 1 | 4096 | - | - | - | 1 x 4096 | ReLU |
| Dense(FC) 2 | 4096 | - | - | - | 1 x 4096 | ReLU |
| Softmax | 1000 | - | - | - | 1 x 1000 | - |

VGGNet-C

| Layer | # Filters / neurons | Filter Size | Stride | Padding | Size of Feature Map | Activation Function |
|---|---|---|---|---|---|---|
| Input | - | - | - | - | 224 x 224 x 3 | - |
| Convolution 1 | 64 | 3 x 3 | 1 | 1 | 224 x 224 x 64 | ReLU |
| Convolution 2 | 64 | 3 x 3 | 1 | 1 | 224 x 224 x 64 | ReLU |
| Max Pool 1 | - | 2 x 2 | 2 | - | 112 x 112 x 64 | - |
| Convolution 3 | 128 | 3 x 3 | 1 | 1 | 112 x 112 x 128 | ReLU |
| Convolution 4 | 128 | 3 x 3 | 1 | 1 | 112 x 112 x 128 | ReLU |
| Max Pool 2 | - | 2 x 2 | 2 | - | 56 x 56 x 128 | - |
| Convolution 5 | 256 | 3 x 3 | 1 | 1 | 56 x 56 x 256 | ReLU |
| Convolution 6 | 256 | 3 x 3 | 1 | 1 | 56 x 56 x 256 | ReLU |
| Convolution 7 | 256 | 1 x 1 | 1 | 0 | 56 x 56 x 256 | ReLU |
| Max Pool 3 | - | 2 x 2 | 2 | - | 28 x 28 x 256 | - |
| Convolution 8 | 512 | 3 x 3 | 1 | 1 | 28 x 28 x 512 | ReLU |
| Convolution 9 | 512 | 3 x 3 | 1 | 1 | 28 x 28 x 512 | ReLU |
| Convolution 10 | 512 | 1 x 1 | 1 | 0 | 28 x 28 x 512 | ReLU |
| Max Pool 4 | - | 2 x 2 | 2 | - | 14 x 14 x 512 | - |
| Convolution 11 | 512 | 3 x 3 | 1 | 1 | 14 x 14 x 512 | ReLU |
| Convolution 12 | 512 | 3 x 3 | 1 | 1 | 14 x 14 x 512 | ReLU |
| Convolution 13 | 512 | 1 x 1 | 1 | 0 | 14 x 14 x 512 | ReLU |
| Max Pool 5 | - | 2 x 2 | 2 | - | 7 x 7 x 512 | - |
| Dense(FC) 1 | 4096 | - | - | - | 1 x 4096 | ReLU |
| Dense(FC) 2 | 4096 | - | - | - | 1 x 4096 | ReLU |
| Softmax | 1000 | - | - | - | 1 x 1000 | - |

VGGNet-D

| Layer | # Filters / neurons | Filter Size | Stride | Padding | Size of Feature Map | Activation Function |
|---|---|---|---|---|---|---|
| Input | - | - | - | - | 224 x 224 x 3 | - |
| Convolution 1 | 64 | 3 x 3 | 1 | 1 | 224 x 224 x 64 | ReLU |
| Convolution 2 | 64 | 3 x 3 | 1 | 1 | 224 x 224 x 64 | ReLU |
| Max Pool 1 | - | 2 x 2 | 2 | - | 112 x 112 x 64 | - |
| Convolution 3 | 128 | 3 x 3 | 1 | 1 | 112 x 112 x 128 | ReLU |
| Convolution 4 | 128 | 3 x 3 | 1 | 1 | 112 x 112 x 128 | ReLU |
| Max Pool 2 | - | 2 x 2 | 2 | - | 56 x 56 x 128 | - |
| Convolution 5 | 256 | 3 x 3 | 1 | 1 | 56 x 56 x 256 | ReLU |
| Convolution 6 | 256 | 3 x 3 | 1 | 1 | 56 x 56 x 256 | ReLU |
| Convolution 7 | 256 | 3 x 3 | 1 | 1 | 56 x 56 x 256 | ReLU |
| Max Pool 3 | - | 2 x 2 | 2 | - | 28 x 28 x 256 | - |
| Convolution 8 | 512 | 3 x 3 | 1 | 1 | 28 x 28 x 512 | ReLU |
| Convolution 9 | 512 | 3 x 3 | 1 | 1 | 28 x 28 x 512 | ReLU |
| Convolution 10 | 512 | 3 x 3 | 1 | 1 | 28 x 28 x 512 | ReLU |
| Max Pool 4 | - | 2 x 2 | 2 | - | 14 x 14 x 512 | - |
| Convolution 11 | 512 | 3 x 3 | 1 | 1 | 14 x 14 x 512 | ReLU |
| Convolution 12 | 512 | 3 x 3 | 1 | 1 | 14 x 14 x 512 | ReLU |
| Convolution 13 | 512 | 3 x 3 | 1 | 1 | 14 x 14 x 512 | ReLU |
| Max Pool 5 | - | 2 x 2 | 2 | - | 7 x 7 x 512 | - |
| Dense(FC) 1 | 4096 | - | - | - | 1 x 4096 | ReLU |
| Dense(FC) 2 | 4096 | - | - | - | 1 x 4096 | ReLU |
| Softmax | 1000 | - | - | - | 1 x 1000 | - |
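
As a rough cross-check of the VGGNet-D table, the number of learnable parameters can be counted straight from the rows. This sketch assumes one bias per output channel or neuron and treats the Softmax row as a 1000-way fully connected layer; the names `VGG_D` and `count_parameters` are hypothetical.

```python
# Hypothetical encoding of the VGGNet-D table:
# int -> 3 x 3 convolution with that many filters, "M" -> 2 x 2 max pooling.
VGG_D = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]


def count_parameters(config, in_channels=3, input_size=224, fc_sizes=(4096, 4096, 1000)):
    """Count weights and biases of the conv layers and the fully connected layers."""
    total = 0
    channels, size = in_channels, input_size
    for item in config:
        if item == "M":
            size //= 2  # pooling has no parameters, it only halves the resolution
        else:
            total += 3 * 3 * channels * item + item  # weights + biases
            channels = item
    features = size * size * channels  # 7 * 7 * 512 flattened features
    for units in fc_sizes:
        total += features * units + units
        features = units
    return total


print(count_parameters(VGG_D))  # 138357544
```

Most of the total comes from the first fully connected layer (25088 x 4096 weights), not from the 13 convolutional layers.
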

VGGNet-E
