[Week 9 - Day 4] Deep Learning Basics - Programmers AI Dev Course, 2021. 7. 3.
Deep Learning: Neural Network Basics - Deep Learning Basics II
3. Convolutional Neural Networks (CNN)
Widely used in computer vision, e.g. classification, retrieval, detection, segmentation
Why computer vision is hard
- Viewpoint changes : even for the same object, every pixel value changes as the camera moves
- Camouflage : objects whose coloring makes them hard to separate from the background
- Illumination changes
- Deformed or atypical shapes
- Partially occluded objects
- Large variation within the same class
Comparing DMLP and CNN
- DMLP
- Fully connected structure with high complexity -> training is very slow and prone to overfitting
- CNN
- Partial (sparse) connectivity via the convolution operation lowers complexity and extracts good features
- Well suited to data with a grid structure (images, audio, etc.)
- Receptive fields resemble human vision
- Can handle variable-sized inputs
Layers of a CNN
- CONV
- Extracts the features corresponding to the multiple kernels (filters) determined by CNN training
- Preserves the feature shape between each layer's input and output (feature maps)
- Effectively captures features of adjacent pixels while preserving the spatial information of the image
- Each kernel (filter) shares its parameters, so there are few learnable parameters
- POOL
- Summarizes and reinforces the extracted image features
Handles variable-sized data easily (see the sketch below)
-> A fully connected network cannot operate once the feature-vector size changes; a CNN can control the feature-map size by adjusting the stride in the convolution layer or the kernel and stride in the pooling layer
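A minimal PyTorch sketch of this point (my own, not lecture code): convolution weights do not depend on the input size, and a global (adaptive) pooling layer, one way to realize the size adjustment described above, maps any spatial size to a fixed-length feature vector.

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)  # weights are independent of input size
gap = nn.AdaptiveAvgPool2d((1, 1))                # pools any spatial size down to 1x1

for size in [(32, 32), (64, 48)]:                 # two different input sizes
    x = torch.randn(1, 3, *size)
    out = gap(conv(x)).flatten(1)
    print(size, '->', out.shape)                  # both print torch.Size([1, 8])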
3.1. Convolution layer (CONV)
Combines convolution (a linear function) with a nonlinear activation function
Convolution operation
A linear operation that multiplies corresponding elements and sums the results
Used to extract features from images (= spatial filtering)
The filters are determined by learning!
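A minimal NumPy sketch of this multiply-and-sum (my own illustration, not lecture code; like most deep learning frameworks it computes cross-correlation, i.e. convolution without flipping the kernel, and the hand-made edge kernel stands in for values a CNN would learn):

import numpy as np

def conv2d(image, kernel):
    # slide the kernel over the image; at each position, multiply
    # corresponding elements and sum them (no padding, stride 1)
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_kernel = np.array([[1., 0., -1.]] * 3)  # hand-made vertical-edge filter
print(conv2d(image, edge_kernel))            # 3x3 feature map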
Padding
Prevents the image from shrinking at the borders (keeps the feature shape the same between each layer's input and output)
Bias addition
Weight sharing (or parameter sharing; tied weights)
All nodes use the same kernel (-> weight sharing), so there are few parameters
-> the model's complexity drops sharply
Extracting multiple feature maps
The features a kernel extracts depend on its values
A single kernel extracts too poor a feature, so several kernels are used to extract a variety of feature maps
Feature learning
Kernels are not designed by hand but found by learning! Kernels are learned via error backpropagation
e.g. if a 2D image uses 64 kernels of size 7*7, then (7*7+1(bias))*64 = 3200 parameters must be learned
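Checking that count with PyTorch (a small sketch of mine; the single input channel is an assumption matching a grayscale image):

import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=64, kernel_size=7)  # 64 kernels of 7x7
n_params = sum(p.numel() for p in conv.parameters())
print(n_params)  # 64*7*7 weights + 64 biases = 3200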
Properties of CNNs that follow from the convolution operation
- Translation equivariance (when the signal shifts, the shift is reflected as-is in the feature map)
- Copes effectively with object movement in image recognition and pronunciation delays in speech recognition
- Parallel, distributed structure
- Each node can be computed independently -> parallel structure
- A node affects the whole output as it passes through the deep layers -> distributed structure
- Down-sampling via a large stride
- Applies to tensors
- Applicable to structures of three or more dimensions
- 3D e.g. RGB color images; 4D e.g. color video (3*s*m*n), MRI brain scans (l*s*m*n)
3.2. Pooling layer (POOL)
Statistically compresses the features obtained by convolution
Pooling operation
Max pooling, average pooling, weighted-average pooling, etc.
A large stride gives a down-sampling effect (a minimal max-pooling sketch follows the list below)
Properties
- Pooling extracts a summary, i.e. a statistically representative value
- No parameters
- Computationally efficient
- Keeps the number of feature maps unchanged
- Insensitive to small changes, which makes it effective for object recognition, image retrieval, and the like
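The max-pooling sketch mentioned above (my own NumPy illustration): a 2x2 window with stride 2 halves each spatial dimension, keeping only the strongest response in each window.

import numpy as np

def max_pool2d(fmap, k=2, s=2):
    # take the maximum of each k x k window, moving with stride s
    oh = (fmap.shape[0] - k) // s + 1
    ow = (fmap.shape[1] - k) // s + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = fmap[i*s:i*s+k, j*s:j*s+k].max()
    return out

fmap = np.array([[1., 3., 2., 0.],
                 [4., 2., 1., 1.],
                 [0., 1., 5., 2.],
                 [2., 0., 1., 3.]])
print(max_pool2d(fmap))  # [[4. 2.] [2. 5.]] : 4x4 map down-sampled to 2x2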
3.3. Overall structure
Building block
Convolution -> activation function (usually ReLU) -> pooling layer
A CNN extends into a deep architecture by chaining building blocks:
input data -> kernels 1, 2, ..., k -> multiple feature maps -(activation)-> multiple feature maps -(pooling)-> multiple feature maps
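One such building block in PyTorch (a minimal sketch; the channel counts and input size are made up for illustration):

import torch
import torch.nn as nn

# one building block: convolution -> ReLU -> pooling
block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),  # k=16 kernels
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

x = torch.randn(1, 3, 32, 32)
print(block(x).shape)  # torch.Size([1, 16, 16, 16]): 16 feature maps, halved spatially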
Output size and parameter count of a convolution layer
Input : w1*h1*d1
k kernels of size f*f, stride s, padding p
Output size : w2*h2*d2 (w2=(w1-f+2p)/s+1, h2=(h1-f+2p)/s+1, d2=k)
Parameter count : f*f*d1 weights + 1 bias per kernel -> total parameters = (f*f*d1)k+k
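The same formulas as a small Python helper (my own sketch), checked against a VGG-style first layer:

def conv_output_and_params(w1, h1, d1, k, f, s=1, p=0):
    # output size: w2 = (w1-f+2p)/s + 1, h2 = (h1-f+2p)/s + 1, d2 = k
    w2 = (w1 - f + 2 * p) // s + 1
    h2 = (h1 - f + 2 * p) // s + 1
    # parameters: f*f*d1 weights + 1 bias per kernel
    n_params = (f * f * d1) * k + k
    return (w2, h2, k), n_params

# e.g. a 224*224*3 input through 64 3*3 kernels, stride 1, padding 1 (VGG-style)
print(conv_output_and_params(224, 224, 3, k=64, f=3, s=1, p=1))
# ((224, 224, 64), 1792) -- matching block1_conv1 in the model summary below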
LeNet
An early example of a CNN
Five layers, CONV-POOL(average pooling)-CONV-POOL(average pooling)-CONV, turn a 28*28 grayscale image into a 120-dimensional feature vector,
which is classified by an MLP with a single hidden layer
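A LeNet-5-style sketch in PyTorch (my own approximation of the structure above; the kernel sizes, tanh activations, and the padding on the first convolution are assumptions chosen so the shapes work out from a 28*28 input):

import torch
import torch.nn as nn

lenet_features = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Tanh(),  # 28x28 -> 6@28x28
    nn.AvgPool2d(2),                                       # -> 6@14x14
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),            # -> 16@10x10
    nn.AvgPool2d(2),                                       # -> 16@5x5
    nn.Conv2d(16, 120, kernel_size=5), nn.Tanh(),          # -> 120@1x1: the 120-d feature vector
)
classifier = nn.Sequential(                                # MLP with one hidden layer
    nn.Flatten(), nn.Linear(120, 84), nn.Tanh(), nn.Linear(84, 10),
)

x = torch.randn(1, 1, 28, 28)
print(classifier(lenet_features(x)).shape)  # torch.Size([1, 10])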
Deep Learning: Neural Network Basics - Practice V: PT-TF CNN
VGGNet : uses 3*3 filters in every layer
Variants : vgg16, vgg19
1. pytorch
- VGGNet model code
import torch
import torch.nn as nn
from .utils import load_state_dict_from_url
from typing import Union, List, Dict, Any, cast

__all__ = [
    'VGG', 'vgg11', 'vgg11_bn', 'vgg13', 'vgg13_bn', 'vgg16', 'vgg16_bn',
    'vgg19_bn', 'vgg19',
]

model_urls = {
    'vgg11': 'https://download.pytorch.org/models/vgg11-bbd30ac9.pth',
    'vgg13': 'https://download.pytorch.org/models/vgg13-c768596a.pth',
    'vgg16': 'https://download.pytorch.org/models/vgg16-397923af.pth',
    'vgg19': 'https://download.pytorch.org/models/vgg19-dcbb9e9d.pth',
    'vgg11_bn': 'https://download.pytorch.org/models/vgg11_bn-6002323d.pth',
    'vgg13_bn': 'https://download.pytorch.org/models/vgg13_bn-abd245e5.pth',
    'vgg16_bn': 'https://download.pytorch.org/models/vgg16_bn-6c64b313.pth',
    'vgg19_bn': 'https://download.pytorch.org/models/vgg19_bn-c79401a0.pth',
}


class VGG(nn.Module):

    def __init__(
        self,
        features: nn.Module,
        num_classes: int = 1000,
        init_weights: bool = True
    ) -> None:
        super(VGG, self).__init__()
        self.features = features
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        # VGGNet's classifier consists of three fc layers
        self.classifier = nn.Sequential(
            # nn.Linear(input dimension, output dimension) -> builds a fully connected layer
            # --- first fc layer ---
            nn.Linear(512 * 7 * 7, 4096),
            # keeps inputs greater than 0 and sets values <= 0 to 0
            nn.ReLU(True),
            nn.Dropout(),
            # --- second fc layer ---
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            # --- third fc layer ---
            nn.Linear(4096, num_classes),  # outputs a prediction score for each of num_classes
        )
        if init_weights:
            self._initialize_weights()

    # x : input image batch
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # extract features by passing through self.features
        # (the conv + pooling + ReLU layers stacked by make_layers())
        x = self.features(x)
        x = self.avgpool(x)
        # batch size * channel * height * width, e.g. 128 x (512 * 7 * 7)
        # flatten the grid-shaped feature maps into one row per image before the classifier
        x = torch.flatten(x, 1)
        x = self.classifier(x)  # classification
        return x

    def _initialize_weights(self) -> None:
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)


# 'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M']
def make_layers(cfg: List[Union[str, int]], batch_norm: bool = False) -> nn.Sequential:
    layers: List[nn.Module] = []
    in_channels = 3  # 3 because inputs are RGB by default
    # assuming cfgs['D'] was passed in, v takes 64 -> 64 -> 'M' (max pooling) -> 128 -> 128 ...
    for v in cfg:
        # 'M' adds a max-pooling layer
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        # otherwise
        else:
            v = cast(int, v)
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            # if batch norm is used, add a BatchNorm layer before the ReLU
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                # without batch norm, just add the ReLU
                layers += [conv2d, nn.ReLU(inplace=True)]
            # update in_channels to the current channel count
            in_channels = v
    return nn.Sequential(*layers)


cfgs: Dict[str, List[Union[str, int]]] = {
    'A': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'B': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    # 2 convs -> max pool -> 2 convs -> max pool -> 3 convs -> max pool
    # -> 3 convs -> max pool -> 3 convs -> max pool
    # each number is the output channel count of a conv layer
    'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'E': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}


def _vgg(arch: str, cfg: str, batch_norm: bool, pretrained: bool, progress: bool, **kwargs: Any) -> VGG:
    if pretrained:
        kwargs['init_weights'] = False
    # e.g. _vgg('vgg16', 'D', False, pretrained, progress, **kwargs)
    # 'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M']
    model = VGG(make_layers(cfgs[cfg], batch_norm=batch_norm), **kwargs)
    if pretrained:
        state_dict = load_state_dict_from_url(model_urls[arch], progress=progress)
        model.load_state_dict(state_dict)
    return model


def vgg11(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> VGG:
    r"""VGG 11-layer model (configuration "A") from
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vgg('vgg11', 'A', False, pretrained, progress, **kwargs)


def vgg11_bn(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> VGG:
    r"""VGG 11-layer model (configuration "A") with batch normalization
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vgg('vgg11_bn', 'A', True, pretrained, progress, **kwargs)


def vgg13(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> VGG:
    r"""VGG 13-layer model (configuration "B")
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vgg('vgg13', 'B', False, pretrained, progress, **kwargs)


def vgg13_bn(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> VGG:
    r"""VGG 13-layer model (configuration "B") with batch normalization
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vgg('vgg13_bn', 'B', True, pretrained, progress, **kwargs)


# the function actually called from torchvision.models
def vgg16(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> VGG:
    r"""VGG 16-layer model (configuration "D")
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    # execution starts here
    return _vgg('vgg16', 'D', False, pretrained, progress, **kwargs)


def vgg16_bn(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> VGG:
    r"""VGG 16-layer model (configuration "D") with batch normalization
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vgg('vgg16_bn', 'D', True, pretrained, progress, **kwargs)


def vgg19(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> VGG:
    r"""VGG 19-layer model (configuration "E")
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vgg('vgg19', 'E', False, pretrained, progress, **kwargs)


def vgg19_bn(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> VGG:
    r"""VGG 19-layer model (configuration 'E') with batch normalization
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vgg('vgg19_bn', 'E', True, pretrained, progress, **kwargs)
- VGGNet training code
# import libraries
from __future__ import print_function, division

import torch
import torch.nn as nn
from torch.optim import lr_scheduler
import torch.optim as optim
import torch.backends.cudnn as cudnn
import numpy as np
import torchvision
from torchvision import models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy

plt.ion()  # interactive mode
# Batch size : the unit in which training data is loaded, since the whole set cannot be loaded at once
batch_size = 128
# Epoch : one full pass of the model over the training data (repeated learning)
# e.g. teaching a child what an apple is by showing the same 100 apple photos 2, 3, ..., n times
num_epochs = 3
# Learning rate : a higher rate learns faster but raises the risk of falling into a local optimum;
# a lower rate learns slowly but can move more precisely toward the optimum
learning_rate = 0.001
# transform : preprocessing applied to inputs before they enter the model
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

train_set = torchvision.datasets.CIFAR10(root='./data', train=True,
                                         download=True, transform=data_transforms['train'])
test_set = torchvision.datasets.CIFAR10(root='./data', train=False,
                                        download=True, transform=data_transforms['val'])

dataloaders = dict()
dataloaders['train'] = torch.utils.data.DataLoader(dataset=train_set, batch_size=batch_size, shuffle=True)
dataloaders['val'] = torch.utils.data.DataLoader(dataset=test_set, batch_size=batch_size, shuffle=False)

dataset_sizes = {x: len(dataloaders[x].dataset) for x in ['train', 'val']}
print("train size", dataset_sizes['train'])
print("test size", dataset_sizes['val'])

class_names = train_set.classes
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("class_names:", class_names)
print(device)
def imshow(inp, title=None):
    """Imshow for Tensor."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated


# Get a batch of training data
inputs, classes = next(iter(dataloaders['train']))

# the batch is too big to show at once, so take only 3 images
inputs_ = inputs[:3]
classes_ = classes[:3]

# Make a grid from batch
out = torchvision.utils.make_grid(inputs_)
imshow(out, title=[class_names[x] for x in classes_])
def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    train_loss_list = []
    val_acc_list = []

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0
            iteration_count = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                iteration_count += len(inputs)
                print('Iteration {}/{}'.format(iteration_count, dataset_sizes[phase]))
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            if phase == "train":
                train_loss_list.append(epoch_loss)
            elif phase == "val":
                val_acc_list.append(epoch_acc)

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, train_loss_list, val_acc_list
def visualize_model(model, num_images=6):
    was_training = model.training
    model.eval()
    images_so_far = 0
    fig = plt.figure()

    with torch.no_grad():
        for i, (inputs, labels) in enumerate(dataloaders['val']):
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            for j in range(inputs.size()[0]):
                images_so_far += 1
                ax = plt.subplot(num_images // 2, 2, images_so_far)
                ax.axis('off')
                ax.set_title('predicted: {}'.format(class_names[preds[j]]))
                imshow(inputs.cpu().data[j])

                if images_so_far == num_images:
                    model.train(mode=was_training)
                    return
        model.train(mode=was_training)
model_ft = models.vgg16(pretrained=True)  # load a pretrained vgg16
num_ftrs = model_ft.classifier[6].in_features
# replace the final fc layer so it predicts len(class_names) classes
model_ft.classifier[6] = nn.Linear(num_ftrs, len(class_names))

model_ft = model_ft.to(device)  # move the model to the device
print(model_ft)

criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
# SGD
optimizer_ft = optim.SGD(model_ft.parameters(), lr=learning_rate, momentum=0.9)
# Adam
# optimizer_ft = optim.Adam(model_ft.parameters(), lr=learning_rate)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
model_ft,train_loss_list,val_acc_list = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler, num_epochs=num_epochs)
# plot train loss
x = [i for i in range(0, num_epochs)]
plt.title("Train Loss")
plt.xticks(x)
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.plot(x, train_loss_list)
plt.show()

# plot test accuracy
x = [i for i in range(0, num_epochs)]
plt.title("Test Accuracy")
plt.xticks(x)
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.plot(x, val_acc_list)
plt.show()
visualize_model(model_ft)
2. tensorflow
- VGGNet model code
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
# pylint: disable=invalid-name
"""VGG16 model for Keras.

Reference:
- [Very Deep Convolutional Networks for Large-Scale Image Recognition]
  (https://arxiv.org/abs/1409.1556) (ICLR 2015)
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensorflow.python.keras import backend
from tensorflow.python.keras.applications import imagenet_utils
from tensorflow.python.keras.engine import training
from tensorflow.python.keras.layers import VersionAwareLayers
from tensorflow.python.keras.utils import data_utils
from tensorflow.python.keras.utils import layer_utils
from tensorflow.python.lib.io import file_io
from tensorflow.python.util.tf_export import keras_export

WEIGHTS_PATH = ('https://storage.googleapis.com/tensorflow/keras-applications/'
                'vgg16/vgg16_weights_tf_dim_ordering_tf_kernels.h5')
WEIGHTS_PATH_NO_TOP = ('https://storage.googleapis.com/tensorflow/'
                       'keras-applications/vgg16/'
                       'vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5')

layers = VersionAwareLayers()


# the function invoked when keras.applications.VGG16 is used
@keras_export('keras.applications.vgg16.VGG16', 'keras.applications.VGG16')
def VGG16(
    include_top=True,  # whether to include the classification layers
    weights='imagenet',
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation='softmax'):
  """Instantiates the VGG16 model.

  Reference:
  - [Very Deep Convolutional Networks for Large-Scale Image Recognition](
      https://arxiv.org/abs/1409.1556) (ICLR 2015)

  By default, it loads weights pre-trained on ImageNet. Check 'weights' for
  other options.

  This model can be built both with 'channels_first' data format
  (channels, height, width) or 'channels_last' data format
  (height, width, channels).

  The default input size for this model is 224x224.

  Note: each Keras Application expects a specific kind of input preprocessing.
  For VGG16, call `tf.keras.applications.vgg16.preprocess_input` on your
  inputs before passing them to the model.

  Arguments:
    include_top: whether to include the 3 fully-connected layers at the top of
      the network.
    weights: one of `None` (random initialization), 'imagenet' (pre-training
      on ImageNet), or the path to the weights file to be loaded.
    input_tensor: optional Keras tensor (i.e. output of `layers.Input()`) to
      use as image input for the model.
    input_shape: optional shape tuple, only to be specified if `include_top`
      is False (otherwise the input shape has to be `(224, 224, 3)` (with
      `channels_last` data format) or `(3, 224, 224)` (with `channels_first`
      data format). It should have exactly 3 input channels, and width and
      height should be no smaller than 32. E.g. `(200, 200, 3)` would be one
      valid value.
    pooling: Optional pooling mode for feature extraction when `include_top`
      is `False`.
      - `None` means that the output of the model will be the 4D tensor
        output of the last convolutional block.
      - `avg` means that global average pooling will be applied to the output
        of the last convolutional block, and thus the output of the model
        will be a 2D tensor.
      - `max` means that global max pooling will be applied.
    classes: optional number of classes to classify images into, only to be
      specified if `include_top` is True, and if no `weights` argument is
      specified.
    classifier_activation: A `str` or callable. The activation function to
      use on the "top" layer. Ignored unless `include_top=True`. Set
      `classifier_activation=None` to return the logits of the "top" layer.

  Returns:
    A `keras.Model` instance.

  Raises:
    ValueError: in case of invalid argument for `weights`, or invalid input
      shape.
    ValueError: if `classifier_activation` is not `softmax` or `None` when
      using a pretrained top layer.
  """
  if not (weights in {'imagenet', None} or file_io.file_exists_v2(weights)):
    raise ValueError('The `weights` argument should be either '
                     '`None` (random initialization), `imagenet` '
                     '(pre-training on ImageNet), '
                     'or the path to the weights file to be loaded.')

  if weights == 'imagenet' and include_top and classes != 1000:
    raise ValueError('If using `weights` as `"imagenet"` with `include_top`'
                     ' as true, `classes` should be 1000')

  # Determine proper input shape
  input_shape = imagenet_utils.obtain_input_shape(
      input_shape,
      default_size=224,
      min_size=32,
      data_format=backend.image_data_format(),
      require_flatten=include_top,
      weights=weights)

  if input_tensor is None:
    img_input = layers.Input(shape=input_shape)
  else:
    if not backend.is_keras_tensor(input_tensor):
      img_input = layers.Input(tensor=input_tensor, shape=input_shape)
    else:
      img_input = input_tensor

  # feature extraction starts here!
  # Block 1
  # layers.Conv2D(filters, kernel_size)
  # channels: 64, kernel_size: (3, 3), activation: ReLU, padding='same' (padding on),
  # name: sets this layer's name
  x = layers.Conv2D(
      64, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input)
  x = layers.Conv2D(
      64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
  # pool_size: (2, 2)
  x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)

  # Block 2
  x = layers.Conv2D(
      128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x)
  x = layers.Conv2D(
      128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x)
  x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)

  # Block 3
  x = layers.Conv2D(
      256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x)
  x = layers.Conv2D(
      256, (3, 3), activation='relu', padding='same', name='block3_conv2')(x)
  x = layers.Conv2D(
      256, (3, 3), activation='relu', padding='same', name='block3_conv3')(x)
  x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)

  # Block 4
  x = layers.Conv2D(
      512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x)
  x = layers.Conv2D(
      512, (3, 3), activation='relu', padding='same', name='block4_conv2')(x)
  x = layers.Conv2D(
      512, (3, 3), activation='relu', padding='same', name='block4_conv3')(x)
  x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x)

  # Block 5
  x = layers.Conv2D(
      512, (3, 3), activation='relu', padding='same', name='block5_conv1')(x)
  x = layers.Conv2D(
      512, (3, 3), activation='relu', padding='same', name='block5_conv2')(x)
  x = layers.Conv2D(
      512, (3, 3), activation='relu', padding='same', name='block5_conv3')(x)
  x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)

  # if the classification layers are included
  if include_top:
    # Classification block
    # Flatten
    x = layers.Flatten(name='flatten')(x)
    # Fully connected layer 1
    x = layers.Dense(4096, activation='relu', name='fc1')(x)
    # Fully connected layer 2
    x = layers.Dense(4096, activation='relu', name='fc2')(x)

    imagenet_utils.validate_activation(classifier_activation, weights)
    # final classification layer (activation is usually softmax)
    x = layers.Dense(classes, activation=classifier_activation,
                     name='predictions')(x)
  else:
    if pooling == 'avg':
      x = layers.GlobalAveragePooling2D()(x)
    elif pooling == 'max':
      x = layers.GlobalMaxPooling2D()(x)

  # Ensure that the model takes into account
  # any potential predecessors of `input_tensor`.
  if input_tensor is not None:
    inputs = layer_utils.get_source_inputs(input_tensor)
  else:
    inputs = img_input

  # Create model.
  model = training.Model(inputs, x, name='vgg16')

  # Load weights.
  # when using ImageNet weights
  if weights == 'imagenet':
    # with the classification layers included
    if include_top:
      weights_path = data_utils.get_file(
          'vgg16_weights_tf_dim_ordering_tf_kernels.h5',
          WEIGHTS_PATH,
          cache_subdir='models',
          file_hash='64373286793e3c8b2b4e3219cbf3544b')
    # feature-extraction part only
    else:
      weights_path = data_utils.get_file(
          'vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5',
          WEIGHTS_PATH_NO_TOP,
          cache_subdir='models',
          file_hash='6d6bbae143d832006294945121d1f1fc')
    # load the weights into the model
    model.load_weights(weights_path)
  elif weights is not None:
    model.load_weights(weights)

  return model


@keras_export('keras.applications.vgg16.preprocess_input')
def preprocess_input(x, data_format=None):
  return imagenet_utils.preprocess_input(
      x, data_format=data_format, mode='caffe')


@keras_export('keras.applications.vgg16.decode_predictions')
def decode_predictions(preds, top=5):
  return imagenet_utils.decode_predictions(preds, top=top)


preprocess_input.__doc__ = imagenet_utils.PREPROCESS_INPUT_DOC.format(
    mode='',
    ret=imagenet_utils.PREPROCESS_INPUT_RET_DOC_CAFFE,
    error=imagenet_utils.PREPROCESS_INPUT_ERROR_DOC)
decode_predictions.__doc__ = imagenet_utils.decode_predictions.__doc__
- VGGNet training code
from __future__ import absolute_import, division, print_function, unicode_literals

import os
import numpy as np
import matplotlib.pyplot as plt

try:
    # Colab magic to select TF 2.x; only valid inside a notebook
    %tensorflow_version 2.x
except Exception:
    pass
import tensorflow as tf

keras = tf.keras
print("tensorflow version", tf.__version__)
IMG_SIZE = 224  # all images are resized to 224x224
EPOCHS = 3
BATCH_SIZE = 128
learning_rate = 0.0001
# download and explore the dataset
from keras.datasets import cifar10
from keras.utils import np_utils
import tensorflow_datasets as tfds

tfds.disable_progress_bar()

# number of classes to classify
num_classes = 10  # number of classes in Cifar10

(raw_train, raw_validation, raw_test), metadata = tfds.load(
    'cifar10',
    split=['train[:90%]', 'train[90%:]', 'test'],
    with_info=True,
    as_supervised=True,
)

print("Train data size:", len(raw_train))
print("Val data size:", len(raw_validation))
print("Test data size:", len(raw_test))
# normalize the data (rescale images with the tf.image module)
def format_example(image, label):
    image = tf.cast(image, tf.float32)
    image = (image / 127.5) - 1
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    return image, label
# apply the formatting function to every item in the dataset with map
train = raw_train.map(format_example)
validation = raw_validation.map(format_example)
test = raw_test.map(format_example)
# build the datasets
SHUFFLE_BUFFER_SIZE = 1000
train_batches = train.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
validation_batches = validation.batch(BATCH_SIZE)
test_batches = test.batch(BATCH_SIZE)
# inspect the data
# visualize a couple of samples
get_label_name = metadata.features['label'].int2str

for image, label in raw_train.take(2):
    plt.figure()
    plt.imshow(image)
    plt.title(get_label_name(label))
# load the CNN model to use
IMG_SHAPE = (IMG_SIZE, IMG_SIZE, 3)

# change the CNN model here if needed
# load a model pre-trained on ImageNet
base_model = tf.keras.applications.VGG16(input_shape=IMG_SHAPE,
                                         include_top=True,
                                         classes=1000,
                                         weights='imagenet')
# replace the final classification layer of the loaded model to match the dataset's class count
model = tf.keras.Sequential()
for layer in base_model.layers[:-1]:  # go through until last layer
    model.add(layer)
# set the output size of the last layer to the number of classes
model.add(keras.layers.Dense(num_classes, activation='softmax', name='predictions'))
# inspect the model architecture
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312
_________________________________________________________________
dense (Dense)                (None, 10)                40970
=================================================================
Total params: 134,301,514
Trainable params: 134,301,514
Non-trainable params: 0
_________________________________________________________________
# compile the model
model.compile(optimizer=tf.keras.optimizers.Adam(lr=learning_rate),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# train the model
history = model.fit(train_batches,
                    epochs=EPOCHS,
                    validation_data=validation_batches,
                    batch_size=BATCH_SIZE)
# plot the learning curves
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.ylabel('Accuracy')
plt.ylim([min(plt.ylim()), 1])
plt.title('Training and Validation Accuracy')

plt.subplot(2, 1, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.ylabel('Cross Entropy')
plt.ylim([0, 1.0])
plt.title('Training and Validation Loss')
plt.xlabel('epoch')
plt.show()
# test the trained model on the test set
loss_and_metrics = model.evaluate(test_batches, batch_size=64)
print("Test accuracy : {}%".format(round(loss_and_metrics[1] * 100, 4)))
79/79 [==============================] - 40s 507ms/step - loss: 0.3510 - accuracy: 0.8873
Test accuracy : 88.73%