# Deep Learning Terminology
## Computer Vision
- Image Classification - predict what is in the image
	- Multi-class classification - assign exactly one class per image
		- Binary classification is a subtype with only two classes
	- Multi-label classification - more than one class can apply to the same image
- Object Localization - draw a bounding box around the detected object
- [[Object Detection]] - Classify object(s) and localize them at the same time
	- Popular models: R-CNN, Faster R-CNN, YOLO, SSD, RetinaNet
- [[Image Segmentation]] - Identify the pixels that make up the object
- Class Activation Map (CAM) - heatmap over the image showing which regions drove a class prediction
- Saliency map - per-pixel importance, e.g. the gradient of the class score with respect to the input pixels (sketch below)
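A minimal input-gradient saliency sketch in PyTorch; the model choice and tensor shapes here are my own illustration, not from these notes. The absolute gradient of the top class score with respect to each pixel measures how much that pixel influenced the prediction:

```python
import torch
from torchvision import models

# Stand-in classifier and input; any differentiable model works the same way.
model = models.resnet18(weights="IMAGENET1K_V1").eval()
image = torch.rand(1, 3, 224, 224, requires_grad=True)

score = model(image)[0].max()  # logit of the top predicted class
score.backward()               # gradients flow back to the input pixels
# Per-pixel importance: max absolute gradient across the 3 color channels.
saliency = image.grad.abs().max(dim=1)[0]  # shape (1, 224, 224)
```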
## Natural Language Processing
- Machine Translation
- Question Answering (Q&A)
- Chatbot
- Text Summarization
- Masked Language Modeling (Mask LM) - randomly mask a few words in a sentence and train the network to fill in the blanks (sketch after this list)
- Next Sentence Prediction (NSP) - predict whether one sentence actually follows another in the source text. Mask LM and NSP are the two tasks used to pretrain BERT
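A toy version of Mask LM data preparation. This is a sketch, not BERT's exact recipe (which also sometimes swaps in random tokens or leaves the selected token unchanged instead of always using `[MASK]`):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Hide ~15% of tokens; keep the originals as prediction targets."""
    masked, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)    # the network is trained to recover this
        else:
            masked.append(tok)
            labels.append(None)   # no loss at unmasked positions
    return masked, labels

masked, labels = mask_tokens("the cat sat on the mat".split())
```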
## Network Types
- U-Net - CNN for biomedical image segmentation (encoder-decoder with skip connections) https://arxiv.org/pdf/1505.04597.pdf
![[Pasted image 20210213142754.png]]
- ResNet - residual (skip) connections let very deep networks train without degrading (see the residual-block sketch after this list)
- YOLO - You Only Look Once
- SSD - Single Shot Detector
- DCNN - deep convolutional neural network
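A minimal ResNet-style residual block sketch in PyTorch (the layer sizes are my own choice): the block outputs `F(x) + x`, so the convolutions only have to learn the residual correction, which is what makes very deep stacks trainable.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # skip connection: add the input back

block = ResidualBlock(64)
y = block(torch.rand(1, 64, 32, 32))  # shape preserved: (1, 64, 32, 32)
```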
### NLP
- ELMo (Embeddings from Language Models) - created embeddings that preserve context: the same word can get different embeddings depending on its surroundings. Built from two bidirectional LSTM (RNN) layers (sketch below).
	- Downside - very slow to train, since RNNs must process tokens one at a time
![[Pasted image 20210213163658.png]]
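A toy contextual-embedding sketch with a single bidirectional LSTM (the dimensions are mine, and real ELMo stacks more layers on character-level inputs): unlike a fixed lookup table, each token's output vector depends on its neighbors in both directions.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=16, hidden_size=16, bidirectional=True, batch_first=True)
sentence = torch.rand(1, 5, 16)  # 5 tokens with static 16-dim embeddings
contextual, _ = lstm(sentence)   # (1, 5, 32): forward + backward states concatenated
```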
- Transformer
It has an encoder-decoder structure, originally for language translation. The encoder takes all the input words at once and spits out an encoding for each word (simultaneously rather than sequentially, which is why it trains much faster than RNN models like ELMo). The decoder takes those encodings plus the seed words generated so far and predicts the next word (in the target language), repeating until end of sentence is reached (see the decoding sketch below).
![[Pasted image 20210213164716.png]]
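A sketch of the encoder-decoder flow using PyTorch's `nn.Transformer` with toy tensors. In a real model the decoder output would be projected to a vocabulary and the chosen token re-embedded; here the output vector is fed straight back in for brevity:

```python
import torch
import torch.nn as nn

model = nn.Transformer(d_model=32, nhead=4, batch_first=True)
src = torch.rand(1, 6, 32)  # six already-embedded source words, encoded all at once
tgt = torch.rand(1, 1, 32)  # the decoder starts from a single seed token

for _ in range(5):  # in practice: loop until an end-of-sentence token appears
    out = model(src, tgt)                       # one vector per target position
    tgt = torch.cat([tgt, out[:, -1:]], dim=1)  # append the newest prediction
```

A real decoder also applies the causal mask described under GPT below, so each position cannot attend to later ones.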
- GPT - gets rid of the encoder in the transformer and stacks the decoders
	- Masked Attention Unit - the important part of GPT. It lets the decoder decide how much attention to pay to each earlier word of the sentence when generating the next word, while masking out future positions so the model cannot peek ahead (sketch below).
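A sketch of masked (causal) scaled dot-product attention; the function and shapes are illustrative. Row *i* of the score matrix is set to -inf beyond column *i*, so the softmax gives zero weight to future words:

```python
import torch
import torch.nn.functional as F

def masked_attention(q, k, v):
    # q, k, v: (seq_len, d). scores: (seq_len, seq_len) pairwise similarities.
    scores = q @ k.T / q.size(-1) ** 0.5
    future = torch.triu(torch.ones_like(scores), diagonal=1).bool()
    scores = scores.masked_fill(future, float("-inf"))  # hide positions > i
    return F.softmax(scores, dim=-1) @ v                # weighted sum of values

out = masked_attention(*(torch.rand(5, 8) for _ in range(3)))  # (5, 8)
```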
- BERT - Bidirectional Encoder Representations from Transformers. The opposite move to GPT: keep only the encoder stack. Pretrained with Mask LM and NSP (see above); sketch of extracting its embeddings below.
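A sketch of pulling BERT's contextual embeddings with the Hugging Face `transformers` library (`bert-base-uncased` is the standard public checkpoint; error handling omitted):

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("the cat sat on the mat", return_tensors="pt")
outputs = model(**inputs)
vectors = outputs.last_hidden_state  # (1, num_tokens, 768) contextual embeddings
```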