CS771A Project Presentation
Machine Learning - Tools, Techniques and Applications
Group 43
Preetansh Goyal, Shubham Gupta, Vandana Gautam, Vikas Jain
IIT KANPUR
India
10-4-2016
Group 43 Preetansh Goyal, Shubham Gupta, Vandana Gautam, Vikas Jain (IIT Kanpur)CS771A Project Presentation 10-4-2016 1 / 18
Table of Contents
1 Introduction
2 Methodology
Feature Extraction
Classification
Object localization
3 Experiments and Results
4 Scope of Improvements
Introduction
Aim - Detection and classification of objects in surveillance
video into various categories.
Categories - Car, Person, Motorcycle, Bicycle,
Rickshaw, Autorickshaw
Figure: Detection and Classification of objects
Motivation - Recognition, Tracking, Security, etc.
Feature Extraction - Dataset
To obtain the image dataset required for classification, we extracted
frames from the provided video data.
One frame was sampled out of every 20 frames of each video.
The images in the dataset correspond to the labelled bounding boxes
from these frames.
Figure: Image regions inside the red boxes are used as the dataset
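The sampling and cropping steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the choice to start at frame 0, and the (x, y, width, height) box layout are assumptions, and a toy nested list stands in for a decoded video frame.

```python
def sampled_frame_indices(total_frames, step=20):
    """Indices of the frames kept when one frame is taken every `step` frames.

    Starting at frame 0 is an assumption; the slides only give the step of 20.
    """
    return list(range(0, total_frames, step))


def crop_box(frame, box):
    """Cut the region (x, y, width, height) out of a frame stored as rows of pixels."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]


# Toy 6x8 "frame" whose pixel value encodes its position (row * 10 + column).
frame = [[r * 10 + c for c in range(8)] for r in range(6)]

indices = sampled_frame_indices(100)
patch = crop_box(frame, (2, 1, 3, 2))  # hypothetical labelled bounding box

print(indices)  # [0, 20, 40, 60, 80]
print(patch)    # [[12, 13, 14], [22, 23, 24]]
```

Each cropped patch, together with its category label, would become one image of the classification dataset.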
Feature Extraction - Feature Vector
Bag of visual words [1]
Computed ∼ 300 SIFT keypoints per image in the dataset.
Clustered the SIFT descriptors into 700 clusters using mini-batch K-means.
Each SIFT descriptor of an image votes for its nearest cluster, giving a
feature vector of dimension 700 per image.
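The voting step can be sketched as below, assuming the cluster centers have already been learned. The names `nearest_cluster` and `bovw_histogram` are hypothetical, and the toy data uses 3 visual words in 2-D rather than 700 words over SIFT descriptors.

```python
def nearest_cluster(descriptor, centers):
    """Index of the cluster center closest to the descriptor (squared Euclidean)."""
    distances = [
        sum((d - c) ** 2 for d, c in zip(descriptor, center))
        for center in centers
    ]
    return distances.index(min(distances))


def bovw_histogram(descriptors, centers):
    """Count how many descriptors of one image vote for each visual word."""
    histogram = [0] * len(centers)
    for descriptor in descriptors:
        histogram[nearest_cluster(descriptor, centers)] += 1
    return histogram


# Toy data (assumption): 3 cluster centers and 4 descriptors of one image.
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
descriptors = [(1.0, 1.0), (9.0, 1.0), (8.0, -1.0), (0.5, 9.0)]

hist = bovw_histogram(descriptors, centers)
print(hist)  # [1, 2, 1]
```

The histogram length always equals the number of clusters, so every image maps to a vector of the same dimension regardless of how many keypoints it has.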
HOG features [2]
Images in the dataset are resized to 128x128.
Obtained a HOG feature vector of dimension 15876 for each image.
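The slides do not state the HOG parameters. As a worked check, the reported dimension of 15876 is consistent with 9 orientation bins, 8x8-pixel cells and 3x3-cell blocks that slide one cell at a time; these parameter values are an assumption, and `hog_length` is a hypothetical helper.

```python
def hog_length(height, width, orientations=9, cell=8, block=3):
    """Length of a HOG descriptor with overlapping blocks (stride of one cell).

    The default parameter values are assumed, not taken from the slides.
    """
    cells_y = height // cell            # 128 // 8 = 16 cells per column
    cells_x = width // cell             # 128 // 8 = 16 cells per row
    blocks_y = cells_y - block + 1      # 16 - 3 + 1 = 14 block positions
    blocks_x = cells_x - block + 1
    return blocks_y * blocks_x * block * block * orientations


print(hog_length(128, 128))  # 14 * 14 * 3 * 3 * 9 = 15876
```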
Convolutional Neural Network features
Used the VGG-16 model [3].
Output of the FC-4096 layer is used as the feature vector.
Classification - Classifiers Used
After obtaining feature vectors for the images in the dataset, classifiers are
trained using the category label of each image.
Classifiers Used:
Support Vector Machine with linear kernel
Support Vector Machine with Gaussian kernel
Random Forest Classifier
Decision Tree
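A minimal sketch of training these four classifiers with scikit-learn, which the slides list as the library used. The data here is synthetic (an assumption standing in for the real feature vectors), and all hyperparameters are library defaults or illustrative choices, not the authors' settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real feature vectors: 600 samples, 6 classes.
X, y = make_classification(n_samples=600, n_features=50, n_informative=20,
                           n_classes=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

classifiers = {
    "SVM (linear)": SVC(kernel="linear"),
    "SVM (gaussian)": SVC(kernel="rbf"),  # RBF is the Gaussian kernel
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

accuracies = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    accuracies[name] = clf.score(X_test, y_test)  # mean accuracy on held-out data
    print(f"{name}: {accuracies[name]:.4f}")
```

The same loop can be repeated once per feature type (BOVW, HOG, CNN) to fill a classifier-by-feature accuracy table.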
Object localization
Objects of interest are extracted from the test videos. Three techniques
are implemented for the task.
Background subtraction using a Mixture of Gaussians (MOG)
Optical flow measurement
Sliding window method
Figure: Contours are drawn on the right image
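The idea behind background subtraction can be sketched with a running-average background model. This is a deliberately simpler technique than the Mixture of Gaussians model named on the slide, and it returns one box around all foreground pixels instead of one contour per object; the function names, `alpha` and `threshold` are assumptions.

```python
def update_background(background, frame, alpha=0.1):
    """Blend the new frame into the running-average background."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=30):
    """Mark pixels that differ from the background by more than `threshold`."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def bounding_box(mask):
    """Smallest box (x, y, width, height) around all foreground pixels, or None."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, flag in enumerate(row) if flag]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


# Toy 5x6 grayscale scene: a flat background, then a bright 2x2 "object".
empty = [[50.0] * 6 for _ in range(5)]
background = empty
for _ in range(3):  # learn the background from empty frames
    background = update_background(background, empty)

frame = [row[:] for row in empty]
for y in (1, 2):
    for x in (3, 4):
        frame[y][x] = 200.0

box = bounding_box(foreground_mask(background, frame))
print(box)  # (3, 1, 2, 2)
```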
Then the learned classifier is used to classify each extracted image.
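The sliding window method combined with a classifier can be sketched as follows. The window size, the stride and the `toy_classifier` stand-in are assumptions; a real classifier would receive the image patch inside each box rather than the box coordinates.

```python
def sliding_windows(frame_width, frame_height, window=(64, 64), stride=32):
    """Yield (x, y, width, height) boxes that tile the frame with overlap."""
    win_w, win_h = window
    for y in range(0, frame_height - win_h + 1, stride):
        for x in range(0, frame_width - win_w + 1, stride):
            yield (x, y, win_w, win_h)


def detect(frame_width, frame_height, classify, **kwargs):
    """Keep the windows for which the classifier returns a category."""
    detections = []
    for box in sliding_windows(frame_width, frame_height, **kwargs):
        label = classify(box)
        if label is not None:
            detections.append((box, label))
    return detections


def toy_classifier(box):
    """Hypothetical stand-in for the learned classifier: 'Car' at one position."""
    return "Car" if box[:2] == (32, 0) else None


boxes = list(sliding_windows(128, 96))
found = detect(128, 96, toy_classifier)

print(len(boxes))  # 6
print(found)       # [((32, 0, 64, 64), 'Car')]
```

The number of windows grows quickly as the stride shrinks or more window sizes are added, which makes this approach expensive when each window needs a slow feature extractor.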
Experiments and Results
Dataset for classifier learning
The labelled dataset was noisy, so good images were separated out to
create the dataset.
∼ 1600 training images and ∼ 450 test images.
Around 300 training images of each class were used for training.
Accuracy Obtained
Classification accuracies by classifier (rows) and feature type (columns)
Classifier       BOVW     HOG      CNN
SVM (linear)     0.8535   0.9129   0.9873
SVM (gaussian)   0.879    0.756    0.9809
Random Forest    0.8322   0.9023   0.9745
Decision Tree    0.6475   0.6072   0.8917
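The table can be read programmatically to find the best classifier for each feature type. The numbers below are copied from the table; the dictionary layout and variable names are illustrative.

```python
# Accuracies copied from the results table: classifier -> feature type -> accuracy.
accuracies = {
    "SVM (linear)":   {"BOVW": 0.8535, "HOG": 0.9129, "CNN": 0.9873},
    "SVM (gaussian)": {"BOVW": 0.879,  "HOG": 0.756,  "CNN": 0.9809},
    "Random Forest":  {"BOVW": 0.8322, "HOG": 0.9023, "CNN": 0.9745},
    "Decision Tree":  {"BOVW": 0.6475, "HOG": 0.6072, "CNN": 0.8917},
}

best = {}
for feature in ("BOVW", "HOG", "CNN"):
    # Pick the classifier with the highest accuracy in this column.
    best[feature] = max(accuracies, key=lambda name: accuracies[name][feature])
    print(feature, best[feature], accuracies[best[feature]][feature])
```

CNN features give the highest accuracy for every classifier in the table, and the linear SVM is the best classifier for both HOG and CNN features.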
Scope of Improvements
Dataset - The dataset and labels are noisy; improving the dataset is
likely to give improved results.
Localization - Background subtraction (BGS), optical flow and sliding window
do not work well for localization. A learning-based method can be used instead.
CNN - Though the accuracy of the classifier with CNN features is very
good, it requires 1-2 seconds per frame for classification.
For real-time classification, Fast R-CNN [4] can be used.
References I
[1] Object recognition from local scale-invariant features
Lowe, David G. (1999).
Proceedings of the International Conference on Computer Vision. pp.
1150-1157
[2] Histograms of oriented gradients for human detection
Dalal, Navneet, and Bill Triggs.
Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE
Computer Society Conference on. Vol. 1. IEEE, 2005
[3] Very Deep Convolutional Networks for Large-Scale Image Recognition
K. Simonyan, A. Zisserman.
arXiv:1409.1556
References II
[4] Fast R-CNN
Ross B. Girshick.
Proceedings of the IEEE International Conference on Computer
Vision. 2015.
Libraries Used
OpenCV (I/O, background subtraction, optical flow, SIFT), scikit-learn (k-means, classifiers),
scikit-image (HOG).