INTRODUCTION TO DATA SCIENCE
DATA SCIENCE TOOLS
DATA SCIENCE TOOLS FOR DATA STORAGE
DATA SCIENCE TOOLS FOR EDA
DATA SCIENCE TOOLS FOR DATA MANIPULATION
DATA SCIENCE TOOLS FOR DATA VISUALIZATION
www.edureka.co
INTRODUCTION TO DATA SCIENCE
Data Science is the process of extracting knowledge and insights from data by
using scientific methods.
Data Science involves collecting, analysing and modelling data to solve real-world problems. It is
used for fraud detection, disease detection, recommendation engines and so on.
DATA SCIENCE TOOLS
Data Science tools come with pre-defined functions, algorithms, and a user-friendly GUI.
Hence, they can be used to build complex Machine Learning models without the use of a
programming language.
Stages of Data Science: Data Collection, Exploratory Data Analysis, Data Modelling, Data Visualization
DATA SCIENCE TOOLS FOR DATA STORAGE
Scales to manage massive amounts of data
Uses the Hadoop Distributed File System (HDFS) for data storage
Integrates with Hadoop MapReduce and Hadoop YARN
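To make the MapReduce model mentioned above more concrete, here is a minimal single-machine sketch in plain Python. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and only mirror the map, shuffle, and reduce steps that a real cluster would distribute across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["big data tools", "data science tools", "big data"]
    print(word_count(docs))
```

On a real cluster, the map and reduce calls would run in parallel on different machines, with the input split into blocks stored in HDFS; the sketch keeps everything in one process so the data flow is easy to follow.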
Data processing via Apache Hadoop and Spark clusters
The default storage system is Windows Azure Blob
Provides Microsoft R Server
DATA SCIENCE TOOLS FOR EDA
Data integration tool based on the Extract, Transform, Load (ETL) architecture
ETL tool to manage data
Support for distributed processing, grid computing, and adaptive load balancing
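The Extract, Transform, Load pattern named above can be sketched in a few lines of standard-library Python. This is only an illustration of the three steps, not how any particular ETL product works; the sample data, the payments table, and the function names (extract, transform, load, run_etl) are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real pipeline would read files or query a source system.
RAW_CSV = """name,amount
alice, 10.5
bob,not_a_number
carol,7
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise names and drop rows whose amount is not numeric."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows that cannot be converted
        clean.append((row["name"].strip().title(), amount))
    return clean


def load(records, conn):
    """Load: write the cleaned records into a target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", records)
    conn.commit()


def run_etl(text):
    """Run all three steps against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn


if __name__ == "__main__":
    connection = run_etl(RAW_CSV)
    print(connection.execute("SELECT name, amount FROM payments").fetchall())
```

Keeping the three steps as separate functions mirrors how ETL tools let each stage be configured, scheduled, and scaled independently.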
Data processing, building Machine Learning models, etc.
Support for integrating the Hadoop framework
Generates predictive models through automated modelling
DATA SCIENCE TOOLS FOR DATA MODELLING
Makes Machine Learning easy to apply
Supports Generalized Linear Models (GLM), Boosting, and Deep Learning models
Integrates with Apache Hadoop
Supports parallel programming for data analysis, data modelling, etc.
Trains and tests Machine Learning models quickly
Makes model evaluation much easier
DATA SCIENCE TOOLS FOR VISUALIZATION
Can visualize massive data sets to find correlations and patterns
Creates customized reports and dashboards
Integrates with Apache Hadoop
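As a small illustration of what "finding correlations" means numerically, the sketch below computes the Pearson correlation coefficient between two columns in plain Python. It is not tied to any visualization tool; the function name pearson and the sample figures are assumptions made for the example.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalised covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical sample data: advertising spend versus sales
    ad_spend = [1, 2, 3, 4]
    sales = [2, 4, 6, 8]
    print(pearson(ad_spend, sales))
```

A value near 1 or -1 indicates a strong linear relationship, which a dashboard would typically surface as a clear trend in a scatter plot; a value near 0 indicates little linear relationship.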
Clear and concise visualizations
Supports in-memory data processing
Automatically generates data associations

Top 8 Data Science Tools | Open Source Tools for Data Scientists | Edureka
