DATA SCIENCE

  • Course Duration: 20 Hours (approx.)
  • Location: Online - Live Sessions
  • Prerequisites: None
  • Category: Agile, Business, Project Management, Scrum, SDLC, STLC
  • Language: English
  • Skill Level: Beginner
  • Available Modes: Online (Batch or One on One)
  • Sessions: Weekday and Weekend
  • Course Capacity: 20
  • Viewers: 5278
  • Certificate: Yes

Descriptions

Machine Learning with Python, Artificial Intelligence, Deep Learning.

This course focuses on the practical application of data science in solving real-world problems. Students will learn how to use statistical analysis, machine learning, and data visualization techniques to extract insights from complex data sets. They will also develop practical skills such as data cleaning, preparation, and wrangling. Through hands-on projects and case studies, students will gain experience in the entire data science pipeline, from data collection to decision-making. By the end of the course, students will be equipped with the tools and techniques necessary to use data science to make informed decisions in their personal and professional lives.

Data science, Machine Learning with Big data

Course Content:

  1. Introduction to Data Science
    • What is data science
      • Data science is the study of data. It involves developing methods of recording, storing, and analysing data to effectively extract useful information. The goal of data science is to gain insights and knowledge from any type of data — both structured and unstructured.
    • Role of Data scientist
    • How data science is driving the industries
    • Role of Python in data science applications and why we choose Python
  2. Introduction to Python
    • Introduction to Python programming language
    • Features and how it is different from other programming languages
    • Python & Anaconda Installation on Windows, Linux and Mac
    • Python IDE working mechanism
    • Python Basics
      • Variables
      • Data Types
      • Keywords
      • Examples on variable methods
  3. Operators
  4. Python Data structures
    • Data Structures
      • List
      • Tuple
      • Dictionary
      • Set
    • Slicing
    • Q & A’s
    • Hands-on Exercises
  5. Control statements and Loops
    1. If-else statements
    2. For Loop and While Loop
    3. Q & A’s
    4. Hands-on Exercises
  6. Functions
    1. Role of functions
    2. Parameters
    3. Executing functions
    4. Q & A’s
    5. Hands-on Exercises
  7. Lambda functions
  8. Exceptions and how we use them in projects
  9. OOP concepts & Database access
    1. Understanding object-oriented programming
    2. Global and Local variables
    3. Methods
    4. Connect with Database and pull the data
    5. Q & A’s
    6. Hands-on Exercises
  10. Setting up the Jupyter notebook environment

Modules:

11. NumPy

NumPy is not another programming language but a Python extension module. It provides fast and efficient operations on arrays of homogeneous data. NumPy extends Python into a high-level language for manipulating numerical data, similar to MATLAB.

    • Understanding NumPy
    • Role of NumPy in Data Science
    • Arrays and Matrices
    • Important Methods
    • Slicing
    • Q & A’s
    • Hands-on Exercises

12. SciPy

SciPy is used for scientific and technical computing. It contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers and other tasks common in science and engineering.

    • Introduction
    • Characteristics of SciPy
    • Sub packages of SciPy
    • Bayes theorem
    • Q & A’s
    • Hands-on Exercises
13. Pandas (Data manipulation)

Data in pandas is often used to feed statistical analysis in SciPy, plotting functions from Matplotlib, and machine learning algorithms in Scikit-learn. Jupyter Notebooks offer a good environment for using pandas to do data exploration and modelling, but pandas can also be used in text editors just as easily.

    • DataFrames and their methods
    • Reading and writing different file formats (CSV, JSON, etc.)
    • Connecting to Database
    • Data manipulation techniques
    • Joins and merge
    • NumPy dependency of Pandas library
    • Exploring and analysing datasets
    • Q & A’s
    • Hands-on Exercises
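As a rough illustration of the reading, merging, and exploring topics above, the sketch below reads a small CSV from memory and joins it to a second table. The data, column names, and customer names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV content, read from memory so the example is self-contained.
orders_csv = io.StringIO(
    "order_id,customer_id,amount\n"
    "1,10,25.0\n"
    "2,11,40.0\n"
    "3,10,15.0\n"
)
orders = pd.read_csv(orders_csv)

# A second, hypothetical table keyed by the same customer_id column.
customers = pd.DataFrame({"customer_id": [10, 11], "name": ["Asha", "Ben"]})

# Join the two tables on their shared key (merge performs an inner join by default).
merged = orders.merge(customers, on="customer_id")

# Explore the result: total order amount per customer.
totals = merged.groupby("name")["amount"].sum()
print(totals)
```

In practice `pd.read_csv` would usually be given a file path; the in-memory buffer is used here only to keep the example runnable on its own.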

 

Data Analysis and Machine learning:

14. Machine learning
    • Introduction
    • Various tools in python used for machine learning (NumPy, Pandas, Matplotlib, Scikit-Learn etc.)
    • Use cases of Machine learning
    • Machine learning flow
    • Handling missing values

Algorithms:

a. Linear Regression

  • Linear regression is a basic and commonly used type of predictive analysis. The overall idea of regression is to examine two things:
    • Does a set of predictor variables do a good job of predicting an outcome (dependent) variable?
    • Which variables in particular are significant predictors of the outcome variable, and how do they affect it, as indicated by the magnitude and sign of the beta estimates?
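A minimal sketch of fitting a line with one predictor by ordinary least squares is shown here. The function name `fit_line` and the data points are hypothetical; the slope it returns plays the role of the beta estimate mentioned above.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data that lies exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```

The sign of the slope shows the direction of the relationship and its magnitude shows how much the outcome changes per unit of the predictor.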

b. Logistic Regression

  • Logistic regression is a supervised learning classification algorithm used to predict the probability that a target variable belongs to a given class.
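To show how a logistic regression model turns inputs into a probability, the sketch below applies the logistic (sigmoid) function to a weighted sum of features. The weights and bias are hypothetical values chosen for illustration, not the result of training.

```python
import math


def predict_proba(features, weights, bias):
    """Probability of the positive class under a logistic model."""
    # Linear score, then squashed into the range (0, 1) by the sigmoid.
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical, already-chosen parameters (a real model would learn these).
weights = [0.8, -0.5]
bias = 0.1

p = predict_proba([2.0, 1.0], weights, bias)
label = 1 if p >= 0.5 else 0  # classify with a 0.5 threshold
print(p, label)
```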

c. Gradient descent

  • Gradient descent is the process of minimizing a cost function by repeatedly stepping against its gradient. This requires knowing the form of the cost as well as its derivative, so that from a given point you can compute the gradient and move a small step in the opposite (downhill) direction.
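A minimal sketch of gradient descent on a one-variable cost follows. The cost function, learning rate, and step count are assumptions chosen for illustration; each update moves against the gradient, which is the downhill direction.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    w = start
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


# Hypothetical cost with its minimum at w = 3.
def cost(w):
    return (w - 3.0) ** 2


# Derivative of the cost with respect to w.
def cost_gradient(w):
    return 2.0 * (w - 3.0)


w_min = gradient_descent(cost_gradient, start=0.0)
print(w_min, cost(w_min))
```

A learning rate that is too large can overshoot the minimum, while one that is too small converges slowly.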

d. Time series analysis

  • Time series analysis is the analysis of data collected at specific intervals over a period of time, with the purpose of identifying trends, cycles, and seasonal variations to aid in forecasting future events. Data here means any observed outcome that is measurable.
    • Q & A’s
    • Hands-on Exercises
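One simple way to expose a trend in a time series is a moving average. The sketch below is an illustration only; the function name `moving_average` and the monthly figures are hypothetical.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive observations."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical monthly observations.
monthly_sales = [10, 12, 14, 13, 15, 17]

trend = moving_average(monthly_sales, 3)
print(trend)
```

Smoothing over three months hides the dip in the fourth observation and leaves a steadily rising trend.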

15. Supervised Learning

  • What is Supervised Learning
    • A supervised learning algorithm analyzes labeled training data and produces an inferred function, which can be used for mapping new examples.
  • Classification
    • Classification is the process of predicting the class of given data points. Classes are sometimes called targets, labels, or categories. Classification belongs to the category of supervised learning, where the targets are provided along with the input data.
  • Decision Tree and algorithm for Decision Tree induction
    • A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (the decision taken after computing all attributes).
  • Confusion Matrix
    • A confusion matrix is a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known.
  • Random Forest
    • A random forest is an ensemble of randomized decision trees. It is widely used for both classification and regression. Combining many trees typically gives more predictive power than a single tree and helps prevent overfitting.
  • Naïve Bayes
    • Naive Bayes uses Bayes' theorem, with the assumption that attributes are independent given the class, to predict the probability of each class based on various attributes. This algorithm is mostly used in text classification and in problems with multiple classes.
  • Implement Naïve Bayes Classifier
  • Q & A’s
  • Support vector machine and its process
    • SVM is a supervised machine learning algorithm which can be used for classification or regression problems. It uses a technique called the kernel trick to transform your data and then based on these transformations it finds an optimal boundary between the possible outputs.
  • Hyperparameter optimization
    • A hyperparameter is a parameter whose value is set before training and is used to control the learning process.
    • Comparing Random search with Grid search
    • Implement Support vector machine for classification
    • Q & A’s
    • Hands-on Exercise to implement above algorithms using Scikit-learn
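The confusion matrix covered in this section can be built by counting (actual, predicted) pairs. The sketch below uses only the standard library; the labels and predictions are hypothetical, with "spam" treated as the positive class.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


# Hypothetical true labels and classifier outputs for six test examples.
actual = ["spam", "spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "ham", "ham", "spam", "spam"]

cm = confusion_matrix(actual, predicted)

tp = cm[("spam", "spam")]  # spam correctly flagged as spam
fn = cm[("spam", "ham")]   # spam missed by the classifier
fp = cm[("ham", "spam")]   # ham wrongly flagged as spam
tn = cm[("ham", "ham")]    # ham correctly left alone

accuracy = (tp + tn) / len(actual)
print(tp, fn, fp, tn, accuracy)
```

Splitting the errors into false positives and false negatives shows what kind of mistakes a classifier makes, which a single accuracy number does not.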
16. Unsupervised Learning
  • Introduction and use cases of Unsupervised Learning
  • K-means clustering
    • The K-means clustering algorithm is used to find groups that have not been explicitly labelled in the data. This can be used to confirm business assumptions about what types of groups exist or to identify unknown groups in complex data sets.
  • Optimal clustering
    • The optimal number of clusters k is the one that maximizes the average silhouette over a range of possible values for k. The procedure is similar to the elbow method: run the clustering algorithm (e.g., k-means clustering) for different values of k, compute the average silhouette for each, and choose the k with the highest value.
  • Hierarchical clustering
    • Hierarchical clustering is a powerful technique that allows you to build tree structures from data similarities
  • Implementation of K-means and Hierarchical clustering
  • Q & A’s
  • Introduction to NLP
    • Natural language processing (NLP) helps resolve ambiguity in language and adds useful numeric structure to the data for many downstream applications, such as speech recognition or text analytics.
  • Working with NLP on text data
  • Analysing sentence
  • Bag-of-words model
    • The bag-of-words model is a simplifying representation used in natural language processing and information retrieval (IR). In this model, a text (such as a sentence or a document) is represented as the bag (multiset) of its words, disregarding grammar and even word order but keeping multiplicity.
  • Extracting features from text
  • Searching a grid
  • Model training
  • Multiple parameters and building of a pipeline
  • Q & A’s
  • Hands-on Exercises using SciPy
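The bag-of-words model described in this section can be sketched with the standard library alone. The two example sentences are hypothetical, and splitting on whitespace is a simplifying assumption; real text usually needs proper tokenization.

```python
from collections import Counter


def bag_of_words(text):
    """Represent a text as a multiset of its lowercased words."""
    return Counter(text.lower().split())


# Hypothetical documents.
docs = ["the cat sat on the mat", "the dog sat"]
bags = [bag_of_words(d) for d in docs]

# A shared, sorted vocabulary turns each bag into a fixed-length count vector.
vocabulary = sorted(set().union(*bags))
vectors = [[bag[word] for word in vocabulary] for bag in bags]

print(vocabulary)
print(vectors)
```

Word order is lost, but the counts keep multiplicity ("the" appears twice in the first document), and the resulting vectors can be used as features for model training.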
17. Project implementation