DATA SCIENCE

Course Duration: 20 Hours (approx.)
Location: Online - Live Sessions
Prerequisites: None
Category: Agile, Business, Project Management, Scrum, SDLC, STLC
Language: English
Skill Level: Beginner
Available Modes: Online (Batch or One on One)
Sessions: Weekday and Weekend
Course Capacity: 20
Certificate: Yes

Descriptions

Machine Learning with Python, Artificial Intelligence, Deep Learning.

This course focuses on the practical application of data science in solving real-world problems. Students will learn how to use statistical analysis, machine learning, and data visualization techniques to extract insights from complex data sets. They will also develop practical skills such as data cleaning, preparation, and wrangling. Through hands-on projects and case studies, students will gain experience in the entire data science pipeline, from data collection to decision-making. By the end of the course, students will be equipped with the tools and techniques necessary to use data science to make informed decisions in their personal and professional lives.

Data science, Machine Learning with Big data

Course Content:

Introduction to Data Science
- What is data science
  - Data science is the study of data. It involves developing methods of recording, storing, and analysing data to effectively extract useful information. The goal of data science is to gain insights and knowledge from any type of data — both structured and unstructured.
- Role of a data scientist
- How data science is driving industries
- Role of Python in data science applications and why we choose Python
Introduction to Python
- Introduction to Python programming language
- Features and how it is different from other programming languages
- Python & Anaconda Installation on Windows, Linux and Mac
- Python IDE working mechanism
- Python Basics
  - Variables
  - Data Types
  - Keywords
  - Examples on variable methods
Operators
Python Data structures
- Data Structures
  - List
  - Tuple
  - Dictionary
  - Set
- Slicing
- Q & A’s
- Hands-on Exercises
Control statements and Loops
1. if-else statements
2. For Loop and While Loop
3. Q & A’s
4. Hands-on Exercises
Functions
1. Role of functions
2. Parameters
3. Executing functions
4. Q & A’s
5. Hands-on Exercises
Lambda functions
Exceptions and how we use them in projects
OOP concepts & Database access
1. Understanding object-oriented programming
2. Global and Local variables
3. Methods
4. Connect with Database and pull the data
5. Q & A’s
6. Hands-on Exercises
Setting up the Jupyter notebook environment

Modules:

NumPy

NumPy is not another programming language but a Python extension module. It provides fast and efficient operations on arrays of homogeneous data. NumPy extends Python into a high-level language for manipulating numerical data, similar to MATLAB.

- Understanding NumPy
- Role of NumPy in Data Science
- Arrays and Matrices
- Important Methods
- Slicing
- Q & A’s
- Hands-on Exercises
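
As a rough illustration of the array operations listed above, here is a minimal sketch. The values are made up for the example, and it assumes NumPy is installed.

```python
import numpy as np

# A 2-D array (matrix) of homogeneous float data.
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

# Slicing: every row, columns 1 onward.
right_columns = matrix[:, 1:]

# Vectorized arithmetic works element by element, with no explicit loop.
doubled = matrix * 2

# Common methods: per-column mean and transpose.
column_means = matrix.mean(axis=0)
transposed = matrix.T

print(right_columns)
print(doubled)
print(column_means)
print(transposed.shape)
```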

SciPy

SciPy is used for scientific and technical computing. It contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers and other tasks common in science and engineering.

- Introduction
- Characteristics of SciPy
- Sub packages of SciPy
- Bayes' theorem
- Q & A’s
- Hands-on Exercises
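
Two of the sub-packages mentioned above, integration and optimization, can be sketched briefly. The functions being integrated and minimized are arbitrary choices for illustration, not part of the course material.

```python
from scipy import integrate, optimize

# Integrate x**2 from 0 to 3; the exact answer is 9.
area, error_estimate = integrate.quad(lambda x: x ** 2, 0, 3)

# Find the minimum of (x - 2)**2 + 1, which lies at x = 2.
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2 + 1)

print(round(area, 6))
print(round(result.x, 4))
```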

Pandas (Data manipulation)

Data in pandas is often used to feed statistical analysis in SciPy, plotting functions from Matplotlib, and machine learning algorithms in Scikit-learn. Jupyter Notebooks offer a good environment for using pandas to do data exploration and modelling, but pandas can also be used in text editors just as easily.

- DataFrames and their methods
- Reading and writing different file formats (CSV, JSON, etc.)
- Connecting to Database
- Data manipulation techniques
- Joins and merge
- NumPy dependency of Pandas library
- Exploring and analysing datasets
- Q & A’s
- Hands-on Exercises
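
The topics above (reading files, joins, and handling missing values) might look roughly like this in practice. The customer and order data are hypothetical, and the CSV is read from memory only so the sketch stays self-contained.

```python
import io

import pandas as pd

# Hypothetical CSV content, read from memory instead of a file on disk.
csv_text = "customer_id,name\n1,Asha\n2,Ben\n3,Chen\n"
customers = pd.read_csv(io.StringIO(csv_text))

orders = pd.DataFrame({
    "customer_id": [1, 1, 3],
    "amount": [20.0, 35.0, 50.0],
})

# Left join: keep every customer, even those without orders.
merged = customers.merge(orders, on="customer_id", how="left")

# Total spend per customer; sum() skips missing amounts, so Ben gets 0.
totals = merged.groupby("name")["amount"].sum()

print(merged)
print(totals)
```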

Data Analysis and Machine learning:

Machine learning

- Introduction
- Various tools in python used for machine learning (NumPy, Pandas, Matplotlib, Scikit-Learn etc.)
- Use cases of Machine learning
- Machine learning flow
- Handling missing values

Algorithms:

a. Linear Regression

Linear regression is a basic and commonly used type of predictive analysis. The overall idea of regression is to examine two things:
- Does a set of predictor variables do a good job of predicting an outcome (dependent) variable?
- Which variables in particular are significant predictors of the outcome variable, and in what way do they impact it, as indicated by the magnitude and sign of the beta estimates?
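
Both questions can be explored with a small least-squares fit. The sketch below uses hypothetical, noise-free data and NumPy's `lstsq`; with real data the fit would not be exact, and significance testing would need additional tooling.

```python
import numpy as np

# Hypothetical, noise-free data generated from y = 3 + 2*x1 - 1*x2.
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = 3 + 2 * x1 - 1 * x2

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Ordinary least squares estimate of the betas.
betas, *_ = np.linalg.lstsq(X, y, rcond=None)

# R^2: share of the variance in y that the predictors explain.
predictions = X @ betas
ss_res = np.sum((y - predictions) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(np.round(betas, 3))
print(round(r_squared, 3))
```

The sign of each beta shows the direction of the effect (x1 raises y, x2 lowers it) and the magnitude shows its size per unit change.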

b. Logistic Regression

Logistic regression is a supervised learning classification algorithm used to predict the probability that an observation belongs to a given class of the target variable.
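
A minimal sketch of predicting class probabilities, assuming scikit-learn is available. The study-hours data set is invented for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied and whether the student passed (1) or not (0).
hours = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(hours, passed)

# predict_proba returns one column per class: [P(fail), P(pass)].
probabilities = model.predict_proba(np.array([[1.0], [4.0]]))
print(probabilities)
```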

c. Gradient descent

Gradient descent is the process of minimizing a cost function by following its gradient downhill. This involves knowing the form of the cost as well as its derivative, so that from a given point you know the gradient and can move in the opposite direction, where the cost decreases.
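
The idea can be sketched in a few lines of plain Python. The cost function, learning rate, and step count here are illustrative assumptions, and each update steps against the gradient so that the cost goes down.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to minimize a function."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * gradient(x)
    return x


# Cost f(x) = (x - 3)**2 has derivative f'(x) = 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(round(minimum, 4))  # prints 3.0
```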

d. Time series analysis

Time series analysis is the analysis of data collected at specific intervals over a period of time, with the purpose of identifying trends, cycles, and seasonal variances to aid in forecasting future events. Data is any observed outcome that's measurable.
- Q & A’s
- Hands-on Exercises
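
As a simple illustration of trend detection, the sketch below smooths a hypothetical monthly sales series with a rolling mean in pandas. The "forecast" is deliberately naive and only shows the idea.

```python
import pandas as pd

# Hypothetical monthly sales with an upward trend.
dates = pd.date_range("2023-01-01", periods=6, freq="MS")
sales = pd.Series([100, 120, 110, 130, 150, 140], index=dates)

# A 3-month rolling mean smooths short-term swings to expose the trend.
trend = sales.rolling(window=3).mean()

# Naive forecast: assume next month equals the latest trend value.
forecast = trend.iloc[-1]

print(trend)
print(forecast)
```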

Supervised Learning

What is Supervised Learning
- A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.
Classification
- Classification is the process of predicting the class of given data points. Classes are sometimes called targets, labels or categories. Classification belongs to the category of supervised learning, where the targets are provided along with the input data.
Decision Tree and algorithm for Decision Tree induction
- A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes).
Confusion Matrix
- A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known
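
A decision tree and its confusion matrix can be tried together. This sketch assumes scikit-learn and uses its bundled iris data set; the split size and tree depth are arbitrary choices for the example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold out 30% of the data as a test set with known true labels.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# Rows are true classes, columns are predicted classes.
matrix = confusion_matrix(y_test, tree.predict(X_test))
print(matrix)
```

Counts on the diagonal are correct predictions; anything off the diagonal is a misclassification.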
Random Forest
- A random forest is an ensemble of randomized decision trees. Combining many trees increases predictive power compared with a single tree and also helps prevent overfitting. It is simple to use, widely used, and works for both classification and regression.
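
A short sketch of a random forest as an ensemble, assuming scikit-learn. The number of trees and the iris data set are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(n_estimators=50, random_state=0)

# 5-fold cross-validation gives a less optimistic accuracy estimate.
scores = cross_val_score(forest, X, y, cv=5)

# After fitting, the individual decision trees are available in estimators_.
forest.fit(X, y)
print(len(forest.estimators_))
print(round(scores.mean(), 3))
```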
Naïve Bayes
- Naive Bayes applies Bayes' theorem, under the assumption that attributes are independent of one another, to predict the probability of each class based on various attributes. This algorithm is mostly used in text classification and with problems having multiple classes.
Implement Naïve Bayes Classifier
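
One possible implementation for text classification, assuming scikit-learn. The tiny corpus and the "spam"/"ham" labels are hypothetical and far too small for real use.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus with two classes.
texts = [
    "win a free prize now",
    "free money claim your prize",
    "meeting agenda for monday",
    "project status meeting notes",
]
labels = ["spam", "spam", "ham", "ham"]

# Turn each text into a vector of word counts.
vectorizer = CountVectorizer()
features = vectorizer.fit_transform(texts)

classifier = MultinomialNB()
classifier.fit(features, labels)

# New texts must be transformed with the same fitted vocabulary.
new_features = vectorizer.transform(["claim your free prize", "monday project meeting"])
predicted = classifier.predict(new_features)
print(predicted)
```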
Q & A’s
Support vector machine and its process
- SVM is a supervised machine learning algorithm which can be used for classification or regression problems. It uses a technique called the kernel trick to transform your data and then based on these transformations it finds an optimal boundary between the possible outputs.
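
The effect of the kernel can be seen on data that no straight line separates. This sketch assumes scikit-learn and uses its synthetic `make_circles` generator, with parameter values picked for the example.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: the classes cannot be split by a straight line.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear kernel looks for a straight boundary in the original space.
linear_accuracy = SVC(kernel="linear").fit(X, y).score(X, y)

# The RBF kernel implicitly maps the data to a space where a boundary exists.
rbf_accuracy = SVC(kernel="rbf").fit(X, y).score(X, y)

print(linear_accuracy, rbf_accuracy)
```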
Hyperparameter optimization
- A hyperparameter is a parameter whose value is used to control the learning process. It is set before training rather than learned from the data.
- Comparing Random search with Grid search
- Implement Support vector machine for classification
- Q & A’s
- Hands-on exercise to implement the above algorithms using SciPy
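
Grid search and random search can be compared on the same search space. The sketch below assumes scikit-learn; the parameter values are arbitrary examples.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# Grid search tries all 6 combinations.
grid = GridSearchCV(SVC(), param_grid, cv=3)
grid.fit(X, y)

# Random search samples only n_iter combinations from the same space.
random_search = RandomizedSearchCV(
    SVC(), param_grid, n_iter=3, cv=3, random_state=0
)
random_search.fit(X, y)

print(grid.best_params_, round(grid.best_score_, 3))
print(random_search.best_params_, round(random_search.best_score_, 3))
```

Random search trades completeness for speed, which matters more as the number of hyperparameters grows.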

Unsupervised Learning

Introduction and use cases of Unsupervised Learning
K-means clustering
- The K-means clustering algorithm is used to find groups which have not been explicitly labelled in the data. This can be used to confirm business assumptions about what types of groups exist or to identify unknown groups in complex data sets.
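
A minimal sketch of finding unlabelled groups, assuming scikit-learn. The six points are made up so that two groups are obvious.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabelled points forming two well-separated groups.
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.9, 8.3]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)

# fit_predict assigns each point the id of its nearest cluster centre.
cluster_ids = kmeans.fit_predict(points)

print(cluster_ids)
print(kmeans.cluster_centers_)
```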
Optimal clustering
- The optimal number of clusters k is the one that maximizes the average silhouette over a range of possible values for k. The procedure is similar to the elbow method: run the clustering algorithm (e.g., k-means clustering) for different values of k, compute the average silhouette for each, and choose the k with the highest value.
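
The silhouette approach can be sketched as a loop over candidate values of k. This assumes scikit-learn; the synthetic data (three tight groups) and the range of k are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical data: three tight groups of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Cluster for each candidate k and record the average silhouette.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)
    scores[k] = silhouette_score(points, labels)

# The best k is the one with the highest average silhouette.
best_k = max(scores, key=scores.get)
print(best_k)
```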
Hierarchical clustering
- Hierarchical clustering is a powerful technique that allows you to build tree structures from data similarities
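
A small tree can be built and cut with SciPy's `scipy.cluster.hierarchy` module. The points, the Ward linkage method, and the choice of three clusters are assumptions made for this sketch.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical points: two close pairs and one outlier.
points = np.array([[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.2, 5.1], [9.0, 1.0]])

# Each row of the linkage matrix records one merge in the tree.
tree = linkage(points, method="ward")

# Cut the tree so that at most 3 clusters remain.
cluster_ids = fcluster(tree, t=3, criterion="maxclust")
print(cluster_ids)
```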
Implementation of K-means and Hierarchical clustering
Q & A’s
Introduction to NLP
- Natural language processing (NLP) helps resolve ambiguity in language and adds useful numeric structure to the data for many downstream applications, such as speech recognition or text analytics.
Working with NLP on text data
Analysing sentences
Bag-of-words model
- The bag-of-words model is a simplifying representation used in natural language processing and information retrieval (IR). In this model, a text (such as a sentence or a document) is represented as the bag (multiset) of its words, disregarding grammar and even word order but keeping multiplicity.
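
The multiset idea can be shown with Python's standard library alone. The helper name `bag_of_words` and the whitespace tokenization are simplifying assumptions for this sketch.

```python
from collections import Counter


def bag_of_words(text):
    """Return word counts, ignoring grammar and word order."""
    return Counter(text.lower().split())


first = bag_of_words("the cat sat on the mat")
second = bag_of_words("on the mat the cat sat")

print(first["the"])     # prints 2: multiplicity is kept
print(first == second)  # prints True: word order is disregarded
```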
Extracting features from text
Searching a grid
Model training
Multiple parameters and building of a pipeline
Q & A’s
Hands-on Exercises using SciPy

Project implementation
