We, the students of the M.Stat batch of ISI Kolkata, are organizing coursework for Data Science and Analytics job aspirants. We have received many requests from students of Engineering, Economics, Mathematics, Statistics, and other backgrounds who are seeking jobs in Data Science or Data Analytics and want guidance on how to prepare. So we have decided to take time out of our busy schedules to help such aspirants get the jobs they seek. We will cover all the topics one needs to know when applying for such jobs: the relevant high school topics and pre-college mathematics, as well as probability, statistics, linear algebra, design and analysis of algorithms, machine learning, and mathematical analysis. You will find a detailed structure of the coursework below, along with sample notes and materials that we have prepared. The classes will be held live online via Zoom, and recordings of all the lectures will be available. We will share our experiences and help you boost your skills and profile so that you get the job you are seeking.
Data Science and Data Analytics Coursework
About the Coursework
Starting 15th February, 2022
Demo Notes and Problems:
Note: These are just sample notes. Complete lecture notes and materials will be given to the members during the course. Please stay tuned for more updates on sample notes.
How to enroll?
The course is split into two levels: Level 1 and Level 2. Level 1 will begin on February 15th, 2022, and will last for two months and fifteen days before Level 2 begins. You can click on “Get Course” and enroll for both levels at a cost of INR 6000, or you may enroll for Level 1 now at a cost of INR 3500 and then enroll for Level 2 at a cost of INR 3500 before it starts. If you wish to take only the Mock Tests and Mock Interviews, there is a separate plan that costs INR 1000. Please note that students who opt for the full course, or for Level 1 and then Level 2, do not have to purchase the Mock Tests and Mock Interviews plan separately, as it is already included in the course. You will be able to register only once you have paid the full amount. No partial payments will be accepted. For other payment methods, you can WhatsApp us at 9700803692. Registration will close once the limit has been reached. Registration is on a first-come, first-served basis.
Topics to be covered:
- Linear Algebra and Linear Models: Getting Started with the Plane, Systems of Linear Equations, Vector Spaces and Projections, Inner Product and Orthogonal Projections, Matrix Decomposition, Eigenvalues and Eigenvectors, Vector Calculus, Linear Regression, Dimensionality Reduction with PCA, Gaussian Mixture Models, Density Estimation, Support Vector Machines, Model Implementation in R/Python.
- Probability Theory: Classical Probability, Discrete Random Variables, Continuous Random Variables, Inequalities and Modes of Convergence, Convergence Theorems, Random Vectors, Stochastic Processes I, Stochastic Processes II, Measure Theory I, Measure Theory II, Conditional Probability, Martingale Theory, Miscellaneous Problems.
- Data Structures and Algorithms: Introduction and Analysis of Algorithms, Find and Search Algorithms, Sorting Algorithms and their Comparison, Data Structures, Search Trees and Hashing, Algorithms Involving Sequences and Sets, Graph Algorithms, Minimum Spanning Tree, Single Source Shortest Paths, Maximum Flow Algorithms, Geometric Algorithms, NP-Completeness, Miscellaneous Problems.
- Statistics for Data Science: Descriptive Statistics, Estimation, Testing of Hypothesis, Fitting Probability Distributions, Linear Regression, Linear Models, Resampling Techniques, Generalised Linear Models, Penalized Regression, Categorical Data Analysis, Regression Techniques, Time Series Analysis, Miscellaneous Problems.
- Machine Learning and Neural Networks: Introduction to Machine Learning, Exploratory Data Analysis, Supervised Classification, Supervised Regression, K Nearest Neighbours, Support Vector Machines, Decision Trees and Ensemble Techniques, K Means Clustering, DBSCAN and Anomaly Detection, t-SNE, Neural Networks I, Neural Networks II, Miscellaneous Problems.
The whole course will consist of 60 lectures, a total of 120 hours of live lectures along with Mock Tests and Mock Interviews.
Prerequisite: You should know your 10+2 Mathematics and must be from a Mathematics, Statistics, Economics, or Engineering background.
A detailed description of the course and sample problems are below.
Course plan and Description
Timings: 9 pm to 11 pm
Who can apply?
If you are a student at a Tier I or Tier II Government or private engineering college where Data Science profiles and financial companies come for placement, and you would like to prepare for them, then this course is for you.
Students from the Mathematics and Statistics departments of DU, BHU, CU, and HCU, MSQMS students of ISI Bangalore, CMI Data Science students, and students from Economics departments such as DSE, MSE, and IGIDR can also apply. All these departments have strong placement cells, and if you think you need help preparing for such jobs, this is the right place to get started.
We would encourage any off-campus Data Science or Analytics job seeker to apply.
If you are simply interested in learning the topics mentioned above, you may also apply.
We are preparing a complete guide to cracking these jobs, where engineers can learn statistics, Economics graduates can learn probability and puzzles, Statistics graduates can learn algorithms, and much more.
Linear Algebra And Linear Models Questions:
- Can there be square matrices such that
Suppose we pick three points uniformly at random from the circumference of a circle. What is the probability that the triangle they form is acute? Suppose we extend this setup to
Suppose we pick four points uniformly at random from the surface of a sphere. What is the probability that the pyramid/tetrahedron they form is acute? [All the faces are acute-angled triangles.]
Suppose we take an equilateral triangle in the plane, with vertices as given in the figure. What is the regression line of y on x?
- What happens to the regression line when we rotate this triangle about its centroid?
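A quick numerical experiment can build intuition for the two regression questions above before attempting a proof. The sketch below assumes NumPy and an equilateral triangle with circumradius 1 whose centroid sits at the origin; the vertices from the original figure are not reproduced here, so these coordinates are an assumption. The helper names `triangle_vertices` and `regression_line` are hypothetical.

```python
import numpy as np

def triangle_vertices(angle):
    """Equilateral triangle (circumradius 1, centroid at the origin), rotated by `angle` radians."""
    thetas = angle + 2 * np.pi * np.arange(3) / 3
    return np.column_stack((np.cos(thetas), np.sin(thetas)))

def regression_line(points):
    """Least-squares line of y on x through the given points: returns (slope, intercept)."""
    x, y = points[:, 0], points[:, 1]
    slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

# Rotate the triangle about its centroid and refit each time.
for angle in np.linspace(0, np.pi, 7):
    slope, intercept = regression_line(triangle_vertices(angle))
    print(f"angle={angle:.3f}  slope={slope:+.2e}  intercept={intercept:+.2e}")
```

Under these assumptions the fitted slope and intercept stay at zero up to floating-point error for every rotation angle, which suggests that the line of y on x is the horizontal line through the centroid however the triangle is turned.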
Probability Theory Questions:
Identically distributed random real numbers are generated one by one for as long as the sequence obtained by listing them remains in increasing order; generation stops at the first number that breaks the order. What is the expected length of this list?
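One way to sanity-check an answer to this question is simulation. The sketch below makes two assumptions that the problem statement does not spell out: the numbers are independent Uniform(0, 1) draws, and the list includes the first number that breaks the increasing order. The helper names `list_length` and `estimate_expected_length` are hypothetical.

```python
import random

def list_length(rng):
    """Draw numbers until one fails to exceed its predecessor; return how many were drawn."""
    count = 1
    previous = rng.random()
    while True:
        current = rng.random()
        count += 1
        if current <= previous:  # the increasing run is broken, so stop
            return count
        previous = current

def estimate_expected_length(trials=200_000, seed=0):
    """Monte Carlo estimate of the expected list length."""
    rng = random.Random(seed)
    return sum(list_length(rng) for _ in range(trials)) / trials

print(estimate_expected_length())
```

Under these assumptions the estimate should land close to e ≈ 2.718; if the number that breaks the order is not counted, the expected length is smaller by exactly one.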
There are unstable molecules in a row, . One of the pairs of neighbors, chosen at random, combines to form a stable dimer; this process continues until there remain isolated molecules, no two of which are adjacent.
1. What is the probability that remains isolated?
2. Deduce that .
A number of spaceships land independently and uniformly at random on the surface of planet Mongo. Each ship controls the hemisphere of which it is the center. What is the probability that every point on Mongo is controlled by at least one ship?
Hint: great circles almost surely partition the surface of the sphere into disjoint regions.
- Two bivariate datasets have positive correlation coefficients. Can the combined dataset have a negative correlation?
- Suppose there are points on , show that a line passes through at least of the given points. Note: LAD line means Least Absolute Deviation Minimising Line.
- Suppose are two Random Variables with correlation What is the correlation between
- Suppose we have 5 independent Random Variables such that for and for . Find the Best Linear Unbiased Estimator of ? Note: Here Best is in the sense of Minimized Mean Squared Error.
- Given the unit disk, simulate a uniformly random point inside it. Note: Uniformly Random on the Area.
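For the unit-disk question, two standard approaches are worth comparing. The sketch below uses only the Python standard library; the function names are hypothetical. The polar version takes the square root of a uniform draw for the radius, since drawing the radius uniformly would crowd points near the centre. The rejection version samples the enclosing square and keeps only points that fall inside the disk.

```python
import math
import random

def sample_disk_polar(rng):
    """Uniform point on the unit disk via polar coordinates."""
    r = math.sqrt(rng.random())          # sqrt makes the area element uniform
    theta = 2 * math.pi * rng.random()
    return r * math.cos(theta), r * math.sin(theta)

def sample_disk_rejection(rng):
    """Uniform point on the unit disk via rejection from the square [-1, 1]^2."""
    while True:
        x = rng.uniform(-1, 1)
        y = rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            return x, y

def fraction_within(sampler, radius, n=100_000, seed=0):
    """Share of sampled points within `radius` of the centre."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x, y = sampler(rng)
        hits += x * x + y * y <= radius * radius
    return hits / n

# For a uniform distribution on the area, this share should be close to radius**2.
print(fraction_within(sample_disk_polar, 0.5))
print(fraction_within(sample_disk_rejection, 0.5))
```

Both samplers should put about a quarter of the points within radius 0.5, matching the ratio of the two areas. Rejection sampling discards roughly 21% of its draws but avoids trigonometric calls.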
Data Structure and Algorithms Problems:
- You are given matrices whose dimensions are such that their product is defined. The computer can multiply only two matrices at a time, and the answer is the same in whatever order the product is computed. Find an optimal way to parenthesize the matrices so that the minimum number of operations is needed to compute the product.
- You are given two sorted arrays of length . Devise an algorithm of complexity to find the median of the array obtained by merging the two given arrays.
- A person has two eggs and has to determine the lowest floor of a storey building from which an egg breaks when dropped. An unbroken egg can, of course, be reused. What is the most efficient algorithm for the person to determine the required floor?
Note: Efficient means a minimum number of trials.
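The two-egg question can be explored with a short dynamic-programming sketch. It rests on the recurrence that with t drops and e eggs one can distinguish reach(t, e) = reach(t-1, e-1) + reach(t-1, e) + 1 floors: one term for the egg breaking, one for it surviving, plus the floor just tested. The number of floors is a parameter here because the problem statement above does not fix it, and the function names are hypothetical.

```python
def min_trials(floors, eggs=2):
    """Smallest number of drops that always identifies the critical floor."""
    # reach[e] = number of floors that can be resolved with e eggs and `trials` drops
    reach = [0] * (eggs + 1)
    trials = 0
    while reach[eggs] < floors:
        trials += 1
        # Update from high to low so reach[e - 1] still holds the previous round's value.
        for e in range(eggs, 0, -1):
            reach[e] = reach[e] + reach[e - 1] + 1
    return trials

def first_egg_floors(floors):
    """Floors for the first egg: step sizes shrink by one after each surviving drop."""
    step = min_trials(floors, eggs=2)
    plan, floor = [], 0
    while step > 0 and floor < floors:
        floor = min(floor + step, floors)
        plan.append(floor)
        step -= 1
    return plan

print(min_trials(100))
print(first_egg_floors(100))
```

For a hypothetical 100-storey building this gives 14 drops in the worst case. The first egg is dropped from floor 14, then 27, then 39, and so on, and once it breaks the second egg scans the floors in between one at a time.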
Machine Learning Questions
- How to handle imbalanced data in classification? Which metric is best suited for such a case? Give reasons.
- Generate a simulated two-class data set with observations and two features in which there is a visible but non-linear separation between the two classes. Show that in this setting, a support vector machine with a polynomial kernel (with degree greater than 1) or a radial kernel will outperform a support vector classifier on the training data. Which technique performs best on the test data? Make plots and report training and test error rates in order to back up your assertions.
- In hierarchical clustering, we can use both correlation-based distance and Euclidean distance as dissimilarity measures. Show that these two measures are almost equivalent: if each observation has been centered to have mean 0 and standard deviation 1, and if we let r_ij denote the correlation between the i-th and j-th observations, then the quantity 1 - r_ij is proportional to the squared Euclidean distance between the i-th and j-th observations.
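The hierarchical-clustering claim can be checked numerically before proving it. The sketch below assumes NumPy, and assumes the statement refers to observations standardised to mean 0 and standard deviation 1 and to the quantity 1 - r_ij. The data are random and purely illustrative. With p features and the population standard deviation (NumPy's default, `ddof=0`), the proportionality constant works out to 2p.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 50))  # 6 observations (rows), 50 features (columns)

# Standardise each observation (row) to mean 0 and standard deviation 1.
Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

p = Z.shape[1]
corr = np.corrcoef(Z)  # corr[i, j] = correlation between observations i and j

# Pairwise squared Euclidean distances between observations.
sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)

# Squared distance equals 2p * (1 - r_ij) for every pair.
print(np.allclose(sq_dist, 2 * p * (1 - corr)))
```

The check follows from expanding the squared distance: each standardised observation has squared norm p, and the cross term is p times the correlation.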