Friday, March 18, 2016
Saturday, January 2, 2016
- Simple recursive algorithms
- Backtracking algorithms
- Divide and conquer algorithms
- Dynamic programming algorithms
- Greedy algorithms
- Branch and bound algorithms
- Brute force algorithms
- Randomized algorithms
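As a rough illustration of how two of these types differ, the sketch below computes Fibonacci numbers once with simple recursion and once with bottom-up dynamic programming. The function names are made up for this example.

```python
def fib_recursive(n: int) -> int:
    """Simple recursion: the same subproblems are recomputed many times."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_dynamic(n: int) -> int:
    """Dynamic programming (bottom-up): each subproblem is solved once."""
    prev, curr = 0, 1
    for _ in range(n):
        prev, curr = curr, prev + curr
    return prev


print(fib_recursive(10), fib_dynamic(10))
```

Both functions return the same values, but the recursive version takes exponentially many calls while the dynamic programming version takes a linear number of steps.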
Complexity of Algorithms :
- Constant - O(1)
- Logarithmic - O(log(N))
- Linear - O(N)
- Quadratic - O(N*N)
- Cubic - O(N*N*N)
- Polynomial - O(N^K) for a constant K
- Exponential - O(2^N)
- Factorial - O(N!)
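To make the gap between two of these classes concrete, here is a small sketch that counts the comparisons made by a linear search, O(N), and a binary search, O(log(N)), on the same sorted list. The helper names and the list size are assumptions chosen for illustration.

```python
def linear_search_steps(items, target):
    """Count comparisons made by a left-to-right scan."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(items, target):
    """Count comparisons made by binary search on a sorted list."""
    steps = 0
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            break
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


data = list(range(1_000_000))  # already sorted
print("linear:", linear_search_steps(data, 999_999))
print("binary:", binary_search_steps(data, 999_999))
```

Searching for the last element takes a million comparisons with the linear scan, while binary search needs no more than about twenty.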
Tuesday, December 29, 2015
- You start with a dataset to analyse. - Purchase / Social / Medical / Travel
- Many variables are typically collected. - Categorical / Continuous / Geo
- Majority of them can be irrelevant and cause noise.
- Data Mining is Statistics at Scale and Speed.
- Applications in Intelligence / Genetics / Natural Sciences / Business.
- Data Mining has its origins in Categorical data, whereas Statistics deals with Continuous data.
- A large model tends to overfit the training dataset and may lead to higher prediction error in new situations.
- Consider whether each predictor variable will be available, and whether the relationship will still hold, in future data.
- Cluster analysis is an example of Unsupervised learning
- Dimension Reduction
- Association Rules
- Classification is an example of Supervised learning
- Regression, Regression Trees, Nearest Neighbour - Continuous response.
- Logistic Regression, Classification Trees, Nearest Neighbour, Discriminant analysis and Naive Bayes methods are well suited for Categorical response.
- Data Mining should be viewed as a process :
- Data Storage & PreProcessing
- Identify variables for investigation
- Screen the outliers and missing values from data
- Data need to be partitioned into training, test and evaluation sets.
- Use Sampling for Large datasets.
- Visualize your data - Line, Bar, Scatter, Box, Histogram, Map, Geo
- Summary of data - Mean, Median, Mode, Standard Deviation, Correlation, Principal Components
- Apply appropriate model - Linear, Logistic, Trees, K-means ...
- Verify findings against the evaluation set.
- Get the insights, Apply the findings! Plan - do - check - act !!
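The process steps above can be sketched end to end in a few lines. This is only a minimal illustration under stated assumptions: the dataset is synthetic, the 3-standard-deviation outlier rule and the 60/20/20 split are arbitrary choices, and a 1-nearest-neighbour predictor stands in for "an appropriate model".

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical raw data: y is roughly 3 * x plus noise, with two missing
# responses (None) and one gross outlier mixed in.
raw = [(x, 3.0 * x + random.gauss(0, 1)) for x in range(200)]
raw += [(5, None), (17, None), (42, 10_000.0)]

# Screen missing values, then outliers (response more than 3 standard
# deviations from the mean).
complete = [(x, y) for x, y in raw if y is not None]
responses = [y for _, y in complete]
mean_y, sd_y = statistics.mean(responses), statistics.stdev(responses)
clean = [(x, y) for x, y in complete if abs(y - mean_y) <= 3 * sd_y]

# Partition into training, test and evaluation sets (60% / 20% / 20%).
random.shuffle(clean)
n = len(clean)
train = clean[: n * 6 // 10]
test = clean[n * 6 // 10 : n * 8 // 10]
evaluation = clean[n * 8 // 10 :]

# Summarise the training responses.
train_y = [y for _, y in train]
print("mean:", statistics.mean(train_y), "median:", statistics.median(train_y))


def predict_nearest(x):
    """1-nearest-neighbour: return the response of the closest training x."""
    return min(train, key=lambda row: abs(row[0] - x))[1]


def mean_abs_error(rows):
    """Average absolute prediction error over the given rows."""
    return statistics.mean(abs(y - predict_nearest(x)) for x, y in rows)


# Check the model on the test set, then verify against the evaluation set.
print("test error:", mean_abs_error(test))
print("evaluation error:", mean_abs_error(evaluation))
```

If the evaluation error is much larger than the test error, that is a hint the model has overfitted or the relationship does not hold on new data.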
https://www.linkedin.com/in/alokawi ( Data Engineer, Analytics Engineer, Data Science )
Friday, October 16, 2015
What is a Correlation in statistics?
Correlation is a statistical technique that can show whether and how strongly pairs of variables are related. For example, height and weight are related; taller people tend to be heavier than shorter people. The relationship isn't perfect.
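One common way to put a number on this is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand; the height and weight values are invented for illustration and are not real measurements.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sample: heights in cm and weights in kg.
heights_cm = [150, 160, 165, 172, 180, 188]
weights_kg = [52, 58, 66, 64, 80, 85]

r = pearson_r(heights_cm, weights_kg)
print(f"r = {r:.3f}")
```

For this made-up sample the coefficient is strongly positive but below 1, matching the idea that the relationship is real but not perfect.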
What is a Regression in statistics?
In statistics, linear regression is an approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X. The case of one explanatory variable is called simple linear regression.
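For the simple case with one explanatory variable, the line y = slope * x + intercept can be fitted by ordinary least squares. The following is a minimal sketch; the function name and the sample data (hours studied against exam score) are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares estimates of slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: x is the explanatory variable, y the dependent variable.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 57, 61, 68, 72]

slope, intercept = fit_simple_linear_regression(hours_studied, exam_score)
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
```

The slope estimates how much y changes for a one-unit change in x, and the intercept is the predicted y when x is zero.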
What is a Causation in statistics?
When an article says that causation was found, this means that the researchers found that changes in one variable they measured directly caused changes in the other.