Data Science: June 2015

Saturday, 20 June 2015

Support Vector Machines

Support vector machines (SVMs) are supervised learning models, with associated learning algorithms, that analyze data and recognize patterns. They are used for classification and regression analysis.

Advantages:

Effective in high-dimensional spaces.
Still effective when the number of dimensions is greater than the number of samples.
Uses a subset of the training points (the support vectors) in the decision function, so it is also memory efficient.
Versatile: different kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.
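
The last two points can be sketched in plain Python. This is only an illustration, not a trained model: the support vectors, coefficients, and bias below are made-up values, and `linear_kernel`, `rbf_kernel`, `decision_function`, and `predict` are hypothetical helper names. The sketch shows that only the support vectors enter the decision function, and that the kernel is just a function that can be swapped.

```python
import math

def linear_kernel(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

def decision_function(x, support_vectors, dual_coefs, bias, kernel):
    """f(x) = sum_i coef_i * K(sv_i, x) + bias, with coef_i = alpha_i * y_i."""
    total = sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs))
    return total + bias

def predict(x, support_vectors, dual_coefs, bias, kernel):
    """Return +1 or -1 depending on the sign of the decision function."""
    score = decision_function(x, support_vectors, dual_coefs, bias, kernel)
    return 1 if score >= 0 else -1

# Hypothetical values chosen for illustration (not learned from data).
support_vectors = [(1.0, 1.0), (-1.0, -1.0)]
dual_coefs = [1.0, -1.0]
bias = 0.0

# The same support vectors work with either kernel.
print(predict((2.0, 1.5), support_vectors, dual_coefs, bias, rbf_kernel))
print(predict((-2.0, -1.0), support_vectors, dual_coefs, bias, linear_kernel))
```

However many training points there were, prediction here only touches the two stored support vectors, which is where the memory efficiency comes from.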

Monday, 15 June 2015

Logarithm in computer science

The logarithm of a number is the exponent to which another fixed value, the base, must be raised to produce that number.

 $y=b^x\Leftrightarrow x=\log_b(y)$
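
A quick way to check this equivalence is a small Python sketch. The helper name `log_base` is hypothetical, and the comparison uses `math.isclose` because floating-point logarithms are not always exact.

```python
import math

def log_base(y, b):
    """Return x such that b ** x == y (assumes y > 0, b > 0, b != 1)."""
    return math.log(y) / math.log(b)

x = log_base(81, 3)                 # the exponent that turns 3 into 81
print(math.isclose(x, 4.0))         # x is (approximately) 4
print(math.isclose(3 ** x, 81.0))   # raising the base to x recovers y
```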

The binary logarithm ($\log_2 n$) is the logarithm to base 2. In computer science and information theory, the binary logarithm is especially useful because it is closely connected to the binary numeral system, which is the base-2 numeral system.

100101₂ = [ ( 1 ) × 2⁵ ] + [ ( 0 ) × 2⁴ ] + [ ( 0 ) × 2³ ] + [ ( 1 ) × 2² ] + [ ( 0 ) × 2¹ ] + [ ( 1 ) × 2⁰ ]
100101₂ = [ 1 × 32 ] + [ 0 × 16 ] + [ 0 × 8 ] + [ 1 × 4 ] + [ 0 × 2 ] + [ 1 × 1 ]
100101₂ = 37₁₀
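
The same expansion can be reproduced with a short Python sketch. The function name `binary_to_decimal` is hypothetical. The last lines illustrate the link to the binary logarithm: a positive integer n needs floor(log2 n) + 1 binary digits.

```python
import math

def binary_to_decimal(bits):
    """Sum digit * 2**position, counting positions from the rightmost digit."""
    total = 0
    for position, digit in enumerate(reversed(bits)):
        total += int(digit) * 2 ** position
    return total

value = binary_to_decimal("100101")
print(value)                              # 37
print(value == int("100101", 2))          # True: matches Python's built-in parser

# Number of binary digits needed for the value, computed two ways.
print(math.floor(math.log2(value)) + 1)   # 6
print(value.bit_length())                 # 6
```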

Difference between classification and clustering

Classification – The task of assigning instances to pre-defined classes.
– E.g. Deciding whether a particular patient record can be associated with a specific disease.

Classification is a supervised learning technique used to assign a pre-defined tag to an instance on the basis of its features. A classification algorithm therefore requires labeled training data: a classification model is built from the training data and is then used to classify new instances.

Clustering – The task of grouping related data points together without labeling them.
– E.g. Grouping patient records with similar symptoms without knowing what the symptoms indicate.

Clustering is an unsupervised technique used to group similar instances on the basis of their features. Clustering does not require labeled training data, and it does not assign a pre-defined label to each group.
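
The contrast can be sketched in plain Python. Everything here is an illustrative assumption: the points and the labels "A" and "B" are toy data, the classifier is a simple nearest-centroid rule, and the clustering is a bare-bones k-means that naively starts from the first k points (real implementations use more careful initialization). The key difference is that `fit_centroids` needs labels, while `kmeans` only sees the points.

```python
def mean(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# --- Classification: learns from labeled training data ---
def fit_centroids(points, labels):
    """Compute one centroid per pre-defined label."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {label: mean(members) for label, members in groups.items()}

def classify(point, centroids):
    """Assign the label whose centroid is closest to the point."""
    return min(centroids, key=lambda label: sq_dist(point, centroids[label]))

# --- Clustering: groups points without any labels ---
def assign(points, centers):
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            for p in points]

def kmeans(points, k, iterations=10):
    """Minimal k-means; returns a cluster index (not a label) per point."""
    centers = list(points[:k])  # naive initialization: first k points
    for _ in range(iterations):
        assignments = assign(points, centers)
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # keep the old center if a cluster becomes empty
                centers[i] = mean(members)
    return assign(points, centers)

# Toy data, made up for illustration.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8), (0.9, 1.1), (4.9, 5.1)]
labels = ["A", "B", "A", "B", "A", "B"]

centroids = fit_centroids(points, labels)   # uses the labels
print(classify((1.1, 0.9), centroids))      # a pre-defined label
print(kmeans(points, 2))                    # anonymous cluster indices
```

The classifier answers with one of the labels it was trained on, whereas the clustering output is only a group index whose meaning still has to be interpreted.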

Wednesday, 10 June 2015

Difference between classification and clustering