Sunday, May 6, 2018

MACHINE LEARNING | MULTICLASS CLASSIFICATION

MACHINE LEARNING - DAY 9

MULTI-CLASS CLASSIFICATION: ONE-VS-ALL


For the basics and the terms used in this article, you can check the earlier articles in this series.

Continuing our machine learning journey, today we'll learn about multi-class classification in logistic regression, also known as one-vs-all.

Till now we have discussed classification with only 2 possible outcomes, i.e., 1 or 0. Now, let's see what happens when there are more possibilities.

For example,

- Weather: sunny, rainy, pleasant, windy

  The outcome or the categorical value can be: 0, 1, 2, 3

- Health: ill, dizzy, well

  The outcome or the categorical value can be: 0, 1, 2

The numbering doesn't matter. It can be 1, 2, 3, 4 or 0, 1, 2, 3. These are just values which categorize the given data or output into different categories.

y ∈ {0, 1, 2, …, n}

hΘ^(0)(x) = P(y = 0 | x; Θ)

hΘ^(1)(x) = P(y = 1 | x; Θ)

⋮

hΘ^(n)(x) = P(y = n | x; Θ)

prediction: maxᵢ hΘ^(i)(x)

STEPS OF COMPUTATION:

1. Plot the data





2. Take the classes one at a time; the remaining classes are treated together as a single class or category. For each class, train a binary classifier that gives the probability of that class against the rest.

   For example, for the weather data above we would train one classifier for "sunny" vs. the rest, one for "rainy" vs. the rest, and so on.




CONCLUSION:

Train a logistic regression classifier hΘ^(i)(x) for each class i to predict the probability that y = i.

To make a prediction on a new x, pick the class i that maximizes hΘ^(i)(x); that class is the output.
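To make the idea concrete, here is a minimal NumPy sketch of one-vs-all (the toy dataset, learning rate, and iteration count are illustrative assumptions, not values from this article): it trains one binary logistic regression per class with plain gradient descent and predicts by taking the class with the highest probability.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary(X, y_binary, alpha=0.1, iterations=5000):
    """Gradient descent for one binary logistic regression (class vs. rest)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        h = sigmoid(X @ theta)                  # hΘ(x) for every example
        gradient = (X.T @ (h - y_binary)) / m   # (1/m) Σ (hΘ(xi) - yi) xi
        theta -= alpha * gradient
    return theta

def one_vs_all(X, y, num_classes):
    """Train one classifier per class; returns a (num_classes x (n+1)) matrix of Θ."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])   # add x0 = 1
    return np.array([train_binary(X, (y == k).astype(float))
                     for k in range(num_classes)])

def predict(all_theta, X_new):
    """Pick the class whose classifier outputs the highest probability."""
    X_new = np.hstack([np.ones((X_new.shape[0], 1)), X_new])
    probs = sigmoid(X_new @ all_theta.T)            # one column per class
    return np.argmax(probs, axis=1)

# Toy example with 3 classes (e.g., ill = 0, dizzy = 1, well = 2)
X = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [3.8, 4.0], [8.0, 1.0], [7.5, 1.2]])
y = np.array([0, 0, 1, 1, 2, 2])
all_theta = one_vs_all(X, y, num_classes=3)
print(predict(all_theta, np.array([[4.1, 4.1]])))   # expected: [1]
```

The key design point is that the n-class problem is reduced to n independent binary problems, so any binary logistic regression routine can be reused unchanged.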


That's all for day 9. Today we learned about multi-class classification and how to compute it using the one-vs-all approach.

In day 10, we will learn about Overfitting, an issue that arises from over-training the model. The solution to this issue is Regularization, which we'll also cover in the next article.

If this article helped you learn something new or could help someone else, do share it with your peers.

Till then Happy Learning!!!





MACHINE LEARNING | LOGISTIC REGRESSION COST FUNCTION

MACHINE LEARNING - DAY 8

LOGISTIC REGRESSION COST FUNCTION AND ANALYSIS


For the basics and the terms used in this article, you can check the earlier articles in this series.

Continuing our learning in logistic regression, today we'll look at the cost function for logistic regression and see how to make it suitable for finding proper parameter values for the model.

COST FUNCTION:

The cost function, as in linear regression, is used to compute the parameter values Θi automatically so that the model fits the given data as well as possible.
For gradient descent to reach the global minimum, the cost function J(Θ) should be a convex function of the parameters Θ.

Training set: {(x1, y1), (x2, y2), …, (xm, ym)}    (number of examples: m)

x = [x0; x1; x2; …; xn],    x0 = 1,    y ∈ {0, 1}

The dimension of the x matrix is (n+1) × 1.

hΘ(x) = 1/(1 + e^(−Θᵀx))

How to choose the parameter values Θi:

In linear regression the cost function was

J(Θ) = (1/m) Σᵢ₌₁ᵐ (1/2)(hΘ(xi) − yi)²

which can be rewritten as

J(Θ) = (1/m) Σᵢ₌₁ᵐ cost(hΘ(xi), yi)

where

cost(hΘ(xi), yi) = (1/2)(hΘ(xi) − yi)²

The only difference between the cost function of linear regression and that of logistic regression is the hypothesis. The hypothesis in logistic regression is:

hΘ(x) = 1/(1 + e^(−Θᵀx))


Plugging this hypothesis into the squared-error cost gives a non-convex function of Θ, which can lead to multiple local optima.

To avoid the non-convex shape and get a convex cost function with a single optimum, i.e., the global optimum, we alter the cost function for logistic regression.

cost(hΘ(x), y) = −log(hΘ(x)), if y = 1,  and  −log(1 − hΘ(x)), if y = 0

For simplification, the cost function can be written in a single line as:

cost(hΘ(x), y) = −[y · log(hΘ(x)) + (1 − y) · log(1 − hΘ(x))]

For y = 0,

cost(hΘ(x), y) = −[0 · log(hΘ(x)) + (1 − 0) · log(1 − hΘ(x))]

cost(hΘ(x), y) = −[0 + (1) · log(1 − hΘ(x))] = −log(1 − hΘ(x))

For y = 1,

cost(hΘ(x), y) = −[1 · log(hΘ(x)) + (1 − 1) · log(1 − hΘ(x))]

cost(hΘ(x), y) = −[log(hΘ(x)) + 0] = −log(hΘ(x))

Hence, the single-line form gives back the same two equations we saw earlier.
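As a quick sanity check, here is a small NumPy sketch of this cost function (the sample values are made up for illustration): it computes J(Θ) as the average of the single-line cost over all training examples.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_function(theta, X, y):
    """J(Θ) = (1/m) Σ -[y·log(hΘ(x)) + (1-y)·log(1-hΘ(x))]"""
    m = len(y)
    h = sigmoid(X @ theta)                       # hΘ(x) for every example
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

# Tiny illustrative example; X already contains the x0 = 1 column.
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1.0, 0.0, 1.0])
theta = np.zeros(2)
print(cost_function(theta, X, y))   # log(2) ≈ 0.693 when Θ = 0
```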

GRADIENT DESCENT:

Gradient descent iteratively updates the parameters until the global minimum of the cost function is reached.

Compute, repeating until convergence (updating all Θj simultaneously):

Θj := Θj − α · (1/m) Σᵢ₌₁ᵐ (hΘ(xi) − yi) · xij

Notice that this gradient descent update is identical in form to the one used in linear regression. The only difference, again, is the hypothesis, which in logistic regression is:

hΘ(x) = 1/(1 + e^(−Θᵀx))
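Below is a minimal gradient descent loop for logistic regression, continuing the NumPy sketch above (the learning rate and iteration count are assumptions chosen for illustration, not tuned values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iterations=10000):
    """Repeat Θj := Θj - α·(1/m)·Σ(hΘ(xi) - yi)·xij for all j."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        h = sigmoid(X @ theta)                 # logistic hypothesis
        theta -= alpha * (X.T @ (h - y)) / m   # simultaneous update of all Θj
    return theta

# Same toy data as above (x0 = 1 column included).
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1.0, 0.0, 1.0])
theta = gradient_descent(X, y)
print(sigmoid(X @ theta))   # probabilities should move toward y = [1, 0, 1]
```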


That's all for day 8. Today we learned about the cost function in logistic regression and how to make it convex so that gradient descent can attain the global minimum and find good parameter values for the model. We also learned gradient descent for logistic regression.

In day 9, we will be learning about the multi-class classification problem, which involves more possible outcomes than just 0 and 1.

If this article helped you learn something new or could help someone else, do share it with others.

Till then Happy Learning!!!




  

MACHINE LEARNING | DECISION BOUNDARY

MACHINE LEARNING - DAY 7

DECISION BOUNDARY FOR LOGISTIC REGRESSION


For the basics and the terms used in this article, you can check the earlier articles in this series.


DECISION BOUNDARY

A decision boundary is the curve dividing the data into 2 segments: one where we predict y = 1 and the other where we predict y = 0.

hΘ(x) = g(Θᵀx) = P(y = 1 | x; Θ)


g(z) = 1/(1 + e^(−z)),  so  hΘ(x) = 1/(1 + e^(−Θᵀx))


Suppose we want to predict y = 1; then

hΘ(x) ≥ 0.5

and, for predicting y = 0,

hΘ(x) < 0.5

Now, let's see when these values are possible.

1. y = 1 when

g(z) ≥ 0.5

which happens when z ≥ 0, so

hΘ(x) = g(Θᵀx) ≥ 0.5

y = 1, when Θᵀx ≥ 0.

2. y = 0 when

g(z) < 0.5

which happens when z < 0, so

hΘ(x) = g(Θᵀx) < 0.5

y = 0, when Θᵀx < 0.

Now, let's discuss decision boundaries with the help of some examples.

Example 1:

hΘ(x) = g(Θ0 + Θ1x1 + Θ2x2)



Let Θ0 = -3, Θ1 = 1, Θ2 = 1

Θ = [-3;1;1]

Dimension of the Θ matrix is 3 × 1.

Predict y = 1 if

−3 + x1 + x2 ≥ 0,  i.e., g(z) ≥ 0.5,  i.e., z ≥ 0,

which gives

x1 + x2 ≥ 3.

And for y = 0,

x1 + x2 < 3
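Here is a tiny NumPy check of Example 1 (the two sample points are made up for illustration): the prediction depends only on whether Θᵀx = −3 + x1 + x2 is ≥ 0, i.e., on which side of the line x1 + x2 = 3 the point lies.

```python
import numpy as np

theta = np.array([-3.0, 1.0, 1.0])            # Θ0 = -3, Θ1 = 1, Θ2 = 1

def predict(x1, x2):
    x = np.array([1.0, x1, x2])               # x0 = 1
    z = theta @ x                              # Θᵀx
    return 1 if z >= 0 else 0                  # equivalent to hΘ(x) ≥ 0.5

print(predict(2.0, 2.0))   # x1 + x2 = 4 ≥ 3  ->  1
print(predict(1.0, 1.0))   # x1 + x2 = 2 < 3  ->  0
```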


NON-LINEAR DECISION BOUNDARIES

Sometimes the data points are arranged in such a manner that the curve separating them takes a more complex shape than a straight line.

Example 2:

Hypothesis: hΘ(x) = g(Θ0 + Θ1x1 + Θ2x2 + Θ3x1² + Θ4x2²)




Let Θ0 = -1, Θ1 = 0, Θ2 = 0, Θ3 = 1, Θ4 = 1 (we'll see how to find the parameters automatically in upcoming lessons).

Θ = [-1;0;0;1;1]

The dimension of the Θ matrix is 5 × 1.

To predict y = 1:

−1 + x1² + x2² ≥ 0,  i.e.,  x1² + x2² ≥ 1  (the boundary x1² + x2² = 1 is the equation of a circle centered at the origin).
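A quick NumPy sketch of this non-linear boundary (the test points are made up for illustration): we predict y = 1 exactly when the point lies on or outside the unit circle.

```python
import numpy as np

theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])   # Θ0..Θ4 from the example

def predict(x1, x2):
    features = np.array([1.0, x1, x2, x1**2, x2**2])   # [x0, x1, x2, x1², x2²]
    return 1 if theta @ features >= 0 else 0

print(predict(0.2, 0.3))   # inside the unit circle  -> 0
print(predict(1.0, 1.0))   # outside the unit circle -> 1
```

Note how the non-linear boundary comes purely from the squared feature terms; the classifier itself is still linear in Θ.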

NOTE: The decision boundary depends upon the parameters, i.e., the Θ values.

Decision boundaries vary with the hypothesis. They can become more complex, or simpler, as the number of parameters and variables increases.

Points to remember (verified in the sketch below):

- g(z) ≥ 0.5 ⇔ z ≥ 0

- z = 0: e⁰ = 1, so g(z) = 1/2

- z → ∞: e^(−z) → 0, so g(z) → 1

- z → −∞: e^(−z) → ∞, so g(z) → 0
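A one-minute NumPy check of these limits (the specific z values are arbitrary):

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

print(g(0.0))     # 0.5
print(g(50.0))    # ~1.0 for large positive z
print(g(-50.0))   # ~0.0 for large negative z
```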


That's all for day 7. Today we learned about decision boundaries in classification problems, especially in logistic regression.

In day 8, we will learn about the cost function of logistic regression, which will help us figure out the parameter (Θ) values automatically for the best fit, and we will also touch on multi-class classification in logistic regression.

If this article helped you learn something new or could help someone else, do share it with your peers.

Till then Happy Learning!!!