Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.


Question

1. Why can't we use random search - a zero order method of optimization - to find the optimal weights of a deep learning model?

2. Why is the Least Squares cost rarely used in linear classification?

3. What is the major difference between One-versus-All (OvA) classification and multiclass softmax classification?

4. Why does gradient descent tend to zig-zag in long narrow valleys? What is one way to fix this issue?

5. What is the case for normalizing gradient descent in deep learning?

6. A convolutional layer that uses ReLU, i.e., max(0, x), as the nonlinear activation together with max-pooling as the pooling function is redundant, in the sense that we can simplify it by removing the nonlinear activation module in the middle without changing the results. Is that true or false? Explain.

Explanation / Answer

1)

Random search is a zero-order method: it uses only evaluations of the cost function, never its gradient. A modern deep network has millions of weights, and the volume of weight space grows exponentially with the number of parameters, so the probability that a randomly sampled weight vector (or randomly chosen search direction) actually decreases the cost shrinks rapidly toward zero as the dimension grows. Making measurable progress would require an astronomical number of cost evaluations, each of which means a full pass over the data. First-order methods such as gradient descent avoid this: the gradient, computed at roughly the cost of one function evaluation via backpropagation, supplies a descent direction regardless of the dimension.
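To make this concrete, here is a minimal sketch on a toy linear least-squares problem (the model, sizes, and step size are all illustrative, not from the question): blind random search in just 50 dimensions barely improves the cost, while a handful of gradient steps do far better.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w, X, y):
    """Mean squared error of a linear model -- a stand-in for a deep net's cost."""
    return np.mean((X @ w - y) ** 2)

# Tiny synthetic problem; real deep nets have millions of weights.
n_samples, n_weights = 100, 50
X = rng.normal(size=(n_samples, n_weights))
w_true = rng.normal(size=n_weights)
y = X @ w_true  # optimal cost is exactly 0

# Zero-order random search: sample candidate weight vectors blindly.
best_w = rng.normal(size=n_weights)
best_loss = loss(best_w, X, y)
for _ in range(10_000):
    w = rng.normal(size=n_weights)
    l = loss(w, X, y)
    if l < best_loss:
        best_w, best_loss = w, l

# First-order alternative: a few plain gradient descent steps.
w = rng.normal(size=n_weights)
for _ in range(200):
    grad = (2 / n_samples) * X.T @ (X @ w - y)  # gradient of the MSE
    w -= 0.1 * grad
gd_loss = loss(w, X, y)

print(best_loss, gd_loss)  # random search stays far from 0; GD gets close
```

Even in 50 dimensions, 10,000 blind samples leave the cost orders of magnitude above what 200 gradient steps achieve; in a million-dimensional deep network the gap is incomparably worse.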

2)

The Least Squares cost treats the class labels (e.g., ±1) as numeric targets to be matched exactly, which is a poor fit for classification. First, a prediction that is correct and very confident — say a score of +10 for a +1 label — incurs a large squared penalty even though the point is classified perfectly: the cost punishes being "too correct." Second, squared error is highly sensitive to outliers, points whose scores are far from their labels, which can drag the decision boundary badly. For these reasons squared error is a poor surrogate for the number of misclassifications, and classification-specific convex costs such as the softmax / cross-entropy or perceptron costs are used instead.
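A quick numerical illustration of the "too correct" problem, using ±1 labels (the function names here are mine, for illustration): the squared error assigns a huge loss to a confidently correct prediction, while the logistic (cross-entropy) loss assigns essentially none.

```python
import numpy as np

def least_squares(score, label):
    """Squared error against a +/-1 label."""
    return (score - label) ** 2

def logistic_loss(score, label):
    """Log loss (two-class softmax cost) with +/-1 labels."""
    return np.log(1 + np.exp(-label * score))

# A point classified correctly and very confidently: label +1, score +10.
print(least_squares(10.0, 1))  # 81.0 -- heavily penalized despite being correct
print(logistic_loss(10.0, 1))  # tiny -- essentially no penalty
```

The squared cost would push the boundary to shrink that score back toward 1, even though the point is already on the right side of the boundary by a wide margin.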

3)

In One-versus-All classification each binary classifier is trained independently of the others, while with multiclass softmax (multiclass logistic regression) the weights of all classes are tuned simultaneously by minimizing a single cost function.

One-versus-All

The idea is to train one binary classifier per class: for each classifier, its class is fit against all the other classes combined. A benefit of this approach is its interpretability: since each class is represented by exactly one classifier, it is possible to gain knowledge about the class by inspecting its corresponding classifier.

multiclass softmax classification

Multiclass softmax classification handles a task with more than two classes, e.g., classifying images of vegetables that may include potato, onion, tomato, etc. It assumes each sample carries one and only one label: a vegetable can be either an onion or a potato, but not both at the same time. Unlike OvA, all class weights are learned jointly by minimizing one softmax (cross-entropy) cost, so the predicted class scores are normalized against each other into proper probabilities.
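A small sketch of the difference (the raw scores below are made up for illustration): the softmax normalizes the class scores jointly, so they sum to 1, whereas independently trained OvA classifiers each squash their own score and know nothing of the others.

```python
import numpy as np

scores = np.array([2.0, 1.0, -1.0])  # one raw score per class (hypothetical values)

# Multiclass softmax: scores are exponentiated and normalized jointly,
# so the outputs form a probability distribution over the classes.
softmax = np.exp(scores) / np.exp(scores).sum()

# One-versus-All: each binary classifier squashes only its own score,
# independently of the other classifiers.
ova = 1 / (1 + np.exp(-scores))

print(softmax.sum())  # sums to 1 (a proper distribution)
print(ova.sum())      # generally != 1 -- the classifiers are uncoupled
```

This uncoupling is exactly what independent OvA training produces, and coupling the classes through one cost is what joint softmax training adds.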

4)

In gradient descent we step in the direction of the negative gradient, which is always perpendicular to the contour of the function at the current point. In a long narrow valley the contours are highly elongated, so the negative gradient points mostly across the valley rather than along it toward the minimum. Any step size small enough to be stable in the steep cross-valley direction makes only tiny progress along the valley floor, and successive steps overshoot from one wall to the other, producing the characteristic zig-zag.

One standard fix is momentum: each step averages the current gradient with an exponentially weighted history of past steps, so the components that oscillate back and forth across the valley cancel out while the small but consistent component pointing along the valley accumulates. (Normalized or adaptive step-size schemes address the same ill-conditioning.)
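The effect is easy to demonstrate on a toy quadratic "valley" (the curvatures 1 and 50, step size, and momentum constant below are all illustrative choices): with the same step size and step budget, plain gradient descent is still crawling along the valley floor while momentum has nearly reached the minimum.

```python
import numpy as np

# Long narrow valley: f(w) = 0.5 * (1*w1^2 + 50*w2^2); curvature 50x larger
# across the valley than along it.
A = np.array([1.0, 50.0])

def grad(w):
    """Gradient of the quadratic f."""
    return A * w

def run(lr, beta, steps=100):
    """Gradient descent with momentum; beta=0.0 recovers plain gradient descent."""
    w = np.array([10.0, 1.0])      # start far down the valley, slightly up one wall
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v + grad(w)     # exponentially weighted step history
        w = w - lr * v
    return w

plain = run(lr=0.035, beta=0.0)     # step size capped by the steep direction
momentum = run(lr=0.035, beta=0.7)  # oscillations cancel, valley progress adds up

print(np.linalg.norm(plain), np.linalg.norm(momentum))  # momentum ends much closer to 0
```

Plain gradient descent quickly kills the cross-valley coordinate but moves along the valley by only a factor of 0.965 per step; momentum drives both coordinates down at a much faster shared rate.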