SGN-41007 Pattern Recognition and Machine Learning - 27.02.2018

Text content of the exam

The text was generated from the original exam file using optical character recognition, so it may contain incorrect or incomplete information. For example, mathematical notation cannot be rendered correctly. The text is primarily used for generating search results.

Original exam
SGN-41007 Pattern Recognition and Machine Learning
Exam 27.2.2018
Heikki Huttunen

> Use of calculator is allowed.

> Use of other materials is not allowed.

> The exam questions need not be returned after the exam.
> You may answer in English or Finnish.

1. Are the following statements true or false? No need to justify your answer, just T or F.
Correct answer: 1 pts, wrong answer: −½ pts, no answer: 0 pts.

(a) Least squares estimator minimizes the squared distance between neighboring samples.

(b) The Receiver Operating Characteristics curve plots the probability of detection versus
the probability of false alarm for all thresholds.

(c) The LDA maximizes the following score:

J(w) = \frac{\text{Distance of class means}}{\text{Variance of classes}}

(d) A neural network classifier has a linear decision boundary between classes.
(e) L1 and L2 regularization improves the generalization of a logistic regression classifier.

(f) Maxpooling returns the maximum over input channels.

2. Two measurements x(n) and y(n) depend on each other in a linear manner, and there are
the following measurements available:

n       0      1      2
x(n)    7      9      2
y(n)    11.6   14.8   3.5

We want to model the relationship between the two variables using the model:
y(n) = ax(n) + b.

Find the ℓ2-regularized least squares estimates â and b̂ that minimize the squared error using penalty λ = 10.

Alternatively, the unregularized solution will give you max. 4 points.
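As a sanity check, the ridge solution has the closed form θ̂ = (XᵀX + λI)⁻¹Xᵀy. A minimal NumPy sketch, assuming the penalty λ applies to both a and b (if the intercept is left unpenalized, the λ term on its diagonal entry is dropped):

import numpy as np

x = np.array([7.0, 9.0, 2.0])
y = np.array([11.6, 14.8, 3.5])
lam = 10.0

# Design matrix: one column for the slope a, one for the intercept b.
X = np.column_stack([x, np.ones_like(x)])

# l2-regularized least squares: theta = (X^T X + lambda*I)^(-1) X^T y.
theta = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
a_hat, b_hat = theta
print(a_hat, b_hat)  # setting lam = 0.0 gives the unregularized solution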

 
O Class1
X Class?

x

 

 

-3 2 -1 o 1 2 3

   

Sample      Predicted probability    True label
Sample 1    …                        …
Sample 2    0.3                      1
Sample 3    0.5                      0
Sample 4    0.25                     0

Table 1: Results on test data for question 5.

3. A dataset consists of two classes, containing four samples each. The samples are shown in
Figure 1. The covariances of the classes are

\Sigma_1 = \frac{1}{3}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \qquad \Sigma_2 = \frac{1}{3}\begin{bmatrix} 5 & 1 \\ 1 & 1 \end{bmatrix}
Find the Linear Discriminant Analysis (LDA) weight vector w for this data and find the
threshold c at the center of the projected class means.

Present the decision rule for a sample x ∈ ℝ² in the following format:

\text{Class}(x) = \begin{cases} 1, & \text{if } \langle\text{something}\rangle \\ 2, & \text{otherwise} \end{cases}
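A minimal NumPy sketch of the LDA computation, using the covariances as reconstructed above (the OCR is partly illegible, so treat the entries as approximate) and hypothetical class means in place of the ones read off Figure 1:

import numpy as np

# Hypothetical class means; the real ones are read off Figure 1.
mu1 = np.array([-1.0, 0.0])
mu2 = np.array([1.0, 0.0])

# Class covariances as reconstructed above (entries approximate).
S1 = np.array([[1.0, 1.0], [1.0, 1.0]]) / 3.0
S2 = np.array([[5.0, 1.0], [1.0, 1.0]]) / 3.0

# LDA direction: w = (Sigma1 + Sigma2)^(-1) (mu1 - mu2);
# threshold c: midpoint of the projected class means.
w = np.linalg.solve(S1 + S2, mu1 - mu2)
c = w @ (mu1 + mu2) / 2.0

def classify(x):
    # Decision rule in the requested format.
    return 1 if w @ x > c else 2

print(w, c, classify(np.array([-1.0, 0.5])))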

4. Consider the Keras model summarized in Figure 2. Inputs are 64 × 64 color images from two categories, and all convolutional kernels are of size w × h = 5 × 5.
(a) Draw a diagram of the network.
(b) Compute the number of parameters for each layer.

(c) How many scalar multiplications take place on the first convolutional layer?
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_49 (Conv2D)           (None, 64, 64, 32)        2432
_________________________________________________________________
max_pooling2d_47 (MaxPooling (None, 16, 16, 32)        0
_________________________________________________________________
conv2d_50 (Conv2D)           (None, 16, 16, 32)        25632
_________________________________________________________________
max_pooling2d_48 (MaxPooling (None, 4, 4, 32)          0
_________________________________________________________________
flatten (Flatten)            (None, 512)               0
_________________________________________________________________
dense_29 (Dense)             (None, 100)               51300
_________________________________________________________________
dense_30 (Dense)             (None, 2)                 202
=================================================================
Total params: 79,566
Trainable params: 79,566
Non-trainable params: 0
_________________________________________________________________

Figure 2: A CNN model summary as reported by Keras.
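A sketch of a model definition that reproduces the summary above, assuming 'same' padding, 4 × 4 pooling, ReLU activations, and a two-unit softmax output (the layer indices in the names will differ from run to run):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    # (5*5*3 + 1) * 32 = 2432 parameters
    Conv2D(32, (5, 5), padding='same', activation='relu',
           input_shape=(64, 64, 3)),
    MaxPooling2D(pool_size=(4, 4)),   # 64x64 -> 16x16
    # (5*5*32 + 1) * 32 = 25632 parameters
    Conv2D(32, (5, 5), padding='same', activation='relu'),
    MaxPooling2D(pool_size=(4, 4)),   # 16x16 -> 4x4
    Flatten(),                        # 4*4*32 = 512 units
    Dense(100, activation='relu'),    # 512*100 + 100 = 51300 parameters
    Dense(2, activation='softmax'),   # 100*2 + 2 = 202 parameters
])
model.summary()  # Total params: 2432 + 25632 + 51300 + 202 = 79,566

Under these assumptions, each of the 64 · 64 · 32 outputs of the first convolution takes 5 · 5 · 3 multiplications, which gives the count asked for in part (c).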

5. A random forest classifier is trained on a training data set and the predict_proba method is applied on the test data of five samples. The predictions and true labels are in Table 1. Draw the receiver operating characteristic curve. What is the Area Under Curve (AUC) score?
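A short scikit-learn sketch of the computation; Table 1 is only partly legible, so the scores and labels for samples 1 and 5 below are hypothetical placeholders:

import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Scores from predict_proba (positive class) and true labels.
# Values for samples 1 and 5 are hypothetical; 2-4 are from Table 1.
y_true  = np.array([1,   1,   0,   0,    0])
y_score = np.array([0.8, 0.3, 0.5, 0.25, 0.1])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(fpr, tpr)  # the points of the ROC curve
print("AUC:", roc_auc_score(y_true, y_score))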

Related Wikipedia pages

Inversion of 2 × 2 matrices

The cofactor equation listed above yields the following result for 2 × 2 matrices. Inversion of these matrices can be done as follows:[8]

A^{-1} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{\det A}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
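For example, applying the formula to a concrete matrix:

\begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}^{-1} = \frac{1}{2 \cdot 1 - 1 \cdot 1}\begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}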

ROC space

The contingency table can derive several evaluation "metrics". To draw a ROC curve, only the true positive rate (TPR) and false positive rate (FPR) are needed (as functions of some classifier parameter). The TPR defines how many correct positive results occur among all positive samples available during the test. FPR, on the other hand, defines how many incorrect positive results occur among all negative samples available during the test.

A ROC space is defined by FPR and TPR as x and y axes, respectively, which depicts relative trade-offs between true positive (benefits) and false positive (costs). Since TPR is equivalent to sensitivity and FPR is equal to 1 − specificity, the ROC graph is sometimes called the sensitivity vs (1 − specificity) plot. Each prediction result or instance of a confusion matrix represents one point in the ROC space.
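In code, both rates are simple ratios of confusion-matrix counts; a minimal sketch with hypothetical counts:

# Hypothetical confusion-matrix counts.
TP, FN, FP, TN = 8, 2, 3, 7

tpr = TP / (TP + FN)  # true positive rate (sensitivity) = 0.8
fpr = FP / (FP + TN)  # false positive rate = 1 - specificity = 0.3
print(fpr, tpr)       # one (FPR, TPR) point in ROC space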


Another complication in applying LDA and Fisher's discriminant to real data occurs when the number of measurements of each sample (i.e., the dimensionality of each data vector) exceeds the number of samples in each class. In this case, the covariance estimates do not have full rank, and so cannot be inverted. There are a number of ways to deal with this. One is to use a pseudoinverse instead of the usual matrix inverse in the above formulae. However, better numeric stability may be achieved by first projecting the problem onto the subspace spanned by Σb.[12] Another strategy to deal with small sample size is to use a shrinkage estimator of the covariance matrix, which can be expressed mathematically as

\hat{\Sigma} = (1 - \lambda)\Sigma + \lambda I

where I is the identity matrix, and λ is the shrinkage intensity or regularisation parameter. This leads to the framework of regularized discriminant analysis[13] or shrinkage discriminant analysis.[14]
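A minimal NumPy sketch of the shrinkage estimator; the singular example matrix is hypothetical:

import numpy as np

def shrink_covariance(S, lam):
    # Shrinkage estimator: (1 - lambda) * Sigma + lambda * I.
    return (1.0 - lam) * S + lam * np.eye(S.shape[0])

# A rank-deficient sample covariance (det = 0) cannot be inverted...
S = np.array([[1.0, 1.0], [1.0, 1.0]])
# ...but its shrunk version can, for any lambda in (0, 1].
S_hat = shrink_covariance(S, 0.1)
print(np.linalg.inv(S_hat))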
