SGN-41007 Pattern Recognition and Machine Learning - 12.12.2019
This text was generated from the original exam file using optical character recognition, so it may contain incorrect or incomplete information. For example, mathematical symbols may not be rendered correctly. The text is used mainly for generating search results.
Original exam

SGN-41007 Pattern Recognition and Machine Learning
Exam 12.12.2019
Heikki Huttunen

> Use of a calculator is allowed.
> Use of other materials is not allowed.
> The exam questions need not be returned after the exam.
> You may answer in English or Finnish.

1. Are the following statements true or false? No need to justify your answer, just T or F. Correct answer: 1 pt, wrong answer: ... pts, no answer: 0 pts.
(a) Maximum likelihood estimators are unbiased.
(b) The least squares estimator minimizes the squared distance between the data and the model.
(c) MobileNets were the first to introduce a shortcut (residual) connection between layers.
(d) The number of support vectors of a support vector machine equals the total number of samples.
(e) LDA maximizes the within-class distance of samples in each class.
(f) Cross-validation is used for model accuracy evaluation.

2. Consider the model
$x[n] = A\,e^{-n}\sin(\theta n) + w[n], \quad n = 0, 1, \ldots, N-1,$
where $w[n] \sim \mathcal{N}(0, \sigma^2)$ and $\theta$ is a known real number. In other words, we assume that our measurement is a damped sinusoid at known frequency and phase, and we want to estimate the amplitude $A$. Derive the maximum likelihood estimator of $A$. (A derivation sketch is given below, after Figure 1.)

3. Consider the Keras model defined in Listing 1. Inputs are 224 x 224 color images from 17 categories.
(a) Compute the number of parameters for each layer, and their total number over all layers.
(b) Compute the number of multiplications required on the first convolutional layer.
(A Keras sketch is given below, after Figure 1.)

4. In this task, you will design both an unregularized and a regularized LDA classifier.
(a) Compute the LDA weight vector for $m_0 = (\ldots)$, $m_1 = (\ldots)$, $\Sigma_0 = (\ldots)$, $\Sigma_1 = (\ldots)$.
(b) Compute the regularized LDA with N = 100. You may use the Wikipedia pages at the end of the exam paper.
(A NumPy sketch is given below, after Figure 1.)

5. (a) A random forest classifier is trained on a training data set and the predict_proba method is applied on the test data of Table 1. Draw the receiver operating characteristic curve. What is the Area Under Curve (AUC) score?
(b) Draw the precision-recall curve. What is the Area Under PR Curve (AUPRC) score?
(A scikit-learn sketch is given below, after Figure 1.)

Table 1: Results on test data for Question 5(a).

Sample    | Prediction | True label
Sample 1  | 0.8        | 1
Sample 2  | 0.5        | 1
Sample 3  | 0.6        | 0
Sample 4  | 0.1        | 0

Figure 1: Model structure of Question 3.

Layer (type)                   | Output Shape
conv2d_1 (Conv2D)              | (None, 224, 224, 32)
max_pooling2d (MaxPooling2D)   | (None, 112, 112, 32)
conv2d_2 (Conv2D)              | (None, 112, 112, 32)
max_pooling2d_1 (MaxPooling2D) | (None, 56, 56, 32)
conv2d_3 (Conv2D)              | (None, 56, 56, 32)
max_pooling2d_2 (MaxPooling2D) | (None, 28, 28, 32)
conv2d_4 (Conv2D)              | (None, 28, 28, 32)
max_pooling2d_3 (MaxPooling2D) | (None, 14, 14, 32)
flatten (Flatten)              | (None, 6272)
dense (Dense)                  | (None, 17)

Total params: ..., Trainable params: ..., Non-trainable params: 0
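The derivation sketch below is not part of the original exam paper; it illustrates the standard argument for Question 2. Writing $s[n] = e^{-n}\sin(\theta n)$ for the known signal shape, the model is linear in $A$ and the noise is i.i.d. Gaussian, so the ML estimator coincides with the least squares estimator.

\begin{align*}
\ln p(\mathbf{x}; A) &= -\frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{n=0}^{N-1}\bigl(x[n] - A\,s[n]\bigr)^2 \\
\frac{\partial \ln p(\mathbf{x}; A)}{\partial A} &= \frac{1}{\sigma^2}\sum_{n=0}^{N-1}\bigl(x[n] - A\,s[n]\bigr)\,s[n] = 0 \\
\Rightarrow\quad \hat{A}_{\mathrm{ML}} &= \frac{\sum_{n=0}^{N-1} x[n]\,e^{-n}\sin(\theta n)}{\sum_{n=0}^{N-1} e^{-2n}\sin^2(\theta n)}
\end{align*}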
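The original Listing 1 is not reproduced in this text, so the kernel size (3 x 3), 'same' padding, activations and RGB input in the Keras sketch below are assumptions chosen to be consistent with the output shapes of Figure 1; model.summary() then prints the per-layer and total parameter counts asked for in Question 3.

# Sketch only: kernel sizes, padding and activations are assumed,
# since Listing 1 is not reproduced in this text.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), padding='same', activation='relu',
                  input_shape=(224, 224, 3)),                      # conv2d_1
    layers.MaxPooling2D((2, 2)),                                   # -> 112 x 112 x 32
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),  # conv2d_2
    layers.MaxPooling2D((2, 2)),                                   # -> 56 x 56 x 32
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),  # conv2d_3
    layers.MaxPooling2D((2, 2)),                                   # -> 28 x 28 x 32
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),  # conv2d_4
    layers.MaxPooling2D((2, 2)),                                   # -> 14 x 14 x 32
    layers.Flatten(),                                              # 14 * 14 * 32 = 6272
    layers.Dense(17, activation='softmax'),
])

model.summary()

# A Conv2D layer has (kernel_h * kernel_w * in_channels + 1) * out_channels
# parameters; with the assumed 3 x 3 kernels the first layer has
# (3*3*3 + 1) * 32 = 896 parameters, and part (b) counts 3*3*3
# multiplications for each of its 224 * 224 * 32 output values.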
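The numerical means and covariances of Question 4 are not legible in this text, so the NumPy sketch below uses placeholder values for m0, m1, S0, S1 and an illustrative shrinkage intensity lam; it applies the Fisher/LDA weight vector $w \propto (\Sigma_0 + \Sigma_1)^{-1}(m_1 - m_0)$ and the shrinkage estimator $\hat\Sigma = (1 - \lambda)\Sigma + \lambda I$ quoted in the Wikipedia excerpts at the end of the paper.

import numpy as np

# Placeholder class statistics; the actual exam values are not recoverable here.
m0 = np.array([0.0, 0.0])
m1 = np.array([1.0, 2.0])
S0 = np.array([[2.0, 0.5],
               [0.5, 1.0]])
S1 = np.array([[1.0, 0.0],
               [0.0, 1.0]])

# (a) Unregularized LDA / Fisher discriminant direction.
S = S0 + S1
w = np.linalg.solve(S, m1 - m0)

# (b) Regularized LDA: shrink the covariance toward the identity,
#     Sigma_hat = (1 - lam) * S + lam * I, with an illustrative lam.
lam = 0.1
S_reg = (1.0 - lam) * S + lam * np.eye(S.shape[0])
w_reg = np.linalg.solve(S_reg, m1 - m0)

print("w     =", w)
print("w_reg =", w_reg)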
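Question 5 can be checked numerically against Table 1; the scikit-learn calls below are an illustration (the exam expects the curves to be drawn by hand), with y_true and y_score taken directly from the table.

import numpy as np
from sklearn.metrics import (roc_curve, roc_auc_score,
                             precision_recall_curve, average_precision_score)

# Scores (predict_proba for the positive class) and labels from Table 1.
y_true  = np.array([1, 1, 0, 0])          # Samples 1-4
y_score = np.array([0.8, 0.5, 0.6, 0.1])

# (a) ROC curve and AUC.
fpr, tpr, roc_thresholds = roc_curve(y_true, y_score)
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", roc_auc_score(y_true, y_score))

# (b) Precision-recall curve and the area under it (average precision).
precision, recall, pr_thresholds = precision_recall_curve(y_true, y_score)
print("precision:", precision)
print("recall:   ", recall)
print("AUPRC (average precision):", average_precision_score(y_true, y_score))

On these four samples the AUC equals the fraction of correctly ordered positive-negative score pairs, 3/4 = 0.75.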
Related Wikipedia pages

Linear discriminant analysis: practical use and shrinkage

Another complication in applying LDA and Fisher's discriminant to real data occurs when the number of measurements of each sample (i.e., the dimensionality of each data vector) exceeds the number of samples in each class. In this case, the covariance estimates do not have full rank and so cannot be inverted. There are a number of ways to deal with this. One is to use a pseudoinverse instead of the usual matrix inverse in the above formulae. However, better numerical stability may be achieved by first projecting the problem onto the subspace spanned by $\Sigma_b$.

The terms Fisher's linear discriminant and LDA are often used interchangeably, although Fisher's original article actually describes a slightly different discriminant, which does not make some of the assumptions of LDA such as normally distributed classes or equal class covariances. Suppose two classes of observations have means $\mu_0, \mu_1$ and covariances $\Sigma_0, \Sigma_1$. Then the linear combination of features $w^T x$ has means $w^T \mu_i$ and variances $w^T \Sigma_i w$ for $i = 0, 1$. Fisher defined the separation between these two distributions to be the ratio of the variance between the classes to the variance within the classes:

$S = \frac{\bigl(w^T(\mu_1 - \mu_0)\bigr)^2}{w^T(\Sigma_0 + \Sigma_1)\,w}$

It can be shown that the maximum separation occurs when

$w \propto (\Sigma_0 + \Sigma_1)^{-1}(\mu_1 - \mu_0)$

When the assumptions of LDA are satisfied, the above equation is equivalent to LDA.

Another strategy to deal with small sample size is to use a shrinkage estimator of the covariance matrix, which can be expressed mathematically as

$\hat\Sigma = (1 - \lambda)\,\Sigma + \lambda I$

where $I$ is the identity matrix and $\lambda$ is the shrinkage intensity or regularisation parameter. This leads to the framework of regularized discriminant analysis or shrinkage discriminant analysis.

Inversion of 2 x 2 matrices

The cofactor equation listed above yields the following result for 2 x 2 matrices. Inversion of these matrices can be done as follows:

$A^{-1} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{\det A}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$

This is possible because $1/(ad - bc)$ is the reciprocal of the determinant of the matrix in question, and the same strategy could be used for other matrix sizes.

Tikhonov regularization

Tikhonov regularization, named for Andrey Tikhonov, is a method of regularization of ill-posed problems. Also known as ridge regression, it is particularly useful to mitigate the problem of multicollinearity in linear regression, which commonly occurs in models with large numbers of parameters. In general, the method provides improved efficiency in parameter estimation problems in exchange for a tolerable amount of bias (see the bias-variance tradeoff). In the simplest case, the problem of a near-singular moment matrix $X^T X$ is alleviated by adding positive elements to the diagonals, thereby decreasing its condition number. In order to give preference to a particular solution, the least squares problem can be augmented with a penalty term:

$\min_{\beta}\;(y - X\beta)^T(y - X\beta) + \lambda\,\beta^T\beta$

where $\lambda$ is the Lagrange multiplier of the corresponding constraint. The minimizer of the problem is the ridge estimator

$\hat\beta_R = (X^T X + \lambda I)^{-1} X^T y$

where $I$ is the identity matrix and the ridge parameter $\lambda$ serves as the positive constant shifting the diagonals, thereby decreasing the condition number of the moment matrix.
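The ridge estimator quoted above can be evaluated directly; the NumPy sketch below uses synthetic data (X, y, beta_true and lam are illustrative, not from the exam).

import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression problem (illustrative only).
N, p = 50, 5
X = rng.normal(size=(N, p))
beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ beta_true + 0.1 * rng.normal(size=N)

lam = 1.0  # regularization parameter (lambda)

# Ridge / Tikhonov estimator: beta_hat = (X^T X + lam * I)^-1 X^T y
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta_hat)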
ROC space and area under the curve

The contingency table can derive several evaluation "metrics" (see the list below). To draw a ROC curve, only the true positive rate (TPR) and the false positive rate (FPR) are needed, as functions of some classifier parameter. The TPR defines how many correct positive results occur among all positive samples available during the test. The FPR, on the other hand, defines how many incorrect positive results occur among all negative samples available during the test. A ROC space is defined by FPR and TPR as x and y axes, respectively, and depicts the relative trade-offs between true positives (benefits) and false positives (costs). Since TPR is equivalent to sensitivity and FPR is equal to 1 - specificity, the ROC graph is sometimes called the sensitivity vs. (1 - specificity) plot. Each prediction result, or instance of a confusion matrix, represents one point in the ROC space.

The ROC curve plots TPR($t$) against FPR($t$) while varying the decision threshold $t$. Viewed this way, the area under the curve can be rewritten as

$\mathrm{AUC} = \int_0^1 \mathrm{TPR}\; d(\mathrm{FPR}) = P\bigl(\mathrm{score}(X_1) > \mathrm{score}(X_0)\bigr)$

where we used the fact that the probability density $P(\mathrm{score}(x) = t \mid y(x) = 0)$ is the derivative with respect to $t$ of the cumulative distribution function $P(\mathrm{score}(x) \le t \mid y(x) = 0)$. So, given a randomly chosen observation $x_1$ belonging to class 1 and a randomly chosen observation $x_0$ belonging to class 0, the AUC is the probability that the evaluated classification algorithm will assign a higher score to $x_1$ than to $x_0$.

Polynomial kernel

For degree-$d$ polynomials, the polynomial kernel is defined as

$K(x, y) = (x^T y + c)^d$

where $x$ and $y$ are vectors in the input space, i.e. vectors of features computed from training or test samples, and $c \ge 0$ is a free parameter trading off the influence of higher-order versus lower-order terms in the polynomial. When $c = 0$, the kernel is called homogeneous. (A further generalized polykernel divides $x^T y$ by a user-specified scalar parameter $a$.) As a kernel, $K$ corresponds to an inner product in a feature space based on some mapping $\varphi$:

$K(x, y) = \langle \varphi(x), \varphi(y) \rangle$

The nature of $\varphi$ can be seen from an example. Let $d = 2$, so we get the special case of the quadratic kernel. After using the multinomial theorem (twice; the outermost application is the binomial theorem) and regrouping,

$K(x, y) = \Bigl(\sum_{i=1}^n x_i y_i + c\Bigr)^2 = \sum_{i=1}^n (x_i^2)(y_i^2) + \sum_{i=2}^n \sum_{j=1}^{i-1} (\sqrt{2}\,x_i x_j)(\sqrt{2}\,y_i y_j) + \sum_{i=1}^n (\sqrt{2c}\,x_i)(\sqrt{2c}\,y_i) + c^2$

From this it follows that the feature map is given by

$\varphi(x) = (x_n^2, \ldots, x_1^2, \sqrt{2}\,x_n x_{n-1}, \ldots, \sqrt{2}\,x_2 x_1, \sqrt{2c}\,x_n, \ldots, \sqrt{2c}\,x_1, c)$

Confusion matrix and derived metrics

- Prevalence = condition positive / total population
- Accuracy (ACC) = (true positive + true negative) / total population
- True positive rate (TPR), recall, sensitivity, power = true positive / condition positive
- False negative rate (FNR), miss rate (Type II error rate) = false negative / condition positive = 1 - TPR
- False positive rate (FPR), fall-out, probability of false alarm (Type I error rate) = false positive / condition negative
- True negative rate (TNR), specificity, selectivity = true negative / condition negative = 1 - FPR
- Positive predictive value (PPV), precision = true positive / predicted condition positive
- False discovery rate (FDR) = false positive / predicted condition positive = 1 - PPV
- Negative predictive value (NPV) = true negative / predicted condition negative
- False omission rate (FOR) = false negative / predicted condition negative = 1 - NPV
- Positive likelihood ratio (LR+) = TPR / FPR
- Negative likelihood ratio (LR-) = FNR / TNR
- Diagnostic odds ratio (DOR) = LR+ / LR-
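The quadratic-kernel feature map quoted above can be checked numerically; in the Python sketch below the vectors x, y and the constant c are arbitrary, and the kernel value $(x^T y + c)^2$ is compared against the explicit inner product $\langle \varphi(x), \varphi(y) \rangle$.

import numpy as np

def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel K(x, y) = (x^T y + c)^d."""
    return (x @ y + c) ** d

def phi_quadratic(x, c=1.0):
    """Explicit feature map of the d = 2 polynomial kernel."""
    n = len(x)
    squares = x ** 2
    cross = [np.sqrt(2.0) * x[i] * x[j] for i in range(n) for j in range(i)]
    linear = np.sqrt(2.0 * c) * x
    return np.concatenate([squares, np.array(cross), linear, [c]])

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, 2.0])
c = 1.0

print(poly_kernel(x, y, c))                       # kernel value: 30.25
print(phi_quadratic(x, c) @ phi_quadratic(y, c))  # same value via the feature map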