SGN-41007 Pattern Recognition and Machine Learning - 12.12.2019
This text was generated from the original exam file using optical character recognition, so it may contain incorrect or incomplete information. For example, mathematical symbols may not be rendered correctly. The text is used mainly for generating search results.
Original exam

SGN-41007 Pattern Recognition and Machine Learning
Exam 12.12.2019
Heikki Huttunen

> Use of a calculator is allowed.
> Use of other materials is not allowed.
> The exam questions need not be returned after the exam.
> You may answer in English or Finnish.

1. Are the following statements true or false? No need to justify your answer, just T or F. Correct answer: 1 pt, wrong answer: ... pts, no answer: 0 pts.
(a) Maximum likelihood estimators are unbiased.
(b) The least squares estimator minimizes the squared distance between the data and the model.
(c) MobileNets were the first to introduce a shortcut (residual) connection between layers.
(d) The number of support vectors of a support vector machine equals the total number of samples.
(e) LDA maximizes the within-class distance of samples in each class.
(f) Cross-validation is used for model accuracy evaluation.

2. Consider the model
$x[n] = A\,e^{-n}\sin(\theta n) + w[n], \quad n = 0, 1, \ldots, N-1,$
where $w[n] \sim \mathcal{N}(0, \sigma^2)$ and $\theta$ is a known real number. In other words, we assume that our measurement is a damped sinusoid at known frequency and phase, and we want to estimate the amplitude $A$. Derive the maximum likelihood estimator of $A$. (A derivation sketch is given below, after Figure 1.)

3. Consider the Keras model defined in Listing 1. Inputs are 224 x 224 color images from 17 categories.
(a) Compute the number of parameters for each layer, and their total number over all layers.
(b) Compute the number of multiplications required on the first convolutional layer.
(A Keras sketch is given below, after Figure 1.)

4. In this task, you will design both an unregularized and a regularized LDA classifier.
(a) Compute the LDA weight vector for $m_0 = (\ldots)$, $m_1 = (\ldots)$, $\Sigma_0 = (\ldots)$, $\Sigma_1 = (\ldots)$.
(b) Compute the regularized LDA with N = 100. You may use the Wikipedia pages at the end of the exam paper.
(A NumPy sketch is given below, after Figure 1.)

5. (a) A random forest classifier is trained on a training data set and the predict_proba method is applied on the test data of Table 1. Draw the receiver operating characteristic curve. What is the Area Under Curve (AUC) score?
(b) Draw the precision-recall curve. What is the Area Under PR Curve (AUPRC) score?
(A scikit-learn sketch is given below, after Figure 1.)

Table 1: Results on test data for Question 5(a).

Sample    | Prediction | True label
Sample 1  | 0.8        | 1
Sample 2  | 0.5        | 1
Sample 3  | 0.6        | 0
Sample 4  | 0.1        | 0

Figure 1: Model structure of Question 3.

Layer (type)                   | Output Shape
conv2d_1 (Conv2D)              | (None, 224, 224, 32)
max_pooling2d (MaxPooling2D)   | (None, 112, 112, 32)
conv2d_2 (Conv2D)              | (None, 112, 112, 32)
max_pooling2d_1 (MaxPooling2D) | (None, 56, 56, 32)
conv2d_3 (Conv2D)              | (None, 56, 56, 32)
max_pooling2d_2 (MaxPooling2D) | (None, 28, 28, 32)
conv2d_4 (Conv2D)              | (None, 28, 28, 32)
max_pooling2d_3 (MaxPooling2D) | (None, 14, 14, 32)
flatten (Flatten)              | (None, 6272)
dense (Dense)                  | (None, 17)

Total params: ..., Trainable params: ..., Non-trainable params: 0
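The derivation sketch below is not part of the original exam paper; it illustrates the standard argument for Question 2. Writing $s[n] = e^{-n}\sin(\theta n)$ for the known signal shape, the model is linear in $A$ and the noise is i.i.d. Gaussian, so the ML estimator coincides with the least squares estimator.

\begin{align*}
\ln p(\mathbf{x}; A) &= -\frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{n=0}^{N-1}\bigl(x[n] - A\,s[n]\bigr)^2 \\
\frac{\partial \ln p(\mathbf{x}; A)}{\partial A} &= \frac{1}{\sigma^2}\sum_{n=0}^{N-1}\bigl(x[n] - A\,s[n]\bigr)\,s[n] = 0 \\
\Rightarrow\quad \hat{A}_{\mathrm{ML}} &= \frac{\sum_{n=0}^{N-1} x[n]\,e^{-n}\sin(\theta n)}{\sum_{n=0}^{N-1} e^{-2n}\sin^2(\theta n)}
\end{align*}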
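The original Listing 1 is not reproduced in this text, so the kernel size (3 x 3), 'same' padding, activations and RGB input in the Keras sketch below are assumptions chosen to be consistent with the output shapes of Figure 1; model.summary() then prints the per-layer and total parameter counts asked for in Question 3.

# Sketch only: kernel sizes, padding and activations are assumed,
# since Listing 1 is not reproduced in this text.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), padding='same', activation='relu',
                  input_shape=(224, 224, 3)),                      # conv2d_1
    layers.MaxPooling2D((2, 2)),                                   # -> 112 x 112 x 32
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),  # conv2d_2
    layers.MaxPooling2D((2, 2)),                                   # -> 56 x 56 x 32
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),  # conv2d_3
    layers.MaxPooling2D((2, 2)),                                   # -> 28 x 28 x 32
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),  # conv2d_4
    layers.MaxPooling2D((2, 2)),                                   # -> 14 x 14 x 32
    layers.Flatten(),                                              # 14 * 14 * 32 = 6272
    layers.Dense(17, activation='softmax'),
])

model.summary()

# A Conv2D layer has (kernel_h * kernel_w * in_channels + 1) * out_channels
# parameters; with the assumed 3 x 3 kernels the first layer has
# (3*3*3 + 1) * 32 = 896 parameters, and part (b) counts 3*3*3
# multiplications for each of its 224 * 224 * 32 output values.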
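The numerical means and covariances of Question 4 are not legible in this text, so the NumPy sketch below uses placeholder values for m0, m1, S0, S1 and an illustrative shrinkage intensity lam; it applies the Fisher/LDA weight vector $w \propto (\Sigma_0 + \Sigma_1)^{-1}(m_1 - m_0)$ and the shrinkage estimator $\hat\Sigma = (1 - \lambda)\Sigma + \lambda I$ quoted in the Wikipedia excerpts at the end of the paper.

import numpy as np

# Placeholder class statistics; the actual exam values are not recoverable here.
m0 = np.array([0.0, 0.0])
m1 = np.array([1.0, 2.0])
S0 = np.array([[2.0, 0.5],
               [0.5, 1.0]])
S1 = np.array([[1.0, 0.0],
               [0.0, 1.0]])

# (a) Unregularized LDA / Fisher discriminant direction.
S = S0 + S1
w = np.linalg.solve(S, m1 - m0)

# (b) Regularized LDA: shrink the covariance toward the identity,
#     Sigma_hat = (1 - lam) * S + lam * I, with an illustrative lam.
lam = 0.1
S_reg = (1.0 - lam) * S + lam * np.eye(S.shape[0])
w_reg = np.linalg.solve(S_reg, m1 - m0)

print("w     =", w)
print("w_reg =", w_reg)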
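Question 5 can be checked numerically against Table 1; the scikit-learn calls below are an illustration (the exam expects the curves to be drawn by hand), with y_true and y_score taken directly from the table.

import numpy as np
from sklearn.metrics import (roc_curve, roc_auc_score,
                             precision_recall_curve, average_precision_score)

# Scores (predict_proba for the positive class) and labels from Table 1.
y_true  = np.array([1, 1, 0, 0])          # Samples 1-4
y_score = np.array([0.8, 0.5, 0.6, 0.1])

# (a) ROC curve and AUC.
fpr, tpr, roc_thresholds = roc_curve(y_true, y_score)
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", roc_auc_score(y_true, y_score))

# (b) Precision-recall curve and the area under it (average precision).
precision, recall, pr_thresholds = precision_recall_curve(y_true, y_score)
print("precision:", precision)
print("recall:   ", recall)
print("AUPRC (average precision):", average_precision_score(y_true, y_score))

On these four samples the AUC equals the fraction of correctly ordered positive-negative score pairs, 3/4 = 0.75.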
Related Wikipedia pages

Linear discriminant analysis: practical use and shrinkage

Another complication in applying LDA and Fisher's discriminant to real data occurs when the number of measurements of each sample (i.e., the dimensionality of each data vector) exceeds the number of samples in each class. In this case, the covariance estimates do not have full rank and so cannot be inverted. There are a number of ways to deal with this. One is to use a pseudoinverse instead of the usual matrix inverse in the above formulae. However, better numerical stability may be achieved by first projecting the problem onto the subspace spanned by $\Sigma_b$.

The terms Fisher's linear discriminant and LDA are often used interchangeably, although Fisher's original article actually describes a slightly different discriminant, which does not make some of the assumptions of LDA such as normally distributed classes or equal class covariances. Suppose two classes of observations have means $\mu_0, \mu_1$ and covariances $\Sigma_0, \Sigma_1$. Then the linear combination of features $w^T x$ has means $w^T \mu_i$ and variances $w^T \Sigma_i w$ for $i = 0, 1$. Fisher defined the separation between these two distributions to be the ratio of the variance between the classes to the variance within the classes:

$S = \frac{\bigl(w^T(\mu_1 - \mu_0)\bigr)^2}{w^T(\Sigma_0 + \Sigma_1)\,w}$

It can be shown that the maximum separation occurs when

$w \propto (\Sigma_0 + \Sigma_1)^{-1}(\mu_1 - \mu_0)$

When the assumptions of LDA are satisfied, the above equation is equivalent to LDA.

Another strategy to deal with small sample size is to use a shrinkage estimator of the covariance matrix, which can be expressed mathematically as

$\hat\Sigma = (1 - \lambda)\,\Sigma + \lambda I$

where $I$ is the identity matrix and $\lambda$ is the shrinkage intensity or regularisation parameter. This leads to the framework of regularized discriminant analysis or shrinkage discriminant analysis.

Inversion of 2 x 2 matrices

The cofactor equation listed above yields the following result for 2 x 2 matrices. Inversion of these matrices can be done as follows:

$A^{-1} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{\det A}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$

This is possible because $1/(ad - bc)$ is the reciprocal of the determinant of the matrix in question, and the same strategy could be used for other matrix sizes.

Tikhonov regularization

Tikhonov regularization, named for Andrey Tikhonov, is a method of regularization of ill-posed problems. Also known as ridge regression, it is particularly useful to mitigate the problem of multicollinearity in linear regression, which commonly occurs in models with large numbers of parameters. In general, the method provides improved efficiency in parameter estimation problems in exchange for a tolerable amount of bias (see the bias-variance tradeoff). In the simplest case, the problem of a near-singular moment matrix $X^T X$ is alleviated by adding positive elements to the diagonals, thereby decreasing its condition number. In order to give preference to a particular solution, the least squares problem can be augmented with a penalty term:

$\min_{\beta}\;(y - X\beta)^T(y - X\beta) + \lambda\,\beta^T\beta$

where $\lambda$ is the Lagrange multiplier of the corresponding constraint. The minimizer of the problem is the ridge estimator

$\hat\beta_R = (X^T X + \lambda I)^{-1} X^T y$

where $I$ is the identity matrix and the ridge parameter $\lambda$ serves as the positive constant shifting the diagonals, thereby decreasing the condition number of the moment matrix.
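The ridge estimator quoted above can be evaluated directly; the NumPy sketch below uses synthetic data (X, y, beta_true and lam are illustrative, not from the exam).

import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression problem (illustrative only).
N, p = 50, 5
X = rng.normal(size=(N, p))
beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ beta_true + 0.1 * rng.normal(size=N)

lam = 1.0  # regularization parameter (lambda)

# Ridge / Tikhonov estimator: beta_hat = (X^T X + lam * I)^-1 X^T y
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta_hat)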
ROC space and area under the curve

The contingency table can derive several evaluation "metrics" (see the list below). To draw a ROC curve, only the true positive rate (TPR) and the false positive rate (FPR) are needed, as functions of some classifier parameter. The TPR defines how many correct positive results occur among all positive samples available during the test. The FPR, on the other hand, defines how many incorrect positive results occur among all negative samples available during the test. A ROC space is defined by FPR and TPR as x and y axes, respectively, and depicts the relative trade-offs between true positives (benefits) and false positives (costs). Since TPR is equivalent to sensitivity and FPR is equal to 1 - specificity, the ROC graph is sometimes called the sensitivity vs. (1 - specificity) plot. Each prediction result, or instance of a confusion matrix, represents one point in the ROC space.

The ROC curve plots TPR($t$) against FPR($t$) while varying the decision threshold $t$. Viewed this way, the area under the curve can be rewritten as

$\mathrm{AUC} = \int_0^1 \mathrm{TPR}\; d(\mathrm{FPR}) = P\bigl(\mathrm{score}(X_1) > \mathrm{score}(X_0)\bigr)$

where we used the fact that the probability density $P(\mathrm{score}(x) = t \mid y(x) = 0)$ is the derivative with respect to $t$ of the cumulative distribution function $P(\mathrm{score}(x) \le t \mid y(x) = 0)$. So, given a randomly chosen observation $x_1$ belonging to class 1 and a randomly chosen observation $x_0$ belonging to class 0, the AUC is the probability that the evaluated classification algorithm will assign a higher score to $x_1$ than to $x_0$.

Polynomial kernel

For degree-$d$ polynomials, the polynomial kernel is defined as

$K(x, y) = (x^T y + c)^d$

where $x$ and $y$ are vectors in the input space, i.e. vectors of features computed from training or test samples, and $c \ge 0$ is a free parameter trading off the influence of higher-order versus lower-order terms in the polynomial. When $c = 0$, the kernel is called homogeneous. (A further generalized polykernel divides $x^T y$ by a user-specified scalar parameter $a$.) As a kernel, $K$ corresponds to an inner product in a feature space based on some mapping $\varphi$:

$K(x, y) = \langle \varphi(x), \varphi(y) \rangle$

The nature of $\varphi$ can be seen from an example. Let $d = 2$, so we get the special case of the quadratic kernel. After using the multinomial theorem (twice; the outermost application is the binomial theorem) and regrouping,

$K(x, y) = \Bigl(\sum_{i=1}^n x_i y_i + c\Bigr)^2 = \sum_{i=1}^n (x_i^2)(y_i^2) + \sum_{i=2}^n \sum_{j=1}^{i-1} (\sqrt{2}\,x_i x_j)(\sqrt{2}\,y_i y_j) + \sum_{i=1}^n (\sqrt{2c}\,x_i)(\sqrt{2c}\,y_i) + c^2$

From this it follows that the feature map is given by

$\varphi(x) = (x_n^2, \ldots, x_1^2, \sqrt{2}\,x_n x_{n-1}, \ldots, \sqrt{2}\,x_2 x_1, \sqrt{2c}\,x_n, \ldots, \sqrt{2c}\,x_1, c)$

Confusion matrix and derived metrics

- Prevalence = condition positive / total population
- Accuracy (ACC) = (true positive + true negative) / total population
- True positive rate (TPR), recall, sensitivity, power = true positive / condition positive
- False negative rate (FNR), miss rate (Type II error rate) = false negative / condition positive = 1 - TPR
- False positive rate (FPR), fall-out, probability of false alarm (Type I error rate) = false positive / condition negative
- True negative rate (TNR), specificity, selectivity = true negative / condition negative = 1 - FPR
- Positive predictive value (PPV), precision = true positive / predicted condition positive
- False discovery rate (FDR) = false positive / predicted condition positive = 1 - PPV
- Negative predictive value (NPV) = true negative / predicted condition negative
- False omission rate (FOR) = false negative / predicted condition negative = 1 - NPV
- Positive likelihood ratio (LR+) = TPR / FPR
- Negative likelihood ratio (LR-) = FNR / TNR
- Diagnostic odds ratio (DOR) = LR+ / LR-
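The quadratic-kernel feature map quoted above can be checked numerically; in the Python sketch below the vectors x, y and the constant c are arbitrary, and the kernel value $(x^T y + c)^2$ is compared against the explicit inner product $\langle \varphi(x), \varphi(y) \rangle$.

import numpy as np

def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel K(x, y) = (x^T y + c)^d."""
    return (x @ y + c) ** d

def phi_quadratic(x, c=1.0):
    """Explicit feature map of the d = 2 polynomial kernel."""
    n = len(x)
    squares = x ** 2
    cross = [np.sqrt(2.0) * x[i] * x[j] for i in range(n) for j in range(i)]
    linear = np.sqrt(2.0 * c) * x
    return np.concatenate([squares, np.array(cross), linear, [c]])

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, 2.0])
c = 1.0

print(poly_kernel(x, y, c))                       # kernel value: 30.25
print(phi_quadratic(x, c) @ phi_quadratic(y, c))  # same value via the feature map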