DATA.ML.300 Computer Vision - 03.03.2023 (Weekly exams)
The text has been generated from the original exam file with optical character recognition and may therefore contain erroneous or incomplete information. For example, mathematical symbols may not be rendered correctly. The text is mainly intended for generating search results.
Week exam 1
1. General questions
(a) What is a Gaussian filter and where can it be applied?
(b) What is the benefit of using homogeneous coordinates in the case of the pinhole camera model?
(c) How can the Fourier transform be used in calculating a linear filtering result?
(d) What is a Gaussian image pyramid?
2. Transformations
(a) A perspective camera has the following camera matrix:
P = [ 1 0 0 0 ]
    [ 0 1 1 2 ]
    [ 0 0 1 1 ]
Determine the image point corresponding to the 3D point X = (6, 2, 2). Report your answer in non-homogeneous coordinates.
(b) Write the matrix equations for 3D similarity, affine, and perspective transformations, using homogeneous coordinates. How many degrees of freedom does each transform have, and how many point correspondences are needed to estimate them?
3. Homogeneous coordinates
(a) Convert the following (inhomogeneous) points into homogeneous coordinates: (1, 5), (100, 500), and (4, 4, 1). Similarly, convert the following homogeneous points into the corresponding inhomogeneous form (i.e. normal coordinates): (1, 5, 1), (7, 1, 3), (24, 12, 6), and (8, 6, 1, 2). What does the homogeneous point (1, 1, 0) correspond to?
(b) A line ax + by + c = 0 can be presented in vector form as l = (a, b, c)^T and, using homogeneous coordinates, a point x is on the line l if x^T l = 0. The intersection of two lines l and l' is given by the vector cross product between l and l'. Similarly, the line l passing through points x and x' is given by the vector cross product between x and x'. Use homogeneous coordinates and the above formulas to determine the intersection of lines l1 and l2, where l1 runs through the points (2, 4) and (8, 8), and l2 runs through the points (14, 10) and (18, 6).
Hint: the 3D vector cross product is calculated as
(a1, a2, a3) x (b1, b2, b3) = (a2 b3 - a3 b2, a3 b1 - a1 b3, a1 b2 - a2 b1)

Week exam 2
1. General questions
(a) What is the main goal in an image retrieval task?
(b) What do hyperparameters mean in image classification (give one example)?
(c) What is a k-nearest neighbour classifier?
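As a concrete illustration of the classifier asked about in (c), here is a minimal k-nearest-neighbour sketch in plain NumPy; the toy data and the value of k are made up for the example:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    # Euclidean distance from x to every training sample
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest samples
    nearest = np.argsort(dists)[:k]
    # Majority vote over their labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy 2D data: class 0 near the origin, class 1 near (5, 5)
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                    [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # → 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))  # → 1
```

Note that the classifier stores all training data and does no learning; that trade-off is exactly what the pros-and-cons part of the question is after.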
What are its pros and cons?
(d) Give one example of how a pretrained classification network can be used in image retrieval.
2. Neural networks
(a) What is a perceptron? Explain its construction (hint: use a picture) and how it can be trained to perform a classification task (assume you have training samples with input feature vector x and class label 1 or -1).
(b) In Figure 1 below you see a very small neural network, which has one input unit, one hidden unit (logistic), and one output unit (linear). The nonlinear function σ in the logistic unit is defined by the formula σ(z) = 1/(1 + e^(-z)). Let's consider one training case. For that training case, the input value is 1 (as shown in the figure) and the target output value t is 2. We are using the standard squared loss function E = (t - y)²/2, where y is the output of the network. The values of the weights and biases are shown in the figure, and they have been constructed in such a way that you do not need a calculator. Hint: the derivative of the logistic function is d/dx σ(x) = σ(x)(1 - σ(x)). Answer the following questions:
i. What is the output of the hidden unit and of the output unit for this training case?
ii. What is the loss for this training case?
iii. What is the derivative of the loss with respect to w2 for this training case?
Figure 1: A simple neural network: an input unit feeds a logistic hidden unit (w1 = -2, bias = +2), which feeds a linear output unit (w2 = +4)
3. Image retrieval
(a) Describe the bag-of-visual-words image representation technique. How can it be utilised in image retrieval?
(b) Figure 3 (below) illustrates a database of four images and the corresponding visual words for each image (W1, W2, ...). Construct an inverted index for this example dataset.
(c) We have a database of 10 images. Our retrieval algorithm has ranked them in the following order with respect to a given query (see the query and the ranked database in Figure 2 below).
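The forward and backward pass of the small network in Question 2(b) can be checked numerically. A minimal sketch follows; the output unit's bias is illegible in the OCR text, so it is assumed to be 0 here:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Weights and biases as read from Figure 1 (the output bias b2 is an assumption)
x, t = 1.0, 2.0          # input and target
w1, b1 = -2.0, 2.0       # logistic hidden unit
w2, b2 = 4.0, 0.0        # linear output unit

h = sigmoid(w1 * x + b1)   # hidden activation: sigmoid(0) = 0.5
y = w2 * h + b2            # network output: 4 * 0.5 = 2.0
E = (t - y) ** 2 / 2       # squared loss: 0.0
dE_dw2 = (y - t) * h       # chain rule: dE/dy * dy/dw2 = 0.0

print(h, y, E, dE_dw2)  # → 0.5 2.0 0.0 0.0
```

With these values the output already equals the target, which is consistent with the hint that no calculator is needed.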
Based on the manual annotations, we know that the images with a green box are relevant to the current query. Draw a precision-recall curve for the retrieval result (use the axes given in Figure 2).
Figure 2: the query, the ranked results, and the corresponding visual words. Dataset size: 10 images; relevant (total): 5 images. Precision = #relevant / #returned; Recall = #relevant / #total relevant.
Figure 3: database images and the corresponding visual words (W1, W2, ...)

Week exam 3
1. General questions
(a) What is the goal in object category detection and how does it differ from image classification and object segmentation?
(b) Name the main components in a sliding-window-based object detector.
(c) What is bootstrapping and how can it be used in training detectors?
(d) What is the difference between one-stage and two-stage CNN object detectors?
2. Classical object detectors
(a) Describe the different phases in extracting Histograms of Oriented Gradients (HoG). Use a picture.
(b) The following image (Figure 1) depicts an example detection result. The blue boxes are the known ground-truth locations of the objects and the red boxes are the obtained detections. The number next to each detection denotes the corresponding ranking (i.e. detection 1 has the highest classification score, detection 2 the next highest, and so on). The corresponding intersection-over-union values are: 1) 0.9, 2) 0.57, 3) 0, and 4) 0.49 (i.e. the IoU measure for each detection with respect to the highest-overlapping ground truth). Draw the corresponding precision-recall curve using an IoU value of 0.5 as the detection threshold.
Hint: Precision = #returned correct detections / #returned detections; Recall = #detected objects / #total number of objects
Figure 1
3. CNN-based detectors
(a) Explain the main phases in the "CornerNet" object detection approach.
(b) The following image (Figure 2) depicts the Faster R-CNN object detector. Shortly describe the objective of each component in the system (i.e. what it takes in and what it aims to produce as an output).
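The precision-recall curves asked for above can be computed from a ranked list of correct/incorrect flags. A minimal sketch using the IoU outcomes of Question 2(b) (detections with IoU >= 0.5 count as correct; the total object count is not recoverable from the OCR text, so 3 is assumed for the example):

```python
def precision_recall(correct_flags, total_objects):
    """Precision and recall after each returned detection, in rank order."""
    points = []
    tp = 0
    for rank, correct in enumerate(correct_flags, start=1):
        tp += correct
        points.append((tp / rank, tp / total_objects))  # (precision, recall)
    return points

# Question 2(b): IoUs 0.9, 0.57, 0, 0.49 with a 0.5 threshold
# → detections 1 and 2 are correct, 3 and 4 are not
flags = [iou >= 0.5 for iou in (0.9, 0.57, 0.0, 0.49)]
for p, r in precision_recall(flags, total_objects=3):  # 3 objects is an assumption
    print(f"precision={p:.2f} recall={r:.2f}")
```

Plotting recall on the x-axis against precision on the y-axis at each rank gives the requested curve.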
Figure 2: the Faster R-CNN pipeline: base network → feature map → proposals → ROI classifier and regressor

Week exam 4
1. General questions
(a) What are the main stages in the Canny edge detector?
(b) Outline the cost function that is minimised when fitting a line with the least-squares method (no need to solve it).
(c) Why is it usually beneficial to sample a minimal subset of data points in RANSAC instead of using more data points?
(d) What is the main motivation for using "robust cost functions" in model fitting instead of the normal quadratic function used in vanilla least-squares fitting?
2. Local features
(a) Figure 1 illustrates three different kinds of local image areas (the box). For each case, explain whether it makes a good local keypoint or not. Justify your answer. (A local keypoint is an image point that can be accurately and reliably detected from multiple images of the same scene.)
Figure 1
(b) Describe how the scale-normalised Laplacian of Gaussian function (see Figure 2) can be used in scale-covariant blob detection.
Figure 2: the scale-normalised Laplacian of Gaussian, ∇²_norm g = σ² (∂²g/∂x² + ∂²g/∂y²)
3. Robust model fitting
(a) Describe the main stages of the RANSAC algorithm in the general case.
(b) What is the idea in the Hough transform and how can it be used in model fitting? Give one example.

Week exam 5
1. General questions
(a) What is the brightness constraint in optical flow estimation?
(b) What is the so-called aperture problem?
(c) What are the motion field and the optical flow? What is the main difference?
(d) What kind of features are good for tracking and why?
2. 2D transformations
(a) Figure 1 depicts two images of database objects and a scene where they need to be detected. Describe the main steps of how this kind of object instance recognition task can be solved using local features and image alignment. For each step, explain the main goal and name at least one method to implement it.
Figure 1
(b) Figure 2 depicts two images taken from the same scene.
Describe the main steps of how these images can be aligned to form the panorama image shown in Figure 3. For each step, explain the main goal and name at least one method to implement it.
Figure 2: image pair from the same scene
Figure 3: panorama image
3. Optical flow and tracking
(a) Assume we have two frames obtained at time instants (t - 1) and t, as shown in Figure 3. In optical flow, our target is to estimate the motion (u, v) of a pixel at position (x, y). Starting from the brightness constraint, derive the optical flow equation:
∇I · (u, v) + I_t = 0
How many unknowns does this equation have per pixel?
Hint: I(x + u(x, y), y + v(x, y), t) ≈ I(x, y, t) + I_x u(x, y) + I_y v(x, y)
Figure 3: I(x, y, t) denotes the brightness of the pixel at position (x, y) at time instant t; the pixel moves by the displacement (u, v) to (x + u, y + v)
(b) Explain the multi-resolution approach for optical flow estimation. What are the main advantages of the approach?

Week exam 6
1. General questions
(a) Why is the recovery of the scene structure from a single image an ill-posed problem?
(b) What does auto-calibration mean in the context of camera calibration?
(c) What is the relation between depth and disparity in stereo vision?
(d) What is the main difference between the essential and fundamental matrices?
2. Camera calibration and single-view metrology
(a) Briefly explain the "linear method" for camera calibration. What are the pros and cons of this approach?
(b) Figure 1 illustrates a scenario where we are trying to estimate the height H (the distance between the top T and the bottom B) from a single image using the known reference height R. We have detected the image points t, r, and b that correspond to the 3D points T, R, and B, respectively. The image point vz is the vanishing point in the vertical direction. Show how the height H can be obtained using the points t, r, b, and vz.
Hint: use the cross-ratio of four points, defined as
Cr(P1, P2, P3, P4) = (|P3 - P1| |P4 - P2|) / (|P3 - P2| |P4 - P1|)
Figure 1: Estimating the height H from a single image using the reference height R; T is the top of the object, R the reference point, B the bottom, and vz the vertical vanishing point
3. Epipolar geometry and stereo
(a) What is an epipolar line and how does it relate observations in two cameras? How are the essential and fundamental matrices related to these?
(b) Figure 2 presents a stereo system with two parallel pinhole cameras separated by a baseline b so that the centers of the cameras are cl = (0, 0, 0) and cr = (b, 0, 0). Both cameras have the same focal length f. The point P is located in front of the cameras, and its disparity d is the distance between the corresponding image points, i.e. d = |xl - xr|. Assume that d = 4 cm, b = 12 cm, and f = 2 cm. Compute ZP.
Figure 2: Top view of a stereo pair where two pinhole cameras are placed side by side; P = (XP, YP, ZP)

Week exam 7
1. General questions
(a) What is the projective ambiguity in the context of Structure from Motion?
(b) What are the main differences between multi-view stereo and Structure from Motion?
(c) What is bundle adjustment and why is it important in Structure from Motion?
(d) What is inverse depth and why is it used in some multi-view stereo applications?
2. Structure from Motion
(a) We know that all images used in Structure from Motion (SfM) are captured by a single moving camera. How can this information be used to "upgrade" a projective SfM solution? Give a rough idea of how this can be done (no need to solve).
(b) You are given m images of n fixed 3D points, i.e. you have m cameras, n 3D points, and each point is detected in every camera (see the illustration in Figure 1). The task is to estimate the m projection matrices Pi and the n 3D points Xj from the mn correspondences xij (up to a projective transformation). Explain the main steps in solving this problem in a sequential manner.
λij xij = Pi Xj,  i = 1, ..., m,  j = 1, ..., n
Figure 1: Illustration of the setup in the Structure from Motion problem in the case of 3 cameras
3. Multi-view geometry
(a) Explain the main principle in the epipolar-geometry-based multi-view stereo reconstruction approach.
(b) What is the space carving method and how does it work in multi-view stereo reconstruction?
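The depth-disparity relation asked about in Week exam 6 (for parallel pinhole cameras, Z = f·b/d) can be sketched and checked against the numbers given in Question 3(b):

```python
def depth_from_disparity(f, b, d):
    """Depth of a point seen by two parallel pinhole cameras.

    From similar triangles: d / f = b / Z, so Z = f * b / d.
    f: focal length, b: baseline, d: disparity (all in the same units).
    """
    return f * b / d

# Values from Week exam 6, Question 3(b): f = 2 cm, b = 12 cm, d = 4 cm
print(depth_from_disparity(f=2.0, b=12.0, d=4.0))  # → 6.0 (cm)
```

The inverse relationship also illustrates the general point from Question 1(c): large disparities correspond to nearby points and small disparities to distant ones.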