Exam text content

SGN-45006 Fundamentals of Robot Vision - 06.05.2019

The text is generated with optical character recognition (OCR) from the original exam file, so it may contain erroneous or incomplete information. For example, mathematical symbols cannot be rendered correctly. The text is mainly used for generating search results.

Original exam
SGN-45006 Fundamentals of Robot Vision
Lecturer Esa Rahtu

The maximum number of points for each task is shown in parentheses. The use of a
calculator is allowed, but it is not necessary.

1. Explain briefly the following terms and concepts:

(a) Camera projection matrix (2 p)

(b) Scale Invariant Feature Transform (SIFT) (2 p)

(c) Camera calibration (2 p)

(d) Structure from motion (2 p)

(e) Convolutional neural network (2 p)

(f) Boosting scheme in classification (2 p)
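As an illustration for term (a), the sketch below builds a pinhole projection matrix P = K[R | t] and uses it to project one 3D point. It assumes NumPy, and the intrinsic values in K (800 px focal length, principal point (320, 240)) are hypothetical numbers chosen only for the example.

```python
import numpy as np


def projection_matrix(K, R, t):
    """Build the 3x4 camera projection matrix P = K [R | t]."""
    return K @ np.hstack([R, t.reshape(3, 1)])


def project(P, X):
    """Project a 3D point X (length 3) to inhomogeneous image coordinates."""
    Xh = np.append(X, 1.0)   # homogeneous 3D point (X, Y, Z, 1)
    x = P @ Xh               # homogeneous image point
    return x[:2] / x[2]      # perspective division


# Hypothetical intrinsics: focal length 800 px, principal point (320, 240)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)     # camera axes aligned with the world axes
t = np.zeros(3)   # camera centre at the world origin

P = projection_matrix(K, R, t)
print(project(P, np.array([0.5, -0.25, 2.0])))  # pixel coordinates (520, 140)
```

The division by the third homogeneous coordinate is what makes this a perspective mapping rather than an affine one.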

2. Model fitting using RANSAC algorithm

(a) Describe the main stages of the RANSAC algorithm in the general case. (2 p)

(b) In this context, why is it usually beneficial to sample minimal subsets of data
points instead of using more data points? (Minimal subsets have the minimal
number of data points required for fitting.) (1 p)

(c) Mention at least two examples of models that can be fitted using RANSAC.
Describe how the models are used in computer vision and what the size of
the minimal subset of data points required for fitting is in each case. (1 p)

(d) Describe how RANSAC can be used for panoramic image stitching. Why is
RANSAC needed and what is the model fitted in this case? (2 p)
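A minimal sketch of the RANSAC stages asked about in 2(a) to 2(c), assuming NumPy and using 2D line fitting (minimal subset: two points) as the example model; the function names are hypothetical. The helper num_iterations uses the standard estimate N = log(1 - p) / log(1 - w^s), which shows why minimal subsets help: the probability w^s that a random sample contains only inliers drops quickly as the sample size s grows.

```python
import math
import numpy as np


def num_iterations(p_success, inlier_ratio, sample_size):
    """Iterations needed so that, with probability p_success, at least one
    sample is outlier-free: N = log(1 - p) / log(1 - w^s)."""
    return math.ceil(math.log(1 - p_success) /
                     math.log(1 - inlier_ratio ** sample_size))


def fit_line(p, q):
    """Line a*x + b*y + c = 0 through two points, with (a, b) of unit length."""
    a, b = q[1] - p[1], p[0] - q[0]
    norm = math.hypot(a, b)
    a, b = a / norm, b / norm
    return a, b, -(a * p[0] + b * p[1])


def ransac_line(points, threshold, iterations, rng):
    best = np.zeros(len(points), dtype=bool)
    for _ in range(iterations):
        # 1. Sample a minimal subset (two points define a line)
        i, j = rng.choice(len(points), size=2, replace=False)
        if np.allclose(points[i], points[j]):
            continue  # degenerate sample, skip it
        # 2. Fit the model to the minimal subset
        a, b, c = fit_line(points[i], points[j])
        # 3. Inliers: points closer to the line than the threshold
        inliers = np.abs(points @ np.array([a, b]) + c) < threshold
        # 4. Keep the model with the largest consensus set
        if inliers.sum() > best.sum():
            best = inliers
    # 5. Refit to all inliers (total least squares via SVD)
    pts = points[best]
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid, full_matrices=False)
    normal = vt[-1]  # direction of least variance = line normal
    return normal[0], normal[1], -normal @ centroid, best


# Usage: 50 noisy points on y = 2x + 1 plus 20 uniform outliers
rng = np.random.default_rng(0)
xs = np.linspace(0, 10, 50)
on_line = np.column_stack([xs, 2 * xs + 1 + rng.normal(0, 0.05, xs.size)])
clutter = rng.uniform(0, 25, size=(20, 2))
points = np.vstack([on_line, clutter])
a, b, c, mask = ransac_line(points, threshold=0.2, iterations=200, rng=rng)
print("slope:", -a / b, "intercept:", -c / b, "inliers:", mask.sum())
```

With p = 0.99 and an inlier ratio of 0.5, a two-point sample needs 17 iterations while a four-point sample needs 72. For panorama stitching (2(d)) the same loop would instead sample four feature matches and fit a homography.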

3. Geometric 2D transformations

(a) Using homogeneous coordinates, write the matrix form of the following 2D
transformations: translation, similarity (rotation+scaling+translation), affine
and homography. How many degrees of freedom does each transformation have
and why? How many point correspondences are needed to estimate each? (3 p)

(b) A rectangle with corners A = (-1, 1), B = (1, 1), C = (1, -1), D = (-1, -1)
is transformed so that the new corners are A' = (1, 3),
B' = (3, 3), C' = (2, 1), D' = (6, 1), respectively. An affine transformation
does not explain the observations perfectly, but there is reason to believe that
the transformation is affine and that the observations are noisy. Write down
the equations for solving the transformation with the least-squares method.
Note: You don't actually have to solve the transformation. (3 p)
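One way to write down the equations for 3(b): each correspondence (x, y) -> (x', y') gives two linear equations in the six affine parameters, so the four corners give an overdetermined 8 x 6 system A p = b whose least-squares solution satisfies the normal equations A^T A p = A^T b. The sketch assumes NumPy; the function name affine_lstsq is hypothetical, and the corner coordinates are copied from the extracted text, so the D' value in particular may be an OCR error.

```python
import numpy as np


def affine_lstsq(src, dst):
    """Least-squares affine transform mapping src points to dst points.

    Each correspondence (x, y) -> (x', y') contributes two rows:
        x' = a11*x + a12*y + tx
        y' = a21*x + a22*y + ty
    with unknowns p = (a11, a12, a21, a22, tx, ty).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    A = np.zeros((2 * n, 6))
    b = dst.reshape(-1)      # (x1', y1', x2', y2', ...)
    A[0::2, 0:2] = src       # rows for the x' equations
    A[0::2, 4] = 1.0
    A[1::2, 2:4] = src       # rows for the y' equations
    A[1::2, 5] = 1.0
    # lstsq minimises ||A p - b||; same solution as A^T A p = A^T b
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    M = np.array([[p[0], p[1], p[4]],
                  [p[2], p[3], p[5]],
                  [0.0, 0.0, 1.0]])
    return A, b, p, M


src = [(-1, 1), (1, 1), (1, -1), (-1, -1)]   # A, B, C, D
dst = [(1, 3), (3, 3), (2, 1), (6, 1)]       # A', B', C', D' (as extracted)
A, b, p, M = affine_lstsq(src, dst)
print(M)
print("residual norm:", np.linalg.norm(A @ p - b))
```

A nonzero residual norm reflects the statement in the task that no affine transformation fits all four observations exactly.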

4. Neural networks

(a) What is a Perceptron? Explain its construction (hint: use a picture) and how it
can be trained to perform a classification task (assume you have training samples
with an input feature vector x and a class label 1 or -1).

(b) What are the main differences between convolutional neural networks (CNNs)
and conventional fully connected networks? (1 p)

(c) Give a rough example of structures typically used in CNN-based models.
Illustrate your example with a picture. (1 p)

(d) Explain what feature maps are in the context of CNNs. (1 p)

(e) Explain the main differences between one-stage and two-stage object detection
methods. Give an example of each type. (1 p)
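A minimal sketch of the classic perceptron learning rule for 4(a), assuming NumPy and labels in {-1, +1}: the perceptron computes a weighted sum w.x + b and outputs its sign, and every misclassified training sample moves the weights toward the correct side. The AND-style toy data set in the usage lines is hypothetical.

```python
import numpy as np


def train_perceptron(X, y, epochs=100, lr=1.0):
    """Perceptron learning rule for labels y in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            # Misclassified (or exactly on the boundary): update
            if yi * (w @ xi + b) <= 0:
                w += lr * yi * xi
                b += lr * yi
                errors += 1
        if errors == 0:  # every training sample classified correctly
            break
    return w, b


def predict(w, b, X):
    """Sign of the weighted sum, mapped to the labels -1 / +1."""
    return np.where(X @ w + b >= 0, 1, -1)


# Usage: linearly separable AND-style data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
print(w, b, predict(w, b, X))
```

The loop is guaranteed to stop with zero errors only when the classes are linearly separable; otherwise it runs for the full number of epochs.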

5. Epipolar geometry and stereo

(a) Figure 1 presents a stereo system with two parallel pinhole cameras separated
by a baseline $b$ so that the centers of the cameras are $c_l = (0,0,0)$ and
$c_r = (b,0,0)$. Both cameras have the same focal length $f$. The point $P$ is located in
front of the cameras and its disparity $d$ is the distance between corresponding
image points, i.e., $d = |x_l - x_r|$. Assume that $d = 1$ cm, $b = 6$ cm and $f = 1$
cm. Compute $Z_P$. (2 p)
(b) Let's denote the camera projection matrices of two cameras by $P = [I \mid 0]$
and $P' = [R \mid t]$, where $R$ is a rotation matrix and $t = (t_1, t_2, t_3)^T$ describes
the translation between the cameras. Show that the epipolar constraint for
corresponding image points $x$ and $x'$ can be written in the form $x'^T E x = 0$,
where $E = [t]_\times R$ is the essential matrix. (2 p)
(c) In the configuration illustrated in Figure 1 the camera matrices are $P = [I \mid 0]$
and $P' = [I \mid t]$, where $I$ is the identity matrix and $t = (-6, 0, 0)^T$. The point
$O$ has coordinates $(3, 0, 3)$. Compute the image of $O$ on the image plane of
the camera on the left and the corresponding epipolar line on the image plane
of the camera on the right. (Hint: The epipolar line is computed using the
essential matrix.) (2 p)

[Figure: point $P = (X_P, Y_P, Z_P)$ in front of the two cameras]
Figure 1: Top view of a stereo pair where two pinhole cameras are placed side by side.
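The arithmetic in 5(a) and 5(c) can be checked with the short NumPy sketch below. It uses the similar-triangles relation Z = f b / d for parallel cameras and the essential matrix E = [t]x R with R = I, as given in 5(c); the helper names depth_from_disparity and skew are hypothetical.

```python
import numpy as np


def depth_from_disparity(f, b, d):
    """Depth seen by two parallel cameras: Z = f * b / d (similar triangles)."""
    return f * b / d


def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


# 5(a): values from the exam text (all in cm)
Z_P = depth_from_disparity(f=1.0, b=6.0, d=1.0)

# 5(c): P = [I | 0], P' = [I | t] with t = (-6, 0, 0)^T, so R = I
R = np.eye(3)
t = np.array([-6.0, 0.0, 0.0])
E = skew(t) @ R                       # essential matrix E = [t]_x R

O = np.array([3.0, 0.0, 3.0, 1.0])    # point O in homogeneous coordinates
P_left = np.hstack([np.eye(3), np.zeros((3, 1))])
P_right = np.hstack([R, t.reshape(3, 1)])

x = P_left @ O
x = x / x[2]                          # image of O in the left camera
l_right = E @ x                       # epipolar line l' = E x (right image)

x_right = P_right @ O
x_right = x_right / x_right[2]        # actual image of O in the right camera
print(Z_P, x, l_right, l_right @ x_right)
```

Here Z_P = 6 cm, O projects to (1, 0) in the left image, and the epipolar line is proportional to (0, 1, 0), i.e. the horizontal line y' = 0 in the right image, as expected for side-by-side cameras. The right-image point of O lies on that line, and its disparity of 2 gives back its depth of 3.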

