Exam text content

TIE-22306 Data-Intensive Programming - 16.10.2015


The text is generated with Optical Image Recognition from the original exam file and it can therefore contain erroneous or incomplete information. For example, mathematical symbols cannot be rendered correctly. The text is mainly used for generating search results.


TIE-22306 Data-Intensive Programming

Exam 16.10.2015
Timo Aaltonen

It is forbidden to use any written material such as cheatsheets, lecture notes etc. besides an English
dictionary. Electrical devices (calculators, cell phones, computers, etc.) may not be used during the
exam.

Make sure you have answered all questions.

Answer briefly and clearly; the answers are not graded based on their length. Incorrect answers
do not normally reduce points. However, the examiner reserves the right to deduct points if an
answer is completely irrational or clearly contradicts itself, i.e. is clearly a guess.

1. Explain briefly the following concepts:
i) three Vs of big data (3 p)
ii) design principles of Hadoop (4 p)
iii) Combiner (2 p)
iv) Shared-nothing architecture (2 p)

2. Explain how an HDFS client reads data from the Hadoop Distributed File System. Illustrate
the architecture of the system and include the communication between entities in the
figure. (7 p)

3. Below is a table of the data regarding the electrical consumption of an organization.

 

Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec
2012 | 12 10 10 11 10 13 14
2013 | 11 10 12 13 18 13 13
2014 | [illegible]
2015 | 9 1 11 12

Sketch a Hadoop application for calculating maximum monthly electricity usage for each year (14
for 2012, 13 for 2013, and so on). Cover all phases from the data preparation to the final output.
(10 p)
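
One possible shape for such an application can be sketched outside Hadoop in plain Python, with each phase written as its own function. This is only an illustrative simulation, not a Hadoop program and not a model answer: the function names, the input format (a year followed by monthly readings) and the 2013 sample values are assumptions made for the sketch. Only the 2012 values are taken from the legible part of the table.

```python
from collections import defaultdict


def prepare(raw_lines):
    """Data preparation: normalise separators and drop empty lines."""
    for line in raw_lines:
        line = line.replace("|", " ").strip()
        if line:
            yield line


def map_phase(line):
    """Mapper: emit one (year, reading) pair per monthly value."""
    fields = line.split()
    year = fields[0]
    for value in fields[1:]:
        yield year, int(value)


def shuffle(pairs):
    """Shuffle and sort: group values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(year, values):
    """Reducer: keep the largest reading seen for the year."""
    return year, max(values)


def run_job(raw_lines):
    """Chain the phases: prepare -> map -> shuffle -> reduce."""
    pairs = (pair for line in prepare(raw_lines) for pair in map_phase(line))
    return [reduce_phase(year, values) for year, values in shuffle(pairs)]


sample = [
    "2012 | 12 10 10 11 10 13 14",  # legible values from the table
    "",                             # blank line removed in preparation
    "2013 | 11 10 12 13 13",        # hypothetical values, for illustration only
]

for year, peak in run_job(sample):
    print(f"{year}\t{peak}")
```

Because taking a maximum is associative and commutative, the same reduce function could in principle also serve as a combiner on the map side.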

 

4. Are the following Hadoop-related claims true or false? Grading: a correct answer gives +1 p,
no answer 0 p, a wrong answer -1 p. (7 p)
i) Map and reduce can happen in parallel.
ii) FileInputFormat splits the file by newline (i.e. it is a line-oriented format).
iii) Pig Latin provides standard relational transforms.
iv) Hive offers a relational view to HDFS files.
v) A checksum can be used for fixing corrupted data.
vi) XML is a suitable format for Hadoop applications.
vii) Compressed input files are never splittable.
