IEMS 5730 Spring 2024 Homework 2
Release date: Feb 23, 2024
Due date: Mar 11, 2024 (Monday) 11:59:00 pm
We will discuss the solution soon after the deadline. No late homework will be accepted!
Every Student MUST include the following statement, together with his/her signature in the submitted homework.
I declare that the assignment submitted on Elearning system is original except for source material explicitly acknowledged, and that the same or related material has not been previously submitted for another course. I also acknowledge that I am aware of University policy and regulations on honesty in academic work, and of the disciplinary guidelines and procedures applicable to breaches of such policy and regulations, as contained in the website http://www.cuhk.edu.hk/policy/academichonesty/.
Signed (Student_________________________) Date:______________________________ Name_________________________________ SID_______________________________
Submission notice:
● Submit your homework via the elearning system.
● All students are required to submit this assignment.
General homework policies:
A student may discuss the problems with others. However, the work a student turns in must be created COMPLETELY by oneself ALONE. A student may not share ANY written work or pictures, nor may one copy answers from any source other than one’s own brain.
Each student MUST LIST on the homework paper the name of every person he/she has discussed or worked with. If the answer includes content from any other source, the student MUST STATE THE SOURCE. Failure to do so is cheating and will result in sanctions. Copying answers from someone else is cheating even if one lists their name(s) on the homework.
If there is information you need to solve a problem, but the information is not stated in the problem, try to find the data somewhere. If you cannot find it, state what data you need, make a reasonable estimate of its value, and justify any assumptions you make. You will be graded not only on whether your answer is correct, but also on whether you have done an intelligent analysis.
Submit your output, explanation, and your commands/scripts in one SINGLE PDF file.

 Q1 [20 marks + 5 Bonus marks]: Basic Operations of Pig
You are required to perform some simple analysis using Pig on the n-grams dataset of Google books. An ‘n-gram’ is a phrase with n words. The dataset lists all n-grams present in books from books.google.com along with some statistics.
In this question, you only use the Google Books 1-grams. Please go to References [1] and [2] to download the two datasets. Each line in these two files has the following format (TAB separated):

bigram year match_count volume_count

An example for 1-grams would be:

circumvallate 1978 335 91
circumvallate 1979 261 95

This means that in 1978 (1979), the word "circumvallate" occurred 335 (261) times overall, in 91 (95) distinct books.
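To be concrete, each record splits into four fields. The following Python sketch (illustrative only; the field names are taken from the format description above) shows how a line parses:

```python
# Parse a Google Books 1-gram record: gram \t year \t match_count \t volume_count
def parse_record(line):
    gram, year, match_count, volume_count = line.rstrip("\n").split("\t")
    return gram, int(year), int(match_count), int(volume_count)

records = [parse_record(line) for line in [
    "circumvallate\t1978\t335\t91",
    "circumvallate\t1979\t261\t95",
]]
```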
(a) [Bonus 5 marks] Install Pig in your Hadoop cluster. You can reuse your Hadoop cluster from IEMS 5730 HW#0 and refer to the following link to install Pig 0.17.0 on the master node of your Hadoop cluster:
http://pig.apache.org/docs/r0.17.0/start.html#Pig+Setup
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7] to complete the following parts of the question:
(b) [5 marks] Upload these two files to HDFS and join them into one table.
(c) [5 marks] For each unique bigram, compute its average number of occurrences per year. In the above example, the result is:
circumvallate (335 + 261) / 2 = 298
Notes: The denominator is the number of years in which that word has appeared. Assume the data set contains all the 1-grams in the last 100 years, and the above records are the only records for the word ‘circumvallate’. Then the average value is:

(335 + 261) / 2 = 298, instead of (335 + 261) / 100 = 5.96.
(d) [10 marks] Output the 20 bigrams with the highest average number of occurrences per year, along with their corresponding average values, sorted in descending order. If multiple bigrams have the same average value, write down any one you like (that is, break ties as you wish).
You need to write a Pig script to perform this task and save the output into HDFS.
Hints:
● This problem is very similar to the word-counting example shown in the lecture notes of Pig. You can use the code there and make some minor changes to perform this task.
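Conceptually, parts (c) and (d) reduce to grouping by the gram, averaging over the years in which it appears, and sorting in descending order with a limit (in Pig: GROUP, FOREACH, ORDER, LIMIT). The following Python sketch is illustrative only — your submission must be a Pig script — and assumes, as in this dataset, at most one record per (gram, year):

```python
from collections import defaultdict

def top_k_average(records, k=20):
    """records: (gram, year, match_count, volume_count) tuples.
    Average = total match_count / number of years the gram appears in
    (each record is assumed to be a distinct year for that gram)."""
    totals = defaultdict(lambda: [0, 0])   # gram -> [sum of match_count, number of years]
    for gram, year, match_count, volume_count in records:
        totals[gram][0] += match_count
        totals[gram][1] += 1
    averages = {g: s / n for g, (s, n) in totals.items()}
    # descending by average, keep the top k
    return sorted(averages.items(), key=lambda kv: -kv[1])[:k]

sample = [("circumvallate", 1978, 335, 91),
          ("circumvallate", 1979, 261, 95)]
# average for 'circumvallate' is (335 + 261) / 2 = 298
```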
Q2 [20 marks + 5 bonus marks]: Basic Operations of Hive
In this question, you are asked to repeat Q1 using Hive and then compare the performance between Hive and Pig.
(a) [Bonus 5 marks] Install Hive on top of your own Hadoop cluster. You can reuse your Hadoop cluster in IEMS 5730 HW#0 and refer to the following link to install Hive 2.3.8 over the master node of your Hadoop cluster.
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7].
(b) [20 marks] Write a Hive script to perform exactly the same task as Q1 on the same datasets stored in HDFS. Rerun the Pig script on this cluster, compare the performance of Pig and Hive in terms of overall run time, and explain your observations.
Hints:
● Hive will store its tables on HDFS, and those locations need to be bootstrapped:
$ hdfs dfs -mkdir /tmp
$ hdfs dfs -mkdir /user/hive/warehouse
$ hdfs dfs -chmod g+w /tmp
$ hdfs dfs -chmod g+w /user/hive/warehouse
● While working with the interactive shell (or otherwise), you should first test on a small subset of the data instead of the whole data set. Once your Hive commands/scripts work as desired, you can then run them on the complete data set.
 
 Q3 [30 marks + 10 Bonus marks]: Similar Users Detection in the MovieLens Dataset using Pig
Similar-user detection, which aims to group users with similar interests, behaviors, actions, or general patterns, has drawn a lot of attention in the machine-learning field. In this homework, you will implement a similar-users-detection algorithm for an online movie rating system. Basically, users who give similar ratings to the same movies may have common tastes or interests and can be grouped as similar users.
To detect similar users, we need to calculate the similarity between each user pair. In this homework, the similarity between a given pair of users (e.g. A and B) is measured as the total number of movies both A and B have watched divided by the total number of movies watched by either A or B. The following is the formal definition of similarity: Let M(A) be the set of all the movies user A has watched. Then the similarity between user A and user B is defined as:
Similarity(A, B) = |M(A) ∩ M(B)| / |M(A) ∪ M(B)| ..........(**)

where |S| means the cardinality of set S.
(Note: if |M(A) ∪ M(B)| = 0, we set the similarity to be 0.)
[Figure omitted: an illustration of the similarity computation between two users.]
Two datasets [3][4] with different sizes are provided by MovieLens. Each user is represented by its unique userID and each movie is represented by its unique movieID. The format of the data set is as follows:
<userID>, <movieID>
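The similarity in (**) is the Jaccard index of the two users' movie sets. As a reference for checking your Pig output, here is a direct Python sketch, including the zero-union convention from the note above:

```python
def similarity(movies_a, movies_b):
    """Jaccard similarity between two users' movie sets (Equation (**)).
    Returns 0 when the union is empty, per the note above."""
    union = movies_a | movies_b
    if not union:
        return 0.0
    return len(movies_a & movies_b) / len(union)
```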
Write a program in Pig to detect the Top-K similar users for each user. You can use the cluster you built for Q1 and Q2, or you can use the IE DIC or a cluster provided by the Google Cloud/AWS [5, 6, 7].
(a) [10 marks] For each pair of users in each of the datasets [3] and [4], output the number of movies they have both watched.
For your homework submission, you need to submit i) the Pig script and ii) the list of the 10 pairs of users having the largest number of movies watched by both users in the pair within the corresponding dataset. The format of your answer should be as follows:
<userID A>, <userID B>, <number of movies both A and B have watched> // top 1
...
<userID X>, <userID Y>, <number of movies both X and Y have watched> // top 10
(b) [20 marks] By modifying/ extending part of your codes in part (a), find the Top-K (K=3) most similar users (as defined by Equation (**)) for every user in the datasets [3], [4]. If multiple users have the same similarity, you can just pick any three of them.
(c)
Hint:
1. In part (b), to facilitate the computation of the similarity measure as defined in (**), you can use the inclusion-exclusion principle, i.e. |M(A) ∪ M(B)| = |M(A)| + |M(B)| − |M(A) ∩ M(B)|.
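Putting parts (a) and (b) together: group the movies per user, group the watchers per movie to count co-watched movies for every user pair, then apply the inclusion-exclusion identity so the union never has to be materialized. The following Python sketch is illustrative only (a Pig implementation would express the same steps with GROUP, JOIN/COGROUP, and ORDER/LIMIT):

```python
from collections import defaultdict
from itertools import combinations

def top_k_similar(pairs, k=3):
    """pairs: iterable of (userID, movieID) tuples.
    Returns {user: [(other_user, similarity), ...]} with the k most similar users."""
    movies = defaultdict(set)              # user -> set of movies watched
    for user, movie in pairs:
        movies[user].add(movie)
    # Part (a): co-watched counts, via the set of watchers of each movie
    watchers = defaultdict(set)            # movie -> set of users
    for user, ms in movies.items():
        for m in ms:
            watchers[m].add(user)
    co = defaultdict(int)                  # (userA, userB) -> |M(A) ∩ M(B)|
    for users in watchers.values():
        for a, b in combinations(sorted(users), 2):
            co[(a, b)] += 1
    # Part (b): Jaccard similarity via inclusion-exclusion:
    # |M(A) ∪ M(B)| = |M(A)| + |M(B)| - |M(A) ∩ M(B)|
    sims = defaultdict(list)
    for (a, b), c in co.items():
        s = c / (len(movies[a]) + len(movies[b]) - c)
        sims[a].append((b, s))
        sims[b].append((a, s))
    return {u: sorted(lst, key=lambda t: -t[1])[:k] for u, lst in sims.items()}

# Tiny hypothetical dataset: user 3 shares 2 of user 1's 2 movies
demo = top_k_similar([(1, "m1"), (1, "m2"),
                      (2, "m2"), (2, "m3"),
                      (3, "m1"), (3, "m2"), (3, "m3")])
```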