IEMS 5730 Spring 2024 Homework 2
Release date: Feb 23, 2024
Due date: Mar 11, 2024 (Monday) 11:59:00 pm
We will discuss the solution soon after the deadline. No late homework will be accepted!
Every Student MUST include the following statement, together with his/her signature in the submitted homework.
I declare that the assignment submitted on Elearning system is original except for source material explicitly acknowledged, and that the same or related material has not been previously submitted for another course. I also acknowledge that I am aware of University policy and regulations on honesty in academic work, and of the disciplinary guidelines and procedures applicable to breaches of such policy and regulations, as contained in the website http://www.cuhk.edu.hk/policy/academichonesty/.
Signed (Student_________________________) Date:______________________________ Name_________________________________ SID_______________________________
Submission notice:
● Submit your homework via the elearning system.
● All students are required to submit this assignment.
General homework policies:
A student may discuss the problems with others. However, the work a student turns in must be created COMPLETELY by oneself ALONE. A student may not share ANY written work or pictures, nor may one copy answers from any source other than one’s own brain.
Each student MUST LIST on the homework paper the name of every person he/she has discussed or worked with. If the answer includes content from any other source, the student MUST STATE THE SOURCE. Failure to do so is cheating and will result in sanctions. Copying answers from someone else is cheating even if one lists their name(s) on the homework.
If there is information you need to solve a problem, but the information is not stated in the problem, try to find the data somewhere. If you cannot find it, state what data you need, make a reasonable estimate of its value, and justify any assumptions you make. You will be graded not only on whether your answer is correct, but also on whether you have done an intelligent analysis.
Submit your output, explanation, and your commands/scripts in one SINGLE pdf file.

 Q1 [20 marks + 5 Bonus marks]: Basic Operations of Pig
You are required to perform some simple analysis using Pig on the n-grams dataset of Google books. An ‘n-gram’ is a phrase with n words. The dataset lists all n-grams present in books from books.google.com along with some statistics.
In this question, you only use the Google Books 1-gram data. Please go to References [1] and [2] to download the two datasets. Each line in these two files has the following format (TAB separated):
ngram year match_count volume_count
An example for 1-grams would be:
circumvallate 1978 335 91
circumvallate 1979 261 95
This means that in 1978 (1979), the word "circumvallate" occurred 335 (261) times overall, in 91 (95) distinct books.
(a) [Bonus 5 marks] Install Pig in your Hadoop cluster. You can reuse your Hadoop cluster from IEMS 5730 HW#0 and refer to the following link to install Pig 0.17.0 on the master node of your Hadoop cluster:
http://pig.apache.org/docs/r0.17.0/start.html#Pig+Setup
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7] to complete the following parts of the question:
(b) [5 marks] Upload these two files to HDFS and join them into one table.
(c) [5 marks] For each unique bigram, compute its average number of occurrences per year. In the above example, the result is:
circumvallate (335 + 261) / 2 = 298
Notes: The denominator is the number of years in which that word has appeared. Assume the data set contains all the 1-grams of the last 100 years, and the above records are the only records for the word 'circumvallate'. Then the average value is (335 + 261) / 2 = 298, instead of (335 + 261) / 100 = 5.96.
(d) [10 marks] Output the 20 bigrams with the highest average number of occurrences per year, along with their corresponding average values, sorted in descending order. If multiple bigrams have the same average value, write down any of them (that is, break ties as you wish).
You need to write a Pig script to perform this task and save the output into HDFS.
Hints:
● This problem is very similar to the word-counting example shown in the lecture notes on Pig. You can use the code there and just make some minor changes to perform this task.
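Before writing the Pig script, it can help to sanity-check the averaging logic of parts (c) and (d) in plain Python. The sketch below is only an illustration of the computation, not the required Pig solution; `top_avg_occurrences` and the sample records (beyond the two 'circumvallate' rows from the example) are made up here:

```python
from collections import defaultdict

def top_avg_occurrences(records, k=20):
    """Given (word, year, match_count) records, compute each word's
    average match_count over the years it appears in, and return the
    top-k (word, average) pairs sorted in descending order."""
    totals = defaultdict(int)   # word -> sum of match_count
    years = defaultdict(set)    # word -> set of years the word appears in
    for word, year, match_count in records:
        totals[word] += match_count
        years[word].add(year)
    # Divide by the number of years the word appeared in, not by 100.
    averages = {w: totals[w] / len(years[w]) for w in totals}
    return sorted(averages.items(), key=lambda kv: -kv[1])[:k]

records = [
    ("circumvallate", 1978, 335),
    ("circumvallate", 1979, 261),
    ("the", 1978, 1000),  # made-up extra row
]
print(top_avg_occurrences(records, k=2))
# → [('the', 1000.0), ('circumvallate', 298.0)]
```

In Pig, the same logic is a GROUP by word followed by a FOREACH computing SUM(match_count) / COUNT(distinct years), then ORDER ... DESC and LIMIT 20.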
Q2 [20 marks + 5 bonus marks]: Basic Operations of Hive
In this question, you are asked to repeat Q1 using Hive and then compare the performance between Hive and Pig.
(a) [Bonus 5 marks] Install Hive on top of your own Hadoop cluster. You can reuse your Hadoop cluster in IEMS 5730 HW#0 and refer to the following link to install Hive 2.3.8 over the master node of your Hadoop cluster.
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7].
(b) [20 marks] Write a Hive script to perform exactly the same task as that of Q1 with the same datasets stored in the HDFS. Rerun the Pig script in this cluster and compare the performance between Pig and Hive in terms of overall run-time and explain your observation.
Hints:
● Hive will store its tables on HDFS, and those locations need to be bootstrapped:
$ hdfs dfs -mkdir /tmp
$ hdfs dfs -mkdir /user/hive/warehouse
$ hdfs dfs -chmod g+w /tmp
$ hdfs dfs -chmod g+w /user/hive/warehouse
● While working with the interactive shell (or otherwise), you should first test on a small subset of the data instead of the whole data set. Once your Hive commands/scripts work as desired, you can then run them on the complete data set.
Q3 [30 marks + 10 Bonus marks]: Similar Users Detection in the MovieLens Dataset using Pig
Similar-user detection has drawn much attention in the machine learning field; it aims at grouping users with similar interests, behaviors, actions, or general patterns. In this homework, you will implement a similar-users-detection algorithm for an online movie rating system. Basically, users who give similar scores to the same movies may have common tastes or interests and can be grouped as similar users.
To detect similar users, we need to calculate the similarity between each user pair. In this homework, the similarity between a given pair of users (e.g. A and B) is measured as the total number of movies both A and B have watched divided by the total number of movies watched by either A or B. The following is the formal definition of similarity: Let M(A) be the set of all the movies user A has watched. Then the similarity between user A and user B is defined as:
Similarity(A, B) = |M(A) ∩ M(B)| / |M(A) ∪ M(B)| ..........(**)
where |S| means the cardinality of set S.
(Note: if |𝑀(𝐴)∪𝑀(𝐵)| = 0, we set the similarity to be 0.)
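The measure in (**) is the Jaccard similarity of the two watch sets. A minimal Python sketch of the definition, including the zero-denominator convention above (the two sets below are made-up examples, not MovieLens data):

```python
def similarity(movies_a, movies_b):
    """Jaccard similarity between two users' watched-movie sets, per (**).
    Returns 0 when both sets are empty, as the note specifies."""
    union = movies_a | movies_b
    if not union:
        return 0
    return len(movies_a & movies_b) / len(union)

m_a = {1, 2, 3, 4}   # movies watched by user A
m_b = {3, 4, 5}      # movies watched by user B
print(similarity(m_a, m_b))  # 2 shared / 5 in the union → 0.4
```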
The following figure illustrates the idea:
Two datasets [3][4] with different sizes are provided by MovieLens. Each user is represented by its unique userID and each movie is represented by its unique movieID. The format of the data set is as follows:
<userID>, <movieID>
Write a program in Pig to detect the TOP K similar users for each user. You can use the cluster you built for Q1 and Q2, or you can use the IE DIC or one provided by the Google Cloud/AWS [5, 6, 7].
(a) [10 marks] For each pair of users in the dataset [3] and [4], output the number of movies they have both watched.
For your homework submission, you need to submit i) the Pig script and ii) the list of the 10 pairs of users having the largest number of movies watched by both users in the pair within the corresponding dataset. The format of your answer should be as follows:
<userID A>, <userID B>, <the number of movies both A and B have watched> // top 1
...
<userID X>, <userID Y>, <the number of movies both X and Y have watched> // top 10
(b) [20 marks] By modifying/extending part of your code in part (a), find the Top-K (K=3) most similar users (as defined by Equation (**)) for every user in the datasets [3], [4]. If multiple users have the same similarity, you can just pick any three of them.
(c)
Hint:
1. In part (b), to facilitate the computation of the similarity measure defined in (**), you can use the inclusion-exclusion principle, i.e., |M(A) ∪ M(B)| = |M(A)| + |M(B)| − |M(A) ∩ M(B)|.
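The point of the hint is that, with inclusion-exclusion, the co-watch counts from part (a) plus each user's own movie count are enough to evaluate (**) without ever materializing set unions. A small Python sketch of this count-only computation (`similarity_from_counts` is a hypothetical helper name, not part of the assignment):

```python
def similarity_from_counts(count_a, count_b, co_count):
    """Similarity per (**) computed from counts alone:
    |M(A) ∪ M(B)| = |M(A)| + |M(B)| - |M(A) ∩ M(B)|,
    so only |M(A)|, |M(B)| and the part-(a) co-watch count are needed."""
    union = count_a + count_b - co_count
    return co_count / union if union else 0

# User A watched 4 movies, user B watched 3, and they share 2:
print(similarity_from_counts(4, 3, 2))  # 2 / (4 + 3 - 2) → 0.4
```

In Pig, this corresponds to joining the pairwise co-watch counts with each user's per-user movie count and computing the ratio in a FOREACH.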
