I no longer update posts about simple environment setups; I will only write one when the setup is somewhat more complex.

Reference documents (list keeps growing)

Title URL
Hadoop YARN container resource allocation https://blog.csdn.net/szh1124/article/details/76178699?tdsourcetag=s_pcqq_aiomsg
How to control the number of map tasks https://www.cnblogs.com/junneyang/p/5850440.html
Full walkthrough of the internal data processing flow of a MapReduce program http://www.aboutyun.com/thread-15494-1-2.html
https://blog.csdn.net/qq_17776287/article/details/78176515
https://blog.csdn.net/aa518189/article/details/80020857
How MapTask parallelism is determined https://blog.csdn.net/tototuzuoquan/article/details/72851603
https://blog.csdn.net/wu_cai/article/details/77869610
Introduction to the shuffle process https://blog.csdn.net/txbsw/article/details/80760285
Detailed look at the job history logs after running WordCount https://blog.csdn.net/u011414200/article/details/50372331
Hadoop HDFS high availability (HA) https://blog.csdn.net/bingduanlbd/article/details/51946540
Understanding the Hadoop YARN architecture https://blog.csdn.net/bingduanlbd/article/details/51880019
MapReduce shuffle process analysis and tuning https://blog.csdn.net/bingduanlbd/article/details/51933914
The MapReduce shuffle process in detail https://blog.csdn.net/u014374284/article/details/49205885
Hadoop MapReduce principles and examples https://blog.csdn.net/bingduanlbd/article/details/51924398
LCS
HMM (hidden Markov model)
Collaborative filtering
Bayes https://baike.baidu.com/tashuo/browse/content?id=bbf9b41cf33c09f9d4856750&lemmaId=&fromLemmaModule=pcBottom
Naive Bayes https://blog.csdn.net/fisherming/article/details/79509025
https://blog.csdn.net/li8zi8fa/article/details/76176597
Logistic regression https://blog.csdn.net/saltriver/article/details/63681339
Logistic regression: gradient descent https://www.jianshu.com/p/c7e642877b0e

Virtual machine configuration

Virtual machine: VMware 12


Operating system: CentOS 7.5


Common system components: lrzsz, vim, yum, wget


Master and slave addresses:
Master -> 192.168.202.10
Slave1 -> 192.168.202.11
Slave2 -> 192.168.202.12
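The address list above can be turned into hosts-file lines so the three machines can reach each other by name. The following is only a minimal sketch: the lowercase hostnames `master`, `slave1`, and `slave2` and the helper `hosts_entries` are assumptions for illustration, since the post only names the roles and their IP addresses.

```python
# Role-to-address mapping taken from the list above.
# The lowercase hostnames are an assumption; the post only names the roles.
NODES = {
    "master": "192.168.202.10",
    "slave1": "192.168.202.11",
    "slave2": "192.168.202.12",
}


def hosts_entries(nodes):
    """Return one hosts-file style line per node: '<ip> <hostname>'."""
    return [f"{ip} {name}" for name, ip in nodes.items()]


if __name__ == "__main__":
    # Print the lines so they can be reviewed before being added to /etc/hosts.
    print("\n".join(hosts_entries(NODES)))
```

The script only prints the lines; appending them to `/etc/hosts` on each machine is left as a manual step so nothing is overwritten by accident.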


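Software versions (kept up to date)
Anaconda3, used to switch between Python 2 and Python 3 and to manage Python dependencies
MariaDB
Hadoop 2.7.7
HBase 1.3.1
ZooKeeper 3.4.11
Scala 2.11.12
Spark 2.0.2
Hive 1.2.2
Flume 1.9.0
Kafka 2.11-2.1.0 (Kafka 2.1.0 built for Scala 2.11)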
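The Kafka entry follows the naming of Kafka release archives, where the part before the hyphen is the Scala version the build targets and the part after it is the Kafka version. As a rough sketch, the check below confirms that this Scala build matches the Scala 2.11.12 listed above; the helper names `split_kafka_version` and `scala_binary_version` are hypothetical and exist only for this example.

```python
def split_kafka_version(label):
    """Split '<scala binary version>-<kafka version>' into its two parts."""
    scala_binary, kafka_version = label.split("-", 1)
    return scala_binary, kafka_version


def scala_binary_version(full_version):
    """Reduce a full Scala version such as '2.11.12' to its binary version '2.11'."""
    return ".".join(full_version.split(".")[:2])


# Values copied from the version list above.
scala_binary, kafka_version = split_kafka_version("2.11-2.1.0")
matches = scala_binary == scala_binary_version("2.11.12")

if __name__ == "__main__":
    print(f"Kafka {kafka_version} built for Scala {scala_binary}; matches installed Scala: {matches}")
```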