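This book is a machine learning textbook for computer-related majors at colleges and universities. Organized around the development workflow of a machine learning application, it explains in detail the concepts and principles of data preprocessing and of a range of algorithm models. With Python and Spark as the implementation tools, it helps readers build practical skills in writing, debugging, and analyzing project code. The final two chapters present two hands-on projects that illustrate how machine learning is applied in engineering practice. The book is clearly structured and supported by case studies, and it comes with teaching resources, including source code, lesson plans, slides, and exercise answers, which readers can download from the Huaxin Education Resource Network (华信教育资源网).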
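Sun Liwei (孙立炜) is director of the Big Data Technology teaching and research section at Xiamen Nanyang Vocational College (厦门南洋职业学院). He holds a master's degree in Signal and Information Processing from the PLA Electronic Engineering Institute and is a senior big data analyst. His main research interests are data mining and Hadoop big data technology. He has published 20 papers in CN-registered journals, served as chief editor of 1 textbook, led the application for and obtained 4 software copyrights, led 3 research projects at the municipal level or above, and led 1 quality-course project.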
Chapter 1 Introduction to Machine Learning Technology
1.1 Introduction to Machine Learning
1.1.1 The Concept of Machine Learning
1.1.2 Algorithm Models in Machine Learning
1.1.3 Steps for Developing a Machine Learning Application
1.2 Tools for Implementing Machine Learning
1.3 Setting Up the Python Platform
1.3.1 The Anaconda Integrated Development Environment
1.3.2 The PyCharm Integrated Development Environment
1.3.3 Creating a Virtual Environment
1.3.4 Configuring a Virtual Environment
1.4 Setting Up the Spark Platform
1.4.1 Spark Deployment Modes
1.4.2 Installing the JDK
1.4.3 Installing Scala
1.4.4 Installing the IDEA Development Tool
1.4.5 Installing Spark
1.4.6 Installing Maven
1.5 Creating a Project with Python
1.6 Creating a Project with Spark
Exercises 1
Chapter 2 Data Preprocessing
2.1 Concepts of Data Preprocessing
2.1.1 Data Cleaning
2.1.2 Data Transformation
2.2 Data Preprocessing with Python
2.3 Data Preprocessing with Spark
Exercises 2
Chapter 3 Classification Models
3.1 Concepts of Classification Models
3.2 Algorithmic Principles of Classification Models
3.2.1 The Decision Tree Algorithm
3.2.2 The Nearest Neighbor Algorithm
3.2.3 The Naive Bayes Algorithm
3.2.4 The Logistic Regression Algorithm
3.2.5 The Support Vector Machine Algorithm
3.3 A Classification Modeling Example with Python
3.4 A Classification Modeling Example with Spark
Exercises 3
Chapter 4 Clustering Models
4.1 Concepts of Clustering Models
4.1.1 Overview of Clustering Models
4.1.2 Similarity Measures in Clustering Models
4.1.3 Evaluating Clustering Algorithms