Mastering Machine Learning on AWS

Dr. Saket S.R. Mengle, Maximo Gurmendez

Last updated: 2021-06-24 14:23:51

Cover Page

Title Page

Mastering Machine Learning on AWS

Dedication

About Packt

Why subscribe?

Packt.com

Contributors

About the authors

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Section 1: Machine Learning on AWS

Getting Started with Machine Learning for AWS

How AWS empowers data scientists

Using AWS tools for ML

Identifying candidate problems that can be solved using ML

The ML project life cycle

Data gathering

Evaluation metrics

Algorithm selection

Deploying models

Summary

Exercises

Section 2: Implementing Machine Learning Algorithms at Scale on AWS

Classifying Twitter Feeds with Naive Bayes

Classification algorithms

Feature types

Nominal features

Ordinal features

Continuous features

Naive Bayes classifier

Bayes' theorem

Posterior

Likelihood

Prior probability

Evidence

How the Naive Bayes algorithm works

Classifying text with language models

Collecting the tweets

Preparing the data

Building a Naive Bayes model through SageMaker notebooks

Naive Bayes model on SageMaker notebooks using Apache Spark

Using SageMaker's BlazingText built-in ML service

Naive Bayes – pros and cons

Summary

Exercises

Predicting House Value with Regression Algorithms

Predicting the price of houses

Understanding linear regression

Linear least squares estimation

Maximum likelihood estimation

Gradient descent

Evaluating regression models

Mean absolute error

Mean squared error

Root mean squared error

R-squared

Implementing linear regression through scikit-learn

Implementing linear regression through Apache Spark

Implementing linear regression through SageMaker's Linear Learner

Understanding logistic regression

Logistic regression in Spark

Pros and cons of linear models

Summary

Predicting User Behavior with Tree-Based Methods

Understanding decision trees

Recursive splitting

Types of decision trees

Cost functions

Gini impurity

Information gain

Criteria to stop splitting trees

Understanding random forest algorithms

Understanding gradient-boosting algorithms

Predicting clicks on log streams

Introduction to Elastic MapReduce (EMR)

Training with Apache Spark on EMR

Getting the data

Preparing the data

Categorical encoding

One-hot encoding

Training a model

Evaluating our model

Area under the ROC curve

Area under the precision-recall curve

Training tree ensembles on EMR

Training gradient-boosted trees with the SageMaker services

Preparing the data

Training with SageMaker XGBoost

Applying and evaluating the model