2014 MachineLearningAnAlgorithmicPer
- (Marsland, 2014) ⇒ Stephen Marsland. (2014). “Machine Learning: An Algorithmic Perspective, Second Edition.” Chapman & Hall/CRC. ISBN:1466583282, 9781466583283
Subject Headings: ML Textbook; Accuracy Metrics; Activation Function; AdaBoost; Approximate Inference; Associative Memory; Auto-Associative Network; Back-Propagation of Error; Basis Expansion; Batch Training; Baum–Welch; Forward–Backward Algorithm; Bayesian Networks; Bias Input; Boundary Conditions; Competitive Learning; Computing the Posterior; Confusion Matrix; Conjugate Gradients; Constrained Optimisation Problem; Continuous Hopfield Network; Covariance Functions; Cubic Spline; Curse of Dimensionality; Data Compression; Data Preparation; Deep Belief Networks (DBN); Directed Belief Network; Distance Measures; Energy Function; Entropy in Information Theory; Error Function; Evaluating Fitness; Exclusive Or (XOR) Function; Exhaustive Search; Expectation-Maximisation (EM) Algorithm; Fitting the Spline to the Data; Forward Algorithm; Gaussian Random Numbers; Genetic Algorithms; Gibbs Sampling; Gini Impurity; Hidden Layers; ID3; Information Criteria; Initialising the Weights; KD-Tree; Kalman Filter; Laplace Approximation; Learning Rate; Learning the Parameters; Levenberg–Marquardt Algorithm; Logic Functions; Markov Chains; Markov Decision Processes; Markov Property; McCulloch and Pitts Neuronal Model; McCulloch and Pitts Neurons; Measurement Precision; Metropolis–Hastings Algorithm; Stochastic Gradient Descent; Minimising Risk; Multi-Class Classification; Multi-Dimensional Scaling (MDS); Multi-layer Perceptron; Nearest Neighbour Smoothing; Network Dimensionality; Non-Linearly Separable Problem; Overfitting; PCA; Particle Filter; Perceptron Convergence Theorem; Perceptron Learning Algorithm; Pima Indian Dataset; Random Numbers; Receiver Operator Characteristic (ROC) Curve; Regression Problem; Regression in Trees; Restricted Boltzmann Machine; Reward Function; SOM Algorithm; SVM Regression; Sequential Training; Simulated Annealing; Slack Variables; Smoothing Splines; String Representation; Stumping; Supervised Learning; Testing Dataset; The Four Peaks Problem; The Knapsack Problem; Time-Series Prediction; Training Data; Training Dataset; Training Neural Networks; Unbalanced Datasets; Validation Dataset; Viterbi Algorithm; Weight Update Rule; When to Stop Learning; k-Means Neural Network.
Notes
Cited By
- http://scholar.google.com/scholar?q=%222014%22+Machine+Learning%3A+An+Algorithmic+Perspective%2C+Second+Edition
- http://dl.acm.org/citation.cfm?id=2692349&preflayout=flat#citedby
Quotes
Abstract
A proven, hands-on approach for students without a strong statistical foundation. Since the best-selling first edition was published, there have been several prominent developments in the field of machine learning, including the increasing work on the statistical interpretations of machine learning algorithms. Unfortunately, computer science students without a strong statistical background often find it hard to get started in this area. Remedying this deficiency, Machine Learning: An Algorithmic Perspective, Second Edition helps students understand the algorithms of machine learning. It puts them on a path toward mastering the relevant mathematics and statistics as well as the necessary programming and experimentation.
New to the Second Edition:
- Two new chapters on deep belief networks and Gaussian processes.
- Reorganization of the chapters to make a more natural flow of content.
- Revision of the support vector machine material, including a simple implementation for experiments.
- New material on random forests, the perceptron convergence theorem, accuracy methods, and conjugate gradient optimization for the multi-layer perceptron.
- Additional discussions of the Kalman and particle filters.
- Improved code, including better use of naming conventions in Python.
Suitable for both an introductory one-semester course and more advanced courses, the text strongly encourages students to practice with the code. Each chapter includes detailed examples along with further reading and problems. All of the code used to create the examples is available on the author's website.
- Features
- Reflects recent developments in machine learning, including the rise of deep belief networks.
- Presents the necessary preliminaries, including basic probability and statistics.
- Discusses supervised learning using neural networks.
- Covers dimensionality reduction, the EM algorithm, nearest neighbor methods, optimal decision boundaries, kernel methods, and optimization.
- Describes evolutionary learning, reinforcement learning, tree-based learners, and methods to combine the predictions of many learners.
- Examines the importance of unsupervised learning, with a focus on the self-organizing feature map.
- Explores modern, statistically based approaches to machine learning.
- Provides working Python code for all the algorithms on the author’s website, enabling students to experiment with the code.
Table of Contents
CHAPTER 1 Introduction 1 1.1 IF DATA HAD MASS, THE EARTH WOULD BE A BLACK HOLE 1 1.2 LEARNING 4 1.2.1 Machine Learning 4 1.3 TYPES OF MACHINE LEARNING 5 1.4 SUPERVISED LEARNING 6 1.4.1 Regression 6 1.4.2 Classification 8 1.5 THE MACHINE LEARNING PROCESS 10 1.6 A NOTE ON PROGRAMMING 11 1.7 A ROADMAP TO THE BOOK 12 FURTHER READING 13
CHAPTER 2 Preliminaries 15 2.1 SOME TERMINOLOGY 15 2.1.1 Weight Space 16 2.1.2 The Curse of Dimensionality 17 2.2 KNOWING WHAT YOU KNOW: TESTING MACHINE LEARNING ALGORITHMS 19 2.2.1 Overfitting 19 2.2.2 Training, Testing, and Validation Sets 20 2.2.3 The Confusion Matrix 21 2.2.4 Accuracy Metrics 22 2.2.5 The Receiver Operator Characteristic (ROC) Curve 24 2.2.6 Unbalanced Datasets 25 2.2.7 Measurement Precision 25 2.3 TURNING DATA INTO PROBABILITIES 27 2.3.1 Minimising Risk 30 2.4 SOME BASIC STATISTICS 32 2.4.1 Averages 32 2.4.2 Variance and Covariance 32 2.4.3 The Gaussian 34 2.5 THE BIAS-VARIANCE TRADEOFF 35 FURTHER READING 36 PRACTICE QUESTIONS 37
CHAPTER 3 Neurons, Neural Networks, and Linear Discriminants 39 3.1 THE BRAIN AND THE NEURON 39 3.1.1 Hebb’s Rule 40 3.1.2 McCulloch and Pitts Neurons 40 3.1.3 Limitations of the McCulloch and Pitts Neuronal Model 42 3.2 NEURAL NETWORKS 43 3.3 THE PERCEPTRON 43 3.3.1 The Learning Rate 46 3.3.2 The Bias Input 46 3.3.3 The Perceptron Learning Algorithm 47 3.3.4 An Example of Perceptron Learning: Logic Functions 48 3.3.5 Implementation 49 3.4 LINEAR SEPARABILITY 55 3.4.1 The Perceptron Convergence Theorem 57 3.4.2 The Exclusive Or (XOR) Function 58 3.4.3 A Useful Insight 59 3.4.4 Another Example: The Pima Indian Dataset 61 3.4.5 Preprocessing: Data Preparation 63 3.5 LINEAR REGRESSION 64 3.5.1 Linear Regression Examples 66 FURTHER READING 67 PRACTICE QUESTIONS 68
CHAPTER 4 The Multi-layer Perceptron 71 4.1 GOING FORWARDS 73 4.1.1 Biases 73 4.2 GOING BACKWARDS: BACK-PROPAGATION OF ERROR 74 4.2.1 The Multi-layer Perceptron Algorithm 77 4.2.2 Initialising the Weights 80 4.2.3 Different Output Activation Functions 81 4.2.4 Sequential and Batch Training 82 4.2.5 Local Minima 82 4.2.6 Picking Up Momentum 84 4.2.7 Minibatches and Stochastic Gradient Descent 85 4.2.8 Other Improvements 85 4.3 THE MULTI-LAYER PERCEPTRON IN PRACTICE 85 4.3.1 Amount of Training Data 86 4.3.2 Number of Hidden Layers 86 4.3.3 When to Stop Learning 88 4.4 EXAMPLES OF USING THE MLP 89 4.4.1 A Regression Problem 89 4.4.2 Classification with the MLP 92 4.4.3 A Classification Example: The Iris Dataset 93 4.4.4 Time-Series Prediction 95 4.4.5 Data Compression: The Auto-Associative Network 97 4.5 A RECIPE FOR USING THE MLP 100 4.6 DERIVING BACK-PROPAGATION 101 4.6.1 The Network Output and the Error 101 4.6.2 The Error of the Network 102 4.6.3 Requirements of an Activation Function 103 4.6.4 Back-Propagation of Error 104 4.6.5 The Output Activation Functions 107 4.6.6 An Alternative Error Function 108 FURTHER READING 108 PRACTICE QUESTIONS 109
CHAPTER 5 Radial Basis Functions and Splines 111 5.1 RECEPTIVE FIELDS 111 5.2 THE RADIAL BASIS FUNCTION (RBF) NETWORK 114 5.2.1 Training the RBF Network 117 5.3 INTERPOLATION AND BASIS FUNCTIONS 119 5.3.1 Bases and Basis Expansion 122 5.3.2 The Cubic Spline 123 5.3.3 Fitting the Spline to the Data 123 5.3.4 Smoothing Splines 124 5.3.5 Higher Dimensions 125 5.3.6 Beyond the Bounds 127 FURTHER READING 127 PRACTICE QUESTIONS 128
CHAPTER 6 Dimensionality Reduction 129 6.1 LINEAR DISCRIMINANT ANALYSIS (LDA) 130 6.2 PRINCIPAL COMPONENTS ANALYSIS (PCA) 133 6.2.1 Relation with the Multi-layer Perceptron 137 6.2.2 Kernel PCA 138 6.3 FACTOR ANALYSIS 141 6.4 INDEPENDENT COMPONENTS ANALYSIS (ICA) 142 6.5 LOCALLY LINEAR EMBEDDING 144 6.6 ISOMAP 147 6.6.1 Multi-Dimensional Scaling (MDS) 147 FURTHER READING 150 PRACTICE QUESTIONS 151
CHAPTER 7 Probabilistic Learning 153 7.1 GAUSSIAN MIXTURE MODELS 153 7.1.1 The Expectation-Maximisation (EM) Algorithm 154 7.1.2 Information Criteria 158 7.2 NEAREST NEIGHBOUR METHODS 158 7.2.1 Nearest Neighbour Smoothing 160 7.2.2 Efficient Distance Computations: the KD-Tree 160 7.2.3 Distance Measures 165 FURTHER READING 167 PRACTICE QUESTIONS 168
CHAPTER 8 Support Vector Machines 169 8.1 OPTIMAL SEPARATION 170 8.1.1 The Margin and Support Vectors 170 8.1.2 A Constrained Optimisation Problem 172 8.1.3 Slack Variables for Non-Linearly Separable Problems 175 8.2 KERNELS 176 8.2.1 Choosing Kernels 178 8.2.2 Example: XOR 179 8.3 THE SUPPORT VECTOR MACHINE ALGORITHM 179 8.3.1 Implementation 180 8.3.2 Examples 183 8.4 EXTENSIONS TO THE SVM 184 8.4.1 Multi-Class Classification 184 8.4.2 SVM Regression 186 8.4.3 Other Advances 187 FURTHER READING 187 PRACTICE QUESTIONS 188
CHAPTER 9 Optimisation and Search 189 9.1 GOING DOWNHILL 190 9.1.1 Taylor Expansion 193 9.2 LEAST-SQUARES OPTIMISATION 194 9.2.1 The Levenberg–Marquardt Algorithm 194 9.3 CONJUGATE GRADIENTS 198 9.3.1 Conjugate Gradients Example 201 9.3.2 Conjugate Gradients and the MLP 201 9.4 SEARCH: THREE BASIC APPROACHES 204 9.4.1 Exhaustive Search 204 9.4.2 Greedy Search 205 9.4.3 Hill Climbing 205 9.5 EXPLOITATION AND EXPLORATION 206 9.6 SIMULATED ANNEALING 207 9.6.1 Comparison 208 FURTHER READING 209 PRACTICE QUESTIONS 209
CHAPTER 10 Evolutionary Learning 211 10.1 THE GENETIC ALGORITHM (GA) 212 10.1.1 String Representation 213 10.1.2 Evaluating Fitness 213 10.1.3 Population 214 10.1.4 Generating Offspring: Parent Selection 214 10.2 GENERATING OFFSPRING: GENETIC OPERATORS 216 10.2.1 Crossover 216 10.2.2 Mutation 217 10.2.3 Elitism, Tournaments, and Niching 218 10.3 USING GENETIC ALGORITHMS 220 10.3.1 Map Colouring 220 10.3.2 Punctuated Equilibrium 221 10.3.3 Example: The Knapsack Problem 222 10.3.4 Example: The Four Peaks Problem 222 10.3.5 Limitations of the GA 224 10.3.6 Training Neural Networks with Genetic Algorithms 225 10.4 GENETIC PROGRAMMING 225 10.5 COMBINING SAMPLING WITH EVOLUTIONARY LEARNING 227 FURTHER READING 228 PRACTICE QUESTIONS 229
CHAPTER 11 Reinforcement Learning 231 11.1 OVERVIEW 232 11.2 EXAMPLE: GETTING LOST 233 11.2.1 State and Action Spaces 235 11.2.2 Carrots and Sticks: The Reward Function 236 11.2.3 Discounting 237 11.2.4 Action Selection 237 11.2.5 Policy 238 11.3 MARKOV DECISION PROCESSES 238 11.3.1 The Markov Property 238 11.3.2 Probabilities in Markov Decision Processes 239 11.4 VALUES 240 11.5 BACK ON HOLIDAY: USING REINFORCEMENT LEARNING 244 11.6 THE DIFFERENCE BETWEEN SARSA AND Q-LEARNING 245 11.7 USES OF REINFORCEMENT LEARNING 246 FURTHER READING 247 PRACTICE QUESTIONS 247
CHAPTER 12 Learning with Trees 249 12.1 USING DECISION TREES 249 12.2 CONSTRUCTING DECISION TREES 250 12.2.1 Quick Aside: Entropy in Information Theory 251 12.2.2 ID3 251 12.2.3 Implementing Trees and Graphs in Python 255 12.2.4 Implementation of the Decision Tree 255 12.2.5 Dealing with Continuous Variables 257 12.2.6 Computational Complexity 258 12.3 CLASSIFICATION AND REGRESSION TREES (CART) 260 12.3.1 Gini Impurity 260 12.3.2 Regression in Trees 261 12.4 CLASSIFICATION EXAMPLE 261 FURTHER READING 263 PRACTICE QUESTIONS 264
CHAPTER 13 Decision by Committee: Ensemble Learning 267 13.1 BOOSTING 268 13.1.1 AdaBoost 269 13.1.2 Stumping 273 13.2 BAGGING 273 13.2.1 Subagging 274 13.3 RANDOM FORESTS 275 13.3.1 Comparison with Boosting 277 13.4 DIFFERENT WAYS TO COMBINE CLASSIFIERS 277 FURTHER READING 279 PRACTICE QUESTIONS 280
CHAPTER 14 Unsupervised Learning 281 14.1 THE K-MEANS ALGORITHM 282 14.1.1 Dealing with Noise 285 14.1.2 The k-Means Neural Network 285 14.1.3 Normalisation 287 14.1.4 A Better Weight Update Rule 288 14.1.5 Example: The Iris Dataset Again 289 14.1.6 Using Competitive Learning for Clustering 290 14.2 VECTOR QUANTISATION 291 14.3 THE SELF-ORGANISING FEATURE MAP 291 14.3.1 The SOM Algorithm 294 14.3.2 Neighbourhood Connections 295 14.3.3 Self-Organisation 297 14.3.4 Network Dimensionality and Boundary Conditions 298 14.3.5 Examples of Using the SOM 300 FURTHER READING 300 PRACTICE QUESTIONS 303
CHAPTER 15 Markov Chain Monte Carlo (MCMC) Methods 305 15.1 SAMPLING 305 15.1.1 Random Numbers 305 15.1.2 Gaussian Random Numbers 306 15.2 MONTE CARLO OR BUST 308 15.3 THE PROPOSAL DISTRIBUTION 310 15.4 MARKOV CHAIN MONTE CARLO 313 15.4.1 Markov Chains 313 15.4.2 The Metropolis–Hastings Algorithm 315 15.4.3 Simulated Annealing (Again) 316 15.4.4 Gibbs Sampling 318 FURTHER READING 319 PRACTICE QUESTIONS 320
CHAPTER 16 Graphical Models 321 16.1 BAYESIAN NETWORKS 322 16.1.1 Example: Exam Fear 323 16.1.2 Approximate Inference 327 16.1.3 Making Bayesian Networks 329 16.2 MARKOV RANDOM FIELDS 330 16.3 HIDDEN MARKOV MODELS (HMMS) 333 16.3.1 The Forward Algorithm 335 16.3.2 The Viterbi Algorithm 337 16.3.3 The Baum–Welch or Forward–Backward Algorithm 339 16.4 TRACKING METHODS 343 16.4.1 The Kalman Filter 343 16.4.2 The Particle Filter 350 FURTHER READING 355 PRACTICE QUESTIONS 356
CHAPTER 17 Symmetric Weights and Deep Belief Networks 359 17.1 ENERGETIC LEARNING: THE HOPFIELD NETWORK 360 17.1.1 Associative Memory 360 17.1.2 Making an Associative Memory 361 17.1.3 An Energy Function 365 17.1.4 Capacity of the Hopfield Network 367 17.1.5 The Continuous Hopfield Network 368 17.2 STOCHASTIC NEURONS — THE BOLTZMANN MACHINE 369 17.2.1 The Restricted Boltzmann Machine 371 17.2.2 Deriving the CD Algorithm 375 17.2.3 Supervised Learning 380 17.2.4 The RBM as a Directed Belief Network 381 17.3 DEEP LEARNING 385 17.3.1 Deep Belief Networks (DBN) 388 FURTHER READING 393 PRACTICE QUESTIONS 393
CHAPTER 18 Gaussian Processes 395 18.1 GAUSSIAN PROCESS REGRESSION 397 18.1.1 Adding Noise 398 18.1.2 Implementation 402 18.1.3 Learning the Parameters 403 18.1.4 Implementation 404 18.1.5 Choosing a (set of) Covariance Functions 406 18.2 GAUSSIAN PROCESS CLASSIFICATION 407 18.2.1 The Laplace Approximation 408 18.2.2 Computing the Posterior 408 18.2.3 Implementation 410 FURTHER READING 412 PRACTICE QUESTIONS 413
APPENDIX A Python 415 A.1 INSTALLING PYTHON AND OTHER PACKAGES 415 A.2 GETTING STARTED 415 A.2.1 Python for MATLAB® and R users 418 A.3 CODE BASICS 419 A.3.1 Writing and Importing Code 419 A.3.2 Control Flow 420 A.3.3 Functions 420 A.3.4 The doc String 421 A.3.5 map and lambda 421 A.3.6 Exceptions 422 A.3.7 Classes 422 A.4 USING NUMPY AND MATPLOTLIB 423 A.4.1 Arrays 423 A.4.2 Random Numbers 427 A.4.3 Linear Algebra 427 A.4.4 Plotting 427 A.4.5 One Thing to Be Aware of 429 FURTHER READING 430 PRACTICE QUESTIONS 430
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2014 MachineLearningAnAlgorithmicPer | Stephen Marsland | | | Machine Learning: An Algorithmic Perspective, Second Edition | | | | | | 2014 |