Both the batch size and the number of epochs are integer values and seem to do the same thing in stochastic gradient descent. What is the difference between these two hyper-parameters of this learning algorithm?

1 Answer

Best answer

In summary:

  • Stochastic gradient descent (SGD) is, in general, an iterative learning algorithm that uses randomly selected (or shuffled) samples from the training dataset to update a model. However, the term is also used to refer specifically to updating the model parameters using just one sample at a time
     
  • Batch Size is a hyper-parameter of gradient descent that controls the number of training samples to work through before the model’s internal parameters are updated
     
  • The number of epochs is a hyper-parameter of gradient descent that controls the number of complete passes through the training dataset

 

Let's review some basic definitions: 

What Is a Sample?

A sample is a single row of data. A sample may also be called an instance, an observation, an input vector, or a feature vector.

What is a Model Parameter?

A model parameter is a configuration variable that is internal to the model and whose value can be estimated from data.

  • They are required by the model when making predictions.
  • They are estimated or learned from data.
  • They are often not set manually by the practitioner.
  • They are often saved as part of the learned model.

Some examples of model parameters include:

  • The weights in an artificial neural network.
  • The coefficients in a linear regression or logistic regression task.

What is a Model Hyperparameter?

A model hyperparameter is a configuration that is external to the model and whose value cannot be estimated from data.

  • They are often used in processes to help estimate model parameters.
  • They are often specified by the practitioner.
  • They can often be set using heuristics.
  • They are often tuned for a given predictive modeling problem.

A good rule of thumb to overcome this confusion is as follows:

If you have to specify a model parameter manually, then it is probably a model hyperparameter.

Some examples of model hyperparameters include:

  • The learning rate for training a neural network.
  • The k in k-nearest neighbors.
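The distinction shows up directly in code: hyperparameters are values the practitioner sets before training, while parameters are values the training procedure estimates from data. A minimal sketch (the toy dataset and one-parameter model are hypothetical, chosen only for illustration):

```python
# Hyperparameter: specified by the practitioner before training.
learning_rate = 0.1

# Toy dataset of (x, y) pairs, roughly following y = 2x (hypothetical values).
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.0)]

# Parameter: the coefficient w of the model y = w * x, estimated from the data.
w = 0.0
for _ in range(100):  # plain gradient descent on the mean squared error
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad  # learning_rate steers how w is estimated
print(round(w, 1))  # → 2.0
```

Note that `learning_rate` is never learned; it only controls how `w` is learned, which is exactly the parameter/hyperparameter split described above.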

What Is a Batch?

The batch size is a hyperparameter that defines the number of samples to work through before updating the internal model parameters.

When all training samples are used to create one batch, the learning algorithm is called batch gradient descent. When the batch is the size of one sample, the learning algorithm is called stochastic gradient descent. When the batch size is more than one sample and less than the size of the training dataset, the learning algorithm is called mini-batch gradient descent.

  • Batch Gradient Descent. Batch Size = Size of Training Set
  • Stochastic Gradient Descent. Batch Size = 1
  • Mini-Batch Gradient Descent. 1 < Batch Size < Size of Training Set

In the case of mini-batch gradient descent, popular batch sizes include 32, 64, and 128 samples. You may see these values used in models in the literature and in tutorials. If the dataset does not divide evenly by the batch size, the final batch simply has fewer samples than the other batches. Alternatively, you can remove some samples from the dataset, or change the batch size, so that the number of samples in the dataset divides evenly by the batch size.
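The three variants above can be seen as the same training loop run with different batch sizes. A minimal sketch of mini-batch SGD on a one-parameter model (the dataset, model, and learning rate are hypothetical placeholders, not from the question):

```python
import random

# Toy dataset of 12 (x, y) samples following y = 2x (hypothetical values).
data = [(float(i), 2.0 * i) for i in range(12)]

def train(data, batch_size, epochs, lr=0.01):
    """Mini-batch SGD for the one-parameter model y = w * x."""
    w = 0.0
    for _ in range(epochs):              # one epoch = one full pass over the data
        random.shuffle(data)             # reshuffle before each pass
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]  # final batch may be smaller
            # Gradient of the mean squared error over this batch only.
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad               # one parameter update per batch
    return w

# batch_size=1 -> stochastic GD; batch_size=len(data) -> batch GD;
# anything in between -> mini-batch GD.
w = train(data, batch_size=4, epochs=50)
print(round(w, 2))  # → 2.0
```

With `batch_size=4` and 12 samples, each epoch performs 3 parameter updates; with `batch_size=1` it would perform 12, and with `batch_size=12` just 1.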

What Is an Epoch?

The number of epochs is a hyperparameter that defines the number of times that the learning algorithm will work through the entire training dataset.

One epoch means that each sample in the training dataset has had an opportunity to update the internal model parameters. An epoch consists of one or more batches. For example, as above, when an epoch consists of a single batch, the learning algorithm is called batch gradient descent.
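Putting the two hyperparameters together with some hypothetical numbers: a dataset of 200 samples trained with a batch size of 5 for 1,000 epochs gives 40 batches (and therefore 40 parameter updates) per epoch, and 40,000 updates in total:

```python
import math

# Hypothetical configuration, for illustration only.
n_samples = 200
batch_size = 5
epochs = 1000

# ceil() covers the case where the final batch is smaller than the rest.
batches_per_epoch = math.ceil(n_samples / batch_size)
total_updates = batches_per_epoch * epochs
print(batches_per_epoch, total_updates)  # → 40 40000
```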

For more information, please take a look at this article.

