The hypothesis (model) of logistic regression, a binary classifier ($y \in \{0, 1\}$), is given by the equation below:
Hypothesis
$$S(z) = P(y = 1 \mid x) = h_\theta(x) = \frac{1}{1 + \exp(-\theta^\top x)}$$
This gives the probability of class 1; by setting a threshold (e.g., classify as 1 if $h_\theta(x) > 0.5$, otherwise 0) we obtain a hard class prediction.
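As a minimal sketch in plain Python (the helper names `predict_proba` and `predict` are assumptions for illustration, not from the problem statement), the hypothesis and the thresholding rule could look like:

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: S(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(theta, x):
    """h_theta(x): probability that y = 1 for feature vector x.
    Both theta and x include the bias component (x_0 = 1)."""
    z = sum(t * xj for t, xj in zip(theta, x))
    return sigmoid(z)

def predict(theta, x, threshold=0.5):
    """Hard 0/1 classification by thresholding the probability."""
    return 1 if predict_proba(theta, x) > threshold else 0
```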
Cost function
The cost function for logistic regression, known as the binary cross-entropy loss, is defined as:
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$
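A direct transcription of this loss (a sketch in plain Python; the function name is an assumption):

```python
import math

def binary_cross_entropy(y_true, y_prob):
    """J(theta) = -(1/m) * sum over i of
    [ y*log(h) + (1-y)*log(1-h) ], where h = h_theta(x^(i))."""
    m = len(y_true)
    total = 0.0
    for y, h in zip(y_true, y_prob):
        total += y * math.log(h) + (1 - y) * math.log(1 - h)
    return -total / m
```

For example, two samples predicted at probability 0.5 each give a cost of $\log 2 \approx 0.693$, the classic "coin-flip" baseline.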
Iterative updates
Assume we start all the model parameters at some initial value. In this case the only model parameters are the coefficients $\theta_j$, and assume we initialize all of them to 1: $\theta_j = 1$ for $j \in \{0, 1, \dots, n\}$, where $n$ is the number of features. Each gradient-descent step then updates every parameter simultaneously:
$$\theta_j^{\text{new}} \leftarrow \theta_j^{\text{old}} + \alpha \cdot \frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} - \sigma\left(\theta_{\text{old}}^\top x^{(i)}\right) \right] x_j^{(i)}$$
Where:
$m$ = number of rows (samples) in the training batch
$x^{(i)}$ = the feature vector for sample $i$
$\theta_j$ = the coefficient corresponding to feature $j$
$y^{(i)}$ = the actual class label for sample $i$ in the training batch
$x_j^{(i)}$ = element (column) $j$ of the feature vector for sample $i$
$\alpha$ = the learning rate
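The update rule above can be sketched as one batch step in plain Python (the function name `gradient_step` is an assumption; note that the residuals are computed with the old $\theta$ before any parameter is changed):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(theta, X, y, alpha):
    """One gradient-descent update of every theta_j:
    theta_j <- theta_j + alpha * (1/m) * sum_i (y_i - sigma(theta^T x_i)) * x_ij.
    X is a list of feature vectors, each starting with the bias x_0 = 1."""
    m = len(X)
    n = len(theta)
    # residuals y^(i) - sigma(theta_old^T x^(i)), all using the OLD theta
    errors = [y[i] - sigmoid(sum(theta[j] * X[i][j] for j in range(n)))
              for i in range(m)]
    # simultaneous update of all parameters
    return [theta[j] + alpha * sum(errors[i] * X[i][j] for i in range(m)) / m
            for j in range(n)]
```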
Dataset
The training dataset of pass/fail in an exam for 5 students is given in the table below:

If we initialize all the model parameters to 1 ($\theta_j = 1$ for all $j$), the learning rate is $\alpha = 0.1$, and we use batch gradient descent, what will be:
a) the accuracy of the model on the training set at initialization (accuracy = number of correct classifications / total number of classifications)?
b) Cost at initialization?
c) Cost after 1 epoch?
d) Repeat steps a, b, and c using mini-batch gradient descent with batch size = 2.
(Hint: for $x_j^{(i)}$ with $j = 0$, we have $x_0^{(i)} = 1$ for all $i$, i.e., the bias/intercept term.)
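Since the table values are not reproduced above, the sketch below uses clearly hypothetical placeholder data (hours studied vs. pass/fail); substitute the real table rows before reading off answers. It chains the pieces together: accuracy and cost at initialization, then one epoch of batch gradient descent, and a mini-batch variant for part d:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# HYPOTHETICAL stand-in for the exam table (replace with the real dataset).
# Each row starts with the bias term x_0 = 1, per the hint.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
y = [0, 0, 1, 1, 1]

theta0 = [1.0, 1.0]   # all parameters initialized to 1
alpha = 0.1

def probs(theta, X):
    return [sigmoid(sum(t * xj for t, xj in zip(theta, x))) for x in X]

def accuracy(theta, X, y):
    preds = [1 if p > 0.5 else 0 for p in probs(theta, X)]
    return sum(int(p == yi) for p, yi in zip(preds, y)) / len(y)

def cost(theta, X, y):
    return -sum(yi * math.log(h) + (1 - yi) * math.log(1 - h)
                for yi, h in zip(y, probs(theta, X))) / len(y)

def batch_step(theta, X, y, alpha):
    m, n = len(X), len(theta)
    p = probs(theta, X)
    errors = [y[i] - p[i] for i in range(m)]
    return [theta[j] + alpha * sum(errors[i] * X[i][j] for i in range(m)) / m
            for j in range(n)]

def minibatch_epoch(theta, X, y, alpha, batch_size=2):
    # one epoch = one pass over the data, updating after each mini-batch
    for start in range(0, len(X), batch_size):
        theta = batch_step(theta, X[start:start + batch_size],
                           y[start:start + batch_size], alpha)
    return theta

acc_init = accuracy(theta0, X, y)          # part a
cost_init = cost(theta0, X, y)             # part b
theta_1ep = batch_step(theta0, X, y, alpha)  # one epoch of batch GD = one step
cost_1ep = cost(theta_1ep, X, y)           # part c
theta_mb = minibatch_epoch(theta0, X, y, alpha)  # part d, batch size 2
```

With batch gradient descent, one epoch is a single update using all $m$ rows; with batch size 2 and 5 rows, one epoch performs three updates (batches of 2, 2, and 1).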