I have an unfinished project on calculating the kinetic parameters of the OH-CH3OH gas-phase reaction using linear regression.
In this project, I implemented two methods: one using the sklearn library and another using gradient descent from scratch. Surprisingly, the loss value for the gradient descent method was lower than the one from sklearn, which I wasn't expecting.
This project was inspired by what I learned in the Machine Learning Specialization on Coursera by DeepLearning.AI, combined with my background in chemical engineering. As a beginner in coding and machine learning, I'm still learning the ropes and figuring out the standard practices. If anyone can point out why the results weren't what I was expecting, I would really appreciate it!
You can find the project on GitHub: mjmortega/OH-methanol_LinearReg
Hi @Shota-kun,
I don’t understand this part in the version without sklearn:
```python
# multiply variables by 1000 for larger gradient
X = 1000 * X
y = 1000 * y
```
I believe the loss calculation is not accurate due to the line `loss /= (2 * m * 1000000)`.
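To illustrate the scaling issue: multiplying both X and y by 1000 scales every error by 1000, and hence every squared error by 1000² = 1,000,000, so dividing the scaled loss by 1,000,000 recovers the loss on the original scale (when b = 0). A minimal sketch with made-up numbers, not the project's actual data:

```python
import numpy as np

# Hypothetical small-magnitude data, standing in for the project's X and y
X = np.array([0.001, 0.002, 0.003])
y = np.array([0.010, 0.019, 0.031])
w, b = 10.0, 0.0  # arbitrary parameters for the demo; b = 0 so scaling is exact
m = len(X)

def half_mse(X, y, w, b):
    """Cost with the 1/(2m) convention used in the course."""
    errors = (w * X + b) - y
    return np.sum(errors ** 2) / (2 * m)

loss_original = half_mse(X, y, w, b)
# Scaling X and y by 1000 scales each error by 1000,
# hence each squared error (and the loss) by 1e6
loss_scaled = half_mse(1000 * X, 1000 * y, w, b)

assert np.isclose(loss_scaled / 1_000_000, loss_original)
```

Note that with a nonzero b the correspondence is no longer exact, since b is not scaled along with X and y.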
Here's Qwen2.5-Coder's reply when I asked what's wrong with the code:

The code you provided has a few issues that need to be addressed:

- The variables X and y are being multiplied by 1000, which affects the scale of the gradient descent process. This can lead to numerical instability because the loss function will be very large initially.
- The learning rate is set to a relatively small value (0.001), which might cause gradient descent to converge slowly, or not at all if the initial parameters are far from the optimal solution.
- The error history and parameter history are only recorded for the first 100,000 iterations, which might be too few to capture meaningful trends in the loss function.
Hi @mahmutc. When I initially ran the code without multiplying X and y by 1000, the w parameter converged very slowly.
If you inspect the values of X, they are very small. Also, this is the formula I used to calculate the gradient of the w parameter:

```python
dloss_dw += error * X[i]
```

The error is multiplied by the corresponding X value at each index, so when X is small, the gradient of w also becomes very small within a few iterations, and w does not converge as fast as the b parameter.
Basically, I multiplied X and y by 1000 to obtain a larger gradient for w so that it converges faster.
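The effect described above can be checked directly: with b = 0, scaling X and y by 1000 scales the error by 1000 and the X factor by 1000, so the w-gradient grows by a factor of one million. A sketch with hypothetical data (not the project's actual values):

```python
import numpy as np

# Hypothetical data with very small X values, mimicking the situation described
X = np.array([0.001, 0.002, 0.003])
y = np.array([0.010, 0.019, 0.031])
w, b = 0.0, 0.0  # starting parameters for the demo
m = len(X)

def grad_w(X, y, w, b):
    """Gradient of the 1/(2m) cost with respect to w."""
    errors = (w * X + b) - y
    return np.sum(errors * X) / m

g_small = grad_w(X, y, w, b)
g_scaled = grad_w(1000 * X, 1000 * y, w, b)  # the x1000 trick from the post

# Error scales by 1000 and the X factor scales by 1000: gradient grows by 1e6
assert np.isclose(g_scaled, 1_000_000 * g_small)
```

A more conventional alternative to manually multiplying by 1000 would be standardizing the features (e.g. with sklearn's `StandardScaler`) or using a larger learning rate, which avoids having to rescale the loss afterwards.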
Update
I found the issue. The way I implemented gradient descent and calculated the loss function (or cost function) is slightly different from how the sklearn library does it in its linear regression tools:
```python
sklearn.linear_model.LinearRegression()
loss = mean_squared_error(y, y_pred)
```

The functions above calculate the loss function as:

loss = (1/m) Σ (yᵢ − ŷᵢ)²
On the other hand, I drew inspiration from Andrew Ng’s course to use gradient descent, and this is the formula that I used for the loss function:
loss = (1/2m) Σ (yᵢ − ŷᵢ)²
We can see that the formula I used divides the standard cost function by 2, which is why the sklearn library gave me a loss of 0.0096 while the gradient descent from scratch gave me 0.0048. I fixed the issue in my code by using the standard cost function, where dividing by 2 is not needed, and multiplying the gradients of the parameters by 2 so that I am still following the rules of calculus. Shown below is the sample derivation for just the parameter b, or theta0:
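The chain-rule step for b under each convention can be sketched as follows (consistent with the two loss formulas above):

```latex
% Half-MSE (course convention): the 1/2 cancels the 2 from the power rule
L = \frac{1}{2m}\sum_{i=1}^{m}(\hat{y}_i - y_i)^2
\qquad\Rightarrow\qquad
\frac{\partial L}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i)

% Standard MSE (sklearn convention): the factor 2 survives
L = \frac{1}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i)^2
\qquad\Rightarrow\qquad
\frac{\partial L}{\partial b} = \frac{2}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i)
```

Both gradients point in the same direction; they differ only by a constant factor of 2, which in practice can be absorbed into the learning rate.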
I now calculate a loss of approximately 0.0096 for both methods. I am not saying that Andrew Ng's method is wrong in any way; I think he only used a different formula so that the gradient computation would look simpler and be easier to remember. He is a great instructor, and I highly recommend his online courses to everyone.
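The factor-of-2 relationship is easy to verify numerically. A sketch with synthetic data (the numbers are illustrative, not the project's 0.0096/0.0048):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Synthetic targets and predictions (assumption, for illustration only)
y_true = np.array([0.10, 0.20, 0.30, 0.40])
y_pred = np.array([0.12, 0.18, 0.33, 0.37])
m = len(y_true)

sklearn_loss = mean_squared_error(y_true, y_pred)       # (1/m) Σ (y − ŷ)²
course_loss = np.sum((y_true - y_pred) ** 2) / (2 * m)  # (1/2m) Σ (y − ŷ)²

# The course-style loss is exactly half of sklearn's MSE
assert np.isclose(sklearn_loss, 2 * course_loss)
```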
I just want to share this little discovery of mine. Of course, I am still a beginner, so please tell me if I mentioned anything incorrectly, because I am always eager to learn something new.