If you check your output file, you should be able to tell whether the force constants you are computing with 'frequency' are actually being used by the optimizer (as opposed to an empirical guess Hessian). I suspect they are not. The admittedly not very intuitive way to accomplish what you are trying to do is with
full_hess_every = 0
which will compute the Hessian once at the beginning of the optimization, then transform it and use it.
full_hess_every = 1
computes the Hessian at every single step. If you use this keyword, you do not need the first, explicit frequency call; it is implied.
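The following small Python sketch shows which optimization steps would trigger a full Hessian for a given value of full_hess_every. It models the documented meaning: -1 (the default) never computes it, 0 computes it only at the start, and N recomputes it every N steps. The function name hessian_steps is hypothetical. Counting steps from 0, with the first Hessian computed at step 0, is an assumption about how to number them, not something taken from the optimizer's source. In a Psi4 Python input, the keyword itself is set with psi4.set_options({"full_hess_every": 0}) before you call psi4.optimize().

```python
def hessian_steps(full_hess_every: int, n_steps: int) -> list[int]:
    """Return the 0-based optimization steps at which a full Hessian is computed.

    Hypothetical helper modelling the meaning of full_hess_every:
      -1 (or any negative value) -> never; an empirical guess Hessian is used
       0                         -> only at the first step
       N > 0                     -> at the first step, then every N steps
    Assumption: steps are numbered from 0 and the first one is step 0.
    """
    if full_hess_every < 0:
        return []
    if full_hess_every == 0:
        # One Hessian at the start, then only updates by the optimizer.
        return [0] if n_steps > 0 else []
    # Recompute whenever the step index is a multiple of N.
    return [step for step in range(n_steps) if step % full_hess_every == 0]


if __name__ == "__main__":
    for value in (-1, 0, 1, 3):
        print(value, hessian_steps(value, 6))
```

With full_hess_every = 0 the list contains only the first step, which is why a separate frequency job before the optimization is redundant.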
Beyond that, you are right to try different starting geometries, as the better your first guess, the more likely you are to converge.