Question on offline RL

Hey, I'm fairly new to RL and have a question. In offline RL, my understanding is that the key point is learning the best policy everywhere from a fixed dataset. My question is: are we also learning the best value function and best Q-function everywhere?

Specifically, I want to know how best to learn only a value function (not necessarily a policy) from an offline dataset. I'd like to use offline RL tools to learn the best value function everywhere, but I'm not sure what to research to learn more about this. My goal is to use V as a safety metric for states.
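For concreteness, here's a toy sketch of the kind of thing I'm imagining (all states, numbers, and names are made up): running something like fitted Q iteration over a small fixed dataset of transitions, then taking V(s) = max_a Q(s, a) as the safety signal for each state.

```python
import numpy as np

# Toy offline dataset of (s, a, r, s_next, done) transitions on a
# made-up 3-state, 2-action MDP. No environment interaction happens
# below -- everything is learned from this fixed batch.
transitions = [
    (0, 0, 0.0, 1, False),
    (0, 1, 0.0, 2, False),
    (1, 0, 1.0, 2, True),
    (1, 1, 0.0, 0, False),
    (2, 0, -1.0, 2, True),  # state 2 is "unsafe": negative reward
    (2, 1, -1.0, 2, True),
]

n_states, n_actions, gamma = 3, 2, 0.9
Q = np.zeros((n_states, n_actions))

# Fitted Q iteration: repeatedly regress Q toward the Bellman
# optimality target r + gamma * max_a' Q(s', a'), using only the
# fixed dataset.
for _ in range(100):
    Q_new = Q.copy()
    for s, a, r, s_next, done in transitions:
        target = r + (0.0 if done else gamma * Q[s_next].max())
        Q_new[s, a] = target
    Q = Q_new

# The state-value function I'd use as a safety metric:
V = Q.max(axis=1)
print(V)  # a low V(s) would flag state 2 as unsafe
```

Is this roughly the right idea, and what should I read about doing this properly when the dataset doesn't cover every state-action pair (I've seen terms like "pessimism" mentioned but don't know where to start)?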

I hope this makes sense.