Several ready-made speech recognition tools can be used to train custom models, given an appropriate dataset:
CMU Sphinx: One of the oldest libraries; used mostly in academic settings.
Kaldi: Hard to set up, but very flexible to use. Typically used by academics.
Deep Speech: Easy to set up and reasonably flexible to use.
Google API: Supports speaker segmentation (diarization).