Without reliable software metrics threshold values, the efficient quality evaluation of software could not be done. In order to derive reliable thresholds, we have to address several challenges, which impact the final result. For instance, software metrics implementations vary in various software metrics tools, including varying threshold values that result from different threshold derivation approaches. In addition, the programming language is also another important aspect. In this paper, we present the results of an empirical study aimed at comparing systematically obtained threshold values for nine software metrics in four object-oriented programming languages (i.e., Java, C++, C#, and Python). We addressed challenges in the threshold derivation domain within introduced adjustments of the benchmarkbased threshold derivation approach. The data set was selected in a uniform way, allowing derivation repeatability, while input values were collected using a single software metric tool, enabling the comparison of derived thresholds among the chosen object-oriented programming languages. Within the performed empirical study, the comparison reveals that threshold values differ between different programming languages.