The measures described so far can be applied only to conjoint rankings, i.e. lists in which every element of one list also appears in the other. When rankings are truncated at a certain depth k, the resulting lists, called top-k rankings, are no longer conjoint. We calculated the average overlap [19, 20], a weighted measure for top-k rankings that accounts for the cumulative intersection (or overlap) between the two lists, averaged up to a certain depth (cut-off) k (see the online supplementary appendix for details). We calculated the average overlap between pairs of rankings for networks with at least six treatments (139 networks), at a depth k equal to half the number of treatments T in the network (rounded when T is an odd number). Random-effects models tend to estimate relative treatment effects with similar precision as heterogeneity increases. Conversely, in the absence of heterogeneity, or when fixed-effect models are used, the precision of the effect estimates can vary considerably depending on the amount of data available for each intervention; in the latter case, the ranking metrics are not expected to agree. The role of precision in ranking disagreement is also more pronounced when interventions have similar effects.
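
To make the measure concrete, the following is a minimal sketch of the average overlap between two top-k rankings: at each depth d from 1 to k the proportional overlap of the two truncated lists is computed and these values are averaged. The function name and the example rankings are illustrative assumptions, not taken from the source.

```python
def average_overlap(ranking_a, ranking_b, k):
    """Average overlap of two rankings truncated at depth k.

    At each depth d = 1..k the proportional overlap |A_:d ∩ B_:d| / d
    of the top-d elements is computed; the k values are then averaged.
    """
    total = 0.0
    for d in range(1, k + 1):
        top_a = set(ranking_a[:d])
        top_b = set(ranking_b[:d])
        total += len(top_a & top_b) / d
    return total / k


# Hypothetical example: two rankings of 8 treatments compared at depth k = 4
ranking_metric_1 = ["A", "B", "C", "D", "E", "F", "G", "H"]
ranking_metric_2 = ["B", "A", "D", "C", "F", "E", "H", "G"]
print(average_overlap(ranking_metric_1, ranking_metric_2, k=4))  # ≈ 0.667
```

Because later depths accumulate the intersections of all shallower depths, agreement near the top of the lists is weighted more heavily than agreement further down, which is the intended behaviour for top-k rankings.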

In practice, such correlation-based approaches are used in combination with kappa-type approaches in applied and psychometric studies to assess agreement between raters (e.g. [3]; [20]). Methodological research, however, questions the applicability of these approaches for several reasons. First, a correlation only captures whether raters order items in similar directions [10], which is an important limitation for rankings. In addition, none of the correlation approaches correct for chance agreement [11], a serious problem in inter-rater settings. Therefore, in most practical applications, correlations are not used as the sole measure of agreement between raters. The approaches discussed in the literature highlight problems with the measures currently used for rankings and confirm the need for a new approach. In particular, there is a lack of inter-rater reliability measures that account for the closeness of ratings while correcting for chance agreement. Although correlation-based approaches have some strengths, they do not account for chance agreement.
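
As an illustration of what correcting for chance agreement means, the sketch below computes Cohen's kappa, a standard chance-corrected agreement measure: observed agreement is reduced by the agreement expected from the raters' marginal rating frequencies alone, a correction that plain correlations do not apply. The function and the two raters' scores are hypothetical and only serve to illustrate the idea.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: observed agreement corrected for the agreement
    expected by chance from each rater's marginal rating frequencies."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    # Expected chance agreement: product of the raters' marginal proportions per category
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a)
    return (observed - expected) / (1 - expected)

# Two hypothetical raters scoring 10 items on a 3-point scale
rater1 = [1, 1, 1, 1, 1, 1, 1, 2, 2, 3]
rater2 = [1, 1, 1, 1, 1, 1, 2, 1, 3, 2]
print(cohens_kappa(rater1, rater2))  # ≈ 0.13: raw agreement is 0.60, but most of it is expected by chance
```

A correlation computed on the same data would report only how similarly the two raters order the items, without any such adjustment, which is the limitation discussed above.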
