TBL?

Niek

There were problems today at F3P with a forgetful scoring program, which was largely put right through hard work and great effort from various volunteers.

The current program is apparently proprietary (not open source, hands off, secret)? And in situations like this there is no alternative / backup / workable emergency solution available.

To me, something like this doesn't seem too hard to knock together in, say, MS Excel. The figures, the K-factors, the pilots, the scores, etc.: a bit of multiplying, adding the results up again, and so on, gets you to a point total per competitor per round.

But what is not clear to me after that is the TBL, percentage and score-discarding business, etc. Can someone work that out / explain it properly?

Regards,

Niek
 
Hi Niek,

below is a short article on how TBL works (unfortunately in English).

TBL Scoring

The recent US Nationals sparked a bit of debate on the NSRCA email discussion group with regard to TBL and how it works. For your interest, I nabbed this bit of text for inclusion in the bulletin. Derek Koopowitz from the U.S. posted the following reply about TBL; it comes from the TBL Manual. Read on. Derek writes: I need to correct what Tony has written. TBL does not correct or adjust individual scores for a judge. Don't worry about that... if you get a zero from a judge for a maneuver, it will still be a zero on your scoresheet.

What TBL does is determine whether a judge is biased toward a pilot, and if so it discards that score. For those interested, here is a synopsis of TBL:

The Tarasov-Bauer-Long (TBL) scoring method has been around since the 1970s.

It has been used in the full-size arena since 1978 and at every full-size IAC World Championship since 1980. The TBL method applies proven statistical probability theory to the judges' scores to resolve style differences and bias, and to avoid the inclusion of potentially faulty judgements in contest results.

Why we need TBL
To understand just why we need TBL, and how it works, is of considerable importance to us all. It is important to the pilots because it is there to reduce the prospect of unsatisfactory judgements affecting their results, and it is important for the judges because it introduces a completely new dimension of scrutiny into the sequence totals, and will discreetly engage the attention of the Chief Judge, or Contest Director, if a judge's conclusions differ sufficiently from those of all the other judges on the same panel.

When people get together to judge how well a pre-defined competitive task is being tackled, the range of opinions is often diverse. This is entirely natural among humans where the critique of any display of skill relies on the interpretation of rapidly changing visual cues. In order to minimize the prospect of any "way out opinions" having too much effect on the result, it is usual to average the accumulated scores to arrive at a final assessment, which takes everybody's opinion into account.

Unfortunately this averaging approach can achieve the opposite of what we really want, which is to identify, and where needed, remove those "way out opinions" because they are the ones most likely to be ill-judged and therefore should be discarded, leaving the rest to determine the more appropriate result. In aerobatics the process of judging according to the rulebook normally leads to a series of generally similar personal views. However, one judge's downgrading may be harsher or more lenient than the next, their personal feelings toward each competitor or aircraft type may predispose toward favor or dislike (bias), and they will almost certainly miss or see things that other judges do not.


How then can we "judge" the judges and so reach a conclusion that has a good probability of acceptance by all the concerned parties? The key word is probability: the concept of a perceived level of confidence in collectively viewed judgements has entered the frame. What we really mean is that we must be confident that opinions pitched outside some pre-defined level of reasonable acceptability will be identified as such and will not be used. This sort of situation is the daily bread and butter of well-established probability theory which, when suitably applied, can produce a very clear-cut analysis of numerically expressed opinions, provided that the appropriate criteria have been carefully established beforehand.

What has been developed through several previous editions is some arithmetic which addresses the judges' raw scores in such a way that any which are probably unfair are discarded with an established level of confidence. To understand the process you need only accept some quite simple arithmetic procedures, which are central to what is called "statistical probability". The TBL scoring system in effect does the following:
* Communizes the judging styles
* Computes the TBL scores
* Publishes the results

Communizing the judging styles involves remodelling the scores to bring all the judging styles to a common format and removing any natural bias between panel members. Following some calculations, each judge's set of scores is squeezed or stretched and moved en bloc up or down so that the sets all show the same overall spread and have identical averages (bias). Within each set the pilot order and score progression must remain unaltered, but valid score comparisons are now possible between all the panel judges on behalf of each pilot.
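The article does not give the exact remodelling formula, but a linear rescale to a common mean and spread matches the description (order-preserving, same overall spread, identical averages). A minimal sketch in Python; the function name and the choice of target mean/SD are assumptions of mine:

```python
import statistics

def communize(judge_scores, target_mean, target_sd):
    """Linearly rescale one judge's scores to the target mean and
    standard deviation. The transform a*x + b with a > 0 is monotonic,
    so the pilot order and score progression are preserved."""
    mean = statistics.mean(judge_scores)
    sd = statistics.stdev(judge_scores)
    return [target_mean + (s - mean) * (target_sd / sd)
            for s in judge_scores]

# Two judges scoring the same four pilots:
judge_a = [9.0, 7.5, 8.0, 6.5]   # generous, narrow spread
judge_b = [7.0, 3.0, 5.0, 1.0]   # harsh, wide spread

# Bring judge B to judge A's mean and spread (a real system would
# presumably use a panel-wide target instead).
target_mean = statistics.mean(judge_a)
target_sd = statistics.stdev(judge_a)
adjusted_b = communize(judge_b, target_mean, target_sd)
```

After this step both sets have the same average and spread, so a score-by-score comparison between the judges becomes meaningful.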

Computing the TBL score involves looking at the high and low scores in each pilot's set and throwing out any that are too "far out" to be fair. This is done by subtracting the set's average from each score and dividing the result by the "sample standard deviation"; if the result of this sum is greater than 1.645 then, according to statistical probability theory, we can be at least 90% confident that the score is unfair, so it is discarded.

This calculation, and the mathematically derived 1.645 criterion, is the key to the correctness of the TBL process, and is based on many years of experience by the full-size aerobatics organization with contest scores at all levels.

Discarding any scores of course changes the average and standard deviation of a pilot's remaining results, and so the whole process is repeated. After several cycles any "unfair" scores will have gone, and those that remain will all satisfy the essential 90% confidence criterion.
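The discard-and-repeat cycle above can be sketched as follows. This is my own reading of the text (testing the furthest-out score against the 1.645 criterion, discarding it, then recomputing), not the published TBL implementation:

```python
import statistics

Z_LIMIT = 1.645  # the 90% confidence criterion from the text

def tbl_filter(scores):
    """Repeatedly discard the score furthest from the mean whenever
    its deviation, divided by the sample standard deviation, exceeds
    1.645; mean and SD are recomputed after every discard."""
    scores = list(scores)
    while len(scores) > 2:
        mean = statistics.mean(scores)
        sd = statistics.stdev(scores)
        if sd == 0:
            break                       # all scores identical
        worst = max(scores, key=lambda s: abs(s - mean))
        if abs(worst - mean) / sd > Z_LIMIT:
            scores.remove(worst)        # probably unfair: discard
        else:
            break                       # all within the 90% band
    return scores
```

One side note: with the sample standard deviation, the deviation ratio of any single score in a set of n can never exceed (n-1)/√n, so with four or fewer scores the 1.645 criterion can never fire; the test only starts biting with panels of five judges or more.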

The published result for each pilot is derived by averaging that pilot's remaining scores. The final TBL iteration therefore has any appropriate penalty/bonus values applied, and the results are then sorted in descending order of total score to rank the pilots from first to last.

These final scores may, or may not, be normalized to 1000 points, depending on the setting for the selected class. Educating and improving the judges is a useful by-product of this process, in that it provides full detail of how each judge has performed in comparison with the overall judging-panel average, seen against the 90% confidence criterion.
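For the optional normalization, a common convention in aerobatics is that the best total defines 1000 points; the text does not specify the reference score, so that choice is an assumption here:

```python
def normalize_to_1000(totals):
    """Scale each pilot's total so that the highest total maps to
    1000 points; relative standings are unchanged."""
    top = max(totals.values())
    return {pilot: t / top * 1000 for pilot, t in totals.items()}

ranked = normalize_to_1000({"Pilot A": 452.0, "Pilot B": 429.4})
```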

The TBL system will produce an analysis showing each judge the percentage of scores accepted as "OK", and a comparison with the panel style (spread of score) and bias (average). Unfortunately TBL, by definition, brings with it a 10% possibility of upsetting an honest judge's day. The trade-off is that we expect not only to achieve a set of results with at least 90% confidence that are "fair" every time, but that the system also provides us with a wonderful tool to address our judging standards. TBL will ensure that every judge's opinion has equal weight, and that each sequence score by each judge is accepted only if it lies within an acceptable margin from the panel average.

TBL, however, by necessity takes the dominant judging-panel view as the "correct" one, and it cannot make right scores out of wrong ones. If 6 out of 8 judges are distracted and make a mess of one pilot's efforts, then for TBL this becomes the controlling assessment of that pilot's performance, and the 2 diligent judges who got it right will see their scores unceremoniously zapped. In practice this would be extremely unusual: from the judging line it is almost impossible to deliberately upset the final results without collusion between a majority of the judges, and if that starts to happen then someone is definitely on the wrong planet.

Derek Koopowitz


Regards,

Julius :)
 
TBL is not used for F3P, only for F3A.

In F3P, with 5 judges the highest and lowest score per figure are discarded, and the average of the 3 remaining judges is used as the score for that figure.

With 4 judges the same applies, but then the average of the 2 remaining judges is used.

With 3 judges no scores are discarded.
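That rule fits in a few lines; a sketch (the function name is mine), dropping one highest and one lowest score whenever four or more judges sit:

```python
def f3p_figure_score(raw_scores):
    """Average the judges' scores for one figure. With 4 or 5 judges
    the single highest and single lowest scores are discarded first;
    with 3 judges nothing is discarded."""
    scores = sorted(raw_scores)
    if len(scores) >= 4:
        scores = scores[1:-1]   # drop lowest and highest
    return sum(scores) / len(scores)
```

For example, five judges giving 6, 7, 8, 9 and 10 yield (7 + 8 + 9) / 3 = 8.0 for that figure.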

Have fun in Excel....
 
Thanks :) It's only meant as a possible emergency solution anyway..
And to finally understand how it all fits together.
 