Engine bar units on analysis boards #6911

tmmlaarhoven · 2020-07-01T14:32:18Z

When showing engine evaluations for a game (say in analysis mode, or watching a broadcast), there's the engine evaluation in the top-right corner of the screen, and there is an engine bar to the right of the board. This bar has horizontal dashes at the same height as the squares of the board, and one might guess that each of these dashes also corresponds to a +/-1 advantage. However, it seems that the engine evaluation needs to be approximately +1.3 to reach the first dash.

Would it not make more sense to make the dashes on this engine bar correspond to engine evaluations of -4, -3, -2, -1, 0, +1, +2, +3, +4? At least in top-level chess, a +4 advantage is "clearly winning" and further distinctions are not really necessary. It would also make it easier to see from the engine bar how big the advantage is according to the engine.

niklasf · 2020-07-02T08:32:40Z

The eval gauge is on a scale of winning chances (see curve on https://lichess.org/blog/WFvLpiQAACMA8e9D/learn-from-your-mistakes), not centipawns. But even there the ticks are arbitrary.

We could put the ticks at centipawn levels, which would look something like

|| |  |   |    |     |     |    |   |  | ||

or at more meaningful winning chance intervals (e.g. 10%)

|      |      |      |      |      |      |
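
To make the two options concrete, here is a rough sketch of where the ticks would land. It assumes the gauge uses a logistic curve of the form 2 / (1 + exp(-k * cp)) - 1 with k = 0.004; that coefficient and all function names are assumptions for illustration, not taken from the lila source. Under that assumption the current first tick above the middle (62.5% height) corresponds to roughly 128 centipawns, which is consistent with the "approximately +1.3" observed in the opening post.

```typescript
// Assumed logistic coefficient per centipawn (not confirmed by this thread).
const K = 0.004;

// Winning chances on a -1..+1 scale for a centipawn evaluation.
function winningChances(cp: number, k: number = K): number {
  return 2 / (1 + Math.exp(-k * cp)) - 1;
}

// Inverse of winningChances: the centipawn value for given winning chances.
function cpForWinningChances(wc: number, k: number = K): number {
  return -Math.log(2 / (wc + 1) - 1) / k;
}

// Gauge height in percent, where 50% means an equal position.
function gaugeHeight(cp: number, k: number = K): number {
  return 50 + 50 * winningChances(cp, k);
}

// Option 1: ticks at whole-pawn evaluations from -4 to +4.
// These bunch up towards both ends of the gauge.
const pawnTicks = [-4, -3, -2, -1, 0, 1, 2, 3, 4].map(pawns => ({
  pawns,
  height: gaugeHeight(pawns * 100),
}));

// Option 2: ticks every 10% of gauge height, each with its centipawn equivalent.
const percentTicks = [10, 20, 30, 40, 50, 60, 70, 80, 90].map(height => ({
  height,
  cp: cpForWinningChances(height / 50 - 1),
}));

// Centipawn value of the current first tick above the middle (62.5% height).
const firstTickCp = cpForWinningChances(62.5 / 50 - 1);
console.log(firstTickCp.toFixed(0)); // "128" with the assumed K
console.log(pawnTicks, percentTicks);
```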

tmmlaarhoven · 2020-07-02T12:29:53Z

@niklasf Thanks for the explanation.

Do you happen to know how these bars are defined on e.g. chess.com or chess24? I understand that if everyone else makes a mistake lichess doesn't have to repeat it, but being consistent with other sites also makes it easier for users to understand.

And I understand how the winning chance curve may have been chosen, but were the constants actually chosen to fit real data? If Stockfish or LC0 were given 1000 random positions with a 70% winning chance, to play out against itself, would it actually score 700-300?

ddugovic · 2020-07-03T09:52:25Z

@tmmlaarhoven #1494 initially was based upon many master games, however this did not reflect results in online play between amateurs.

Incidentally now Stockfish also has a formula (based upon engine self-play):
official-stockfish/Stockfish@1100688

niklasf · 2020-07-03T14:28:53Z

The ticks are handled in this file, if anyone wants to experiment with it:

lila/ui/ceval/src/view.ts

Lines 13 to 16 in 1fd0af8

    let gaugeLast = 0;
    const gaugeTicks: VNode[] = [...Array(8).keys()].map(i =>
      h(i === 3 ? 'tick.zero' : 'tick', { attrs: { style: `height: ${(i + 1) * 12.5}%` } })
    );
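
As a starting point for experimenting, the tick layout can be separated from the rendering. The sketch below reproduces the heights from the snippet above as plain objects and adds a uniform variant with a configurable step. The `TickSpec` type and both function names are hypothetical; the real view builds snabbdom VNodes with `h(...)`, which is left out here so the example runs on its own.

```typescript
// Plain description of one tick (hypothetical type for this sketch).
interface TickSpec {
  selector: string; // 'tick.zero' marks the equal-position line
  style: string;    // inline style giving the tick's height on the gauge
}

// Heights as in the snippet above: 8 ticks, 12.5% apart, zero line at 50%.
function currentTicks(): TickSpec[] {
  return Array.from({ length: 8 }, (_, i) => ({
    selector: i === 3 ? 'tick.zero' : 'tick',
    style: `height: ${(i + 1) * 12.5}%`,
  }));
}

// Ticks every `step` percent of gauge height, excluding 0% and 100%.
// The zero line is only marked if 50 is a multiple of `step`.
function uniformTicks(step: number = 10): TickSpec[] {
  const ticks: TickSpec[] = [];
  for (let pct = step; pct < 100; pct += step) {
    ticks.push({
      selector: pct === 50 ? 'tick.zero' : 'tick',
      style: `height: ${pct}%`,
    });
  }
  return ticks;
}

console.log(currentTicks().map(t => t.style));
console.log(uniformTicks(10).map(t => t.style));
```

In the view itself, each `TickSpec` would then be turned into a VNode along the lines of `h(spec.selector, { attrs: { style: spec.style } })`, mirroring the existing code.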

Eugenio-Bruno · 2020-07-21T12:37:11Z

Hi, I'm "taking" this to begin working on a first issue and start contributing to the project.

I think I understand how the code works, so I'm first going to screenshot a few options we could take. One is changing the ticks to be CP based, another is putting the ticks at cp levels, another is putting them at more meaningful win intervals, eg 10%, but another is changing the gauge itself to go from -4 to +4 as originally suggested, which I think is a nice option. I'll screenshot all four options and then we can discuss which we want.

> @tmmlaarhoven #1494 initially was based upon many master games, however this did not reflect results in online play between amateurs.
>
> Incidentally now Stockfish also has a formula (based upon engine self-play):
> official-stockfish/Stockfish@1100688

I don't think this can work that well, at least if we want to keep this simple. I'll explain my thoughts, but TL;DR: too complicated to do properly. Properly converting a cp eval (or win/draw/loss from Stockfish/Leela) to an actual win% for a human game involves, in order of (I believe) importance:

Adjusting for player rating
Adjusting for opening/midgame/endgame stages

The first is also the more thorny issue. Being +3 in my games, at 800 rating, means almost nothing. Being +3 in a game between two GMs is almost surely absolutely winning. Solving these two issues is by no means impossible, but what it would require would be at least:

bucketing all positions with evaluation from the lichess pgn dumps by stage of the game and by rating, having buckets of the form (600-1000, opening), (600-1000, midgame), (600-1000, endgame), (1000-1200, opening)...
doing at least a linear regression from cp or wdl to winning chances for that single bucket
then applying that to games

And that leaves out corner cases with games where there are two very differently rated players.

Additionally there's a more philosophical issue where the gauge isn't objective anymore, it's based on the players playing that position...

A simplification might be just doing the regression for the biggest bucket, eg midgame 1400-1600, which is likely better than doing it for masters.
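
For illustration, the bucketing and per-bucket fit could look roughly like the sketch below. Note that it swaps the linear regression mentioned above for a one-parameter logistic fit (Newton's method on the log-loss), so that the result is directly comparable to a sigmoid coefficient. The `Sample` shape, the 200-point rating bands, and all function names are assumptions made up for this example; nothing here reflects how the lichess dumps are actually structured.

```typescript
// One evaluated position from a finished game (hypothetical shape).
interface Sample {
  rating: number;                            // e.g. average rating of both players
  stage: 'opening' | 'midgame' | 'endgame';
  cp: number;                                // engine eval, White's point of view
  score: number;                             // White's result: 1, 0.5 or 0
}

// Bucket key such as "1400-1600/midgame"; the band width is an arbitrary choice.
function bucketKey(s: Sample, band: number = 200): string {
  const lo = Math.floor(s.rating / band) * band;
  return `${lo}-${lo + band}/${s.stage}`;
}

// Group samples by rating band and game stage.
function groupByBucket(samples: Sample[]): Map<string, Sample[]> {
  const buckets = new Map<string, Sample[]>();
  for (const s of samples) {
    const key = bucketKey(s);
    const list = buckets.get(key) ?? [];
    list.push(s);
    buckets.set(key, list);
  }
  return buckets;
}

// Fit k in score ~ 1 / (1 + exp(-k * cp)) for one bucket, using Newton's
// method on the log-loss. Evaluations are scaled to pawns for stability.
function fitCoefficient(samples: Sample[], iterations: number = 25): number {
  let k = 0; // coefficient per pawn
  for (let it = 0; it < iterations; it++) {
    let grad = 0;
    let hess = 0;
    for (const s of samples) {
      const x = s.cp / 100;
      const p = 1 / (1 + Math.exp(-k * x));
      grad += (p - s.score) * x;
      hess += p * (1 - p) * x * x;
    }
    if (hess < 1e-12) break; // no usable signal (e.g. all evals are 0)
    k -= grad / hess;
  }
  return k / 100; // back to a per-centipawn coefficient
}
```

Applying it "to games" would then mean looking up the coefficient for the bucket a position falls into and evaluating the sigmoid with that coefficient instead of a single global one.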

ddugovic · 2020-07-21T12:58:53Z

That simplification (assuming USCF 1500-rated amateurs of equal strength) is what I attempted to aim for.

Comparing the win% distribution to the raw CP distribution we see:

In even positions, win% is sensitive to +/- 50 CP changes (players need to see mistakes in even positions - players often attribute losses to "oh I just missed a single tactic" when in reality they made a series of mistakes leading to a difficult position where a tactic was missed)
In positions which favor one player, win% is not sensitive to +/- 100 CP changes (dropping a piece when you're already down a rook is not a blunder)
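
A small numeric sketch of those two points, again assuming a logistic win% curve with coefficient 0.004 per centipawn (an assumption, as are the function names and the rough conventional values of 300 cp for a piece and 500 cp for a rook):

```typescript
// Assumed logistic coefficient per centipawn (not confirmed by this thread).
const K = 0.004;

// Win% (0..100) for a centipawn evaluation from the player's point of view.
function winPercent(cp: number, k: number = K): number {
  return 100 / (1 + Math.exp(-k * cp));
}

// Change in win% when the evaluation moves from `from` to `to` (centipawns).
function winPercentSwing(from: number, to: number, k: number = K): number {
  return winPercent(to, k) - winPercent(from, k);
}

// A 50 cp slip in an equal position.
const evenSlip = winPercentSwing(0, -50);

// Dropping a piece (about 300 cp) in an equal position...
const pieceInEvenGame = winPercentSwing(0, -300);

// ...versus dropping the same piece when already a rook (about 500 cp) down.
const pieceWhenRookDown = winPercentSwing(-500, -800);

console.log(evenSlip.toFixed(1), pieceInEvenGame.toFixed(1), pieceWhenRookDown.toFixed(1));
```

With these assumed numbers, the same 300 cp loss costs far fewer win% points when the position is already bad than when it is equal, which is the effect described above.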

Eugenio-Bruno · 2020-07-21T13:05:06Z

As it relates to this issue, then, what would you suggest the ticks or the gauge be rescaled to? Is the current win% already calibrated to 1500 USCF or is that another piece of the puzzle that still needs to be done?

ddugovic · 2020-07-21T15:33:20Z

http://abema.tv/ (sometimes publicly available) shows for several games how to illustrate a win% bar.

As for tick marks, I'd suggest a uniform distribution by win% (using any sigmoid tuned for any strength, not necessarily 1500 USCF; I submitted a preference years ago and for good reasons maintainers changed it several times). I'd suggest against "raw CP" tick marks because even players as strong as GM Ashley do not understand CP values.

tmmlaarhoven · 2020-07-21T19:17:28Z

I would also like to suggest to add a useful mouseover text for the engine bar, either indicating the current engine evaluation (and what it means) or mouseover texts for the ticks somehow.

And I'd just like to add that assigning scores based on amateur level play might be fishy. Dubious gambits may score particularly well at lower levels, but are not sound. I don't think the engine bar should show that a dubious (incorrect) move with a difficult-to-find refutation is a strong move only because amateurs often fall for the trick.

Eugenio-Bruno · 2020-07-21T21:13:48Z

Thank you for the idea. I will also add the mouseover text.

Regarding the amateur calibration, that's not how it works. The evaluation is still given by Stockfish at full strength, so if you try a questionable gambit with a strong refutation, you lose winning chances because the cp score drops. At the same time, though, what changes is that while being a piece ahead in a high-level game might be an almost certain win, for an amateur it might be 70% (just a made-up number to give an idea of what I'm talking about).

tmmlaarhoven · 2020-07-21T21:17:48Z

Ah, I see, thanks for the explanation. Should this winning percentage then somehow depend on the ratings of the players? (If you want to be truly adaptive, you could say that a 1200 being a piece up against a 2500 still has a 20% expected score just because of the rating difference.)

ddugovic · 2020-07-21T21:20:46Z

It depends what you are attempting to do:

If you are broadcasting a live match, predicting outcomes could be entertaining
If you are studying a game, the player ratings are stale anyway

tmmlaarhoven · 2020-07-21T21:24:12Z

Maybe if you are studying a game, the scores should then depend on your own rating? Assuming you'll be playing players in your own rating range. (I wouldn't want an engine bar saying a piece up "is just a scratch" if I'd lose 99/100 of those games.)

Edit: Ah, I see that's already what you proposed above.

Eugenio-Bruno · 2020-07-21T21:26:44Z

I took this because it's tagged as a good first issue, more to learn about the process and less about making something perfect, and it would be my first PR.

I'd say I follow @ddugovic's advice and go with 10% win percentage ticks from the default win% sigmoid formula already used in lila.

Sounds good?

ddugovic · 2020-07-21T21:41:33Z

@tmmlaarhoven I apologize for commandeering your feature request. It's just that, through my decades of teaching, public speaking, writing, coding, and competing in numerous chess leagues, I have found the two errors I identified above to be common, and applying a sigmoid data transformation before visualizing the data greatly reduces the incidence of those errors.

tmmlaarhoven · 2020-07-21T22:11:13Z

@ddugovic Don't worry about it! In no way is my personal opinion representative of the whole chess community, and all I want to do is make sure you thought about the design choices well. As long as you did, I trust you will make a good draft for how this could be implemented in a good way.

Eugenio-Bruno · 2020-07-22T17:33:41Z

Sorry I'm being slow by the way. I definitely hope to have this done by tomorrow. I've luckily just received a project to work on, so I can buy those lichess wings :P

Haven't forgotten about it.

niklasf added good first issue no scala labels Jul 5, 2020
