Engine bar units on analysis boards #6911
Comments
The eval gauge is on a scale of winning chances (see the curve on https://lichess.org/blog/WFvLpiQAACMA8e9D/learn-from-your-mistakes), not centipawns. But even there the ticks are arbitrary. We could put the ticks at centipawn levels, which would look something like this:
or at more meaningful winning-chance intervals (e.g. every 10%):
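For reference, the winning-chance curve from the linked blog post is a logistic in the centipawn score. A minimal sketch (the 0.004 constant matches the curve lila used around this time; it has since been retuned, so treat it as illustrative):

```typescript
// Winning chances on [-1, 1] (0 = equal) from a centipawn eval.
// The 0.004 constant matches the curve in the linked blog post;
// lila has retuned it over time, so treat it as illustrative.
function winningChances(cp: number): number {
  return 2 / (1 + Math.exp(-0.004 * cp)) - 1;
}

// A +1.00 (100 cp) advantage is only ~0.20 on the [-1, 1] scale,
// i.e. roughly a 60% expected score for White.
const atOnePawn = winningChances(100);
```

This is why ticks spaced evenly in winning chance and ticks spaced evenly in centipawns end up at very different places on the gauge.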
@niklasf Thanks for the explanation. Do you happen to know how these bars are defined on e.g. chesscom or chess24? I understand that if everyone else makes a mistake lichess doesn't have to repeat it, but being consistent with other sites also makes it easier for users to understand. And I understand how the winning-chance curve may have been chosen, but were the constants actually fitted to real data? If Stockfish or LC0 were given 1000 random positions with a 70% winning chance to play out against itself, would it actually score 700-300?
@tmmlaarhoven #1494 was initially based on many master games; however, this did not reflect results in online play between amateurs. Incidentally, Stockfish now also has a formula (based on engine self-play):
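A sketch of that kind of self-play-fitted model: a logistic in the eval whose center and scale depend on the game phase. All coefficient values below are made-up placeholders for illustration; Stockfish's real coefficients come from fits to fresh engine self-play data and are retuned per release.

```typescript
// Sketch of a Stockfish-style win-rate model: the probability that
// the side to move *wins* (draws excluded) is a logistic in the
// centipawn eval, with center `a` and scale `b` depending on the
// game phase. The coefficients here are placeholders, NOT the real
// fitted values.
function winProbability(cp: number, ply: number): number {
  const m = Math.min(240, ply) / 64;  // normalized game phase
  const a = 150 + 50 * m;             // placeholder: eval with 50% win odds
  const b = 60 + 20 * m;              // placeholder: slope of the curve
  const x = Math.max(-2000, Math.min(2000, cp)); // clamp extreme evals
  return 1 / (1 + Math.exp((a - x) / b));
}
```

Note the center `a` is positive: at eval 0 a win (as opposed to a draw) is unlikely in engine self-play, which is a key difference from an expected-score curve.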
The ticks are handled in this file, if anyone wants to experiment with it: Lines 13 to 16 in 1fd0af8 |
Hi, I'm "taking" this to begin working on a first issue and start contributing to the project. I think I understand how the code works, so I'm first going to screenshot a few options we could take. One is keeping the gauge as-is but putting the ticks at centipawn levels, another is putting them at more meaningful win-chance intervals (e.g. every 10%), and another is changing the gauge itself to go linearly from -4 to +4 as originally suggested, which I think is a nice option. I'll screenshot all the options and then we can discuss which one we want.
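To compare the options concretely, here is a hypothetical sketch of where whole-pawn ticks would land on the current winning-chance gauge. It assumes the sigmoid with the (since retuned) 0.004 constant and a gauge whose height maps winning chances [-1, 1] onto [0, 1]; `gaugeFraction` is an illustrative helper, not a lila function.

```typescript
// Illustrative: where do ticks at whole-pawn evals land on the
// current winning-chance gauge? Gauge height runs 0 (black wins)
// to 1 (white wins); the 0.004 constant is illustrative.
function winningChances(cp: number): number {
  return 2 / (1 + Math.exp(-0.004 * cp)) - 1;
}
function gaugeFraction(cp: number): number {
  return (winningChances(cp) + 1) / 2;
}

// Ticks at -4..+4 pawns bunch up toward the ends of the gauge
// rather than being evenly spaced:
const cpTicks = [-400, -300, -200, -100, 0, 100, 200, 300, 400];
const fractions = cpTicks.map(gaugeFraction);
```

This makes the trade-off visible: centipawn ticks are uneven on the current gauge, while a linear -4..+4 gauge would instead make equal winning-chance steps uneven.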
I don't think this can work that well, at least if we want to keep it simple. I'll explain my thoughts, but TL;DR: too complicated to do properly. Properly converting a cp eval (or win/draw/loss from Stockfish/Leela) to an actual win% for a human game involves, in order of (I believe) importance:
The first is also the thornier issue. Being +3 in my games, at 800 rating, means almost nothing. Being +3 in a game between two GMs is almost surely absolutely winning. Solving these two issues is by no means impossible, but it would require at least:
And that leaves out corner cases with games between two very differently rated players. Additionally, there's a more philosophical issue: the gauge isn't objective anymore; it's based on the players playing that position... A simplification might be to do the regression only for the biggest bucket, e.g. midgame 1400-1600, which is likely better than doing it for masters.
That simplification (assuming USCF 1500-rated amateurs of equal strength) is what I aimed for. Comparing the win% distribution to the raw CP distribution, we see:
As it relates to this issue, then, what would you suggest the ticks or the gauge be rescaled to? Is the current win% already calibrated to 1500 USCF, or is that another piece of the puzzle that still needs to be done?
http://abema.tv/ (sometimes publicly available) shows, for several games, one way to present a win% bar. As for tick marks, I'd suggest a uniform distribution by win% (using any sigmoid tuned for any strength, not necessarily 1500 USCF; I submitted a preference years ago, and for good reasons the maintainers changed it several times). I'd suggest against raw-CP tick marks because even players as strong as GM Ashley do not understand CP values.
I would also like to suggest adding useful mouseover text for the engine bar, either indicating the current engine evaluation (and what it means) or somehow attaching mouseover text to the ticks. And I'd just like to add that assigning scores based on amateur-level play might be fishy. Dubious gambits may score particularly well at lower levels, but are not sound. I don't think the engine bar should show that a dubious (incorrect) move with a difficult-to-find refutation is a strong move only because amateurs often fall for the trick.
Thank you for the idea. I will also add the mouseover text. Regarding the amateur calibration, that's not how it works. The evaluation is still given by Stockfish at full strength, so if you try a questionable gambit with a strong refutation you lose winning chances because the cp score drops. What changes is the mapping: even though in a high-level game being a piece up might be an almost certain win, for an amateur it might be 70% (just a random example number to give an idea of what I'm talking about).
Ah, I see, thanks for the explanation. Should this winning percentage then somehow depend on the ratings of the players? (If you want to be truly adaptive, you could say that a 1200 who is a piece up against a 2500 still has only a 20% expected score, just because of the rating difference.)
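On the rating-difference point: the pure rating effect is already captured by the standard Elo expected-score formula, independent of any position eval. A sketch:

```typescript
// Standard Elo expected score for player A against player B:
// E_A = 1 / (1 + 10^((R_B - R_A) / 400)).
// This is the rating-only effect the comment describes, before any
// position eval is taken into account.
function expectedScore(ratingA: number, ratingB: number): number {
  return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

// A 400-point underdog scores ~9% on ratings alone; a 1300-point
// underdog essentially 0%. A truly adaptive bar would therefore
// have to blend eval and rating rather than use either in isolation.
const underdog = expectedScore(1200, 2500);
```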
It depends on what you are attempting to do:
Maybe if you are studying a game, the scores should then depend on your own rating, assuming you'll be playing players in your own rating range? (I wouldn't want an engine bar saying a piece up "is just a scratch" if I'd lose 99 out of 100 of those games.) Edit: Ah, I see that's already what you proposed above.
I took this because it's tagged as a good first issue, more to learn about the process and less about making something perfect, and it would be my first PR. I'd say I'll follow @ddugovic's advice and go with 10% win-percentage ticks from the default win% sigmoid formula already used in lila. Sound good?
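Placing ticks every 10% of win chance means inverting the sigmoid to find the eval each tick corresponds to. A sketch, assuming the same illustrative 0.004 constant (lila's actual constant has been retuned over time):

```typescript
// Centipawn eval at which the illustrative sigmoid
//   winPct = 100 / (1 + exp(-0.004 * cp))
// reaches a given win percentage; used to place ticks at uniform
// 10% win-chance intervals. The 0.004 constant is illustrative.
function cpForWinPct(pct: number): number {
  const p = pct / 100;
  return Math.log(p / (1 - p)) / 0.004;
}

// Evals behind each 10% tick. Under this constant the 60% tick sits
// near +1.0 pawns and the 90% tick near +5.5, so the outer ticks
// cover a wide eval range while the inner ones are tightly spaced.
const tickEvals = [10, 20, 30, 40, 50, 60, 70, 80, 90].map(cpForWinPct);
```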
@tmmlaarhoven I apologize for commandeering your feature request; it's just that through my decades of teaching, public speaking, writing, coding, and competing in numerous chess leagues, I've found the two errors I identified above to be common, and applying a sigmoid transformation to the data before visualizing it greatly reduces their incidence.
@ddugovic Don't worry about it! In no way is my personal opinion representative of the whole chess community, and all I want to do is make sure you thought about the design choices well. As long as you did, I trust you will make a good draft for how this could be implemented in a good way. |
Sorry I'm being slow by the way. I definitely hope to have this done by tomorrow. I've luckily just received a project to work on, so I can buy those lichess wings :P Haven't forgotten about it. |


When showing engine evaluations for a game (say in analysis mode, or watching a broadcast), there's the engine evaluation in the top-right corner of the screen, and there is an engine bar to the right of the board. This bar has horizontal dashes at the same height as the squares of the board, and one might guess that each of these dashes also corresponds to a +/-1 advantage. However, it seems that the engine evaluation needs to be approximately +1.3 to reach the first dash.
Would it not make more sense to make the dashes on this engine bar correspond to engine evaluations of -4, -3, -2, -1, 0, +1, +2, +3, +4? At least in top-level chess, a +4 advantage is "clearly winning" and further distinctions are not really necessary. It would also make it easier to see from the engine bar how big the advantage is according to the engine.
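The ~+1.3 figure is consistent with the gauge being on a winning-chance scale: with dashes at the eight square heights, the first dash above center sits at winning chance 0.25, and inverting the sigmoid (assuming the illustrative 0.004 constant) gives the eval needed to reach it. A sketch:

```typescript
// Check the "+1.3 for the first dash" observation. The gauge is on
// a winning-chance scale in [-1, 1]; with dashes at the 8 square
// heights, the first dash above center is at winning chance 0.25.
// Inverting chances = 2 / (1 + exp(-0.004 * cp)) - 1 (the 0.004
// constant is illustrative; lila has retuned it over time):
function cpForWinningChance(w: number): number {
  return Math.log((1 + w) / (1 - w)) / 0.004;
}

const firstDashCp = cpForWinningChance(0.25); // ≈ 128, i.e. about +1.3
```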