The Wayback Machine - https://web.archive.org/web/20200831022158/https://github.com/ornicar/lila/issues/6911

Engine bar units on analysis boards #6911

Open
tmmlaarhoven opened this issue Jul 1, 2020 · 17 comments

@tmmlaarhoven commented Jul 1, 2020

When showing engine evaluations for a game (say in analysis mode, or watching a broadcast), there's the engine evaluation in the top-right corner of the screen, and there is an engine bar to the right of the board. This bar has horizontal dashes at the same height as the squares of the board, and one might guess that each of these dashes also corresponds to a +/-1 advantage. However, it seems that the engine evaluation needs to be approximately +1.3 to reach the first dash.

Would it not make more sense to make the dashes on this engine bar correspond to engine evaluations of -4, -3, -2, -1, 0, +1, +2, +3, +4? At least in top-level chess, a +4 advantage is "clearly winning" and further distinctions are not really necessary. It would also make it easier to see from the engine bar how big the advantage is according to the engine.
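
A minimal sketch of the proposed linear scale, assuming the gauge is drawn as the percentage of the bar filled for White and that evaluations arrive in centipawns (both are assumptions for illustration; `linearGaugePercent` is a hypothetical name, not lila code). With eight equal segments, every dash is exactly one pawn and the bar's edges sit at -4 and +4:

```typescript
// Hypothetical linear gauge: 8 equal segments of one pawn each, so the
// internal dashes fall at -3, -2, ..., +3 and the edges at -4 and +4.
function linearGaugePercent(cp: number): number {
  const pawns = Math.max(-4, Math.min(4, cp / 100)); // clamp to [-4, +4]
  return 50 + pawns * 12.5; // 0 cp -> 50%, +400 cp or more -> 100%
}

console.log(linearGaugePercent(150)); // 68.75
```

Anything beyond ±4 simply pins the bar to an edge, which matches the idea that further distinctions are unnecessary once one side is clearly winning.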

@niklasf (Collaborator) commented Jul 2, 2020

The eval gauge is on a scale of winning chances (see curve on https://lichess.org/blog/WFvLpiQAACMA8e9D/learn-from-your-mistakes), not centipawns. But even there the ticks are arbitrary.
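
To see why the first dash sits near +1.3, assume the winning-chances curve is the logistic 2 / (1 + e^(-0.004·cp)) - 1 (the constant 0.004 is an assumption about lila's formula at the time, not stated in this thread) and that the bar fills in proportion to (chances + 1) / 2. Inverting the curve gives the evaluation behind each of the existing ticks at 62.5%, 75% and 87.5%:

```typescript
const K = 0.004; // assumed steepness of the winning-chances curve

// Winning chances in [-1, 1] for a centipawn evaluation.
function winningChances(cp: number): number {
  return 2 / (1 + Math.exp(-K * cp)) - 1;
}

// Inverse: which evaluation fills a given fraction (0..1) of the bar?
function cpForGaugeFraction(fraction: number): number {
  return Math.log(fraction / (1 - fraction)) / K;
}

for (const f of [0.625, 0.75, 0.875]) {
  console.log(`${f * 100}% -> ${(cpForGaugeFraction(f) / 100).toFixed(2)} pawns`);
}
// 62.5% -> 1.28 pawns
// 75% -> 2.75 pawns
// 87.5% -> 4.86 pawns
```

Under these assumptions the first dash corresponds to about +1.28, consistent with the roughly +1.3 observed in the original report.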

We could put the ticks at centipawn levels, which would look something like

```
|| |  |   |    |     |     |    |   |  | ||
```

or at more meaningful winning chance intervals (e.g. 10%)

```
|      |      |      |      |      |      |
```
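
Assuming the logistic curve 1 / (1 + e^(-0.004·cp)) for White's share of the bar (the constant is an assumption, not confirmed in this thread), the positions of whole-pawn ticks can be computed directly:

```typescript
const K = 0.004; // assumed curve steepness

// Percentage of the bar filled for White at a given evaluation.
function gaugePercent(cp: number): number {
  return 100 / (1 + Math.exp(-K * cp));
}

// Centipawn option: ticks at whole-pawn evaluations from -4 to +4.
const pawnTicks = [-4, -3, -2, -1, 0, 1, 2, 3, 4].map(p => gaugePercent(p * 100));

console.log(pawnTicks.map(p => p.toFixed(1)).join(" "));
// 16.8 23.1 31.0 40.1 50.0 59.9 69.0 76.9 83.2
```

The gaps between neighbouring pawn ticks shrink from about 9.9 to 6.3 percentage points toward the edges, which is the crowding drawn in the first sketch. The 10% option needs no computation: its ticks are simply at 10%, 20%, ..., 90% of the bar.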

@tmmlaarhoven (Author) commented Jul 2, 2020

@niklasf Thanks for the explanation.

Do you happen to know how these bars are defined on e.g. chess.com or chess24? I understand that if everyone makes a mistake, lichess doesn't have to repeat it, but being consistent with other sites makes the bar easier for users to understand as well.

And I understand how the winning chance curve may have been chosen, but were the constants actually fitted to real data? If Stockfish or LC0 were given 1000 random positions with a 70% winning chance and played each one out against itself, would it actually score 700-300?
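
One way to answer this empirically, as a sketch: take positions that were played out, record the curve's predicted expected score and the actual result, group them into 10% bins of prediction, and compare each bin's average prediction with its average result. The `Sample` record and the function name are hypothetical:

```typescript
interface Sample {
  predicted: number; // expected score for White from the curve, 0..1
  result: number;    // actual score for White: 1, 0.5 or 0
}

interface Bin { meanPredicted: number; meanResult: number; count: number }

// Group samples into equal-width bins of predicted score and average both
// columns; a well-calibrated curve gives meanPredicted ≈ meanResult per bin.
function calibrationTable(samples: Sample[], bins = 10): Bin[] {
  const sums = Array.from({ length: bins }, () => ({ p: 0, r: 0, n: 0 }));
  for (const s of samples) {
    const i = Math.min(bins - 1, Math.floor(s.predicted * bins)); // 1.0 goes to the top bin
    sums[i].p += s.predicted;
    sums[i].r += s.result;
    sums[i].n += 1;
  }
  return sums
    .filter(b => b.n > 0) // drop empty bins
    .map(b => ({ meanPredicted: b.p / b.n, meanResult: b.r / b.n, count: b.n }));
}
```

If the curve is well calibrated, a bin of 1000 positions predicted at 0.7 should show a mean result close to 0.7, i.e. a score of about 700-300.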

@ddugovic (Contributor) commented Jul 3, 2020

@tmmlaarhoven #1494 was initially based on many master games; however, this did not reflect results in online play between amateurs.

Incidentally now Stockfish also has a formula (based upon engine self-play):
official-stockfish/Stockfish@1100688
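
For comparison, a sketch of the shape of such an engine-derived model. It assumes the general logistic form of Stockfish's WDL output as we understand it: a win rate in per mille of 1000 / (1 + e^((a - x) / b)), with `a` and `b` fitted to self-play and varying with game ply, the loss rate taken as the opponent's win rate, and draws as the remainder. The `A` and `B` below are placeholder values for illustration, not Stockfish's fitted coefficients:

```typescript
// Placeholder parameters (NOT Stockfish's fit): A is the evaluation at which
// the win rate reaches 50%, B sets how quickly it rises around that point.
const A = 150;
const B = 60;

// Win rate in per mille for the side with evaluation x (centipawns).
function winPerMille(x: number): number {
  return Math.round(1000 / (1 + Math.exp((A - x) / B)));
}

// Win/draw/loss split: the loss rate is the opponent's win rate,
// and whatever probability remains is assigned to draws.
function wdl(x: number): [number, number, number] {
  const w = winPerMille(x);
  const l = winPerMille(-x);
  return [w, 1000 - w - l, l];
}

console.log(wdl(0));   // → [76, 848, 76]
console.log(wdl(150)); // → [500, 493, 7]
```

A single expected score can then be read off as (win + draw / 2) / 1000. With these placeholder values an even position comes out mostly drawn, which is the kind of information a single expected-score curve cannot show.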

@niklasf (Collaborator) commented Jul 3, 2020

The ticks are handled in this file, if anyone wants to experiment with it:

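```ts
import { h } from 'snabbdom';
import { VNode } from 'snabbdom/vnode';

let gaugeLast = 0; // declared alongside the ticks; not used in this excerpt
// 8 ticks at 12.5% steps (12.5%, 25%, ..., 100%); i === 3 is the 50% (zero) tick.
const gaugeTicks: VNode[] = [...Array(8).keys()].map(i =>
  h(i === 3 ? 'tick.zero' : 'tick', { attrs: { style: `height: ${(i + 1) * 12.5}%` } })
);
```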
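
For experimenting, the tick positions in the snippet above can be pulled out into a small pure function. This is a hypothetical refactoring (`tickHeights` is not a lila name): `divisions = 8` reproduces the current 12.5% ticks, and `divisions = 10` gives the 10% option:

```typescript
interface Tick { height: number; zero: boolean } // height in % of the bar

// Ticks at every 1/divisions of the bar, flagging the 50% tick as "zero".
// As in the original snippet, the last tick lands on the 100% edge.
// Each tick would then be rendered as
//   h(t.zero ? 'tick.zero' : 'tick', { attrs: { style: `height: ${t.height}%` } })
function tickHeights(divisions: number): Tick[] {
  return Array.from({ length: divisions }, (_, i) => ({
    height: ((i + 1) * 100) / divisions,
    zero: 2 * (i + 1) === divisions, // true only at exactly half the bar
  }));
}
```

Only even values of `divisions` produce a zero tick, since an odd number of segments has no tick at exactly 50%.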

@Eugenio-Bruno commented Jul 21, 2020

Hi, I'm "taking" this as a first issue to start contributing to the project.

I think I understand how the code works, so I'm first going to screenshot a few options we could take: changing the ticks to be CP-based, putting the ticks at CP levels, putting them at more meaningful win% intervals (e.g. 10%), or changing the gauge itself to go from -4 to +4 as originally suggested, which I think is a nice option. I'll screenshot all four options, and then we can discuss which one we want.

> @tmmlaarhoven #1494 initially was based upon many master games, however this did not reflect results in online play between amateurs.
>
> Incidentally now Stockfish also has a formula (based upon engine self-play):
> official-stockfish/Stockfish@1100688

I don't think this can work that well, at least if we want to keep it simple. I'll explain my thoughts, but TL;DR: it's too complicated to do properly. Properly converting a CP eval (or win/draw/loss from Stockfish/Leela) to an actual win% for a human game involves, in order of (I believe) importance:

  1. Adjusting for player rating
  2. Adjusting for opening/midgame/endgame stages

The first is also the thornier issue. Being +3 in my games, at 800 rating, means almost nothing; being +3 in a game between two GMs is almost surely winning. Solving these two issues is by no means impossible, but it would require at least:

  1. bucketing all evaluated positions from the Lichess PGN dumps by game stage and by rating, with buckets of the form (600-1000, opening), (600-1000, midgame), (600-1000, endgame), (1000-1200, opening), and so on
  2. fitting at least a linear regression from CP (or WDL) to winning chances within each bucket
  3. applying the per-bucket fit to games
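
The three steps could look roughly like this. Everything here is a hypothetical sketch: the `Position` record, the rating band boundaries, the phase labels, and the ordinary least-squares line per bucket (following step 2; a logistic fit would be the more usual choice for probabilities):

```typescript
type Phase = "opening" | "midgame" | "endgame";

interface Position {
  rating: number; // e.g. average rating of the two players
  phase: Phase;
  cp: number;     // engine evaluation for White, in centipawns
  result: number; // final score for White: 1, 0.5 or 0
}

// Hypothetical lower bounds of the rating bands (below 600 joins the first band).
const BANDS = [600, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2500];

function bucketKey(p: Position): string {
  let lo = BANDS[0];
  for (const b of BANDS) if (p.rating >= b) lo = b;
  return `${lo}/${p.phase}`;
}

// Step 1: bucket positions by (rating band, phase).
function bucketize(positions: Position[]): Map<string, Position[]> {
  const buckets = new Map<string, Position[]>();
  for (const p of positions) {
    const key = bucketKey(p);
    const list = buckets.get(key) ?? [];
    list.push(p);
    buckets.set(key, list);
  }
  return buckets;
}

// Step 2: least-squares line result ≈ intercept + slope * cp within one bucket.
function fitLine(ps: Position[]): { intercept: number; slope: number } {
  const n = ps.length;
  const mx = ps.reduce((s, p) => s + p.cp, 0) / n;
  const my = ps.reduce((s, p) => s + p.result, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (const p of ps) {
    sxy += (p.cp - mx) * (p.result - my);
    sxx += (p.cp - mx) ** 2;
  }
  const slope = sxx === 0 ? 0 : sxy / sxx;
  return { intercept: my - slope * mx, slope };
}

// Step 3: apply a bucket's fit to a new evaluation, clamped to [0, 1].
function predict(fit: { intercept: number; slope: number }, cp: number): number {
  return Math.max(0, Math.min(1, fit.intercept + fit.slope * cp));
}
```

The clamp in step 3 is needed because a straight line will eventually leave the [0, 1] range of possible scores.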

And that leaves out corner cases such as games between two very differently rated players.

Additionally, there's a more philosophical issue: the gauge isn't objective anymore, since it depends on who is playing the position...

A simplification might be just doing the regression for the biggest bucket, e.g. midgame 1400-1600, which is likely better than doing it for masters.

@ddugovic (Contributor) commented Jul 21, 2020

That simplification (assuming USCF 1500-rated amateurs of equal strength) is what I attempted to aim for.

Comparing the win% distribution to the raw CP distribution, we see:

  • In even positions, win% is sensitive to +/- 50 CP changes (players need to see mistakes in even positions - players often attribute losses to "oh I just missed a single tactic" when in reality they made a series of mistakes leading to a difficult position where a tactic was missed)
  • In positions which favor one player, win% is not sensitive to +/- 100 CP changes (dropping a piece when you're already down a rook is not a blunder)
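
Both bullet points follow from the shape of a sigmoid. Assuming the logistic curve 1 / (1 + e^(-0.004·cp)) for White's expected score (the constant is an assumption), a small helper shows how many percentage points a given eval change is worth at different starting evaluations:

```typescript
const K = 0.004; // assumed steepness of the expected-score curve

// Expected score (0..1) for White at evaluation cp.
function expectedScore(cp: number): number {
  return 1 / (1 + Math.exp(-K * cp));
}

// Change in expected score, in percentage points, when the eval moves from `from` to `to`.
function deltaPoints(from: number, to: number): number {
  return 100 * (expectedScore(to) - expectedScore(from));
}

console.log(deltaPoints(0, -100).toFixed(1));     // -9.9: losing a pawn in an even position
console.log(deltaPoints(-500, -600).toFixed(1));  // -3.6: the same pawn when already worse
console.log(deltaPoints(-900, -1000).toFixed(1)); // -0.9: the same pawn in a lost position
```

Under this curve, 50 CP in an even position (about 5 points) already matters more than 100 CP in a clearly worse one.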

@Eugenio-Bruno commented Jul 21, 2020

As it relates to this issue, then, what would you suggest the ticks or the gauge be rescaled to? Is the current win% already calibrated to 1500 USCF or is that another piece of the puzzle that still needs to be done?

@ddugovic (Contributor) commented Jul 21, 2020

http://abema.tv/ (sometimes publicly available) shows for several games how to illustrate a win% bar.

As for tick marks, I'd suggest a uniform distribution by win% (using any sigmoid tuned for any strength, not necessarily 1500 USCF; I submitted a preference years ago and for good reasons maintainers changed it several times). I'd suggest against "raw CP" tick marks because even players as strong as GM Ashley do not understand CP values.

@tmmlaarhoven (Author) commented Jul 21, 2020

I would also like to suggest adding useful mouseover text to the engine bar, either showing the current engine evaluation (and what it means) or giving the ticks their own mouseover texts.
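
A hypothetical sketch of such a tooltip string, assuming the logistic curve 1 / (1 + e^(-0.004·cp)) for White's expected score (the constant is an assumption) and centipawn input; `gaugeTitle` is an illustrative name, not an existing lila function:

```typescript
const K = 0.004; // assumed curve steepness

// Tooltip text for the eval gauge, e.g. "+1.50 (White expected score ≈ 65%)".
function gaugeTitle(cp: number): string {
  const pawns = (cp / 100).toFixed(2);
  const sign = cp > 0 ? "+" : ""; // negative numbers already carry their minus sign
  const score = Math.round(100 / (1 + Math.exp(-K * cp)));
  return `${sign}${pawns} (White expected score ≈ ${score}%)`;
}

console.log(gaugeTitle(150)); // +1.50 (White expected score ≈ 65%)
```

In a snabbdom view the string could be set via `attrs: { title: ... }` on the gauge element; mate scores would need separate handling.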

And I'd just like to add that assigning scores based on amateur level play might be fishy. Dubious gambits may score particularly well at lower levels, but are not sound. I don't think the engine bar should show that a dubious (incorrect) move with a difficult-to-find refutation is a strong move only because amateurs often fall for the trick.

@Eugenio-Bruno commented Jul 21, 2020

Thank you for the idea. I will also add the mouseover text.

Regarding the amateur calibration, that's not how it works. The evaluation is still given by Stockfish at full strength, so if you try a questionable gambit with a strong refutation, you lose winning chances because the CP score drops. What changes is the mapping: in a high-level game, being a piece ahead might be an almost certain win, while for an amateur it might be 70% (just a random example number to give an idea of what I mean).
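
The point can be made concrete: the engine's evaluation stays the same, and only the curve converting it into an expected score changes. A sketch assuming a logistic curve whose steepness depends on the players' level; both steepness values are made up for illustration, not fitted to any data:

```typescript
// Expected score for White at evaluation cp, for a curve of steepness k.
function expectedScore(cp: number, k: number): number {
  return 1 / (1 + Math.exp(-k * cp));
}

// Hypothetical steepness values: a steep curve for strong players
// (a piece is nearly decisive) and a flatter one for amateurs.
const K_MASTER = 0.01;
const K_AMATEUR = 0.003;

const pieceUp = 300; // roughly a minor piece, in centipawns
console.log(expectedScore(pieceUp, K_MASTER).toFixed(2));  // 0.95
console.log(expectedScore(pieceUp, K_AMATEUR).toFixed(2)); // 0.71
```

Because both curves are increasing in the evaluation, a refuted gambit that drives the evaluation negative still shows an expected score below 50% under either one.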

@tmmlaarhoven (Author) commented Jul 21, 2020

Ah, I see, thanks for the explanation. Should this winning percentage then somehow depend on the ratings of the players? (If you want to be truly adaptive, you could say that a 1200 being a piece up against a 2500 still has a 20% expected score just because of the rating difference.)
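
For reference, the rating-difference part on its own is the standard Elo expected score, E = 1 / (1 + 10^((R_opp - R_player) / 400)). How to combine it with the engine evaluation is an open design question; adding the two contributions in log-odds space, as below, is only one hypothetical choice, with an assumed eval-curve steepness of 0.004:

```typescript
// Standard Elo expected score for a player rated r against an opponent rated opp.
function eloExpected(r: number, opp: number): number {
  return 1 / (1 + Math.pow(10, (opp - r) / 400));
}

const K = 0.004; // assumed steepness of the eval curve

// Hypothetical combination: add the rating log-odds and the eval log-odds,
// then map the sum back to an expected score with the logistic function.
function combinedExpected(cp: number, r: number, opp: number): number {
  const ratingLogit = (Math.LN10 * (r - opp)) / 400; // log-odds of eloExpected(r, opp)
  return 1 / (1 + Math.exp(-(ratingLogit + K * cp)));
}
```

With an even evaluation this reduces to the Elo expected score. Under these particular assumptions the rating gap dominates: `combinedExpected(300, 1200, 2500)` comes out around 0.2%, so any such combination would need its own calibration.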

@ddugovic (Contributor) commented Jul 21, 2020

It depends what you are attempting to do:

  • If you are broadcasting a live match, predicting outcomes could be entertaining
  • If you are studying a game, the player ratings are stale anyway

@tmmlaarhoven (Author) commented Jul 21, 2020

Maybe if you are studying a game, the scores should then depend on your own rating, assuming you'll be playing opponents in your own rating range? (I wouldn't want an engine bar saying a piece up "is just a scratch" if I'd lose 99/100 of those games.)

Edit: Ah, I see that's already what you proposed above.

@Eugenio-Bruno commented Jul 21, 2020

I took this because it's tagged as a good first issue, more to learn about the process and less about making something perfect, and it would be my first PR.

I'd propose following @ddugovic's advice and going with ticks at 10% win-percentage intervals, computed from the default win% sigmoid formula already used in lila.
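
Under that plan each 10% tick corresponds to a fixed evaluation, which could also serve as the tick's mouseover label. A sketch assuming the sigmoid for White's share of the bar is 1 / (1 + e^(-0.004·cp)) (the constant is an assumption about lila's formula at the time):

```typescript
const K = 0.004; // assumed curve steepness

// Evaluation (centipawns) at which White's share of the bar equals `fraction`.
function cpAtFraction(fraction: number): number {
  return Math.log(fraction / (1 - fraction)) / K;
}

// Ticks at 10%, 20%, ..., 90%; the 50% tick is the zero line.
const labels = Array.from({ length: 9 }, (_, i) => {
  const fraction = (i + 1) / 10;
  const pawns = cpAtFraction(fraction) / 100;
  return `${Math.round(fraction * 100)}%: ${pawns > 0 ? "+" : ""}${pawns.toFixed(1)}`;
});

console.log(labels.join(", "));
// 10%: -5.5, 20%: -3.5, 30%: -2.1, 40%: -1.0, 50%: 0.0, 60%: +1.0, 70%: +2.1, 80%: +3.5, 90%: +5.5
```

The inner ticks then land close to whole pawns (±1.0, ±2.1), while the outer ones spread out to about ±5.5.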

Sounds good?

@ddugovic (Contributor) commented Jul 21, 2020

@tmmlaarhoven I apologize for commandeering your feature request. It's just that, through my decades of teaching, public speaking, writing, coding, and competing in numerous chess leagues, I've found the two errors I identified above to be common, and applying a sigmoid transformation to the data before visualizing it greatly reduces the incidence of those errors.

@tmmlaarhoven (Author) commented Jul 21, 2020

@ddugovic Don't worry about it! My personal opinion is in no way representative of the whole chess community; all I want is to make sure you've thought the design choices through. As long as you have, I trust you will come up with a good draft of how to implement this.

@Eugenio-Bruno commented Jul 22, 2020

Sorry I'm being slow, by the way. I definitely hope to have this done by tomorrow. I've luckily just received a project to work on, so I can buy those lichess wings :P

Haven't forgotten about it.

Projects: None yet
Linked pull requests: None yet
4 participants