Race condition in resync when a deregister occurs? #621
Confirmed the same Consul log output on a second instance (a day earlier):
Third instance, an hour ago:
Pattern: all three Deregister and Synced operations happen within the same second, which backs up my theory about a race condition. Most likely they are running in parallel goroutines?
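If the theory above is right, and resync and deregister run in parallel goroutines, one fix (the mutex/lock idea from the issue description) is to serialize them. The Go sketch below is hypothetical. `Bridge`, `Start`, `Deregister`, `Resync` and the in-memory `running`/`services` maps are illustrative stand-ins, not registrator's actual types or API. It holds a single `sync.Mutex` across the whole of `Resync`, covering both the snapshot and the writes, so a deregister cannot land in the middle and then be overwritten by stale data. It also assumes that a resync should prune entries whose container is gone. Whether this fully fixes the real bug depends on how registrator orders Docker's container listing against die events, and this sketch does not model that.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Service is a hypothetical stand-in for a registered service entry.
type Service struct {
	ID   string
	Port int
}

// Bridge is a hypothetical, simplified registrar (NOT registrator's real
// bridge type). One mutex guards both the view of running containers and
// the registered services, so resync and deregister cannot interleave.
type Bridge struct {
	mu       sync.Mutex
	running  map[string]int     // container ID -> port, as the runtime reports it
	services map[string]Service // entries currently registered in the backend
}

func NewBridge() *Bridge {
	return &Bridge{running: map[string]int{}, services: map[string]Service{}}
}

// Start records a running container and registers its service.
func (b *Bridge) Start(id string, port int) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.running[id] = port
	b.services[id] = Service{ID: id, Port: port}
}

// Deregister handles a container "die" event under the shared lock.
func (b *Bridge) Deregister(id string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	delete(b.running, id)
	delete(b.services, id)
}

// Resync re-registers every running container and prunes stale entries.
// The lock is held for the whole pass, so a concurrent Deregister either
// finishes before the snapshot or runs after all writes are done.
func (b *Bridge) Resync() {
	b.mu.Lock()
	defer b.mu.Unlock()
	for id, port := range b.running {
		b.services[id] = Service{ID: id, Port: port}
	}
	for id := range b.services { // deleting during range is safe in Go
		if _, ok := b.running[id]; !ok {
			delete(b.services, id)
		}
	}
}

// Registered returns the IDs currently registered, sorted.
func (b *Bridge) Registered() []string {
	b.mu.Lock()
	defer b.mu.Unlock()
	ids := make([]string, 0, len(b.services))
	for id := range b.services {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	b := NewBridge()
	b.Start("old-task", 8080) // hypothetical container being replaced
	b.Start("new-task", 8080) // hypothetical replacement from a rolling deploy

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); b.Resync() }()
	go func() { defer wg.Done(); b.Deregister("old-task") }()
	wg.Wait()

	fmt.Println(b.Registered()) // prints [new-task] whichever goroutine wins
}
```

Running it with `go run -race` should report no data race, and either goroutine ordering leaves only `new-task` registered. A channel-driven single event loop would serialize the two operations just as well; the mutex is shown only because it is what the issue suggests.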
Hit this again today...
Note: we now use a 300-second resync interval and still managed to hit it.
Do you happen to have a PR to fix this? I don't have a bunch of time to investigate/write a fix, but would be happy to review/merge something :)
Not yet; it's on my list to find time to (a) write a test that reproduces it and (b) fix the underlying issue.
Hit this again today; still haven't had time to open a PR :(
Note: doing further testing on master, as I realised v7 is quite old and a lot of new fixes have been applied. Will report back once I have any definitive results. |

Using gliderlabs/registrator:v7 with a bash wrapper to start it, the relevant parts at the bottom being:
See above.
/registrator_start.sh, the bash wrapper.
Tried to get them, but they seem to have been rotated out, unfortunately :(
Dockerfile for the application that is having issues.
Description of the problem:
The resync behaviour is racy, and possibly needs a mutex/lock between unregister events and resync events?
ServicePort which is never cleaned up, including on a resync.
How reproducible:
Steps to Reproduce:
Actual Results:
Note the Synced response after the Deregistered in this output (grepped by a088d8cbf79df4f9d001):
And the Consul service list for the IP in question. The first result is the stale entry (note no ServicePort); the second is a valid replacement service that was spawned by ECS during a rolling deploy and is correctly registered:
Expected Results: