Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upGitHub is where the world builds software
Millions of developers and companies build, ship, and maintain their software on GitHub — the largest and most advanced development platform in the world.
Vocabulary min_count should error when using a nonexistent namespace #2292
Comments
|
We agree it'd be great to have a fix here, since a number of us have hit it, but we're not sure what the right solution is presently. |
|
When iterating through token counts in the loop at line 553 (https://github.com/allenai/allennlp/blob/master/allennlp/data/vocabulary.py#L553), the An alternate method is iterating through all the keys in If you think either solution is suitable, I could implement it. |
|
It seems to me like the second solution is correct - (I think) we want to fall back to a default of 1 for namespaces that aren't mentioned, so the first solution doesn't work. @nelson-liu, what do you think? |
|
@nelson-liu @matt-gardner If this issue is still pending, I would like to take this up. If so, should I go forward with the second approach to solve it? |
|
Yes, @nadgeri14, that would be great. The line numbers referenced above have changed, so the place to do it is probably right above this line, at the top of allennlp/allennlp/data/vocabulary.py Line 436 in c9bd9b2 |
|
@matt-gardner Thanks, that was really helpful. I need a small help: Could you please provide a test case, as it will help me to validate my solution before submitting the PR. Thanks in advance for the help. |
|
You just need a |


If you use the
min_countparameter of theVocabulary, but you specify a namespace that does not exist, the vocabulary creation will just silently proceed. It'd be great if it could error in this case, perhaps by popping off the namespaces inmin_countand erroring if any are left at the end of vocab creation (would probably go at the end of https://github.com/allenai/allennlp/blob/master/allennlp/data/vocabulary.py#L454)?