The Wayback Machine - https://web.archive.org/web/20201030092740/https://github.com/VowpalWabbit/vowpal_wabbit/issues/1895

Allow multiple data files as input #1895

Open
sharathmalladi opened this issue May 28, 2019 · 6 comments

Comments

@sharathmalladi commented May 28, 2019

Describe the bug

Call vw with a bad argument and notice that vw still exits with code 0 instead of a non-zero error code. To detect whether vw rejected the arguments, we would have to read the output and look for a line that says "sailing on!", which is not a robust way to report an error.
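For reference, the workaround described above looks roughly like the following Python sketch. It assumes vw is on PATH; the helper names are hypothetical, and the only failure marker checked is the "sailing on!" string quoted in this issue, which is exactly what makes the approach brittle.

```python
from __future__ import annotations

import subprocess

# The warning quoted in this issue; matching on it is the brittle part.
FAILURE_MARKERS = ("sailing on!",)


def vw_run_failed(returncode: int, output: str) -> bool:
    """Treat a run as failed if vw exits non-zero or prints a known warning."""
    if returncode != 0:
        return True
    return any(marker in output for marker in FAILURE_MARKERS)


def run_vw(args: list[str]) -> bool:
    """Run vw (assumed to be on PATH) and return True if it looks successful."""
    proc = subprocess.run(["vw", *args], capture_output=True, text=True)
    # vw prints its progress report to stderr; check both streams to be safe.
    return not vw_run_failed(proc.returncode, proc.stdout + proc.stderr)
```

Any change to vw's wording would silently break this check, which is why a non-zero exit code is the better contract.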

To Reproduce

Steps to reproduce the behavior:
For example (note that the parameters "bad vw arguments" are invalid):
VW COMMAND:

E:\sharathm\github\sharathmalladi-mwt-ds\DataScience>vw bad vw arguments -d D:/tmp/124bb2ca-a99f-489e-b29c-bc142baa6f51\6359742a010048a58c1892eabd731d4c\6359742a010048a58c1892eabd731d4c_merged_data_2019-01-03_2019-01-03.json.gz -p D:/tmp/124bb2ca-a99f-489e-b29c-bc142baa6f51\6359742a010048a58c1892eabd731d4c\6359742a010048a58c1892eabd731d4c_merged_data_2019-01-03_2019-01-03.json.gz.Custom Policy 1.pred
predictions = D:/tmp/124bb2ca-a99f-489e-b29c-bc142baa6f51\6359742a010048a58c1892eabd731d4c\6359742a010048a58c1892eabd731d4c_merged_data_2019-01-03_2019-01-03.json.gz.Custom
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = bad
can't open 'bad', sailing on!
num sources = 0
average since example example current current current
loss last counter weight label predict features

finished run
number of examples = 0
weighted example sum = 0.000000
weighted label sum = 0.000000
average loss = n.a.
total feature number = 0

E:\sharathm\github\sharathmalladi-mwt-ds\DataScience>echo %ERRORLEVEL%
0

Expected behavior

The exit code of vw should be non-zero, since vw did not successfully write the predictions.

Observed Behavior

Instead, vw exits with code 0, and the only sign of the failure is a line in the output that reads:
can't open 'bad', sailing on!

Environment

What version of VW did you use?
8.6.1

What OS or language did you use?
Windows command line

Additional context

None

@jackgerrits (Member) commented May 29, 2019

In this situation, only 'bad' out of 'bad vw arguments' is looked at, as a positional parameter for the --data option. This is a shortcut that's been around for some time. 'vw' and 'arguments' are then ignored as unused values rather than parsed as options. The positional parameter actually overrides the value given by --data, and since 'bad' is not a file, a warning is printed when it can't be opened. VW then exits successfully, because there was simply no data to train on.
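To make the precedence concrete, here is a toy Python model of the resolution described above. It is not VW's actual option parser; the option set and the function name are hypothetical, and only the rule "the first positional token overrides --data, the rest are unused" is taken from this comment.

```python
from __future__ import annotations

# Hypothetical: options assumed to consume the following token as a value.
VALUE_OPTIONS = {"-d", "--data", "-p", "--predictions"}


def resolve_data_file(argv: list[str]) -> tuple[str | None, list[str]]:
    """Return (data file actually used, unused positional tokens)."""
    data = None
    positionals: list[str] = []
    i = 0
    while i < len(argv):
        tok = argv[i]
        if tok in VALUE_OPTIONS and i + 1 < len(argv):
            if tok in ("-d", "--data"):
                data = argv[i + 1]
            i += 2  # skip the option and its value
        elif tok.startswith("-"):
            i += 1  # a flag without a value
        else:
            positionals.append(tok)
            i += 1
    if positionals:
        data = positionals[0]  # the positional wins over --data
    return data, positionals[1:]
```

With the arguments from this issue, "bad" becomes the data file and "vw" and "arguments" are left over, even though -d names a real file.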

Yes, this seems counterintuitive. Handling the positional parameter in combination with the named parameter has been kind of tricky. I do agree this seems like a bug; I'm just not 100% sure how to deal with it yet.

@jackgerrits (Member) commented May 30, 2019

So I think what could be done here is to support multiple data files as input; then, if none of the files can be opened, VW would exit with a non-zero return code. You would also need to pass --no_stdin for this to work, though, since stdin is treated as another input file.
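A rough Python sketch of that proposal, assuming a hypothetical list of input paths plus a use_stdin flag standing in for the absence of --no_stdin. It only illustrates the exit-code rule; it is not VW code, and the warning text is made up.

```python
from __future__ import annotations

import sys
from typing import IO


def open_sources(paths: list[str], use_stdin: bool) -> list[IO[str]]:
    """Open every readable path and warn about the rest."""
    sources: list[IO[str]] = []
    for path in paths:
        try:
            sources.append(open(path, "r", encoding="utf-8"))
        except OSError:
            print(f"can't open '{path}', skipping", file=sys.stderr)
    if use_stdin:
        # stdin is treated as one more input source.
        sources.append(sys.stdin)
    return sources


def exit_code_for(paths: list[str], use_stdin: bool) -> int:
    """Return 1 when no input source could be opened, else 0."""
    sources = open_sources(paths, use_stdin)
    try:
        return 0 if sources else 1
    finally:
        for src in sources:
            if src is not sys.stdin:
                src.close()
```

With stdin enabled there is always at least one source, so the run can never fail on missing files alone; that is why --no_stdin would be required.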

@lokitoth added this to Needs triage in Bug Triage via automation Jun 20, 2019
@arielf (Collaborator) commented Jun 21, 2019

+1 to supporting multiple data files as input.
I've wanted this feature for a long time.

@jackgerrits removed this from Needs triage in Bug Triage Jun 28, 2019
@jackgerrits changed the title "vw should output a non-zero error code if the arguments are invalid" to "Allow multiple data files as input" Jul 19, 2019
@jackgerrits (Member) commented Mar 19, 2020

#2355 additionally proposed support for globbing as well as passing a directory to the -d option.
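One way a -d value could be expanded under that proposal, sketched in Python with the standard glob and os modules. The function name and the rules (a directory means its regular files, a value containing *, ? or [ is globbed, anything else passes through unchanged) are assumptions, not a design agreed in this thread.

```python
from __future__ import annotations

import glob
import os


def expand_data_arg(arg: str) -> list[str]:
    """Expand one -d value into concrete input paths (hypothetical rules)."""
    if os.path.isdir(arg):
        # A directory stands for every regular file directly inside it.
        entries = (os.path.join(arg, name) for name in os.listdir(arg))
        return sorted(p for p in entries if os.path.isfile(p))
    if any(ch in arg for ch in "*?["):
        # A wildcard pattern is expanded; sorting keeps the order deterministic.
        return sorted(glob.glob(arg))
    # Anything else is a literal path, left for the caller to try to open.
    return [arg]
```

Passing unmatched literal paths through unchanged keeps the "can't open" reporting in one place, where a multi-file exit-code rule could see it.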

@Sharad24 (Contributor) commented Mar 19, 2020

@jackgerrits Is anyone working on this issue?

@jackgerrits (Member) commented Mar 19, 2020

No, you're welcome to work on this.
