-
Notifications
You must be signed in to change notification settings - Fork 705
Execute xgboost train code generated by IR #982
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Execute xgboost train code generated by IR #982
Conversation
|
I ran But runing the python code manually, the result looks good: I will check this out. |
| FROM housing.test | ||
| PREDICT housing.xgb_predict.target | ||
| USING sqlflow_models.my_xgb_regression_model;`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add some indents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure which style of the SQL string is better. How about following the former one?
In fact, I think this style is pretty good.
typhoonzero
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM++
|
CI currently will fail because of pingcap/tidb#12648 |
7f3b20b to
d387151
Compare
| if err != nil { | ||
| return err | ||
| } | ||
| code, err := xgb.Train(ir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about refining the interface xgb.Train(ir) to xgb.Train(&program, ir), that don't need to write String to the bytes.Buffer twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still planning to use string because it is easier for people to understand. And we don't think too much about the performance here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think either way is fine, as long as it is consistent across all IR codegen. :)
2cd2892 to
6f0f18d
Compare
tonyyang-svail
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I left a few comments for possible improvements.
| dtrain = xgb_dataset('train.txt', '''{{.TrainSelect}}''') | ||
| dtest = xgb_dataset('test.txt', '''{{.ValidationSelect}}''') | ||
| # FIXME(weiguoz): bring dtest back when VALIDATE clause is ready |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Maybe add a check in Train to make sure ValidationSelect in TrainIR is always empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be done in #988
| if err != nil { | ||
| return err | ||
| } | ||
| code, err := xgb.Train(ir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think either way is fine, as long as it is consistent across all IR codegen. :)
6f0f18d to
21f6dd6
Compare
* execute xgboost train code generated by IR * add test_ir * escape % * fix ci * save model * ci for xgboost.train * remove model save * avoid starting the server twice
Fix #980