setup xgboost on sqlflow #662
@sperlingxx Thanks for submitting this PR. Left a few comments.
@Yancey1989 please kindly continue the review.
    FROM iris.test
    PREDICT iris.predict
    WITH
      append_columns = [sepal_length, sepal_width, petal_length, petal_width],
Is it possible to avoid append_columns? A training job should record the field names so that a later prediction job can reuse them.
    fmt.Fprintf(&b, "%s %s, ", r.DetailColumn, stype)
    }
    // add encoding column
    if len(r.EncodingColumn) > 0 {
What is the encoding column used for?
The encoding column stores the index of the leaf that this sample falls into in each tree. We serialize the leaf indices as a string formatted like "index_0,index_1,...,index_n".
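To make the format concrete, here is a minimal Go sketch of that encoding, assuming the leaf indices for one sample arrive as a slice of ints, one per tree. The helper names `encodeLeafIndices` and `decodeLeafIndices` are hypothetical and are not taken from this PR.

```go
package main

import (
	"strconv"
	"strings"
)

// encodeLeafIndices joins per-tree leaf indices into "index_0,index_1,...,index_n".
// Hypothetical helper for illustration; not part of this PR.
func encodeLeafIndices(leaves []int) string {
	parts := make([]string, len(leaves))
	for i, leaf := range leaves {
		parts[i] = strconv.Itoa(leaf)
	}
	return strings.Join(parts, ",")
}

// decodeLeafIndices reverses encodeLeafIndices.
// An empty string decodes to no indices.
func decodeLeafIndices(encoded string) ([]int, error) {
	if encoded == "" {
		return nil, nil
	}
	fields := strings.Split(encoded, ",")
	leaves := make([]int, len(fields))
	for i, f := range fields {
		n, err := strconv.Atoi(f)
		if err != nil {
			return nil, err
		}
		leaves[i] = n
	}
	return leaves, nil
}
```

For example, a sample that lands in leaves 3, 0 and 7 of three trees would be stored as "3,0,7".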
    FROM iris.test
    PREDICT iris.predict
    WITH
      append_columns = [sepal_length, sepal_width, petal_length, petal_width],
Are prob_column and detail_column required? Can we add these columns by default, so the WITH statement can be shorter?
prob_column, detail_column, leaf_column, and append_columns are optional. Only result_column is required for a prediction task, and it defaults to "result".
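As a rough illustration of those rules, the Go sketch below fills in the default result column and skips any optional column that was not set. The `predictColumns` type, its methods, and the output column order are assumptions made for this example, not the code in this PR.

```go
package main

// predictColumns is a hypothetical holder for the WITH attributes discussed above.
type predictColumns struct {
	ResultColumn  string   // required, defaults to "result"
	ProbColumn    string   // optional
	DetailColumn  string   // optional
	LeafColumn    string   // optional
	AppendColumns []string // optional
}

// withDefaults returns a copy with the default result column filled in.
func (c predictColumns) withDefaults() predictColumns {
	if c.ResultColumn == "" {
		c.ResultColumn = "result"
	}
	return c
}

// outputColumns lists the columns to write, leaving out unset optional ones.
// The ordering here is an assumption chosen for the example.
func (c predictColumns) outputColumns() []string {
	c = c.withDefaults()
	cols := []string{c.ResultColumn}
	for _, name := range []string{c.ProbColumn, c.DetailColumn, c.LeafColumn} {
		if name != "" {
			cols = append(cols, name)
		}
	}
	return append(cols, c.AppendColumns...)
}
```

With nothing set, only the "result" column is produced, so the WITH clause could be omitted for the simplest prediction job.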
Can we keep this consistent with the TF example and specify the result column in the PREDICT clause, as in PREDICT iris.predict.result?
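One way this could work is to split the PREDICT target on its last dot. The sketch below assumes a fully qualified database.table.column target and falls back to the default "result" column when no column is given. `splitPredictTarget` is a hypothetical name and is not how SQLFlow parses the clause.

```go
package main

import "strings"

// splitPredictTarget separates "db.table.column" into table and column.
// Hypothetical helper: a target with fewer than three dot-separated parts is
// treated as a bare table name and gets the default "result" column.
func splitPredictTarget(target string) (table, column string) {
	parts := strings.Split(target, ".")
	if len(parts) < 3 {
		return target, "result"
	}
	last := len(parts) - 1
	return strings.Join(parts[:last], "."), parts[last]
}
```

Under these assumptions, iris.predict.result and plain iris.predict both resolve to the table iris.predict with the column result.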
LGTM with only one comment; it can be fixed in this PR or the next one.
This PR includes: