Skip to content

Conversation

@sperlingxx
Copy link
Contributor

@sperlingxx sperlingxx commented Aug 14, 2019

This PR include:

  • update version of xgboost-launcher in build_image
  • python main function and xgboost code template based on it
  • xgCreatePredictionTable
  • xgboostFiller refine
  • integrate xgboost into sqlflow.executor
  • integrated test case

Copy link
Collaborator

@tonyyang-svail tonyyang-svail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sperlingxx Thanks for submitting this PR. Left a few comments.

@Yancey1989 please kindly continue the review.

FROM iris.test
PREDICT iris.predict
WITH
append_columns = [sepal_length, sepal_width, petal_length, petal_width],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to avoid append_columns? A training job should memorize the field names later used for a prediction job.

fmt.Fprintf(&b, "%s %s, ", r.DetailColumn, stype)
}
// add encoding column
if len(r.EncodingColumn) > 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the usage of the encoding column?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Encoding column stores leaf indices of this sample in each tree. We transform leaf indices in a string which format like: "index_0,index_1,......,index_n"

FROM iris.test
PREDICT iris.predict
WITH
append_columns = [sepal_length, sepal_width, petal_length, petal_width],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does prob_column , detail_column is required? Can we add these columns by default, so the WITH statment can be shorter...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prob_column, detail_column, leaf_column and append_columns are optional. Only result_column is required for prediction task, which has a default column field: "result".

Copy link
Collaborator

@Yancey0623 Yancey0623 Aug 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can keep consistent with the TF example, specific result column in PREDICT iris.predict.result?

Copy link
Collaborator

@Yancey0623 Yancey0623 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGMT with only one comment, can fix it in this PR or the next PR.

@sperlingxx sperlingxx merged commit da6c1a2 into sql-machine-learning:develop Aug 15, 2019
@sperlingxx sperlingxx deleted the xgboost_dev branch August 15, 2019 10:02
@sperlingxx sperlingxx mentioned this pull request Aug 16, 2019
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants