Collaborator

@weiguoz weiguoz commented Oct 11, 2019

Fix #980

Collaborator Author

weiguoz commented Oct 11, 2019

I ran bash scripts/test_ir.sh and got an error:

time="2019-10-11T15:50:32Z" level=debug msg="runExtendedSQL SELECT *\nFROM housing.train\nTRAIN xgboost.gbtree\nWITH\n\t\tobjective=\"reg:squarederror\",\n\t\ttrain.num_boost_round = 30\n\t\tCOLUMN f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,f12,f13\nLABEL target\nINTO sqlflow_models.my_xgb_regression_model; finished, elapsed:2.6401397s" package=sql
time="2019-10-11T15:50:32Z" level=error msg="runExtendedSQL error:training failed exit status 1" package=sql
2019/10/11 15:50:32 stream read err: rpc error: code = Unknown desc = training failed exit status 1

However, when I run the generated Python code manually, the result looks good:

[15:51:37] 404x13 matrix with 5252 entries loaded from train.txt

I will check this out.

@weiguoz weiguoz changed the title from "[WIP] Execute xgboost train code generated by IR" to "Execute xgboost train code generated by IR" on Oct 12, 2019
Comment on lines 1082 to 1094
FROM housing.test
PREDICT housing.xgb_predict.target
USING sqlflow_models.my_xgb_regression_model;`)
Collaborator

Add some indents?

Collaborator Author

I'm not sure which style for the SQL string is better. How about following the former style?
In fact, I think this style is pretty good.

typhoonzero previously approved these changes Oct 12, 2019
Collaborator

@typhoonzero typhoonzero left a comment

LGTM++

@typhoonzero
Collaborator

CI will currently fail because of pingcap/tidb#12648.

if err != nil {
	return err
}
code, err := xgb.Train(ir)
Collaborator

How about refining the interface from xgb.Train(ir) to xgb.Train(&program, ir), so that we don't need to write the string to the bytes.Buffer twice?

Collaborator Author

I still plan to return a string because it is easier for people to understand, and we are not too concerned about performance here.

Collaborator

@tonyyang-svail tonyyang-svail Oct 12, 2019

I think either way is fine, as long as it is consistent across all IR codegen. :)
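For the string vs. bytes.Buffer question in this thread, the two styles need not diverge. Below is a minimal sketch, assuming a trimmed-down, hypothetical trainIR struct and a one-line template modeled on the xgb_dataset line in this PR's Python template. A writer-based trainTo renders straight into any io.Writer, such as a caller's bytes.Buffer, and the string-returning train is a thin wrapper over it. The names trainIR, trainTo, train and the template text are illustrative only, not SQLFlow's actual code.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"text/template"
)

// trainIR is a hypothetical, trimmed-down stand-in for the IR passed to xgb.Train.
type trainIR struct {
	TrainSelect string
}

// trainTemplate is an illustrative fragment modeled on the generated Python code.
var trainTemplate = template.Must(template.New("train").Parse(
	"dtrain = xgb_dataset('train.txt', '''{{.TrainSelect}}''')\n"))

// trainTo renders the program directly into w (the io.Writer style).
func trainTo(w io.Writer, ir *trainIR) error {
	return trainTemplate.Execute(w, ir)
}

// train returns the program as a string (the style kept in this PR). It wraps
// trainTo, so both signatures share a single rendering path.
func train(ir *trainIR) (string, error) {
	var b strings.Builder
	if err := trainTo(&b, ir); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	code, err := train(&trainIR{TrainSelect: "SELECT * FROM housing.train"})
	if err != nil {
		panic(err)
	}
	fmt.Print(code)
}
```

With this split, the string signature costs one extra copy through a strings.Builder. That is negligible when performance is not a concern, and it still allows switching every IR codegen to the io.Writer form later if the team prefers consistency that way.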

@weiguoz weiguoz force-pushed the execute_ir_xgboost branch from 2cd2892 to 6f0f18d Compare October 12, 2019 16:52
tonyyang-svail previously approved these changes Oct 12, 2019
Collaborator

@tonyyang-svail tonyyang-svail left a comment

LGTM. I left a few comments for possible improvements.

dtrain = xgb_dataset('train.txt', '''{{.TrainSelect}}''')
dtest = xgb_dataset('test.txt', '''{{.ValidationSelect}}''')
# FIXME(weiguoz): bring dtest back when VALIDATE clause is ready
Collaborator

Sure. Maybe add a check in Train to make sure ValidationSelect in TrainIR is always empty.

Collaborator Author

Will be done in #988

@weiguoz weiguoz merged commit c8a2832 into sql-machine-learning:develop Oct 13, 2019
@weiguoz weiguoz deleted the execute_ir_xgboost branch October 13, 2019 00:11
shendiaomo pushed a commit to shendiaomo/sqlflow that referenced this pull request Oct 22, 2019
* execute xgboost train code generated by IR

* add test_ir

* escape %

* fix ci

* save model

* ci for xgboost.train

* remove model save

* avoid starting the server twice
Development

Successfully merging this pull request may close these issues.

[Intermediate Representation] Enable SQLFlow to execute the XGB train code generated by IR

5 participants