
Commit 32fb9df

tonyyang-svail authored and wangkuiyi committed
[Syntax] Change PREDICT to TO PREDICT (#1015)
* pass all tests in pkg/sql
* pass lexer test
* pass tests at pkg/sql/
1 parent 09a110c commit 32fb9df

34 files changed: +145 −125 lines
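In short, the extended predict statement now spells its clause TO PREDICT instead of a bare PREDICT, mirroring the existing TO TRAIN clause. Using the iris example from the README diff below:

```sql
-- Before this commit (old syntax):
SELECT * FROM iris.test
PREDICT iris.predict.class
USING sqlflow_models.my_dnn_model;

-- After this commit (new syntax):
SELECT * FROM iris.test
TO PREDICT iris.predict.class
USING sqlflow_models.my_dnn_model;
```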

README.md

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ Done training
 ```sql
 sqlflow> SELECT *
 FROM iris.test
-PREDICT iris.predict.class
+TO PREDICT iris.predict.class
 USING sqlflow_models.my_dnn_model;
 
 ...

cmd/sqlflowserver/main_test.go

Lines changed: 4 additions & 4 deletions
@@ -551,7 +551,7 @@ INTO sqlflow_models.my_dnn_model;`, caseDB, caseTrainTable)
 ParseRow(stream)
 predSQL := fmt.Sprintf(`SELECT *
 FROM %s.%s
-PREDICT %s.%s.class
+TO PREDICT %s.%s.class
 USING sqlflow_models.my_dnn_model;`, caseDB, caseTestTable, caseDB, casePredictTable)
 
 stream, err = cli.Run(ctx, sqlRequest(predSQL))
@@ -610,7 +610,7 @@ INTO sqlflow_models.my_dnn_model_custom;`
 
 predSQL := `SELECT *
 FROM iris.test
-PREDICT iris.predict.class
+TO PREDICT iris.predict.class
 USING sqlflow_models.my_dnn_model_custom;`
 
 stream, err = cli.Run(ctx, sqlRequest(predSQL))
@@ -1014,7 +1014,7 @@ INTO sqlflow_models.my_regression_model;`)
 
 predSQL := fmt.Sprintf(`SELECT *
 FROM housing.test
-PREDICT housing.predict.target
+TO PREDICT housing.predict.target
 USING sqlflow_models.my_regression_model;`)
 
 stream, err = cli.Run(ctx, sqlRequest(predSQL))
@@ -1090,7 +1090,7 @@ func CasePredictXGBoostRegression(t *testing.T) {
 
 predSQL := fmt.Sprintf(`SELECT *
 FROM housing.test
-PREDICT housing.xgb_predict.target
+TO PREDICT housing.xgb_predict.target
 USING sqlflow_models.my_xgb_regression_model;`)
 
 stream, err := cli.Run(ctx, sqlRequest(predSQL))

doc/design/design_ant_xgboost.md

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ INTO sqlflow_models.xgboost_model_table;
 select
 c1, c2, c3, c4
 from kaggle_credit_fraud_development_data
-PREDICT kaggle_credit_fraud_development_data.class
+TO PREDICT kaggle_credit_fraud_development_data.class
 USING sqlflow_models.xgboost_model_table;
 ```
 
doc/design/design_clustermodel.md

Lines changed: 3 additions & 3 deletions
@@ -38,12 +38,12 @@ USING existed_pretrain_model
 INTO my_cluster_model;
 ```
 
-PREDICT SQL:
+TO PREDICT SQL:
 
 ``` sql
 SELECT *
 FROM input_table
-PREDICT output_table.group_id
+TO PREDICT output_table.group_id
 USING my_cluster_model;
 ```
 
@@ -108,7 +108,7 @@ Therefore, there are four cases in total:
 
 - In the first stage of the clustering model on SQLFlow, we plan to achieve the `first case`. We will achieve the other cases in the later.
 
-- Users can use the trained cluster model in ` PREDICT SQL` to predict the group of input_table to get output_table.
+- Users can use the trained cluster model in ` TO PREDICT SQL` to predict the group of input_table to get output_table.
 
 - Finally, the user can perform a combined aggregation operation on the output_table based on the SQL statement to obtain a result_table, which can be saved to the local dataframe and then analyzed according to his own needs.
 
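The last context line above leaves the aggregation step abstract. A minimal sketch of such a follow-up query, assuming the output_table written by TO PREDICT; the cnt and feature_1 names are placeholders, not columns defined in the design doc:

```sql
-- Hypothetical aggregation over the predicted groups; feature_1 stands in
-- for any input column, and group_id is the column written by TO PREDICT.
SELECT group_id, COUNT(*) AS cnt, AVG(feature_1) AS avg_feature_1
FROM output_table
GROUP BY group_id;
```
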
doc/design/design_database_abstraction_layer.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ SQLFlow calls Go's [standard database API](https://golang.org/pkg/database/sql/)
 
 ### Data Retrieval
 
-The basic idea of SQLFlow is to extend the SELECT statement of SQL to have the TO TRAIN and PREDICT clauses. For more discussion, please refer to the [syntax design](/doc/design/design_syntax.md). SQLFlow translates such "extended SQL statements" into submitter programs, which forward the part from SELECT to TO TRAIN or PREDICT, which we call the "standard part", to the SQL engine. SQLFlow also accepts the SELECT statement without TO TRAIN or PREDICT clauses and would forward such "standard statements" to the engine. It is noticeable that the "standard part" or "standard statements" are not standardized. For example, various engines use different syntax for `FULL OUTER JOIN`.
+The basic idea of SQLFlow is to extend the SELECT statement of SQL to have the TO TRAIN and TO PREDICT clauses. For more discussion, please refer to the [syntax design](/doc/design/design_syntax.md). SQLFlow translates such "extended SQL statements" into submitter programs, which forward the part from SELECT to TO TRAIN or TO PREDICT, which we call the "standard part", to the SQL engine. SQLFlow also accepts the SELECT statement without TO TRAIN or TO PREDICT clauses and would forward such "standard statements" to the engine. It is noticeable that the "standard part" or "standard statements" are not standardized. For example, various engines use different syntax for `FULL OUTER JOIN`.
 
 - Hive supports `FULL OUTER JOIN` directly.
 - MySQL doesn't have `FULL OUTER JOIN`. However, a user can emulates `FULL OUTER JOIN` using `LEFT JOIN`, `UNION` and `RIGHT JOIN`.
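The MySQL emulation mentioned in the last context line is commonly written as follows; a sketch with hypothetical tables a(id, x) and b(id, y), where UNION also removes the duplicate rows produced when both joins match:

```sql
-- Hypothetical tables a(id, x) and b(id, y); emulates FULL OUTER JOIN in MySQL.
SELECT a.id, a.x, b.y
FROM a LEFT JOIN b ON a.id = b.id
UNION
SELECT b.id, a.x, b.y
FROM a RIGHT JOIN b ON a.id = b.id;
```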

doc/design/design_elasticdl_on_sqlflow.md

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ INTO trained_elasticdl_keras_classifier;
 SELECT
 c1, c2, c3, c4
 FROM prediction_data
-PREDICT prediction_results_table
+TO PREDICT prediction_results_table
 WITH
 num_classes = 10
 USING trained_elasticdl_keras_classifier;

doc/design/design_submitter.md

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ type TrainDescription struct {
 
 // SELECT *
 // FROM iris.test
-// PREDICT iris.predict.class
+// TO PREDICT iris.predict.class
 // USING sqlflow_models.my_dnn_model;
 type PredDescription struct {
 StandardSelect string // e.g. SELECT * FROM iris.test

doc/design/design_support_multiple_sql_statements.md

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ While splitting at the client-side is relatively simple to implement. We prefer
 
 ### Splitting Technique: Hybrid Parser vs. Lexer
 
-The hybrid parser solution uses the third-party SQL parser (like [TiDB parser](https://github.com/pingcap/parser/blob/master/parser.y)) and SQLFlow parser to determine the end of an SQL statement. The third-party SQL parser first parses the extended SQL statement. It will raise error near SQLFlow extended keywords, like TO TRAIN and PREDICT. Then the SQLFlow parser starts from the error position and stops at the end of the first statement. However, this solution relies on the third-party SQL parser to report the error **accurately** on the keywords, like TO TRAIN and PREDICT, that it can't recognize.
+The hybrid parser solution uses the third-party SQL parser (like [TiDB parser](https://github.com/pingcap/parser/blob/master/parser.y)) and SQLFlow parser to determine the end of an SQL statement. The third-party SQL parser first parses the extended SQL statement. It will raise error near SQLFlow extended keywords, like TO TRAIN and TO PREDICT. Then the SQLFlow parser starts from the error position and stops at the end of the first statement. However, this solution relies on the third-party SQL parser to report the error **accurately** on the keywords, like TO TRAIN and TO PREDICT, that it can't recognize.
 
 The lexer solution scans the entire SQL statements, finds the `;` tokens, and splits the SQL based on the position of `;` token.

doc/design/design_syntax.md

Lines changed: 1 addition & 1 deletion
@@ -131,7 +131,7 @@ Similarly, to infer the class (fraud or regular), we could
 
 ```sql
 SELECT * FROM kaggle_credit_fraud_development_data
-PREDICT kaggle_credit_fraud_development_data.class
+TO PREDICT kaggle_credit_fraud_development_data.class
 USING sqlflow_models.my_model_table;
 ```

doc/design/design_xgboost_on_sqlflow.md

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@ The following example shows how to predict using the model `my_xgb_model`.
 
 ``` sql
 SELECT * FROM test_table
-PREDICT pred_table.result
+TO PREDICT pred_table.result
 USING my_xgb_model;
 ```
 
@@ -42,4 +42,4 @@ The code generator `codegen_xgboost.go` outputs an XGBoost program in Python. It
 1. It tells the SQL engine to run the SELECT statement and retrieve the training/test data. It saves the data into a text file, which could be loaded by XGBoost using the DMatrix interface.
 1. Parse and resolve the WITH clause to fill the `xgboost.train` arguments and the XGBoost Parameters.
 1. Save the trained model on disk.
-1. For the PREDICT clause, it loads the trained model and test data and then outputs the prediction result to a SQL engine.
+1. For the TO PREDICT clause, it loads the trained model and test data and then outputs the prediction result to a SQL engine.
