doc/design/design_clustermodel.md
+3 -3 (3 additions & 3 deletions)
@@ -38,12 +38,12 @@ USING existed_pretrain_model
 INTO my_cluster_model;
 ```
 
-PREDICT SQL:
+TO PREDICT SQL:
 
 ```sql
 SELECT *
 FROM input_table
-PREDICT output_table.group_id
+TO PREDICT output_table.group_id
 USING my_cluster_model;
 ```
 
@@ -108,7 +108,7 @@ Therefore, there are four cases in total:
 
 - In the first stage of the clustering model on SQLFlow, we plan to achieve the `first case`. We will achieve the other cases in the later.
 
-- Users can use the trained cluster model in `PREDICT SQL` to predict the group of input_table to get output_table.
+- Users can use the trained cluster model in `TO PREDICT SQL` to predict the group of input_table to get output_table.
 
 - Finally, the user can perform a combined aggregation operation on the output_table based on the SQL statement to obtain a result_table, which can be saved to the local dataframe and then analyzed according to his own needs.
-The basic idea of SQLFlow is to extend the SELECT statement of SQL to have the TO TRAIN and PREDICT clauses. For more discussion, please refer to the [syntax design](/doc/design/design_syntax.md). SQLFlow translates such "extended SQL statements" into submitter programs, which forward the part from SELECT to TO TRAIN or PREDICT, which we call the "standard part", to the SQL engine. SQLFlow also accepts the SELECT statement without TO TRAIN or PREDICT clauses and would forward such "standard statements" to the engine. It is noticeable that the "standard part" or "standard statements" are not standardized. For example, various engines use different syntax for `FULL OUTER JOIN`.
+The basic idea of SQLFlow is to extend the SELECT statement of SQL to have the TO TRAIN and TO PREDICT clauses. For more discussion, please refer to the [syntax design](/doc/design/design_syntax.md). SQLFlow translates such "extended SQL statements" into submitter programs, which forward the part from SELECT to TO TRAIN or TO PREDICT, which we call the "standard part", to the SQL engine. SQLFlow also accepts the SELECT statement without TO TRAIN or TO PREDICT clauses and would forward such "standard statements" to the engine. It is noticeable that the "standard part" or "standard statements" are not standardized. For example, various engines use different syntax for `FULL OUTER JOIN`.
 
 - Hive supports `FULL OUTER JOIN` directly.
 - MySQL doesn't have `FULL OUTER JOIN`. However, a user can emulates `FULL OUTER JOIN` using `LEFT JOIN`, `UNION` and `RIGHT JOIN`.
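
Reviewer note: the emulation mentioned in the MySQL bullet can be illustrated with a small sketch. It runs against an in-memory SQLite database through Python's `sqlite3` module as a stand-in for MySQL. The tables `a` and `b` and their columns are hypothetical. Because older SQLite versions lack `RIGHT JOIN`, the second branch is written as a `LEFT JOIN` with the tables swapped, which is what `a RIGHT JOIN b` would express in MySQL.

```python
import sqlite3

# Hypothetical toy tables; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, x TEXT);
    CREATE TABLE b (id INTEGER, y TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (2, 'b2'), (3, 'b3');
""")

# First branch: every row of a, with matching rows of b when they exist.
# Second branch: every row of b, with matching rows of a when they exist
# (in MySQL this branch could be written as "a RIGHT JOIN b").
# UNION merges both branches and drops the rows they have in common.
emulated_full_outer_join = """
    SELECT a.id AS a_id, a.x, b.id AS b_id, b.y
    FROM a LEFT JOIN b ON a.id = b.id
    UNION
    SELECT a.id AS a_id, a.x, b.id AS b_id, b.y
    FROM b LEFT JOIN a ON a.id = b.id
"""
rows = conn.execute(emulated_full_outer_join).fetchall()
for row in rows:
    print(row)
```

One caveat of this pattern is that `UNION` also collapses rows that are genuine duplicates in the inputs, so it matches a true `FULL OUTER JOIN` only when joined rows are distinct.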
doc/design/design_support_multiple_sql_statements.md
+1 -1 (1 addition & 1 deletion)
@@ -20,7 +20,7 @@ While splitting at the client-side is relatively simple to implement. We prefer
 
 ### Splitting Technique: Hybrid Parser vs. Lexer
 
-The hybrid parser solution uses the third-party SQL parser (like [TiDB parser](https://github.com/pingcap/parser/blob/master/parser.y)) and SQLFlow parser to determine the end of an SQL statement. The third-party SQL parser first parses the extended SQL statement. It will raise error near SQLFlow extended keywords, like TO TRAIN and PREDICT. Then the SQLFlow parser starts from the error position and stops at the end of the first statement. However, this solution relies on the third-party SQL parser to report the error **accurately** on the keywords, like TO TRAIN and PREDICT, that it can't recognize.
+The hybrid parser solution uses the third-party SQL parser (like [TiDB parser](https://github.com/pingcap/parser/blob/master/parser.y)) and SQLFlow parser to determine the end of an SQL statement. The third-party SQL parser first parses the extended SQL statement. It will raise error near SQLFlow extended keywords, like TO TRAIN and TO PREDICT. Then the SQLFlow parser starts from the error position and stops at the end of the first statement. However, this solution relies on the third-party SQL parser to report the error **accurately** on the keywords, like TO TRAIN and TO PREDICT, that it can't recognize.
 
 The lexer solution scans the entire SQL statements, finds the `;` tokens, and splits the SQL based on the position of `;` token.
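
Reviewer note: the lexer solution described above could be sketched as follows. This is a minimal illustration in Python, not the SQLFlow implementation. The function name `split_statements` is hypothetical, and the sketch only tracks quoted strings and identifiers; it does not handle SQL comments.

```python
from typing import List


def split_statements(sql: str) -> List[str]:
    """Split a SQL program at ';' tokens that are outside of quotes."""
    statements = []
    current = []
    quote = None  # the quote character we are currently inside, if any
    i = 0
    while i < len(sql):
        ch = sql[i]
        if quote:
            current.append(ch)
            if ch == "\\" and i + 1 < len(sql):
                # Keep a backslash-escaped character verbatim.
                current.append(sql[i + 1])
                i += 1
            elif ch == quote:
                quote = None
        elif ch in ("'", '"', "`"):
            quote = ch
            current.append(ch)
        elif ch == ";":
            # A ';' outside quotes ends the current statement.
            current.append(ch)
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


program = (
    "SELECT ';' AS c FROM t; "
    "SELECT * FROM input_table TO PREDICT output_table.group_id USING my_cluster_model;"
)
parts = split_statements(program)
for part in parts:
    print(part)
```

Because the splitter never interprets keywords, the extended clauses such as TO TRAIN and TO PREDICT pass through untouched, which is the property that makes the lexer approach independent of the third-party parser's error reporting.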
doc/design/design_xgboost_on_sqlflow.md
+2 -2 (2 additions & 2 deletions)
@@ -24,7 +24,7 @@ The following example shows how to predict using the model `my_xgb_model`.
 
 ```sql
 SELECT * FROM test_table
-PREDICT pred_table.result
+TO PREDICT pred_table.result
 USING my_xgb_model;
 ```
 
@@ -42,4 +42,4 @@ The code generator `codegen_xgboost.go` outputs an XGBoost program in Python. It
 1. It tells the SQL engine to run the SELECT statement and retrieve the training/test data. It saves the data into a text file, which could be loaded by XGBoost using the DMatrix interface.
 1. Parse and resolve the WITH clause to fill the `xgboost.train` arguments and the XGBoost Parameters.
 1. Save the trained model on disk.
-1. For the PREDICT clause, it loads the trained model and test data and then outputs the prediction result to a SQL engine.
+1. For the TO PREDICT clause, it loads the trained model and test data and then outputs the prediction result to a SQL engine.
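
Reviewer note: the WITH-clause step in the list above (filling the `xgboost.train` arguments and the XGBoost parameters) might look roughly like the sketch below. It assumes a convention in which attributes prefixed with `train.` become keyword arguments of `xgboost.train` and all other attributes become booster parameters. That prefix, the helper name `resolve_with_clause`, and the attribute values are assumptions for illustration and are not taken from this diff.

```python
from typing import Any, Dict, Tuple

# Assumed convention: "train."-prefixed attributes are arguments of
# xgboost.train; every other attribute is an XGBoost parameter.
TRAIN_PREFIX = "train."


def resolve_with_clause(
    attrs: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split parsed WITH attributes into (train_args, params)."""
    train_args = {}
    params = {}
    for key, value in attrs.items():
        if key.startswith(TRAIN_PREFIX):
            # Strip the prefix so the key matches the keyword argument name.
            train_args[key[len(TRAIN_PREFIX):]] = value
        else:
            params[key] = value
    return train_args, params


# Hypothetical attributes, as they might come out of a parsed WITH clause.
attrs = {
    "objective": "multi:softprob",
    "num_class": 3,
    "eta": 0.3,
    "train.num_boost_round": 30,
}
train_args, params = resolve_with_clause(attrs)
print(train_args)
print(params)
# A generated program could then call, for example:
#   booster = xgboost.train(params, dtrain, **train_args)
```

Keeping the two dictionaries separate mirrors the shape of `xgboost.train`, which takes the booster parameters as its first argument and training options such as `num_boost_round` as keyword arguments.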