Commits
55 commits
f7bf7c9
Created WhatTheHack template stub
Jul 18, 2023
c668964
Initial Add
Aug 11, 2023
2884ea4
Updates
Aug 13, 2023
08d16fe
Add spell checks
Aug 15, 2023
3643b19
Merge branch 'master' into xxx-FabricLakehouse
Aug 16, 2023
36df6b9
Minor updates
Aug 16, 2023
864c818
Rename hackathon folder
Sep 7, 2023
3e5d37b
Merge branch 'master' into xxx-FabricLakehouse
Sep 7, 2023
414ef97
Update spellcheck word list
Sep 7, 2023
7878596
Updated the Success Criteria to a numbered list
jcbendernh Sep 16, 2023
ccd1054
Updated the Solution 00, grammatical.
jcbendernh Sep 16, 2023
e163c61
Update .wordlist.txt
jordanbean-msft Sep 19, 2023
215f4b9
Update README.md
jordanbean-msft Sep 19, 2023
c280b80
Update Solution-00.md
jordanbean-msft Sep 19, 2023
0c9b932
Update Solution-04.md
jordanbean-msft Sep 19, 2023
d851303
Update Solution-05.md
jordanbean-msft Sep 19, 2023
798b97a
Update Solution-05.md
jordanbean-msft Sep 19, 2023
0679b0f
Update Challenge-00.md
jordanbean-msft Sep 19, 2023
2326f84
Update Challenge-00.md
jordanbean-msft Sep 19, 2023
5bd503c
Update Challenge-01.md
jordanbean-msft Sep 19, 2023
b8b91bf
Update Challenge-04.md
jordanbean-msft Sep 19, 2023
f20975f
Update Challenge-05.md
jordanbean-msft Sep 19, 2023
2853206
Update Solution-05.md
jordanbean-msft Sep 19, 2023
e0104ec
Update Solution-00.md
jordanbean-msft Sep 19, 2023
7b704dc
Update Solution-03.md
jordanbean-msft Sep 19, 2023
b3ac670
Merge branch 'xxx-FabricLakehouse' into xxx-FabricLakehouse
liesel-h Sep 20, 2023
50199a1
Merge pull request #2 from jcbendernh/xxx-FabricLakehouse
liesel-h Sep 20, 2023
94e1740
Merge branch 'microsoft:master' into xxx-FabricLakehouse
liesel-h Oct 18, 2023
719cc05
Additional lakehouse notes. Re-work coach guides
Oct 18, 2023
cdafc43
Add shipwrecks geojson data
liesel-h Oct 24, 2023
234482a
Merge branch 'xxx-FabricLakehouse' of https://github.com/liesel-h/Wha…
Oct 25, 2023
36fa62d
Move shipwrecks geojson to Solutions folder
Oct 26, 2023
213b2e1
Rework student and coach guides.
Oct 27, 2023
b68fb06
Updates post feedback. Reworked Coach guides, solution files, cleaned…
Jan 5, 2024
5c0f446
Update solution and resources code. Cleanup files
Jan 5, 2024
441523b
Jekyll complaining about fenced M code :(
Jan 5, 2024
f93a201
Jekyll parsing M again :(
Jan 5, 2024
d7222e0
Make word change
kriation Jun 17, 2024
32d3a1e
Make whitespace adjustment
kriation Jun 17, 2024
bdabe8e
Make wording adjustment of verb tense
kriation Jun 17, 2024
cab0282
Make change to position of markdown link
kriation Jun 17, 2024
5058dd3
Make change to example abbreviation
kriation Jun 17, 2024
4238541
Make correction to abbreviation
kriation Jun 17, 2024
50635e1
Make correction to abbreviation
kriation Jun 17, 2024
b31d8cc
Cut rogue comma
kriation Jun 17, 2024
520fc37
Cut rogue comma
kriation Jun 17, 2024
0e634e3
Make capitalization change
kriation Jun 17, 2024
619590c
Make change to markdown link
kriation Jun 17, 2024
76a07c4
Make slight word change to sentence
kriation Jun 17, 2024
646579f
Make minor spelling corrections
kriation Jun 17, 2024
ed45c26
Make minor spelling change
kriation Jun 17, 2024
448d958
Start optimizing .wordlist.txt for this content
kriation Jun 17, 2024
d88a159
Optimize .wordlist.txt
kriation Jun 17, 2024
defc4f1
Make minor spelling and grammatical corrections
kriation Jun 17, 2024
7499b67
Add word to .wordlist.txt
kriation Jun 17, 2024
33 changes: 33 additions & 0 deletions 067-FabricLakehouse/.wordlist.txt
@@ -0,0 +1,33 @@
Yarr
yarr
PySpark
TMTOWTDI
DAX
dataflow
ye'll
BOM
Nemo
bom
spatialdata
geojson
ECMWF
planetarycomputer
stac
ipynb
WAM
dimensionally
datawarehousing
BAFTA
Leeuwin
Batavia
Boorloo
Liesel
Whadjuk
FabricTrial
getpowerbi
workspace's
buddied
shapefiles
hoc
impactful
shtml
Binary file added 067-FabricLakehouse/Coach/Lectures.pptx
Binary file not shown.
269 changes: 269 additions & 0 deletions 067-FabricLakehouse/Coach/README.md

Large diffs are not rendered by default.

24 changes: 24 additions & 0 deletions 067-FabricLakehouse/Coach/Solution-00.md
@@ -0,0 +1,24 @@
# Challenge 00 - Prerequisites - Grab your fins and a full tank! - Coach's Guide

**[Home](./README.md)** - [Next Solution >](./Solution-01.md)

## Notes & Guidance

Please make sure that the students review the [introduction](../README.md) and [Challenge 0 - Prerequisites - Ready, Set, GO! - Student's Guide](../Student/Challenge-00.md) ahead of time. Also ensure you have read the prerequisites section of the [Coach's Guide](./README.md).

Students will need access to a Fabric-enabled workspace and Power BI Desktop. Again, see the prerequisites in the [Coach's Guide](./README.md) for more details.

Students will need to have created a Lakehouse in their workspace either ahead of time, or on the day. This is covered in the [Create a Lakehouse](https://learn.microsoft.com/en-us/fabric/data-engineering/create-lakehouse) tutorial on Learn for those not familiar.

You should provide the students with the ``resources.zip`` file and ensure they have uploaded each of the contained folders to their Lakehouse. Unfortunately, this has to be done one folder at a time, so allow a small amount of time for this step. See the student resources section of the [Coach's Guide](./README.md) for details on creating the ``resources.zip`` file.

## Gotchas

Overall, this challenge is designed to level set student environments and ensure they are ready to undertake the rest of the hack. Some things to watch out for:

1. **Fabric not enabled in the tenant.** This is a common issue and needs to be validated/resolved when planning the hack. During the organisation stage, ensure that the tenant(s) to be used by students have been enabled for Fabric, either globally or for a specific group of users (that the students are members of). See Enabling Microsoft Fabric in the [README](./README.md).
2. **No Fabric capacity.** A Fabric capacity is required. A small SKU is acceptable (F2-F4). A trial developer subscription may be of use as well. See the [README](./README.md).
3. **Workspace not Fabric enabled, or no permissions to create a new workspace.** Ideally, a new workspace should be created for each student ahead of time, either by their tenant admin or, if they have permissions, by the students themselves. Students require at least the Contributor role, although workspace Admin is suggested. See [learn.microsoft.com/en-us/fabric/get-started/workspaces](https://learn.microsoft.com/en-us/fabric/get-started/workspaces).
4. **Uploading data to the wrong location.** You should ensure the students upload the data to the correct location (e.g., ``Files/Raw``, ``Files/Bronze``), not ``Files/data/Raw``. This is a common mistake and can be confusing for students, especially if you later provide them with the included code hints/solutions. A quick way to check is shown in the sketch below.
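
For coaches who want a quick, scriptable check, here is a minimal sketch (not part of the provided solutions) of listing the Lakehouse file paths from a Fabric notebook attached to the student's Lakehouse. It assumes the ``mssparkutils`` filesystem helpers available in Fabric notebooks and the ``Files/Raw`` / ``Files/Bronze`` layout used in this hack; folder names are illustrative.

```python
# Minimal sketch: confirm the uploaded folders landed at the expected Lakehouse paths.
# mssparkutils is available by default in Fabric notebooks attached to a Lakehouse.
for folder in ["Files/Raw", "Files/Bronze"]:
    try:
        items = mssparkutils.fs.ls(folder)   # list the folder in the attached Lakehouse
        print(folder, "->", [item.name for item in items])
    except Exception as err:                 # a missing folder suggests a misplaced upload (e.g. Files/data/Raw)
        print(f"{folder} not found: {err}")
```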


56 changes: 56 additions & 0 deletions 067-FabricLakehouse/Coach/Solution-01.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
# Challenge 01 - Finding Nemo - Coach's Guide

[< Previous Solution](./Solution-00.md) - **[Home](./README.md)** - [Next Solution >](./Solution-02.md)

## Notes & Guidance

This first challenge is all about finding the data but not importing it (yet). The output is a list of datasets that meet the requirements, a strategy for ingesting/processing, and a selection of the "best" tool (notebook, dataflow, etc.). Actual development starts in Challenge 2.

For this challenge, the students will be searching for suitable data sources online. You should ensure that they are aware of the following:

- Licensing
- Copyright

The objective of this challenge is to get the students to think about:

- the data they need to meet the requirements
- what sources are available
- how it is licensed
- how they can land this data automatically in OneLake

Whilst there is no formal output for this challenge, you should ensure that the students have made a note of the datasets they have found and how they plan to ingest them, as this will be useful in the next challenge.

## Solution

### (Strongly) Recommended Datasets

The example solutions have been built using Australian Bureau of Meteorology and Western Australian Museum datasets. These are the (strongly) recommended sources for the hack. However, students are free to use any datasets they like, as long as they meet the requirements to spatially locate a wreck and the weather conditions at that point. Substitute datasets need to be licensed appropriately, and if students do decide to attempt a custom solution, you should ensure they are aware of the time constraints and the need to move on to the next challenge. You may also need to provide deeper technical guidance if they are not familiar with the tools but hey, if they want to try, let them! They can always revert to the provided solutions if they get stuck.

These two BOM datasets comprise forecasts for marine zones (with a textual zone key) and geo-coded marine zones (to allow spatially locating shipwrecks to a zone).

- BOM FTP data services: http://www.bom.gov.au/catalogue/data-feeds.shtml and http://www.bom.gov.au/catalogue/anon-ftp.shtml

- ``IDW11160`` - Coastal Waters Forecast - All Districts (WA)
- ``IDM000003`` - Marine Zones - http://reg.bom.gov.au/catalogue/spatialdata.pdf


This dataset contains wreck details (date, name, description, etc.) in GeoJSON format, allowing it to be joined to ``IDM000003`` and, by extension, ``IDW11160``.

- WA Museum
- ``WAM-002`` https://catalogue.data.wa.gov.au/dataset/shipwrecks (requires a free SLIP account and is CC BY 4.0)
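
To make that join concrete, here is a hedged sketch (not the provided solution) of spatially locating each wreck within a marine zone using GeoPandas. File names are assumptions for illustration; check the actual downloaded data.

```python
# Illustrative only: assign each shipwreck to a BOM marine zone so the zone key
# can be used to look up the matching forecast in IDW11160.
import geopandas as gpd

wrecks = gpd.read_file("shipwrecks.geojson")     # WAM-002 (assumed file name)
zones = gpd.read_file("marine_zones.geojson")    # IDM000003 (assumed file name)

# Keep wrecks that fall inside a zone polygon; the zone's key then joins to the forecast data.
wrecks_in_zones = gpd.sjoin(wrecks, zones, predicate="within")
print(wrecks_in_zones.head())
```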

### More Advanced Datasets

More advanced students might like to include climate (temperature and wave) models from [ECMWF Open Data](https://planetarycomputer.microsoft.com/dataset/ecmwf-forecast) available via the Microsoft Planetary Computer. See https://planetarycomputer.microsoft.com/docs/quickstarts/reading-stac/

### Common Issues / Pitfalls

- Students may struggle to find suitable datasets. Guide those who are a bit lost at sea with some hints such as
- _"Who would be interested in documenting the history of shipwrecks?"_
- A: Museums!
- _"And are there any in Western Australia?"_
- A: WA Museum (and Shipwreck Galleries in particular)
- _"Is there an open data portal for all Australian government data (Federal and State)?"_
- A: [data.gov.au](https://data.gov.au/)
- "Is there a government agency that might have data on weather conditions?"
- A: [Bureau of Meteorology](http://www.bom.gov.au/)
48 changes: 48 additions & 0 deletions 067-FabricLakehouse/Coach/Solution-02.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
# Challenge 02 - Land Ho! - Coach's Guide

[< Previous Solution](./Solution-01.md) - **[Home](./README.md)** - [Next Solution >](./Solution-03.md)

## Notes & Guidance

This challenge implements the design developed in Challenge 1, landing data in a "raw" format and cleaning to bronze, ready to further transform in [the next Challenge](./Solution-03.md).

## Outcome

At the end of this challenge, students should have landed their data in OneLake and cleaned it to bronze. The data should be available in the Lakehouse as GeoJSON files for shipwrecks and marine zones, and as an XML file for forecasts.

## Solution

Students may wish:

- to download the data manually and upload to OneLake
- to use a notebook, a dataflow or a combination of both to retrieve and land the data
- to use the provided raw data files

Any of these are acceptable.
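
If coaches want to demonstrate the "retrieve and land with a notebook" option, the following is a minimal sketch of landing the BOM coastal waters forecast in OneLake from a Fabric notebook. The FTP path and the ``/lakehouse/default`` mount point are assumptions, not part of the provided solutions; verify them against the BOM anonymous FTP catalogue linked in Challenge 1 and against your workspace.

```python
# Sketch: download the BOM forecast XML and land it under Files/Raw in the default Lakehouse.
import os
import urllib.request

source_url = "ftp://ftp.bom.gov.au/anon/gen/fwo/IDW11160.xml"   # assumed BOM anonymous FTP location
raw_dir = "/lakehouse/default/Files/Raw/BOM"                    # default Lakehouse mount in Fabric notebooks

os.makedirs(raw_dir, exist_ok=True)
urllib.request.urlretrieve(source_url, os.path.join(raw_dir, "IDW11160.xml"))
```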

Once the raw data has been landed, students will need to clean the data to bronze. The example solution uses a notebook to perform this step for shipwrecks and marine zones, as this is the most applicable tool for processing spatial data. Since Parquet does not support ``geometry`` types, shipwrecks and marine zones are stored as GeoJSON in bronze.
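
As a flavour of that step, here is a sketch (not the provided solution notebook) of cleaning the raw shipwrecks GeoJSON to bronze with GeoPandas. Paths assume the default Lakehouse mount and the ``Files/Raw`` / ``Files/Bronze`` layout; file names are illustrative.

```python
# Sketch: light cleaning of raw shipwrecks GeoJSON, written back to bronze as GeoJSON.
import os
import geopandas as gpd

raw_path = "/lakehouse/default/Files/Raw/WAM/shipwrecks.geojson"
bronze_path = "/lakehouse/default/Files/Bronze/WAM/shipwrecks.geojson"

wrecks = gpd.read_file(raw_path)
wrecks = wrecks.dropna(subset=["geometry"])                  # drop records that cannot be located
wrecks.columns = [c.strip().lower() for c in wrecks.columns]  # tidy column names

# Parquet has no geometry type, so bronze stays as GeoJSON.
os.makedirs(os.path.dirname(bronze_path), exist_ok=True)
wrecks.to_file(bronze_path, driver="GeoJSON")
```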

> **A Note on Forecasts**
> Some students may wish to transform forecast data from XML to JSON, CSV, etc. and write it to bronze files with a dataflow. They may then want to process this file to silver using Spark as part of Challenge 3. Whilst not strictly necessary, it is perfectly acceptable to do so. However, the example dataflow solution in [Challenge 3](./Solution-03.md) processes forecasts directly to silver, skipping this intermediate step.

#### Example Notebook - Data Exploration

An important part of data engineering is understanding the data. Students should be encouraged to explore the data, and a sample notebook, ``Data Exploration.ipynb``, is provided in the [Solutions](./Solutions) folder.

Coaches can use this notebook to demonstrate how to explore the data, as prompts to help guide students, and to help students with the steps required to initially clean the data to bronze.

#### Example Notebook - Shipwrecks and Marine Zones Data Engineering

Challenge 2 and 3 solutions for shipwrecks and marine zones are contained in the notebook [Solution - Data Engineering.ipynb](./Solutions/Solution%20-%20Data%20Engineering.ipynb).


The notebook contains notes on the steps required for this challenge (and challenge 3). Coaches should step through each code cell to become familiar with the overall code, and be able to use snippets of code from this notebook to help students.

#### Uploading the Notebooks

Coaches should upload notebooks to their workspace following [How to use Microsoft Fabric notebooks](https://learn.microsoft.com/en-us/fabric/data-engineering/how-to-use-notebook#import-existing-notebooks).

#### Advanced Students

More (very...) advanced students might like to include climate (temperature and wave) models from [ECMWF Open Data](https://planetarycomputer.microsoft.com/dataset/ecmwf-forecast) available via the Microsoft Planetary Computer.

An example notebook, ``Loading Planetary Computer Climate Prediction Models.ipynb``, is included in the [Solutions](./Solutions) folder; coaches should step through it to become familiar with the approach before guiding students.
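
For coaches previewing this stretch goal, here is a tiny, hedged sketch of connecting to the Planetary Computer STAC API with ``pystac-client``. The collection id is assumed from the dataset URL above and should be verified; the provided notebook goes much further.

```python
# Sketch: list a few items from the ECMWF forecast collection on the Planetary Computer.
import planetary_computer
import pystac_client

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,   # signs asset URLs for access
)
search = catalog.search(collections=["ecmwf-forecast"], limit=5)  # collection id assumed from the dataset URL
for item in search.items():
    print(item.id)
```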
141 changes: 141 additions & 0 deletions 067-FabricLakehouse/Coach/Solution-03.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,141 @@
# Challenge 03 - Swab the Decks! - Coach's Guide

[< Previous Solution](./Solution-02.md) - **[Home](./README.md)** - [Next Solution >](./Solution-04.md)

## Notes & Guidance

Challenge Three is about cleaning and loading data from ``Bronze`` files to ``Silver`` delta tables in preparation for reporting.

The students should be encouraged to explore both notebooks and data flows (for example, processing BOM forecast XML is easy in a dataflow, but the WAM-002 data is better suited to a notebook).

Overall, the method used is very much a design choice for each student.
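
As one hedged illustration of the bronze-to-silver step (not the provided solution notebook), the sketch below promotes the bronze shipwrecks GeoJSON to a silver Delta table from a Fabric notebook, converting geometry to WKT text because Delta/Parquet has no native geometry type. Paths and the table name are assumptions; ``spark`` is the session provided by the notebook runtime.

```python
# Sketch: bronze GeoJSON -> silver Delta table, with geometry kept as WKT text.
import pandas as pd
import geopandas as gpd

bronze = gpd.read_file("/lakehouse/default/Files/Bronze/WAM/shipwrecks.geojson")
bronze["geometry_wkt"] = bronze.geometry.to_wkt()        # keep the shape as plain text

pdf = pd.DataFrame(bronze.drop(columns="geometry"))      # plain pandas frame for Spark
spark.createDataFrame(pdf).write.format("delta").mode("overwrite").saveAsTable("Shipwrecks")
```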

## Solutions

Solutions are contained in the [Solutions](./Solutions) folder:

__Dataflow Gen 2__

- ``Load BOM Forecasts.pqt`` - dataflow template - M code for the dataflow is below.

__Notebooks__

- ``Solution - Data Engineering.ipynb`` - this notebook contains a solution for both [Challenge 2](./Solution-02.md) and [Challenge 3](./Solution-03.md) and should be read in conjunction with ``Data Exploration.ipynb``.

- ``Data Exploration.ipynb`` - shows one way to explore the data and contains the steps required to clean the data to bronze and to enrich it to silver. Coaches should step through each code cell to become familiar with the overall code, and be able to use snippets of code from this notebook to help students.

- ``Loading Planetary Computer Climate Prediction Models.ipynb`` - super bonus boss level - loads ECMWF climate models from the Microsoft Planetary Computer for students who want to go further. **Warning** this is hard...

__Misc__

- ``Cleanup.ipynb`` - cleans up OneLake tables and files
- ``troubleshooting/Cancel-Dataflow.ps1`` - a PowerShell script to cancel a dataflow (or at least mark the metadata as cancelled). Rarely, a dataflow may not complete correctly. This script can be used to help cancel the dataflow so it can be re-run.

---

## Dataflow

Coaches should import the dataflow template ``Load BOM Forecasts.pqt`` from the solutions folder for an example of loading the ``IDW11160.xml`` file into a Lakehouse table.

**Note: this dataflow assumes the source file is ``Files/Raw/BOM/IDW11160.xml`` in the Lakehouse.**

### Importing the Dataflow

Coaches should review [Move queries from Dataflow Gen1 to Dataflow Gen2](https://learn.microsoft.com/en-us/fabric/data-factory/move-dataflow-gen1-to-dataflow-gen2) for an overview of the process.

1. Navigate to your Lakehouse. From the URL, copy the ``workspaceId`` (the part after ``/group/`` and before ``/lakehouses`` in yellow below) and the ``lakehouseId`` (the part after ``/lakehouses/`` in red below).
![](./images/workspacelakehouseid.png)
1. Navigate back to your workspace, and then create a new **Dataflow Gen 2**
![](images/dataflowgen2.png)
**Important! Make sure you select Dataflow Gen 2 from the Data Factory section.**
1. Import the ``Load BOM Forecasts.pqt`` template
![](images/importdataflow.png)
1. You should now see a new dataflow.
![](images/newdataflow.png)
1. Set the credentials by clicking the ``Configure connection`` button, then clicking ``Connect``.
![](images/dataflowcreds.png)
1. Update the parameters with the ``workspaceId`` and ``lakehouseId`` from step 1.
1. Set the data destination for the Forecast query. Click on the query and set the destination to ``Lakehouse``
![](images/destination.png)
1. Navigate to the correct workspace and lakehouse.
1. Verify the column mappings (you may have to change type ``any`` to the correct type, or ``text``).
1. Finally, publish the dataflow. The dataflow should automatically run and load the data into the ``Forecast`` table in the Lakehouse.
![](images/forecasts.png)


### Manually creating the dataflow

The dataflow can also be manually created.

1. First, create a new dataflow (steps 1-2 above).
1. Next, add two parameters for ``workspaceId`` and ``lakehouseId`` (see [Parameters - Power Query](https://learn.microsoft.com/en-us/power-query/power-query-query-parameters#creating-a-parameter)).
1. Finally, add the following code to a blank query (see [Share a Query](https://learn.microsoft.com/en-us/power-query/share-query#copy-the-m-code)).

Set the credentials and publish the dataflow as above.

```M
let
Source = Lakehouse.Contents([]),
Navigation = Source{[workspaceId = workspaceId]}[Data],
#"Navigation 1" = Navigation{[lakehouseId = lakehouseId]}[Data],
#"Navigation 2" = #"Navigation 1"{[Id = "Files", ItemKind = "Folder"]}[Data],
#"Navigation 3" = #"Navigation 2"{[Name = "Raw"]}[Content],
#"Navigation 4" = #"Navigation 3"{[Name = "BOM"]}[Content],
#"Navigation 5" = #"Navigation 4"{[Name = "IDW11160.xml"]}[Content],
#"Imported XML" = Xml.Tables(#"Navigation 5"),
#"Navigation 6" = #"Imported XML"{0}[forecast],
#"Navigation 7" = #"Navigation 6"{0}[area],
#"Changed column type 1" = Table.TransformColumnTypes(#"Navigation 7", { {"Attribute:aac", type text}, {"Attribute:description", type text}, {"Attribute:type", type text}, {"Attribute:parent-aac", type text} }),
#"Expanded forecast-period" = Table.ExpandTableColumn(#"Changed column type 1", "forecast-period", {"text", "Attribute:start-time-local", "Attribute:end-time-local", "Attribute:start-time-utc", "Attribute:end-time-utc"}, {"text", "Attribute:start-time-local", "Attribute:end-time-local", "Attribute:start-time-utc", "Attribute:end-time-utc"}),
#"Expanded text" = Table.ExpandTableColumn(#"Expanded forecast-period", "text", {"Element:Text", "Attribute:type"}, {"Element:Text", "Attribute:type.1"}),
#"Pivoted column" = Table.Pivot(Table.TransformColumnTypes(#"Expanded text", {{"Attribute:type.1", type text}}), List.Distinct(Table.TransformColumnTypes(#"Expanded text", {{"Attribute:type.1", type text}})[#"Attribute:type.1"]), "Attribute:type.1", "Element:Text"),
#"Removed columns" = Table.RemoveColumns(#"Pivoted column", {"Attribute:start-time-local", "Attribute:end-time-local"}),
#"Renamed columns" = Table.RenameColumns(#"Removed columns", { {"Attribute:start-time-utc", "StartTime"}, {"Attribute:end-time-utc", "EndTime"}, {"Attribute:aac", "AAC"}, {"Attribute:description", "Description"}, {"Attribute:type", "Type"}, {"Attribute:parent-aac", "ParentAAC"}, {"synoptic_situation", "SynopticSituation"}, {"preamble", "Preamble"}, {"warning_summary_footer", "WarningSummaryFooter"}, {"product_footer", "ProductFooter"}, {"postamble", "Postamble"}, {"forecast_winds", "ForecastWinds"}, {"forecast_seas", "ForecastSeas"}, {"forecast_swell1", "ForecastSwell"}, {"forecast_weather", "ForecastWeather"} })
in
#"Renamed columns",
#"Changed column type" = Table.TransformColumnTypes(Source, { {"StartTime", type datetime}, {"EndTime", type datetime}, {"SynopticSituation", type text}, {"Preamble", type text}, {"WarningSummaryFooter", type text}, {"ProductFooter", type text}, {"Postamble", type text}, {"ForecastWinds", type text}, {"ForecastSeas", type text}, {"ForecastSwell", type text}, {"ForecastWeather", type text}, {"forecast_swell2", type text} })
in
#"Changed column type"
```

### Troubleshooting

Between the time of writing and the hack, there may have been changes made to the BOM XML file. Coaches are strongly advised to **download the latest forecast file and validate the dataflow** before the hack. Commonly, we have seen columns missing from some forecast files, or unexpected forecast entries within a file, which will cause the code above to fail.

We welcome contributions to help make this M code a little more robust.

## Building the Semantic Data Model

Once the Delta tables have been written to OneLake, Fabric will create a default semantic model that can be used for reporting, but coaches may wish to encourage students to expand this model (or create a custom model).


Students familiar with Power BI will most likely want to use Power BI Desktop to build their semantic model. This is not the way. Direct Lake semantic models are created and edited in the service, and coaches should familiarise themselves with the model editor.

To launch the model editor, navigate to your workspace and then select the Lakehouse. In the Lakehouse view, switch to the ``SQL Analytics Endpoint``, then select the ``Model`` tab.

![](images/sqlanalysisendpoint.png)

The model window will open.

![](images/modelview.png)

From here, relationships can be defined, measures created, etc., as per standard Power BI modelling.

Refer to [Default Power BI semantic models in Microsoft Fabric](https://learn.microsoft.com/en-us/fabric/data-warehouse/semantic-models).

### Optional - Gold Semantic Model

More advanced students (or those with a data modelling background) may wish to create a dimensional model using Spark, data flows or other tooling, instead of using the base tables.

Some suggestions:

- Create a ``Date`` dimension.
- Model ``Shipwrecks`` as dimensions for ``Country``, ``Port``, ``ShipClass``, ``District`` etc
- Model ``Forecasts`` as a dimension for ``ForecastType`` and ``ForecastPeriod``
- Model ``Wrecks`` as a fact table - wreck date, location etc

> **Note:** Creating a dimensional model is not required for this challenge, but is a good way to introduce students to dimensional modelling in Fabric. Coaches with a Power BI background will be familiar with real-world reporting from data which has not been dimensionally modelled, and may wish to discuss the benefits of dimensional modelling with students. The focus of the hack is on the end-to-end Fabric experience, so coaches should not over-emphasise building a 'correct' model for this small dataset.
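
As a minimal illustration of the ``Date`` dimension suggested above, the following Spark SQL sketch generates a simple calendar table in the Lakehouse. The date range and table name are assumptions, not part of the provided solutions; ``spark`` is the session provided by the Fabric notebook runtime.

```python
# Sketch: generate a Date dimension with Spark SQL and save it as a Delta table.
dim_date = spark.sql("""
    SELECT
        d                         AS Date,
        year(d)                   AS Year,
        month(d)                  AS Month,
        date_format(d, 'MMMM')    AS MonthName,
        dayofmonth(d)             AS Day,
        date_format(d, 'EEEE')    AS DayName
    FROM (
        SELECT explode(sequence(to_date('1600-01-01'), to_date('2030-12-31'), interval 1 day)) AS d
    ) AS dates
""")
dim_date.write.format("delta").mode("overwrite").saveAsTable("DimDate")
```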

At the time of writing, a second Fabric hack is being developed which will focus on datawarehousing and dimensional modelling in Fabric. We welcome contributions to this hack via the GitHub repo.

[< Previous Solution](./Solution-02.md) - **[Home](./README.md)** - [Next Solution >](./Solution-04.md)