_episodes/11-snakemake-intro.md
35 additions & 5 deletions
@@ -134,7 +134,7 @@ $ python plotcount.py isles.dat show

 Close the window to exit the plot.

-`plotcount.py` can also create the plot as an image file (e.g. a PNG file):
+`plotcount.py` can also create the plot as an image file (e.g. a PNG file):

 ```bash
 $ python plotcount.py isles.dat isles.png
@@ -154,6 +154,30 @@ isles 3822 2460 1.55
 ```
 {: .output}

+> ## Zipf's Law
+>
+> Zipf's Law is an [empirical law](https://en.wikipedia.org/wiki/Empirical_law) formulated
+> using [mathematical statistics](https://en.wikipedia.org/wiki/Mathematical_statistics)
+> that refers to the fact that many types of data studied in the physical and
+> social sciences can be approximated with a Zipfian distribution, one of a family
+> of related discrete [power law](https://en.wikipedia.org/wiki/Power_law) [probability distributions](https://en.wikipedia.org/wiki/Probability_distribution).
+>
+> Zipf's law was originally formulated in terms of [quantitative linguistics](https://en.wikipedia.org/wiki/Quantitative_linguistics),
+> stating that given some [corpus](https://en.wikipedia.org/wiki/Text_corpus)
+> of [natural language](https://en.wikipedia.org/wiki/Natural_language) utterances,
+> the frequency of any word is [inversely proportional](https://en.wikipedia.org/wiki/Inversely_proportional)
+> to its rank in the [frequency table](https://en.wikipedia.org/wiki/Frequency_table).
+> For example, in the [Brown Corpus](https://en.wikipedia.org/wiki/Brown_Corpus)
+> of American English text, the word *the* is the most frequently occurring word,
+> and by itself accounts for nearly 7% of all word occurrences (69,971 out of
+> slightly over 1 million). True to Zipf's Law, the second-place word *of*
+> accounts for slightly over 3.5% of words (36,411 occurrences), followed by
+> *and* (28,852). Only 135 vocabulary items are needed to account for half
+> the [Brown Corpus](https://en.wikipedia.org/wiki/Brown_Corpus).
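
The inverse-rank relationship described in the Zipf's Law callout can be checked against the three Brown Corpus counts it quotes. The following is only a minimal sketch in Python, assuming the idealized form with exponent 1, where the predicted frequency at rank r is the top count divided by r. The names `observed`, `zipf_prediction`, and `compare` are hypothetical and are not part of the lesson's scripts.

```python
# Counts quoted in the callout for the three most frequent Brown Corpus words.
observed = [("the", 69971), ("of", 36411), ("and", 28852)]


def zipf_prediction(top_count, rank):
    """Frequency an idealized Zipfian distribution (exponent 1) predicts at `rank`."""
    return top_count / rank


def compare(counts):
    """Return (word, rank, observed, predicted, observed/predicted) rows.

    `counts` must already be sorted from most to least frequent.
    """
    top = counts[0][1]
    rows = []
    for rank, (word, count) in enumerate(counts, start=1):
        predicted = zipf_prediction(top, rank)
        rows.append((word, rank, count, predicted, count / predicted))
    return rows


if __name__ == "__main__":
    for word, rank, count, predicted, ratio in compare(observed):
        print(f"{word:>4}  rank={rank}  observed={count}  "
              f"predicted={predicted:.1f}  ratio={ratio:.2f}")
```

With these counts the observed-to-predicted ratio is about 1.04 for *of* and about 1.24 for *and*, so the idealized law is a rough approximation rather than an exact fit.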
 Together these scripts implement a common workflow:

 1. Read a data file.
@@ -278,13 +302,19 @@ There are several reasons this tool was chosen:

 * It’s free, open-source, and installs in about 5 seconds flat via `pip`.

-* Snakemake works cross-platform (Windows, macOS, Linux) and is compatible with all HPC schedulers. More importantly, the same workflow will work and scale appropriately regardless of whether it’s on a laptop or cluster without modification.
+* Snakemake works cross-platform (Windows, macOS, Linux) and is compatible with all HPC
+schedulers. More importantly, the same workflow will work and scale appropriately
+regardless of whether it’s on a laptop or cluster without modification.

-* Snakemake uses pure Python syntax. There is no tool-specific language to learn like in GNU Make, Nextflow, WDL, etc. Even if students end up not liking Snakemake, you’ve still taught them how to program in Python at the end of the day.
+* Snakemake uses pure Python syntax. There is no tool-specific language to learn like
+in GNU Make, Nextflow, WDL, etc. Even if students end up not liking Snakemake, you’ve
+still taught them how to program in Python at the end of the day.

-* Anything that you can do in Python, you can do with Snakemake (since you can pretty much execute arbitrary Python code anywhere).
+* Anything that you can do in Python, you can do with Snakemake (since you can pretty
+much execute arbitrary Python code anywhere).

-* Snakemake was written to be as similar to GNU Make as possible. Users already familiar with Make will find Snakemake quite easy to use.
+* Snakemake was written to be as similar to GNU Make as possible. Users already familiar
+with Make will find Snakemake quite easy to use.

 * It’s easy. You can (hopefully!) learn Snakemake in an afternoon!