-
Notifications
You must be signed in to change notification settings - Fork 5
/
graveyard.qmd
307 lines (198 loc) · 11.9 KB
/
graveyard.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
# Other Topics {#sec-other-topics}
## Shell Commands {#sec-shell-commands}
When talking to computers, sometimes it is convenient to cut through the graphical interfaces, menus, and so on, and just tell the computer what to do directly, using the **system shell** (aka terminal, command line prompt, console).
Most system shells are fully functioning programming languages in their own right.
This section isn't going to attempt to teach you those skills - we'll focus instead on the basics - how to change directories, list files, and run programs.
### Launching the system terminal
In RStudio, you can access a system terminal in the lower left corner by clicking on the tab labeled Terminal. If the tab does not exist, then go to Tools -> Terminal -> New Terminal in the main application toolbar.
Sometimes, it is preferable to launch a terminal separate from RStudio. Here's how to do that:
::: panel-tabset
#### {{< fa brands windows >}} Windows
Option 1: Default Windows terminal (cmd.exe)
1. Go to the search bar/start menu
2. Type in cmd.exe
3. A black window should appear.
Option 2: Git bash (if you have git installed)
1. Go to the search bar/start menu
2. Type in bash
3. Click on the Git Bash application
If you choose option 2, use the commands for Bash/Linux below. Bash tends to be a bit less clunky than the standard windows terminal.
#### {{< fa brands apple >}} Mac
Option 1: Dock
1. Click the launchpad icon
2. Type Terminal in the search field
3. Click Terminal
Option 2: Finder
1. Open the Applications/Utilities folder
2. Double-click on Terminal
#### {{< fa brands linux >}} Linux
On most systems, pressing Ctrl-Alt-T or Super-T (Windows-T) will launch a terminal.
Otherwise, launch your system menu (usually with the Super/Windows key) and type Terminal. You may have multiple options here; I prefer Konsole but I'm usually using KDE as my desktop environment. Other decent options include Gnome-terminal and xterm, and these are usually associated with Gnome and XFCE desktop environments, respectively.
:::
### File Path Structure
On Windows, file paths are constructed as follows: `C:\Folder 1\Folder_2\file.R`. Paths are *generally* not case sensitive, so you can reference the same file path as `c:\folder 1\folder_2\file.R`. Usually, paths are encased in `""` because spaces make interpreting file paths complicated and Windows paths have lots of spaces.
On Unix systems, file paths are constructed as follows: `/home/user/folder1/folder2/file.R`. Paths are case sensitive, so you cannot reach `/home/user/folder1/folder2/file.R` if you use `/home/user/folder1/folder2/file.r`. On Unix systems, spaces in file paths must be escaped with `\`, so any space character in a terminal should be typed `\ ` instead.
This quickly gets complicated and annoying when working on code that is meant for multiple operating systems. These complexities are why when you're constructing a file path in R or python, you should use commands like `file.path("folder1", "folder2", "file.r")` or `os.path.join("folder1", "folder2", "file.py")`, so that your code will work on Windows, Mac, and Linux by default.
### Basic Terminal Commands
I have listed commands here for the most common languages used in each operating system.
If you are using Git Bash on Windows, follow the commands for Linux/Bash. If you are using Windows PowerShell, google the commands.
In most cases, Mac/Zsh is similar to Linux/Bash, but there are a few differences^[Mac used to use [bash](https://en.wikipedia.org/wiki/Bash_(Unix_shell)) but switched to [Zsh](https://en.wikipedia.org/wiki/Z_shell) in 2019 for licensing reasons.].
::: {.column-page}
Task | {{< fa brands windows >}} Windows/CMD | {{< fa brands apple >}} Mac/Zsh | {{< fa brands linux >}} Linux/Bash
--- | --- | --- | ---
List your current working directory | `cd` | `pwd` | `pwd`
Change directory | `cd <path to new dir>` | `cd <path to new dir>` | `cd <path to new dir>`
List files and folders in current directory | `dir` | `ls` | `ls`
Copy file | `xcopy <source> <destination> <arguments>` | `cp <arguments> <source> <destination>` | `cp <arguments> <source> <destination>`
Create directory | `mkdir <foldername>` | `mkdir <foldername>` | `mkdir <foldername>`
Display file contents | `type <filename>` | `cat <filename>` | `cat <filename>`
:::
## Mathematical Logic {#sec-math-logic}
In @sec-data-struct and @sec-control-struct we talk about more complicated data structures and control structures (for loops, if statements).
I've included this section because it may be useful to review some concepts from mathematical logic.
Unfortunately, to best demonstrate mathematical logic, I'm going to need you to know that a vector is like a list of the same type of thing.
In R, vectors are defined using `c()`, so `c(1, 2, 3)` produces a vector with entries 1, 2, 3.
In Python, we'll primarily use `numpy` arrays, which we create using `np.array([1, 2, 3])`.
Technically, this is creating a list, and then converting that list to a numpy array.
### And, Or, and Not
We can combine logical statements using and, or, and not.
- (X AND Y) requires that both X and Y are true.
- (X OR Y) requires that one of X or Y is true.
- (NOT X) is true if X is false, and false if X is true. Sometimes called **negation**.
In R, we use `!` to symbolize NOT, in Python, we use `~` for vector-wise negation (NOT).
Order of operations dictates that NOT is applied before other operations. So `NOT X AND Y` is read as `(NOT X) AND (Y)`. You must use parentheses to change the way this is interpreted.
::: panel-tabset
#### R
```{r}
x <- c(TRUE, FALSE, TRUE, FALSE)
y <- c(TRUE, TRUE, FALSE, FALSE)
x & y # AND
x | y # OR
!x & y # NOT X AND Y
x & !y # X AND NOT Y
```
#### Python
```{python}
import numpy as np
x = np.array([True, False, True, False])
y = np.array([True, True, False, False])
x & y
x | y
~x & y
x & ~y
```
:::
### De Morgan's Laws
[De Morgan's Laws](https://en.wikipedia.org/wiki/De_Morgan%27s_laws) are a set of rules for how to combine logical statements. You can represent them in a number of ways:
- NOT(A or B) is equivalent to NOT(A) and NOT(B)
- NOT(A and B) is equivalent to NOT(A) or NOT(B)
::: panel-tabset
We can also represent them with Venn Diagrams.
#### Definitions
![Venn Diagram of Set A and Set B](images/other/SetA and SetB.png)
Suppose that we set the convention that ![Shaded regions are TRUE, unshaded regions are FALSE](images/other/TrueFalse.png).
#### DeMorgan's First Law
![A venn diagram illustration of De Morgan's laws showing that the region that is outside of the union of A OR B (aka NOT (A OR B)) is the same as the region that is outside of (NOT A) and (NOT B)](images/other/DeMorgan1.png)
#### DeMorgan's Second Law
![A venn diagram illustration of De Morgan's laws showing that the region that is outside of the union of A AND B (aka NOT (A AND B)) is the same as the region that is outside of (NOT A) OR (NOT B)](images/other/DeMorgan2.png)
:::
## Controlling Loops with Break, Next, Continue {#sec-controlling-loops}
<!-- https://www.py4e.com/html3/05-iterations -->
<!-- https://www.datamentor.io/r-programming/break-next/ -->
Sometimes it is useful to control the statements in a loop with a bit more precision.
You may want to skip over code and proceed directly to the next iteration, or, as demonstrated in the previous section with the `break` statement, it may be useful to exit the loop prematurely.
### Break Statement
![A break statement is used to exit a loop prematurely](images/other/break-statement.png)
### Next/Continue Statement
![A next (or continue) statement is used to skip the body of the loop and continue to the next iteration](images/other/next-statement-flow.png)
::: callout-warning
### Example: Next/continue and Break statements
Let's demonstrate the details of next/continue and break statements.
We can do different things based on whether i is evenly divisible by 3, 5, or both 3 and 5 (thus divisible by 15)
::: panel-tabset
#### R {.unnumbered}
```{r}
for (i in 1:20) {
if (i %% 15 == 0) {
print("Exiting now")
break
} else if (i %% 3 == 0) {
print("Divisible by 3")
next
print("After the next statement") # this should never execute
} else if (i %% 5 == 0) {
print("Divisible by 5")
} else {
print(i)
}
}
```
#### Python {.unnumbered}
```{python}
for i in range(1, 20):
if i%15 == 0:
print("Exiting now")
break
elif i%3 == 0:
print("Divisible by 3")
continue
print("After the next statement") # this should never execute
elif i%5 == 0:
print("Divisible by 5")
else:
print(i)
```
:::
:::
To be quite honest, I haven't really ever needed to use next/continue statements when I'm programming, and I rarely use break statements. However, it's useful to know they exist just in case you come across a problem where you could put either one to use.
## Recursion {#sec-recursion}
Under construction.
In the meantime, check out @datamentorRecursion2017 (R) and @parewalabspvtPythonRecursion2020 (Python) for decent coverage of the basic idea of recursive functions.
## Text Encoding {#sec-text-encoding}
I've left this section in because it's a useful set of tricks, even though it does primarily deal with SAS.
Don't know what UTF-8 is? [Watch this excellent YouTube video explaining the history of file encoding!](https://www.youtube.com/watch?v=MijmeoH9LT4)
SAS also has procs to accommodate CSV and other delimited files.
PROC IMPORT may be the simplest way to do this, but of course a DATA step will work as well.
We do have to tell SAS to treat the data file as a UTF-8 file (because of the japanese characters).
While writing this code, I got an error of "Invalid logical name" because originally the filename was pokemonloc. Let this be a friendly reminder that your dataset names in SAS are limited to 8 characters in SAS.
```
/* x "curl https://raw.githubusercontent.com/shahinrostami/pokemon_dataset/master/pokemon_gen_1_to_8.csv > ../data/pokemon_gen_1-8.csv";
only run this once to download the file... */
filename pokeloc '../data/pokemon_gen_1-8.csv' encoding="utf-8";
proc import datafile = pokeloc out=poke
DBMS = csv; /* comma delimited file */
GETNAMES = YES
;
proc print data=poke (obs=10); /* print the first 10 observations */
run;
```
Alternately (because UTF-8 is finicky depending on your OS and the OS the data file was created under), you can convert the UTF-8 file to ASCII or some other safer encoding before trying to read it in.
If I fix the file in R (because I know how to fix it there... another option is to fix it manually),
```{r csv-sas-via-r, eval = F}
library(readr)
library(dplyr)
tmp <- read_csv("https://raw.githubusercontent.com/shahinrostami/pokemon_dataset/master/pokemon_gen_1_to_8.csv")[,-1]
write_csv(tmp, "../data/pokemon_gen_1-8.csv")
tmp <- select(tmp, -japanese_name) %>%
# iconv converts strings from UTF8 to ASCII by transliteration -
# changing the characters to their closest A-Z equivalents.
# mutate_all applies the function to every column
mutate_all(iconv, from="UTF-8", to = "ASCII//TRANSLIT")
write_csv(tmp, "../data/pokemon_gen_1-8_ascii.csv", na='.')
```
Then, reading in the new file allows us to actually see the output.
```
libname classdat "sas/";
/* Create a library of class data */
filename pokeloc "../data/pokemon_gen_1-8_ascii.csv";
proc import datafile = pokeloc out=classdat.poke
DBMS = csv /* comma delimited file */
replace;
GETNAMES = YES;
GUESSINGROWS = 1028 /* use all data for guessing the variable type */
;
proc print data=classdat.poke (obs=10); /* print the first 10 observations */
run;
```
This trick works in so many different situations. It's very common to read and do initial processing in one language, then do the modeling in another language, and even move to a different language for visualization. Each programming language has its strengths and weaknesses; if you know enough of each of them, you can use each tool where it is most appropriate.
## References {#sec-other-topics-refs}