---
title: "Independent Practice"
output: html_notebook
name: ''
editor_options:
  markdown:
    wrap: 72
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, error = TRUE)
```
## Introduction
### Background
Welcome to using R! You may have used R before, or you may not have!
Either is fine, as this task is designed with the assumption that
you have not used R before. It includes "reaches" for anyone who may
want to do a bit more.
A bit of background, first: We're using RStudio Cloud, which has all of
the functionality of the desktop version of RStudio, and some additional
benefits for the workshop, specifically around the installation of
packages. If you wish to use RStudio desktop, you can download this
project (see the Files pane in the lower right corner - click "More") and
click the project file (the file that ends in .Rproj) to open it in
your desktop version.
[ADD SCREENSHOT]
### Organization
This independent practice is really a warm-up. It is a chance to become
familiar with how RStudio works. Along the way, we'll focus
on five things:
1. Reading data into R (in the **Prepare** section)
2. Preparing and "wrangling" data in table (think spreadsheet!) format
(in the **Wrangle** section)
3. Creating some plots (in the **Explore** section)
4. Running a model - specifically, a regression model (in the **Model**
section)
5. Finally, creating a reproducible report of your work you can share
with others (in the **Communicate** section)
You may be wondering what these bolded terms refer to; what's so special
about preparing, wrangling, exploring, and modeling data, and
communicating results? We're using these terms as part of a framework,
or model, for what we mean by doing learning analytics in STEM education
research. The particular framework we are using comes from Krumm et
al.'s [*Learning Analytics Goes to
School*](https://www.routledge.com/Learning-Analytics-Goes-to-School-A-Collaborative-Approach-to-Improving/Krumm-Means-Bienkowski/p/book/9781138121836)*.*
You can check that out, but don't feel any need to dive deep for now;
we'll be spending more time on this on the first day of the summer
institute. For now, know that this document is organized around the
five components of what we're referring to as the **LASER cycle**.
Click the arrow to the right of the code chunk below to view the image
(more on that process of clicking the green arrow and what it does, too,
in a moment)!
```{r}
knitr::include_graphics("laser-cycle.png")
```
### How to use this document
This is an R Notebook. There are two keys to your use of it:
1. First, be sure that you are viewing the document in the "Visual
Editor" mode. You can use this mode by clicking the symbol that
appears like a letter A (or the tip of a pencil!) in the top right
of this window. Check the screencast, too, if this isn't immediately
visible or apparent.
2. Second, click "Preview" at the top of this screen to preview the
document as you work through it. This will allow you to see your
code and its output in a rendered, easy-to-read document.
Let's get started! We are glad you are here and excited to begin this
(challenging) journey together.
## 1. PREPARE
By preparing, we refer to developing a question or purpose for the
analysis, which you likely know from your research can be difficult!
This part of the process also involves developing an understanding of
the data and what you may need to analyze the data. This often involves
looking at the data and its documentation. For now, we'll focus on just
a few parts of this process, diving in much more deeply over the coming
weeks.
### Packages 📦
R uses "packages," add-ons that enhance its functionality. One package
that we'll be using is the tidyverse. To load the tidyverse, click the
green arrow in the right corner of the block (or "chunk") of code that
follows.
```{r}
library(tidyverse)
```
Please do not worry if you saw a number of messages: those probably mean
that the tidyverse loaded just fine. If you see an error, though, try to
interpret it, paste the text of the error into a search engine, or
reach out to us for assistance.
### Loading (or reading in) data
Next, we'll load data (specifically, a CSV file, the kind that you can
export from Microsoft Excel or Google Sheets) into R, using the
`read_csv()` function in the next chunk.
Clicking the green arrow runs the code; do that next.
```{r}
d <- read_csv("sci-online-classes.csv")
```
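If you would like to see what `read_csv()` does without relying on the
course file, the sketch below writes a tiny, made-up CSV file to a
temporary location and reads it back in. The file contents and the names
`toy_path` and `toy` are invented for illustration; they are not part of
the course data.

```{r}
library(readr)

# Write a small, made-up CSV file to a temporary location (hypothetical data)
toy_path <- tempfile(fileext = ".csv")
writeLines(
  c("student_id,points",
    "1,80",
    "2,95",
    "3,72"),
  toy_path
)

# read_csv() guesses a type for each column and returns a table (a tibble)
toy <- read_csv(toy_path)
toy
```

The message that `read_csv()` prints describes the column types it
guessed; it is informational, not an error.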
#### Viewing or inspecting data
Last, let's check that the code worked as we intended; run the next
chunk and look at the results, tabbing left or right with the arrows, or
scanning through the rows by clicking the numbers at the bottom of the
pane with the print-out of the data you loaded:
```{r}
d
```
#### [**Your Turn**]{style="color: green;"} **⤵**
What do you notice about this data set? What do you wonder? Add one or
two thoughts following the dashes next (you can add additional dashes if
you like!):
-
-
There are other ways to inspect your data; the `glimpse()` function
provides one such way. Run the code below to take a glimpse at your
data.
```{r}
glimpse(d)
```
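A few other functions answer narrower questions about a table. This is a
small sketch using a made-up table named `toy` (invented values, not the
course data), so you can compare what each function reports.

```{r}
library(dplyr)

# A made-up table for illustration (hypothetical values)
toy <- tibble(
  student_id = c(1, 2, 3),
  subject    = c("Biology", "Physics", "Biology"),
  points     = c(80, 95, 72)
)

nrow(toy)     # number of rows
ncol(toy)     # number of columns
names(toy)    # the column names
head(toy, 2)  # the first two rows
glimpse(toy)  # every column, its type, and its first few values
```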
We have one more question to pose to you: What do rows and columns
typically represent in your area of work and/or research?
Rows typically represent "cases," the units that we measure,
or the units on which we collect data. This is not a trick question!
What counts as a "case" (and therefore what is represented as a row)
varies by (and within) fields. There may be multiple types or levels of
units studied in your field; listing more than one is fine! Also, please
consider what columns - which usually represent variables - represent in
your area of work and/or research.
#### [**Your Turn**]{style="color: green;"} **⤵**
What rows typically (or you think may) represent:
-
What columns typically (or you think may) represent:
-
Next, we'll use a few functions that are handy for preparing data in
table form.
## 2. WRANGLE
By wrangle, we refer to the process of cleaning and processing data,
and, in some cases, merging (or joining) data from multiple sources.
Often, this part of the process is surprisingly time-intensive.
Wrangling your data into shape can itself be an important
accomplishment! There are great tools in R to do this, especially
through the use of the {dplyr} R package.
### Selecting variables
Let's select only a few variables.
```{r}
d %>%
  select(student_id, total_points_possible, total_points_earned)
```
Notice how the number of columns (variables) is now different.
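The `%>%` symbol is called a pipe: it passes the table on its left to
the function on its right as that function's first argument. The sketch
below uses a made-up table named `toy` (invented values, not the course
data) to show that the piped and un-piped forms return the same result,
and that `select()` changes the number of columns but not the number of
rows.

```{r}
library(dplyr)

# A made-up table for illustration (hypothetical values)
toy <- tibble(
  student_id = c(1, 2, 3),
  subject    = c("Biology", "Physics", "Biology"),
  points     = c(80, 95, 72)
)

# These two lines do the same thing
piped   <- toy %>% select(student_id, points)
unpiped <- select(toy, student_id, points)
identical(piped, unpiped)

# select() keeps every row but only the columns you name
dim(toy)
dim(piped)
```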
Let's *include one additional variable* in your select function.
First, we need to figure out what variables exist in our dataset (or be
reminded of this - it's very common in R to be continually checking and
inspecting your data)!
You can use a function named glimpse() to do this.
```{r}
glimpse(d)
```
#### [**Your Turn**]{style="color: green;"} **⤵**
In the code chunk below, add a new variable to the `select()` function,
being careful to type the new variable name as it appears in the data.
We've added some code to get you started. Consider how the names of the
other variables are separated as you think about how to add an
additional variable to this code.
```{r}
d %>%
  select(student_id, total_points_possible, total_points_earned)
```
Once added, the output should be different than in the code above -
there should now be an additional variable included in the print-out.
### Filtering rows
Next, let's explore filtering rows. Check out and run the next
chunk of code, imagining that we wish to filter our data to view only
the rows associated with students who earned a final grade (as a
percentage) higher than 70%.
```{r}
d %>%
  filter(FinalGradeCEMS > 70)
```
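Two details of `filter()` are easy to miss: `>` excludes the cut-off
value itself while `>=` includes it, and rows with a missing value
(`NA`) in the filtered variable are dropped either way. A minimal sketch
with made-up grades (the table `toy` and its values are invented for
illustration):

```{r}
library(dplyr)

# A made-up table; student 4 has a missing grade (hypothetical values)
toy <- tibble(
  student_id = c(1, 2, 3, 4),
  grade      = c(65, 70, 88, NA)
)

above_70    <- toy %>% filter(grade > 70)   # strictly greater than 70
at_least_70 <- toy %>% filter(grade >= 70)  # 70 or higher

above_70
at_least_70
```

Neither result includes student 4, because a comparison with `NA` is
neither true nor false.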
#### [**Your Turn**]{style="color: green;"} **⤵**
In the next code chunk, change the cut-off from 70% to some other value,
larger or smaller (maybe much larger or smaller; feel free to play
around with the code a bit!).
```{r}
d %>%
  filter(FinalGradeCEMS > 70)
```
What happens when you change the cut-off from 70 to something else? Add
a thought (or more):
-
### Arrange
The last function we'll use for preparing tables is arrange.
We'll combine this arrange() function with a function we used already -
select(). We do this so we can view only the student ID and their final
grade.
```{r}
d %>%
  select(student_id, FinalGradeCEMS) %>%
  arrange(FinalGradeCEMS)
```
Note that arrange works by sorting values in ascending order (from
lowest to highest); you can change this by using the desc() function
with arrange, like the following:
```{r}
d %>%
  select(student_id, FinalGradeCEMS) %>%
  arrange(desc(FinalGradeCEMS))
```
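One behavior worth knowing: `arrange()` places missing values (`NA`) at
the end whether you sort in ascending or descending order. A small
sketch with made-up grades (the table `toy` and its values are
invented):

```{r}
library(dplyr)

# A made-up table; student 2 has a missing grade (hypothetical values)
toy <- tibble(
  student_id = c(1, 2, 3, 4),
  grade      = c(88, NA, 65, 70)
)

ascending  <- toy %>% arrange(grade)
descending <- toy %>% arrange(desc(grade))

ascending$grade   # 65 70 88 NA
descending$grade  # 88 70 65 NA
```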
#### [**Your Turn**]{style="color: green;"} **⤵**
In the code chunk below, replace FinalGradeCEMS that is used with both
the select() and arrange() functions with a different variable in the
data set. Consider returning to the code chunk above in which you
glimpsed at the names of all of the variables.
```{r}
d %>%
  select(student_id, FinalGradeCEMS) %>%
  arrange(desc(FinalGradeCEMS))
```
### Reach 1 🎉
Can you compose a series of functions that includes the select(),
filter(), and arrange() functions? Recall that you can "pipe" the output
from one function to the next, as when we used select() and arrange()
together in the code chunk above.
*This reach is not required/necessary to complete; it's just for those
who wish to do a bit more with these functions at this time (we'll do
more in class, too!)*
```{r}
```
## 3. EXPLORE
Exploratory data analysis, or exploring your data, involves processes of
*describing* your data (such as by calculating the means and standard
deviations of numeric variables, or counting the frequency of
categorical variables) and, often, visualizing your data prior to
modeling it. In this section, we'll create a few plots to explore our
data.
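As a minimal sketch of what describing data can look like in code, the
chunk below computes a mean and standard deviation for a numeric
variable and counts the values of a categorical one. The table `toy` and
its values are made up; `na.rm = TRUE` tells R to ignore missing values
when computing.

```{r}
library(dplyr)

# A made-up table for illustration (hypothetical values)
toy <- tibble(
  subject = c("Biology", "Physics", "Biology", "Physics"),
  grade   = c(80, 90, 70, NA)
)

# Mean and standard deviation of a numeric variable, ignoring missing values
grade_summary <- toy %>%
  summarize(
    mean_grade = mean(grade, na.rm = TRUE),
    sd_grade   = sd(grade, na.rm = TRUE)
  )
grade_summary

# Frequency of each value of a categorical variable
subject_counts <- toy %>% count(subject)
subject_counts
```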
### Histogram
The code below creates a histogram, or a distribution of the values, in
this case for students' final grades.
```{r}
ggplot(d, aes(x = FinalGradeCEMS)) +
  geom_histogram()
```
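When you run `geom_histogram()` without further arguments, ggplot2
prints a message noting that it used a default of 30 bins and suggesting
that you pick a better value. Here is a sketch of how to set the bar
width yourself, using made-up grades (the table `toy` and the width of
10 are illustrative choices, not recommendations for the course data):

```{r}
library(ggplot2)

# Made-up grades for illustration (hypothetical values)
toy <- data.frame(grade = c(55, 62, 68, 71, 74, 77, 79, 83, 86, 94))

# binwidth sets how wide each bar is, in the units of the x variable
p <- ggplot(toy, aes(x = grade)) +
  geom_histogram(binwidth = 10)
p
```

Narrower bars show more detail but can look noisy; wider bars smooth the
picture but can hide structure, so it is worth trying a few values.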
You can change the color of the histogram bars by specifying a color as
follows:
```{r}
ggplot(d, aes(x = FinalGradeCEMS)) +
  geom_histogram(fill = "blue")
```
### Changing colors
#### [**Your Turn**]{style="color: green;"} **⤵**
In the code chunk below, change the color to one of your choosing;
consider this list of valid color names:
<http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf>
```{r}
ggplot(d, aes(x = FinalGradeCEMS)) +
  geom_histogram(fill = "blue")
```
Finally, we'll make one more change: visualize the distribution of
another variable in the data, one other than FinalGradeCEMS. You can do
so by swapping FinalGradeCEMS for the name of another variable.
Also, change the color to one other than blue.
```{r}
ggplot(d, aes(x = FinalGradeCEMS)) +
  geom_histogram(fill = "blue")
```
### Reach 2 🎉
Completed the above? Nice job! Try for a "reach" by creating a scatter
plot for the relationship between two variables. You will need to pass
the names of two variables to the code below for what is now simply XXX
(a placeholder).
```{r}
ggplot(d, aes(x = XXX, y = XXX)) +
  geom_point()
```
## 4. MODEL
"Model" is one of those terms that has many different meanings. For our
purpose, we refer to the process of simplifying and summarizing our
data. Thus, models can take many forms; calculating means represents a
legitimate form of modeling data, as does estimating more complex
models, including linear regressions, and models and algorithms
associated with machine learning tasks. For now, we'll run a linear
regression to predict students' final grades.
Below, we predict students' final grades (`FinalGradeCEMS`, which is on a
0-100 point scale) on the basis of the time they spent on the course
(`TimeSpent`, measured in minutes through their learning management
system) and the subject (one of five) of their specific course.
```{r}
m1 <- lm(FinalGradeCEMS ~ TimeSpent + subject, data = d)
summary(m1)
```
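To make the output of a regression easier to read, it can help to fit
the same kind of model on made-up data where the answer is known in
advance. In the sketch below (all names and values are invented), `y` is
built as exactly `2 + 3 * x`, plus `5` for group "b", so the estimated
coefficients should recover those numbers. Notice that R turns the
categorical variable into an indicator (`groupb`) that compares group
"b" with the reference group "a"; assuming `subject` is stored as text
or as a factor, it is handled the same way in the model above.

```{r}
# Made-up data with a known relationship: y = 2 + 3 * x, plus 5 for group "b"
toy <- data.frame(
  x     = c(1, 2, 3, 4, 1, 2, 3, 4),
  group = c("a", "a", "a", "a", "b", "b", "b", "b")
)
toy$y <- 2 + 3 * toy$x + ifelse(toy$group == "b", 5, 0)

toy_model <- lm(y ~ x + group, data = toy)

# The intercept, the slope for x, and the shift for group "b"
coef(toy_model)
```

Real data are noisier than this, which is why `summary()` also reports
standard errors and p-values alongside each estimate.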
#### [**Your Turn**]{style="color: green;"} **⤵**
Notice how above the variables are separated by a + symbol. Below, add
*another* (a third) variable to the regression model. Specifically,
add a variable for students' initial, self-reported interest in science,
`int`, and any other variable(s) you like! What do you notice about the
results? We're going to dive into this *much* more: if you have many
questions now, you're in the right spot!
```{r}
m2 <- lm(FinalGradeCEMS ~ TimeSpent + subject, data = d)
summary(m2)
```
## 5. COMMUNICATE
Great job! Once you've finished your work, click the arrow beside the
button you used to "Preview" your document to see what it will look like
when you share it with others. When everything looks good, click "Knit
to HTML" at the top to render a report that can be viewed in a web
browser and shared online.
You may also wish to "Knit to PDF"; note that if you do, you will have
two forms of output in your "Files" pane. More on all of this to come.
Congratulations on getting started!