In R, we operate with variables. A variable can be seen as a container for a value. To get a better conceptual understanding of this, you can go through the following and code along in your own R session.
In R, we can state values directly in the chunk or the console, e.g.:
3
[1] 3
Here, we just state 3, so R simply “throws” that right back at you!
Now, if we want to “catch” that 3, we have to assign it to a variable, e.g.:
x <- 3
Now we have “caught” the 3 and nothing is “thrown” back to you, because we now have the 3 stored in x:
x
[1] 3
We can now use x moving forward, e.g. by adding 2:
x + 2
[1] 5
Note that this did not change x, and the result is simply “thrown” right-back-at-ya:
x
[1] 3
If we wanted to update x by adding 2, we would have to “catch” the result as before:
x <- x + 2
Now we have updated x:
x
[1] 5
We can also “catch” a result in a new variable:
y <- x + 3
Here, x is unchanged:
x
[1] 5
while y contains the result:
y
[1] 8
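The steps above can be condensed into one short recap. This is only a minimal sketch: the variable name n_samples and its values are hypothetical and chosen purely for illustration.

```r
# n_samples is a hypothetical variable name, used only for illustration
n_samples <- 10

# Without assignment, the result is printed but not stored
n_samples + 5

# n_samples still holds the original value, 10
n_samples

# With assignment, the result is "caught" and n_samples is updated
n_samples <- n_samples + 5

# n_samples now holds 15
n_samples
```

The only difference between the two additions is the `<-`: without it, the result is shown and then lost.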
In R, we use the assignment operator <- to perform assignment. The same applies to a dplyr-pipeline, where we do not change the dataset by running the pipeline, but we must store the result of the pipeline.
Before continuing, make sure that you are on track with the above concepts!
Create a variable my_age containing… You guessed it!
Add 0.5 to the variable (i.e. your age when you’re done with this course).
Print my_age. Did you remember to assign, thereby updating?
Now, let us create a tibble with some DNA sequences:
library("tidyverse")  # provides tibble(), mutate() and str_length()
tibble(sequence = c("aggtgtgag", "tggaatgaaccgcctacc",
                    "aagaatgga", "tct", "tgtatt", "tgg",
                    "accttcaacgagtcccactgt", "cgt",
                    "gaggctgagctggttgta", "ggggaacag"))
# A tibble: 10 × 1
sequence
<chr>
1 aggtgtgag
2 tggaatgaaccgcctacc
3 aagaatgga
4 tct
5 tgtatt
6 tgg
7 accttcaacgagtcccactgt
8 cgt
9 gaggctgagctggttgta
10 ggggaacag
my_dna_data <- tibble(sequence = c("aggtgtgag", "tggaatgaaccgcctacc",
                                   "aagaatgga", "tct", "tgtatt", "tgg",
                                   "accttcaacgagtcccactgt", "cgt",
                                   "gaggctgagctggttgta", "ggggaacag"))
Now the tibble is stored in my_dna_data:
my_dna_data
# A tibble: 10 × 1
sequence
<chr>
1 aggtgtgag
2 tggaatgaaccgcctacc
3 aagaatgga
4 tct
5 tgtatt
6 tgg
7 accttcaacgagtcccactgt
8 cgt
9 gaggctgagctggttgta
10 ggggaacag
Note here that a variable can, as we saw before with x and y, store a single value, e.g. 2, but here we are storing a tibble-object in the variable my_dna_data, and in that tibble-object we have a variable sequence, which contains some randomly generated DNA.
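To make the two kinds of variable concrete, here is a minimal sketch. It assumes the tidyverse is installed, and it uses a shortened, hypothetical tibble called toy_dna_data so that my_dna_data is left alone. The functions is_tibble(), colnames() and pull() are not part of the walkthrough itself and are used here only for inspection.

```r
library("tidyverse")

# toy_dna_data is a hypothetical, shortened dataset for illustration only
toy_dna_data <- tibble(sequence = c("aggtgtgag", "tct", "tgtatt"))

# The R variable toy_dna_data holds the whole tibble-object
is_tibble(toy_dna_data)

# The tibble variable (column) sequence lives inside the tibble
colnames(toy_dna_data)

# pull() extracts that column as a plain character vector
toy_dna_data |>
  pull(sequence)
```

So there is one variable in the R session, the tibble, and one variable inside it, the column.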
But what if we wanted to add a new variable to the tibble-object, containing the length of each of the DNA sequences?
my_dna_data |>
  mutate(dna_length = str_length(sequence))
# A tibble: 10 × 2
sequence dna_length
<chr> <int>
1 aggtgtgag 9
2 tggaatgaaccgcctacc 18
3 aagaatgga 9
4 tct 3
5 tgtatt 6
6 tgg 3
7 accttcaacgagtcccactgt 21
8 cgt 3
9 gaggctgagctggttgta 18
10 ggggaacag 9
Nice! Let’s see that data again then:
my_dna_data
# A tibble: 10 × 1
sequence
<chr>
1 aggtgtgag
2 tggaatgaaccgcctacc
3 aagaatgga
4 tct
5 tgtatt
6 tgg
7 accttcaacgagtcccactgt
8 cgt
9 gaggctgagctggttgta
10 ggggaacag
Wait! What? Where is the variable we literally just created?
We forgot something… We did not update my_dna_data, so let’s fix that:
my_dna_data <- my_dna_data |>
  mutate(dna_length = str_length(sequence))
Now we have updated my_dna_data:
my_dna_data
# A tibble: 10 × 2
sequence dna_length
<chr> <int>
1 aggtgtgag 9
2 tggaatgaaccgcctacc 18
3 aagaatgga 9
4 tct 3
5 tgtatt 6
6 tgg 3
7 accttcaacgagtcccactgt 21
8 cgt 3
9 gaggctgagctggttgta 18
10 ggggaacag 9
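Just as y “caught” x + 3 without changing x, the result of a pipeline can also be stored under a new name instead of overwriting the original. The following is a minimal sketch, assuming the tidyverse is installed. The names toy_dna_data and toy_dna_lengths are hypothetical and the data is shortened for illustration.

```r
library("tidyverse")

# toy_dna_data is a hypothetical, shortened dataset for illustration only
toy_dna_data <- tibble(sequence = c("aggtgtgag", "tct", "tgtatt"))

# "Catch" the pipeline result in a new (hypothetical) variable
toy_dna_lengths <- toy_dna_data |>
  mutate(dna_length = str_length(sequence))

# The original tibble still only has the column it started with
colnames(toy_dna_data)

# The new tibble has both columns
colnames(toy_dna_lengths)
```

Whether to overwrite or to store under a new name is a judgment call: overwriting keeps the session tidy, while a new name keeps the original data available.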
Did it make sense? Check yourself: add a new variable to my_dna_data called sequence_capital by using the function str_to_upper().
That’s it - Hope it helped and remember… Bio data science in R is really fun!