Biological Data Analysis: Homework 1
Due Thursday, Feb. 15
Note: this assignment is worth three points; all of the other homework assignments are worth two points each. You will e-mail this assignment to me, and print all of the other homework assignments.
1. Do not print your results from question 1; instead, enter the data in this spreadsheet and e-mail the spreadsheet to me by 11 a.m. on Thursday, Feb. 15. I'm going to be combining all of the spreadsheets into one giant spreadsheet, so do NOT add, delete, or rearrange columns. Your answers to questions 2 and 3 should be in the body of the e-mail; don't send a separate e-mail for each question.
Everyone in the class is going to collect data on a number of people, so that we can combine it into a giant data set to use both for homework assignments and for in-class demonstrations. Two of the variables (which arm is on top when you cross your arms, and which thumb is on top when you clasp your hands) have sometimes been considered simple genetic characters; this is not true. Another character, the length of time a person can stand on one leg with their eyes shut, is strongly associated with survival in older people, so it's actually pretty interesting.
Find 8 people who are at least 18 years old. You can include yourself as one of the 8 people, but you must get someone else to measure your balance time. Don't include anyone who has been asked these questions by someone else in this spring's BISC643 class; you may include people who were measured in a previous semester. Collect the following data for each person:
- Your name (be sure to copy it onto each line of the spreadsheet)
- Your sex (be sure to copy it onto each line of the spreadsheet)
- Full first name (no nicknames) of the subject.
- Full last name.
- Age (in years)
- Height, in inches or cm. Do not use feet+inches; if someone is 6 feet 1 inch tall, enter 73, not 6'1" or 6ft 1in or 6.1. Don't enter the units, just the number; I'll be able to tell whether it's inches or cm based on the size.
- When they fold their arms, which arm is on top: right (like Mr. Clean) or left (like Jeannie)?
- When they clasp their hands, which thumb is on top: left (like Barack Obama) or right (like Mitt Romney)?
- Shoe size.
- Which units they used for shoe size. The choices are US men (enter "USM"), US women (USW), UK/Indian men (UKM), UK/Indian women (UKW), or European (EU).
- Balance time. We want everyone's balance time to be measured under conditions as similar as possible. Have the person take off their shoes; they can be in their socks or barefoot. Have the person stand on one foot on a hard, flat floor (not carpet, rug, dirt, or grass) with their hands on their hips. When the person is ready, they should shut their eyes and say "start." Using a stopwatch (both iPhones and Android phones have a stopwatch in the built-in Clock app), time how long the person can stand on one foot with their eyes shut. Stop timing when they open their eyes, put their raised foot down, or take their hands off their hips. Do this three times for each person, and record the best time. If their first or second time is greater than 60 seconds, they do not need to do the remaining trials. However, make sure that they balance for as long as possible; do NOT tell them that they can stop after 60 seconds.
- Which foot they stood on during the balance test, right or left.
- Which hand they throw a ball with, right or left.
- The class they disliked the most in high school. Enter the subject, such as "algebra" or "drivers ed." If they didn't attend high school, just ask them the class they disliked the most at the highest level of school they attended. Please use this question as an excuse to get your friends and relatives reminiscing about their younger days; ask them about their high school friends, extracurricular activities, the times they got in trouble, where they went on dates, etc.
You should answer questions 2 and 3 in the body of the e-mail you send to me, not in an attachment. The spreadsheet from question 1 should be the only attachment.
2. Choose an article from the lab you're in (if you're in a lab) or from your favorite scientific journal. It should be a regular-sized article (not a brief note) in a specialized journal (not Science, Nature, or PNAS). If you don't have a lab or a favorite journal, I recommend you go to Web of Science and do a topic search for your favorite biological topic. Read through the paper and identify at least six variables that are analyzed in it. For each variable, give the name of the variable (such as "LAM") and, if it's not obvious from the name, a short explanation of what the variable is (such as "length of the anterior adductor muscle scar on a mussel shell"). Then say whether the variable is a measurement variable, a nominal variable, or a ranked variable. If a measurement variable has been converted to a nominal variable, or if the percentages from a nominal variable have been analyzed as if they were a measurement variable, mention this. You must have at least six variables; if one paper doesn't have six, use more than one paper.
3. Give the citation information (authors, year, article title, journal, volume, page numbers) for the article or articles you've used.
Mmmmm, bonus: Everyone who finds an article with a true ranked variable (not a measurement variable converted to a ranked variable for a non-parametric test) gets a donut at the next class.
Problem 1: Simple Boolean operations
Tip: Note that each of the code blocks in this problem contains the expression . This tells R Markdown to display the code in the block but not to evaluate it. To check that your answer makes sense, be sure to try it out in the console with various values for the variable .
(a) Checking equality.
Given a variable , write a Boolean expression that evaluates to if the variable is equal to (the numeric value).
(b) Checking inequality.
Given a variable , write a Boolean expression that evaluates to if the variable is not (i.e., is not missing).
(c) Checking if a number is in a given range.
Given a (possibly negative) number , write a Boolean expression that returns if and only if is smaller than or bigger than .
(d) A more complicated example.
Given an integer , write a Boolean expression that returns if and only if is an odd number between -8 and 12 or between 100 and 150.
Tip: Recall the modulus operator we saw in lecture 1: . For integers and , is the remainder of divided by .
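A quick console check may help with part (d), since the range includes negative numbers. This sketch assumes the modulus operator meant here is R's %% operator. In R, the result of %% takes the sign of the divisor, so odd negative integers also leave a remainder of 1 when divided by 2. The values in x are an arbitrary illustrative choice.

```r
# Illustrative values only; any integers work here
x <- c(-7, -4, 0, 3, 8, 11)

# In R, %% returns a result with the same sign as the divisor,
# so every odd integer (negative or positive) gives 1 here
x %% 2
#> [1] 1 0 0 1 0 1

# A vectorized odd-number check built from %%
is_odd <- x %% 2 == 1
is_odd
#> [1]  TRUE FALSE FALSE  TRUE FALSE  TRUE
```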
Problem 2: Vector Boolean operations
(a) R has two kinds of Boolean operators implemented, single (, ) and double (, ).
One of these operators takes advantage of something called lazy evaluation while the other does not. They also don’t behave the same way when applied to vectors.
Read the help file () and construct some examples to help figure out how the two behave.
To help you get started, try out the following two examples in your console:
Can you explain what’s happening? Write up a brief explanation below.
Replace this text with your explanation.
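Here is one more set of console experiments that separates the two differences: & combines vectors entry by entry, while && evaluates its right-hand side only when the left-hand side doesn't already settle the answer (short-circuit, or "lazy," evaluation). The sketch deliberately uses length-one values with &&, because the behavior of && on longer vectors depends on your R version.

```r
# Elementwise: & combines each pair of entries and returns a vector
elementwise <- c(TRUE, TRUE, FALSE) & c(TRUE, FALSE, FALSE)
elementwise
#> [1]  TRUE FALSE FALSE

# Short-circuit: && stops as soon as the left side is FALSE,
# so the stop() call on the right is never evaluated
lazy <- FALSE && stop("right-hand side evaluated")
lazy
#> [1] FALSE

# & always evaluates both sides, so the same expression raises the error;
# tryCatch() is used here only to capture the error message
eager <- tryCatch(FALSE & stop("right-hand side evaluated"),
                  error = function(e) conditionMessage(e))
eager
#> [1] "right-hand side evaluated"
```

In R 4.3.0 and later, giving && or || a vector longer than one is an error; older versions warned or quietly used only the first element, so check your version if your experiments behave differently.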
Two people were asked to state their preference within each of four pairs of options: [Facebook, Twitter], [Firefox, Chrome], [Mac, PC], [Summer, Winter]. Their results are given below.
Use the function to determine if the two people have identical preferences. (Your code should output a single Boolean value, either or .)
Use the function to determine if the two people have any preferences in common. (Your code should output a single Boolean value, either or .)
(d) Missing values.
Let be the vector defined below.
Write a Boolean expression that checks whether each entry of is missing (recall missing values are denoted by ). Your expression should return a Boolean vector having the same length as .
Problem 3: Referencing vector elements
Write code that returns the indexes of that are missing.
(b) Getting non-missing values
Write code that uses negative indexes and your solution from (a) to return only the values of that are not missing. (i.e., your code should result in a vector with elements: 18, 25, 71, 45, 18)
(c) A more direct way of getting non-missing values
Using the negation operator and the function, write an expression that returns only the values of that are not missing.
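The techniques in parts (a) through (c) can be tried first on a small made-up vector. The vector v below is hypothetical (it is not the homework vector); the sketch uses base R's is.na() to flag missing entries, which() to turn those flags into indexes, and negative indexing to drop them.

```r
# Hypothetical vector with two missing entries
v <- c(3, NA, 10, 7, NA)

# Logical vector: TRUE where an entry is missing
is.na(v)
#> [1] FALSE  TRUE FALSE FALSE  TRUE

# Positions of the missing entries
missing_idx <- which(is.na(v))
missing_idx
#> [1] 2 5

# Negative indexes drop those positions
v[-missing_idx]
#> [1]  3 10  7

# Same result without computing indexes: keep entries that are not missing
v[!is.na(v)]
#> [1]  3 10  7
```

One caveat of the negative-index approach: if a vector has no missing values, which() returns integer(0), and indexing with -integer(0) returns an empty vector rather than the whole vector. The !is.na() version doesn't have this problem.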
(d) More practice
For the next three problems, we’ll go back to the data set.
Write code to figure out which cars had a stopping distance of 15 feet or more.
(e) , practice
Use the function to figure out which car had the shortest stopping distance. (Your code should return the car’s index.)
(f) More practice
Use the function to figure out the speed of the car that had the longest stopping distance. (Your code should return the car’s speed.)
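The pattern behind these three questions, finding indexes in one vector and using them to look up values in another, can be sketched on two short hypothetical vectors (the numbers are made up and are not the data set used in the problem). The sketch uses base R's which(), which.min(), and which.max(); this is one way to approach such questions, not necessarily the functions the problem intends.

```r
# Hypothetical speeds and stopping distances for five cars (made up)
speed <- c(4, 7, 10, 12, 15)
dist  <- c(2, 10, 18, 14, 26)

# Indexes of cars with a stopping distance of 15 feet or more
which(dist >= 15)
#> [1] 3 5

# Index of the car with the shortest stopping distance
which.min(dist)
#> [1] 1

# Speed of the car with the longest stopping distance
speed[which.max(dist)]
#> [1] 15
```

If there is a tie, which.min() and which.max() return only the first matching index.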
Problem 4: Data frame basics
(a) Importing data.
In Lecture 2 we saw how to use the function to import the survey data. Now we’ll use a different function. Use the function to import the survey data into a variable called .
Tip: The data file is located at . Do not download the file. Import the data directly using the URL.
Use the operator to select the TVhours column from the data
(c) [,] notation
Repeat part (b) using [,] notation. Select the TVhours column by name (i.e., use the name “TVhours” instead of the column number).
(d) [[]] notation
Repeat part (c) with [[]] notation.
(e) [] notation
Repeat part (d), but this time use single-bracket ([]) notation.
(Observe that this returns a new single-column data frame, not just a vector.)
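The column-selection notations from parts (b) through (e) can be compared side by side on a tiny hypothetical data frame; toy below is made up and only borrows the TVhours and Program column names from the survey. With a base R data frame, $, [,] with a single column name, and [[]] all return a plain vector, while single brackets return a one-column data frame. If the survey data were instead imported as a tibble (for example, with readr's read_csv()), [,] would also return a one-column tibble, because tibbles do not drop to a vector.

```r
# Hypothetical data frame; the values are made up for illustration
toy <- data.frame(
  TVhours = c(2, 5, 0),
  Program = c("MISM", "PPM", "MISM")
)

toy$TVhours         # $ notation: a numeric vector
toy[, "TVhours"]    # [,] notation: a single column is dropped to a vector
toy[["TVhours"]]    # [[]] notation: a numeric vector
toy["TVhours"]      # [] notation: a one-column data frame

# The vector forms agree; the single-bracket form is a data frame
identical(toy$TVhours, toy[["TVhours"]])
#> [1] TRUE
is.data.frame(toy["TVhours"])
#> [1] TRUE
```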
Use the function to select all the survey data on Program and OperatingSystem for respondents whose Rexperience is “Never used” or who watched 5 or more hours of TV last week.
Problem 5: Data summaries and inline code practice.
(a) Bar graph
Create a bar graph of respondents’ Rexperience.
(b) Inline code practice
Replace all occurrences of ???? in the paragraph below with an inline code chunk supplying the appropriate information.
Of the ???? survey respondents, ???? were NOT from the MISM program. We found that ????% of all students in the class use the Mac OS X operating system. ????% of MISM students report having Basic competence in R.
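In R Markdown, an inline chunk is written as a backtick, the letter r, a space, an R expression, and a closing backtick; when the document is knit, the chunk is replaced by the expression's value. The sketch below computes a count and a percentage on a hypothetical toy data frame (the Program and OperatingSystem column names come from the survey, but the values are made up), showing the kind of expression that can go inside an inline chunk.

```r
# Hypothetical data; not the course survey
toy <- data.frame(
  Program = c("MISM", "PPM", "MISM", "Other"),
  OperatingSystem = c("Mac OS X", "Windows", "Mac OS X", "Mac OS X")
)

# Number of respondents
n_respondents <- nrow(toy)
n_respondents
#> [1] 4

# Number of respondents NOT in the MISM program
n_not_mism <- sum(toy$Program != "MISM")
n_not_mism
#> [1] 2

# Percentage using Mac OS X, rounded to one decimal place
pct_mac <- round(100 * mean(toy$OperatingSystem == "Mac OS X"), 1)
pct_mac
#> [1] 75

# In the R Markdown text, a value like this would be embedded as, e.g.,
# `r pct_mac`% of respondents use Mac OS X.
```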