15 Data as Sources
- Learn more background information.
- Answer your research question. (The evidence that data provide can help you decide on the best answer for your question.)
- Convince your audience that your answer is correct. (Data often give you evidence that your answer to your research question is correct or at least a reasonable answer.)
- Describe the situation surrounding your research question.
- Report what others have said about your research question.
Activity: Example of Data
Check out this very detailed data about frozen lasagna. Did you ever think this much data was available? Are there elements new to you? How might you use such data?
What is data? The word means many things to many people. (Consider “data” as it relates to your phone contract, for instance!) For our purposes, a definition we like is “units of information observed, collected, or created in the course of research.”
Erway, Ricky. 2013. Starting the Conversation: University-wide Research Data Management Policy. Dublin, Ohio: OCLC Research.
Data observed, collected, or created for research purposes can be numbers, text, images, audio clips, and video clips. But in this section on using data as sources, we’re going to concentrate on numerical data.
Data is the plural of datum. (It’s similar to how media is the plural of medium.)
Sometimes data is actually necessary to answer research questions, particularly in the social sciences and life and physical sciences. For instance, data would be necessary to support or rule out these hypotheses:
- More women than men voted in the last presidential election in a majority of states.
- A certain drug shows promising results in the treatment of pancreatic cancer.
- Listening to certain genres of music lowers blood pressure.
- People of certain religious denominations are more likely to find a specific television program objectionable.
- The average weight of house cats in the United States has increased over the past 30 years.
- The average square footage of supermarkets in the United States has increased in the past 20 years.
- More tomatoes were consumed per person in the United Kingdom in 2015 than in 1962.
- Exploding volcanoes can help cool the planet by spewing sulfur dioxide, which combines with water vapor to make reflective aerosols.
So using numeric data in those portions of your final product that require evidence can really strengthen your argument for your answer to your research question. At other times, even if data is not actually necessary, numeric data can be particularly persuasive and sharpen the points you want to make in other portions of your final product, such as those devoted to describing the situation surrounding your research question. (See Making an Argument.)
For example, for a term paper about the research question “Why is there a gap in the number of people who qualify for food from foodbanks and the number of people who use foodbanks?,” you could find data on the website of Feeding America, the nation’s largest network of foodbanks. Some of that data may be the number of people who get food from a foodbank annually, with the number of seniors and children broken out. Those data won’t answer your research question, but they will help you describe the situation around that question and help your audience develop a fuller understanding.
Similarly, for a project with the research question “How do some birds in Australia use ‘smart’ hunting techniques to flush out prey, including starting fires?,” you might find a journal article with data about how many people have observed these techniques and estimates of how frequently the techniques are used and by how many bird species.
There are two ways of obtaining data:
- Obtain data that already has been collected and analyzed. That’s what this section will cover.
- Collect data yourself. This can include activities such as making observations about your environment, conducting surveys or interviews, directly recording measurements in a lab or in the field, or even receiving electronic data recorded by computers/machines that gather the data. You will explore these activities in courses you take.
Finding Data in Articles, Books, Web Pages, and More
Numeric research data can be found all over the place. A lot of it is part of another source, such as books; journal, newspaper, and magazine articles; and web pages. In these cases, the data do not stand alone as a distinct element but are part of the larger work.
When searching for data in books and articles and on web pages, terms such as statistics or data may or may not be useful search terms. That’s because many writers don’t use those terms in their scholarly writing. They tend to use the words findings or results when talking about the data that could be useful to you. In addition, statistics is a separate discipline and using that term will turn up lots of journals in that area, which won’t be helpful to you. So use the search terms data and statistics with caution, especially when searching library catalogs. (See information on the Library Catalog. More information on searching is at Precision Searching.)
Even without using those search terms, many scholarly sources you turn up are likely to contain data. Once you find potential sources, skim them for tables, graphs, or charts. These items are displays or illustrations of data gathered by researchers. However, sometimes data and interpretations are solely in the body of the narrative text and may be included in sections called “Results” or “Findings.” (That shouldn’t keep you from displaying the data in charts, graphs, or tables as you like in your own work, though. See Data Visualization later in this section.)
If the data you find in a book, article, or web page is particularly helpful and you want more, you could contact the author to request additional numeric research data. Researchers will often discuss their data and its analysis – and sometimes provide some of it (or occasionally, all). Some may link to a larger numeric research data set. However, if a researcher shares his or her data with you, it may be in a raw form. This means that you might have to do additional analysis to make it useful in answering your question.
Depending on your research question, you may need to gather data from multiple sources to get everything you need to answer your research question and make your argument for it. (See Making an Argument.)
For instance, in our example related to foodbanks above, we suggested where you could find statistics about the number of people who get food from American foodbanks. But with that research question (“Why is there a gap in the number of people who qualify for food from foodbanks and the number of people who use foodbanks?”), you would also need to find out from another source how many people qualify for foodbanks based on their income and compare that number with how many people actually use foodbanks.
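Lining up two sources like this is mostly a matter of matching them on a shared key and doing some arithmetic. The Python sketch below shows that step. Every figure in it is a made-up placeholder, not real Feeding America or eligibility data, and it assumes both sources count people for the same year and in the same way, which is exactly what you would need to confirm when evaluating the real sources.

```python
# All numbers below are hypothetical placeholders, not real data.
# Source A: people who qualify for foodbanks, by state.
qualified = {"Ohio": 1_500_000, "Indiana": 900_000, "Michigan": 1_200_000}
# Source B: people who actually used a foodbank, by state.
served = {"Ohio": 1_000_000, "Indiana": 450_000}


def compare_sources(qualified, served):
    """Join two sources on state and compute the gap between qualifying and served."""
    results = {}
    # Only states that appear in BOTH sources can be compared.
    for state in sorted(qualified.keys() & served.keys()):
        results[state] = {
            "gap": qualified[state] - served[state],
            "share_served": round(served[state] / qualified[state], 3),
        }
    return results


for state, row in compare_sources(qualified, served).items():
    print(state, row)
```

Notice that Michigan silently drops out because only one source covers it. Gaps in coverage like this are one more reason you may need to search for additional sources.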
Finding Data in Data Repositories and Directories
Sometimes the numeric research data you need may not be in the articles, books, and web sites that you’ve found. But that doesn’t mean it hasn’t been collected and packaged in a usable format. Governments and research institutions often publish the data they have collected in discipline-specific data repositories that make data available online. Here are some examples:
- United States Census Bureau
- Budget of the United States Government
- U.S. Bureau of Justice Statistics
- National Center for Education Statistics
- Daily Weather Maps (NOAA)
- The World Factbook (CIA)
- OSU Knowledge Bank
The United Nations and just about every country provide information as numeric data available online. Free and accessible data like this is called open data. The U.S. federal government, all states, and many local governments provide open data. You can find them (among other places) by limiting a web search to government sites with site:.gov.
Other data are available through vendors who publish the data collected by researchers. Here are some examples:
- Hoover’s Online (OSU only)
- International Monetary Fund Statistical Databases
- World Health Organization Statistical Information System
- Census of Agriculture (OSU only)
- OECD Education at a Glance
- Corruption Perceptions Index
Don’t know whether there is a repository that could contain data in your discipline? Check out a data directory such as re3data.org.
Evaluating Data as Sources
Evaluating data for relevance and credibility is just as important as evaluating any other source. And as with other sources, no data source is 100% perfect. So, just as pointed out in Evaluating Sources, you’ll have to make educated guesses (inferences) about whether the data are good enough for your purpose.
Critical thinking as you evaluate sources is something your professors will expect. But you’ll benefit in other ways, too, because you’ll be practicing a skill you’ll need for the rest of your life, both in the workplace and in your personal life. These skills will help keep you from being duped by fake news and taken advantage of by posts that are ignorant or, sometimes, simply scams.
To evaluate data, you’ll need to find out how the data were collected. If the data are in another source, such as a book; web page; or newspaper, magazine, or research journal article, evaluate that source in the usual way (see Evaluating Sources). If that book, article, or web page got the data from somewhere else, evaluate the original source of the data in the same way. The article, book, or web page should cite where the data came from. If it doesn’t, that is a black mark against using the data. (The data in a research journal article are often the work of the article’s authors. But you’ll want to be sure they explain how they collected the data.)
In addition, if the data are in a research journal article, read the entire article, including the section called Methodology, which tells how the data were collected. Then determine the data’s relevance to your research question by considering such questions as:
- Were the data collected recently enough?
- Is the data cross-sectional (based on information collected from people at a single point in time) or longitudinal (based on information collected from the same people over time)? If one is more appropriate for your research question than the other, is there information you can still logically infer from this data?
- Were the types of people from whom the data were collected the same type of people your research question addresses? The more representative the study’s sample is of the group your research question addresses, the more confident you can be in using the data to make your argument in your final product.
- Was the data analysis done at the right level for your research question? For instance, it may have been done at the individual, family, business, state, or zip code level. But if that doesn’t relate to your research question, can you still logically make inferences that will help your argument? Here’s an example: Imagine that your research question asks whether participation in high school sports in Columbus City Schools is positively associated with enrolling in college. But the data you are evaluating is analyzed at the state level. So you have data about the whole state of Ohio’s schools and not Columbus in particular. In this case, ask yourself whether there is still any inference you can make from the data.
- Is the article in a peer-reviewed journal? (Look at the journal’s instructions for authors, often located on the journal’s website, to see whether it mentions peers reviewing articles and asking for changes [revisions] before publication.) If the journal is peer reviewed, consider that a plus for the article’s credibility. Peer review doesn’t mean an article is perfect, only that it is more likely to be credible.
- Do the authors discuss causation or correlation? Be wary of claims of causation; it is very difficult to establish a causal effect. Research studies often find relationships (correlations) between variables in the data, but correlation does not equal causation. For instance, let’s return to our example above: If the study of Ohio high school students’ sports participation showed a positive correlation between sports participation and college enrollment, the researcher cannot say that participation caused college enrollment. To test cause and effect, the study would have had to be designed as an experiment or quasi-experiment and use different statistical analyses, and it would then report whether its results supported its hypotheses.
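To see what a reported correlation actually measures, here is a minimal Python sketch of the Pearson correlation coefficient (r), one common measure of correlation. The school-level percentages are invented for illustration and do not come from any real study of Ohio schools. (Python 3.10 and later also provide `statistics.correlation`, which computes the same value.)

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: +1 perfect positive, -1 perfect negative, 0 no linear relationship."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical school-level data (not from any real study):
sports_pct = [20, 35, 40, 55, 60]   # % of students who play school sports
college_pct = [45, 50, 58, 62, 70]  # % of graduates who enroll in college
r = pearson_r(sports_pct, college_pct)
print(f"r = {r:.2f}")
```

A strong r like this one says only that the two percentages rise together. It cannot tell you that sports participation causes college enrollment; some third factor (family income, to pick a purely illustrative example) could be driving both.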
Data Visualization
Modern software can help you display your data in ways that are striking and often even beautiful. But the best test of any display is whether it helps you and your audience understand your data better than text alone, perhaps even revealing points you would otherwise have missed.
Specific kinds of charts and graphs accomplish different things, which is important to keep in mind as you evaluate data and data sources. For instance:
- Line charts are usually used to show trends, comparing data over time.
- Scatter plots show how data points are distributed across two variables, which can reveal a relationship between them.
- Bar graphs usually compare categories of data.
- Pie charts show proportions of a whole.
It’s important to decide what you want a display to do before making your final choice. Studying your data first so you know what you have will help you make that decision. It may also be conventional in your discipline to display your data in certain ways. Examining the sources you were assigned to read in your course or asking your professor will help you learn what’s considered conventional.
Your professors will be examining your visual display to make sure you did not misrepresent the data. For example, the slices of a pie chart must add up to 100%. If yours don’t, you’ve done something wrong.
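A quick check before you build a pie chart can catch this mistake. The Python sketch below converts raw counts into percentages and tests whether the slices add up to 100%. The survey results are hypothetical, and the 1-point tolerance is an assumption meant to allow for rounding each displayed slice to a whole number.

```python
def pie_percentages(counts):
    """Convert raw category counts into percentages of their total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("a pie chart needs a positive total")
    return {label: 100 * n / total for label, n in counts.items()}


def slices_add_to_100(percentages, tolerance=1.0):
    """True if the slices sum to 100%, allowing for rounding of displayed values."""
    return abs(sum(percentages.values()) - 100) <= tolerance


# Hypothetical survey results, for illustration only.
one_answer = pie_percentages({"Yes": 45, "No": 30, "Unsure": 25})
print(slices_add_to_100(one_answer))    # True
# A "select all that apply" question: respondents are counted more than once.
multi_answer = {"Email": 60, "Text": 55, "Phone": 20}
print(slices_add_to_100(multi_answer))  # False: these are not parts of one whole
```

The second case is a common trap: when people can pick more than one answer, the percentages are not shares of a single whole, so a bar graph is usually a better choice than a pie chart.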
It’s easy to get overwhelmed by the many possible displays and what each can do. Here are two sites to help you sort them out once you know your data:
If you aren’t ready yet to use some of the specialized tools for display, make it a point to learn how to use the data display capabilities in Microsoft Word and/or Excel. You can find helpful tutorials on the Web. Good search statements to find those tutorials are:
- “Microsoft Word” (charts OR graphs)
- “Microsoft Excel” (charts OR graphs)
If you are OSU staff, students, or faculty, OSU Libraries’ Research Commons can help you choose a display, recommend a tool to accomplish it, and check out your finished data visualization before you have to turn it in. Contact the data visualization specialist.
If you are interested in displaying geospatial data on a map, note that the Research Commons also helps OSU students, staff, and faculty find geospatial data and choose tools to display them.
Data is not copyrightable, but the expression of data is. So as with any other information source, you should cite any data you use from a source, whether it appeared in an article or you downloaded the data from a repository on the Web.
Unfortunately, data citation standards do not exist in many disciplines, although the DataCite initiative is working on them. Current workarounds include:
- Citing a “data paper,” where available.
- Citing a journal article that describes the dataset.
- Citing a book that includes the data.
- Citing the dataset as a website, where possible.
Examples: Citing Data
Data from a research database:
- APA: Department of Agriculture (USDA) (2008). “Crops Harvested”, Crop Production [data file]. Data Planet, (09/15/2009).
- MLA: “Crops Harvested”, Department of Agriculture (USDA) [data file] (2008). Data Planet, (09/15/2009).
Data from a file found on the open Web:
- APA: Center for Health Statistics, Washington State Department of Health. (2012, November). Mortality Table D1. Age-Adjusted Rates for Leading Causes of Cancer for Residents, 2002-2011. [Microsoft Excel file]. Washington State Department of Health. Retrieved from http://www.doh.wa.gov/
- MLA: Center for Health Statistics, Washington State Department of Health. Mortality Table D1. Age-Adjusted Rates for Leading Causes of Cancer for Residents, 2002-2011. Washington State Department of Health, Nov. 2012. Microsoft Excel file. Retrieved from http://www.doh.wa.gov/
Proper Use of Data
Once you have your data, you can examine them and make an interpretation. Sometimes you can do so easily. But not always. What if…
…you had a lot of information? Sometimes data can be very complicated and may include thousands (or millions…or billions…or more!) of data points. Suppose you only have a date and the high temperature for Columbus – but you have this for 20 years’ worth of days. Do you want to calculate the average highs for each month based upon 20 years’ worth of data by hand or even with a calculator?
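For the Columbus temperature example, a few lines of code can do the averaging that would be tedious by hand. This Python sketch pools all years and averages the daily highs for each calendar month. The CSV layout (columns named `date` and `high`) and the sample records are hypothetical; real weather data would come with its own column names and units.

```python
import calendar
import csv
from collections import defaultdict
from datetime import date


def load_records(path):
    """Read a CSV with hypothetical columns 'date' (YYYY-MM-DD) and 'high'."""
    with open(path, newline="") as f:
        return [(date.fromisoformat(row["date"]), float(row["high"]))
                for row in csv.DictReader(f)]


def monthly_average_highs(records):
    """Average daily high for each calendar month, pooling every year in the data."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, high in records:
        totals[day.month] += high
        counts[day.month] += 1
    # Return results in calendar order, keyed by month name.
    return {calendar.month_name[m]: totals[m] / counts[m] for m in sorted(totals)}


# A tiny made-up sample; 20 years of real data would be loaded with load_records().
sample = [
    (date(2001, 1, 1), 30.0),
    (date(2001, 1, 2), 34.0),
    (date(2002, 1, 1), 38.0),
    (date(2001, 7, 4), 85.0),
    (date(2002, 7, 4), 89.0),
]
print(monthly_average_highs(sample))  # {'January': 34.0, 'July': 87.0}
```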
…you want to be able to prove a relationship? Perhaps your theory is that social sciences students do better in a certain class than arts and humanities or life and physical sciences students. You may have a huge spreadsheet of data from 20 years’ worth of this course’s sections and would need to use statistical methods to see whether a relationship between major and course grade exists.
You may find yourself using special software, such as Excel, SAS, and SPSS, in such situations.
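For the question about majors and course grades, one common statistical method is a one-way analysis of variance (ANOVA), which compares the average grade of several groups. The Python sketch below computes the ANOVA F statistic and pairs it with a permutation test, chosen here only because it needs no statistical tables; in SPSS, SAS, or SciPy (`scipy.stats.f_oneway`) you would get an F statistic and p-value directly. The grade values are invented for illustration.

```python
import math
import random


def one_way_anova_f(groups):
    """F statistic from a one-way ANOVA: between-group vs. within-group variation."""
    groups = [list(g) for g in groups]
    values = [v for g in groups for v in g]
    k, n = len(groups), len(values)
    if k < 2 or n <= k or any(not g for g in groups):
        raise ValueError("need at least two non-empty groups and more values than groups")
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(groups, n_permutations=5000, seed=0):
    """Estimate how often randomly reassigned group labels give an F at least as large."""
    groups = [list(g) for g in groups]
    observed = one_way_anova_f(groups)
    sizes = [len(g) for g in groups]
    pooled = [v for g in groups for v in g]
    rng = random.Random(seed)  # fixed seed so results are reproducible
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        # Small tolerance guards against floating-point rounding differences.
        if one_way_anova_f(shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Hypothetical final grades (4.0 scale) by major group, for illustration only.
grades = {
    "Social sciences": [3.3, 3.7, 3.0, 3.5, 3.7],
    "Arts and humanities": [3.0, 2.7, 3.3, 3.0, 3.3],
    "Life and physical sciences": [2.7, 3.0, 3.3, 2.3, 3.0],
}
f = one_way_anova_f(grades.values())
p = permutation_p_value(grades.values())
print(f"F = {f:.2f}, permutation p = {p:.3f}")
```

A small p-value suggests the differences between group averages are unlikely to be due to chance alone. As with any correlation, though, it shows a relationship between major and grade, not that a student’s major causes the grade.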
Many people may have a tendency to look for data to prove their hypothesis or idea, as opposed to really answering their research questions. However, you may find that the opposite happens: the data may actually disprove your hypothesis. You should never try to manipulate data so that it gives credence to your desired outcome. While it may not be the answer you wanted to find, it is the answer that exists. You may, of course, look for other sources of data – perhaps there are multiple sources of data for the same topic with differing results. Inconclusive or conflicting findings do happen and can be the answer (even if it’s not the one you wanted!).
Conflicting results on the same topic are common. This is the reality of research because, after all, the questions researchers are studying are complicated. When you have conflicting results you can’t just ignore the differences—you’ll have to do your best to explain why the differences occurred.