Thursday, October 13, 2011

Data Analysis Techniques Readings


Connaway and Powell Chapter 9

Statistical Analysis: A form of analysis concerned with developing and applying methods and techniques for organizing and analyzing data, which are typically quantitative, so that the conclusions drawn from the data can be evaluated in an objective manner.

Categories: Categories are necessary to organize the data so that it may be analyzed. The categories should be established before data collection actually occurs.

Wildemuth Chapter 29

Content Analysis: A systematic and quantitative analysis of information. It should always follow the scientific method.

Latent Content: Latent content is difficult to quantify. It is conceptual in nature and cannot be directly observed during analysis. It often relates to emotions or other things that cannot be counted or observed quantitatively.

Wildemuth Chapter 30

Qualitative Content Analysis: A form of content analysis that focuses predominantly on speech, texts, and their contexts. As its name suggests, it is qualitative in nature rather than quantitative, as content analysis typically is.

Conventional Qualitative Content Analysis: Qualitative content analysis in which coding categories are created directly and inductively from the raw data.

Wildemuth Chapter 31

Discourse Analysis: The analysis of many forms of discourse, including all spoken discourse, whether formal or informal, and written text of any kind. In ILS, reference interviews are one example of discourse that could be analyzed this way.

Hermeneutics: In hermeneutics, the researcher has preunderstandings of the concept of interest and focuses on the relationships between texts. This method of analysis is reflexive.

Wildemuth Chapter 32

Deductive Reasoning: A way of using logic to move from a broader set of premises that are (assumed to be) true to specific conclusions that must also be true if those premises hold.

Induction: The opposite of deduction. Induction is when one observes specific facts and logically arrives at a more general, broader conclusion.

Wildemuth Chapter 33

Measures of Central Tendency: These measures are most concerned with identifying a single number that can summarize an entire data set.
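
As a minimal sketch, Python's standard statistics module computes the three most common measures of central tendency (mean, median, and mode). The scores are invented for illustration and do not come from the readings.

```python
import statistics

# Hypothetical example scores (illustrative only)
scores = [2, 3, 3, 4, 8]

mean = statistics.mean(scores)      # arithmetic average: 20 / 5
median = statistics.median(scores)  # middle value of the sorted list
mode = statistics.mode(scores)      # most frequent value

print(f"mean={mean}, median={median}, mode={mode}")
```

Here the single high score (8) pulls the mean (4) above the median and mode (both 3), which is one reason the choice of measure matters.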

Measures of Dispersion: Where measures of central tendency summarize a data set with a single typical value, measures of dispersion describe how spread out the values are around that center, which also shows how far any outliers lie from it.
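
A short sketch of three common measures of dispersion (range, sample variance, and standard deviation) using Python's statistics module. The data set is a small invented example, not data from the readings.

```python
import statistics

# Hypothetical example scores (illustrative only)
scores = [2, 3, 3, 4, 8]

value_range = max(scores) - min(scores)  # distance between the extremes
variance = statistics.variance(scores)   # sample variance (n - 1 denominator)
std_dev = statistics.stdev(scores)       # square root of the sample variance

print(value_range, variance, round(std_dev, 3))
```

The mean of these scores is 4, and the single value 8 contributes 16 of the 22 total squared deviations, which shows how strongly an outlier inflates variance and standard deviation.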

Wildemuth Chapter 34

Frequency Distribution: A table that organizes the counts of how many cases fall into each category of a variable. Frequency distributions are used when analyzing categorical, or nominal, data.
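
A frequency distribution can be built by counting the cases in each category. This minimal sketch uses collections.Counter on invented nominal responses (the category names are hypothetical) and prints counts and percentages as a simple table.

```python
from collections import Counter

# Hypothetical nominal data: the library service each patron reported using
responses = ["internet", "books", "internet", "programs", "books", "internet"]

# Count how many cases fall into each category
frequencies = Counter(responses)

# Print a simple frequency table, most common category first
print(f"{'Category':<10} {'Count':>5} {'Percent':>8}")
for category, count in frequencies.most_common():
    percent = 100 * count / len(responses)
    print(f"{category:<10} {count:>5} {percent:>7.1f}%")
```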

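Chi-Square Statistic: The chi-square statistic measures the difference between the frequencies actually observed and the frequencies that would be expected if the null hypothesis were true (for example, if two categorical variables were unrelated in the population).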
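
A minimal by-hand sketch of the Pearson chi-square statistic for a contingency table: each expected count is the row total times the column total divided by the grand total, and the statistic sums (observed - expected)² / expected over all cells. The 2x2 counts below are hypothetical.

```python
# Hypothetical 2x2 table of observed counts (illustrative only):
# rows = two groups of patrons, columns = used internet / did not
observed = [
    [30, 20],
    [20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if the two variables were independent
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi-square = {chi_square:.2f}, df = {df}")
```

With one degree of freedom, the conventional critical value at the .05 level is about 3.84, so this toy table would just reach significance. This is the uncorrected statistic; statistical packages may apply a continuity correction to 2x2 tables.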

Wildemuth Chapter 35

Codes: Labels with a structured syntax that are typically linked to data elements or chunks of data. Codes can also be defined as aggregates of data elements that the researcher or analyst views as coherent.

Optimal Matching Approach: An approach that directly compares two complete sequences to determine how similar (or dissimilar) they are.
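
Optimal matching is usually computed as the minimum total cost of turning one sequence into the other through insertions, deletions, and substitutions. This sketch uses dynamic programming with unit costs; the costs, the function name, and the event codes are assumptions for illustration, since in an actual study the cost settings are a substantive research decision.

```python
def optimal_matching_distance(seq_a, seq_b, indel_cost=1, sub_cost=1):
    """Minimum total cost of transforming seq_a into seq_b.

    Hypothetical helper; the unit costs are illustrative assumptions."""
    n, m = len(seq_a), len(seq_b)
    # dist[i][j] = cost of transforming the first i events of seq_a
    # into the first j events of seq_b
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * indel_cost
    for j in range(1, m + 1):
        dist[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = seq_a[i - 1] == seq_b[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost,                     # delete from seq_a
                dist[i][j - 1] + indel_cost,                     # insert into seq_a
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitute
            )
    return dist[n][m]


# Hypothetical search sessions coded as sequences of events
session_1 = ["query", "view", "view", "save"]
session_2 = ["query", "view", "save"]
session_3 = ["browse", "view", "print"]

print(optimal_matching_distance(session_1, session_2))
print(optimal_matching_distance(session_1, session_3))
```

A smaller distance means the two sequences are more alike: session_1 differs from session_2 by a single deleted "view", while session_3 needs several operations.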

Wildemuth Chapter 36

Correlation: A method of statistical analysis in which you examine the relationship between two variables. When two variables are “perfectly correlated,” the variability of one variable explains all of the variability of the other.

Direction of the Relationship: The direction of the relationship between variables is indicated by the sign of the correlation statistic. When the sign is positive, as one variable increases the other also increases. When the sign is negative, as one variable increases the other decreases.
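
A from-scratch sketch of Pearson's r, one common correlation statistic. The paired data are invented: one series rises with hours and one falls, so the results are +1 and -1. These are the "perfectly correlated" cases, where r² = 1 and one variable accounts for all of the variability in the other; the sign gives the direction.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical paired observations (illustrative only)
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # increases as hours increase
falling = [10, 8, 6, 4, 2]   # decreases as hours increase

print(pearson_r(hours, rising))   # positive sign
print(pearson_r(hours, falling))  # negative sign
```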

Wildemuth Chapter 37

Power: The probability of correctly rejecting a false null hypothesis at a given significance level. In other words, it is the ability to detect a difference when one actually exists. Lack of power is a major problem in social science research design.
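
To see how sample size drives power, here is a rough sketch using a normal approximation for a two-sided comparison of two group means. The function name, the effect size (Cohen's d = 0.5), and the group sizes are assumptions for illustration; a real study would use a proper power analysis tool.

```python
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    Hypothetical helper: effect_size is the standardized mean difference,
    and the normal approximation is an illustrative simplification."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)               # about 1.96 when alpha = .05
    shift = effect_size * (n_per_group / 2) ** 0.5  # expected test statistic
    # Probability the statistic lands beyond either critical value
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Same hypothetical effect, increasing numbers of participants per group
for n in (20, 64, 100):
    print(n, round(approx_power(0.5, n), 2))
```

With 64 per group the approximation comes out near 0.80, roughly matching the familiar guideline for a medium effect, while 20 per group falls well short.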

Repeated Measures Design: When each participant is assigned to multiple conditions instead of just one.

Sunday, October 9, 2011

Data Collection Techniques Readings


Powell Ch. 5

The steps to planning the questionnaire include:

1.     Write your problem statement.
2.     Review the literature.
3.     Brainstorm a solution.
4.     Figure out what information is needed to test the solution in step 3.
5.     Identify your population.
6.     Pick the technique that will best serve to collect the data.

Open-ended questions: Also known as unstructured questions, these questions allow free responses, whereas fixed-response questions limit the possible answers.

Wildemuth Ch. 18

Data Cleansing: During data cleansing, corrupt or unnecessary data is removed and raw data files are processed into secondary data sets, such as user sessions, query sessions, or term occurrences.
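
A minimal sketch of cleansing a web log and turning it into user sessions. The record layout, the list of ignored file types, and the 30-minute inactivity cutoff are all assumptions for illustration; real log formats and cleansing rules vary.

```python
# Hypothetical, simplified log records: (user, seconds, url, http_status)
raw_log = [
    ("u1", 0,    "/search?q=maps", 200),
    ("u1", 5,    "/logo.png",      200),  # embedded image, not a page view
    ("u1", 40,   "/record/17",     200),
    ("u1", 60,   "/record/99",     404),  # failed request
    ("u1", 4000, "/search?q=gis",  200),  # long gap, so a new session
    ("u2", 10,   "/search?q=maps", 200),
]

IGNORED_SUFFIXES = (".png", ".jpg", ".gif", ".css", ".js")
SESSION_TIMEOUT = 30 * 60  # assumed 30-minute inactivity cutoff, in seconds

def cleanse(records):
    """Drop failed requests and non-page resources (images, styles, scripts)."""
    return [r for r in records
            if r[3] == 200 and not r[2].endswith(IGNORED_SUFFIXES)]

def sessionize(records):
    """Group each user's remaining requests into sessions of page URLs."""
    sessions = []
    last_seen = {}  # user -> (time of last request, index of current session)
    for user, t, url, _status in sorted(records, key=lambda r: (r[0], r[1])):
        if user in last_seen and t - last_seen[user][0] <= SESSION_TIMEOUT:
            idx = last_seen[user][1]
        else:
            sessions.append((user, []))
            idx = len(sessions) - 1
        sessions[idx][1].append(url)
        last_seen[user] = (t, idx)
    return sessions

for user, pages in sessionize(cleanse(raw_log)):
    print(user, pages)
```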

Path Completion: Web log mining applications use path completion to add page references that are missing from the log, such as pages served from a browser cache, so that they are included in the refined log and then analyzed.
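
One common path completion heuristic works like this: if the next logged page is not linked from the current page, assume the user pressed "back" to the most recent earlier page that does link to it, and add those back-steps (which the cache hid from the log). This is a sketch under that assumption; the site map and page names are hypothetical.

```python
# Hypothetical site structure: page -> pages it links to
site_links = {
    "A": {"B", "C"},
    "B": {"D"},
    "C": {"E"},
    "D": set(),
    "E": set(),
}

def complete_path(logged_pages, links):
    """Re-insert pages the user likely revisited from the browser cache."""
    path = [logged_pages[0]]
    for page in logged_pages[1:]:
        if page not in links.get(path[-1], set()):
            # Search backward for the most recent earlier page linking to `page`
            for k in range(len(path) - 2, -1, -1):
                if page in links.get(path[k], set()):
                    # The user went back through path[-2] ... path[k]
                    path.extend(path[k:-1][::-1])
                    break
            # If no earlier page links to it (e.g., a typed URL), add nothing
        path.append(page)
    return path

# Logged requests A, B, D, C: C is not linked from D, so the user most
# likely went back through B to A (served from cache) before requesting C.
print(complete_path(["A", "B", "D", "C"], site_links))
```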

Wildemuth Ch. 19

Think-aloud protocols: A research method used to understand a subject's cognitive processes through verbal reports of the thoughts they have during experimentation.

Concurrent protocols: In concurrent protocols, subjects are asked to speak their thoughts aloud while they work to solve problems, rather than reporting them afterward.

Wildemuth Ch. 20

Direct Observation: A data collection method that is appropriate only when there is something to observe. It focuses on participants' behavior in a setting.

Nonparticipant Observation: In nonparticipant observation, the researcher only observes and does not participate in the setting being observed.

Wildemuth Ch. 21

Participant Observation: Participant observation is when the researcher, in addition to participating in a setting, also observes the setting and the population within it. By doing this, the researcher gains a greater understanding of the population's habits and culture.

Passive Participation: Passive participation is when the researcher is considered a participant only in that they are present in the setting being observed and researched.

Wildemuth Ch. 22

Research Diaries: Research diaries are a set of data collection instruments and/or techniques, ranging from descriptive event logs to narrative personal accounts.

Unstructured Diaries: An unstructured diary is an open-ended diary. The writer is given little to no guidance about what to include, in terms of either content or form.

Wildemuth Ch. 23

Unstructured Interview: An unstructured interview is an interview in which very little of the conversation is guided or preplanned. These types of interviews are dependent on social interaction between interviewer and subject.

Descriptive questions: Descriptive questions allow interviewees to give descriptions about their activities.

Wildemuth Ch. 24

Essential Questions: Questions that deal with the central focus of the research.

Probing questions: These require the interviewee to elaborate further on their answers to any given question. The purpose of these questions is to gather more information.

Wildemuth Ch. 25

Focus group: A group of people who are gathered by researchers in order to discuss a topic of research.

Focus Group Moderator: A focus group moderator keeps the discussion on track and can determine whether a focus group succeeds or fails.

Wildemuth Ch. 26

Survey Research: Survey research is a method in which researchers statistically estimate the distribution of characteristics in a population based on a small sample of that population.
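
As a sketch of how a sample supports a population estimate, this computes a sample proportion with a normal-approximation (Wald) 95% confidence interval. It assumes a simple random sample; other sampling designs need different formulas. The survey numbers and the function name are hypothetical.

```python
from statistics import NormalDist

def estimate_proportion(successes, sample_size, confidence=0.95):
    """Estimate a population proportion from a simple random sample.

    Hypothetical helper using the normal-approximation (Wald) interval."""
    p_hat = successes / sample_size
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * (p_hat * (1 - p_hat) / sample_size) ** 0.5
    return p_hat, p_hat - margin, p_hat + margin

# Hypothetical survey: 120 of 400 sampled residents visit a library monthly
p_hat, low, high = estimate_proportion(120, 400)
print(f"estimate = {p_hat:.1%}, 95% CI = ({low:.1%}, {high:.1%})")
```

The interval narrows as the sample grows, since the margin shrinks with the square root of the sample size.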

Pretesting: Pretesting is when the survey instrument is reviewed either by experts or even by members of the target audience.

Wildemuth Ch. 27

Construct: An object of measurement that scientists put together and that does not otherwise exist as an observable dimension of behavior.

Cognitive or affective variables: These variables can include a person’s attitudes, interests, beliefs, or feelings and are not directly observable.

Wildemuth Ch. 28

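Response style: A source of bias in which respondents answer in a systematic way regardless of the question content (for example, tending to agree with every item). It is problematic because it leads to results that measure individual characteristics other than the construct of interest.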

Semantic Differential Scales: A response format that consists of a word or phrase as the stimulus and pairs of opposite adjectives. The adjective pairs are displayed on a continuum, and respondents indicate their attitude toward the stimulus by marking the appropriate point on the continuum.

Thursday, October 6, 2011

Article Review 2


Samantha Finefield
October 3, 2011
Article Review 2


Koontz, C., Jue, D., & Lance, K. C. (2005). Neighborhood-based in-library use performance measures for public libraries: A nationwide study of majority-minority and majority white/low income markets using personal digital data collectors. Library and Information Science Research, 27(1), 28-50.


Introduction:

The purpose of this article is to demonstrate the need for proper data collection methods in libraries for non-book usage, such as internet usage, and to demonstrate the need for “neighborhood-level” library data. Previous studies have shown that when library populations diversify, circulation tends to shrink. This does not, however, mean that libraries are any less busy. It simply means that minority groups and low-income whites tend to use other aspects of the library that are not being statistically monitored. Because of this, circulation can appear low, which can lead to funding cuts and often to the merging or closing of libraries in urban and socioeconomically diverse areas. This relates very closely to my topic because it demonstrates the inequities in library funding in high-minority and/or low-income areas. The researchers also employed survey techniques that I plan to propose in my formal research proposal.

Problem Statement:

The researchers in this study wanted to answer three main questions:

1.)  What library-usage differences are present between library markets in majority White/low income markets and majority-minority markets?
2.)  What library performance measures could be created to discover and measure these uses?
3.)  Could a standardized data collection system be developed for all public library systems?

Literature Review:

This study does not include a formal literature review; however, many sources are cited within the article itself. A large portion of the cited literature deals with how best to accommodate diverse library populations and with analyzing library services in urban environments. Another portion of the referenced sources deals with different library markets and how to forecast use in each of them. Finally, the researchers reviewed work on how distance from a library affects the amount and type of usage. Many of these sources would also be helpful for my own research, and I plan on using them.

Method:

The researchers describe their research design as “multifaceted.” For their population, they needed public libraries whose patron base was majority-minority or majority White/low income. To find them, they first used census data to identify the communities they should study. By combining this data with geographic information system (GIS) data, they identified 3,536 public library outlets that met their requirements, about 20% of the library outlets in the United States. They then chose a sample from these 3,536 libraries that equally represented the Northeast, South, Midwest, and West regions of the United States, which narrowed the sample to 495 outlets.

After the sample was chosen, the researchers created a survey asking about the data collection practices already in place at each library outlet. In many cases, libraries did not answer or declined to participate, which narrowed the sample down to 177 libraries. Over the two-year study period, a large portion of the remaining libraries decided not to continue or were unable to for various reasons, so the final sample consisted of the 92 libraries that participated from the beginning to the end of the study.
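
To make the attrition concrete, the counts reported in the study can be turned into retention rates. This quick sketch uses only the three sample sizes given above; the variable and function names are my own.

```python
# Sample sizes reported in Koontz, Jue, & Lance (2005)
initial_sample = 495   # outlets chosen to represent the four regions
after_survey = 177     # outlets remaining after non-response and declines
completed_study = 92   # outlets that participated from start to finish

def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole

print(f"Remaining after initial survey: {pct(after_survey, initial_sample):.1f}%")
print(f"Completed the full study:       {pct(completed_study, initial_sample):.1f}%")
print(f"Kept from the 177 respondents:  {pct(completed_study, after_survey):.1f}%")
```

In other words, fewer than one in five of the 495 sampled outlets (about 18.6%) stayed through the whole study.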

It is important to note that this project was not designed to explain why library usage varies, but simply to show that it does vary. The researchers presented their data in statistical charts broken down by racial group, giving percentages for how different racial groups use their libraries so that readers can compare all of the data side by side. This further demonstrates the need for neighborhood-level data, so that libraries are not all judged solely by their circulation figures.

Caveats:

Because of when this data was collected, the collection processes were, understandably and by the researchers' own admission, complex, which can often lead to error. Another major caveat is that after the sample was balanced by region, the study lost a large portion of its participants, and it is unlikely that after losing such a large group the participating libraries were still evenly distributed across the four geographic regions.


Appendix

Dr. Kumasi:

I made that chart as you suggested and have attached it here.


Research Question | Data Collection Technique | Data Analysis Technique
How do socioeconomic status and demographics play a role in library closures? | Unstructured interviews; review of documents | Thematic analysis
How do the socioeconomic and demographic characteristics of a library's service area affect its circulation and funding? | Survey; review of documents | Thematic analysis; descriptive statistics
How is the homeless population served by the public library? | Survey; unstructured interviews | Thematic analysis
Why are disadvantaged groups less likely to use public libraries, and what variables affect this? | Review of documents; unstructured interviews; survey | Thematic analysis

I feel the first and fourth questions are the strongest, but please feel free to disagree with me.