Data are the foundation of research and science. Once an
appropriate research topic is determined, proper data collection,
retention, and sharing are vital to the research enterprise.
Scientific concepts
Data embrace any collection of facts, measurements, or observations.
Different disciplines have different notions of what constitutes data,
ranging from material created in a wet laboratory, such as an
electrophoresis gel or a DNA sequence, to that obtained in
social-science research, such as a filled-out questionnaire, video or
audio recordings, or photographs. Data can be astronomical measurements,
microscope slides, climate patterns, cell lines, field notes, soil
samples, or results of statistical analyses.
How should the data be collected?
Researchers should be aware of several methodological issues when
selecting data. These include choices about:
- Data types (e.g., nominal, ordinal or interval measures).
- Samples ("frames"), sample size, and instruments.
- Methodologies.
Different disciplines have preferences for different approaches,
and for what constitutes acceptable "rigor" for reliability and validity
of results. This is one reason why a careful prior review of the
existing literature on a topic is imperative when designing a research
protocol. For example, a key component of most protocol designs will be
the sample size (or "n"). From a purely methodological perspective, that
decision hinges on how large an error one is willing to tolerate in
estimating population parameters or, put differently, how small an
effect the study must be able to detect as statistically significant.
These tolerances must be set before data collection begins. But statistical
explanatory power must be balanced against time, cost and other
practical considerations, just like every other element of the protocol.
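The sample-size decision described above can be illustrated with a minimal sketch. It assumes the textbook normal-approximation formulas, and the function names and default values (95% confidence, 80% power) are illustrative choices, not requirements of any particular discipline or sponsor.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n for which the confidence-interval half-width is at most `margin`.

    Assumes a known (or well-estimated) population standard deviation `sigma`
    and uses the normal approximation n = (z * sigma / margin) ** 2.
    """
    if sigma <= 0 or margin <= 0:
        raise ValueError("sigma and margin must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    `effect_size` is the standardized difference (difference / sigma).
    Normal approximation: n = 2 * ((z_alpha + z_power) / effect_size) ** 2.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical inputs: sigma of 10 units, tolerated error of 2 units.
print(sample_size_for_mean(sigma=10, margin=2))
# Hypothetical standardized effect of 0.5.
print(sample_size_per_group(effect_size=0.5))
```

Halving the tolerated error roughly quadruples the required sample, which is why the statistical target has to be weighed against time and cost before collection starts.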
Data collection methods vary by discipline, and according to the
data types of interest; but the emphasis on ensuring accurate and honest
collection remains the same. Consequences of improperly collected data
include:
- Inability to answer research questions accurately.
- Inability to repeat and validate the study.
- Distorted findings resulting in wasted resources.
- Misleading other researchers to pursue fruitless avenues of
investigation.
- Compromising decisions for public policy or private
decision-making.
- Causing harm to human participants and animal subjects.
As with data selection, it is critical that researchers have
sufficient methodological skills to assure the quality of data
collection efforts. Everyone who participates in the investigative
effort should be trained in the methods. Where possible, researchers
should try to build checks-and-balances into the collection process.
Storage and Protection
In information security, it is conventional to speak of three core goals
for information protection:
- Confidentiality - limiting information access and disclosure
to authorized users;
- Integrity - ensuring that data are not changed inappropriately
after recording, whether by accidental or deliberate activity. Integrity
also covers the notion that the person or entity in question entered the
right information: that the information reflected the actual
circumstances ("validity") and that the same circumstances would
generate identical data (what statisticians call "reliability").
- Availability - ensuring that information resources remain
accessible to authorized users. Everyday risks like fire, water or other
environmental damage, or simple technical failures like hard disk
crashes, must be considered. It is essential to make
frequent, periodic backup copies of a data collection, and store these
copies in a secure secondary location that is protected both from
intruders and environmental threats.
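One way to check both integrity and availability of a backup is to compare cryptographic checksums of the original files against the backup copies. The following is a minimal sketch using only the Python standard library; the function names and the directory layout (a backup that mirrors the original folder structure) are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(original_dir, backup_dir):
    """List files (relative paths) that are missing from the backup or differ from it.

    Assumes the backup mirrors the directory structure of the original.
    """
    original_dir, backup_dir = Path(original_dir), Path(backup_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        copy = backup_dir / relative
        if not copy.is_file() or file_sha256(source) != file_sha256(copy):
            problems.append(relative.as_posix())
    return problems
```

An empty list from `verify_backup` means every original file has a byte-identical copy in the backup. It does not show that the backup location is secure, and it cannot detect an error that was already present in the original when the copy was made.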
UTA Guidance regarding information security and data can be found
here: http://www.uta.edu/oit/iso/index.php.
Read more about
The Practice of Keeping Research Notebooks: Paper vs. Electronic.
Retention and disposal
Data handling procedures should describe when, how, and by whom data may
be handled for storage, retrieval, sharing, archiving and disposal purposes.
These procedures may depend on the nature of the project, the cost of
maintaining that data, research sponsors' requirements, etc.
Retaining data on paper files and electronic media long past the
end of a project can increase the chances of unauthorized access.
Disposal of sensitive data requires care and technical expertise to
ensure that the information cannot be reconstructed from the storage
media. Review UT Arlington's Records Information Management policies
here: http://www.uta.edu/policy/rim.
Data Analysis
Like data selection criteria, the choice of statistical analysis
methods should always precede data collection. Waiting until later in
the research process increases the risk that analytic decisions will be
driven by consideration of which method produces the most favorable results.
Any bias occurring in the collection of the data, or selection of method
of analysis, will increase the likelihood of drawing a biased inference.
Every field of study has developed its accepted practices for data
analysis; if an unconventional approach is used, it is crucial to
clearly state this is being done and show how this new and possibly
unaccepted method of analysis is being used, as well as how it differs
from other more traditional methods. Whether statistical or
non-statistical methods are used, researchers should be clear, both to
themselves and to the persons to whom the analyses are presented, about
the limitations and possible biases of their methods.
Publication and reporting
The practice of ensuring research integrity extends to the stage of
documenting and preparing results for publication. Publishing in
peer-reviewed journals or presenting at scholarly meetings is the
primary mechanism for investigators to disseminate their findings to the
research community. This community relies on authors to report the
events of a study honestly and accurately. All researchers should be
aware of the issues that compromise the integrity of data reporting and
publishing:
- Misrepresentation of data quality, or of the data itself.
- Analysis of data by several methods to find a significant
result.
- Fabrication or falsification of data.
- Inadequate evaluation of prior research.
- Misleading discussion of observations.
- Reporting conclusions that are not supported.
- Failure to disclose conflicts of interest.
- Plagiarism.
- Unjust attribution of authorship.
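The second item in the list above, analyzing data by several methods until one yields a significant result, can be made concrete with a small calculation. The sketch below assumes the analyses are statistically independent, which is a simplification, and the function names are illustrative. It shows how the chance of at least one false positive grows with the number of analyses, and the Bonferroni-adjusted threshold that is one common (conservative) correction.

```python
def familywise_error_rate(alpha, num_tests):
    """Probability of at least one false positive across independent tests."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    return 1 - (1 - alpha) ** num_tests


def bonferroni_threshold(alpha, num_tests):
    """Per-test significance threshold that keeps the family-wise rate at or below alpha."""
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    return alpha / num_tests


for k in (1, 3, 10):
    rate = familywise_error_rate(0.05, k)
    threshold = bonferroni_threshold(0.05, k)
    print(f"{k:2d} analyses: false-positive chance {rate:.3f}, "
          f"Bonferroni threshold {threshold:.4f}")
```

Under these assumptions, ten analyses at a 0.05 threshold carry roughly a 40 percent chance of at least one spurious "significant" result, which is why the analysis method should be fixed before the data are examined.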
Ownership issues
Data "ownership" generally refers to both the possession of and
responsibility for information. As a legal concept, it embraces the
range of rights and obligations with respect to a data collection,
including rights and obligations to share. All investigators and
research staff should review the institution's policies with respect to
data ownership, to make sure their understanding matches the
institution's. If a specific third-party sponsor is involved, the
sponsor or granting agency may set out the terms of copyright.
Review UT Arlington's policies here: http://www.uta.edu/ra/otm/index.html.
Notebooks and journals
Data and data books collected by undergraduates, graduate students, and
postdoctoral fellows on a research project generally belong to the
grantee institution, or the PI under conditions described above. In any
case, students should generally not assume that it will be permissible
to take "their data" when they leave. Appropriate arrangements need to
be made in advance. If the faculty PI does not raise the issue, the
student or fellow must. Usually, arrangements can be made for them to
take copies of the data when they leave.