Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

Identify the steps involved in data management, from initial planning for data collection to retiring, archiving, or destroying data no longer in use


Question

Identify the steps involved in data management, from initial planning for data collection to retiring, archiving, or destroying data no longer in use. Describe the tools that are required to accomplish each step in the data management process. Explain the approach needed to identify from among those tools which are best-of-breed or most commonly used.

Explanation / Answer

A data management plan is an integral part of the research plan. The plan can be reviewed and expanded during the research, but the main principles and procedures should be determined before the research starts, or at the latest before data collection begins.

The aim of data management planning is to ensure that good scientific practice is followed in the research, that data are kept safe and secure at all stages of the research, and that data sharing is possible after the original research has been completed.

At the planning stage of the research, researchers should find out whether the research requires ethical review. If data are collected outside Finland, it is best to find out beforehand what practices are followed regarding ethical review in the country of collection. University websites often have instructions on the matter (you can search with phrases such as 'ethics review board', 'ethics review committee', 'human participants').

Data management plan

A data management plan describes how research data are collected or created, how the data are used and stored during the research, and how they are made accessible to others after the research has been completed. You can attach the following kind of concise data management plan to your research plan:

» Data Management Plan Models
» FSD's Guidelines in DMPTuuli

If a concise data management plan was used for the funding application, you will need to expand and refine the plan once the research has started. If circumstances change, the plan needs to be updated. A good data management plan addresses all of the questions discussed below.

1. Data

What kind of data are collected is mainly determined by the research questions. Research data typically consist of questionnaire surveys, interviews, focus group discussions, written material, recordings of visits or meetings, official documents, archival material, websites, or register and media data.

Data collection methods are determined by the type of data sought. Quantitative data can be collected through interviews, postal or online questionnaires, by using existing source material, or by measuring. Qualitative data are often collected by recording individual interviews, group interviews, sessions or meetings as audio or video files. Collection of written material usually begins with publishing writing requests or invitations, after which the writings are gathered via email, post or a website created for the purpose. Official documents can nowadays often be obtained from the Internet, but some are available only by request or by obtaining permission to use them for research purposes. Access to register data generally requires applying for permission.

Researchers and research teams can collect the data themselves or can contract a data collection company to do it. If data collection is contracted out, it is best to send the call for tender to several companies. Data management plans are useful for drawing up tender calls.

2. Rights

Copyright issues may be relevant for research data even though most empirical research data fall outside the scope of the Finnish Copyright Act. If copyright issues are involved, the copyright owner determines how the data can be used, and using the data requires the owner's permission. Regardless of whether the data are protected by copyright, it is important to clarify the roles of the persons involved in the research, since reusers of the data will cite its creators.

Research teams should always make an agreement on data ownership and usage rights. Usage rights should be determined both for the research project and for usage after the project has been completed. Before making any agreements, the requirements of the research funder(s) should be investigated in order to make the agreement follow their guidelines.

If an external contractor is used for data collection, the research team should determine, at the latest when the contract is being made, who is the owner of the data, who has data management responsibility and in what ways research participants are informed of future uses of the data. If the data will be archived for data sharing at the Finnish Social Science Data Archive (FSD), the agreement may state that the external contractor delivers copies of the data and associated metadata straight to the Archive.

Data archives specify access rights to archived data. Official documents are generally freely accessible to researchers. If data created in research have access restrictions, the depositor of the data determines the access conditions. Openly available web material can generally be used for research as well, but archiving such material for reuse may not always be possible due to restrictions set by the Finnish Copyright Act or Personal Data Act.

When information is collected directly from research participants, reuse possibilities of the data are determined by the information given to participants on the future uses of the data.

3. Confidentiality and data security

Confidentiality in the research environment basically means planned and careful processing of personal data. Personal data should only be collected and processed to the degree necessary for the research, and unauthorised access to the data must be prevented. In cases where it is necessary to include personal identification numbers in the data, there must be clear rules about who can process such confidential material.

When data contain personal data, the Finnish Personal Data Act requires that researchers complete a description of the data file. There is a separate form for the description of scientific research data files. The data file description adds to the transparency of personal data processing, and it must be shown to research participants if they ask to see it. If data are collected through the Internet, a Privacy Notice form can also be used; a Privacy Notice is an extended description of a personal data file and provides more detailed information. More information on the forms is available on the website of the Data Protection Ombudsman.

To avoid any ambiguity, the data file description should contain all information supplied to research participants on data collection, processing and use. If the dataset is to be archived at the FSD with personal data included, it is recommended to state this in the data file description as well as give this information to research participants before data collection.

Data security means keeping the personal information collected, as well as computer systems, data files and data transfers, safe. It is easy to copy and disseminate digital research data files, or to destroy or change them unintentionally. Making back-ups of data files and preventing unauthorised access to them are thus integral parts of data security.

It is recommended that data files requiring a large storage capacity are stored in the IDA Storage Service provided by the Ministry of Education and Culture. The IDA Storage Service is also a useful and safe solution for collaborative research projects in which the same data are analysed at more than one university. The service is aimed at Finnish universities and at projects or research infrastructures funded by the Academy of Finland.

4. File formats and programs

A variety of statistical software packages are available for processing quantitative data. There are also dedicated packages for processing and analysing qualitative data, although many researchers still use word processors for analysing textual data. The software chosen determines the file formats used. Because software keeps developing and changing, it is best to store at least one copy of the data in a software-independent format, or in a standard format that many programs can interpret.
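
As an illustration (the file names are hypothetical), a short perl script, perl being the scripting tool discussed under TOOLS below, can write such a software-independent copy by converting a tab-delimited export into plain comma-separated text:

    #!/usr/bin/perl
    # Sketch only: copy a tab-delimited export (survey_export.tab, hypothetical name)
    # into a plain comma-separated file that almost any program can read.
    use strict;
    use warnings;

    open my $in,  '<', 'survey_export.tab' or die "cannot open survey_export.tab: $!";
    open my $out, '>', 'survey_copy.csv'   or die "cannot open survey_copy.csv: $!";
    while (my $line = <$in>) {
        chomp $line;
        my @fields = split /\t/, $line, -1;              # keep trailing empty fields
        @fields = map { /,/ ? qq{"$_"} : $_ } @fields;   # quote fields containing commas
        print {$out} join(',', @fields), "\n";
    }
    close $in;
    close $out;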

Data files can be stored and transferred on optical media (e.g. CD, DVD, Blu-ray) and on non-volatile memory (e.g. memory cards or USB sticks). The safest way to preserve data files is to store them as duplicate copies on magnetic media (e.g. hard drives or tapes).
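
One way to check that such duplicate copies remain identical over time is to record a checksum for every file. The following perl sketch, using the standard Digest::MD5 module, prints a checksum list that can be stored alongside the copies and compared later:

    #!/usr/bin/perl
    # Sketch only: print an MD5 checksum for every file given on the command line,
    # e.g.  perl checksums.pl data/*.csv > checksums.txt
    use strict;
    use warnings;
    use Digest::MD5;

    for my $file (@ARGV) {
        open my $fh, '<', $file or die "cannot open $file: $!";
        binmode $fh;
        print Digest::MD5->new->addfile($fh)->hexdigest, "  $file\n";
        close $fh;
    }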

5. Documentation on data processing and content

Technical and content decisions made at the data entry stage influence the quality of the data. These decisions include, for example, whether to enter information into a matrix, or the technical solution chosen for audio or video recording. Choices made in post-collection processing also have an impact on data quality: for quantitative data, the naming and organisation of data files, the naming of variables, and the documentation of codes and reasons for missing values; for qualitative data, the transcription level chosen.
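
As an example of documentation work on quantitative data, the following perl sketch (the file name survey.csv and the missing-value codes -8 and -9 are hypothetical) counts how often the agreed missing-value codes occur in each variable, giving raw material for the codebook:

    #!/usr/bin/perl
    # Sketch only: tally the agreed missing-value codes (-8 and -9) per variable
    # in a comma-separated data file with a header row of variable names.
    use strict;
    use warnings;

    open my $in, '<', 'survey.csv' or die "cannot open survey.csv: $!";
    chomp(my $header = <$in>);
    my @vars = split /,/, $header;
    my %missing;
    while (my $line = <$in>) {
        chomp $line;
        my @values = split /,/, $line, -1;
        for my $i (0 .. $#values) {
            $missing{ $vars[$i] }++ if $values[$i] eq '-8' or $values[$i] eq '-9';
        }
    }
    printf "%-20s %d\n", $_, $missing{$_} // 0 for @vars;
    close $in;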

Sufficient data documentation at the different stages of data collection and processing is a crucial factor for quality. Data documentation is also important for long-term preservation and usability. Good documentation, that is, carefully created metadata, enables informed re-use of the data and a long data life cycle.

6. Life Cycle

The reuse value of research data depends largely on the data management measures carried out during the research. Effective data management before and during data collection and processing is an essential requirement for generating data that can be used afterwards for new research, learning, or the teaching of methodology.

If data are to be stored by an individual researcher, university department, research unit or organisation, data owners must provide a solution to all aspects of subsequent data management: storage, archival and dissemination packages, terms of use, data delivery, and dissemination of metadata. If data are deposited with the Finnish Social Science Data Archive, the archive will take care of all these aspects and in addition ensure confidentiality and data security.

It is not worthwhile to preserve all research data permanently. When considering whether or not to archive a dataset, the uniqueness of the data, its usability, access conditions, re-use value in research and education, and archiving costs all need to be taken into consideration. Insufficient or poor data management during the research stage considerably increases the costs of archiving, as it is often time-consuming to process the data for reuse afterwards and to find out the information needed for metadata. Still, destroying a dataset must always be a conscious decision and not the result of inadequate or careless data management.

TOOLS

Two tools are critical to the efficient and reproducible handling of statistical data: make and perl. Taken together, perl and make offer an unbeatable combination for handling data and managing large, complex statistical analyses.

MAKE - Make is one of several utilities originally developed for Unix programmers to aid in the management of large software programming projects. Its usefulness should not be underestimated: it is an invaluable tool for the handling, management, and analysis of statistical data. Make is available for a variety of computer operating systems and will run on small laptops, desktops, workstations, and servers. Make builds on the fact that every computer file has a time stamp recording when it was last changed, edited, or updated. Make uses these time stamps to decide which files are out of date with respect to each other, and when it finds files that are out of date, it automatically runs the necessary programs to bring them back into sync.

Large-scale analyses do not simply “appear.” They are performed incrementally, with the researcher focusing on one small aspect of the problem, and when completed, turning her attention to the next aspect of the analysis. In many cases, the results of one analysis (or data transformation) are used as inputs to the next analysis. Make allows the researcher to encapsulate the manipulations employed during each step of the analysis into a single set of computer instructions. This helps ensure that other researchers on other computers can repeat the overall analysis in the future. Further, make helps the researcher operate more efficiently. For example, suppose that one step of a complex analysis takes several hours to complete, and suppose that the outputs of this analysis are used by subsequent programs for additional analysis. If the researcher simply used a “batch” file to store the necessary instructions, the long analysis step would have to be performed each time the researcher made a change to one of the smaller, subsequent programs. By using make, only the necessary programs would be executed, eliminating the need to constantly run the long analysis each time.
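
A minimal makefile sketch of such a pipeline might look like the following (the script and file names are hypothetical, and recipe lines must be indented with a tab). Make compares the time stamps of each target and its prerequisites and reruns only the steps whose inputs have changed:

    # Sketch only: a two-step analysis pipeline driven by make.
    all: results/summary.txt

    # Step 1: the slow cleaning step; rerun only if raw.csv or clean.pl changes
    data/clean.csv: data/raw.csv clean.pl
            perl clean.pl data/raw.csv > data/clean.csv

    # Step 2: the summary step; rerun if clean.csv or summarise.pl changes
    results/summary.txt: data/clean.csv summarise.pl
            perl summarise.pl data/clean.csv > results/summary.txt

Typing make after editing summarise.pl would rerun only the second step; the long cleaning step would be repeated only if the raw data or clean.pl itself changed.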

PERL - Perl is another Unix utility that can save the researcher hours of time when handling statistical data and performing statistical analysis. Like make, perl is available for most operating systems and will run on laptops, desktops, workstations, and servers. Perl stands for “Practical Extraction and Report Language.” Perl was designed as a tool for manipulating large data files, and as a “glue” language between different software packages. Perl makes it easy to manipulate numbers and text, files and directories, computers and networks, and especially other programs. With perl, it is easy to run other programs, scan their output files for specific results, and send those results on to other programs for additional analysis. Perl code is easy to develop, modify, and maintain. Perl is portable, and the same perl program can run on a variety of computer platforms without changes. Perl programs are text files and can easily be shared with other researchers.
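
As a small illustration of this “glue” role (the file model_output.txt and the line format are hypothetical), the following perl sketch scans a program's output file for lines reporting a p-value and collects them into a comma-separated summary for further analysis:

    #!/usr/bin/perl
    # Sketch only: extract lines such as "age  p = 0.0132" from an output file
    # and write them as a small CSV table.
    use strict;
    use warnings;

    open my $in,  '<', 'model_output.txt' or die "cannot open model_output.txt: $!";
    open my $out, '>', 'pvalues.csv'      or die "cannot open pvalues.csv: $!";
    print {$out} "term,p_value\n";
    while (my $line = <$in>) {
        if ($line =~ /^(\S+)\s+p\s*=\s*([\d.]+)/) {
            print {$out} "$1,$2\n";
        }
    }
    close $in;
    close $out;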

Example: The researcher spends a surprising amount of time transforming data from one format to another. For example, many data collection devices generate text files with the individual data elements separated by commas, where each record contains a variable number of characters and fields. Many statistical analysis packages require that each input record contain the same number of characters and fields. For these packages to analyse data from such devices, the output of the data collection device must be cleaned (e.g., all “funny” characters must be removed) and reformatted into an electronic format that the statistical software can load. For small amounts of data, many researchers perform this cleaning and reformatting by hand, or within a general-purpose package like a spreadsheet. However, this approach is prone to errors and can lead to irreproducible results, and it becomes very cumbersome and time-consuming when large data files, or large numbers of files, are transformed.

Perl is the perfect tool for these types of jobs, and it renders the analysis repeatable, which is an important aspect of the scientific method.
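
A sketch of such a cleaning script is shown below (the target of 10 fields per record and the file names are hypothetical): it strips non-printable characters and pads or truncates every comma-separated record to the same number of fields, after which the file can be loaded by the statistical package.

    #!/usr/bin/perl
    # Sketch only: run as  perl reformat.pl < device_output.txt > clean.csv
    use strict;
    use warnings;

    my $FIELDS = 10;                              # assumed number of fields per record
    while (my $line = <STDIN>) {
        chomp $line;
        $line =~ s/[^\x20-\x7E]//g;               # remove control and other "funny" characters
        my @f = split /,/, $line, -1;             # keep trailing empty fields
        push @f, '' while @f < $FIELDS;           # pad short records
        splice @f, $FIELDS if @f > $FIELDS;       # truncate long records
        print join(',', @f), "\n";
    }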
