Clinical Data Management (CDM) is a critical phase in clinical research, which leads to generation of high-quality, reliable, and statistically sound data. CDM is involved in all aspects of processing clinical data.



The CDM process is designed to deliver an error-free, valid, and statistically sound database. To meet this objective, the CDM process starts early, even before the study protocol is finalized.

Review and finalization of study documents

The protocol is reviewed from the database design perspective for clarity and consistency.

During the review, CDM personnel identify the data items to be collected and the frequency of collection with respect to the visit schedule.

A case report form (CRF) is designed by the CDM team, as this is the first step in translating the protocol-specific activities into data that is generated. The data fields should be clearly defined and consistent throughout. The type of data to be entered should be evident from the CRF.

For example, if weight has to be captured to two decimal places, the data entry field should have two data boxes placed after the decimal point. Along with the CRF, filling instructions (called CRF completion guidelines) should also be provided to study investigators for error-free data acquisition. The Data Management Plan (DMP) document is a road map for handling data under foreseeable circumstances and describes the CDM activities to be followed in the trial. Generally, the software tools used for CDM have built-in mechanisms for compliance with regulatory requirements and are easy to use.

System validation is conducted to ensure data security, during which system specification, user requirements and regulatory compliance are evaluated before implementation.

Study details such as objectives, visit intervals, investigators, sites, and patients are defined in the database, along with the data entry screens. These entry screens are tested with dummy data before they are moved to real data capture (ref. Binny Krishnankutty).

Data collection

Data collection is done using case report forms (CRFs). CRFs are tracked for missing pages and illegible data so that data are not lost. In case of missing or illegible data, a clarification is obtained from the investigator and the issue is resolved.

This is applicable only in the case of paper CRFs retrieved from the sites. Usually, double data entry is performed, wherein the data is entered by two operators separately.

The second-pass entry (the entry made by the second person) helps in verification and reconciliation by identifying transcription errors and discrepancies caused by illegible data. Moreover, double data entry helps in getting a cleaner database compared to single data entry.

Discrepancy management includes reviewing discrepancies, investigating the reason, and resolving them with documentary proof or declaring them irresolvable.
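As a rough sketch of how a second-pass entry can be reconciled against the first, the snippet below compares two keyed versions of the same CRF page and lists every field that differs. The field names, the sample values, and the `compare_passes` helper are hypothetical and only illustrate the idea; a real CDMS would record these differences in its discrepancy database with an audit trail.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]  # field name -> value as keyed by one operator


def compare_passes(first: Record, second: Record) -> List[Tuple[str, str, str]]:
    """Return (field, first_value, second_value) for every field that differs."""
    discrepancies = []
    # Look at the union of fields so a field keyed in only one pass is also flagged.
    for field in sorted(set(first) | set(second)):
        a = first.get(field, "").strip()
        b = second.get(field, "").strip()
        if a != b:
            discrepancies.append((field, a, b))
    return discrepancies


# Hypothetical sample data: the two operators disagree on the weight.
first_pass = {"patient_id": "0012", "visit": "2", "weight_kg": "72.50"}
second_pass = {"patient_id": "0012", "visit": "2", "weight_kg": "75.20"}

for field, a, b in compare_passes(first_pass, second_pass):
    print(f"{field}: first pass {a!r}, second pass {b!r}")
```

Each reported difference would then be checked against the paper CRF or referred to the investigator, as described above.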

Discrepancy management helps in cleaning the data and gathers enough evidence for deviations observed in the data. Almost all clinical data management systems (CDMS) have a discrepancy database where all discrepancies are recorded and stored with an audit trail. When discrepancies are found, they are referred to investigators for clarification. The CDM team reviews all discrepancies at regular intervals to ensure that they have been resolved.

(South American Journal of Clinical Research, Special Edition)

The resolved discrepancies are recorded as closed. Some of these may include spelling errors (ref. Binny Krishnankutty).

Medical coding and database locking

Medical coding helps in identifying and properly classifying the medical terminology associated with the clinical trial.

For classification of events, medical dictionaries available online are used. Technically, this activity needs knowledge of medical terminology, an understanding of disease entities and the drugs used, and a basic knowledge of the pathological processes involved.

Functionally, it also requires knowledge of the structure of electronic medical dictionaries and the hierarchy of classification available in them. Some pharmaceutical companies customize dictionaries to suit their needs and meet their operating procedures. Medical coding helps in classifying reported medical terms on the CRF to standard dictionary terms in order to achieve data consistency and avoid unnecessary duplication.
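As a rough illustration of classifying reported terms to standard dictionary terms, the sketch below normalizes each reported term and looks it up in a tiny synonym table. The table entries, the `normalize` and `code_term` helpers, and the sample terms are all made up for this example; real studies use established medical dictionaries with a full classification hierarchy, and terms that do not match are reviewed by a medical coder.

```python
from typing import Dict, Optional

# Hypothetical mini-dictionary: normalized reported term -> standard term.
# These entries are invented for illustration only.
SYNONYMS: Dict[str, str] = {
    "headache": "Headache",
    "head ache": "Headache",
    "pain in head": "Headache",
    "nausea": "Nausea",
    "feeling sick": "Nausea",
}


def normalize(term: str) -> str:
    """Lower-case the term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def code_term(reported: str) -> Optional[str]:
    """Return the standard term, or None when manual coding is needed."""
    return SYNONYMS.get(normalize(reported))


reported_terms = ["Head  ache", "PAIN IN HEAD", "feeling sick", "dizzy spells"]
coded = {term: code_term(term) for term in reported_terms}
uncoded = [term for term, standard in coded.items() if standard is None]

print(coded)
print("needs manual coding:", uncoded)
```

Returning `None` instead of guessing keeps unmatched terms visible, which matters because an incorrect code can hide a safety signal.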

An investigator may use different terms for the same adverse event, but it is important to code all of them to a single standard code and maintain uniformity in the process. The right coding and classification of adverse events and medication is crucial, as incorrect coding may mask safety issues or highlight the wrong safety concern related to the drug.

After a proper quality check and assurance, the final data validation is run.

All data management activities should have been completed prior to database lock. To ensure this, a pre-lock checklist is used and completion of all activities is confirmed. Any adjustment after that point requires proper documentation, and an audit trail has to be maintained with sufficient justification for updating the locked database. Data extraction is done from the final database after locking.

This is followed by archival. The minimum educational requirement for a CDM team member should be a degree in life sciences and knowledge of computer applications. Ideally, medical coders should be medical graduates; however, in the industry, paramedical graduates are also recruited as medical coders.

Some key roles are essential for all CDM teams. The list of roles stated herein can be considered the minimum requirement for a CDM team. Controlling and allocating database access to team members is also the responsibility of the data manager. Different professional organizations have outlines on clinical data management. It is an active international forum for discussion of, and feedback on, current topics of relevance to the discipline of CDM.

To meet this expectation, there is a gradual shift from paper-based to electronic systems of data management. Developments on the technological front have positively impacted the CDM process and systems, thereby leading to encouraging results in the speed and quality of the data being generated. The biggest challenge from the regulatory perspective would be the standardization of data management processes across organizations and the development of regulations to define the procedures to be followed and the data standards. From the industry perspective, the biggest hurdle would be the planning of data management systems in a changing operational environment, where the rapid pace of technological development outdates the existing infrastructure.

As noted above, the design of a database always follows creation of the protocol document. The protocol defines what data is to be collected and on what visit schedule. This could provide enough information to create a database design; however, most companies define the data capture instruments before the database.

Data capture instruments are the means of collecting the data from the site. CRF pages or electronic data entry screens are the most common means of data collection, but other sources may include lab data files, interactive voice response (IVR) systems, and so forth. Because they collect data from the sites, the data capture instruments should be carefully designed to be clear and easy to use. In most but not all systems, the definition of a data collection device influences, but does not completely determine, the design of the database.

The fact that the data collection instrument does not completely define the database storage structures is critical. Data from a given CRF page or electronic file can often be stored in a variety of ways in a database.

All data managers, whether or not they are responsible for creating a database design from scratch, should understand the kinds of fields and organization of fields that affect storage, analysis, and processing of the trial data, and they should be aware of some of the options to weigh in making a design decision for those fields.

High-impact fields

Some kinds of data or fields have a higher impact on data management than others. The kinds of fields discussed below include dates, text fields and annotations, header information, single check boxes, calculated or derived values, and special integers.

Dates

Problems arise constantly with incomplete dates and varying date formats.

Therefore, the database design must take the possibility of incomplete dates and also different formats into account.

Dates on a CRF typically fall into three categories:

1. Known dates related to the study (visit date, lab sample date)
2. Historical dates (previous surgery, prior treatment)
3. Dates closely related to the study but provided by the patient (concomitant medication, adverse events [AEs])

The first kind of date is needed for proper analysis of the study and for adherence to the protocol, and so, in theory, should always be complete.

The second kind of date is often not exactly known to the patient and so partial dates are common. The last type is particularly difficult because the dates are actually useful but not always known exactly, especially if there is a significant time span between visits.

A normal database data type of date usually works just fine for known dates related to the study. If the date is incomplete, nothing is stored in the database, a discrepancy can be issued, and it is likely a full resolution can be found. Dates in the second category are frequently not analyzed but are collected and stored for reference and medical review.

A good option for these kinds of dates is a simple text field that would allow dates of any kind. The third category of dates presents data management with many problems.

These dates may be used for analysis, and so they should be complete, but the patient may not know the exact full date. A few companies have had data entry staff fill in the missing date parts according to entry guidelines. Unfortunately, this has resulted in inconsistencies and misunderstanding.

More typically, companies address this problem by creating three separate fields for the month, day, and year, and have a fourth, derived, field either in the database or in the analysis data set which creates a complete date. Depending on the circumstances, the algorithm to create the full date can make some assumptions to fill in missing pieces. See Figure 3. A study may well have dates that fall into all three categories.

Discussions with biostatisticians on the types of analyses to be performed will clarify which dates must be complete, which can be stored as partial dates for reference, and which can be approximated to allow some information to be extracted from them.
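The three-field approach with a fourth, derived date can be sketched as follows. The imputation rules used here (a missing day becomes 1, a missing month becomes January, and no date is derived when the year is missing) are assumptions chosen for illustration; the actual rules for a study would come out of the discussions with biostatisticians mentioned above. The function names are hypothetical.

```python
from datetime import date
from typing import Optional


def derive_full_date(year: Optional[int],
                     month: Optional[int],
                     day: Optional[int]) -> Optional[date]:
    """Build a complete date from separately stored year, month, and day.

    Assumed rules (illustrative only): missing day -> 1, missing month -> 1;
    a missing year leaves the derived value empty.
    """
    if year is None:
        return None
    return date(year,
                month if month is not None else 1,
                day if day is not None else 1)


def is_complete(year: Optional[int],
                month: Optional[int],
                day: Optional[int]) -> bool:
    """Flag whether the patient supplied every part of the date."""
    return None not in (year, month, day)


# An adverse event start date where the patient remembered only month and year.
print(derive_full_date(2004, 6, None), is_complete(2004, 6, None))
```

Storing the completeness flag next to the derived date lets an analysis tell an approximated date apart from one that was reported in full.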

Database design considerations

[Figure: CRF field "Adverse event start date" and the derived complete date. The value would be left empty if the year were missing.]

Text fields and annotations

Clinical data from any of the common sources, such as CRFs, electronic entry screens, and electronic files, always contain some text. The most common text-type fields are coded values, short texts or comments, reported terms, and long texts.

Coded values

Categories of answers, often known as coded values, comprise the largest number of text fields.

These fields have a limited list of possible answers and usually can contain only those answers. Coded text fields only present a problem if the associated field can contain more than one answer or if the answer frequently falls outside the predefined list.

When more than one answer is possible, the database design changes from a single field with a single answer to a series of fields, each of which may hold any eligible response from the codelist.

[Figure 3. CRF page with the fields "Treatment required (check one)" and "Treatment required (check all that apply)".]

These responses can be reviewed for clinical or safety monitoring, but it is difficult to analyze them because any kind of summarization or analysis of these values depends on some consistency of the data. Unfortunately, with many of these short and important texts, codelists are not practical as the groupings are not apparent until analysis takes place.

The best that can be done for those is to standardize spelling, spacing, and symbols as much as possible.

Reported terms

If the short texts or comments are important to the study, a codelist, possibly a very large one, should be used to group like texts.

Large codelists sometimes called dictionaries or thesauri are available for a very common class of short free-text fields: AEs, medications, and diagnoses.

These kinds of free text are often called reported terms and the matching of the terms to a codelist is a complex coding process.

See the coding section of Chapter 9 for more information on coding reported terms, and Chapter 23 for more information on the large codelists.

Long texts

Longer texts, those that cover several lines, are usually associated with a group of fields or even an entire page or visit. Clinical research associates and medical monitors use this text to cross-check against other problems reported or found in the data. Long texts are never analyzed. The length of the text can pose serious problems for the database designer because some databases, query tools, and analysis programs have limitations on the length of text fields to which easy access is provided.

That is, very long comments take more work to extract. Long comments can be stored in several ways. Where the database application supports very long text fields, a single field is likely to meet the needs of the long comment and has the advantage of making data entry and review straightforward. However, many reporting applications still have limits on text length that are in the hundreds of characters.

So, even if the database application does not impose limits, the query tool may not be able to extract the full text, or the analysis tool may not be able to include the field without truncation. Because of this, options other than a single field frequently must be considered. One common solution is to store the text in a series of numbered short fields grouped with the other related fields. A series of related text fields has several drawbacks.

The designer must guess the number of fields to create, and data entry staff must determine the best way of breaking up the text between lines. Also, review of the complete long text requires the extraction and reformatting of the entire set of fields, which usually makes ad hoc retrievals of the text impractical.

Some database designers get around the inconvenience of numbered fields by storing the comments separately in their own data structure or grouping them using the tall-skinny database format described later in this chapter. This approach is shown in Figure 3. When more room is needed for the comments, they are entered in additional rows of the grouping. With this structure, there is no need to guess at the maximum number of fields or size of the text, but the data entry staff is still faced with the question of how to separate the lines of text.

[Figure: CRF page with the field "Comment on any abnormal findings" and the corresponding database design (Patient ID, Visit, ...). Caption: Storing long comment text in a series of short, numbered fields.]

Ad hoc reporting from a column of comments is a bit easier than from a series of fields, but care must be taken if the database application does not automatically keep internal information on the order of lines.

Also, if the comments are regularly reviewed along with the rest of the data, then a query will have to reference two storage locations and join the data together appropriately.

Storage of texts

With long and short text fields, and in fact with any field, there is little point in adjusting the size of the field to minimize overall database size. Hard disks and backup media are not expensive compared to the other costs of the study, and most database applications only use as much space as is needed by the data; that is, the individual database records are not expanded to their maximum possible width based on the record design.

The data takes up only as much space as it needs plus some record overhead. Database designers should make a good guess based on known studies and leave plenty of space for integers and text.

Header information

To identify where a given set of data comes from, patient information such as investigator, patient number, and patient initials always appears with it when it is collected.

On paper CRFs, this information is found in a header box at the top of each page. On electronic CRFs, it appears on each screen. In electronic files of lab data, the header information may be found once in the file, once in a section, or on every line. Requiring the repetition of patient or other header information during collection helps ensure that the assignment of any given data to the patient is correct.

Patient-header information is common to all trials, but there are other fields that may be needed to identify the data and its place in the study or may be required for specific data management applications or systems. Examples of such other header fields include visit dates and document IDs, both discussed below.

Under a store-only-once approach, header information is stored a single time, and only the key code is then stored with each grouping or record of data. Facilitating the cleaning of the data is one of the best and most practical reasons for deviating from the store-only-once rule.

For example, if CRF pages are received singly from a site or can be separated, it can be a very good policy to enter the patient-header information new for each page and run discrepancy checks against initial enrollment information.

This would ensure that the header data was transcribed to the CRF properly and then entered in the database with the correct association. If dates vary within a single visit, that information should be captured and appropriately associated with each set of data, as it may affect analysis.
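A discrepancy check of re-entered header information against the initial enrollment data might look like the sketch below. The enrollment table, its field names, and the `check_header` function are hypothetical; they stand in for whatever the CDM system actually stores at enrollment.

```python
from typing import Dict, List

# Hypothetical enrollment records keyed by patient number (kept as text).
ENROLLMENT: Dict[str, Dict[str, str]] = {
    "0012": {"initials": "ABC", "investigator": "101"},
    "0013": {"initials": "DEF", "investigator": "101"},
}


def check_header(page_header: Dict[str, str]) -> List[str]:
    """Return discrepancy messages for the header entered from one CRF page."""
    patient = page_header.get("patient_number", "")
    enrolled = ENROLLMENT.get(patient)
    if enrolled is None:
        return [f"patient {patient!r} not found in enrollment data"]
    problems = []
    for field, expected in enrolled.items():
        entered = page_header.get(field, "")
        if entered != expected:
            problems.append(
                f"patient {patient}: {field} entered as {entered!r}, "
                f"enrollment has {expected!r}"
            )
    return problems


# A page where the initials were mistranscribed.
page = {"patient_number": "0012", "initials": "ABD", "investigator": "101"}
for message in check_header(page):
    print(message)
```

Running the check on every page, rather than once per patient, is what makes the duplicated header entry worthwhile as a cleaning tool.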

This might mean storing the date more than once per visit. Duplicating values may also be necessary to other systems integrated with, or dependent on, the data management database, such as when a document ID must be stored with each record or piece of data to allow automatic connections from the data to an imaging system version of the CRF.

Some clinical data management (CDM) systems allow the designer to make the decision of whether or not to duplicate the header information or store it only once. Other systems store the patient identifiers only once and allow the designer to decide for the remaining header information. Still other systems enforce the idea of not duplicating data. No matter which approach a company must use or chooses to use, data management must be aware of the implications and set up a process to identify and correct problems with header information.

Single check boxes

The blank answer occurs most commonly with single-answer check boxes such as "Check if any adverse events: [ ]". If the box is blank, then either there are no AEs or the field was overlooked. See Chapter 2 for further discussion on this topic. Because data managers, especially at contract research organizations (CROs), do not always have a say in the design of CRF pages, they may be faced with designing a database to support these types of fields. All the options for a database design to support single check boxes store the information of whether or not the box was checked, but they have different philosophical angles. One option associates the field with a broader codelist; this association has the potential of introducing confusion, as it seems to imply that more responses are possible than the CRF would permit.

Another option uses a Yes-only codelist. It does introduce the need for yet another codelist that does little more than ensure consistency at entry. Since a Yes-only codelist offers little in the way of categorizing responses, some companies just use a single character to indicate that a box or field was checked.

Calculated or derived values

Data from the CRF, electronic entry screens, and electronic files are not the only data associated with a study. There are internal fields that can be convenient and even very important to the processing of data that are calculated from other data using mathematical expressions or are derived from other data using text algorithms or other logic.

Examples of calculated values include age (if date of birth is collected), number of days on treatment (when collecting treatment dates), weight in kilograms (if it is collected in pounds), or standard international (SI) lab values (when a variety of lab units are collected). Examples of derived values include extracting the site identifier from a long patient identifier, assigning a value to indicate whether a date was complete (see above), and matching dictionary codes to reported AE terms.
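A few of the calculated and derived values listed above can be sketched in code. The function names, the inclusive day-counting convention, the two-decimal rounding, and the "site-patient" identifier layout are assumptions made for this illustration; each study would define its own conventions.

```python
from datetime import date

LB_PER_KG = 2.20462  # approximate number of pounds in one kilogram


def age_in_years(birth: date, on: date) -> int:
    """Completed years between date of birth and a reference date."""
    years = on.year - birth.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1
    return years


def days_on_treatment(start: date, end: date) -> int:
    """Inclusive count (assumed convention): same start and end date gives 1."""
    return (end - start).days + 1


def pounds_to_kg(pounds: float) -> float:
    """Convert pounds to kilograms, rounded to two decimals (assumed precision)."""
    return round(pounds / LB_PER_KG, 2)


def site_from_patient_id(patient_id: str) -> str:
    """Derive the site from a hypothetical 'site-patient' identifier such as '101-0012'."""
    return patient_id.split("-", 1)[0]


print(age_in_years(date(1960, 6, 15), date(2005, 6, 14)))
print(days_on_treatment(date(2005, 1, 1), date(2005, 1, 10)))
print(site_from_patient_id("101-0012"))
```

Keeping such expressions in one place, written and validated once, is the consistency argument for deriving values centrally rather than repeating them in each analysis program.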

Some of these values are calculated or derived in analysis data sets; others are calculated or derived in the central database. Database designers should identify the necessary calculated and derived fields and determine whether they will be assigned values as part of analysis or as part of the central database. If the values for internal fields are needed for discrepancy identification or report creation, then storing them in the database may be the most sensible approach.

If the values are used only during analysis, there may be no need to create permanent storage for them. Note that calculating or deriving values in the database means that the expression or algorithm is written only once and run consistently, whereas when calculations are performed at the time of analysis, the algorithm may have to be duplicated in several locations.

In this case, filling the derived values centrally reduces the effort to write, validate, and run the calculation or derivation.

A patient number, however, represents a kind of numeric field that frequently causes problems because of the way the data is displayed.

When the patient-number field is defined as an integer field, but the values are very long, many database and analysis systems will display the number in scientific notation.

Other examples of fields prone to this problem include document IDs and batch numbers. Patient-number fields may also have a leading zero problem.
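Both display problems are easy to reproduce. In the sketch below, Python's `float` stands in for a system that treats long numbers as floating-point values (an assumption, since each database or analysis tool behaves differently), and the three helper functions are hypothetical names used only for this illustration.

```python
def store_as_integer(raw: str) -> str:
    """Round-trip an identifier through an integer type; leading zeros are lost."""
    return str(int(raw))


def display_as_float(raw: str) -> str:
    """Round-trip a long identifier through a floating-point type."""
    return str(float(raw))


def store_as_text(raw: str) -> str:
    """Keep the identifier exactly as entered, apart from surrounding spaces."""
    return raw.strip()


print(store_as_integer("0012"))                  # leading zeros dropped
print(display_as_float("12345678901234567890"))  # shown in scientific notation
print(store_as_text("0012"))                     # preserved as entered
```

Besides the display change, the floating-point round trip can also alter the trailing digits of a very long identifier, so the original value may not be recoverable.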

To avoid either of these problems, define these special integer fields as text fields.

Tall-skinny versus short-fat

Our discussion of database designs up to this point has stayed away from the underlying structure of tables or records because many of the problems that clinical data fields present impact all applications.

Database normalization, in general, is the process of creating a design that allows for efficient access and storage. More specifically, it usually involves a series of steps to avoid duplication or repetition of data by reducing the size of data groupings or records.

In some systems, database records are intrinsically linked to the CRF page so that choices regarding normalization are not available to the designer; in other systems, a high level of normalization is enforced and the designer has no say. In many CDM applications, the database designer may choose how much to normalize a design. This discussion is geared at making those choices. There are several levels of normalized forms that are not discussed here. The normalized version of the table has fewer columns and more rows.

The visual impact that normalization has on a table has led to the colloquial, easily remembered nicknames for these structures: short-fat for the non-normalized form and tall-skinny for the normalized form. Both kinds of structures store the data accurately and allow for its appropriate retrieval, yet the choice impacts data management and analysis in several different ways. For example, data cleaning checks that compare start and end values of blood pressure at a single visit are much more easily performed in the short-fat format.

[Figure: Data storage in a short-fat form.]

Creation of the structures themselves and the associated checks is easier in the tall-skinny form since there are fewer fields and they are unique. Data querying is also easier in the tall-skinny form, since the field name containing the data is clear and there is only one column per kind of field. The tall-skinny format does duplicate header data or other key data to each row.

Unless the underlying entry application manages automatic propagation of changes to header information, it would be necessary to make updates individually in each and every row! Clinical data contains many examples of repeated measurements that lend themselves well to storage in tall-skinny tables.
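The difference between the two layouts can be made concrete with a small conversion from one short-fat record to its tall-skinny rows. The field names (`sbp_start`, `sbp_end`), the choice of header fields, and the `to_tall_skinny` function are hypothetical and serve only as an illustration.

```python
from typing import Dict, List

HEADER_FIELDS = ("patient_id", "visit")  # assumed header fields for this sketch


def to_tall_skinny(short_fat_row: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn one wide record into one row per measurement, repeating the header."""
    header = {name: short_fat_row[name] for name in HEADER_FIELDS}
    rows = []
    for name, value in short_fat_row.items():
        if name in HEADER_FIELDS:
            continue
        # Each measurement becomes its own row carrying a copy of the header.
        rows.append({**header, "measurement": name, "value": value})
    return rows


# Short-fat: one record holds both blood pressure readings for the visit.
wide = {"patient_id": "0012", "visit": "1", "sbp_start": "120", "sbp_end": "118"}
tall = to_tall_skinny(wide)

for row in tall:
    print(row)

# The start/end comparison is a one-liner on the short-fat record.
print(int(wide["sbp_start"]) - int(wide["sbp_end"]))
```

The printed rows show the trade-off described above: the tall-skinny form has a single `value` column to check, but the patient and visit are repeated on every row and would each need updating if the header changed.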

In general, any data collected in a tabular format is a candidate for storing in tall-skinny form. Lists of related questions, each of which has the same kind of response, may also be stored this way if convenient.

In all of the examples above, the data in each column of the table is of the same kind and type. That is, a column contains data from a single kind of measurement. A data cleaning check, such as a range, applied to the column makes sense for each value in that column. The tall-skinny form is so flexible that it is sometimes applied to data where the values in a single column are not the same kind of measurement.

Laboratory data is the classic example of this use of the tall-skinny form. One column may give the test name, one the test result, another the test units, and so on. See Chapter 8 for further discussion of using tall-skinny structures for this type of data. Taking this idea of a tall-skinny table a few steps further, we can reach a point where a table contains the names of all the fields to be measured, their value, and other information related to the measurement.

The additional information may include status of the data, such as whether the value has a discrepancy associated with it, whether it has been monitored at the site, and so on. Features and tools to conveniently access the data and also to reconfigure it for analysis are critical to these systems.

Using standards

Most companies have some standards in place for use in the design of a database and associated data entry screens.

Standards simplify the process of designing and building database applications and also speed the process. Designers should be required to use existing fields, codelists, modules, tables, and other database objects, within reason, as obviously it is wrong to force a value into a field that does not truly reflect the content of the field and intent of the value. Yet, without strong requirements to use what is there (and sometimes even with those requirements), human nature and the desire to work fast always cause a proliferation of database fields that are really the same but are defined with different names and slightly varying characteristics.

Besides causing more work at the time of creation of the database, they greatly increase the effort required for validation of the database application and also of the analysis programs. Applications vary widely in how they support and enforce standard attributes of database objects. Some, such as systems linked tightly to CRF pages, may not have checks on standards at all. Others, such as large applications that support a range of data management activities, may support different levels of standards enforcement that must be set or turned on after installation or even on a study basis.

In all cases, a process supporting the technology should be in place to avoid undue proliferation of items. The process for using standards should define and manage:

Larger firms may have a standards committee or a standards czar who reviews all new items and enforces standards with special tools. The committee or czar may be the only ones empowered to define new objects and sometimes they are the only ones with permission to actually create those objects in the application.

These committees have value in that they create a person or group that has good oversight over both the philosophy of database design and the particulars of the database objects used by all projects. Unfortunately, they can become unwieldy and delay work by not meeting frequently enough, by taking too long to produce a new object, and by not providing enough people with privileges to create objects thereby causing difficulties when someone is out. Even small data management groups will benefit from having one or two people who review database designs so that the philosophy is similar and the fields are defined consistently.


Even a little bit of standardization can go a long way to reducing setup time and the associated validation of the database application. The most successful standardization efforts involve clinical teams, programmers, and statisticians working with data management and within database systems restrictions.

Standardization efforts will not work unless all department managers commit to following the standards that are developed!

After deciding on a design

Deciding on a design is just the first step in creating a database, and the most time-consuming. The design is documented to form a specification that guides the actual database-building process. At a minimum, an annotated CRF is used to document the design, but companies may also require a separate design document. Chapter 4 discusses the specification, testing, and building of the database or EDC application in detail.

Quality assurance for database design

Use of standards and reuse of similar modules are the best ways to ensure quality in database design. Every time a new database object is used and put into production, it opens up a chance for design errors.

Even when standard objects are used for new studies, the designer may choose the wrong object. The safeguard is review: one person does the work and another reviews it. Review is important here because a poor database design may adversely impact not only entry but also data cleaning, extraction or listing programs, and analysis.

Just as programmers on critical applications in other industries have software code review, database designs should always be reviewed by a second person. Even the smallest of data management groups should be able to arrange for some level of review as it is not practical or wise to have only one person able to do a particular task.

Standard operating procedures for database design

The procedures guiding database designs are frequently included as part of the database creation or setup standard operating procedures (SOPs). If there is a separate SOP on design, it might discuss the use of standards and the process for requesting new kinds of database objects. Such an SOP should require as output from the design process an annotated CRF and possibly a database design document.

Responsibilities in database design

As CDM systems become both more sophisticated and at the same time easier to use, more data managers are becoming involved in the design and creation of the central databases and entry applications.

This is both appropriate and efficient. Data managers know the typical data closely and are often aware of the problems associated with a particular collection method. With training in the application, a little background on the issues, and a good set of standards from which to build, database design or database building can be an interesting addition to the tasks of experienced data managers.

Development of EDC systems is, as of today, performed by programmers. Because data managers are much more familiar with the characteristics of the data than a typical programmer, they are a critical component to the design of entry screens and underlying database objects for these systems.

As we will see in the next chapter, they are likely to be involved in user acceptance testing for these systems if they are not, themselves, building the database.

Because the database application will be used to create records of clinical data, and that data is the basis of decisions on the safety and efficacy of the treatment, and because the data may be used to support a submission to the Food and Drug Administration, a database application for a study falls under Good Clinical Practice (GCP) guidelines and 21 CFR 11 requirements for validation.

As we will see in detail in Chapter 20, validation is more than testing. Roughly speaking, validation requires a plan, a specification of what is to be built, testing after it is built, and change control once it is in production use. In the case of building and releasing a database application for a study: The same concepts also apply to electronic data capture (EDC) systems, but ideas specific to those applications will be addressed as they arise.

Edit checks (data validation) are specifically not discussed in this chapter; see Chapter 7 for information on building and releasing edit check programs. The validation plan, in the form of an SOP, will detail all the steps necessary to build and release a system and keep it in a validated state.

The high-level steps are explained in more detail below. For each step, the SOP should also describe what kind of output is to be produced as evidence that the step was carried out. Typically, that output would be filed in the study file or binder (see Chapter 1).

Specification and building

In Chapter 3, the concept of designing a database before building it was introduced. The output from the design process for a given study is a specification of the database that is to be built.

The specification is, at a minimum, an annotated CRF.

Quite a few companies require a database design document in addition to the annotated CRF. The CRF page is also clearly marked to show how questions are grouped into modules or tables.

Because the annotated CRF is used not only by the database designer but also by edit check writers, entry screen designers, and even those browsing data through database queries, it is helpful if the codelists associated with an item are present along with any hidden, internal, or derived fields associated with each module (see descriptions in Chapter 3).

The use of annotated CRFs is widespread enough to be considered industry standard practice. A separate design document, while not required by all data management groups, can provide information that is not readily obvious from the annotated CRF.

This document might include simple overview information such as the list of all groupings or tables and the names of all codelists used. In companies where there are firm CRF page and database standards, the design document can focus on deviations from those standards (if any) and introduce any new objects created for this study.

It might also include a more detailed discussion of design decisions that were made for problematic fields or tables. The database builder uses these specifications to create the database objects for a study. The building process itself acts as another form of database review.

The designer may notice a problem with the design or may not be able to implement the object as specified.

The designer and builder then work to find a solution and update the specification documents appropriately.

Testing

Validation always involves testing as one element of the process and all clinical database applications should be tested, without exception! A mistake in the creation of the database or a poor design choice will impact data storage and possibly analysis. Testing aims to identify these errors before production data is stored in the application so that changes can be made without impacting live data.

Just as with software testing, one has to take a practical approach and decide what kind and what amount of testing is most likely to identify problems, without taking up too many resources and too much time. The testing of a clinical database application most naturally takes place when the entry screens are ready using patient test data written on CRFs.

Depending on the complexity of the study, data management groups will typically test the data from 2 to 10 patients. If the goal is purely to test the entry screens, the test data will be mostly normal data with typical exceptions in date formats or text in numeric fields. Some companies use data that will later be used to test edit checks as test data, in which case many values will be chosen to later fail the edit checks.

Ideally, data entry staff will perform the entry of the test data as if it were real. Any problems that they identify in the fields or in the flow of the screens should lead to corrections and an appropriate level of re-entry of the test data.

But the testing should not stop once data entry is flowing smoothly! After the data has been entered, the responsible data manager should run the process or programs to fill in any derived or calculated fields. Then a tester should compare data extracted from the database against the original test CRFs as if this were a data audit.

Besides looking to see if the data values match, the tester is also checking: Are the calculated variables calculated and correct? Are all hidden variables filled in as needed? Has any data been truncated or otherwise improperly stored? Are there unexpected blank records? Are fields that should be carried over to multiple rows or groups properly carried? There may also be additional checks that are related to the application used to capture the data.
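The comparison of extracted data against the test CRFs can be partly automated. The following is a minimal sketch, not a description of any particular CDM system: it assumes the test CRF values and the database extract have both been loaded into dictionaries keyed by a hypothetical (patient, field) pair, with values held as strings. The function name `compare_to_crf` and the problem labels are illustrative only.

```python
def _is_blank(value):
    """Treat None and empty strings as blank."""
    return value is None or value == ""


def compare_to_crf(expected, extracted):
    """Compare test CRF values against values extracted from the database.

    Both arguments map (patient, field) -> string value.
    Returns a list of ((patient, field), problem) tuples.
    """
    problems = []
    for key, crf_value in expected.items():
        if key not in extracted:
            problems.append((key, "missing from extract"))
            continue
        db_value = extracted[key]
        if _is_blank(crf_value) and _is_blank(db_value):
            continue
        if db_value == crf_value:
            continue
        if _is_blank(db_value):
            # Covers derived or hidden fields that were never filled in
            problems.append((key, "unexpected blank"))
        elif not _is_blank(crf_value) and crf_value.startswith(db_value):
            # Stored value is a prefix of the CRF value
            problems.append((key, "possible truncation"))
        else:
            problems.append((key, "value mismatch"))
    # Records present in the database that no test CRF accounts for
    for key in extracted:
        if key not in expected:
            problems.append((key, "unexpected record"))
    return problems


# Hypothetical test data for one patient
expected = {
    ("001", "WEIGHT"): "72.50",
    ("001", "COMMENT"): "Patient reported mild headache",
    ("001", "BMI"): "23.4",
}
extracted = {
    ("001", "WEIGHT"): "72.50",
    ("001", "COMMENT"): "Patient reported mild",
    ("001", "BMI"): None,
    ("002", "WEIGHT"): None,
}
for key, problem in compare_to_crf(expected, extracted):
    print(key, problem)
```

A report like this does not replace the tester's visual review; it only narrows attention to fields that look truncated, blank, or unaccounted for.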

Finding any of the above problems is a very serious situation and it may be impossible to correct once data is in production. Since a validation process requires documentation of each step, the test data and results should be filed in the study file as evidence of the process.

Many companies also print a report of the final database structure along with screens, if warranted by the application they are using. The study is now almost ready to move into production use.

Moving to production

We know that validation is not just testing.

Therefore, completing the testing does not mean that validation is complete.

There are usually several additional steps necessary before production entry and processing can begin. At the very least, study specific training must take place before live data is entered. Entry staff should be trained on the new study. This training is not a complex formal training on the application and company standards; rather, it focuses on study specific issues.

Typical preproduction entry training will include a discussion of difficult or nonstandard entry screens and a review of standard and study specific data entry conventions or guidelines. Frequently, a CDM group will also require a record of signatures and initials in the study file for anyone who will work on the study.

This is a good point to collect the initial set. After training, and only after training, should entry staff be given access or permissions on the data in the new production study. For more on training, see Chapter In addition to training, it is quite common to have additional requirements that must be met before production use of a study application. These may include, for example: Even if the checklist is not a controlled document or required as part of an SOP, it provides value to the people doing the work and improves quality because critical steps will not be missed.

Change control

During the course of carrying out a study, it is very likely that a change to the database design will be needed. There may be unexpected data coming in from the site, such as texts longer than originally anticipated or texts showing up where only numeric values were expected, or perhaps the protocol has been amended and now requires that additional data be collected during a visit.

After carefully validating the application, a change made willy-nilly can result in putting the database application in an unvalidated state. Once a system has been validated, it will only stay validated if no changes are made to the application. Larger software systems, such as the database systems, are under change control once they have been validated.

Changes are carefully tracked and appropriate testing and documentation is required. The same should apply to database applications for individual studies. In the case of a database application for a study, it is important to consider what a change to the database application is and what is not a change.

Adding, deleting, or modifying patient data according to standard practices and under audit trail is not a change to the application and does not require change control. In most systems, adding users and granting access (again, using appropriate procedures) is not a change to the system. Adding fields, lengthening text fields, and modifying entry screens are all examples of changes that should be carried out under a change control process.

A change control process can be quite simple. It requires that responsible staff members: Many of the most common changes can be documented and carried out with a few sentences in the log or on the form. However, more complex changes that have several interlinked requirements or those that impact several groups would benefit from having a targeted document, in addition to a simple change control log entry, to describe the change process and impact in detail.

Some examples will help clarify the requirements. The case of lengthening a text field provides us with an example of a low-impact change. Existing data is unlikely to be affected. The database definition will change but depending on the system the change may have no impact on entry screens. Now consider the example of adding a field to the database because a protocol amendment requires that the sites now record the time a certain procedure is performed. In addition to the actual database change, adding a field touches on: This case shows us that if multiple areas or users are affected, or any impact cannot be described in one or two sentences, then a more detailed change plan is warranted.
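The rule of thumb above (a detailed change plan is warranted when multiple areas are affected or the impact cannot be described in one or two sentences) can be sketched as a simple log entry structure. This is an illustrative sketch only: the class name `ChangeRequest`, its fields, and the example affected areas are assumptions and are not taken from any specific change control form. Sentence counting by splitting on periods is deliberately naive.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ChangeRequest:
    """One entry in a hypothetical change control log."""
    description: str
    impact: str
    affected_areas: list = field(default_factory=list)
    requested_on: date = field(default_factory=date.today)

    def needs_detailed_plan(self, max_sentences=2):
        """Apply the rule of thumb: several affected areas, or an impact
        statement longer than one or two sentences, calls for a separate plan."""
        sentences = [s for s in self.impact.split(".") if s.strip()]
        return len(self.affected_areas) > 1 or len(sentences) > max_sentences


# Low-impact change: lengthening a text field
lengthen = ChangeRequest(
    description="Lengthen a comment text field",
    impact="Existing data is unlikely to be affected.",
    affected_areas=["database definition"],
)

# Higher-impact change: adding a field after a protocol amendment.
# The affected areas listed here are hypothetical examples.
add_field = ChangeRequest(
    description="Add procedure time field per protocol amendment",
    impact="New field must be collected at the sites from now on.",
    affected_areas=["database definition", "entry screens", "edit checks"],
)

print(lengthen.needs_detailed_plan())
print(add_field.needs_detailed_plan())
```

The threshold of two sentences mirrors the text; a group could equally decide on a different criterion in its SOP.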

However, because so much more is built into an EDC application as an integral part of the system, the specification step is more complex and takes longer. The specification document may not be an annotated CRF but it may be the protocol plus any other study specific requirements for the screens or even mockups of screens.

It may also include many of the edit checks that are programmed separately in traditional systems, and the specification frequently includes the structure of transfer datasets. The fact that all these things are programmed into the application, often by a separate EDC programming group, is the main reason that data management groups should allot a larger block of time for study setup and building for EDC studies than for paper based studies.

While experienced data managers are able to set up studies in the more traditional data management systems, it is currently the case that EDC applications are programmed by programming groups. Change control is even more critical in an EDC system because changes impact the sites directly. Because of the serious impact that even minor changes will have on an EDC application and its users, a more formal version control or release system is appropriate.

These version control or release systems are, or are like, those that a software company would use.

Quality assurance

Care in building and releasing a database cannot be stressed enough.

Building the database from specifications is another form of review. Then, after building is complete, we have data entry and the data comparison testers who are reviewing the data from the database build. In all of these steps, the review is not a policing action where the reviewer is looking to catch the first person making a mistake; rather, it is a collaborative effort to identify potential problems.

As we will see in Chapter 5, many groups find that errors are introduced not during initial entry but when values need to be edited later. Similarly, many companies have a solid process for releasing databases but introduce errors when they make changes.

Looking more closely at change control can improve quality of the process and the data.

SOPs for study setup

In order to avoid writing a validation plan for each study database setup, an SOP needs to be in place that will be general enough to suit all studies and yet specific enough to satisfy the requirements for validation.

Change control may be part of that study database setup SOP or it may be discussed in detail in another procedure. The actual checklist for moving a study into production may or may not be part of the SOPs, but the requirement to have a checklist may be explicitly called for. In the traditional data management systems, the setup process may not appear to be programming since it is performed through user interfaces; and yet it is programming.

In fact, it is programming that has a huge impact on the data that is to be collected from a clinical trial. For that reason, every database setup should be validated as an application that will affect safety and efficacy decisions for the treatment in question.


Regardless of whether there is a computerized step involved in the process, and regardless of the specific application used, the main data entry issues that must be addressed by technology or process, or both, are: Companies aim to reduce transcription errors using one of these methods: Does it mean that the data is an exact duplicate of the values found on the CRF, or are there deviations or variations permitted?

In double entry, one operator enters all the data in a first pass, and then an independent second operator enters the data again. Two different techniques are used to identify mismatches and also to resolve those mismatches.

In one double entry method, the two entries are made and both are stored. After both passes have been completed, a comparison program checks the entries and identifies any differences. Typically, a third person reviews the report of differences and makes a decision as to whether there is a clear correct answer for example, because one entry had a typo or whether a discrepancy must be registered because the data value is illegible or is in some other way unclear.
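A comparison program of the kind described here can be quite small. The sketch below assumes each entry pass is available as a dictionary keyed by a hypothetical field identifier (patient, page, and field name joined into one string); the function name `compare_passes` and the data layout are illustrative assumptions, not the design of any particular entry application.

```python
def compare_passes(first_pass, second_pass):
    """Return the fields where two independent entry passes differ.

    Each pass maps a field identifier to the entered value. A field
    present in only one pass is reported with None for the other.
    """
    differences = []
    for field_id in sorted(set(first_pass) | set(second_pass)):
        first_value = first_pass.get(field_id)
        second_value = second_pass.get(field_id)
        if first_value != second_value:
            differences.append(
                {"field": field_id, "first": first_value, "second": second_value}
            )
    return differences


# Hypothetical entries from two operators for the same CRF page
first_pass = {"001/P3/WEIGHT": "72.5", "001/P3/HEIGHT": "180", "001/P3/SEX": "M"}
second_pass = {"001/P3/WEIGHT": "75.2", "001/P3/HEIGHT": "180"}

for diff in compare_passes(first_pass, second_pass):
    print(diff)
```

The report only lists mismatches. Deciding whether one entry was a typo or whether a discrepancy must be registered remains the job of the third person reviewing it.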

The other double entry method uses the second entry operator to resolve mismatches. After first pass entry, the second entry operator selects a set of data and begins to re-enter it. If the entry application detects a mismatch, it stops the second operator who decides, right then and there, what the correct value should be or whether to register a discrepancy to be resolved by a third person.

A more experienced operator or coordinator is usually used to act as the third person reviewing the discrepancies identified by comparing the passes. Heads-up second entry works best if the second entry pass is performed by a more experienced operator, but many companies have seen good success with this method even when using temporary staff.

If the entry application supports it, it would be worth considering using different methods at different times or with different studies, depending on available staff.

Extensive checks at entry time are rarely incorporated into entry applications when data is to be entered in two passes. A check at entry would only slow down the operators and would bring little value.

Checks on the data are run after differences have been resolved.

Single entry

While relatively rare in traditional clinical data management, single entry is an option when there are strong supporting processes and technologies in place to identify possible typos or errors because of unclear data. Electronic data capture (EDC) is a perfect example of single-pass entry; the site is the entry operator transcribing from source documents, and checks in the application immediately identify potential errors.

Single-pass entry could be considered an option in a traditional data management system when there are extensive checking routines built into the data entry application and further checks that run after entry. The entry operator would have to be trained in the protocol and encouraged to view the entry process as a quality and review process rather than as a simple transcription process.

To address the concerns that there is a higher error rate, companies using single-pass entry could consider a more extensive audit of the data than would be used for double entry.

OCR has been used for many years to successfully read preprinted header information. As the software has improved, it has also been used to identify handwritten numbers and marks in check boxes. However, success rates in reading handwritten free text are still rather poor. As more companies move toward imaging or fax-in of CRFs, the opportunity to use OCR as a first entry pass has increased. When used, OCR becomes the first data entry pass on numbers and check boxes and is followed by at least one other pass to verify those numeric values and fill texts or other fields that could not be read by the OCR engine.

The second-pass operator after OCR visually checks values read by the OCR engine and types in any text values that appear on the form. Sometimes the OCR engine will provide an indicator of how sure it is about reading each value to guide the operator to fields that need review. Because the visual verification is hard to enforce and because the operator may fill in a significant number of fields, there is a danger of introducing transcription errors.
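Where the OCR engine reports how sure it is about each value, that indicator can be used to direct the second-pass operator. The following sketch assumes the OCR output has been reduced to a dictionary of field name to (value, confidence) pairs with confidence between 0 and 1; the function name and the 0.90 threshold are hypothetical and would need to be set by each group.

```python
def fields_needing_review(ocr_results, threshold=0.90):
    """List fields the second-pass operator should look at first.

    ocr_results maps field name -> (value, confidence), where value is
    None if the engine could not read the field at all.
    """
    return [
        name
        for name, (value, confidence) in ocr_results.items()
        if value is None or confidence < threshold
    ]


# Hypothetical OCR output for one CRF page
ocr_results = {
    "PATIENT_ID": ("001", 0.99),
    "SYSTOLIC_BP": ("128", 0.72),
    "SMOKER_CHECKBOX": ("Y", 0.95),
    "COMMENT": (None, 0.0),
}

print(fields_needing_review(ocr_results))
```

Flagging low-confidence fields guides the operator but does not enforce visual verification of the remaining fields, which is why post-entry checks are still needed.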

Companies generally address this by doing yet another pass and also by including extensive post-entry checks.

How close a match?

We have discussed methods to help ensure accurate transcription of the data from the CRF, but what is accurate?

Does accurate mean that the data in the database exactly matches that written into the CRF field? Are data entry operators given some leeway to substitute texts, make assumptions, or change the data? These questions must be clearly defined by each and every data management group. Some companies do subscribe to the entry philosophy that the data transcribed from the CRF must match the CRF to the highest possible degree.

Their guidelines tell the entry operators: However, it should be noted that the current industry trend is away from most changes at the time of entry. The feeling seems to be that except for symbols, data should be entered as seen or left blank; any changes are made after entry during the cleaning process so that there is a record of the change and the reason for it in the audit trail. That being said, there still may be some changes permitted at entry time in addition to replacing symbols.

Permitted changes to texts might include: For example, in a study started in November , data entry may correct visit dates written as January to January That is, the value written on the CRF may be obviously incorrect or simply missing. If checks at entry time have been built into the entry application, those checks should never prevent entry staff from entering what they see.
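The principle that entry-time checks should never prevent staff from entering what they see can be sketched as a check that warns but does not block. This is an illustration under assumptions: the field name `WEIGHT_KG`, the `ENTRY_CHECKS` table, and the function `enter_value` are all hypothetical.

```python
def is_number(text):
    """Return True if the text parses as a number."""
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False


# Hypothetical checks: field name -> list of (predicate, warning message)
ENTRY_CHECKS = {
    "WEIGHT_KG": [(is_number, "WEIGHT_KG is not numeric")],
}


def enter_value(record, field_name, raw_value, checks=ENTRY_CHECKS):
    """Store the value exactly as written, then return any warnings.

    The value is stored before the checks run, so a failed check can
    flag the field for later cleaning but can never block entry.
    """
    record[field_name] = raw_value
    return [
        message
        for passes, message in checks.get(field_name, [])
        if not passes(raw_value)
    ]


record = {}
# The CRF shows a letter O where a zero was probably intended
warnings = enter_value(record, "WEIGHT_KG", "7O.5")
print(record, warnings)
```

Because the value is kept as seen, any later correction goes through the cleaning process and is recorded with a reason.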

Dealing with problem data

No matter how well designed a CRF is, there will be occasional problems with values in fields. The problems may be due to confusion about a particular question or they may be due to the person filling it out. The most common problem is illegibility; another is notations or comments in the margins. Sometimes pre-entry review of the CRFs can help manage these and other problems but this can cause process problems in exchange. Because companies deal with these data problems in different ways, each data management group must specify the correct processing for each problem in data entry guidelines.

Illegible fields

Illegible writing on the CRF always causes problems for data entry, data management staff, and clinical research associates (CRAs). Each data management group should consider the following questions when planning an approach to illegible fields: Even when staff tries to appropriately identify values, some data is just illegible and will have to be sent to the investigator for clarification during the data cleaning process.

Notations in margins

Investigators will sometimes supply data that is not requested. This most frequently takes the form of comments running in the margins of a CRF page but may also take the form of unrequested, repeated measurements written in between fields. Some database applications can support annotations or extra measurements that are not requested, but many cannot.

If site annotations are not supported, data management, together with the clinical team, must decide what is to be done:

Pre-entry review

In the past, many companies had an experienced data manager conduct a pre-entry review of all CRF pages. The idea was to speed up entry by dealing with significant issues ahead of time.

The problem is that extensive review and in-house annotation circumvents the independent double entry process and its proven value. Now some companies still do a pre-entry review but it is minimal and focuses on issues that will prevent entry.

More and more is being left to the entry operators. The entry staff is trained to identify and flag problem data appropriately. Review by a data manager only happens for significant issues.

This is a better use of resources in that more senior staff members work only on problems, not normal data. It also encourages data to be entered as seen with changes and reasons recorded in the audit trail. The philosophy toward pre-entry review does not have to be all or nothing. It is possible and sensible to satisfy both the need for smoother entry and the concerns of going too far by selecting an approach appropriate to the staff and supporting applications.

For example, if incomplete or incorrect header information would actually prevent entry of any data from the page, then a data coordinator might review the header information while logging pages into the tracking system.

A CRA or data coordinator might also review specific, troublesome data such as medications or adverse events (AEs) and clarify only those. The rest of the discrepancies may be left to the data entry staff to register manually or to the computer to register automatically. The data coordinator or CRA then addresses the discrepancies only after the data has been entered.

Modifying data

Initial entry is not the only entry task performed by data management. Following initial entry there are edits or corrections to the data. The corrections may have been identified internally in data management, by the CRA, or through an external query to the investigator. Just as there is a process for entry, there should be a well-defined process for making these corrections to the data. Corrections may be made by regular entry staff or only by more senior staff.

Some firms allow any entry operator to make corrections; others require a more experienced data manager to make changes. Many data entry or data management applications support different access privileges to data in different states, but if systems support is unavailable, the procedures must be enforced by process. Most data management systems do not have a second pass on changes.

The industry has found that errors are often introduced when changes are made. Clearly, rerunning cleaning checks on these new values is essential and that is assumed.

Many companies have also instituted a visual review of changes. This may be done through the entry screen, through a report, or through the audit trail. If the application does not support it, data management groups must put into place a process that ensures the visual review always takes place.

Any changes after initial entry, made by any person, should be recorded in an audit trail. The Food and Drug Administration requires audit trails to record changes made to clinical data (21 CFR 11), and it should be possible to view this audit trail at any time. There are, however, differences in how the term audit trail is interpreted and implemented. The form the audit trail takes, and the information associated with each implementation, adds further variations to audit trails across systems.
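Since implementations vary, the following is only one possible interpretation of an audit trail, sketched to show the basic idea: every change after initial entry appends a record of the old value, the new value, who made the change, why, and when. The class names `AuditEntry` and `AuditedRecord` and their fields are assumptions for illustration and do not describe what 21 CFR 11 or any specific system prescribes.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditEntry:
    """One immutable record of a change to a stored value."""
    field_id: str
    old_value: Optional[str]
    new_value: Optional[str]
    changed_by: str
    reason: str
    changed_at: datetime


class AuditedRecord:
    """Holds entered values and an append-only list of changes."""

    def __init__(self, initial_values):
        self._values = dict(initial_values)
        self._trail = []

    def update(self, field_id, new_value, changed_by, reason):
        """Change a value and record the change in the audit trail."""
        entry = AuditEntry(
            field_id=field_id,
            old_value=self._values.get(field_id),
            new_value=new_value,
            changed_by=changed_by,
            reason=reason,
            changed_at=datetime.now(timezone.utc),
        )
        self._trail.append(entry)
        self._values[field_id] = new_value

    def value(self, field_id):
        return self._values.get(field_id)

    def audit_trail(self):
        """Return the trail as a tuple so callers cannot modify it."""
        return tuple(self._trail)


record = AuditedRecord({"WEIGHT_KG": "7O.5"})
record.update("WEIGHT_KG", "70.5", changed_by="jsmith", reason="Site query response")
for entry in record.audit_trail():
    print(entry.field_id, entry.old_value, "->", entry.new_value, entry.reason)
```

Keeping the entries immutable and the list append-only reflects the requirement that the trail can be viewed at any time without having been altered.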

The quality and correctness of the database is determined by checking the database data against the CRF and associated correction forms. Quality assurance (QA) is a process, and quality control (QC) is a check of the process. QA for data entry builds on good standards and procedures and appropriately configured data entry applications. The approach that assures quality data entry is documented in the data management plan. QC for data entry is usually a check of the accuracy of the entry performed by auditing the data stored in the central database against the CRF.

Ideally, the auditors are not people who participated in data entry for that study. External quality assurance groups at some companies perform this task to ensure independence of review. They identify the CRFs to be used, pull the appropriate copies and associated query forms, and compare those values against the ones stored in the central database.

The result of the audit is usually given as a number of errors against the number of fields on the CRF or in the database. To conduct an audit, there must be a plan for the audit that includes: If the plan is study specific, it can be laid out in the data management plan or in a separate audit plan document.

After the audit, a summary should document the final count of fields, total number of errors, error rate, and any action taken.
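The arithmetic behind the audit summary is simple: the error rate is the number of errors divided by the number of fields checked. A minimal sketch follows; the function name and the counts in the example are hypothetical and are not drawn from any real audit.

```python
def audit_error_rate(errors_found, fields_checked):
    """Return the error rate as a fraction of fields checked."""
    if fields_checked <= 0:
        raise ValueError("fields_checked must be positive")
    if errors_found < 0 or errors_found > fields_checked:
        raise ValueError("errors_found must be between 0 and fields_checked")
    return errors_found / fields_checked


# Hypothetical audit: 12 errors found in 8,000 fields checked
rate = audit_error_rate(errors_found=12, fields_checked=8000)
print(f"Error rate: {rate:.4%}")
print(f"Errors per 10,000 fields: {rate * 10_000:.1f}")
```

Reporting the rate together with the raw counts lets a reader judge how much data the figure is based on.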

Ten percent of the patients may be easy to select but does not guarantee good coverage of investigator sites. Ten percent of CRFs is better as long as all pages are represented.

Similarly, missing data is also a matter of concern for clinical researchers. Even though the DMP is a plan, that is, it is the way you expect to conduct the study, it must be revised whenever there is a significant change because it will then show how you expect to conduct the study from that point forward.

Other examples of fields prone to this problem include document IDs and batch numbers. As its importance has grown, clinical data management (CDM) has changed from an essentially clerical task in the late s and early s to the highly computerized specialty it is today. These are entered manually in a second-review pass after OCR and, as in the case of double entry, this significantly adds to the processing effort.

Having the sponsor data management or clinical group check missing page reports from the CRO can help identify this issue early in the study.

Also, review of the complete long text requires the extraction and reformatting of the entire set of fields, which usually makes ad hoc retrievals of the text impractical.