A HEALTHCARE DATA ANALYST FACES MORE SOPHISTICATED INFORMATION NEEDS
It can be argued that as the number of data items available for analysis in modern clinical systems continues to grow, analysts understand the data less thoroughly than they did when databases were much smaller and built for specific purposes.
As health IT becomes more sophisticated, having more data is indeed a good thing. But it's simply a fact of life that a healthcare data analyst cannot have as intimate a knowledge of data systems as he or she once had. Indeed, I often have to scratch my head and look up the exact definition of a data item if I haven't used it in a while or if I need to explain to somebody exactly what the source data represents.
When new data becomes available in a healthcare information system (or any other type of system), or when embarking on a new analytics development effort, it is important to resist the urge to dive right in without first obtaining a clear understanding of the data and how it relates to the business.
Below is a high-level summary of what is critical to know before exploring new data or developing analytical tools such as dashboards, reports, alert agents and other reporting tools. Future articles will cover these in more detail.
- What the data represents. Much healthcare data is generated on the front lines by clinicians and other staff during the provision of care. Analysts need to know the processes and workflows that produce the data, what the data measures and who is responsible for entering it. Where possible, outcome data should be linked to process data to help determine how efficient, effective and safe clinical workflows are.
- Where and how the data is stored. Before using data, the analyst must know where it is located. Is the data stored in an enterprise data warehouse, a data mart aligned with a clinical system or a standalone database? Along with knowing where the data is stored, the analyst should understand its quality. For example, are there missing values that might bias the analysis, or invalid entries that need to be cleaned or otherwise addressed?
- The data type. Most database management systems require data to be stored as specific types (such as integer, character, and date/time). Regardless of how data is physically stored in a database, what kind of data do the values represent in "real life"? Are any conversions needed before the data becomes useful for the intended purpose? For example, numbers stored in character fields may need to be cleaned and cast to a numeric type such as integer or float before arithmetic can be performed on them.
- What logically can be done with the data. Given the type of data and how it is stored, which database and mathematical operations can be performed on it meaningfully? Counts are valid for any data type, but even basic operations such as addition, and statistics such as the mean, are not valid for categorical or ordinal data, even when the values appear numeric. A numeric department code, for example, is only a label, and averaging such codes produces a meaningless number.
- How to turn data into useful information. Raw data in and of itself is rarely useful. Even in this age of big data, an organization's executives, managers and other decision-makers make more effective decisions when the data is compiled, analyzed and used to generate insight into the organization's operations. Whether the results take the form of specific, well-defined performance indicators on a dashboard or of simulation and predictive analytics, the analysis can also highlight the best way forward.
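The data-quality, data-type and valid-operation checks described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production tool: the field names, the sample rows and the small data dictionary (LEVELS) recording each field's real-world level of measurement are all hypothetical, and every value is assumed to arrive as text, as it would from a character-typed column.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical extract: all values arrive as text, as from character-typed columns.
RAW_ROWS = [
    {"patient_id": "1001", "weight_kg": " 72.5", "acuity_level": "3", "unit_code": "4"},
    {"patient_id": "1002", "weight_kg": "", "acuity_level": "5", "unit_code": "2"},
    {"patient_id": "1003", "weight_kg": "80kg", "acuity_level": "2", "unit_code": "4"},
    {"patient_id": "1004", "weight_kg": "65", "acuity_level": "", "unit_code": "7"},
]

# Hypothetical data dictionary: what each field represents in "real life".
LEVELS = {"weight_kg": "ratio", "acuity_level": "ordinal", "unit_code": "categorical"}


def to_float(text):
    """Cast a character value to float; return None if blank or not a clean number."""
    cleaned = text.strip()
    if not cleaned:
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None  # e.g. "80kg": flag for review rather than guess


def profile(rows, field):
    """List patient IDs with missing values and, for numeric fields, unparseable ones."""
    missing = [r["patient_id"] for r in rows if not r[field].strip()]
    invalid = []
    if LEVELS[field] != "categorical":
        invalid = [r["patient_id"] for r in rows
                   if r[field].strip() and to_float(r[field]) is None]
    return {"missing": missing, "invalid": invalid}


def summarize(rows, field):
    """Compute only the statistics that are meaningful for the field's level."""
    level = LEVELS[field]
    raw = [r[field].strip() for r in rows if r[field].strip()]
    if level == "categorical":
        # Codes are labels even when they look numeric: count them, never add them.
        return {"level": level, "count": len(raw), "frequencies": dict(Counter(raw))}
    numbers = [v for v in (to_float(x) for x in raw) if v is not None]
    if level == "ordinal":
        # Order is meaningful but the distance between ranks is not: use the median.
        return {"level": level, "count": len(numbers), "median": median(numbers)}
    # Interval/ratio data supports arithmetic, so the mean is meaningful.
    return {"level": level, "count": len(numbers), "mean": mean(numbers)}


if __name__ == "__main__":
    for name in LEVELS:
        print(name, profile(RAW_ROWS, name), summarize(RAW_ROWS, name))
```

For each field, the script reports the rows that need review and a summary suited to that field: a mean for weight, a median for the ordinal acuity level and only frequency counts for the unit code. Flagging "80kg" instead of silently stripping the unit is a deliberate choice in this sketch; whether to clean such entries automatically is best decided with the people who own the source workflow.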