Insights on regulatory pathway, algorithms, cybersecurity, and validation.
Steve Tyrrell, Regulatory Compliance Associates Inc., and Alex Reid, Proactive Cyber Security
October 16th, 2018
The quest is on for machine learning (ML) to turn raw data into useful medical devices that improve outcomes and reduce the burden on the healthcare system. Supporting, and someday emulating, human thought processes enables ML devices to improve the decision-making process for patients and clinicians. Software designed to continually learn and improve poses both challenges and opportunities. For example, in the not-so-distant future, patients may experience devices such as an intelligent insulin pump that more effectively manages insulin needs in anticipation of a dessert about to be consumed. As ML matures, the possibilities to improve patient care are endless while the challenges are many.
Addressing Challenges of ML Device Development
As device developers seek to harness ML for next-generation products, it’s important to address the unique ML challenges in the pre-commercial stage, including:
- The research phase, where selecting the ML algorithm drives subsequent mitigation considerations
- The building phase, which addresses verification, validation, and risk management concerns specific to ML
- Regulatory processes and the nuances of working with regulators in novel areas of technology
Selecting Types of ML Algorithms
ML typically uses supervised or unsupervised algorithms to discover patterns in data and generate actions. In supervised learning, the developer guides the teaching process of the algorithm. This requires a known data set with inputs and outputs to train the machine to make predictions. The developer corrects the machine’s predictions in this learning cycle, and the system learns from the corrections. Natural language processing is an example: the developer enters a sentence and asks the machine what it means, and over time the machine learns the pattern and produces more accurate outputs.
The other type is unsupervised learning, where the developer does not provide teaching guidance along the way. Instead, the machine extracts general rules from the data using mathematical optimization and other techniques. An example involves peritonitis, an inflammation of the peritoneum (the membrane lining the abdominal cavity). The machine takes pictures of the patient’s peritoneal cavity and determines whether infection is suggested based on its analysis of prior data.
Choosing to use either a supervised or unsupervised ML algorithm typically depends on factors related to the structure and volume of the data and the use case at hand. The developer can introduce errors in the model if the underlying assumptions are untrue. For example, a machine could learn how to visually differentiate between a criminal and business person if given a set of photographs. However, the resulting algorithm would be incorrect when applied to future photos because appearance doesn’t predict criminal behavior.
Managing Validation and Verification in ML
Besides choosing the best algorithm at the outset of new product development, the R&D professional needs to choose the right amount of data to validate the model. Mislabeled data, too little data, and too much data all introduce risk into the machine. The risks depend on the type of ML algorithm.
In supervised learning, a decision tree or statistical methods are used to teach the machine. The model is validated using fault tree analysis, which pairs with the decision tree to reveal whether the machine takes the wrong path based on input data. The challenge in validation is mathematically proving that the error margin falls within the tolerance originally specified. The math requires an adequate data set whose data points can be allocated between the learning and the validation samples.
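The allocation and tolerance check described above can be sketched in a few lines of Python. This is a minimal illustration, not a validated method: the function names, the 80/20 split, and the use of a Wilson score interval to bound the error rate are all assumptions made here for the example, and a real device would need a statistically justified plan.

```python
import math
import random


def split_data(records, validation_fraction=0.2, seed=0):
    """Shuffle records and split them into (learning, validation) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


def wilson_upper_bound(errors, n, z=1.96):
    """Upper end of the Wilson score interval for an observed error rate.

    errors: number of wrong predictions on the validation sample
    n:      size of the validation sample
    z:      normal quantile (1.96 corresponds to roughly 95% confidence)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = errors / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center + half_width


def within_tolerance(errors, n, tolerance, z=1.96):
    """True if even the pessimistic error estimate stays inside the tolerance."""
    return wilson_upper_bound(errors, n, z) <= tolerance
```

Comparing the upper bound, rather than the raw observed error rate, against the tolerance means a small validation sample cannot pass on luck alone: zero errors in 100 cases still leaves an upper bound of about 3.7 percent.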
ML can make it difficult to determine appropriate data sizes due to the lack of standards and potential introduction of creative approaches. A developer, for example, might compare previous clinical studies to suggest sample sizes. In-vitro diagnostics validation might require some 450,000 patient data sets for algorithm development and validation to ensure there are ample sample sizes in the fault tree analysis.
In another example, IBM Watson allows developers to choose various AI algorithms. A developer searching for cancer tumors in a biopsy might choose a neural network, which can be difficult to understand and challenging to develop and validate. The neural network is trained using sets of data, such as a list of blood test results that indicate the patient has a certain number of cancer cells. Or, the algorithm can be trained by supplying it with images of healthy cells and those afflicted with cancer. In this example, the algorithm can be validated by comparing the training data set to a reasonable clinical study that compares blood test results to correct diagnoses; on that basis, the developer asserts that the algorithm has been adequately trained.
Algorithms developed using a pre-developed AI model can be validated by leveraging the original creator’s recommendation on the amount of test data needed to meet the desired margin of error.
Another way to determine sample size involves leveraging domain experts such as clinicians, who understand the frequency of all paths in the decision tree based on their knowledge of each tree node and its associated risks.
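As a rough illustration of how a desired margin of error translates into a sample size, the sketch below uses the standard normal-approximation formula for estimating a proportion, n = z² · p(1 − p) / E². It also shows one simple way to use clinician-supplied path frequencies: size the data set so that the rarest decision tree path still receives a minimum number of cases. Both functions, their names, and their default values are assumptions for this example; the article does not prescribe a formula.

```python
import math


def sample_size_for_margin(margin, expected_rate=0.5, z=1.96):
    """Cases needed to estimate a rate to within +/- margin.

    Uses n = z^2 * p * (1 - p) / margin^2; expected_rate=0.5 is the
    most conservative choice when the true rate is unknown.
    """
    if not 0 < margin < 1:
        raise ValueError("margin must be between 0 and 1")
    if not 0 <= expected_rate <= 1:
        raise ValueError("expected_rate must be between 0 and 1")
    return math.ceil(z**2 * expected_rate * (1 - expected_rate) / margin**2)


def total_for_rarest_path(path_frequencies, min_cases_per_path):
    """Total cases needed so the rarest path is expected to get enough cases.

    path_frequencies: mapping of path name -> estimated frequency (0..1),
                      for example as supplied by domain experts.
    """
    rarest = min(path_frequencies.values())
    if rarest <= 0:
        raise ValueError("every path needs a positive frequency")
    return math.ceil(min_cases_per_path / rarest)
```

For instance, `sample_size_for_margin(0.05)` asks how many cases are needed for a 5 percent margin at roughly 95 percent confidence, and `total_for_rarest_path({"common": 0.93, "rare": 0.07}, 30)` asks how large the data set must be for a path seen 7 percent of the time to be expected at least 30 times.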
Minimizing Cybersecurity Risks in ML: Security and Privacy
Developers understand the need for security and privacy in healthcare applications. In ML, a new security risk involves the malicious introduction of bad data into the machine, which can lead to invalid and harmful outputs. Use of ethical hackers, however, can help mitigate the risk of bad data in supervised learning. These hackers specialize in simulating malicious acts that lead to limitations or boundaries on system learning, which ultimately protect against bad data.
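One simple form such limits or boundaries might take is a plausibility check that quarantines out-of-range records before they can reach the learning step. The sketch below is hypothetical: the field names and numeric ranges are placeholders invented for this example, not clinical guidance, and real limits would come from risk analysis and security testing.

```python
import math

# Hypothetical plausibility limits; illustrative values only, not clinical guidance.
PLAUSIBLE_RANGES = {
    "glucose_mg_dl": (20.0, 600.0),
    "insulin_units": (0.0, 100.0),
}


def is_plausible(record, ranges=PLAUSIBLE_RANGES):
    """True only if every required field is present, numeric, and in range."""
    for field_name, (low, high) in ranges.items():
        value = record.get(field_name)
        # Reject missing values, booleans, and non-numeric types outright.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if math.isnan(value) or not (low <= value <= high):
            return False
    return True


def partition_records(records, ranges=PLAUSIBLE_RANGES):
    """Split records into (accepted, quarantined) before any training step."""
    accepted, quarantined = [], []
    for record in records:
        target = accepted if is_plausible(record, ranges) else quarantined
        target.append(record)
    return accepted, quarantined
```

Quarantining rather than silently dropping suspect records keeps them available for review, which can help distinguish a sensor fault from a deliberate attempt to feed the machine bad data.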
The risk of bad data in unsupervised ML can be reduced by buying an established algorithm with embedded mitigation tactics (mathematical, programmatic, etc.). However, cybersecurity specialists who understand medical devices and unsupervised machine learning algorithms must thoroughly review the algorithm’s mitigations.
Developers have long been wary of privacy issues related to protected health information in cloud applications. Since many ML platforms leverage cloud storage and therefore introduce new risks to the process, it’s important for ML developers to understand how their data is collated with other data sets. Malicious entities could combine this shared data about the patient’s condition to violate privacy through a technique called inference, an approach that combines different innocuous and non-sensitive data to gain sensitive information. Consider the aggregated data for an automobile accident patient. It’s possible an attorney might slice the data and discover information about the victim’s diabetes to blame the patient for the mishap due to a potential diabetic coma. The use of polyinstantiation can mitigate these types of risks by slicing the data into sets for collation and designing data silos so only the developer knows which piece goes into the algorithm, thereby preventing the disclosure of the entire patient database.
Working with Regulators in ML Technologies
Experienced device developers understand the well-established process for working with regulators and developing submissions. The challenge in ML is the lack of precedent. Regulators are used to working with established frameworks where a consistent set of inputs generates a reliable set of outputs, but in ML, the outputs are continuously evolving. Thus, device developers must help regulatory agencies establish ways to assess the safety and effectiveness of products. Some suggested tactics include:
- Build a regulatory affairs team with experience in ML and multidisciplinary functions.
- Conduct early and frequent meetings with regulatory authorities so both sides can learn from each other.
- Find clinical and regulatory information throughout the world that is supportive of the desired goal. If negative information is uncovered, address it rather than ignore it.
- Do not submit a “black box.” Develop ways to communicate how and why a particular result occurred.
- Seek related credible sources, publications, guidance documents and experts, reference them, and utilize them.
- Recognize that regulators are used to understanding the device’s Mechanism of Action. In ML and other novel technologies, it is difficult to describe how the device works, so seek alternatives such as Safety Assurance Cases to help effectively communicate risks and risk management activities.
Using Safety Assurance Cases in ML
A safety assurance case:
- Presents all claims so that they can be easily linked with supporting evidence to demonstrate the validity of safety claims
- Is a formal method used to demonstrate the validity of a claim. It is presented as a clear, understandable argument supported by scientific evidence
- Contains arguments that are based on statistical measurements of the system’s reliability and grounded in risk-based and scientific methods to help discuss and draw conclusions
For regulators, safety assurance cases:
- Help to connect the dots in a structured way
- Help them to see both claims and supporting evidence
- Help them understand the “big picture”
For medical device manufacturers, safety assurance cases:
- Align medical device product development with FDA expectations.
- Help gain faster regulatory approvals. Medical device companies that move toward best practices by leveraging safety assurance case principles can clearly demonstrate product safety in a single document, making it easier for the FDA to review.
The three elements of an assurance case are claims, evidence, and arguments.
- The claim is a statement about a property of the system, typically contained in or driven by a requirements specification
- The evidence should provide information demonstrating the validity of the claim. This evidence may include verification and/or validation results including, but not limited to, test data, experiment results, and analysis. The evidence should also address the relevance to the claim, whether the evidence directly supports the claim, and whether it is providing sufficient coverage of the claim
- Arguments should link evidence to the claim and provide a detailed description of what is being proven. The arguments also should identify specific evidence that supports the claim
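The relationship among the three elements can be pictured as a small data structure. The sketch below is one possible representation, assuming a tree of claims in which each leaf claim carries its own argument and evidence; the class names, fields, and the gap-finding helper are hypothetical and do not reflect any FDA-defined schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    description: str  # e.g. a test report or analysis summary
    kind: str         # e.g. "verification", "validation", "analysis"


@dataclass
class Claim:
    statement: str                 # property of the system being asserted
    argument: str = ""             # explains how the evidence supports the claim
    evidence: List[Evidence] = field(default_factory=list)
    subclaims: List["Claim"] = field(default_factory=list)


def unsupported_claims(claim):
    """Return statements of leaf claims lacking evidence or a linking argument."""
    gaps = []
    if not claim.subclaims and (not claim.evidence or not claim.argument):
        gaps.append(claim.statement)
    for sub in claim.subclaims:
        gaps.extend(unsupported_claims(sub))
    return gaps
```

Walking the tree for gaps before submission is one way to confirm that every claim can be traced to evidence through an explicit argument.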
Numerous examples published by the FDA, industry, and academia explain the reasoning for constructing an effective safety assurance case. Companies need to understand the importance of a well-structured medical device product development process executed by experienced professionals, as well as the diligence and effective communication strategies that provide regulators, payers, and medical professionals with the evidence and confidence needed to bring these new technologies to market.