The Liver Meeting 2020
Emerging Trends Symposium Artificial Intelligence (AI) in Hepatology
Video Transcription
My name is Grace Su, and I am here today with my co-chair for this session, Dr. Meena Bansal. We're happy to present to you today a very exciting and timely symposium on artificial intelligence in hepatology. Artificial intelligence, or AI, has become increasingly pervasive in our daily lives, including healthcare settings. AI is only beginning to affect how we care for our patients with chronic liver disease, and there are multiple ways to apply the technology that remain unexplored. In this symposium we hope to focus on the basics of the methodology and on the applications of AI that are just beginning to emerge for the clinical practice of hepatology. The learning objectives for this session are to review the basic concepts of artificial intelligence; to discuss potential applications of AI in clinical practice and population health for patients with liver disease; and to describe the role of AI as it applies to liver imaging and pathology. We have a very exciting lineup of speakers who work in this area, starting with Dr. Bhat, who will review the basics of AI and its application to the care of patients with liver disease. Dr. Schattenberg will show us how we can use AI to leverage electronic medical records to predict clinical course and outcomes in liver disease. Dr. Calderaro will show us how AI can be used in histological assessment. I will discuss how AI can be used in imaging of patients with liver disease. This session will end with a wrap-up from Dr. Bansal on how we can move forward to implement AI in a healthcare system, discussing the promises and the challenges. Thank you.

Good morning, everyone. I'd like to thank the organizers for the opportunity to present to you as part of this wonderful symposium on artificial intelligence and its application to hepatology. I'll be speaking to you about the ABCs of artificial intelligence and applications for the care of patients with liver disease. Here are my disclosures, and here's the outline of my presentation. I'll start off by talking about definitions in artificial intelligence and machine learning and why we should use machine learning tools in the study of liver disease. I'll then talk about different machine learning tools, their applications, and their limitations, as well as the critical importance of data input. Starting off with definitions: what is artificial intelligence? We hear about AI every day, about driverless cars, big tech, surveillance. What is AI actually? Is it replacing human tasks? Is it thinking logically? Is it self-learning or taking over the world? In its essence, artificial intelligence is intelligence demonstrated by machines. There are many valid definitions of artificial intelligence. Intelligence combines many traits, including learning, reasoning, and problem solving, and similarly, AI draws on many fields of research, including computer science, game theory, and statistics. Therefore, there are many valid definitions. One definition is that AI is technology that behaves intelligently using skills associated with human intelligence. Another is computer systems that can use information and act in a goal-directed manner. There are many branches and applications of AI: one can use AI for medical diagnosis, AI-assisted surgery, cardiac arrest prediction, prediction of outcomes in liver disease, or natural language processing. What is machine learning, on the other hand?
Well, when many people talk about AI, they're actually referring to machine learning. Machine learning is a subset of artificial intelligence, as you can see in the figure to the right. Machine learning is a field of applied artificial intelligence that allows software applications to become more accurate at generating predictive models without being explicitly programmed. ML detects hidden patterns and interrelationships within large datasets. Deep learning neural networks are one type of machine learning algorithm. Here's the machine learning pipeline: one starts off by collecting and processing inputs, then chooses an algorithm that can map the input to the output, collects and processes the outputs, and trains and optimizes the algorithm until the predicted and true outputs are similar. This figure nicely illustrates the machine learning pipeline, from input all the way to a true output that is similar to the predicted output.
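As a concrete illustration of that pipeline, here is a minimal, hypothetical sketch in Python using scikit-learn: collect inputs and labels, choose an algorithm, train it, and compare predicted against true outputs on held-out data. The feature names and data are synthetic and purely illustrative; they are not from any study mentioned in this session.

```python
# Minimal sketch of the machine learning pipeline described above
# (illustrative only; feature names and data are hypothetical).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# 1. Collect and process inputs: e.g., routine labs for 1,000 synthetic patients.
X = rng.normal(size=(1000, 5))          # columns could stand in for ALT, AST, platelets, albumin, age
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# 2. Choose an algorithm that can map the inputs to the output.
model = RandomForestClassifier(n_estimators=200, random_state=0)

# 3. Train on one portion of the data, holding out the rest for evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model.fit(X_train, y_train)

# 4. Compare predicted and true outputs until performance is acceptable.
pred = model.predict_proba(X_test)[:, 1]
print("AUROC on held-out data:", roc_auc_score(y_test, pred))
```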
How does one compare machine learning with traditional biostatistics? The prediction of a given outcome is often very difficult, because there are several variables that can affect clinical outcomes. Standard clinical risk assessment models assume that risk factors are related in a linear fashion to clinical outcomes. Machine learning can better incorporate multiple risk factors and identify more nuanced relationships between risk factors and outcomes. ML algorithms learn from existing data to find novel patterns. You'll see here in this figure a very nice comparison between traditional algorithms on the left side and machine learning algorithms on the right side. As I said earlier, linear relationships are easily interpretable with traditional biostatistics; machine learning, however, allows non-linear relationships to be elucidated. Certainly, this is computationally more expensive and can require days of processor time to build models, but it can yield very novel and interesting algorithms that provide more accurate predictions. What are the factors that have led to an increased interest in machine learning over the last few years? One of them is the increasing adoption of electronic health records, which has allowed a large amount of data to be collected in a systematic way. You can see here in this paper that up to 84% of US hospitals in 2015, so five years ago, had electronic health record systems, and that number is probably much higher by now. Additionally, large datasets have become publicly available, including the Precision Medicine Initiative, the MIMIC Critical Care Database, the UK Biobank, and the Scientific Registry of Transplant Recipients. There is also great diversity in the health-related data being accumulated: lab tests, imaging data, genomics, proteomics, transcriptomics and other molecular data, and data from devices such as wearables and vital sign monitors. Such complex data really requires more advanced algorithms for analysis. Additionally, there's been standardization, with diagnosis codes such as ICD-10, so various types of data are being standardized. There has also been great progress in machine learning tools in recent years: there have been advances in deep learning techniques, learning with high-dimensional features has become feasible, and there has been a democratization of machine learning with the availability of open-source software. There has also been, in recent years, an increased appreciation of the need to practice more precise medicine, so precision medicine. What this means is that there is so much diverse data out there, and every individual has unique attributes, which inform the prediction of certain outcomes, of treatment response, and of disease progression. So it becomes very important to look at all this complex data and generate predictions at the patient level, really, rather than at the whole-population level. That has become more and more feasible in recent years, and it will lead to more targeted, actionable insights. So why are we interested in studying machine learning in liver disease? Liver diseases are by nature complex and heterogeneous. There are various factors that affect susceptibility to disease, including sex, ethnicity, genetics, environmental exposures, lifestyle, body mass index, diabetes, medications, and the list goes on and on. There are complex, nonlinear patterns in liver enzymes, liver function tests, and various other tests in our patients with chronic liver disease that are quite difficult to analyze. Additionally, a lot of electronic health record data, transient elastography, imaging, histology, and molecular data have been collected in recent years. So in liver disease, interrelationships abound because these diseases are by nature so complex. You can see here, from a review that we published earlier this year, that there has been tremendous growth in the number of publications pertaining to machine learning applications in hepatology. So now we'll move on to different machine learning tools, their applications, and their limitations. There are three main paradigms of machine learning. There is supervised learning, which is predicting labels from data; for example, this image shows how one would use supervised learning to predict pneumonia from chest x-rays. There is unsupervised learning, which is learning representations of data; for example, discovering liver cancer subtypes from gene expression arrays. A third paradigm is reinforcement learning, which is dynamic decision making using data.
This refers, for example, to real-time drug dosing to determine the optimal treatment course for a specific patient. This figure shows how machine learning tools have been applied in recent years to address questions in hepatology. One example has been gradient boosting machines to predict survival in patients with primary sclerosing cholangitis; others include neural networks to predict certain outcomes in the transplant population and support vector machines to classify shear wave elastography images. So various studies have been conducted in recent years using the diversity of machine learning tools, chosen based on the question. This table nicely shows the task one wishes to perform for a particular study question, and then the types of models that are most appropriate for that specific task. For classification, logistic regression, support vector machines, random forests, and neural networks are most appropriate. For regression, linear regression, random forests, and neural networks. For clustering, k-means clustering. For time series models, recurrent neural networks and long short-term memory networks. For image-based models, convolutional neural networks and generative adversarial networks. So this gives you an idea of how one would choose among different types of machine learning models for a specific task. This figure gives you an example from an abstract that we presented last year. The study question was to effectively detect well-compensated cirrhosis using machine learning tools. What we did was develop various machine learning algorithms to detect well-compensated cirrhosis, and the best machine learning algorithm was actually an ensemble of all five models. Ultimately, we ended up with a sensitivity of 83% and an AUROC of 84.8%. What this tells you is that it is even possible to generate an ensemble algorithm from existing machine learning models.
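An ensemble like the five-model combination just described can be sketched as follows. This is a hypothetical illustration with scikit-learn's soft-voting ensemble on synthetic data; the particular models, parameters, and data are assumptions for demonstration, not the actual algorithms from that abstract.

```python
# Hypothetical sketch of an ensemble of several classifiers whose predicted
# probabilities are averaged ("soft" voting). Data are synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=200)),
        ("gbm", GradientBoostingClassifier()),
        ("svm", SVC(probability=True)),
        ("mlp", MLPClassifier(max_iter=500)),
    ],
    voting="soft",  # average the predicted probabilities of the five models
)

print("Cross-validated AUROC:", cross_val_score(ensemble, X, y, scoring="roc_auc", cv=5).mean())
```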
This is another example, in which we used artificial neural networks to predict post-transplant mortality from different causes. We generated a personalized risk calculator that incorporated longitudinal data to generate recommendations based on predictive features. You can see here, in the figure to the right, that we used the SRTR database for training and our own institutional data as a test set, and we were able to generate predictions with an AUC of 0.8 on average for overall mortality as well as for the individual major causes of mortality, such as cardiovascular events, cancer, infection, and graft failure. Finally, I'll move on to the importance of data input. This is the Achilles heel of AI. One should have clear research objectives in one's study, and machine learning requires an ongoing commitment to data quality to produce accurate results. Studies involving machine learning tools clearly depend on accurate, clean, and well-labeled training data to learn from. Therefore, the collection, cleaning, preparation, and labeling phases are especially critical when one is using machine learning tools on large datasets. So I'll conclude with some key takeaway points. Artificial intelligence offers many powerful tools for modeling data to outputs, and applications in healthcare abound. AI and machine learning do not remove the need for clear research objectives or high-quality data. And certainly, the importance of artificial intelligence in hepatology will continue to grow in the coming years, given the complexity of data in liver disease and transplantation. So it is a great time to learn more about these wonderful tools and to connect with local collaborators to apply them to your pressing clinical questions. Thank you so much for your attention, and I'm happy to take questions now. Thank you.

Good morning, everyone. My name is Jörn Schattenberg, and it's a pleasure to be presenting during this Emerging Trends Symposium on Artificial Intelligence in Hepatology at the Liver Meeting Digital Experience 2020. I'd like to extend my special thanks to the course organizers, Dr. Bansal and Dr. Su, for inviting me and giving me the opportunity to present. The title of my talk is Leveraging the Electronic Medical Record to Predict Clinical Courses and Outcomes in Patients with Chronic Liver Disease. I'm a professor of medicine at the University Medical Center in Mainz and the director of the Metabolic Liver Research Program here in Germany. My main research interest revolves around the identification of non-invasive biomarkers and novel therapeutic approaches in non-alcoholic steatohepatitis and chronic fibrosing liver disease. Here you can see my disclosures. So now let's move on and detail the background: why could artificial intelligence in clinical hepatology be of impact and importance? As hepatologists, we're all aware that the identification of chronic liver disease, and the prediction of its course when an individual patient is sitting in front of us during the doctor's visit, is difficult and challenging. The presentation is very heterogeneous, and some patients reach endpoints while others never develop cirrhosis or cancer. This identification is even more difficult if you leave the hepatology space and are in a primary care provider's office or environment. A lot of the time, the standard of care consists of clinical laboratory data that might or might not be available and is influenced by clinical and personal experience. Now, how could machine learning techniques help? Before I get into more details, let me discuss one of the findings that has been one of my motivations to actually apply machine learning algorithms in chronic liver disease. Here you see some data from a German claims database that is a representative sample of the German population, and on this side of the slide you see the coded NAFLD diagnoses per federal state in Germany in 2018. The first thing you can see is that these numbers are clearly below what has been published in the literature with regard to the expected prevalence of NAFLD. The second aspect of this slide is the healthcare report in Germany, which reports the prevalence of type 2 diabetes, and you see that there is a big difference across the country depending on which state you're looking at; for example, the eastern German states have a much higher prevalence of type 2 diabetes. That is not mirrored by a higher prevalence of NAFLD, even though type 2 diabetes is one of the major risk factors for NAFLD.
So I think this is a very good example of how, in the clinical real world, the recognition of a liver disease that is likely associated with the higher prevalence of type 2 diabetes is simply missed in clinical practice. In hepatology we are seeing a tremendous increase in very well enrolled, prospectively recruiting clinical research studies with outcomes linked to them, and I'd like to highlight the European NAFLD Registry (LITMUS), which has been going on for quite some time, although there are others one could name. And machine learning technology applied to these large registries will help us develop algorithms, by doing repeated analyses and multi-step processes on weighted datasets, to train the machine to identify a patient. So what could this look like? I'm going to start out with an example on the identification of patients with chronic liver disease. One study that was presented during AASLD last year is the NASHmap study. Here a real-world database, namely the NIDDK database, which enrolled patients longitudinally and holds information on histological NASH, or on clinical characteristics that make NASH the most likely diagnosis after excluding other chronic liver diseases, was used to build a machine learning algorithm and predict the presence of NASH in these patients. That was done in a total of 704 patients, and the machine learning algorithm, and that's where the strength of these machine learning algorithms comes from, builds a tree using the available data and separates NASH from non-NASH cases through these trees at multiple decision points. To me, one of the striking findings after starting to get into the field and studying machine learning for NASH is that the machine is able to find cutoffs, for example an HbA1c greater or less than 5.6, that as a clinician I would not have considered clinically relevant. It then adds additional parameters in those subgroups to divide the two cohorts, NASH or non-NASH. The second major strength of these algorithms is that they take the weighted data that comes out of one tree and reanalyze it in a second tree, and they can apply the same parameters but potentially choose a different cutoff. By doing so over and over again, you end up with the best possible separation of cases versus non-cases using machine learning techniques. Going back to NASHmap, the investigators were able to identify a 14-feature machine learning model that predicted NASH and separated out the non-NASH cases, and you see the clinical features used in that model. Some of them are very self-explanatory: HbA1c and liver function tests are obviously not a surprise. But you can see that the machine is able to utilize information on height, hematocrit, or white blood count that in clinical practice you would not have looked at so closely to distinguish a NASH from a non-NASH patient. Another strength is obviously simplification: the NASHmap algorithm is capable of separating NASH from non-NASH cases even if you reduce the number of features available for data input. And these are the results that have been published for the NASHmap algorithm: using the 14-feature or 5-feature model, the area under the receiver operating characteristic curve is 0.82 or 0.80, respectively, to predict the presence of NASH.
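The idea of a tree-based model discovering its own cutoffs, and of successive trees refitting on the reweighted data, can be illustrated with a short hypothetical sketch. The feature names, synthetic data, and thresholds below are assumptions for demonstration only; this is not the NASHmap training data or code.

```python
# Hypothetical illustration of how a gradient-boosted tree model learns data-driven
# cutoffs (e.g., an HbA1c threshold a clinician might not have chosen).
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import export_text

rng = np.random.default_rng(1)
n = 3000
df = pd.DataFrame({
    "hba1c": rng.normal(5.7, 0.8, n),
    "alt": rng.lognormal(3.4, 0.4, n),
    "bmi": rng.normal(30, 5, n),
    "platelets": rng.normal(230, 60, n),
})
# Synthetic label loosely tied to HbA1c and ALT, purely for demonstration.
y = ((df["hba1c"] > 5.6).astype(int) + (df["alt"] > 40).astype(int)
     + rng.binomial(1, 0.1, n) >= 2).astype(int)

gbm = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)
gbm.fit(df, y)

# Each boosting stage refits a small tree on the residuals of the previous ones,
# so the same feature can reappear later with a different cutoff.
first_tree = gbm.estimators_[0, 0]
print(export_text(first_tree, feature_names=list(df.columns)))
print("Feature importances:", dict(zip(df.columns, gbm.feature_importances_.round(3))))
```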
Now, how is this useful in clinic? This doesn't mean that the patient has NASH, but it enriches a risk-stratified population, and it can help to determine further testing, management, or follow-up appointments with a different specialist, in this case, of course, a hepatologist, to secure the diagnosis. A second big application is obviously research databases, where you can ask scientific questions, but I think in the end the support in clinical practice will be of much higher value than the research aspect, because in clinics we're dealing with a lot of information that the primary care provider has to process, and liver disease is typically not at the top of the list. So if you apply these types of algorithms to medical records, a first assessment, even outside of the hepatologist's office, might guide the primary care physician to recommend specific follow-up. Now I'm going to get to the second part of this talk, which will focus on the identification of adverse outcomes for patients with chronic liver disease. Here, a recent publication from the Mayo and international PSC group caught my eye; it's recently out in Hepatology. But before I get to that specific study, let me set the stage: PSC, as you're all aware, is a disease with a very variable clinical course. If a patient visits the physician, risk stratification is typically done on lab values, for example alkaline phosphatase elevations, or, a little more refined, the Mayo PSC risk score. The data shown on this slide are the data that have been published relating to the enhanced liver fibrosis (ELF) score. Here, colleagues stratified a PSC group and looked at transplant-free survival over 10 years, and you can see that in tertiles according to the ELF score you are able to break down the event rate. Now, the study that I mentioned in the intro and that was recently published in Hepatology used machine learning to estimate the risk of outcomes in PSC. This was done in the Mayo PSC cohort and an international validation cohort. The authors excluded advanced PSC at baseline and cholangiocarcinoma and predicted the risk of hepatic decompensation at five years. You see the number of events in the table at the bottom of this slide, and the baseline characteristics are fairly comparable between the derivation and the validation cohort. Interestingly enough, by feeding well-defined clinical data into a machine learning algorithm, the machine was able to predict outcomes better than traditional risk scores. Looking at the occurrence of ascites, variceal hemorrhage, or encephalopathy, PREsTo outperformed the MELD score, the Mayo PSC risk score, and elevated alkaline phosphatase by identifying nine out of ten patients who reach hepatic decompensation from PSC, and that was seen in both the derivation and the validation cohort. The way these machine learning algorithms work is that they weigh the clinical data differently, and you see on the right-hand side of this slide that the ranking of the variables that go into the algorithm can be different. A second example that I'd like to present this morning, and I'm going to direct your attention to the presentation during the Liver Meeting Digital Experience this year, is the so-called fast NASHmap algorithm. It uses the Optum electronic healthcare records, looking at an endpoint and determining the first index date; in this analysis, the time to progression to an endpoint was established. Now, this is a U.S.
cohort, and by looking at patients who progress fast versus at a standard rate, namely three years or less after the index diagnosis versus more than six years after the index diagnosis, those two cohorts with different progression characteristics were compared. A total of 4,013 patients were included in this study, and here you see some of the results, namely an AUC of 0.77, with a sensitivity meaning that fast progressors are correctly identified as fast progressors in about 70% of cases. How could this, again, help us in hepatology clinics or primary care provider clinics? I think it informs the provider about the at-risk liver disease population seen in the clinic. It will help to stratify patients and provide a basis for not feeling overwhelmed by the many elevated liver function tests that might be seen and that are otherwise at risk of not being followed up, by directing and individualizing recommendations based on a risk score generated by a machine learning algorithm. So, with that, ladies and gentlemen, I'd like to conclude. For me, the key takeaways are that machine learning techniques are available, and when used on electronic healthcare records in the back end, they can analyze a large number of features and, through complex interactions, identify hidden associations using different analysis approaches; I'd like to highlight clustering as well as supervised and unsupervised analysis techniques. From my perspective, it remains to be shown whether this can outperform clinical experience or standard risk scores, but for sure it's something that, on a broader scale, will be very applicable to other clinics, potentially outside of hepatology. Now, what could this look like? Again, if you're a primary care provider and your patient record is running an AI tool in the back end, patients who are at high risk of fibrotic liver disease can be flagged, and referring such a patient will be much easier for the physician than assessing all the clinical data and determining a hepatic risk score on their own. So I think in the future we'll see this emerge to support clinicians and complement hepatologists in clinical decision-making, in particular in places where hepatologists might not be readily available. In the end, machine learning is not going to displace hepatologists, but I think it will support them and identify patients at risk. And with this, I'd like to thank you for your attention.

I would first like to thank the scientific committee for this invitation to discuss the applications of artificial intelligence in histological assessment and the prediction of clinical outcomes. This is the outline of my talk. We will first review the basics of AI for histological analysis. We will then move to the field of diagnosis and staging using AI-based pathology, then prognosis prediction and biomarker discovery. We will finally discuss the challenges and limitations, the way to integrate AI into laboratory workflows, and a glimpse of what pathology labs may look like in the foreseeable future. To understand how we process histological images using AI, you have to remember that an image is a matrix of numbers. Pathology images include colors, so a pathology image is actually the addition of three matrices, one red, one green, and one blue, and all the matrices contain the pixel values.
The numbers in these matrices can be processed through mathematical functions and thus computed by neural networks. You have here a schematic representation of a deep learning workflow for image processing. You have this HCC picture, which is the addition, as we just discussed, of the three matrices. We first extract numerical features from these matrices: we transform the values of the pixels into other values, and these values are fed into the neural network, which is composed of several layers of mathematical transformations. At the end, we have an output value which corresponds to the probability of the feature we want to predict. Neural networks have a particular feature in that they are able to correct themselves to adjust the output value so that it comes closer to reality. I mean, if we put in an HCC image and the output value says it is not likely to be an HCC, so the prediction is wrong, the mathematical functions included in the neural network will modify themselves in order to change the output value and bring it closer to the true probability of being an HCC. This is what happens during training of the model.
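The training process just described can be sketched in a few lines of Python. This is a minimal, hypothetical example with PyTorch: an RGB image (three stacked matrices) passes through convolutional layers, the output is compared with the true label, and the weights are adjusted to bring the prediction closer to reality. Random tensors stand in for histology tiles; the architecture and tile size are assumptions for illustration, not the models discussed in this talk.

```python
# Minimal sketch of a CNN training loop: forward pass, loss against the true label,
# backpropagation to adjust the network's "mathematical functions". Illustrative only.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, 1)  # assumes 224x224 input tiles

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))  # logit for "HCC vs. not HCC"

model = TinyCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

images = torch.rand(8, 3, 224, 224)     # a batch of 8 fake RGB tiles
labels = torch.randint(0, 2, (8, 1)).float()

for step in range(3):                   # a few illustrative training steps
    logits = model(images)
    loss = loss_fn(logits, labels)      # how far the prediction is from the truth
    optimizer.zero_grad()
    loss.backward()                     # backpropagation adjusts the weights
    optimizer.step()
    print(f"step {step}: loss = {loss.item():.3f}")
```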
The use of AI-based pathology for diagnosis and staging is a very active field with massive financial implications. Some may think that these approaches will completely replace human pathologists and also human radiologists. Well, this is not my opinion, and we will discuss this at the end of the talk. More importantly, these approaches could be implemented for time-consuming and tedious tasks, and they will improve the standardization of scoring and staging. So far in the liver, there is no major paper for diagnosis and staging of liver diseases or tumors, but I wanted to show this paper that was published a year ago in Nature Medicine. The authors built a model that is able to automatically detect foci of prostatic cancer on a prostate biopsy, and they reached really impressive performance. The screening of prostate biopsies is a really time-consuming task for pathologists, and we may imagine that these models could help us go directly to the biopsy that has foci of cancer and thus save time by not looking at all the biopsies from a patient. AI-based pathology will also probably have an impact on liver diseases, and I wanted to show this paper on mouse models of NASH. The authors used convolutional neural networks, deep learning, to classify the NASH models and to provide computational scores for fibrosis, ballooning, inflammation, and steatosis, which are the features of NASH, and these approaches may help to standardize the way we look at NASH models. For patients, there have also been attempts to build scores and models that quantify the features of NASH, and there is this paper. It's not completely AI, but it is computational pathology, and the authors aim to quantify the features of NASH on a biopsy. For example, they are able to measure ballooning by analyzing the size of the cytoplasm of the cells, to quantify the inflammation, with lymphocytes detected according to the size of the nuclei, and also to quantify the fibrosis. So these approaches may help improve the robustness of the staging systems of NASH. You are aware that pathological features are some of the most powerful predictors of prognosis in human cancers. The mathematical processing of histological slides should thus be able to provide a significant amount of information regarding prognosis. These approaches also provide a unique opportunity for the standardization of pathological examination. I will now introduce work that we did in the field of prognosis prediction, in which we aimed to predict prognosis after resection of HCC. We used a discovery series from Henri Mondor University Hospital and validated our models in the TCGA database. Of note, the staining protocols were different between the two series, so this is really a true validation of the models. We fed the neural networks with the survival data and the histological slides, and you have here the survival curves when we stratify patients as high risk or low risk according to our deep learning models. You can see that the model validates well in the TCGA series, and what was important is that the model outperformed every classical clinical, biological, and pathological feature, and also significantly outperformed a composite score that included all the clinical, biological, and pathological features. So that shows that in the histological slides, in the image, there is really a significant amount of information that performs better than every classical variable associated with prognosis. Interestingly, one of our models allowed us to analyze the image areas that were the most predictive of prognosis, and these pictures were reviewed by a pathologist. We showed that the high-risk areas were enriched in cellular atypia, a macrotrabecular architectural pattern, and also vascular spaces, while the low-risk images were enriched in fibrosis in the tumor, fibrosis in the non-tumor liver, and also immune cells in the non-tumor liver. These kinds of approaches are really exciting because they give you an insight into how the model makes its decisions, and we may further plan biological translational research by analyzing the biological pathways that are activated in these areas. Finally, AI is also a very promising approach for biomarker discovery, and we will now see an example. The field of biomarker discovery is one of the most promising fields of AI-based pathology, and I will present this work by the team of Jakob Kather, which was a landmark study in this field. It's not liver but colorectal cancer, a GI cancer, and in this paper the authors show that they are able to predict microsatellite instability status in colorectal cancer with really impressive performance. These approaches will really improve laboratory workflows because they will allow better selection of the patients who need to undergo molecular testing. With these kinds of approaches, you just analyze the image and you have a biological insight into the molecular features that are altered in the tumor. So maybe in the near future, we will first analyze the slides by deep learning and then sequence particular molecular alterations according to the predictions of the models. For the liver, we may expect really interesting things, particularly in the field of cholangiocarcinoma. There are numerous targetable alterations in this cancer, and these algorithms may facilitate the screening of patients and further improve precision medicine for patients with this highly aggressive malignancy. I will now give you my impression of what the pathology lab may look like in the near future.
Take the example of a pathologist analyzing an HCC surgical resection specimen: the pathologist performs a gross examination, selects the samples, then reviews the slides and selects the relevant AI models to apply to them, for example some AI tools with diagnostic modules, such as the detection of vascular invasion or satellite nodules, and also AI tools regarding prognosis, such as a prognostic score, and the prediction of molecular features. At the end, you have a composite report with the classical features, differentiation, and so on, and also some new insights into the prognosis and the molecular features of the tumor. For example, here on the report you have the AI HCC expression signature; it's just a fictional module that shows that there is a stem cell phenotype in the tumor, and also a gene mutation prediction that shows that this tumor is not likely to be CTNNB1 mutated but very likely to be TP53 mutated. So these reports are more global and integrate both conventional examination and artificial intelligence analysis. Although I'm very enthusiastic about the use of AI in pathology, I would also like to add that AI will not solve every one of our problems. To show that, I will illustrate with a kind of a joke: you can upload slides to a Google website that performs deep learning on all kinds of images, and the output is what the model thinks the image represents. In this case, I uploaded an image taken from a biopsy of a patient with primary biliary cholangitis, and the answer was this otherwise probably very nice Mexican restaurant in Mexico City, La Xilonguita. So it's just a joke, but you have to remember that sometimes models can go completely wrong, and you always have to look at the output of the models very carefully, as it may be completely wrong. The challenge in the next years will be to have models that we can trust at 100% if we want to use them in our daily practice. I would like to thank again the scientific committee for this invitation to talk about this really exciting field, exciting not only for pathologists but for everyone who analyzes data on a regular basis.

Good morning, ladies and gentlemen. My name is Grace Su. I'm a professor of medicine and surgery, as well as the director of the Morphomics Analysis Group at the University of Michigan. I'm a hepatologist and chief of gastroenterology at the VA Ann Arbor Health Care System. My research has focused on analytic morphomics, a high-throughput, highly automated methodology which can harness medical imaging data, such as CT scans, and link it to clinical outcomes. Using artificial intelligence techniques, we have developed non-invasive methods for the diagnosis and prognosis of liver disease and hepatocellular carcinoma. I'm very excited to be part of this symposium, and I'm very grateful to the AASLD for allowing me to participate. I would like to disclose my relationship to two commercial entities, Applied Morphomics and Prenovo. How does the computer, quote unquote, see the liver? This is important for our understanding of how artificial intelligence can analyze medical imaging. A CT scan is essentially a grayscale image. Grayscale images can be mathematically represented by a grid of numbers. Each box in this grid represents what we call a pixel, and each pixel ranges in value between zero for black and 255 for white. All the different shades of gray fall in between these two values.
Each image can therefore be represented by this matrix of pixel values. In order for the computer to process these numbers and recognize patterns, multiple methods have been used. The most common type of deep learning method used to analyze medical imaging is the convolutional neural network, or CNN. Central to CNNs is the use of kernels, which are matrices that slide across the image. The kernel values are multiplied with the input image values, and the summed value represents not only the input values but also the relationship of the input pixel values with their neighbors. You can imagine that different kernels allow for the extraction and learning of different features in the images. In the example above, you can see that when the same kernel slides across the image, you calculate different values, which tell the computer which specific features are present. In the top image, the center pixel has a value of three, which is more white than its neighbors, which have a pixel value of two. In the bottom image, the center pixel has a value of one, which is less white than its neighboring pixel values of two. As you apply the kernel across the image, you multiply each number in the kernel with its respective image pixel and then add the products. This yields a seven for the top image and a minus three for the bottom. You can imagine that this particular kernel is, in a way, sharpening the image, and when used in conjunction with other kernels it can be used to detect edges. In deep learning networks, many different types and combinations of kernels and values are used, both in a two-dimensional and a three-dimensional fashion, which allows the computer to extract latent features for learning. In this way, multiple convolutional layers followed by pooling allow for feature extraction, which can then be utilized for classification tasks. Other CNN architectures not shown here, such as U-Net, can also be used for segmentation tasks, such as delineating the outline of a liver.
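The slide-and-sum arithmetic just described can be reproduced with a few lines of Python. This is a hypothetical worked example: the image and kernel values are made up for illustration and are not the values from the slide.

```python
# Hypothetical sketch of sliding a kernel over a grayscale image and summing the
# element-wise products, as described above.
import numpy as np

image = np.array([
    [2, 2, 2, 2, 2],
    [2, 3, 3, 3, 2],
    [2, 3, 1, 3, 2],
    [2, 3, 3, 3, 2],
    [2, 2, 2, 2, 2],
], dtype=float)

# A simple sharpening-style kernel: it emphasizes the center pixel relative to its neighbors.
kernel = np.array([
    [ 0, -1,  0],
    [-1,  5, -1],
    [ 0, -1,  0],
], dtype=float)

def convolve2d(img, k):
    """Slide the kernel across the image (no padding) and sum the products."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

print(convolve2d(image, kernel))
# Large positive values mark pixels brighter than their neighbors; negative values
# mark pixels darker than their neighbors, which is how edges get highlighted.
```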
How can we harness this for medical imaging in patients with liver disease? Imaging is commonly used in clinical practice for the care of patients with liver disease. We routinely perform imaging studies for diagnostic purposes, such as surveillance and diagnosis of liver cancer; over 90% of hepatocellular carcinomas can be diagnosed via imaging alone. Although we often obtain CTs or MRIs to diagnose lesions and answer specific diagnostic questions, there's actually a wealth of information on the CT which is not routinely reported or collected. We know, for example, that the presence of fatty liver or cirrhosis can be characteristically visualized on CT scan, but whether it's reported is not consistent or quantified. In addition to these features, we routinely ignore the patient's body composition or phenotype. Body composition features like visceral and subcutaneous fat represent patient phenotypes which are most accurately quantified using CT scans. The amount of core muscle, which is important in health, is also very well visualized on an abdominal CT. A key consequence of chronic liver disease and cirrhosis is the development of muscle loss, or sarcopenia. One of the most important sets of muscles for health are the abdominal core muscles, and among these, one of the easiest to visualize and measure on imaging studies is the psoas muscle. As you can see here at the spinal level of lumbar four, the psoas muscles, which flank the spine, are very easy to visualize. A simple delineation of this muscle can give you a sense of the muscle's structure, and it can also give you an area. We found early on that psoas muscle was very predictive of survival after liver transplantation. In a cohort of patients undergoing liver transplantation who had an incidental CT for other clinical reasons, you can see that those in the smallest quartile of psoas muscle had a much worse survival compared to those in the largest quartile, where three-year survival was about three times better. This simple measurement on CT scan can provide very important added clinical information for risk assessment in patients undergoing liver transplantation. But manual delineation would be fraught with imprecision and inaccuracy and would really not be practical to implement. This is where artificial intelligence can be of assistance. In this paper we published in the American Journal of Gastroenterology earlier this year, we sought to automate the process of measuring psoas muscle using deep learning methods. The training cohort was a reference population of 5,268 trauma patients aged 16 to 91 from the University of Michigan who had CT scans obtained for clinical indications between 1998 and 2015. All of these CT scans had curated manual segmentations of the psoas muscle, which we could use as ground truth. After training, we validated the algorithm on two independent cohorts: a healthy cohort of 1,655 kidney donors who had a CT scan for the purpose of donation, and a cirrhosis cohort, which consisted of 254 patients who were prospectively followed at the University of Michigan Hepatology Clinic and had a CT scan within six months of enrollment that included the L4 region. We utilized a convolutional neural network where the inputs were the CT images and the output was the prediction map. Built with many layers of connected neurons meant to mimic human vision, the network uses interacting visual fields to produce more complex results. The neural network performed very well on our healthy population of kidney donors, with a Dice similarity coefficient of greater than 0.9; the Dice similarity coefficient is a way to mathematically calculate how well two images align on top of each other. Measurements derived from these masks showed similar performance whether they were calculated using manual delineation or the automated CNN model prediction, with an intraclass correlation of 0.957. We then wanted to test the automated algorithm on a sick population of cirrhosis patients, who may have lower-quality muscle and thus be challenging for the algorithm. We found that the deep learning models also performed well on the cirrhosis population, with Dice similarity coefficients and ICCs all greater than 0.9. More importantly, we wanted to see if we could demonstrate how automatic measurements of psoas muscle could be used clinically. On the left is a survival curve showing that those with a low psoas muscle index had significantly lower survival than those with a high psoas muscle index. On multivariate analysis, you can see that even after adjusting for severity of liver disease using the Child classification, the psoas muscle provided incremental information for survival in this cohort of patients. With an artificial intelligence-driven automated approach, you can envision how these measurements might be generated for CT scans performed for clinical reasons and provide incremental data to assist in risk prediction.
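The Dice similarity coefficient mentioned above, which quantifies how well a manual and an automated segmentation mask overlap, can be computed as follows. This is a sketch with toy masks, not the evaluation code from the study.

```python
# Dice similarity coefficient: twice the overlap divided by the total size of both masks.
import numpy as np

def dice_coefficient(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice = 2 * |A intersect B| / (|A| + |B|); 1.0 means perfect overlap."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

manual = np.zeros((10, 10), dtype=int)
manual[2:8, 2:8] = 1          # manually delineated region (toy example)
automated = np.zeros((10, 10), dtype=int)
automated[3:8, 2:8] = 1       # automated prediction, slightly smaller

print(f"Dice similarity: {dice_coefficient(manual, automated):.3f}")
```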
In addition to muscle mass, other body composition features also change in patients with cirrhosis. We have previously shown that cirrhosis patients have distinct changes that occur with the evolution of liver disease. This table shows multiple body composition features expressed as age- and gender-adjusted percentiles compared to a reference population. You can see that, not surprisingly, total abdominal muscle area is lower in cirrhotics, at the 35th percentile, and this clearly reflects the degree of liver disease: Child B-C patients had significantly lower total muscle area, and particularly normal-density muscle, than earlier-stage Child A patients. Visceral and subcutaneous fat areas also decreased with advancing liver disease, while fat density increased in the Child B-C patients. Using these body composition features, we showed that we could build models which outperform standard MELD in predicting survival in cirrhosis patients, particularly those with earlier disease where MELD is less predictive, such as patients with compensated and Child A cirrhosis. Recognizing the importance of measuring body composition features, we sought to develop an automated method using deep learning. This abstract, automated measurements of body composition on abdominal CT scans using deep learning can predict survival in patients with cirrhosis, is a poster presentation at this meeting. In this study, a deep learning model was constructed to train a pixel-wise semantic segmentation network. The training cohort consisted of a set of randomly selected, de-identified abdominal CT scans from our database where manually delineated outlines of the body at L3, lumbar 3, were available to serve as ground truth; the n was 12,067. You can see that with the outlines of the skin, fascia, and spine, you can actually create masks that allow you to measure the areas that represent subcutaneous fat, abdominal skeletal muscle, visceral fat, and bone density. To test the accuracy of the automated CNN method and its clinical usefulness, we utilized a cohort of patients who were not included in the training cohort, namely our cirrhosis cohort, who were followed at the University of Michigan Hepatology Clinic and had a CT scan for clinical indications with manual delineations and measurements at L3. As you can see, the overall accuracy of all the different contours between the ground truth and the CNN method was quite high, with a mean accuracy of 0.977 in the training set and 0.975 in the test set. As proof of principle for clinical usefulness, we utilized the automated body composition measurements to build prediction models. As you can see here, in the cohort of cirrhosis patients, the C statistic of MELD for predicting survival improved from 0.66 to 0.71 with the addition of the patients' body composition measurements from their CT scans. This shows one way in which artificial intelligence can automate the process of measuring body composition in imaging studies and create added value for CT scans performed in routine clinical care.
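The kind of comparison just described, a survival model built on MELD alone versus MELD plus body composition features, compared by the concordance (C) statistic, can be sketched as follows. This is a hypothetical example using the lifelines library on simulated data; the variables, coefficients, and numbers are assumptions, not the study's data or code.

```python
# Hypothetical sketch: compare the C statistic of a Cox model with MELD alone
# versus MELD plus body composition features. Data are simulated.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(2)
n = 500
df = pd.DataFrame({
    "meld": rng.normal(14, 5, n),
    "muscle_area": rng.normal(45, 10, n),     # e.g., total abdominal muscle area
    "visceral_fat": rng.normal(150, 50, n),
})
risk = 0.08 * df["meld"] - 0.05 * df["muscle_area"]
df["time"] = rng.exponential(np.exp(-risk) * 24)   # simulated months of follow-up
df["event"] = rng.binomial(1, 0.6, n)              # 1 = death observed

meld_only = CoxPHFitter().fit(df[["meld", "time", "event"]], "time", "event")
combined = CoxPHFitter().fit(df, "time", "event")

print("C statistic, MELD alone:              ", round(meld_only.concordance_index_, 3))
print("C statistic, MELD + body composition: ", round(combined.concordance_index_, 3))
```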
Another area of great interest for the use of deep learning in patients with chronic liver disease is the diagnosis and prognosis of HCC. Here, I will highlight three abstracts which are presentations at this AASLD meeting. The first abstract, by Qiu et al., is entitled High Diagnostic Performance of a Deep Learning Artificial Intelligence Model in Accurately Diagnosing Hepatocellular Carcinoma on Computed Tomography, from the University of Hong Kong. Here, the authors utilized archived triphasic liver CT scans and clinical information, with manually segmented and labeled lesions as ground truths. They compared the CNN models to LI-RADS classification in 1,288 scans with 2,551 liver lesions. They reported a high diagnostic accuracy of 97.4% for HCC compared to non-HCC; LI-RADS scoring in the same cohort had a diagnostic accuracy of 86.2%. This study shows the potential of using deep learning methods to improve the diagnostic accuracy of HCC on CT scans. The next abstract, entitled Automatic Hepatocellular Carcinoma Detection in Non-Contrast and Venous-Phase Computed Tomography in Cirrhotic Patients, a three-dimensional deep learning-based approach, is by Cheng et al. at Chang Gung Memorial Hospital in Linkou, Taiwan. Here, 955 CT scans with biopsy-proven HCC were used. They reported 1,112 lesions in these CTs and split the set 1,005 over 1,107 for development and testing. They created what appeared to be a 3D bounding box around the entire CT. As controls, 61 normal CTs from trauma patients were used. In examining the AUROC for predicting HCC, the prediction model performed better when using a combination of features from the non-contrast and venous phases rather than non-contrast alone, which is expected given the radiological findings of HCC. The third abstract, which will be presented at this meeting, is Deep Learning Radiomics for the Preoperative Prediction of Microvascular Invasion in Hepatocellular Carcinoma, a multicenter and prospective validation study from Wei et al. from multiple centers in China. 444 HCC patients from three tertiary hospitals in China who underwent surgical resection were used; 329 were retrospective studies used for model training and 115 were prospectively obtained. The imaging modality studied was MRI, with five phases selected. Manually drawn regions of interest were used, and radiomics features were derived. They found that the arterial phase had the best predictive power for microvascular invasion, with an AUROC of 0.865 for the training and 0.78 for the validation. The combination of all five phases' radiomic features improved the signature, which performed quite well with an AUROC of 0.93 in the training and 0.802 in the test set. They noted that in a small number of cases where they had matched histopathology, there appeared to be some geographic evidence that the algorithm could pick up areas of microvascular invasion. While preliminary, these are very exciting studies which point to the promise of deep learning in helping us predict prognosis in HCC. While the technology is very exciting, we should also note a word of caution. A great example often cited is an algorithm to predict whether a picture represents a wolf or a husky. If I were to show you the results of this prediction model, you might think that I'm doing pretty well: the model correctly differentiated a wolf from a husky in 90% of the cases, being wrong in only one of 10 cases. However, what if I were to tell you that the feature the algorithm is picking up has nothing to do with the facial features of the husky itself? Rather, the algorithm classifies huskies and wolves purely based on the presence of snow. So, if there is snow, it calls the animal a wolf; if there is no snow in the background, it calls it a husky. You can understand that when you have a method which consists of a black box, it is critical to assess the model.
It is critical to have some understanding of how the predictions are made so that we can be sure to have a robust model which classifies images based on rational features. In summary, deep learning models can be used to analyze medical imaging data to provide incremental clinical information. Promising applications include automation of measurements, such as body composition, and other segmentation tasks, which will allow for implementation into the electronic medical record. Utilization of deep learning to improve HCC diagnosis and prognosis is a promising emerging trend. For the data that I presented from my lab, I would like to thank the Morphomics Analysis Group faculty and staff, a collaborative of investigators from multiple fields including medicine, surgery, radiology, engineering, and data science. I would also like to acknowledge our funding sources. Thank you for your attention.

Good morning and welcome to the Liver Meeting DX 2020. I'm very excited to be with you here this morning for the Emerging Trends Symposium, and I look forward to a very fun, engaging, and productive liver meeting. First, I'd like to thank my co-chair, Dr. Grace Su, as well as the other outstanding speakers, who really gave us insight into how AI might be leveraged to help us care for our patients with liver disease. What I'd like to do now is take a little bit of a step back and think about the implementation of AI in a healthcare system: not only its tremendous promise, but the challenges that we need to be aware of if we want to achieve true success in implementation. We are all aware that the U.S. healthcare system is broken; among developed countries, we rank the lowest in terms of quality, yet our per capita spend is the highest. And that projected per capita spend is only going to increase as demand for services increases. This is driven largely by our aging population, our increasingly unhealthy population fueled by the obesity epidemic, general medical inflation, and increasing medical technology. But unfortunately, the funding is limited. Funds available for healthcare are declining because Medicare is running out of money. There are fewer taxpayers, in line with this aging population. State budgets for Medicaid are getting compressed, and the burden of that cost is being shifted to the hospitals and the healthcare systems. This compensatory cost shift to commercial payers is unsustainable, as employers are dropping coverage or trying to develop novel ways to manage their employees internally. And therefore, deficit and debt spending for healthcare continue to increase. So what are health systems like Mount Sinai doing in response? They are taking risk and they are investing in value-based care. The Mount Sinai Health System comprises the Icahn School of Medicine, a flagship academic hospital plus six community hospitals, more than 300 community care locations throughout the New York City metro area, more than 6,000 physicians on medical staff, and clinical affiliations that further our geographic reach. So with a population that is so complex, diverse, and spread across such a large geography, how do we manage it? We've launched a clinically integrated network of hospitals and physicians, and we are shifting physician compensation from an RVU-based model to a more outcomes- and quality-based model.
And perhaps most relevant to our conversation today, we are investing substantially in IT infrastructure to enable our care teams to manage this complex population and to standardize and improve care processes for chronic illness and specialty care. At the present time, we have value-based contracts with all commercial health plans, and we have some full risk-based contracts for Medicaid and Medicare lives. We have over 400,000 lives in some form of risk-based contract, so it's absolutely critical for our success to decrease costs and improve quality. How do we tangibly shift to value, decreasing costs and improving quality? We focus on the reduction of unnecessary testing and resource utilization, the elimination of redundancy, early identification of those most at risk for utilization, intervention for those most likely to respond to therapy, and enhancing personnel performance and efficiency. In terms of quality, we focus on care standardization, variance reduction, and outlier detection, and we're using connected devices and remote patient monitoring. So how can AI help us? From a health system perspective, we care for a large, complex population, and therefore risk stratification and timely identification of impactable factors are paramount to our success. If you take this large, complex population, the ability to identify those at risk for disease development, and then, within each disease, to recognize that there is variable progression, a complex interplay between different diseases, variable impactability of interventions, and variable impact of social determinants of health, all of these things will help us enhance, extend, and expand human capabilities, allowing us to deliver the type of care patients need at the time and place they need it. And so what are the promising applications of AI in healthcare systems? As Dr. Su and Dr. Calderaro shared, rapid evaluation of both radiologic and pathologic images could augment a physician's work. Dr. Schattenberg shared how we can target patients most at risk for adverse outcomes by leveraging the EHR. Now, we can also bring in claims. We receive both medical and pharmacy claims, and if you layer claims on top of the information you have in the electronic health record, you're getting information obtained outside the structures of your four walls, allowing you to refine your risk stratification even further. With the use of natural language processing to capture unstructured data elements, we can reduce physician burnout, given the over-dependence of machine learning models on structured documentation fields. In terms of improved quality, we want to deliver better, smarter, informed clinical decision support embedded in clinical workflows, and not just clinical workflows for the physicians. Imagine that the call center has information about patients who are most likely to miss their appointment. They can receive a personalized call the day before, not a robocall, to address any issues that that patient may have in regard to transportation, et cetera, to make sure that they don't miss their appointment. The registrar may know that the patient's phone number is inaccurate or changes very rapidly because they're using temporary phones, and they can address that with the patient at the point of care. From a physician perspective, it will allow them to adhere more to evidence-based guidelines, and we'll be able to identify those who are not adhering to evidence-based guidelines.
But let us take a pause, because without recognizing the limitations and pitfalls, we could go down the wrong path. There are a number of challenges with applications of AI in health care, and they can be broadly broken down into four categories: model integrity, model applicability, model transparency, and model security. In terms of model integrity, first, we must be aware of the potential for machine learning algorithms not only to introduce bias, but to propagate existing systemic bias. In terms of introducing bias, we can think about social bias, where an inequity in care delivery systematically leads to suboptimal outcomes for one group, or statistical bias, where the algorithm produces a result that differs from the true underlying estimate, perhaps due to inadequate sampling, heterogeneity of effects, or measurement error in predictor variables. As we know, most clinical data are generated as a consequence of human decisions, so implicit bias in current care can be reinforced. So what are some examples of these types of biases, and what can we do proactively to avert falling into these traps? Consider the low sensitivity of risk scores in minority subgroups; in this example it is the Framingham risk score, but it could be the MELD score in elderly women, or non-invasive markers of liver assessment. The important point is that this could introduce statistical bias, where the training sample behind the algorithm differs significantly from the populations of interest. How can we overcome this bias? We could oversample minority subgroups in our training sample, or tailor predictions or scores for specific subgroups. What about delayed diagnosis of lung cancer or liver cancer in patients with low socioeconomic status who lack transportation access to the clinic? This introduction of social bias unearths the underlying disparities in diagnosis and points to the inherent social determinants of health that we need to be aware of. Here we can create flags for model uncertainty in predictions for certain high-risk groups. What if we are missing data in the electronic health record because patients haven't followed up? This missing data results in both statistical and social bias, so perhaps we need to base our predictions on more upstream data at presentation of illness rather than on subsequent follow-up data. The ultimate goal is to achieve algorithmic fairness. How do we do that? We have to recognize that there is bias in the training process: human decisions feed back into the training model, so we need to deal with proxies, identify features that could be correlated with protected or missing information, and retrain the models; sometimes collecting new data may be better than trying to clean up messy historical data. So how can we reduce bias in AI, and how can we actually leverage AI to unearth implicit bias in our clinical decision making and bring it to the attention of providers in real time? For example, if a particular provider in certain circumstances is always deviating from the AI-based recommendation, we can surface that and allow them to understand where they may have some implicit bias that is causing them not to adhere to the recommendations. Greater dependence on unbiased data sources is always important, and therefore we may need to focus more on upstream events prior to clinical decision making, like continuous vital signs and initial triage data.
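As a concrete illustration of the oversampling and uncertainty-flagging ideas just discussed, here is a minimal sketch. The cohort file, subgroup labels, feature names, and thresholds are all assumptions for illustration, not a validated approach described in the talk.

# Minimal sketch only: oversample an underrepresented subgroup to mitigate
# statistical bias, then flag borderline predictions for manual review.
# File name, column names, and thresholds are hypothetical.
import pandas as pd
from sklearn.utils import resample
from sklearn.linear_model import LogisticRegression

train = pd.read_csv("training_cohort.csv")  # assumed extract with a 'subgroup' column
majority = train[train["subgroup"] == "majority"]
minority = train[train["subgroup"] == "minority"]

# Upsample the minority subgroup so it is not swamped during training.
minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=0)
balanced = pd.concat([majority, minority_up])

features = ["age", "bilirubin", "creatinine", "inr"]
model = LogisticRegression(max_iter=1000)
model.fit(balanced[features], balanced["adverse_outcome"])

# Flag predictions near the decision boundary so high-risk groups get
# human review instead of an automatic decision.
probs = model.predict_proba(train[features])[:, 1]
train["needs_review"] = (probs > 0.4) & (probs < 0.6)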
Wolff and colleagues have published a tool, PROBAST, a set of questions that allows one to assess the risk of bias in prediction models and therefore to select appropriate training sets for algorithm development. We must continuously track to ensure outputs are not reinforcing existing social bias, and, as I have already mentioned, oversampling underrepresented populations is critical to mitigate statistical bias. Now what about model applicability? Of course we want to make sure that a model is applicable to diverse populations, and enriching the training data with minority populations will allow that to happen. But what about applicability across different radiology or EHR systems? If a model is built on one radiology system, can it be applied to another radiology system? In terms of natural language processing, we know that there are different ways clinicians might say the same thing, so if our algorithm is too narrow, we may miss disease, and if it is too broad, the model may be full of noise. And of course new and rare diseases may be missed. What about model transparency? Transparency to the patient, to the provider or the health system, the employer, the insurer. Inherently, people distrust an algorithm when it deviates from their expectations and appears to be disadvantageous to them. Providing detailed information can sometimes help to align human expectations with eventual outcomes. Too much transparency, however, may provide more opportunities for objections empowered by confirmation bias, while those who are not disadvantaged by a decision have little incentive to delve into the details of how the decision was made. This becomes particularly relevant when you talk about proprietary algorithms that are black boxes: there is a clear risk of harm when they are applied to clinical practice without oversight. There could be patterns that don't make sense and that lead clinicians to distrust the model, and you can imagine that if a patient is told to follow a certain care plan and, when they ask why, you don't know how that recommendation came to be, there may be less patient engagement with the care plan, again coming back to distrust. No matter what model we choose, it must meet accepted standards for clinical benefit, just as clinical therapeutics and predictive biomarkers have to. There are consequences for false positives and false negatives, and therefore we must develop hazard prevention protocols and dual safety mechanisms for crucial clinical processes such as medical diagnoses and treatment decisions; ultimately, the hope is that doctors make the decision, informed by data-generated insights. Now, what about model security? We already spend a lot of time thinking about the privacy of healthcare data. Now there is broad sharing of data across multiple stakeholders, including insurers and employers. In fact, insurers are working very hard to get access to all of our electronic medical record systems. They already have claims, both pharmacy and medical. You can imagine that they are running a number of algorithms to try to identify the patients at highest risk of utilization, and you can also imagine that this could cause premiums to go up for patients even before they have had any kind of disease or outcome. We must protect that information and be aware of this as we move forward in this area. Of course, there are going to be incentives for adversarial attacks.
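One concrete way to "continuously track" whether a deployed model is reinforcing bias, as mentioned above, is a recurring audit of performance by subgroup. The sketch below is illustrative only, with hypothetical file and column names and an arbitrary review threshold.

# Minimal sketch only: audit model sensitivity (recall) by subgroup to detect
# drift toward biased outputs. File name, columns, and threshold are hypothetical.
import pandas as pd
from sklearn.metrics import recall_score

scored = pd.read_csv("scored_cohort.csv")  # assumed: true_outcome, predicted_positive, subgroup
for subgroup, grp in scored.groupby("subgroup"):
    sensitivity = recall_score(grp["true_outcome"], grp["predicted_positive"])
    print(f"{subgroup}: sensitivity = {sensitivity:.2f}")
    if sensitivity < 0.70:  # assumed threshold for triggering a model review
        print(f"  WARNING: review model performance for subgroup {subgroup}")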
I hope I have convinced you, or that we all can now appreciate, that there is tremendous potential for the use of AI in healthcare, but recognizing bias and limitations proactively will improve applicability, and titrating transparency and security will be critical. From Dr. Bopp, we learned that AI is different things to different people. The applications are tremendous, and the interest in AI is driven largely by the increased adoption of electronic health records, the diverse health data that we get from omics to social media, the availability of these large data sets, and the drive for personalized medicine. Because of the heterogeneity and complexity of liver disease, hepatology is a perfect arena for AI applications. Dr. Schattenberg shared how the EHR can help with disease detection and risk stratification at the point of care. Dr. Calderaro shared how AI can tremendously augment the value of liver pathologists. Dr. Su shared how powerful radiology can be, not only in disease detection but also in risk stratification for patient outcomes. I come back to the critical capability for healthcare systems: risk stratification. Whether you are a patient with liver disease or other chronic conditions, AI has the ability to enhance, extend, and expand human capabilities. All of us will now be in the live chat so that you can ask questions of the speakers, and given the limited time, all of us are of course also available via email and will be happy to answer any questions now or later. With that, I thank you so much for coming. I really hope you enjoy the rest of the Liver Meeting. Thanks very much.
Video Summary
The symposium on artificial intelligence (AI) in hepatology showcased the increasing role of AI in healthcare, specifically in managing chronic liver disease. Presenters highlighted AI's potential applications in liver disease care, such as imaging and pathology, to improve diagnostic accuracy and streamline workflows. Speakers demonstrated how AI can predict clinical outcomes, analyze histological images, and advance personalized medicine in liver disease management, emphasizing AI's transformative impact on clinical practice, decision-making, and patient care. The video also discusses AI's potential in pathology and healthcare, focusing on liver disease and hepatocellular carcinoma. It explains the applications of AI in medical imaging analysis, such as CT scans, and the importance of mitigating bias in AI algorithms to ensure fairness and accuracy. It also covers challenges and considerations in implementing AI in healthcare systems and the potential benefits of AI in automating measurements, identifying high-risk patients, and improving clinical decision support. The speakers stressed the importance of leveraging AI to enhance healthcare delivery and patient care while acknowledging the complexities and limitations of AI in the medical field.
Keywords
artificial intelligence
AI
hepatology
healthcare
chronic liver disease
diagnostic accuracy
imaging
pathology
clinical outcomes
personalized medicine
medical imaging analysis
bias in AI algorithms