2021 Webinar: Challenges in NASH Clinical Trials
Video Transcription
Hello, everyone. I'm very happy to welcome you to this AASLD webinar, organized by Drs. Mazen Noureddin and Naim Alkhouri, which I'm very honored to chair today. We have a very challenging program on drug development in NASH, tackling some of the hottest issues in the field. We're very happy to have three experienced panelists, each of whom has helped move this field forward, both individually and through their collaborative work, and who will share their thoughts in a formal presentation and then in a question-and-answer session. The first talk will be given by Dr. Naim Alkhouri, who is currently director of the Fatty Liver Program at Arizona Liver Health. His contribution today will be to explain the challenges of using histological assessment for clinical trial endpoints in NASH trials. So Naim, we're looking forward to your presentation. Thank you, Dr. Ratziu and AASLD, for giving me the opportunity to present today on the challenges of histological assessment in NASH clinical trials. These are my disclosures for this presentation. So what do we expect from liver biopsy-based histologic assessment in NASH trials? Number one, we want a liver biopsy that can accurately and reproducibly identify patients with fibrotic NASH, because this is the target population in need of pharmacologic treatment. We also need the biopsy to accurately identify a response to treatment, which is NASH resolution and fibrosis improvement, typically by one stage. And we want the biopsy not to kill anyone in the process. I hope that through my presentation I can convince you that liver biopsy and histologic assessment fail at each of these tasks. So let's start by identifying the patients with fibrotic NASH that we want to enroll in clinical trials.
Now, we all know the NAFLD spectrum, and we know that we are looking for patients with fibrotic NASH. This is defined as having enough disease activity, as determined by the NAFLD activity score (NAS), so we want a NAS of 4 or higher, plus significant fibrosis, defined as F2 or F3. To do this on biopsy, we look at the steatosis, inflammation, and ballooning grades, and then at the stage of liver fibrosis. The problem with today's histologic assessment is that it is semi-quantitative. We grade steatosis and lobular inflammation from 0 to 3 and ballooning from 0 to 2, and we add these numbers up to get the NAFLD activity score, which ranges from 0 to 8. Fibrosis is then staged from 0 to 4. But this semi-quantitative grading creates some issues and does not reflect the continuous nature of the disease. For example, you can have two patients with 35% and 30% steatosis on their biopsies. The first will be graded as grade 2 steatosis and the second as grade 1, despite only a 5% difference in the amount of steatosis. Then you can have two other patients with 65% and 45%, a 20% difference, and both will have grade 2 steatosis. Here is an example we dealt with, from two pathologists looking at the same slides to assess NASH histologic severity. Pathologist 1 gave this biopsy a NAFLD activity score of 5 (2 for steatosis, 1 for ballooning, and 2 for lobular inflammation) and a fibrosis stage of 2. Pathologist 2 gave it a NAFLD activity score of 4, and here you can see that ballooning was scored 0. Some clinical trials require at least one point in each component of the NAFLD activity score, so a ballooning score of 0 excludes this patient from the trial, despite significant inflammation and stage 2 fibrosis.
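To make the binning problem concrete, here is a minimal Python sketch. It assumes NASH CRN-style steatosis cut points (under 5%, 5 to 33%, over 33 to 66%, over 66%) and a hypothetical entry rule of NAS of 4 or more, F2 to F3, and at least one point in each NAS component; the class and function names are illustrative, not taken from any trial protocol. Pathologist 2's split of steatosis 2 and inflammation 2 is also an assumption, since only the total of 4 and the ballooning score of 0 were stated.

```python
from dataclasses import dataclass


def steatosis_grade(percent_steatotic: float) -> int:
    """Bin % steatotic hepatocytes into grade 0-3 (NASH CRN-style cut points, assumed)."""
    if percent_steatotic < 5:
        return 0
    if percent_steatotic <= 33:
        return 1
    if percent_steatotic <= 66:
        return 2
    return 3


@dataclass
class Histology:
    steatosis: int     # grade 0-3
    inflammation: int  # lobular inflammation, grade 0-3
    ballooning: int    # grade 0-2
    fibrosis: int      # stage 0-4

    @property
    def nas(self) -> int:
        # NAFLD activity score: sum of the three activity components (0-8)
        return self.steatosis + self.inflammation + self.ballooning


def meets_entry(h: Histology, require_each_component: bool = True) -> bool:
    """Hypothetical entry rule: NAS >= 4 and F2-F3, optionally >= 1 point per component."""
    if h.nas < 4 or h.fibrosis not in (2, 3):
        return False
    if require_each_component:
        return min(h.steatosis, h.inflammation, h.ballooning) >= 1
    return True


# 35% vs 30% land in different grades; 65% vs 45% land in the same grade
print([steatosis_grade(p) for p in (35, 30, 65, 45)])  # [2, 1, 2, 2]

reader_1 = Histology(steatosis=2, inflammation=2, ballooning=1, fibrosis=2)
reader_2 = Histology(steatosis=2, inflammation=2, ballooning=0, fibrosis=2)  # split assumed
print(reader_1.nas, meets_entry(reader_1))  # 5 True
print(reader_2.nas, meets_entry(reader_2))  # 4 False
```

With `require_each_component=False`, the second read would qualify, which shows how a single ballooning call decides eligibility.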
The slide on the left shows ballooned hepatocytes and how they can hide in plain sight. It is not difficult for pathologists to miss a few ballooned hepatocytes amid the steatotic hepatocytes and lobular inflammation of NASH. This creates a lot of issues for us and disqualifies several patients from participating in clinical trials. To look at this more systematically, here is a study conducted by Dr. Samer Gawrieh from Indiana University, in which two pathologists each read 65 biopsies from patients with NAFLD/NASH, and each pathologist then reread the slides, for a total of 260 readings. Agreement between the same pathologist's readings of the same slides at two different time points is what we call intra-observer agreement. You can see on the left that there was reasonable agreement for steatosis and fibrosis, in the range of 60% to 70%, but intra-observer agreement was clearly lower for ballooning, inflammation, and the diagnosis of NASH. They then compared the two pathologists' readings to evaluate inter-observer agreement, and a similar picture emerged: slightly lower agreement overall, still better for steatosis and fibrosis, and worse for lobular inflammation and ballooning. This slide summarizes the studies that included more than one liver pathologist interpreting NAFLD histology. You can see the number of observers (pathologists) who read the slides and the number of cases. Again, the picture is similar: there is agreement on macrovesicular steatosis and fibrosis, and less agreement on ballooning and the diagnosis of NASH. So inter-observer agreement, as assessed by the kappa coefficient, is clearly low. This is not what we want when we rely on pathologists to tell us who has significant disease and who should be included in a clinical trial.
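The kappa coefficient discounts the agreement two readers would reach by chance alone. A minimal sketch of the unweighted (Cohen's) version for two readers follows; the ten ballooning scores are invented for illustration and are not data from the studies described.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(reader_a: Sequence[Hashable], reader_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two readers scoring the same cases."""
    if len(reader_a) != len(reader_b) or len(reader_a) == 0:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(reader_a)
    # Observed agreement: fraction of cases given identical scores
    p_observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement, from each reader's marginal category frequencies
    counts_a, counts_b = Counter(reader_a), Counter(reader_b)
    categories = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both readers use one identical category")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ballooning scores (0-2) from two readers on ten biopsies
reader_a = [0, 1, 1, 2, 0, 1, 2, 1, 0, 1]
reader_b = [1, 1, 0, 2, 0, 1, 1, 1, 0, 2]
print(round(cohens_kappa(reader_a, reader_b), 2))  # 0.35
```

Here raw agreement is 60%, but because both readers mostly use score 1, chance agreement is already 38%, so kappa falls to about 0.35, in the same low range as the kappa values reported for these readings.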
In this slide, I'm showing data from the EMMINENCE program, a clinical trial assessing an insulin sensitizer as a treatment for NASH. In this study, 339 patients had biopsies at baseline to qualify for the trial and biopsies at the end of treatment to assess response. Pathologist A was the original pathologist, who read all the slides and qualified patients for entry into the trial. At the end of the trial, we mixed the baseline slides with the end-of-treatment slides and had the same pathologist, pathologist A, reread the entry slides. To our surprise, the pathologist systematically scored the biopsies as having a lower NAFLD activity score, that is, less NASH, than when he read them at baseline to qualify patients for the trial. So the same pathologist looking at the same slide had a low intra-reader kappa coefficient. Because of this, we wanted to look at this more systematically, so we had two other pathologists, B and C, reread the slides. On the reread, the study's histologic eligibility criteria were met in 77% of cases by pathologist A, so the same pathologist who qualified the patients would have disqualified 23% of them. Pathologist B agreed with pathologist A in 69% of cases, and pathologist C in 77%. In fact, if all three pathologists had to agree on the entry criteria, they would have agreed in only 53% of cases, leaving about 47% of patients not meeting the entry criteria. So here you can see the dilemma we face in clinical trials: pathologists qualify patients at entry, and then they read the slides at the end and say those patients didn't qualify. This creates a lot of issues for us.
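The drop from single-reader rates to the all-three figure follows from requiring agreement: the consensus rate can never exceed the lowest single-reader rate, and it falls well below it when readers disagree on different patients. The sketch below uses invented eligibility calls for ten patients (not EMMINENCE data) and hypothetical function names; each reader qualifies 70% to 80% of patients, but only 50% are qualified by all three.

```python
from typing import Sequence


def eligible_fraction(reads: Sequence[bool]) -> float:
    """Fraction of patients one reader judges eligible."""
    return sum(reads) / len(reads)


def all_readers_eligible_fraction(*readers: Sequence[bool]) -> float:
    """Fraction of patients that every reader judges eligible."""
    n = len(readers[0])
    if any(len(r) != n for r in readers):
        raise ValueError("every reader must score the same patients")
    return sum(all(flags) for flags in zip(*readers)) / n


# Hypothetical eligibility calls on ten patients (True = meets histologic criteria)
reader_a = [True, True, True, True, True, True, True, True, False, False]
reader_b = [True, True, True, True, True, True, False, False, True, False]
reader_c = [True, True, True, True, True, False, True, True, True, False]

print([eligible_fraction(r) for r in (reader_a, reader_b, reader_c)])  # [0.8, 0.7, 0.8]
print(all_readers_eligible_fraction(reader_a, reader_b, reader_c))     # 0.5
```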
So that was the first part, showing you that liver biopsy is not as accurate as we want for qualifying patients to enter a trial and for determining disease severity at baseline. But what about using liver biopsy and histologic assessment to identify response to treatment, what we call NASH resolution and fibrosis regression? These are the clinical trial endpoints that qualify for FDA drug approval. The first is NASH resolution without worsening of fibrosis, which mainly looks at disease activity. The second is fibrosis improvement by one stage without worsening of NASH, which looks at the fibrosis stage. A drug can meet either or both of these endpoints. So this is what we want the biopsy to identify accurately for us: NASH resolution and fibrosis improvement. Here again are data from the EMMINENCE program, looking at the accuracy of liver biopsy in identifying NASH resolution. You can see the comparisons between pathologists A and B, A and C, and B and C, and the table shows low agreement between pathologists: the unweighted kappa coefficient ranged from 0.32 to 0.49, which is relatively low. Again, this concerns NASH resolution as a clinical trial endpoint. A similar picture emerged for fibrosis regression. Before this study, I thought pathologists would agree more on fibrosis regression, but to our surprise, the unweighted kappa was also low, ranging from 0.29 to 0.41 for fibrosis improvement by one stage. So even when it comes to identifying response to treatment, pathologists don't always agree on what constitutes a response. The key conclusion from the EMMINENCE program is that liver biopsy interpretation is not accurate either for trial entry or for the endpoints of NASH resolution and fibrosis improvement.
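The two regulatory endpoints can be written as simple rules over paired baseline and end-of-treatment reads, which also shows why one disputed ballooning or stage call flips the result. This is a sketch, not a regulatory specification: it assumes NASH resolution means ballooning 0 and lobular inflammation 0 to 1 with any steatosis (as described in FDA draft guidance), it approximates "no worsening of NASH" as no increase in NAS, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Biopsy:
    steatosis: int     # grade 0-3
    inflammation: int  # lobular inflammation, grade 0-3
    ballooning: int    # grade 0-2
    fibrosis: int      # stage 0-4

    @property
    def nas(self) -> int:
        return self.steatosis + self.inflammation + self.ballooning


def nash_resolved(b: Biopsy) -> bool:
    # Assumed definition: ballooning 0 and lobular inflammation 0-1, any steatosis
    return b.ballooning == 0 and b.inflammation <= 1


def resolution_endpoint(baseline: Biopsy, end: Biopsy) -> bool:
    """NASH resolution without worsening of fibrosis stage."""
    return nash_resolved(end) and end.fibrosis <= baseline.fibrosis


def fibrosis_endpoint(baseline: Biopsy, end: Biopsy) -> bool:
    """Fibrosis improvement by >= 1 stage without worsening of NASH.

    "No worsening of NASH" is approximated here as no increase in NAS (an assumption).
    """
    return end.fibrosis <= baseline.fibrosis - 1 and end.nas <= baseline.nas


baseline = Biopsy(steatosis=2, inflammation=2, ballooning=1, fibrosis=2)
after_a = Biopsy(steatosis=1, inflammation=1, ballooning=0, fibrosis=2)
after_b = Biopsy(steatosis=2, inflammation=2, ballooning=1, fibrosis=1)
for end in (after_a, after_b):
    print(resolution_endpoint(baseline, end), fibrosis_endpoint(baseline, end))
```

The first follow-up read meets only the resolution endpoint and the second meets only the fibrosis endpoint; a reader who scored `after_a` with one ballooned cell would erase the response.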
This reliance on semi-quantitative histology can lead effective biomarkers and treatments to be deemed ineffective simply because of the variability of biopsy reading. And we have no evidence that one way of reading biopsies is better than another: whether we increase the number of readers, use sequential reading, or have pathologists read the slides together, we still have no data that any of this improves agreement. Can we improve histologic assessment? Maybe artificial intelligence and machine learning can rescue us. The idea is that you assemble a scientific team of computer scientists, pathologists, and hepatologists, digitize the NAFLD liver biopsies, and have the pathologists annotate the major findings, from steatosis to ballooning, inflammation, and fibrosis. You then develop your algorithm or model and see how it performs against the data labeled by the pathologists. Eventually, you take it to external validation, where the algorithm reads slides that are not labeled. This is what PathAI achieved, as shown in this slide. These are data from the STELLAR-4 program, which included 644 patients with biopsy-proven disease. They had 75 pathologists read the slides and annotate the important findings, then developed the algorithm and compared its performance to that of the pathologists. And this is what happens when you unleash AI on a liver biopsy. Panel A shows the H&E staining for a NASH patient, and panel B shows what you get with the algorithm, where each histologic component of NASH, from steatosis to inflammation to ballooning to other findings, is colored differently. You can then quantify each of these findings on a continuous scale.
These are data from HistoIndex, which uses a technology called second harmonic generation to provide better quantification of liver fibrosis, with the idea that it will discriminate better between early stages of fibrosis and, hopefully, decrease inter-observer variability. So these are promising technologies for decreasing intra- and inter-reader variability. In summary, the current state of histologic assessment is manual and semi-quantitative, with lots of variability, and we have a limited number of experienced pathologists to help us. Hopefully, in the future we will have an automated way to fully quantify all the findings, available on a large scale. The main issue with all of this remains that we still need to do a liver biopsy, which is invasive. This is from a recent meta-analysis of studies that reported complications from liver biopsies between 2010 and 2020. The analysis included 64,000 patients, and we showed that the incidence of major complications was 2.4%, mortality was 1 in 10,000, which is still significant for a disease that affects 25% of the adult population, and the hospitalization rate was 0.65%. Complications ranged from hematomas to hemobilia to pneumothorax, the usual complications of liver biopsy. So what happens if we continue to rely on histologic assessment for trial design and, later on, to qualify patients for treatment? The figure on the left shows recent data on the prevalence of NAFLD and NASH in middle-aged U.S. adults: NAFLD affects 38% and NASH 14% of them. We are not going to biopsy everyone with NAFLD, so we may implement an algorithm to find high-risk patients, as shown on the right, using non-invasive tests such as FIB-4 and the ELF score. But even with this, about 19% of individuals still need further assessment.
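The projection Dr. Alkhouri gives next (112 million Americans aged 50 and older, about 42 million with NAFLD, 19% referred for biopsy) is straightforward arithmetic on the meta-analysis rates. A minimal sketch, assuming the pooled rates apply uniformly to everyone biopsied and reading the hospitalization rate as 0.65%, the reading that reproduces the roughly 52,000 figure; the names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class BiopsyRisk:
    major_complication: float = 0.024  # 2.4% major complications
    hospitalization: float = 0.0065    # 0.65% hospitalization
    death: float = 1 / 10_000          # 1 death per 10,000 biopsies


def projected_harms(nafld_cases: float, referral_fraction: float,
                    risk: BiopsyRisk = BiopsyRisk()) -> Dict[str, float]:
    """Expected harms if every referred NAFLD patient is biopsied once (uniform rates assumed)."""
    biopsies = nafld_cases * referral_fraction
    return {
        "biopsies": biopsies,
        "major_complications": biopsies * risk.major_complication,
        "hospitalizations": biopsies * risk.hospitalization,
        "deaths": biopsies * risk.death,
    }


# 38% of 112 million adults aged 50+ is about 42.6 million; the talk rounds to 42 million
harms = projected_harms(nafld_cases=42e6, referral_fraction=0.19)
for name, value in harms.items():
    print(f"{name}: {value:,.0f}")
```

This gives about 7.98 million biopsies, roughly 51,900 hospitalizations and 798 deaths, matching the rounded figures in the talk; the same rates also imply about 190,000 major complications.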
If we biopsy those 19%: there are 112 million Americans aged 50 and older, so middle-aged, and 42 million of them will potentially have NAFLD. If the 19% who need referral to a specialist are biopsied, you end up with 52,000 hospitalizations and 800 deaths just for the initial staging of these patients. Obviously, this is not what we want on a large scale. So I'm going to conclude by saying it's time to say goodbye to liver biopsy. I know this will be hard for some of us to accept, but I think I have shown you that liver biopsy has issues with entry criteria for clinical trials, major issues with inter- and intra-observer variability, no agreement on what constitutes a response to treatment, and an invasive nature. With this, I want to thank you for your attention and hand off to Dr. Abdelmalek and Dr. Noureddin to tell you what we can do to replace liver biopsy in clinical trials. Thank you very much, Naim, an excellent talk. This raises the challenge of circumventing liver biopsy and the reliance on histological surrogates and moving toward hard clinical outcomes for NASH trials, which brings us to the subject of the second presentation: how can we implement hard clinical outcomes in NASH clinical trials, and can we do it now? This presentation will be given by Dr. Manal Abdelmalek, Professor of Medicine and Director of the NAFLD Clinical Research Program at Duke University. So Manal, we're looking forward to your presentation as well. Thank you, Vlad, and I'd like to thank the program organizers for the opportunity to speak, and AASLD for this invitation. Here are my disclosures. So, hard outcomes in NASH clinical trials: clearly, the goal is to treat patients effectively so that we achieve outcomes, which are also our regulatory outcomes, that optimize how patients feel, function, and survive.
These outcomes have been defined as resolution of NASH, with the understanding that the presence of NASH is the strongest predictor of fibrosis progression, and that improvement in fibrosis is the strongest predictor of what we ultimately care about, which is morbidity and mortality. The drug development pathway emphasizes clinically meaningful benefit based on how patients feel, and we're challenged here with NASH because, for the majority of patients, it is an asymptomatic disease. Patients can, however, report quality-of-life metrics, which have been studied and could be a reasonable hard outcome. We can also measure how patients function, based on quality-of-life assessments and change from what they perceive as their normal functional ability, and ultimately how patients survive, which is the focus of all-cause mortality and liver-related outcomes such as ascites and variceal bleeding, or even rates of hospitalization and healthcare utilization. We've come to understand that fibrosis, and really only fibrosis (not hepatic steatosis, not steatohepatitis), is the primary predictor not only of liver-related outcomes but of transplant-free survival. These metrics are inversely related: the more fibrosis, the lower the likelihood of survival, and fibrosis regression can alter the natural history of the disease. So there are clear challenges in achieving these hard outcomes in NASH. There are variations in disease progression that are altered by comorbidities such as diabetes or by cofactors such as genetic polymorphisms in PNPLA3. There is heterogeneity in the patient cohorts that we select for clinical trials, which I'll get into shortly.
There is a long natural history of disease progression, sometimes spanning decades, which ultimately leaves us with patients who are fairly advanced in their disease, or even have cirrhosis, if we want to capture a clinically meaningful outcome within a clinical trial. We're forced to select patients with F3 and F4 fibrosis to reduce the time to event so that we can capture these endpoints. So there clearly is a need for surrogates that correlate with risk and can predict a clinically meaningful hard outcome. For all the reasons Naim eloquently presented, liver biopsy is fraught with challenges. Not only is it invasive, but there is sampling variability. This slide shows stage 2 fibrosis, but interestingly, depending on the trajectory of the needle, that same liver could have been read as stage 1 fibrosis, or even as stage 3 fibrosis if the needle crossed large septa. Not to mention the intra- and inter-observer variability in interpretation, as was nicely pointed out already, and the fact that, while we use categorical and ordinal variables to grade and stage this disease, its natural history is not a linear process. Another limitation is that we select patients for our clinical trials based on an outcome of disease: the presence of end-organ injury on liver biopsy. In fact, this is only an outcome driven by multiple different drivers of disease, whether insulin resistance, dyslipidemia, or genetic polymorphisms, or even confounded by social alcohol use or concomitant medications. The liver biopsy shows an outcome, not the disease in and of itself, and so patient heterogeneity poses problems.
It poses problems in that we require very large sample sizes to overcome patient heterogeneity, and with that can come decreased drug effectiveness, the inability to achieve meaningful outcomes, or fewer events in our trials. We then expect the biomarker to be one-size-fits-all, yet in this context biomarkers can perform poorly, and there is even the challenge of non-uniform time-to-event relationships. What we need to move toward is molecular and genomic phenotyping and profiling of our patient cohorts beyond histology alone, so that we can design trials with smaller sample sizes and optimize drug efficiency and effectiveness for an endpoint. We could then enrich our studies not only for patients with particular pathogenic drivers but for the concordant drugs that target those drivers, and achieve clinically meaningful endpoints in a more timely fashion. This would also allow our biomarkers to perform more optimally and give more uniform time-to-event assessments. The challenge is the hierarchy of clinical trial endpoints. In early-phase trials we can use biomarkers, MR-based quantification of fat, or elastography, and at the other extreme, in patients with advanced liver disease, we can achieve meaningful clinical endpoints such as time to transplant, survival metrics, or decompensation. This intermediate stage of NASH, and even non-cirrhotic NASH, poses a challenge because in this category we've defined and anchored outcome measurements on histology. As I showed you, fibrosis is the primary predictor of an outcome, and it has been beautifully pointed out in many studies that we're likely to achieve such outcomes if patients enter trials with cirrhosis or advanced hepatic fibrosis.
But drilling down into the details, a recent presentation by Arun Sanyal at AASLD qualified this a little more. Suppose you select patients who clearly have fibrosis based on surrogate endpoints such as liver stiffness or FIB-4, because it's unethical to biopsy patients with known fibrosis. Look at the event rates in Child A and Child B cirrhosis; they certainly increase in Child C, but overall the proportion of patients with Child A cirrhosis who have a survival outcome in a short interval of two to three years is low, and the same is true for cardiovascular-related outcomes and liver cancer. So is implementing hard outcomes in clinical trials even feasible for us? We have to pause and ask how, and I would challenge the field, and those who do research in it, that maybe the approach we've taken to date is misguiding us in our ability to achieve these outcomes in NASH trials. To date, we have developed surrogates of biopsy for NASH grading and staging, so that the surrogate can render a meaningful outcome. So we're developing surrogates for a surrogate, and as Naim eloquently pointed out, we currently have challenges with the so-called gold-standard surrogate. I believe that non-invasive biomarkers could in fact serve as predictors of an outcome, and I will share data that suggest why. So maybe we need to shift our point of view: while liver biopsy is a surrogate for a clinically meaningful outcome, non-invasive biomarkers can serve as comparable surrogates for those same outcomes, allowing us to move away from, and extend ourselves beyond, liver biopsy alone in achieving these endpoints.
Now, there are data to support that non-invasive biomarkers are as strong predictors of liver outcomes as liver biopsy. In a study published by Paul Angulo many years ago, which looked at biopsy-based predictors of outcome and subsequently applied the FIB-4 score, you can see that FIB-4 correlates with liver histology and predicts not only histology but also liver-related death, time to transplant, and the transplantation event. Likewise, the NAFLD fibrosis score could predict not only histologic severity but also the probability of death and liver transplantation. There are also recently published data from the simtuzumab studies that applied the ELF score not only to fibrosis progression from stage 3 to stage 4, that is, progression to cirrhosis, which is a registration endpoint, but also to decompensating events. The ELF score was able to stratify patients who progressed to cirrhosis and, at a higher cutoff of 11.27, those who ultimately developed a liver-related decompensating event. The same applies to the evolving evidence on transient elastography. A nicely done study by Jérôme Boursier of 360 patients with NAFLD followed for six years suggested that liver stiffness, at baseline and over time, could stratify patients not only for survival but also for liver-related complications. And, as applied in the simtuzumab studies, both the baseline value (the higher the baseline, the higher the risk) and the change from baseline could predict clinically meaningful liver-related outcomes. So while the evidence is evolving, it certainly points toward these biomarkers being concordant with biopsy and with meaningful outcomes. Thanks to the innovative work of Rohit Loomba, we've come to realize that an MR-PDFF threshold of greater than 30% relative reduction from baseline is associated with a two-point improvement in the NAFLD
activity score, NASH resolution, and improvement in the histologic components of NASH, including ballooned hepatocytes, which I must say are the primary predictor of fibrosis progression. There are emerging data that a relative reduction in MR-PDFF of greater than 30% can also be associated with higher odds of fibrosis regression, and that thresholds beyond a 30% reduction confer even higher odds of NASH resolution: as you can see, a 50% relative reduction in MR-PDFF confers a 25-fold increased probability of NASH resolution, compared with 9- to 18-fold. A beautiful study from the Mayo Clinic by Dr. Gidener and Alina Allen's group demonstrated that MR elastography can predict future cirrhosis, decompensation, and death: for each 1 kPa increase in liver stiffness by MRE, non-cirrhotic NAFLD subjects are three times more likely to develop cirrhosis in the future, and for every 1 kPa increase in liver stiffness by MRE, patients with NASH cirrhosis are 32% more likely to decompensate and/or die within five years. As you can see here, the probability increases with liver stiffness, with an almost incremental, dose-dependent increase in the probability of hepatic decompensation as MRE liver stiffness increases. So how do we implement hard outcomes in NASH trials? I really think it's time we embrace a new approach. The old way is traditional parallel study designs, a heterogeneous cohort, and histologic endpoints pre and post.
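The imaging thresholds above are relative changes and per-unit risk ratios. A minimal sketch, assuming the 30% and 50% MRI-PDFF thresholds are applied as "at least" cut-offs, and, for MRE, reading "three times more likely per kPa" and "32% more likely per kPa" as per-kPa ratios under a log-linear model so that they compound over several kPa (an illustrative assumption, not a result from the Mayo study); the function names are hypothetical.

```python
def relative_reduction(baseline: float, follow_up: float) -> float:
    """Relative decline from baseline (0.30 means a 30% reduction)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - follow_up) / baseline


def pdff_response_band(baseline_pct: float, follow_up_pct: float) -> str:
    # Thresholds from the talk; treating them as inclusive (>=) is an assumption
    r = relative_reduction(baseline_pct, follow_up_pct)
    if r >= 0.50:
        return ">=50% relative reduction"
    if r >= 0.30:
        return ">=30% relative reduction"
    return "<30% relative reduction"


def per_kpa_risk_multiplier(ratio_per_kpa: float, delta_kpa: float) -> float:
    """Risk multiplier for a stiffness change, assuming per-kPa ratios compound (log-linear model)."""
    return ratio_per_kpa ** delta_kpa


print(pdff_response_band(20.0, 13.0))  # >=30% relative reduction
print(pdff_response_band(20.0, 10.0))  # >=50% relative reduction
print(per_kpa_risk_multiplier(3.0, 2))  # 9.0
print(round(per_kpa_risk_multiplier(1.32, 2), 4))  # 1.7424
```

Note that a PDFF drop from 20% to 13% is a 35% relative reduction even though the absolute change is only 7 percentage points; the thresholds in the talk are relative.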
There is the increased risk associated with this invasive histologic endpoint, not to mention the increased cost, and because of that we've been bound to smaller sample sizes. Keep in mind that it takes a very altruistic patient to volunteer for a clinical study with an invasive procedure before and after, and at times even at incremental time points down the road, so we introduce a recruitment bias, and we can certainly introduce interpretation bias with pathologic assessments. And we're bound to shorter-duration trials. Maybe we should look at this differently: adaptive, longer-term trials, maybe even platform-type designs, should be considered. We could molecularly and genomically phenotype our patients to enrich for an outcome and a phenotype, and use non-invasive endpoints both to risk-stratify patients at baseline and to follow them over longer periods with change from baseline. That way we can decrease risk and potentially decrease cost, and by using non-invasive measures we could enroll a cohort that is representative of the population at large and increase sample size. We can minimize bias, because enrollment in our studies doesn't require the altruism of undergoing serial liver biopsies, and allow for longer-term endpoints. However, this approach has to be founded on the belief that NITs are equivalent to biopsy as surrogates for these outcomes. I'll summarize by saying that implementing hard outcomes in NASH trials is challenging due to the many factors I pointed out: a prolonged compensated phase, heterogeneous populations, the insensitivity of histology as a surrogate, and the fact that those who are closer to an outcome are harder to predict a priori.
More advanced patients who are closer to decompensation reach these outcomes more quickly, but they may be outside the therapeutic window, and they are not the cohort in which we are most interested in preventing an outcome. Resolution of NASH has yet to be associated with a hard outcome, and we really need to focus on surrogates of fibrosis. So I believe we need to make the leap of faith that non-invasive biomarkers are as adequate as biopsy for predicting a clinically meaningful outcome, for all the reasons and recent data I've shared with you. Thank you. Thank you, Manal. These are very challenging concepts, and we look forward to the general discussion at the end. Since we have issues with reading biopsies, as shown in the first talk, and with implementing very long trials with hard clinical outcomes, as you just explained, maybe we should consider how non-invasive biomarkers can be used directly as primary endpoints for drug approval. To discuss this, we're very happy to have Mazen Noureddin with us, director of the Fatty Liver Program at Cedars-Sinai Medical Center, and we're looking forward to hearing how Mazen sees these problems. So Mazen, it's all yours. Thanks a lot, Vlad. These are my disclosures. My task is to talk about NITs, how we can use them today in clinical trials, and how we might approve a drug based on them. I'm going to set the stage for my talk, and thanks to Drs. Abdelmalek and Alkhouri for excellent talks that helped my talk a lot. But a few reminders before I start. Let's start with the histology. Naim showed this slide already. What do we need today for drug approval in patients with NASH and F2 or F3 fibrosis? NASH resolution without fibrosis worsening, or fibrosis improvement by one stage without NASH worsening.
These were described in a recent FDA document and in a nicely written paper by Dr. Frank Anania in Hepatology. And of course, if you achieve both at the same time, you will get drug approval. Let me give you an example of two drugs recently presented, one of them published: semaglutide achieved NASH resolution, with very nice results, but did not achieve statistically significant fibrosis improvement, whereas lanifibranor achieved both NASH resolution and fibrosis improvement. Maybe by the end of this talk I can design a phase 3 study for these two drugs without a liver biopsy. Sorry to the liver pathologists on the call, but we can try. Before we get there, let's talk about some principles. If you want to talk about NITs, you have to understand what you're dealing with: you have to know your drug and its mechanism of action. This is the landscape of the drugs we have today. Some medications target steatohepatitis and some target fibrosis. So you need to know what your NIT, your non-invasive test, is measuring: steatohepatitis, fibrosis, or both. That's something to keep in mind. And this is why we're all here today as hepatologists: no matter the liver disease, viral hepatitis, NASH, or anything else, what we're all trying to accomplish is that our patients don't progress to decompensated cirrhosis, liver failure, and death. That's why the FDA accepts improvement on liver biopsy in NASH as a meaningful surrogate: if you improve things on biopsy, you are likely moving patients away from decompensation, liver transplant, and death. And to make things a little more complicated, there are cardiac issues in NASH trials: you don't want the drug to lead to worse cardiac outcomes either. So that's something to keep in mind.
Let me show you another FDA document, on hepatitis C. I'm not trying to compare viral hepatitis with a metabolic disease, only the principles. The efficacy endpoint for hepatitis C drugs is SVR; it used to be SVR24 and now it's SVR12. This is because SVR correlates with clinical outcomes such as development of HCC, hepatic events, fibrosis, and all-cause mortality. This is important because they found a surrogate that correlates with outcomes. They also commented that outcomes are hard to achieve and that it doesn't make sense to run a study on outcomes, so they accepted this surrogate. And Dr. Abdelmalek nicely alluded to where we were back in the day: liver biopsy was a surrogate for outcomes, and we used to think that NITs, either imaging or serum, were a surrogate of a surrogate. But today, and I agree with her, the surrogates of a surrogate are actually surrogates of an outcome. So let's get there. Let's dive deeper, building on Dr. Abdelmalek's presentation, and talk about the NITs, not in full detail but in some detail. This slide shows the serum biomarkers; I'm sorry I don't list all the biomarkers out there, only the most commonly used and those with the most data. As my friend Dr. Heller said in a recent webinar, you need three things from any NIT, spelling SLO: they should correlate with S, severity; L, longitudinal changes; and O, outcomes. For S, severity, we have FIB-4, the NAFLD fibrosis score, Pro-C3, ELF, ALT, and metabolomics. I'm not going to talk about how they correlate with severity, because you know they correlate with histology very well. Dr. Abdelmalek already told you how they correlate with outcomes, that is, with clinical events, which, as I showed you, is what the FDA is looking for.
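FIB-4, the first serum marker on this list, is computed from routine labs as (age × AST) / (platelet count × √ALT), with age in years, AST and ALT in U/L, and platelets in 10⁹/L. A minimal sketch follows; the cut-offs of 1.3 and 2.67 are the widely cited NAFLD thresholds rather than figures from this talk, the ELF cutoff of 11.27 is the one quoted from the simtuzumab analyses, the patient values are hypothetical, and treating each cutoff as inclusive or exclusive is an assumption.

```python
import math


def fib4(age_years: float, ast_u_per_l: float, alt_u_per_l: float,
         platelets_10e9_per_l: float) -> float:
    """FIB-4 = (age x AST) / (platelets x sqrt(ALT))."""
    if min(age_years, ast_u_per_l, alt_u_per_l, platelets_10e9_per_l) <= 0:
        raise ValueError("all inputs must be positive")
    return (age_years * ast_u_per_l) / (platelets_10e9_per_l * math.sqrt(alt_u_per_l))


def fib4_band(score: float, low: float = 1.3, high: float = 2.67) -> str:
    # Commonly used NAFLD cut-offs; boundary handling is an assumption
    if score < low:
        return "low"
    if score > high:
        return "high"
    return "indeterminate"


ELF_DECOMPENSATION_CUTOFF = 11.27  # cutoff quoted from the simtuzumab analyses


def elf_above_cutoff(elf: float) -> bool:
    # Treating the cutoff as inclusive is an assumption
    return elf >= ELF_DECOMPENSATION_CUTOFF


score = fib4(age_years=55, ast_u_per_l=40, alt_u_per_l=36, platelets_10e9_per_l=200)
print(round(score, 2), fib4_band(score))  # 1.83 indeterminate
print(elf_above_cutoff(11.5), elf_above_cutoff(9.8))  # True False
```

Because age sits in the numerator, the same labs give a higher FIB-4 in older patients, which is one reason fixed cut-offs leave many patients in the indeterminate zone.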
Let me show you how they actually track changes longitudinally. First, ALT: ALT is a very useful biomarker in NASH clinical trials. Data from the FLINT trial showed that a 17-unit change in ALT within six months correlated with histologic changes in steatohepatitis. So our old friend, which we have used all along, can be useful if it changes in the right direction with treatment. Next, FIB-4, a serum biomarker: data from the REGENERATE trial showed that if histology improves, FIB-4 moves in the right direction; if histology worsens, FIB-4 worsens as well; and if histology doesn't change, FIB-4 does not change. Then PRO-C3 and ALT: these are examples from two studies, Madrigal's drug resmetirom and the NGM drug. In both, the markers improve with treatment and do not improve, or move in the opposite direction, with placebo, suggesting that both PRO-C3 and ALT can be used longitudinally with treatment. So, to summarize the serum biomarkers: they all assess severity and they all track longitudinal changes, and their correlation with outcomes was already presented in the previous talks. I won't talk much about metabolomics, but a recent study, just published in Hepatology, showed that they also track changes longitudinally. Let's move to imaging biomarkers and see whether they meet the SLO criteria: severity, longitudinal changes, and outcomes. I'll give just two examples of severity assessment, because it's important. MRE has been shown over and over to assess histologic severity very well, and so has transient elastography. And just a point on cT1 from multiparametric MRI: recent data suggest that it correlates with steatohepatitis and fibrosis on histology. These data are emerging and very promising. Moving on to longitudinal changes: MRI-PDFF is now the most commonly used primary efficacy endpoint in phase 2a trials.
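FIB-4, described above as a serum marker that moves with histology in REGENERATE, is computed from routine labs, which is part of why it is easy to repeat at every visit. The sketch below assumes the standard published formula, FIB-4 = (age × AST) / (platelets × √ALT), with age in years, AST and ALT in U/L, and platelets in 10⁹/L. The helper `fib4_change`, its tolerance parameter, and the example values are hypothetical and only illustrate how a baseline-to-follow-up comparison might be expressed.

```python
import math


def fib4(age_years: float, ast_u_l: float, alt_u_l: float, platelets_10e9_l: float) -> float:
    """Standard FIB-4 index: (age * AST) / (platelets * sqrt(ALT))."""
    if min(age_years, ast_u_l, alt_u_l, platelets_10e9_l) <= 0:
        raise ValueError("all inputs must be positive")
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


def fib4_change(baseline: float, follow_up: float, tol: float = 0.0) -> str:
    """Hypothetical helper: classify the direction of change between two visits.

    Differences no larger than `tol` are treated as "unchanged".
    """
    diff = follow_up - baseline
    if abs(diff) <= tol:
        return "unchanged"
    return "improved" if diff < 0 else "worsened"


# Illustrative values only (not trial data).
baseline = fib4(age_years=50, ast_u_l=40, alt_u_l=64, platelets_10e9_l=200)
follow_up = fib4(age_years=51, ast_u_l=30, alt_u_l=36, platelets_10e9_l=225)
print(round(baseline, 3), round(follow_up, 3), fib4_change(baseline, follow_up))
```

A lower FIB-4 is treated as improvement here because the index rises with fibrosis; any real trial would pre-specify its own thresholds rather than a bare direction of change.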
It started as a hypothesis; now the FLINT trial has shown that it correlates with changes in steatohepatitis. The Madrigal trials of resmetirom also showed that it correlates with steatohepatitis, and meta-analyses have shown the same thing. So the data are out there. VCTE, too: data from REGENERATE were presented. If histology improves, VCTE liver stiffness improves; if histology worsens, VCTE worsens; and if histology stays the same, VCTE stays the same. I'm also going to add to the outcome data that Dr. Abdelmalek presented. These data are fresh out of the oven, and I'm adding them because of the 20% change threshold that has been suggested for clinical trials; we have been saying 20 to 25%. Multicenter European studies showed that a 20% change in FibroScan liver stiffness actually correlates with changes in outcomes. So it doesn't just correlate with longitudinal histologic changes; it also correlates with outcomes, suggesting that changes of 20 to 25% are very useful in the clinical trial setting and eventually in clinical practice. For cT1, a small study of obeticholic acid shows, at the top here, the liver turning from red to green, indicating improvement in the OCA arm, whereas in the placebo arm it doesn't change. That's a small study. cT1 in the NGM trial also showed improvement in the treatment arm versus placebo. So the longitudinal cT1 signal is also out there, suggesting its usefulness. Dr. Abdelmalek showed you that MRE data correlate with outcomes; that was a longitudinal study from the Mayo Clinic. This is a study from our group. It is not longitudinal; it's a cross-sectional study, also showing a correlation with outcomes: each 1 kPa of MRE stiffness was associated with decompensation with an odds ratio of 3.28. The reason I put this here is that it is relevant to inclusion criteria for cirrhosis studies. For decompensation, the MRE cutoff was 6.48 kPa, and for cirrhosis it was about 4.39 kPa.
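The thresholds mentioned in this part of the talk (a 20 to 25% change in VCTE liver stiffness, and an absolute change of about 1 kPa on MRE) come down to simple arithmetic on paired baseline and follow-up values. The sketch below is illustrative only: the helper names, the sign convention (a negative change means a decrease), and the example values are assumptions, not trial definitions.

```python
def relative_change(baseline: float, follow_up: float) -> float:
    """Fractional change from baseline; negative values mean a decrease."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (follow_up - baseline) / baseline


def relative_responder(baseline: float, follow_up: float, threshold: float = 0.20) -> bool:
    """True if the value fell by at least `threshold` (0.20 means a 20% drop)."""
    return relative_change(baseline, follow_up) <= -threshold


def absolute_responder(baseline: float, follow_up: float, delta: float = 1.0) -> bool:
    """True if the value fell by at least `delta` units (for example 1 kPa on MRE)."""
    return (baseline - follow_up) >= delta


# Illustrative values only: VCTE stiffness 12.0 -> 9.0 kPa is a 25% drop.
print(relative_change(12.0, 9.0), relative_responder(12.0, 9.0))
# MRE 5.0 -> 3.5 kPa is a 1.5 kPa drop.
print(absolute_responder(5.0, 3.5))
```

Keeping relative and absolute definitions as separate functions mirrors the talk, where VCTE change is discussed as a percentage while MRE change is discussed in kPa.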
So you could imagine that in a cirrhosis study, you could set your entry criteria at an MRE value between roughly 4.3 or 4.4 and 6.5 kPa to define your population without a liver biopsy. So let's summarize imaging against this SLO concept. VCTE assesses severity, tracks longitudinal changes, and correlates with outcomes. MRE assesses severity and correlates with outcomes; data on longitudinal changes, meaning regression, that is, MRE improving along with histology, are emerging. There are some data from resmetirom, but we need more. cT1 has just started showing data; we need more. And for MRI-PDFF, I should put two check marks for longitudinal changes, because there are now very good data showing that it performs very well for longitudinal assessment. I'm going to finish the NIT section with the concept of identifying the at-risk group, which is NASH with a NAS of 4 or more and fibrosis stage F2 or higher. You heard about FAST and NIS4; I'm going to tell you about one more. This is what we have been doing on histology for the last three or four years to identify an at-risk group when we select patients for phase 3 trials: NASH with a NAS of 4 or more and F2 or higher. You know about the FAST score by now, which uses FibroScan (CAP and stiffness) and AST. One problem with it is that about 30% of people fall in the gray zone. NIS4 also identifies the at-risk group. It's a serum biomarker panel that uses a microRNA as well as hemoglobin A1c. It also has a gray zone of about 30%, but it's good for identifying an at-risk group. In our group, along with Dr. Cory and Dr. Harrison, we developed the MAST score. It's similar to FAST and is now submitted for publication. It uses MRI instead of FibroScan, namely PDFF and MRE, plus AST. It's very accurate, with a smaller gray zone, and that makes sense because it's an MRI-based technique. It identifies patients with NASH, a NAS of 4 or more, and F2 or higher.
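The FAST score mentioned above combines FibroScan liver stiffness (LSM, kPa), CAP (dB/m), and AST (U/L) in a logistic formula. The sketch below assumes the coefficients and the commonly cited cutoffs (0.35 or below to rule out, 0.67 or above to rule in) from the original FAST publication (Newsome et al., 2020); they should be checked against that paper before any real use. The classification labels and the example inputs are illustrative assumptions, and the middle band is the "gray zone" the talk refers to.

```python
import math


def fast_score(lsm_kpa: float, cap_db_m: float, ast_u_l: float) -> float:
    """FAST score: logistic combination of LSM, CAP and AST.

    Coefficients as reported for FAST (Newsome et al., 2020); verify before use.
    """
    if min(lsm_kpa, cap_db_m, ast_u_l) <= 0:
        raise ValueError("all inputs must be positive")
    x = (-1.65
         + 1.07 * math.log(lsm_kpa)
         + 2.66e-8 * cap_db_m ** 3
         - 63.3 / ast_u_l)
    return 1.0 / (1.0 + math.exp(-x))  # equivalent to e^x / (1 + e^x)


def classify_fast(score: float, rule_out: float = 0.35, rule_in: float = 0.67) -> str:
    """Map a FAST value to rule-out / gray zone / rule-in (assumed cutoffs)."""
    if score <= rule_out:
        return "rule-out"
    if score >= rule_in:
        return "rule-in"
    return "gray zone"


# Illustrative inputs only (not patient data).
for lsm, cap, ast in [(5.0, 220.0, 20.0), (8.0, 300.0, 40.0), (15.0, 350.0, 80.0)]:
    s = fast_score(lsm, cap, ast)
    print(lsm, cap, ast, round(s, 3), classify_fast(s))
```

The cutoffs are parameters rather than constants so that a trial could pre-specify different rule-in or rule-out thresholds without changing the score itself.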
So, to summarize the tests that identify an at-risk group, we have FAST, NIS4, and MAST. There's also a metabolomic one. And FAST has already been used for longitudinal assessment in some studies, presented as posters and abstracts. So let's get creative; some people might say, let's get controversial. I'm going to design trials based on NITs, and I think it's worth at least talking about it now. I agree with Dr. Abdelmalek that NITs are now surrogates of outcomes, not surrogates of a surrogate. But if you want some reassurance that you are designing your trial with NITs on a solid foundation, let's set some rules. I call it MAC: consider the Mechanism of action, use Accurate NITs, and Combine more than one NIT for reassurance. So let's design a semaglutide phase 3 study as an example. It's a steatohepatitis drug, and again, it did not achieve statistical significance for fibrosis. As the primary efficacy endpoint, I would use a reduction of more than 30% in MRI-PDFF plus reductions in ALT and ELF. ELF, I remind you, correlates with outcomes. And they actually achieved that in their clinical trial. So if there is hesitancy about MRI-PDFF alone, we're combining multiple endpoints. Of course, composite endpoints raise considerations such as the p-value, but it's still worth trying to replace the liver biopsy. It's better for patients and better in many other ways. The entry criteria can be one of the scores I told you about, or, for example, FibroScan CAP of more than 300 dB/m, stiffness of more than 8 kPa, and AST of more than 40. And don't forget, you're in Subpart H: you still have to show outcomes at the end before you get final approval. Let's talk about lanifibranor. It gets a little easier, because Lani achieved both NASH resolution and fibrosis improvement. So you use FibroScan. I put the primary endpoint here.
You want a reduction of more than 20% in liver stiffness, plus reductions in ELF and ALT. Again, we can discuss composite endpoints and the p-value, and you enrich your secondary endpoints with multiple others. The entry criteria can be something like the FAST or NIS4 score, and you're in Subpart H. Finally, let's say this is a cirrhosis trial, and you have a pill that's going to reverse cirrhosis and reduce fibrosis, called Superzam, the super pill. You can use FibroScan along with ELF, with a 20% reduction in ELF. And we can talk about MRE. As I told you, we're still gathering data on its correlation with regression, but you could perhaps use a 1 kPa reduction. And I told you what the entry criteria could be: a value between 4.5 and 6.5 kPa. And you can use multiple secondary endpoints, such as liver-related outcomes, and Subpart H as well. So, in summary, NITs now assess disease severity, monitor disease changes longitudinally, and correlate with liver-related outcomes. At the very least, we should start the dialogue on how to move beyond liver biopsy for drug approvals. And I'll open the stage for discussion. Thank you.
Thank you, Mazen. And thank you all. You put a lot of thought and effort into your presentations, so again, thank you on behalf of all the participants. I guess we have a few minutes for discussion, is that right? So maybe I can start with a few questions. Naim, you explained to us the difficulties in making sense of very different assessments when you use several pathologists. Does that mean the preferred method should be to use just one pathologist?
We need more data on this. What we are noticing, as I showed, is that if the same pathologist re-reads the slides, there's a lot of variability. If two pathologists read the slides separately, they agree in only about 60% of cases, and in the other 40%, they have to read the slides together.
So, in my mind, we just need to conduct a good study comparing these different ways of reading: one pathologist, maybe three pathologists reading separately, and all three pathologists reading at the same time, and then have all these slides re-read with the same method. That way we can answer this question and give you a good answer. Because even if you have three pathologists reading the slides, concordance may increase initially, but what if you have them re-read the slides a year later? Would you get different readings? I suspect we might. So we just need to study this. We all have ideas about how to improve it, but we really don't have good systematic data at this point.
Okay. I would say, if you trust the pathologist, what's the use of having several pathologists? It creates a lot of problems. Why not use five or 10? The more pathologists you use, the less agreement you'll have. So this is a complicated issue, but I think it's worth asking: should we rely on a single pathologist? Would that at least minimize part of the variability? Or do you see any downside to that? Maybe we'll come back to that a little later. You didn't mention sampling variability at all. Is that on purpose? Do you think it's no longer an issue?
No, I think you need to have a good biopsy, absolutely. The length of the biopsy, in my mind, should be at least two centimeters. You want more than 10 portal tracts. You want to make sure the needle you're using is of an adequate gauge. I think this is very important. I didn't mention it because I think it's a known fact, but you absolutely cannot use a one-centimeter biopsy with two portal tracts to give an accurate assessment. This is very important.
I don't know if our participants can talk live, but if so, I think we have Elizabeth Brunt here, and I'm sure she may want to make a very short comment on this, if possible. If not, we will move on to other considerations. So please let me know if that is possible. Manal, the issue with using these biomarkers, and this is for both Manal and Mazen, is not so much showing that baseline values are associated with whatever happens in the future. If you use them in a clinical trial, you have to show much more than that. Precisely, what needs to be demonstrated is that changes in those biomarkers between baseline and end of treatment are associated with changes in histology or with the occurrence of clinical events. And not only do you have to show that, you have to show it in both directions: not only that when the marker increases you have more events or worsening histology, but also, and this is probably where we have less evidence, that when it decreases you have fewer events and/or improved histology. I think there is very little evidence for this at this point. Any short comment from one of you?
I agree with all the points you've made, Vlad, but the same argument holds true for histology. There's no evidence at this point that resolution of NASH changes a clinically meaningful liver-related outcome. At the same time, at baseline, NITs could be used, for example, for enrichment, to create a homogeneous phenotype in the population you'd like to enroll in a trial, and then for longitudinal follow-up. I believe that resorting to non-invasive markers will allow us to increase our sample sizes. I believe our NASH trials are grossly underpowered, and we need to consider conducting NASH trials the way endocrinologists and cardiologists perform registration trials. That is an Achilles heel for us as a field.
But the other nice thing about non-invasive tests is that you're looking at change from baseline, with hopefully a longer duration of follow-up that can inform these endpoints. The other strength is that there's no subjective interpretation involved; they are very objective. MRI-PDFF is an objective measure, and so are elastography, FIB-4, and ELF. So observer variability falls by the wayside, and we don't need to debate whether we need one pathologist, three pathologists, or 10.
So, a follow-up question to Mazen. It's nice to show all these markers and how they correlate with things, but you know very well that in the studies performed so far, not all of them show the same result. Sometimes you get the result you expect with FIB-4 but not with ELF, or with NFS but not with MRE. So if you want to use these biomarkers knowing that they're all imperfect, meaning their accuracy is not 100%, just as biopsy is imperfect, do you need two out of three, or three out of four? Do they all need to move a certain amount above or below a threshold? Do you need to combine them, or do you just trust the one that happens to give you a good result?
I think that is a difficult question. It's a good question, Vlad. I think you can make the same argument for biopsy as well. As I said, you need to use a combination of the most accurate ones that have shown correlation with outcomes. As I showed, you have MRE, ELF, and FibroScan; you put them together and use those you have the most confidence in. And as you build more data on correlation with outcomes, you start using those.
They'll have to be a combined marker, or they'll all have to change in the favorable direction for your trial to be positive.
And I'm afraid we'll have to devise some algorithm, or we'll have to consider them one by one, check which ones are positive, and then say, okay, the trial is relatively positive.
I think you have to combine them, and you have to have confidence that this combination is good enough to replace a liver biopsy. That's how I see it. And let me also build on what Manal said. NASH resolution has the same problem: there are no very good data showing that NASH resolution correlates with clinical outcomes. So if you're showing that worsening stiffness correlates with clinical events, why wouldn't improving stiffness help with clinical events? You have to have some faith in the process and try to advance the field one way or another.
Well, you know that sometimes things that get better do not necessarily correlate with the outcomes.
Sure. And that's how we moved forward with NASH resolution, and I think we can do the same with NITs. I'm not saying use ELF alone or FibroScan alone; I'm just saying use a combination of them. Let's all sit down at the table and discuss. We all get frustrated when a patient with F3 has no ballooning and is excluded from clinical trials, and you're left asking, what do I do with this patient? So we have to sit down and talk.
Just as a reminder, when we treated HCV or HBV, we all remember that FibroScan values went down very fast, even before there was any change in fibrosis. So anything that goes down does not necessarily reflect the outcome of interest. I'm going to turn to a question that was asked in the chat box, and this one is for Naim: How can we feed the artificial intelligence system and algorithm if the pathologists themselves do not have unified, consensually agreed criteria? The machine depends on the uncertainty of the human being, so it will only replicate that uncertainty. How do you respond to that, Naim?
Yeah, I think that's a valid point. These algorithms depend on how you train them initially. But what's being done now is that pathologists annotate every single histologic finding that we care about: they circle the ballooned hepatocytes, the steatotic hepatocytes, the inflammation, the bile duct changes that we are now noticing, and the fibrosis. Then you train the algorithm to find these specific findings. By having, as I showed, 65 pathologists look at 600 slides and annotate thousands of features, you expect the algorithm to get better at finding these features and providing a quantitative assessment. This holds true for everything we do in artificial intelligence, whether it's detecting polyps or cancerous skin lesions. There's human variability, but as you get more humans and more readings, you validate the algorithm against labeled data, then apply it to unlabeled data and have humans re-evaluate the results, and the algorithm continues to get better. Humans have good days and bad days; you have experienced people and less experienced people. The algorithm only gets better with time.
Yeah, this is an important point. I think at some point humans will need to evaluate what the algorithm has produced. And basically, your answer is that it depends on the amount of data you give the system. A question from the chat box for whoever wants to take it: any comment on what cT1 is worth, what it correlates with, and whether there are any thresholds? Who wants to take this?
I think cT1 is emerging as one of these non-invasive tools with great value in establishing baseline severity, but also in following response to treatment.
There are some emerging data that a reduction of 88 milliseconds from baseline cT1, or perhaps a 21% reduction from baseline, may correlate with histologic improvement. Of course, more data are needed, but it's similar to the MRE story: MRE is a great test at baseline to determine the stage of fibrosis, but the longitudinal piece needs more data. So more to follow on this, but I see it as a promising NIT.
There are a lot of questions about which drugs are best. We're not going to answer any of those, because our mandate is to talk about diagnostics and how to build clinical trials, not to discuss specific therapeutics. Although we probably have a lot to say there, I think it's better to leave that for another webinar. Now, what Mazen has shown us is, of course, very provocative: trying, in a concrete way, to design a trial based on non-invasive markers, including as entry criteria. What bothers me a little, though, and I know you did it for didactic purposes, is that you chose two different molecules, but you also chose two different sets of non-invasive markers as inclusion or selection criteria. The problem is that we should not pick and choose the NITs that look good or gave a positive result with one given molecule and then run trials differently with different molecules. If we are going to use these NITs, it has to be something that can be used universally, regardless of the type of drug, I would assume, unless you give me a reason.
No, you're right. Absolutely right. I think we have to.
Because otherwise we'll never really be able to compare trials, and we'll be accused of cherry-picking what works best for a given trial.
No, no, you are absolutely right. I'm not saying those are the different models we should go ahead and use.
I meant to present examples, but I think what should be done next is a roundtable or a conference where we agree on the best design and the best NITs to move forward with, considering mechanisms, cirrhotics versus F2 and F3, and entry criteria, and then go with them. I meant to present different flavors that could be considered, but you're absolutely right. I agree with you. I didn't mean to bother you, Vlad.
I just want to say one thing, Vlad, about how we moved the field forward. Recall the 2011 white paper that established NASH resolution and fibrosis regression as endpoints. At the time, there were no data showing that even fibrosis regression correlates with any outcomes, right? We had none of those data. There were data that the stage of fibrosis at baseline predicts outcomes. More recently, Dr. Arun Sanyal presented data from the simtuzumab and selonsertib programs, combining about 1,100 patients, showing that in cirrhotic patients, fibrosis regression by one stage was associated with fewer outcomes. As far as I know, these are the only such data, and if you look at how many people developed outcomes, it was 70 or so. There are no other data showing that even fibrosis regression from F3 to F2 correlates with any outcomes. So we made the jump with liver biopsy from day one, and we accepted it as a field. The FDA accepted it, we appreciate that, and I think it was the right decision. But I feel that as a field, if we sit down and decide to evaluate a set of NITs, then of course do an interim analysis, and then do a long-term follow-up study to see what happens to these patients in terms of hard outcomes, that is not an unreasonable argument to make.
Yeah, we're not saying the goal is to do away with liver biopsy entirely. We're just saying this is a very prevalent disease, and we are having difficulties with liver biopsies.
There will be 6,000 to 8,000 patients entering phase 3 clinical trials, and we're going to require biopsies. You talked about the pathologists we trust. There are very good pathologists out there, but we're going to have 8,000-plus patients. So we should start thinking about alternatives for drug approvals. And remember, we have been managing this disease not just for the last year or two; clinically, it has been around for some time, and clinicians are not doing liver biopsies every day. Hepatologists are using whatever they can noninvasively, and they're making clinical sense of it. So, something to think about.
Just to add to this... go ahead, Manal.
I think a comment raised by Dr. Elizabeth Brunt is very appropriate. We have the same issues with NITs as we do with histology in the borderline NASH cohort and their performance. At the same time, I think it is worth revisiting NASH resolution itself as an endpoint. We haven't yet definitively demonstrated that it correlates, short of fibrosis prediction, with a clinically meaningful outcome. Yet this will all ultimately boil down to what the FDA has always said: it's a safety, efficacy, and risk ratio. If we have drugs that are incredibly safe and cost-effective, these hard-and-fast parameters of borderline NASH and definite NASH will fall by the wayside, because patients will have favorable clinical outcomes beyond liver disease alone, in all-cause mortality and cardiovascular mortality. The rigor demanded of any surrogate, whether histology or an NIT, which I think perform comparably given their respective insensitivities, will fall by the wayside. And we just have to be believers, as Vlad nicely said in previous forums.
Okay. I have a question for you, Manal. I'm told that we need to wrap up, so I think we'll leave Naim's question insufficiently debated, although it is a crucial point. That will be for another time.
Manal, if I understand your talk well, you have shown a very provocative paradigm in which non-invasive markers would basically replace, at all stages, both liver histology at the interim assessment and hard clinical outcomes at the end-of-treatment assessment. This is a very bold move. I think that for the moment, we would be very satisfied if non-invasive markers could at least replace the interim analysis of the histological surrogates. And that's an open question for you to answer very briefly, and I know it's a hard exercise: won't people always feel the need to have true hard clinical outcomes in hand, and not just surrogates? Death is death; transplantation is transplantation. People might not be convinced by a change in an NIT at the end, but they could be convinced by it as an interim replacement for histology. Don't you think that would be a more reasonable approach?
Well, I don't know whether that was hard to answer briefly or to answer boldly. You understood my message correctly. I think we are at a stage where we can be bold: these biomarkers, which carry lower risk, can be used not only for interim assessment. Why not keep patients in longer-term trials, assessing not only interim endpoints but endpoints that are meaningful? Right now, we're probably truncating our studies too early, and we're doing so with insufficient sample sizes to render any meaningful interpretation of our data. I think we need to change that paradigm, and doing so means lower-risk interventions, lower costs, and longer trials.
Okay. I'm told we need to wrap up. I would go on for much longer. This is a very thoughtful panel, and it was very nice to discuss with you, but I guess we have to finish here.
I'm told that some of the questions can still be answered privately by each of the speakers. So thank you again for giving me the opportunity to chair this, congratulations to my colleagues on their wonderful talks, and I hope to meet you again physically, not virtually, in the very near future.
Hopefully soon. Bye-bye. Thank you. Take care. Bye-bye. Thank you, everyone.
Video Summary
The speakers discuss the challenges of using liver biopsy as a clinical trial endpoint in NASH trials. They highlight the subjective nature of histologic assessment and the variability among pathologists in interpreting biopsy slides. They also discuss the limitations of histologic assessment in identifying patients with fibrotic NASH and in determining response to treatment. They argue that histologic assessment fails to meet the tasks of accurately identifying patients with fibrotic NASH and accurately identifying response to treatment. They propose the use of non-invasive biomarkers as an alternative and more objective method for assessing disease severity and response to treatment. They discuss various non-invasive biomarkers, including serum markers and imaging techniques, and highlight their ability to assess severity, monitor disease changes longitudinally, and correlate with liver-related outcomes. They suggest that these biomarkers could be used as primary endpoints in clinical trials and as a basis for drug approval. They also discuss the need for further research to determine the best combination of biomarkers and to establish thresholds for disease severity and response to treatment. Overall, they argue that non-invasive biomarkers can provide a more accurate and objective assessment of disease severity and response to treatment in NASH trials, and that the field should consider moving away from reliance on liver biopsy as a primary endpoint.
Keywords
liver biopsy
clinical trial endpoint
NASH trials
histologic assessment
pathologists
biopsy slides
fibrotic NASH
response to treatment
non-invasive biomarkers
serum markers
imaging techniques