Numerous diet-based models have been developed to induce metabolic dysfunction-associated steatotic liver disease and steatohepatitis (MASLD/MASH) in mice. Chronic consumption of high-energy, fat-enriched feeds causes hepatic lipid accumulation (steatosis), lipotoxic inflammation, hepatic dysfunction, and liver fibrosis. Next-generation diets1 add fructose and cholesterol to accelerate progression toward the diseased liver state.
A key challenge in using diet-induced MASH mice for preclinical research is the selection and quantification of disease endpoints. In humans, metabolic syndrome typically precedes the later onset of MASH. For both the metabolic and hepatic stages of disease progression, some hallmarks of the human condition are recapitulated in the mouse while others are not. Even among conserved features, some manifest differently in the mouse despite similar physiological underpinnings. Deciding which endpoints to measure and how to interrogate them therefore requires careful consideration. Recent trends in sampling techniques are also discussed.
Endpoint selection aims
When selecting disease endpoints to measure in the mouse, one should consider:
- Translational relevance to the human condition, especially to markers measured in clinical trials.
- Economic and practical constraints: the researcher's budget, skillset, and supporting analytical infrastructure.
- Whether noninvasive, nonterminal sampling is possible, so that disease progression or resolution can be measured serially and terminal values can be compared with each animal's own baseline rather than only between groups.
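Comparing each animal with its own baseline, as the last consideration describes, can be sketched as follows. The ALT values, animal IDs, and function names are hypothetical, and percent change is only one possible per-animal metric; absolute or log-ratio changes are equally valid choices.

```python
from statistics import mean


def percent_change_from_baseline(baseline: float, terminal: float) -> float:
    """Percent change of a terminal value relative to the same animal's baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (terminal - baseline) / baseline


def paired_changes(records: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Map each animal ID to its % change, given (baseline, terminal) pairs."""
    return {
        animal_id: percent_change_from_baseline(baseline, terminal)
        for animal_id, (baseline, terminal) in records.items()
    }


# Hypothetical serum ALT values (U/L): (baseline, terminal) for each animal.
alt_records = {"M01": (200.0, 100.0), "M02": (150.0, 120.0), "M03": (300.0, 150.0)}
changes = paired_changes(alt_records)
print(changes)                 # {'M01': -50.0, 'M02': -20.0, 'M03': -50.0}
print(mean(changes.values()))  # -40.0
```

Because each change is computed within an animal first, between-animal variation in baseline severity does not inflate the spread of the treatment effect.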
Reliable quantification of disease induction is crucial. Individual mice respond variably to MASH diets, even within an inbred background, and some prove resistant to the diet; it is desirable to exclude these animals from the therapeutic phase of the study. Animals should be sorted into treatment arms once baseline values are high enough to leave a measurable window in which a therapeutic effect can be confirmed, and sorting should minimize differences in group means and standard deviations. This is more challenging in a prophylactic design: because treatment begins before disease is established, low responders cannot be identified in advance, and larger group sizes are needed to power the study appropriately.
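The exclude-then-sort step can be sketched as rank-blocked randomization, one common way to balance groups (not necessarily the method used in any cited study). Animals below a baseline cutoff are excluded, the rest are ranked, and each consecutive block of ranked animals is spread across the arms in random order. The ALT values, the 100 U/L cutoff, and the function names are hypothetical and illustrative only.

```python
import random
from statistics import mean, stdev


def allocate_balanced(
    baseline: dict[str, float], n_groups: int, min_value: float, seed: int = 0
) -> dict[int, list[str]]:
    """Exclude low responders, then allocate by rank-blocked randomization.

    Eligible animals are ranked by baseline value; each consecutive block of
    n_groups animals is shuffled and dealt one animal per group, so every
    group receives a similar spread of disease severity.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    rng = random.Random(seed)  # seeded so the allocation is reproducible
    eligible = [aid for aid, value in baseline.items() if value >= min_value]
    eligible.sort(key=lambda aid: baseline[aid], reverse=True)
    groups: dict[int, list[str]] = {g: [] for g in range(n_groups)}
    for start in range(0, len(eligible), n_groups):
        block = eligible[start:start + n_groups]
        slots = list(range(n_groups))
        rng.shuffle(slots)
        # A short final block fills a random subset of groups.
        for aid, g in zip(block, slots):
            groups[g].append(aid)
    return groups


def summarize(groups: dict[int, list[str]], baseline: dict[str, float]) -> None:
    """Print n, mean, and SD of the baseline value for each group."""
    for g, ids in groups.items():
        values = [baseline[aid] for aid in ids]
        sd = stdev(values) if len(values) > 1 else 0.0
        print(f"group {g}: n={len(values)} mean={mean(values):.1f} SD={sd:.1f}")


# Hypothetical baseline ALT values (U/L); the 100 U/L cutoff is illustrative only.
alt_baseline = {
    "M01": 310.0, "M02": 95.0, "M03": 240.0, "M04": 180.0, "M05": 60.0,
    "M06": 275.0, "M07": 150.0, "M08": 205.0, "M09": 130.0, "M10": 220.0,
}
groups = allocate_balanced(alt_baseline, n_groups=2, min_value=100.0, seed=42)
summarize(groups, alt_baseline)
```

Checking that the printed group means and SDs are similar before dosing begins is a quick sanity test; a different seed gives a different but similarly balanced allocation.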
Standards for the evaluation of MASH
Liver biopsy is the gold standard for baseline and post-treatment evaluation of human patients in Phase 2/3 MASH trials. The FDA's draft guidance on MASH trial design recommends the following efficacy endpoints:
- Resolution of steatohepatitis on overall histopathological reading and no worsening of liver fibrosis, or
- Improvement in liver fibrosis by one or more stages and no worsening of steatohepatitis, or
- Both resolution of steatohepatitis and improvement in fibrosis.
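The "or" structure of these endpoints can be made explicit as below. This is a hypothetical sketch: whether steatohepatitis has resolved or worsened is taken as the pathologist's categorical reading rather than derived from component scores, because the guidance's precise histological definitions are not reproduced here. The class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedBiopsy:
    """Baseline and follow-up readings for one patient (or animal)."""
    fibrosis_baseline: int   # fibrosis stage, 0-4
    fibrosis_followup: int   # fibrosis stage, 0-4
    mash_resolved: bool      # pathologist's overall reading at follow-up
    mash_worsened: bool      # pathologist's overall reading at follow-up

    def __post_init__(self) -> None:
        for stage in (self.fibrosis_baseline, self.fibrosis_followup):
            if not 0 <= stage <= 4:
                raise ValueError(f"fibrosis stage must be 0-4, got {stage}")
        if self.mash_resolved and self.mash_worsened:
            raise ValueError("steatohepatitis cannot be both resolved and worsened")


def endpoints_met(b: PairedBiopsy) -> dict[str, bool]:
    """Evaluate the three alternative endpoint definitions listed above."""
    change = b.fibrosis_followup - b.fibrosis_baseline  # negative = improvement
    return {
        "resolution_no_fibrosis_worsening": b.mash_resolved and change <= 0,
        "fibrosis_improved_no_mash_worsening": change <= -1 and not b.mash_worsened,
        "both": b.mash_resolved and change <= -1,
    }


# Hypothetical patient: fibrosis improves from stage 3 to 2, steatohepatitis unchanged.
result = endpoints_met(PairedBiopsy(3, 2, mash_resolved=False, mash_worsened=False))
print(result)
```

A subject counts as a responder if any one of the three entries is true.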
Human MASH is scored using one of several consensus composite indices, which require a pathologist to evaluate criteria including steatosis severity, inflammation, and hepatocyte morphology. The pathologist also stages fibrosis on a separate index that reflects the localization and extent of scarring. These indices have been adapted to MASH and fibrosis as they present in the mouse2.
Liver biopsy is necessarily invasive, technically challenging, costly, and risky, and its scoring is inherently subjective. To generate a more robust data set, patient serum is also collected to measure surrogate markers of liver injury and impairment, and additional analytical chemistry is used to measure metabolites relevant to MASH.
Two Phase 3 drug candidates have recently completed their interim (18-month) analyses. Intercept Pharmaceuticals' Ocaliva (REGENERATE) generated positive data3 and is continuing with the long-term safety arm of the trial. Genfit's elafibranor (RESOLVE-IT) did not achieve its key objectives and was discontinued4. Regardless of outcome, the trial endpoints and patient enrollment strategies provide useful context for discussing MASH study design in the mouse.
Direct liver interrogation of MASH
Human endpoints: Ocaliva improved biopsy-confirmed liver fibrosis by ≥1 stage in a greater proportion of patients than placebo, without worsening the MASH composite score. Elafibranor did not achieve significant improvement in either metric.
Human endpoints explained: Liver biopsies were obtained at baseline screening and at 18 months. Pathologists determined a MASLD activity score5 (NAS) and a fibrosis stage. The NAS is a composite score ranging from 0 to 8: steatosis grade (0-3) + lobular inflammation grade (0-3) + hepatocellular ballooning grade (0-2). Fibrosis stage ranges from 0 to 4; at stage 3, "bridging" fibrosis is observed, in which fibrous septa link portal areas to one another or to central veins, and stage 4 corresponds to cirrhosis.
Mouse endpoint equivalents: Survival biopsy of the mouse liver is possible6, but it requires advanced surgical expertise and carries a risk of infection or death. If survival biopsies are not feasible, a cohort of mice can be designated for terminal biopsy collection to serve as a representative baseline group. With this approach, experimental groups cannot be sorted on biopsy results, no longitudinal histological data are generated for individual animals, and sub-responders cannot be excluded.
A mouse NAS (0-8) has been adapted from the human scoring system and interrogates the same parameters. A key limitation is that hepatocyte ballooning in the mouse rarely exceeds a score of 1. Fibrosis staging can also be evaluated, but reaching stage 3 (bridging) fibrosis by purely dietary means typically requires very long induction periods, so it may not be a practical target.
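The scoring arithmetic is simple but easy to get wrong when components are recorded separately. A minimal sketch that validates each component range before summing follows; the function names are hypothetical.

```python
def nas_score(steatosis: int, lobular_inflammation: int, ballooning: int) -> int:
    """Composite activity score: steatosis (0-3) + lobular inflammation (0-3) + ballooning (0-2)."""
    components = (
        ("steatosis", steatosis, 3),
        ("lobular_inflammation", lobular_inflammation, 3),
        ("ballooning", ballooning, 2),
    )
    for name, value, maximum in components:
        if not 0 <= value <= maximum:
            raise ValueError(f"{name} must be between 0 and {maximum}, got {value}")
    return steatosis + lobular_inflammation + ballooning


def is_bridging_or_worse(fibrosis_stage: int) -> bool:
    """True for stage 3 (bridging) or stage 4 (cirrhosis) on the 0-4 scale."""
    if not 0 <= fibrosis_stage <= 4:
        raise ValueError(f"fibrosis stage must be 0-4, got {fibrosis_stage}")
    return fibrosis_stage >= 3


print(nas_score(steatosis=2, lobular_inflammation=2, ballooning=1))  # 5
print(is_bridging_or_worse(3))                                      # True
```

Because mouse ballooning rarely exceeds 1, the practical ceiling of a mouse NAS computed this way is usually 7 rather than 8.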
Recent trends: To overcome the qualitative and subjective nature of a pathologist's review, quantitative data can be derived from stained and immunostained liver tissue. This approach is also useful for characterizing fibrosis in shorter studies, where changes may be too small to shift the categorical stage. Histology providers increasingly offer high-content imaging and algorithmic analysis of endpoints including, but not limited to:
- % area fibrosis (collagen deposition using PicroSirius Red or Masson's Trichrome stain)
- % area steatosis (lipid droplets, using hematoxylin & eosin stain); this can be further divided into microvesicular and macrovesicular steatosis
- Inflammation (% area using galectin-3 immunostain; immune cell density using H&E stain)
- Hepatic stellate cell activation (cell density using α-smooth muscle actin immunostain)
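Most of the % area readouts above reduce to the same quantity: stained area as a percentage of total tissue area. A minimal sketch follows, assuming the stain signal has already been separated into a single 0-1 channel and that a tissue mask is available. Commercial pipelines use trained classifiers and validated thresholds; the 0.5 threshold and the function name here are purely illustrative.

```python
def percent_area_positive(
    stain: list[list[float]], tissue: list[list[bool]], threshold: float
) -> float:
    """Percent of tissue pixels whose stain signal meets the threshold.

    Assumes the stain signal has already been separated into one channel
    (e.g. by color deconvolution) and scaled to 0-1. Background pixels are
    excluded via the tissue mask so empty slide area does not dilute the result.
    """
    if len(stain) != len(tissue):
        raise ValueError("stain and tissue grids must have the same shape")
    tissue_px = positive_px = 0
    for stain_row, tissue_row in zip(stain, tissue):
        if len(stain_row) != len(tissue_row):
            raise ValueError("stain and tissue grids must have the same shape")
        for signal, is_tissue in zip(stain_row, tissue_row):
            if is_tissue:
                tissue_px += 1
                if signal >= threshold:
                    positive_px += 1
    if tissue_px == 0:
        raise ValueError("no tissue pixels in mask")
    return 100.0 * positive_px / tissue_px


# Toy 2 x 3 field: one background pixel, three of five tissue pixels stained.
stain = [[0.9, 0.1, 0.8], [0.2, 0.0, 0.7]]
tissue = [[True, True, True], [True, False, True]]
print(percent_area_positive(stain, tissue, threshold=0.5))  # 60.0
```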
An emerging technique may soon become broadly available for noninvasive, quantitative measurement of rodent liver fibrosis. Shear wave elastography (SWE)7 uses a pulse of acoustic radiation force to generate shear waves in the liver and measures their propagation velocity (Vs); Vs correlates strongly with the severity of liver fibrosis. SWE builds on transient elastography by incorporating ultrasound imaging, which improves localization of liver disease foci.
Surrogate and biomarker interrogation of MASH
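Elastography systems may report either the shear wave speed Vs or a derived stiffness in kPa. Under the usual simplifying assumptions (linear elastic, isotropic, incompressible tissue with density near 1000 kg/m³), the two are related by E = 3ρVs², sketched below. The function and constant names are hypothetical, and a given instrument's internal conversion should be checked before comparing values across platforms.

```python
TISSUE_DENSITY_KG_M3 = 1000.0  # assumed soft-tissue density


def youngs_modulus_kpa(shear_wave_speed_m_s: float,
                       density_kg_m3: float = TISSUE_DENSITY_KG_M3) -> float:
    """Convert shear wave speed (m/s) to Young's modulus (kPa).

    Uses E = 3 * rho * Vs**2, which assumes linear elastic, isotropic,
    incompressible tissue; real liver only approximates these conditions.
    """
    if shear_wave_speed_m_s < 0:
        raise ValueError("shear wave speed must be non-negative")
    return 3.0 * density_kg_m3 * shear_wave_speed_m_s ** 2 / 1000.0


for vs in (1.0, 1.5, 2.0):
    print(f"Vs = {vs} m/s -> E = {youngs_modulus_kpa(vs):.2f} kPa")
```

Because E scales with the square of Vs, a doubling of shear wave speed corresponds to a fourfold increase in estimated stiffness.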
Human endpoints: Ocaliva robustly decreased circulating levels of the biomarkers alanine and aspartate aminotransferase (ALT, AST) and γ-glutamyl transferase (GGT). In a Phase 2b trial (GOLDEN-505)8, elafibranor had a moderate effect on ALT but robustly improved GGT and alkaline phosphatase (ALP).
Human endpoints explained: Liver function tests quantify hepatic enzymes that are released into circulation in greater amounts during liver injury or dysfunction. ALT and AST are gold-standard biomarkers of hepatocellular diseases, including hepatitis, toxic injury, and cirrhosis; GGT and ALP primarily reflect cholestasis, and GGT is also associated with oxidative stress. Sampling for these markers is noninvasive, can be repeated frequently, and routinely serves as a surrogate for liver biopsy in Phase 2 trials.
Mouse endpoint equivalents: All four biomarkers are routinely used to monitor MASH and other liver diseases in mice. ALT is extensively used for sorting animals by disease severity at baseline. If mouse blood is not required for additional analyses, these biomarkers might be sampled as frequently as every two weeks, although a four-week sampling interval is more common.
An animal's fasted/fed state and the route of blood sampling may influence biomarker levels. Downstream assays may require specific anticoagulants, which can also affect measured values. Best practice is to standardize the time of day of collection, fasting state, sampling route, and collection vessels throughout a study.
Recent trends: ALT, AST, ALP, and GGT are biomarkers of liver dysfunction, but not of MASH per se. In the clinic, the lack of widely accepted noninvasive tests developed specifically to quantify MASH is a barrier to diagnosis, but numerous candidate assays are gaining traction. These include NIS4, a multianalyte assay that quantifies microRNA miR-34a (a steatosis/inflammation marker), alpha-2 macroglobulin and the glycoprotein YKL-40 (fibrosis markers), and HbA1c (a metabolic marker)9. These biomarkers merit validation in different rodent models of MASH as noninvasive tools for quantifying disease progression.
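One way to enforce this standardization is to record collection metadata with every sample and flag departures from the study protocol before the data are analyzed. The sketch below is hypothetical: the class and field names, the 08:00-10:00 window, and the tube types are placeholders, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionProtocol:
    """Study-wide standard for blood collection (fields are illustrative)."""
    route: str
    tube: str
    fasted: bool
    window_start_h: int  # earliest allowed collection hour (24 h clock)
    window_end_h: int    # collection must start before this hour


@dataclass(frozen=True)
class BloodSample:
    animal_id: str
    hour: int
    route: str
    tube: str
    fasted: bool


def protocol_deviations(samples: list[BloodSample],
                        protocol: CollectionProtocol) -> list[tuple[str, str]]:
    """Return (animal_id, field) pairs for every departure from the protocol."""
    deviations = []
    for s in samples:
        if s.route != protocol.route:
            deviations.append((s.animal_id, "route"))
        if s.tube != protocol.tube:
            deviations.append((s.animal_id, "tube"))
        if s.fasted != protocol.fasted:
            deviations.append((s.animal_id, "fasted"))
        if not protocol.window_start_h <= s.hour < protocol.window_end_h:
            deviations.append((s.animal_id, "time_of_day"))
    return deviations


protocol = CollectionProtocol(route="tail vein", tube="serum separator",
                              fasted=True, window_start_h=8, window_end_h=10)
samples = [
    BloodSample("M01", 8, "tail vein", "serum separator", True),
    BloodSample("M02", 11, "tail vein", "lithium heparin", True),
]
print(protocol_deviations(samples, protocol))
```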
Serum chemistry and metabolic comorbidities
Human endpoints: Ocaliva trial recruits were required to present with at least one MASH comorbidity, such as obesity or type 2 diabetes. Elafibranor recruits were evaluated at baseline and post-treatment for blood glucose, serum triglycerides, and insulin resistance.
Human endpoints explained: MASH is commonly preceded by metabolic syndrome, defined as the presence of at least three of the following five factors: abdominal obesity, high serum triglycerides, low HDL cholesterol, high blood pressure, and hyperglycemia/insulin resistance. Because some MASH drugs act systemically on fat or glucose metabolism, they may also have beneficial secondary effects on metabolic syndrome.
Mouse endpoint equivalents: Routine measurement of body weight (weekly or daily) is best practice for most mouse studies and is especially practical for high-fat MASH diet models. Body weight is convenient for sorting animals at baseline and for excluding sub-responders. If infrastructure permits, the relative proportions of fat and lean mass can be measured by dual-energy X-ray absorptiometry (DXA). For improved resolution, depot-specific fat masses can be quantified by magnetic resonance imaging (MRI). Hepatomegaly (liver weight as % of body weight) is also an easily scored terminal endpoint.
For many dietary mouse MASH models, hypertriglyceridemia and hyperglycemia may be absent or only weakly recapitulated10,11. The former is not crucial, because fat accumulation within the liver is the more pressing focus and a proven model prerequisite. Regarding the latter, although fasted blood glucose may appear normal in diet-induced MASH mice, a glucose tolerance test may reveal impaired glucose tolerance consistent with insulin resistance. Glucose clearance rates and HOMA-IR are practical efficacy endpoints that can also be used for sorting in addition to disease monitoring.
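The "three of five" rule is a simple count, sketched below with hypothetical names. Each factor is taken as a yes/no finding because the numeric cutoffs that define each factor differ between guideline bodies; those cutoffs are deliberately left out.

```python
METABOLIC_SYNDROME_FACTORS = (
    "abdominal_obesity",
    "high_triglycerides",
    "low_hdl_cholesterol",
    "high_blood_pressure",
    "hyperglycemia_or_insulin_resistance",
)


def has_metabolic_syndrome(findings: dict[str, bool]) -> bool:
    """True when at least three of the five cluster factors are present.

    Each factor is passed as a yes/no finding; the numeric cutoffs that
    define each factor vary between guidelines and are not encoded here.
    """
    missing = set(METABOLIC_SYNDROME_FACTORS) - set(findings)
    if missing:
        raise ValueError(f"missing factors: {sorted(missing)}")
    present = sum(findings[f] for f in METABOLIC_SYNDROME_FACTORS)
    return present >= 3


patient = dict.fromkeys(METABOLIC_SYNDROME_FACTORS, False)
patient.update(abdominal_obesity=True, high_triglycerides=True,
               hyperglycemia_or_insulin_resistance=True)
print(has_metabolic_syndrome(patient))  # True
```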
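Hepatomegaly is typically expressed as a liver index, sketched below; the helper name and the masses are hypothetical.

```python
def percent_of_body_weight(part_g: float, body_weight_g: float) -> float:
    """Express a tissue or compartment mass as a percentage of body weight."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    if part_g < 0:
        raise ValueError("mass must be non-negative")
    return 100.0 * part_g / body_weight_g


# Hypothetical terminal measurements for one mouse.
body_weight_g = 50.0
liver_g = 3.0
print(f"liver index: {percent_of_body_weight(liver_g, body_weight_g):.1f}% of body weight")
```

The same helper applies to any depot mass. When animals differ greatly in adiposity, a body-weight-normalized liver index may partly reflect differences in fat mass as well as liver size.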
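Both efficacy endpoints named here reduce to short calculations, sketched below. HOMA-IR uses the standard formula (fasting glucose in mg/dL × fasting insulin in µU/mL ÷ 405); glucose clearance is summarized as total area under the glucose tolerance curve by the trapezoidal rule, one common choice alongside incremental AUC. Function names, sampling times, and values are hypothetical.

```python
def homa_ir(fasting_glucose_mg_dl: float, fasting_insulin_uu_ml: float) -> float:
    """HOMA-IR using the mg/dL form: glucose x insulin / 405.

    The equivalent mmol/L form divides by 22.5. Insulin must be in uU/mL;
    mouse assays often report ng/mL, and the conversion is assay-specific.
    """
    return fasting_glucose_mg_dl * fasting_insulin_uu_ml / 405.0


def gtt_auc(times_min: list[float], glucose_mg_dl: list[float]) -> float:
    """Total area under the glucose tolerance curve by the trapezoidal rule."""
    if len(times_min) != len(glucose_mg_dl) or len(times_min) < 2:
        raise ValueError("need matching time and glucose series of length >= 2")
    area = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        if dt <= 0:
            raise ValueError("time points must be strictly increasing")
        # Trapezoid between consecutive samples: width x mean height.
        area += dt * (glucose_mg_dl[i] + glucose_mg_dl[i - 1]) / 2.0
    return area


print(homa_ir(180.0, 9.0))                                        # 4.0
print(gtt_auc([0, 15, 30, 60, 120], [100, 300, 250, 180, 120]))   # 22575.0
```

A larger AUC indicates slower glucose clearance; comparing AUCs is only meaningful when the same time points and glucose dose are used across groups.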
Sampling strategy summary
The table below summarizes various potential endpoints for preclinical MASH studies.
| Technique | Invasive? | Useful for sorting? | Utility for longitudinal sampling? | Popularity | Comments |
|---|---|---|---|---|---|
| Body weight | No | Yes; routine | Routine, weekly | High | Routine best practice for rodent studies; no cost |
| Serum ALT | No | Yes; routine | Routine, every 4 weeks | High | Most commonly used surrogate for direct liver interrogation; affordable |
| Serum AST, ALP, GGT | No | Yes | Common, every 4 weeks | Mid | AST is more commonly analyzed than GGT or ALP; AST is usually not elevated to the same degree as ALT |
| Serum glucose | No | May be useful to exclude outliers | Poor (unreliable) | Mid | Many diet-based models do not induce appreciable hyperglycemia; easy and affordable |
| Serum triglycerides | No | Not reliable | Poor (unreliable) | Mid | Many diet-based models do not elevate, and may even decrease, serum triglycerides |
| Glucose tolerance test / HOMA-IR | No | Yes | Fair, every 4 weeks | Mid | Time-consuming; requires technical expertise; measuring insulin for HOMA-IR adds cost |
| % body fat (DXA) | No | Yes | Good, every 4 weeks | Low | Requires specialized instrumentation and training |
| Body composition (MRI) | No | Yes | Good, every 4 weeks | Low | Requires specialized instrumentation and training |
| Hepatomegaly (liver weight as % of body weight) | Yes; terminal | No; may be useful as a baseline endpoint in a non-longitudinal sampling strategy | No | High | Routinely calculated when livers are harvested for other analyses |
| Survival liver biopsy | Yes | Yes | No; repeat sampling is strongly discouraged | Mid | Most direct way to interrogate the liver for sorting animals and obtaining individual baseline data for longitudinal comparison; requires specialized training; can provide numerous post hoc histology readouts |
| Terminal liver biopsy | Yes; terminal | No; may be useful as a baseline endpoint in a non-longitudinal sampling strategy | No | High | Most direct way to interrogate the liver; requires larger cohorts for adequate power, because sub-responders cannot be excluded; can provide numerous post hoc histology readouts |
| Liver stiffness (SWE) | No | Low | Good, every 4-8 weeks | Low | Relatively new technique without a track record to support widespread use |
