KLF14, adipose dysfunction, insulin resistance and type 2 diabetes: from genetic discovery to biological mechanisms and translation.

Lead Research Organisation: University of Oxford
Department Name: RDM OCDEM

Abstract

Type 2 diabetes is a major and growing cause of illness and death across the world. However, incomplete understanding of the processes involved in the development of this condition acts as a barrier to the development of better ways for treating and preventing the disease.

In the past few years, collaborative efforts in human genetic discovery have identified over fifty positions in the human genome where individual sequence variation is associated with risk of type 2 diabetes. In principle, each of these genetic "signals" holds important clues to the mechanisms that are responsible for the maintenance of normal metabolic health. However, in most of these regions, we have yet to define the specific genetic variants responsible for the signal, or the particular genes through which the effects on diabetes-risk are mediated. As a result, the pace of biological insights has lagged behind that of genetic discovery. A key challenge for the field is to develop strategies that enable genetic signals such as these to be mined for the biological clues they can provide.

We recently demonstrated that one of these diabetes signals maps near a gene named KLF14, and that this effect on risk of diabetes is driven by a reduced ability of insulin-sensitive tissues, including muscle and liver, to respond to insulin. We have also shown that, in fat tissue, the same variants near KLF14 have widespread effects on the levels of expression of a wide range of genes, including KLF14 itself. These data are consistent with a model whereby KLF14 acts as a major regulator of events in fat tissue, with these alterations in the levels of KLF14 leading, through as yet unspecified mechanisms, to peripheral insulin resistance and type 2 diabetes.

The overall aim of this research is to define the molecular, cellular and physiological mechanisms which are responsible for the relationship between KLF14 and type 2 diabetes. To do this, we will pursue a number of complementary research strategies which include studies of human genetics and physiology and cellular studies in fat cells.

More specifically, our work aims to:
(a) define the specific sequence variants that are responsible for all of these effects, and how they influence levels of KLF14 within fat cells;
(b) characterise the suite of genes that are turned on and off by KLF14, and understand the consequences of those secondary changes on the function of fat tissue; and
(c) understand how it is that these KLF14-dependent changes in fat tissue lead to changes in the ability of remote tissues (such as muscle and liver) to respond to insulin.

We expect that, by focusing on the role of KLF14 and the ways in which it leads to an increased risk of diabetes, we will gain valuable, generic insights into the mechanisms whereby events in fat have widespread metabolic effects in other tissues. An answer to this question would help us improve our understanding of the connections between obesity and diabetes, an issue of supreme importance given that much of the increase in diabetes prevalence around the world is driven by changes in the amount, distribution and function of fat.

The research provides an opportunity to capitalise on recent discoveries in human genetics to advance understanding of disease biology. In the application, we explain how the biological information generated by this project can be exploited in directly translational studies to define novel approaches for treating, preventing, diagnosing and monitoring type 2 diabetes.

Technical Summary

The major objective of this proposal is to define the molecular, cellular and physiological mechanisms responsible for the association between variants upstream of KLF14 and type 2 diabetes. Our recent work has demonstrated that KLF14 is a master regulator of expression in adipose tissue, and has implicated this KLF14-regulated transcriptional network in the genesis of insulin resistance in remote tissues such as muscle and liver.

The research will seek to understand the causes and consequences of altered KLF14 expression using a variety of complementary approaches that include:
(a) human genetics: resequencing, fine-mapping and epigenomic analyses;
(b) integrative physiology: detailed metabolic analyses and genomic studies and their relationship to genotype; and
(c) cellular studies: cellular phenotyping and genomic studies in adipocytes following knockdown and overexpression of KLF14 and selected trans-genes.

The work has four main aims:
(a) To characterise the molecular, cellular and physiological consequences of altered KLF14 expression;
(b) To identify the trans-genes mediating the effect of KLF14 on insulin resistance and type 2 diabetes;
(c) To define the cellular and physiological mechanisms whereby altered KLF14 expression in adipose tissue leads to generalised insulin resistance and T2D;
(d) To characterise the mechanisms whereby common variants upstream of KLF14 influence KLF14 expression.

Our research will, by elucidating the mechanisms linking KLF14 sequence variation to diabetes predisposition, provide valuable, generic insights into the consequences of adipocyte dysfunction for whole-body physiology. The research can be expected to define novel, causally validated targets that are substrates for therapeutic and biomarker development, and the application sets out some of the strategies that we plan to deploy to support those translational goals.

Planned Impact

The principal beneficiaries of the research will be:
a) academics in the fields outlined in the "academic beneficiaries" section;
b) industry and biotechnology companies, in a position to exploit the improved biological understanding we seek to provide to develop novel products and services (see below);
c) the public sector (NHS, policy-makers), in the event that the research generates translational advances that provide more cost-effective means of treating and/or preventing diabetes and other insulin-resistance related diseases (e.g. hyperlipidaemias);
d) the wider public, if those translational advances provide more acceptable, more effective strategies for the treatment and prevention of those conditions.

The academic benefits will be manifest through:
a) the generation of new knowledge related to diabetes pathogenesis, which has the potential to contribute to amelioration of the social, economic and personal costs of the "epidemic" of global diabetes;
b) the development of research models and molecules of value for academic research;
c) bolstering of research in human integrative physiology (given concerns about declining expertise);
d) improved training of researchers in the specific areas of research activity, and in the development of cross-disciplinary expertise.

The broader economic and social impact will be manifest through:
a) economic benefits to pharma and biotechnology companies (including "spin-outs" with potential for attracting "inwards" investment) able to exploit actionable translational opportunities with respect to the development of novel therapeutic approaches (building on targets we validate) or clinically-useful biomarkers (building on candidates we identify). Given the scale of the global problem and the inadequacy of current therapeutic and preventative options, the opportunities for wealth creation are substantial;
b) improved effectiveness of public services if the biological insights result in better ways of treating and preventing T2D and related conditions (novel treatments, better diagnostics, improved strategies for stratifying risk and response to interventions);
c) transformation of public policy if the research leads, over time, to improved public health strategies for the prevention of T2D;
d) improved health outcomes (less diabetes-related morbidity and mortality, fewer diabetes complications) if the work leads to effective clinical translation, resulting in further personal, social and economic benefits.

It is of course important to be realistic about the timelines for effective clinical translation: too often expectations in this regard are unrealistic. In practice, the time from "new biology" to "novel treatment" involves years of biological validation, target characterisation, lead molecule optimisation, and clinical evaluation. As is well-known, substantial attrition is typical at each stage. On the positive side:
a) we have recently shown (with the validation of novel biomarkers for monogenic forms of diabetes) that it is possible to move rapidly from genetic discovery to clinical utility, at least where biomarkers are concerned;
b) the massive unmet clinical need and the scale of the global problem with respect to diabetes and insulin resistance will support investments that would not be economic for other diseases;
c) the origin of our research in genetic association data means that the benefits of modulating the KLF14 network in man have intrinsic causal validation; it also makes it possible to explore the whole-body effects of such modulation (including potential "on-target" side-effects) through human genetic and physiological studies, thereby helping to minimise attrition;
d) we are well-equipped, via the Target Discovery Institute in Oxford, to initiate high-throughput screens for potential small molecule modulators of this pathway, and to do so in parallel with some of the biological validation described here, thereby expediting progress.

Publications

 
Description Identification and functional evaluation of genetic and epigenetic determinants of human fat distribution; investigations to understand the cardio-protective effect of lower body adiposity.
Amount £779,060 (GBP)
Funding ID RG/17/1/32663 
Organisation British Heart Foundation (BHF) 
Sector Charity/Non Profit
Country United Kingdom
Start 06/2017 
End 06/2023
 
Description Innovative Medicines Initiative BEATDKD
Amount € 15,000,000 (EUR)
Funding ID BEATDKD 
Organisation European Commission 
Department Innovative Medicines Initiative (IMI)
Sector Public
Country Belgium
Start 08/2017 
End 08/2022
 
Description Innovative Medicines Initiative RHAPSODY
Amount € 8,000,000 (EUR)
Funding ID 115881 
Organisation European Commission 
Department Innovative Medicines Initiative (IMI)
Sector Public
Country Belgium
Start 03/2016 
End 03/2020
 
Description MRC Experimental Challenges Grant
Amount £2,400,000 (GBP)
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 08/2014 
End 08/2018
 
Description NIH R01
Amount $1,900,000 (USD)
Funding ID MH101814 
Organisation National Institutes of Health (NIH) 
Department National Institute of Mental Health (NIMH)
Sector Public
Country United States
Start 08/2013 
End 08/2016
 
Description NIH R01
Amount $2,400,000 (USD)
Funding ID DK098032 
Organisation National Institutes of Health (NIH) 
Department National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)
Sector Public
Country United States
Start 08/2012 
End 08/2016
 
Description NovoNordisk Funden Immunometabolism
Amount 12,000,000 kr. (DKK)
Funding ID TRiiC 
Organisation Novo Nordisk Foundation 
Sector Charity/Non Profit
Country Denmark
Start 04/2016 
End 04/2020
 
Description Program grant
Amount £1,700,000 (GBP)
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 02/2015 
End 01/2018
 
Description Project Grant- MRC KLF14
Amount £138,271 (GBP)
Funding ID MR/J0106421/1 
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 08/2012 
End 01/2015
 
Description RFP2 call for available datasets
Amount $2,400,000 (USD)
Organisation Foundation for the National Institutes of Health (FNIH) 
Sector Charity/Non Profit
Country United States
Start 03/2016 
End 08/2017
 
Description U01
Amount $1,800,000 (USD)
Funding ID DK105535 
Organisation National Institutes of Health (NIH) 
Department National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)
Sector Public
Country United States
Start 03/2015 
End 03/2019
 
Description Wellcome Investigator Award
Amount £2,250,000 (GBP)
Funding ID 212259/Z/18/Z 
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 11/2018 
End 10/2024
 
Description Wellcome Trust Strategic Award (WTCHG/WIMM)
Amount £2,400,000 (GBP)
Funding ID 106130/Z/14/Z 
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 01/2015 
End 12/2018
 
Title KLF14 knockout mice 
Description Global KLF14 knockout mouse 
Type Of Material Model of mechanisms or symptoms - mammalian in vivo 
Provided To Others? No  
Impact No recapitulation of the human phenotype in this mouse 
 
Title DeepCytometer pipeline parameter files, Klf14 mouse white adipose tissue histology and hand-traced training contours 
Description Latest description of this data set: Data.md at cytometer project
# Publications related to the data

The data associated with the DeepCytometer project (https://github.com/MRC-Harwell/cytometer) is available from Zenodo (doi: 10.5281/zenodo.5137433 and 10.5281/zenodo.5149005).

The histology and mouse measures were generated as part of the Small et al. 2018 study:

> Small et al. "Regulatory variants at KLF14 influence type 2 diabetes risk via a female-specific effect on adipocyte size and body composition". Nature Genetics, 50:572-580, 2018.

The hand-traced data set, colour maps, and automatic segmentations were generated for the Casero et al. 2021 paper:

> Casero et al. "Phenotyping of Klf14 mouse white adipose tissue enabled by whole slide segmentation with deep neural networks". bioRxiv, 2021. doi: [10.1101/2021.06.03.444997](https://www.biorxiv.org/content/10.1101/2021.06.03.444997v1.full).

# Data protocols

## Histology and laboratory measures

To develop and evaluate our methods we used Klf14tm1(KOMP)Vlcg C57BL/6NTac (B6NTac) mice tissue samples and additional data generated as part of the Small et al. 2018 study (Small et al. 2018). It should be noted that the single exon Klf14 gene is imprinted and only expressed from the maternally inherited allele (Parker-Katiraee et al. 2007). This was taken into account by Small et al. (2018) by crossing a Het parent with a WT parent, so that each offspring inherited a WT allele from the WT parent, and the Klf14 gene knockout or a WT allele from the other parent (from the father, PAT, or the mother, MAT). We also take Klf14 imprinting into account by using as controls the PAT mice and comparing them to the MAT WT and MAT Het (or functional KO, FKO) mice. We used a total of 76 Klf14-B6NTac mice (n_female = n_male = 38), of which 20 mice from the Control and FKO groups were used for training and testing the DeepCytometer pipeline, as well as the hand-traced population experiment (summary in Table MICE).

The histopathology screen involved fixing, processing, embedding in wax, sectioning, and staining with Hematoxylin and Eosin (H&E) both inguinal subcutaneous and gonadal adipose depots. For paraffin-embedded sections, all samples were fixed in 10% neutral buffered formalin (Surgipath) for at least 48 hours at RT and processed using an Excelsior™ AS Tissue Processor (Thermo Scientific). Samples were embedded in molten paraffin wax and 8 µm sections were cut through the respective depots using a Finesse™ ME+ microtome (Thermo Scientific). Sampling was conducted at 2 sections per slide, 3 slides per depot block onto simultaneous charged slides, stained with haematoxylin Gill 3 and eosin (Thermo Scientific) and scanned using an NDP NanoZoomer Digital pathology scanner (RS C10730 Series; Hamamatsu). Body weight (BW) and depot weight (DW) were measured with Sartorius BAL7000 scales.

## White adipose tissue segmentation

For cell area quantification, we applied DeepCytometer v8 (with the Corrected method) to 75 inguinal subcutaneous and 72 gonadal whole histology slides, including the 20 slides sampled for the hand-traced data set, corresponding to 73 females and 74 males, to produce 2,560,067 subcutaneous and 2,467,686 gonadal cells (on average, 34,134 and 34,273 cells per slide, respectively). Full segmentation of all whole slides was performed with script [klf14_b6ntac_exp_0106_full_slide_pipeline_v8.py](https://github.com/MRC-Harwell/cytometer/blob/39358ed1d79df07d1d522b98728c7efd745513f7/scripts/klf14_b6ntac_exp_0106_full_slide_pipeline_v8.py). In this case, the segmentation contours were grouped by tiles in the output AIDA annotation `.json` file (one contour per cell, one file per slide). Non-white adipocyte contours were filtered out, and white adipocyte contours were aggregated into an AIDA annotation `.json` file with a single tile with script [klf14_b6ntac_exp_0106_annotations_postprocessing_v8.py](https://github.com/MRC-Harwell/cytometer/blob/39358ed1d79df07d1d522b98728c7efd745513f7/scripts/klf14_b6ntac_exp_0106_annotations_postprocessing_v8.py) (one contour per cell, one file per slide).

# List of directories and files

## Casero et al. (2021) "DeepCytometer pipeline parameter files, Klf14 mouse white adipose tissue histology and hand-traced training contours" (doi: 10.5281/zenodo.5137433)

### `deepcytometer_pipeline_v8.zip` (60.6 MB)

Weights, colourmaps, etc. necessary to run the pipeline (v8, with mode colour correction). This is the version of the pipeline described in the paper.

There are 10 weight files per convolutional neural network (CNN), corresponding to 10-fold cross-validation:

* `klf14_b6ntac_exp_0086_cnn_dmap_model_fold_[0..9].h5`: Keras weights for the **EDT CNN** (Histology to Euclidean Distance Transform regression)
* `klf14_b6ntac_exp_0089_cnn_segmentation_correction_overlapping_scaled_contours_model_fold_[0..9].h5`: Keras weights for the **Correction CNN** (Segmentation Correction regression)
* `klf14_b6ntac_exp_0091_cnn_contour_after_dmap_model_fold_[0..9].h5`: Keras weights for the **Contour CNN** (EDT to Contour detection)
* `klf14_b6ntac_exp_0095_cnn_tissue_classifier_fcn_model_fold_[0..9].h5`: Keras weights for the **Tissue CNN** (Pixel-wise tissue classifier)
* `klf14_b6ntac_exp_0094_generate_extra_training_images.pickle`: training dataset description
    * **'file_list'**: list of SVG files with hand-traced contours for network training. Each SVG file has a corresponding TIFF file with the histology used for segmentation
    * **'idx_test'**: 10 lists with file indices for testing in 10-fold cross-validation
    * **'idx_train'**: 10 lists with file indices for training in 10-fold cross-validation
    * **'fold_seed'**: seed number used for the random number generator to assign file indices to folds
* `klf14_b6ntac_exp_0098_filename_area2quantile.npz`: quantile colour maps calculated in `klf14_b6ntac_exp_0098_full_slide_size_analysis_v7.py` using the whole Klf14 data set with v7 of the pipeline, and used in earlier experiments, including some where v8 of the pipeline was used for segmentation.
* `klf14_b6ntac_exp_0106_filename_area2quantile_v8.npz`: quantile colour maps calculated in `klf14_b6ntac_exp_0106_full_slide_pipeline_v8.py` using the whole Klf14 data set with v8 of the pipeline, and used in later experiments.
* `klf14_training_colour_histogram.npz`: statistics from Klf14 histology images to be used in colour correction
    * **'xbins_edge'**, **'xbins'**: edges and centres of the bins used for histogram calculations
    * **'hist_r_q1'**, **'hist_r_q2'**, **'hist_r_q3'**
    * **'hist_g_q1'**, **'hist_g_q2'**, **'hist_g_q3'**
    * **'hist_b_q1'**, **'hist_b_q2'**, **'hist_b_q3'**: density quartiles (Q1, Q2, Q3) for RGB channels for each bin of the histogram
    * **'mode_r'**, **'mode_g'**, **'mode_b'**: modes for RGB channels (this corresponds to the most typical background colour in the histology images)
    * **'mean_l'**, **'mean_a'**, **'mean_b'**: mean intensity for L*a*b channels of the image
    * **'std_l'**, **'std_a'**, **'std_b'**: intensity standard deviations for L*a*b channels of the image
* `klf14_exp_0112_training_colour_histogram.npz`: other statistics from Klf14 histology images to be used in colour correction
    * **'p'**: vector of quantile values used in ECDF calculations
    * **'val_r_klf14'**, **'val_g_klf14'**, **'val_b_klf14'**: all intensity values for the RGB channels of Klf14 training images that contain at least a white adipocyte
    * **'f_ecdf_to_val_r_klf14'**, **'f_ecdf_to_val_g_klf14'**, **'f_ecdf_to_val_b_klf14'**: linear interpolation function that maps ECDF quantiles to intensity values in the Klf14 training data set. These functions can be used together with intensity->quantile interpolation functions calculated for a new histology image to perform histogram matching colour correction
    * **'mean_klf14'**, **'std_klf14'**: mean and standard deviation of the **'val_r_klf14'**, **'val_g_klf14'**, **'val_b_klf14'** vectors

There are also weight files for the pipeline trained with all the data, instead of the 10-fold cross-validation partition. These were not used for the paper, but could be useful for future experiments:

* `klf14_b6ntac_exp_0101_cnn_dmap_model.h5`: Keras weights for the **EDT CNN** (Histology to Euclidean Distance Transform regression)
* `klf14_b6ntac_exp_0104_cnn_segmentation_correction_overlapping_scaled_contours_model.h5`: Keras weights for the **Correction CNN** (Segmentation Correction regression)
* `klf14_b6ntac_exp_0102_cnn_contour_after_dmap_model.h5`: Keras weights for the **Contour CNN** (EDT to Contour detection)
* `klf14_b6ntac_exp_0103_cnn_tissue_classifier_fcn_model.h5`: Keras weights for the **Tissue CNN** (Pixel-wise tissue classifier)

### `histology.7z` (29.1 GB)

165 H&E histology whole slides from Hamamatsu scanner (`.ndpi`).

### `klf14.7z` (2.3 GB)

Mice metadata, training/testing data sets for the pipeline, intermediate files created during training, and neural network weights for multiple experiments.

* `klf14_b6ntac_meta_info.csv`: Klf14 mice metadata
    * **Animal Identifier**, **id:** unique ID for each mouse
    * **ko_parent:** heterozygous parent of origin for the KO allele (father, PAT or mother, MAT)
    * **sex:** female or male
    * **genotype:** wild type (KLF14-KO:WT) or heterozygous (KLF14-KO:Het)
    * **BW:** body weight (g)
    * **SC:** subcutaneous depot weight (g)
    * **gWAT:** gonadal depot weight (g)
    * **Liver:** liver weight (g)
    * **cull_age:** age at time of culling (days)
    * **BW_alive:** body weight measured before culling
    * **BW_alive_date:** age at time of BW_alive measure
    * **mother:** unique ID for mouse's mother
    * **mother_genotype:** mouse's mother genotype
* `klf14_b6ntac_training`: Directory with hand-traced segmentations of training histology windows. 131 windows sampled from 20 whole slides, plus hand-traced contours that were used for training DeepCytometer and computing population distributions. These segmentations were used for CNN training, but note that there's a cleaned-up version of these data below, and it was the cleaned-up version that was used for the paper experiments
    * `ndpifile_row_YYYYYY_col_XXXXXX[.tif/.xcf/.svg]`:
        * **ndpifile:** name of the whole slide file (e.g. `KLF14-B6NTAC 36.1c PAT 98-16 C1 - 2016-02-11 10.45.00`)
        * **row_YYYYYY:** Y-coordinate of the top-left corner of the sampling window, in pixels
        * **col_XXXXXX:** X-coordinate of the top-left corner of the sampling window, in pixels
        * **.tif:** TIFF file with the histology sampling window
        * **.xcf:** Gimp file with the histology and hand-traced contours (the contours were drawn in Gimp)
        * **.svg:** SVG (Scalable Vector Graphics) that contains the hand-traced contours in the XCF file
* `klf14_b6ntac_training_v2`: Same as `klf14_b6ntac_training`, but the hand-traced data set was cleaned up to remove small contours of dubious cells, or cells that are fully overlapped by others
* `klf14_b6ntac_training_non_overlap`: Directory with intermediate images to train the networks. These images are generated by script [`klf14_b6ntac_exp_0077_generate_non_overlap_training_images.py`](https://github.com/MRC-Harwell/cytometer/blob/main/scripts/klf14_b6ntac_exp_0077_generate_non_overlap_training_images.py)
* `klf14_b6ntac_training_augmented`: Directory with intermediate images used to train the networks (using augmentation to reduce overfitting). These images are generated by script [`klf14_b6ntac_exp_0078_generate_augmented_training_images.py`](https://github.com/MRC-Harwell/cytometer/blob/main/scripts/klf14_b6ntac_exp_0078_generate_augmented_training_images.py)
* `klf14_b6ntac_seg`: Deprecated. Directory to store whole slide coarse segmentations in old experiments (e.g. `klf14_b6ntac_exp_0076_generate_training_images.py`). Of little interest for most users
* `klf14_b6ntac_results`: Deprecated. Directory to store miscellaneous output from some experiments. Of little interest for most users

## Casero et al. (2021). "Klf14 mouse white adipose tissue histology DeepZoom files and AIDA annotations for visualisation of DeepCytometer white adipocyte segmentations" (doi: 10.5281/zenodo.5149005)

### `aida_data_Klf14_v8_images.7z` (16.9 GB)

Histology images converted to DeepZoom so that they can be visualised with [AIDA](https://github.com/alanaberdeen/AIDA).

To use this, decompress this file and put the resulting `images` directory in your `AIDA/dist/data/` directory.

### `aida_data_Klf14_v8_annotations.7z` (18 GB)

White adipocyte segmentations in AIDA annotation `.json` files (one contour per cell, one file per whole slide). Each slide has the following files:

* `SLIDENAME.json`: Soft link to the annotations file that we want to associate to slide `SLIDENAME.ndpi`, e.g. `SLIDENAME` = `KLF14-B6NTAC-PAT-39.2d 454-16 B1 - 2016-03-17 12.16.06`
* `SLIDENAME.lock`: Empty file used to tell the pipeline that `SLIDENAME.ndpi` has already been processed or is being currently processed
* `SLIDENAME_coarse_mask.npz`: File with the coarse tissue segmentation of `SLIDENAME.ndpi` and the internal state of the pipeline (execution times, steps, etc)
* `SLIDENAME_exp_0106_auto.json`: Annotations (all segmentations without filtering from the Auto algorithm, i.e. segmentation without object overlap). Contours are grouped by the tile they were processed in
* `SLIDENAME_exp_0106_auto_aggregated.json`: Filtered annotations (non-white adipocytes removed) of the Auto algorithm. All contours aggregated into a single tile
* `SLIDENAME_exp_0106_corrected.json`: Annotations (all segmentations without filtering from the Corrected algorithm, i.e. segmentation with object overlap). Contours are grouped by the tile they were processed in
* `SLIDENAME_exp_0106_corrected_aggregated.json`: Filtered annotations (non-white adipocytes removed) of the Corrected algorithm. All contours aggregated into a single tile

To use this, decompress this file and put the resulting `annotations` directory in your `AIDA/dist/data/` directory.
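
The histogram matching colour correction mentioned for `klf14_exp_0112_training_colour_histogram.npz` (an intensity->quantile map for the new image, followed by a quantile->intensity map for the Klf14 training data) can be illustrated with a minimal NumPy sketch. This is not the DeepCytometer implementation: the function name `match_channel` is hypothetical, the reference intensities are synthetic stand-ins rather than values loaded from the `.npz` file, and linear interpolation with `np.interp` is assumed for both maps.

```python
import numpy as np


def match_channel(channel, p, ref_val_at_p):
    """Histogram-match one image channel to a reference distribution.

    channel      : 2-D array with the intensities of the new image channel
    p            : increasing vector of quantiles in [0, 1]
    ref_val_at_p : reference intensities at the quantiles in p
                   (quantile -> intensity map of the reference data)
    """
    # Intensity -> quantile map for the new image (its empirical CDF,
    # sampled at the quantiles in p and linearly interpolated).
    new_val_at_p = np.quantile(channel, p)
    quantiles = np.interp(channel.ravel(), new_val_at_p, p)

    # Quantile -> intensity map of the reference data set.
    matched = np.interp(quantiles, p, ref_val_at_p)
    return matched.reshape(channel.shape)


# Synthetic stand-ins (assumptions): in practice the reference values
# would come from the training data set, one vector per RGB channel.
rng = np.random.default_rng(0)
p = np.linspace(0.0, 1.0, 101)
ref_values = rng.normal(200.0, 10.0, size=10_000)
ref_val_at_p = np.quantile(ref_values, p)

# A "new" image channel with a different brightness and contrast.
new_channel = rng.normal(120.0, 30.0, size=(64, 64))
corrected = match_channel(new_channel, p, ref_val_at_p)
```

After correction, the channel follows approximately the same intensity distribution as the reference values while the rank order of the pixels is preserved. For an RGB image the same function would be applied to each channel separately.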
 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
URL https://zenodo.org/record/5137432
 
Title Klf14 mouse white adipose tissue histology DeepZoom files and AIDA annotations for visualisation of DeepCytometer white adipocyte segmentations 
Description Latest description of this data set: Data.md at cytometer project
# Publications related to the data The data associated to the DeepCytometer project (https://github.com/MRC-Harwell/cytometer) is available from Zenodo (doi: 10.5281/zenodo.5137433 and 10.5281/zenodo.5149005). The histology and mouse measures were generated as part of the Small et al. 2018 study: > Small et al. "Regulatory variants at KLF14 influence type 2 diabetes risk via a female-specific effect on adipocyte size and body composition". Nature Genetics, 50:572-580, 2018. The hand traced data set, colour maps, and automatic segmentations were generated for the Casero et al. 2021 paper: > Casero et al. "Phenotyping of Klf14 mouse white adipose tissue enabled by whole slide segmentation with deep neural networks". bioRxiv, 2021. doi: [10.1101/2021.06.03.444997](https://www.biorxiv.org/content/10.1101/2021.06.03.444997v1.full). # Data protocols ## Histology and laboratory measures To develop and evaluate our methods we used Klf14tm1(KOMP)Vlcg C57BL/6NTac (B6NTac) mice tissue samples and additional data generated as part of the Small et al. 2018 study(Small et al. 2018). It should be noted that the single exon Klf14 gene is imprinted and only expressed from the maternally inherited allele(Parker-Katiraee et al. 2007). This was taken into account by (Small et al. 2018) by crossing a Het parent with a WT parent, so that each offspring inherited a WT allele from the WT parent, and the Klf14 gene knockout or a WT allele from the other parent (from the father, PAT, or the mother, MAT). We also take Klf14 imprinting into account by using as controls the PAT mice and comparing them to the MAT WT and MAT Het (or functional KO, FKO) mice. We used a total of 76 Klf14-B6NTac mice (nfemale=nmale=38), of which 20 mice from the Control and FKO groups were used for training and testing the DeepCytometer pipeline, as well as the hand traced population experiment (summary in Table MICE). 
The histopathology screen involved fixing, processing and embedding in wax, sectioning and staining with Hematoxylin and Eosin (H&E) both inguinal subcutaneous and gonadal adipose depots. For paraffin-embedded sections, all samples were fixed in 10% neutral buffered formalin (Surgipath) for at least 48 hours at RT and processed using an Excelsior™ AS Tissue Processor (Thermo Scientific). Samples were embedded in molten paraffin wax and 8 µm sections were cut through the respective depots using a Finesse™ ME+ microtome (Thermo Scientific). Sampling was conducted at 2sxns per slide, 3 slides per depot block onto simultaneous charged slides, stained with haematoxylin Gill 3 and eosin (Thermo scientific) and scanned using an NDP NanoZoomer Digital pathology scanner (RS C10730 Series; Hamamatsu). Body weight (BW) and depot weight (DW) were measured with Satorius BAL7000 scales. ## White adipose tissue segmentation For cell area quantification, we applied DeepCytometer v8 to 75 inguinal subcutaneous and 72 gonadal whole histology slides with DeepCytometer (with the Corrected method), including the 20 slides sampled for the hand-traced data set, corresponding to 73 females and 74 males, to produce 2,560,067 subcutaneous and 2,467,686 gonadal cells (on average, 34,134 and 34,273 cells per slide, respectively). Full segmentation of all whole slides was performed with script [klf14_b6ntac_exp_0106_full_slide_pipeline_v8.py](https://github.com/MRC-Harwell/cytometer/blob/39358ed1d79df07d1d522b98728c7efd745513f7/scripts/klf14_b6ntac_exp_0106_full_slide_pipeline_v8.py). In this case, the segmentation contours were grouped by tiles in the output AIDA annotation `.json` file (one contour per cell, one file per slide). 
Non-white adipocyte contours were filtered out, and white adipocyte contours were aggregated into an AIDA annotation `.json` file with a single tile with script [klf14_b6ntac_exp_0106_annotations_postprocessing_v8.py](https://github.com/MRC-Harwell/cytometer/blob/39358ed1d79df07d1d522b98728c7efd745513f7/scripts/klf14_b6ntac_exp_0106_annotations_postprocessing_v8.py) (one contour per cell, one file per slide). # List of directories and files ## Casero et al. (2021) "DeepCytometer pipeline parameter files, Klf14 mouse white adipose tissue histology and hand-traced training contours" (doi: 10.5281/zenodo.5137433) ### `deepcytometer_pipeline_v8.zip` (60.6 MB) Weights, colourmaps, etc. necessary to run the pipeline (v8, with mode colour correction). This is the version of the pipeline described in the paper. There are 10 weight files per convolutional neural network (CNN), corresponding to 10-fold cross-validation * `klf14_b6ntac_exp_0086_cnn_dmap_model_fold_[0..9].h5`: Keras weights for the **EDT CNN** (Histology to Euclidean Distance Transform regression) * `klf14_b6ntac_exp_0089_cnn_segmentation_correction_overlapping_scaled_contours_model_fold_[0..9].h5`: Keras weights for the **Correction CNN** (Segmentation Correction regression) * `klf14_b6ntac_exp_0091_cnn_contour_after_dmap_model_fold_[0..9].h5`: Keras weights for the **Contour CNN** (EDT to Contour detection) * `klf14_b6ntac_exp_0095_cnn_tissue_classifier_fcn_model_fold_[0..9].h5`: Keras weights for the **Tissue CNN** (Pixel-wise tissue classifier) * `klf14_b6ntac_exp_0094_generate_extra_training_images.pickle`: training dataset description * **'file_list'**: list of SVG files with hand-traced contours for network training. 
    Each SVG file has a corresponding TIFF file with the histology used for segmentation
  * **'idx_test'**: 10 lists with file indices for testing in 10-fold cross-validation
  * **'idx_train'**: 10 lists with file indices for training in 10-fold cross-validation
  * **'fold_seed'**: seed number used for the random number generator to assign file indices to folds
* `klf14_b6ntac_exp_0098_filename_area2quantile.npz`: quantile colour maps calculated in `klf14_b6ntac_exp_0098_full_slide_size_analysis_v7.py` using the whole Klf14 data set with v7 of the pipeline, and used in earlier experiments, including some where v8 of the pipeline was used for segmentation.
* `klf14_b6ntac_exp_0106_filename_area2quantile_v8.npz`: quantile colour maps calculated in `klf14_b6ntac_exp_0106_full_slide_pipeline_v8.py` using the whole Klf14 data set with v8 of the pipeline, and used in later experiments.
* `klf14_training_colour_histogram.npz`: statistics from Klf14 histology images to be used in colour correction
  * **'xbins_edge'**, **'xbins'**: edges and centres of the bins used for histogram calculations
  * **'hist_r_q1'**, **'hist_r_q2'**, **'hist_r_q3'**
  * **'hist_g_q1'**, **'hist_g_q2'**, **'hist_g_q3'**
  * **'hist_b_q1'**, **'hist_b_q2'**, **'hist_b_q3'**: density quartiles (Q1, Q2, Q3) for RGB channels for each bin of the histogram
  * **'mode_r'**, **'mode_g'**, **'mode_b'**: modes for RGB channels (this corresponds to the most typical background colour in the histology images)
  * **'mean_l'**, **'mean_a'**, **'mean_b'**: mean intensity for L\*a\*b channels of the image
  * **'std_l'**, **'std_a'**, **'std_b'**: intensity standard deviations for L\*a\*b channels of the image
* `klf14_exp_0112_training_colour_histogram.npz`: other statistics from Klf14 histology images to be used in colour correction
  * **'p'**: vector of quantile values used in ECDF calculations
  * **'val_r_klf14'**, **'val_g_klf14'**, **'val_b_klf14'**: all intensity values for the RGB channels of Klf14 training images that contain at
    least one white adipocyte
  * **'f_ecdf_to_val_r_klf14'**, **'f_ecdf_to_val_g_klf14'**, **'f_ecdf_to_val_b_klf14'**: linear interpolation functions that map ECDF quantiles to intensity values in the Klf14 training data set. These functions can be used together with intensity->quantile interpolation functions calculated for a new histology image to perform histogram matching colour correction
  * **'mean_klf14'**, **'std_klf14'**: mean and standard deviation of the **'val_r_klf14'**, **'val_g_klf14'**, **'val_b_klf14'** vectors

There are also weight files for the pipeline trained with all the data, instead of the 10-fold cross-validation partition. These were not used for the paper, but could be useful for future experiments.

* `klf14_b6ntac_exp_0101_cnn_dmap_model.h5`: Keras weights for the **EDT CNN** (Histology to Euclidean Distance Transform regression)
* `klf14_b6ntac_exp_0104_cnn_segmentation_correction_overlapping_scaled_contours_model.h5`: Keras weights for the **Correction CNN** (Segmentation Correction regression)
* `klf14_b6ntac_exp_0102_cnn_contour_after_dmap_model.h5`: Keras weights for the **Contour CNN** (EDT to Contour detection)
* `klf14_b6ntac_exp_0103_cnn_tissue_classifier_fcn_model.h5`: Keras weights for the **Tissue CNN** (Pixel-wise tissue classifier)

### `histology.7z` (29.1 GB)

165 H&E histology whole slides from Hamamatsu scanner (`.ndpi`).

### `klf14.7z` (2.3 GB)

Mice metadata, training/testing data sets for the pipeline, intermediate files created during training, and neural network weights for multiple experiments.
* `klf14_b6ntac_meta_info.csv`: Klf14 mice metadata
  * **Animal Identifier**, **id:** unique ID for each mouse
  * **ko_parent:** heterozygous parent of origin for the KO allele (father, PAT or mother, MAT)
  * **sex:** female or male
  * **genotype:** wild type (KLF14-KO:WT) or heterozygous (KLF14-KO:Het)
  * **BW:** body weight (g)
  * **SC:** subcutaneous depot weight (g)
  * **gWAT:** gonadal depot weight (g)
  * **Liver:** liver weight (g)
  * **cull_age:** age at time of culling (days)
  * **BW_alive:** body weight measured before culling
  * **BW_alive_date:** age at time of BW_alive measure
  * **mother:** unique ID for mouse's mother
  * **mother_genotype:** mouse's mother genotype
* `klf14_b6ntac_training`: Directory with hand-traced segmentations of training histology windows. 131 windows sampled from 20 whole slides, plus hand-traced contours that were used to train DeepCytometer and compute population distributions. These segmentations were used for CNN training, but note that there's a cleaned-up version of these data below, and it was the cleaned-up version that was used for the paper experiments
  * `ndpifile_row_YYYYYY_col_XXXXXX[.tif/.xcf/.svg]`:
    * **ndpifile:** name of the whole slide file (e.g. `KLF14-B6NTAC 36.1c PAT 98-16 C1 - 2016-02-11 10.45.00`)
    * **row_YYYYYY:** Y-coordinate of the top-left corner of the sampling window, in pixels
    * **col_XXXXXX:** X-coordinate of the top-left corner of the sampling window, in pixels
    * **.tif:** TIFF file with the histology sampling window
    * **.xcf:** Gimp file with the histology and hand-traced contours (the contours were drawn in Gimp)
    * **.svg:** SVG (Scalable Vector Graphics) that contains the hand-traced contours in the XCF file
* `klf14_b6ntac_training_v2`: Same as `klf14_b6ntac_training`, but the hand-traced data set was cleaned up to remove small contours of dubious cells, or cells that are fully overlapped by others
* `klf14_b6ntac_training_non_overlap`: Directory with intermediate images to train the networks.
  These images are generated by script [`klf14_b6ntac_exp_0077_generate_non_overlap_training_images.py`](https://github.com/MRC-Harwell/cytometer/blob/main/scripts/klf14_b6ntac_exp_0077_generate_non_overlap_training_images.py)
* `klf14_b6ntac_training_augmented`: Directory with intermediate images used to train the networks (using augmentation to reduce overfitting). These images are generated by script [`klf14_b6ntac_exp_0078_generate_augmented_training_images.py`](https://github.com/MRC-Harwell/cytometer/blob/main/scripts/klf14_b6ntac_exp_0078_generate_augmented_training_images.py)
* `klf14_b6ntac_seg`: Deprecated. Directory to store whole slide coarse segmentations in old experiments (e.g. `klf14_b6ntac_exp_0076_generate_training_images.py`). Of little interest for most users
* `klf14_b6ntac_results`: Deprecated. Directory to store miscellaneous output from some experiments. Of little interest for most users

## Casero et al. (2021). "Klf14 mouse white adipose tissue histology DeepZoom files and AIDA annotations for visualisation of DeepCytometer white adipocyte segmentations" (doi: 10.5281/zenodo.5149005)

### `aida_data_Klf14_v8_images.7z` (16.9 GB)

Histology images converted to DeepZoom so that they can be visualised with [AIDA](https://github.com/alanaberdeen/AIDA). To use this, decompress this file and put the resulting `images` directory in your `AIDA/dist/data/` directory.

### `aida_data_Klf14_v8_annotations.7z` (18 GB)

White adipocyte segmentations in AIDA annotation `.json` files (one contour per cell, one file per whole slide). Each slide has the following files:

* `SLIDENAME.json`: Soft link to the annotations file that we want to associate to slide `SLIDENAME.ndpi`, e.g.
  `SLIDENAME` = `KLF14-B6NTAC-PAT-39.2d 454-16 B1 - 2016-03-17 12.16.06`
* `SLIDENAME.lock`: Empty file used to tell the pipeline that `SLIDENAME.ndpi` has already been processed or is being currently processed
* `SLIDENAME_coarse_mask.npz`: File with the coarse tissue segmentation of `SLIDENAME.ndpi` and the internal state of the pipeline (execution times, steps, etc)
* `SLIDENAME_exp_0106_auto.json`: Annotations (all segmentations without filtering from the Auto algorithm, i.e. segmentation without object overlap). Contours are grouped by the tile they were processed in
* `SLIDENAME_exp_0106_auto_aggregated.json`: Filtered annotations (non-white adipocytes removed) of the Auto algorithm. All contours aggregated into a single tile
* `SLIDENAME_exp_0106_corrected.json`: Annotations (all segmentations without filtering from the Corrected algorithm, i.e. segmentation with object overlap). Contours are grouped by the tile they were processed in
* `SLIDENAME_exp_0106_corrected_aggregated.json`: Filtered annotations (non-white adipocytes removed) of the Corrected algorithm. All contours aggregated into a single tile

To use this, decompress this file and put the resulting `annotations` directory in your `AIDA/dist/data/` directory.
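
The histogram matching colour correction mentioned for `klf14_exp_0112_training_colour_histogram.npz` can be illustrated with a minimal, dependency-free sketch. This is not the DeepCytometer implementation: the function names (`interp`, `quantile_values`, `match_channel`) and the toy intensities are assumptions made for illustration, and only the roles of **'p'** and **'val_r_klf14'** are taken from the file description. The idea is to map each intensity of a new image to its ECDF quantile, and then map that quantile to the intensity found at the same quantile in the Klf14 training data.

```python
from bisect import bisect_right


def interp(x, xp, fp):
    """Piecewise-linear interpolation of the points (xp, fp) at x.

    xp must be sorted in non-decreasing order; x is clamped to [xp[0], xp[-1]].
    """
    if x <= xp[0]:
        return fp[0]
    if x >= xp[-1]:
        return fp[-1]
    i = bisect_right(xp, x)  # guarantees xp[i - 1] <= x < xp[i]
    x0, x1 = xp[i - 1], xp[i]
    return fp[i - 1] + (fp[i] - fp[i - 1]) * (x - x0) / (x1 - x0)


def quantile_values(values, p):
    """Intensity found at each quantile in p (needs at least two values)."""
    s = sorted(values)
    positions = [k / (len(s) - 1) for k in range(len(s))]
    return [interp(q, positions, s) for q in p]


def match_channel(new_values, p, val_ref):
    """Histogram-match one colour channel to reference quantile values."""
    val_new = quantile_values(new_values, p)
    matched = []
    for v in new_values:
        q = interp(v, val_new, p)              # intensity -> quantile (new image)
        matched.append(interp(q, p, val_ref))  # quantile -> intensity (reference)
    return matched


# Toy data (hypothetical): the "new" channel is 50 intensity units darker
p = [k / 10 for k in range(11)]              # plays the role of 'p'
ref_channel = list(range(100, 201))          # stands in for 'val_r_klf14'
new_channel = [v - 50 for v in ref_channel]
val_ref = quantile_values(ref_channel, p)
corrected = match_channel(new_channel, p, val_ref)
```

In practice each RGB channel would be processed separately, and an array library would replace the pure-Python loops (for example, `numpy.interp` performs the same clamped linear interpolation).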
 
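
The naming convention of the training windows, `ndpifile_row_YYYYYY_col_XXXXXX[.tif/.xcf/.svg]`, can be parsed with a short regular expression. The sketch below is an illustration only: `parse_window_name` is a hypothetical helper, not part of DeepCytometer, and the row and column values in the example are invented (the slide name is the one quoted in the file list).

```python
import re
from pathlib import PurePosixPath

# <slide name>_row_<Y>_col_<X>.<tif|xcf|svg>
WINDOW_RE = re.compile(
    r"^(?P<ndpifile>.+)_row_(?P<row>\d+)_col_(?P<col>\d+)\.(?P<ext>tif|xcf|svg)$"
)


def parse_window_name(path):
    """Split a training window file name into slide name, row, column and extension.

    Returns None if the name does not follow the convention.
    """
    name = PurePosixPath(path).name  # drop any leading directories
    m = WINDOW_RE.match(name)
    if m is None:
        return None
    return {
        "ndpifile": m.group("ndpifile"),
        "row": int(m.group("row")),  # Y-coordinate of the top-left corner, in pixels
        "col": int(m.group("col")),  # X-coordinate of the top-left corner, in pixels
        "ext": m.group("ext"),
    }


# Hypothetical window coordinates appended to a real slide name
info = parse_window_name(
    "klf14_b6ntac_training/"
    "KLF14-B6NTAC 36.1c PAT 98-16 C1 - 2016-02-11 10.45.00_row_012345_col_006789.svg"
)
```

Because the slide names themselves contain dots and spaces, the pattern anchors on the `_row_` and `_col_` markers and the final extension rather than splitting on separators.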
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
URL https://zenodo.org/record/5149004
 
Description Adipocyte biology (Claussnitzer) 
Organisation Broad Institute
Country United States 
Sector Charity/Non Profit 
PI Contribution T2D association data
Collaborator Contribution Access to adipocyte genomic data
Impact none to date
Start Year 2018
 
Description Adipose tissue expression 
Organisation Icahn School of Medicine at Mount Sinai
Country United States 
Sector Academic/University 
PI Contribution Collaboration over KLF14 knockout mice
Collaborator Contribution Information from human and murine expression studies
Impact Information from their proprietary database
Start Year 2012
 
Description Bo Ahren collaboration 
Organisation Lund University
Country Sweden 
Sector Academic/University 
PI Contribution This relates to responding to reviewer comments on our Nature submission
Collaborator Contribution Sharing of data from adipose tissue biopsies made by Bo Ahren's group
Impact Sharing of data on adipocyte size from adipose biopsies
Start Year 2015
 
Description METSIM 
Organisation University of California, Los Angeles (UCLA)
Department School of Medicine UCLA
Country United States 
Sector Academic/University 
PI Contribution We collaborate on understanding the mechanisms underlying KLF14 effects on T2D
Collaborator Contribution They have contributed data from their studies
Impact Collaborative research
Start Year 2014
 
Description METSIM 
Organisation University of Eastern Finland
Country Finland 
Sector Academic/University 
PI Contribution We collaborate on understanding the mechanisms underlying KLF14 effects on T2D
Collaborator Contribution They have contributed data from their studies
Impact Collaborative research
Start Year 2014
 
Description Nobrega collaboration 
Organisation University of Chicago
Country United States 
Sector Academic/University 
PI Contribution This is a collaboration to furnish additional data to support revision of our main manuscript which has been reviewed at Nature.
Collaborator Contribution KLF14 ChIP-seq
Impact Marcelo Nobrega is completing studies that will add to our response to reviewers
Start Year 2016
 
Description STEMBANCC 
Organisation Eli Lilly & Company Ltd
Country United Kingdom 
Sector Private 
PI Contribution This is a new IMI collaboration (total funding 53M€ from IMI and in kind) to Oxford and 25 other academic institutions.
Collaborator Contribution We will provide clinical material (from patients), characterise stem cell-derived tissues and manage the diabetes work package
Impact Nil to date
Start Year 2012
 
Description STEMBANCC 
Organisation F. Hoffmann-La Roche AG
Country Global 
Sector Private 
PI Contribution This is a new IMI collaboration (total funding 53M€ from IMI and in kind) to Oxford and 25 other academic institutions.
Collaborator Contribution We will provide clinical material (from patients), characterise stem cell-derived tissues and manage the diabetes work package
Impact Nil to date
Start Year 2012
 
Description STEMBANCC 
Organisation Novo Nordisk
Country Denmark 
Sector Private 
PI Contribution This is a new IMI collaboration (total funding 53M€ from IMI and in kind) to Oxford and 25 other academic institutions.
Collaborator Contribution We will provide clinical material (from patients), characterise stem cell-derived tissues and manage the diabetes work package
Impact Nil to date
Start Year 2012
 
Description STEMBANCC 
Organisation Sanofi
Department Aventis
Country France 
Sector Private 
PI Contribution This is a new IMI collaboration (total funding 53M€ from IMI and in kind) to Oxford and 25 other academic institutions.
Collaborator Contribution We will provide clinical material (from patients), characterise stem cell-derived tissues and manage the diabetes work package
Impact Nil to date
Start Year 2012
 
Description The MUTHER (Multiple Tissues for Human Expression Resource) Consortium 
Organisation King's College Hospital
Country United Kingdom 
Sector Hospitals 
PI Contribution We have contributed to sample preparation, database design and to data analysis
Collaborator Contribution It has supported methods development, and provided samples and a collaborative network for analysis of expression and methylation data relevant to multiple human traits.
Impact We have assembled one of the largest tissue expression banks available and are instituting detailed analysis. Two papers have already been published in high profile journals (Nica, PLoS Genetics 2010; Small, Nature Genetics 2011) and further publications are in preparation as of Oct 2011.
Start Year 2007
 
Description The MUTHER (Multiple Tissues for Human Expression Resource) Consortium 
Organisation The Wellcome Trust Sanger Institute
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution We have contributed to sample preparation, database design and to data analysis
Collaborator Contribution It has supported methods development, and provided samples and a collaborative network for analysis of expression and methylation data relevant to multiple human traits.
Impact We have assembled one of the largest tissue expression banks available and are instituting detailed analysis. Two papers have already been published in high profile journals (Nica, PLoS Genetics 2010; Small, Nature Genetics 2011) and further publications are in preparation as of Oct 2011.
Start Year 2007
 
Description The MUTHER (Multiple Tissues for Human Expression Resource) Consortium 
Organisation University of Geneva
Country Switzerland 
Sector Academic/University 
PI Contribution We have contributed to sample preparation, database design and to data analysis
Collaborator Contribution It has supported methods development, and provided samples and a collaborative network for analysis of expression and methylation data relevant to multiple human traits.
Impact We have assembled one of the largest tissue expression banks available and are instituting detailed analysis. Two papers have already been published in high profile journals (Nica, PLoS Genetics 2010; Small, Nature Genetics 2011) and further publications are in preparation as of Oct 2011.
Start Year 2007
 
Description UCLA collaboration on KLF14 
Organisation University of California, Los Angeles (UCLA)
Department School of Medicine UCLA
Country United States 
Sector Academic/University 
PI Contribution Sharing of data regarding role of KLF14
Collaborator Contribution Sharing of data regarding role of KLF14
Impact None so far
Start Year 2013
 
Description UPenn collaboration re KLF14 
Organisation University of Pennsylvania
Country United States 
Sector Academic/University 
PI Contribution Sharing of data regarding role of KLF14
Collaborator Contribution Sharing of data regarding role of KLF14
Impact None as yet
Start Year 2015
 
Description American Society of Nephrology, meeting, New Orleans 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Workshop and talk at ASN 2017
Year(s) Of Engagement Activity 2017
 
Description Conference on personalised nutrition, Shanghai, China
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact International conference on personalised nutrition organised by Chinese colleagues
Year(s) Of Engagement Activity 2017
 
Description East Meets West Conference Hong Kong 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Lecture, debates, discussions about diabetes genetics in East Asia and beyond
Year(s) Of Engagement Activity 2016
 
Description Genomics for Clinicians meeting 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact I gave a presentation at this 4-day workshop on the value of diabetes genetics in genomic medicine
Year(s) Of Engagement Activity 2017
 
Description International Diabetes Federation, Abu Dhabi
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation at IDF meeting
Year(s) Of Engagement Activity 2017
 
Description Leena Peltonen School of Human Genetics
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Type Of Presentation Workshop Facilitator
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact I co-organise this meeting, which is attended by 20+ PhD students each year. They present to, and interact with, 20+ senior academics, providing a unique environment.

Feedback from participants was overwhelmingly positive.
Year(s) Of Engagement Activity 2009,2010,2011,2012,2013,2014,2015,2016
 
Description Personalised genomics 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Public debate; conference presentations

Improved understanding of the issues by the audience
Year(s) Of Engagement Activity 2009,2012,2013,2015,2016
 
Description Scientific presentations and seminars 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Type Of Presentation Keynote/Invited Speaker
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Data from these two grants has been presented at a large number of international meetings, including the American Diabetes Association, the American Society of Human Genetics, Genomics of Common Diseases and other meetings (approximately 20 a year)

Large audiences
Year(s) Of Engagement Activity 2012,2013,2015,2016
 
Description Chinese Diabetes Society, Chongqing
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Presentation to several thousand people at Chinese Diabetes Association
Year(s) Of Engagement Activity 2017