KLF14, adipose dysfunction, insulin resistance and type 2 diabetes: from genetic discovery to biological mechanisms and translation.

Lead Research Organisation: University of Oxford

Department Name: RDM OCDEM

Abstract

Type 2 diabetes is a major and growing cause of illness and death across the world. However, incomplete understanding of the processes involved in the development of this condition acts as a barrier to the development of better ways for treating and preventing the disease.

In the past few years, collaborative efforts in human genetic discovery have identified over fifty positions in the human genome where individual sequence variation is associated with risk of type 2 diabetes. In principle, each of these genetic "signals" holds important clues to the mechanisms that are responsible for the maintenance of normal metabolic health. However, in most of these regions, we have yet to define the specific genetic variants responsible for the signal, or the particular genes through which the effects on diabetes-risk are mediated. As a result, the pace of biological insights has lagged behind that of genetic discovery. A key challenge for the field is to develop strategies that enable genetic signals such as these to be mined for the biological clues they can provide.

We recently demonstrated that one of these diabetes signals maps near a gene named KLF14, and that its effect on risk of diabetes is driven by a reduced ability of insulin-sensitive tissues, including muscle and liver, to respond to insulin. We have also shown that, in fat tissue, the same variants near KLF14 influence the expression levels of a wide range of genes, including KLF14 itself. These data are consistent with a model whereby KLF14 acts as a major regulator of events in fat tissue, with altered KLF14 levels leading, through as yet unspecified mechanisms, to peripheral insulin resistance and type 2 diabetes.

The overall aim of this research is to define the molecular, cellular and physiological mechanisms which are responsible for the relationship between KLF14 and type 2 diabetes. To do this, we will pursue a number of complementary research strategies which include studies of human genetics and physiology and cellular studies in fat cells.

More specifically, our work aims to:
(a) define the specific sequence variants that are responsible for all of these effects, and how they influence levels of KLF14 within fat cells;
(b) characterise the suite of genes that are turned on and off by KLF14, and understand the consequences of those secondary changes on the function of fat tissue; and
(c) understand how it is that these KLF14-dependent changes in fat tissue lead to changes in the ability of remote tissues (such as muscle and liver) to respond to insulin.

We expect that, by focusing on the role of KLF14 and the ways in which it leads to an increased risk of diabetes, we will gain valuable, generic insights into the mechanisms whereby events in fat have widespread metabolic effects in other tissues. An answer to this question would help us improve our understanding of the connections between obesity and diabetes, an issue of supreme importance given that much of the increase in diabetes prevalence around the world is driven by changes in the amount, distribution and function of fat.

The research provides an opportunity to capitalise on recent discoveries in human genetics to advance understanding of disease biology. In the application, we explain how the biological information generated by this project can be exploited in directly translational studies to define novel approaches for treating, preventing, diagnosing and monitoring type 2 diabetes.

Technical Summary

The major objective of this proposal is to define the molecular, cellular and physiological mechanisms responsible for the association between variants upstream of KLF14 and type 2 diabetes. Our recent work has demonstrated that KLF14 is a master regulator of expression in adipose tissue, and implicates this KLF14-regulated transcriptional network in the genesis of insulin resistance in remote tissues such as muscle and liver.

The research will seek to understand the causes and consequences of altered KLF14 expression using a variety of complementary approaches that include:
(a) human genetics: resequencing, fine-mapping and epigenomic analyses;
(b) integrative physiology: detailed metabolic analyses and genomic studies and their relationship to genotype; and
(c) cellular studies: cellular phenotyping and genomic studies in adipocytes following knockdown and overexpression of KLF14 and selected trans-genes.

The work has four main aims:
(a) To characterise the molecular, cellular and physiological consequences of altered KLF14 expression;
(b) To identify the trans-genes mediating the effect of KLF14 on insulin resistance and type 2 diabetes;
(c) To define the cellular and physiological mechanisms whereby altered KLF14 expression in adipose tissue leads to generalised insulin resistance and T2D;
(d) To characterise the mechanisms whereby common variants upstream of KLF14 influence KLF14 expression.

Our research will, by elucidating the mechanisms linking KLF14 sequence variation to diabetes predisposition, provide valuable, generic insights into the consequences of adipocyte dysfunction for whole-body physiology. The research can be expected to define novel, causally validated targets that are substrates for therapeutic and biomarker development, and the application sets out some of the strategies that we plan to deploy to support those translational goals.

Planned Impact

The principal beneficiaries of the research will be:
a) academics in the fields outlined in the "academic beneficiaries" section;
b) industry and biotechnology companies, in a position to exploit the improved biological understanding we seek to provide to develop novel products and services (see below);
c) the public sector (NHS, policy-makers), in the event that the research generates translational advances that provide more cost-effective means of treating and/or preventing diabetes and other insulin-resistance related diseases (e.g. hyperlipidaemias);
d) the wider public, if those translational advances provide more acceptable, more effective strategies for the treatment and prevention of those conditions.

The academic benefits will be manifest through:
a) the generation of new knowledge related to diabetes pathogenesis, which has the potential to contribute to amelioration of the social, economic and personal costs of the "epidemic" of global diabetes;
b) the development of research models and molecules of value for academic research;
c) bolstering of research in human integrative physiology (given concerns about declining expertise);
d) improved training of researchers in the specific areas of research activity, and in the development of cross-disciplinary expertise.

The broader economic and social impact will be manifest through:
a) economic benefits to pharma and biotechnology companies (including "spin-outs" with potential for attracting "inwards" investment) able to exploit actionable translational opportunities with respect to the development of novel therapeutic approaches (building on targets we validate) or clinically-useful biomarkers (building on candidates we identify). Given the scale of the global problem and the inadequacy of current therapeutic and preventative options, the opportunities for wealth creation are substantial;
b) improved effectiveness of public services if the biological insights result in better ways of treating and preventing T2D and related conditions (novel treatments, better diagnostics, improved strategies for stratifying risk and response to interventions);
c) transformation of public policy if the research leads, over time, to improved public health strategies for the prevention of T2D;
d) improved health outcomes (less diabetes-related morbidity and mortality, fewer diabetes complications) if the work leads to effective clinical translation, resulting in further personal, social and economic benefits.

It is of course important to be realistic about the timelines for effective clinical translation: too often, expectations in this regard are unrealistic. In practice, the path from "new biology" to "novel treatment" involves years of biological validation, target characterisation, lead molecule optimisation, and clinical evaluation. As is well known, substantial attrition is typical at each stage. On the positive side:
a) we have recently shown (with the validation of novel biomarkers for monogenic forms of diabetes) that it is possible to move rapidly from genetic discovery to clinical utility, at least where biomarkers are concerned;
b) the massive unmet clinical need and the scale of the global problem with respect to diabetes and insulin resistance will support investments that would not be economic for other diseases;
c) because our research originates in genetic association data, the benefits of modulating the KLF14 network in man have intrinsic causal validation; this origin also makes it possible to explore the whole-body effects of such modulation (including potential "on-target" side-effects) through human genetic and physiological studies, thereby helping to minimise attrition;
d) we are well-equipped, via the Target Discovery Institute in Oxford, to initiate high-throughput screens for potential small molecule modulators of this pathway, and to do so in parallel with some of the biological validation described here, thereby expediting progress.

Funded Value:

£605,221

Funded Period:

Sep 12 - Aug 15

Funder:

MRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

MR/J010642/1

Principal Investigator:

Mark Maccarthy

Health Category:

Unclassified

Organisations

People	ORCID iD
Mark Maccarthy (Principal Investigator)
Roger Cox (Co-Investigator)
Fredrik Karpe (Co-Investigator)
Kerrin Shannon Small (Co-Investigator)	http://orcid.org/0000-0003-4566-0005

Publications


Barroso I (2019) The Genetic Basis of Metabolic Disease. in Cell

Civelek M (2017) Genetic Regulation of Adipose Gene Expression and Cardio-Metabolic Traits. in American journal of human genetics

Loh N (2020) RSPO3 impacts body fat distribution and regulates adipose cell biology in vitro in Nature Communications

Lu Y (2016) New loci for body fat percentage reveal link between adiposity and cardiometabolic disease risk. in Nature communications

Small KS (2018) Regulatory variants at KLF14 influence type 2 diabetes risk via a female-specific effect on adipocyte size and body composition. in Nature genetics

Visscher PM (2017) 10 Years of GWAS Discovery: Biology, Function, and Translation. in American journal of human genetics

Further Funding


Description	Identification and functional evaluation of genetic and epigenetic determinants of human fat distribution; investigations to understand the cardio-protective effect of lower body adiposity.
Amount	£779,060 (GBP)
Funding ID	RG/17/1/32663
Organisation	British Heart Foundation (BHF)
Sector	Charity/Non Profit
Country	United Kingdom
Start	06/2017
End	06/2023


Description	Innovative Medicines Initiative BEATDKD
Amount	€ 15,000,000 (EUR)
Funding ID	BEATDKD
Organisation	European Commission
Department	Innovative Medicines Initiative (IMI)
Sector	Public
Country	Belgium
Start	08/2017
End	08/2022


Description	Innovative Medicines Initiative RHAPSODY
Amount	€ 8,000,000 (EUR)
Funding ID	115881
Organisation	European Commission
Department	Innovative Medicines Initiative (IMI)
Sector	Public
Country	Belgium
Start	03/2016
End	03/2020


Description	MRC Experimental Challenges Grant
Amount	£2,400,000 (GBP)
Organisation	Medical Research Council (MRC)
Sector	Public
Country	United Kingdom
Start	08/2014
End	08/2018


Description	NIH R01
Amount	$1,900,000 (USD)
Funding ID	MH101814
Organisation	National Institutes of Health (NIH)
Department	National Institute of Mental Health (NIMH)
Sector	Public
Country	United States
Start	08/2013
End	08/2016


Description	NIH R01
Amount	$2,400,000 (USD)
Funding ID	DK098032
Organisation	National Institutes of Health (NIH)
Department	National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)
Sector	Public
Country	United States
Start	08/2012
End	08/2016


Description	NovoNordisk Funden Immunometabolism
Amount	12,000,000 kr. (DKK)
Funding ID	TRiiC
Organisation	Novo Nordisk Foundation
Sector	Charity/Non Profit
Country	Denmark
Start	04/2016
End	04/2020


Description	Program grant
Amount	£1,700,000 (GBP)
Organisation	Medical Research Council (MRC)
Sector	Public
Country	United Kingdom
Start	02/2015
End	01/2018


Description	Project Grant - MRC KLF14
Amount	£138,271 (GBP)
Funding ID	MR/J0106421/1
Organisation	Medical Research Council (MRC)
Sector	Public
Country	United Kingdom
Start	08/2012
End	01/2015


Description	RFP2 call for available datasets
Amount	$2,400,000 (USD)
Organisation	Foundation for the National Institutes of Health (FNIH)
Sector	Charity/Non Profit
Country	United States
Start	03/2016
End	08/2017


Description	U01
Amount	$1,800,000 (USD)
Funding ID	DK105535
Organisation	National Institutes of Health (NIH)
Department	National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)
Sector	Public
Country	United States
Start	03/2015
End	03/2019


Description	Wellcome Investigator Award
Amount	£2,250,000 (GBP)
Funding ID	212259/Z/18/Z
Organisation	Wellcome Trust
Sector	Charity/Non Profit
Country	United Kingdom
Start	11/2018
End	10/2024


Description	Wellcome Trust Strategic Award (WTCHG/WIMM)
Amount	£2,400,000 (GBP)
Funding ID	106130/Z/14/Z
Organisation	Wellcome Trust
Sector	Charity/Non Profit
Country	United Kingdom
Start	01/2015
End	12/2018


Research Databases and Models

Title	KLF14 knockout mice
Description	Global KLF14 knockout mouse
Type Of Material	Model of mechanisms or symptoms - mammalian in vivo
Provided To Others?	No
Impact	No recapitulation of the human phenotype in this mouse


Title	DeepCytometer pipeline parameter files, Klf14 mouse white adipose tissue histology and hand-traced training contours
Description	Latest description of this data set: Data.md in the cytometer project.

# Publications related to the data

The data associated with the DeepCytometer project (https://github.com/MRC-Harwell/cytometer) are available from Zenodo (doi: 10.5281/zenodo.5137433 and 10.5281/zenodo.5149005).

The histology and mouse measures were generated as part of the Small et al. 2018 study:

> Small et al. "Regulatory variants at KLF14 influence type 2 diabetes risk via a female-specific effect on adipocyte size and body composition". Nature Genetics, 50:572-580, 2018.

The hand-traced data set, colour maps, and automatic segmentations were generated for the Casero et al. 2021 paper:

> Casero et al. "Phenotyping of Klf14 mouse white adipose tissue enabled by whole slide segmentation with deep neural networks". bioRxiv, 2021. doi: [10.1101/2021.06.03.444997](https://www.biorxiv.org/content/10.1101/2021.06.03.444997v1.full).

# Data protocols

## Histology and laboratory measures

To develop and evaluate our methods we used Klf14tm1(KOMP)Vlcg C57BL/6NTac (B6NTac) mouse tissue samples and additional data generated as part of the study by Small et al. (2018). Note that the single-exon Klf14 gene is imprinted and only expressed from the maternally inherited allele (Parker-Katiraee et al. 2007). Small et al. (2018) took this into account by crossing a Het parent with a WT parent, so that each offspring inherited a WT allele from the WT parent, and either the Klf14 knockout allele or a WT allele from the other parent (the father, PAT, or the mother, MAT). We also take Klf14 imprinting into account by using the PAT mice as controls and comparing them to the MAT WT and MAT Het (or functional KO, FKO) mice. We used a total of 76 Klf14-B6NTac mice (n_female = n_male = 38), of which 20 mice from the Control and FKO groups were used for training and testing the DeepCytometer pipeline, as well as for the hand-traced population experiment (summary in Table MICE).
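The parent-of-origin logic above can be made explicit with a short Python sketch. It encodes only what is stated here (Klf14 is expressed solely from the maternal allele, PAT mice serve as controls, MAT Het mice are functional knockouts). The function names and the short `WT`/`Het` labels are hypothetical and not part of the cytometer code; the metadata file described below stores genotypes as `KLF14-KO:WT` and `KLF14-KO:Het`.

```python
from typing import Literal

# Hypothetical short labels; not the strings used in the metadata file.
Parent = Literal["PAT", "MAT"]   # parent that carried the KO allele
Genotype = Literal["WT", "Het"]  # offspring genotype at Klf14


def klf14_expressed(ko_parent: Parent, genotype: Genotype) -> bool:
    """Return True if the offspring keeps an expressed Klf14 allele.

    Klf14 is imprinted and only the maternally inherited allele is
    expressed, so expression is lost only when the KO allele came from
    the mother (MAT) and the offspring is heterozygous (Het).
    """
    # Literal hints are not enforced at runtime, so validate explicitly.
    if ko_parent not in ("PAT", "MAT") or genotype not in ("WT", "Het"):
        raise ValueError(f"unexpected labels: {ko_parent!r}, {genotype!r}")
    maternal_allele_is_ko = ko_parent == "MAT" and genotype == "Het"
    return not maternal_allele_is_ko


def experimental_group(ko_parent: Parent, genotype: Genotype) -> str:
    """Map a mouse to the comparison groups described above."""
    expressed = klf14_expressed(ko_parent, genotype)  # also validates labels
    if ko_parent == "PAT":
        # The paternal allele is silent, so PAT WT and PAT Het are both controls.
        return "Control"
    return "MAT WT" if expressed else "FKO"
```

Under this reading, PAT Het mice still carry a functional maternal Klf14 allele inherited from their WT mother, which is why they can be pooled with PAT WT mice as controls.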
The histopathology screen involved fixing, processing, embedding in wax, sectioning and staining with haematoxylin and eosin (H&E) both the inguinal subcutaneous and the gonadal adipose depots. For paraffin-embedded sections, all samples were fixed in 10% neutral buffered formalin (Surgipath) for at least 48 hours at room temperature (RT) and processed using an Excelsior™ AS Tissue Processor (Thermo Scientific). Samples were embedded in molten paraffin wax, and 8 µm sections were cut through the respective depots using a Finesse™ ME+ microtome (Thermo Scientific). Sampling was conducted at 2 sections per slide, 3 slides per depot block, onto simultaneous charged slides, which were stained with Gill 3 haematoxylin and eosin (Thermo Scientific) and scanned using an NDP NanoZoomer Digital pathology scanner (RS C10730 Series; Hamamatsu). Body weight (BW) and depot weight (DW) were measured with Sartorius BAL7000 scales.

## White adipose tissue segmentation

For cell area quantification, we applied DeepCytometer v8 (with the Corrected method) to 75 inguinal subcutaneous and 72 gonadal whole histology slides, including the 20 slides sampled for the hand-traced data set. These slides (73 from females and 74 from males) produced 2,560,067 subcutaneous and 2,467,686 gonadal cells (on average, 34,134 and 34,273 cells per slide, respectively). Full segmentation of all whole slides was performed with script [klf14_b6ntac_exp_0106_full_slide_pipeline_v8.py](https://github.com/MRC-Harwell/cytometer/blob/39358ed1d79df07d1d522b98728c7efd745513f7/scripts/klf14_b6ntac_exp_0106_full_slide_pipeline_v8.py). In this case, the segmentation contours were grouped by tiles in the output AIDA annotation `.json` file (one contour per cell, one file per slide).
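Cell areas are derived from the segmentation contours. As an illustration only, the area enclosed by a single closed contour can be computed with the shoelace formula. This is a generic sketch, not necessarily how DeepCytometer computes areas, and `pixel_size_um` is a hypothetical parameter that must be set to the actual resolution of the whole-slide scan.

```python
import numpy as np


def contour_area_um2(contour_px: np.ndarray, pixel_size_um: float) -> float:
    """Area enclosed by a closed polygon contour, via the shoelace formula.

    contour_px: (N, 2) array of (x, y) vertex coordinates in pixels; the
        last vertex is implicitly joined back to the first.
    pixel_size_um: side length of one pixel in micrometres (hypothetical
        parameter; use the resolution of the actual scan).
    """
    pts = np.asarray(contour_px, dtype=float)
    if pts.ndim != 2 or pts.shape[1] != 2 or len(pts) < 3:
        raise ValueError("contour must be an (N, 2) array with N >= 3")
    x, y = pts[:, 0], pts[:, 1]
    # Shoelace formula: 0.5 * |sum_i (x_i * y_{i+1} - x_{i+1} * y_i)|,
    # with np.roll providing the wrap-around to the first vertex.
    area_px = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(np.roll(x, -1), y))
    return area_px * pixel_size_um ** 2


# The per-slide averages quoted above follow from the totals:
print(round(2_560_067 / 75), round(2_467_686 / 72))  # 34134 34273
```

The absolute value makes the result independent of whether the contour vertices run clockwise or anticlockwise.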
Non-white adipocyte contours were filtered out, and white adipocyte contours were aggregated into an AIDA annotation `.json` file with a single tile (one contour per cell, one file per slide) with script [klf14_b6ntac_exp_0106_annotations_postprocessing_v8.py](https://github.com/MRC-Harwell/cytometer/blob/39358ed1d79df07d1d522b98728c7efd745513f7/scripts/klf14_b6ntac_exp_0106_annotations_postprocessing_v8.py).

# List of directories and files

## Casero et al. (2021) "DeepCytometer pipeline parameter files, Klf14 mouse white adipose tissue histology and hand-traced training contours" (doi: 10.5281/zenodo.5137433)

### `deepcytometer_pipeline_v8.zip` (60.6 MB)

Weights, colour maps, etc. necessary to run the pipeline (v8, with mode colour correction). This is the version of the pipeline described in the paper. There are 10 weight files per convolutional neural network (CNN), corresponding to 10-fold cross-validation:

* `klf14_b6ntac_exp_0086_cnn_dmap_model_fold_[0..9].h5`: Keras weights for the EDT CNN (histology to Euclidean Distance Transform regression)
* `klf14_b6ntac_exp_0089_cnn_segmentation_correction_overlapping_scaled_contours_model_fold_[0..9].h5`: Keras weights for the Correction CNN (segmentation correction regression)
* `klf14_b6ntac_exp_0091_cnn_contour_after_dmap_model_fold_[0..9].h5`: Keras weights for the Contour CNN (EDT to contour detection)
* `klf14_b6ntac_exp_0095_cnn_tissue_classifier_fcn_model_fold_[0..9].h5`: Keras weights for the Tissue CNN (pixel-wise tissue classifier)
* `klf14_b6ntac_exp_0094_generate_extra_training_images.pickle`: training data set description
  * 'file_list': list of SVG files with hand-traced contours for network training. Each SVG file has a corresponding TIFF file with the histology used for segmentation
  * 'idx_test': 10 lists with file indices for testing in 10-fold cross-validation
  * 'idx_train': 10 lists with file indices for training in 10-fold cross-validation
  * 'fold_seed': seed number used for the random number generator to assign file indices to folds
* `klf14_b6ntac_exp_0098_filename_area2quantile.npz`: quantile colour maps calculated in `klf14_b6ntac_exp_0098_full_slide_size_analysis_v7.py` using the whole Klf14 data set with v7 of the pipeline, and used in earlier experiments, including some where v8 of the pipeline was used for segmentation
* `klf14_b6ntac_exp_0106_filename_area2quantile_v8.npz`: quantile colour maps calculated in `klf14_b6ntac_exp_0106_full_slide_pipeline_v8.py` using the whole Klf14 data set with v8 of the pipeline, and used in later experiments
* `klf14_training_colour_histogram.npz`: statistics from Klf14 histology images to be used in colour correction
  * 'xbins_edge', 'xbins': edges and centres of the bins used for histogram calculations
  * 'hist_r_q1', 'hist_r_q2', 'hist_r_q3', 'hist_g_q1', 'hist_g_q2', 'hist_g_q3', 'hist_b_q1', 'hist_b_q2', 'hist_b_q3': density quartiles (Q1, Q2, Q3) for the RGB channels for each bin of the histogram
  * 'mode_r', 'mode_g', 'mode_b': modes for the RGB channels (these correspond to the most typical background colour in the histology images)
  * 'mean_l', 'mean_a', 'mean_b': mean intensity for the Lab channels of the image
  * 'std_l', 'std_a', 'std_b': intensity standard deviations for the Lab channels of the image
* `klf14_exp_0112_training_colour_histogram.npz`: other statistics from Klf14 histology images to be used in colour correction
  * 'p': vector of quantile values used in ECDF calculations
  * 'val_r_klf14', 'val_g_klf14', 'val_b_klf14': all intensity values for the RGB channels of Klf14 training images that contain at least one white adipocyte
  * 'f_ecdf_to_val_r_klf14', 'f_ecdf_to_val_g_klf14', 'f_ecdf_to_val_b_klf14': linear interpolation functions that map ECDF quantiles to intensity values in the Klf14 training data set. These functions can be used together with intensity-to-quantile interpolation functions calculated for a new histology image to perform histogram-matching colour correction
  * 'mean_klf14', 'std_klf14': mean and standard deviation of the 'val_r_klf14', 'val_g_klf14', 'val_b_klf14' vectors

There are also weight files for the pipeline trained with all the data, instead of the 10-fold cross-validation partition. These were not used for the paper, but could be useful for future experiments:

* `klf14_b6ntac_exp_0101_cnn_dmap_model.h5`: Keras weights for the EDT CNN (histology to Euclidean Distance Transform regression)
* `klf14_b6ntac_exp_0104_cnn_segmentation_correction_overlapping_scaled_contours_model.h5`: Keras weights for the Correction CNN (segmentation correction regression)
* `klf14_b6ntac_exp_0102_cnn_contour_after_dmap_model.h5`: Keras weights for the Contour CNN (EDT to contour detection)
* `klf14_b6ntac_exp_0103_cnn_tissue_classifier_fcn_model.h5`: Keras weights for the Tissue CNN (pixel-wise tissue classifier)

### `histology.7z` (29.1 GB)

165 H&E histology whole slides from the Hamamatsu scanner (`.ndpi`).

### `klf14.7z` (2.3 GB)

Mouse metadata, training/testing data sets for the pipeline, intermediate files created during training, and neural network weights for multiple experiments.
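The `klf14_exp_0112_training_colour_histogram.npz` entries describe histogram-matching colour correction: each intensity in a new image is mapped to its quantile in that image's empirical CDF (ECDF), and that quantile is mapped back to an intensity through the training-set quantile function. The sketch below applies this idea to one colour channel with NumPy, computing the training quantiles from plain arrays rather than loading the stored interpolation functions. The function name and the default quantile grid are assumptions, not the pipeline's API.

```python
from typing import Optional

import numpy as np


def match_channel(new_vals: np.ndarray, train_vals: np.ndarray,
                  p: Optional[np.ndarray] = None) -> np.ndarray:
    """Histogram-match one colour channel of a new image to training data.

    new_vals: intensities of one RGB channel of the new image (any shape).
    train_vals: intensities of the same channel in the training images
        (analogous to 'val_r_klf14' etc.).
    p: strictly increasing quantile grid in [0, 1] (analogous to 'p');
        a hypothetical grid of 101 evenly spaced quantiles is used if None.
    """
    if p is None:
        p = np.linspace(0.0, 1.0, 101)
    new_flat = np.asarray(new_vals, dtype=float).ravel()
    train_flat = np.asarray(train_vals, dtype=float).ravel()
    # Quantile -> intensity lookup table for the training set, playing the
    # role of the 'f_ecdf_to_val_*_klf14' interpolation functions.
    train_q = np.quantile(train_flat, p)
    # Intensity -> quantile for the new image: its ECDF at each pixel.
    sorted_new = np.sort(new_flat)
    ecdf = np.searchsorted(sorted_new, new_flat, side="right") / new_flat.size
    # Linear interpolation of the training quantile function at those quantiles.
    matched = np.interp(ecdf, p, train_q)
    return matched.reshape(np.shape(new_vals))
```

Because both mappings are non-decreasing, the correction preserves the intensity ordering of pixels within the new image while moving its distribution towards that of the training images. In practice the function would be applied separately to the R, G and B channels.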
* `klf14_b6ntac_meta_info.csv`: Klf14 mouse metadata
  * Animal Identifier, id: unique ID for each mouse
  * ko_parent: heterozygous parent of origin for the KO allele (father, PAT, or mother, MAT)
  * sex: female or male
  * genotype: wild type (KLF14-KO:WT) or heterozygous (KLF14-KO:Het)
  * BW: body weight (g)
  * SC: subcutaneous depot weight (g)
  * gWAT: gonadal depot weight (g)
  * Liver: liver weight (g)
  * cull_age: age at time of culling (days)
  * BW_alive: body weight measured before culling
  * BW_alive_date: age at time of BW_alive measure
  * mother: unique ID for the mouse's mother
  * mother_genotype: genotype of the mouse's mother
* `klf14_b6ntac_training`: Directory with hand-traced segmentations of training histology windows: 131 windows sampled from 20 whole slides, plus hand-traced contours that were used for training DeepCytometer and computing population distributions. These segmentations were used for CNN training, but note that there is a cleaned-up version of these data below, and it was the cleaned-up version that was used for the paper experiments
  * `ndpifile_row_YYYYYY_col_XXXXXX[.tif/.xcf/.svg]`:
    * ndpifile: name of the whole slide file (e.g. `KLF14-B6NTAC 36.1c PAT 98-16 C1 - 2016-02-11 10.45.00`)
    * row_YYYYYY: Y-coordinate of the top-left corner of the sampling window, in pixels
    * col_XXXXXX: X-coordinate of the top-left corner of the sampling window, in pixels
    * .tif: TIFF file with the histology sampling window
    * .xcf: GIMP file with the histology and hand-traced contours (the contours were drawn in GIMP)
    * .svg: SVG (Scalable Vector Graphics) file that contains the hand-traced contours in the XCF file
* `klf14_b6ntac_training_v2`: Same as `klf14_b6ntac_training`, but the hand-traced data set was cleaned up to remove small contours of dubious cells, or cells that are fully overlapped by others
* `klf14_b6ntac_training_non_overlap`: Directory with intermediate images to train the networks. These images are generated by script [`klf14_b6ntac_exp_0077_generate_non_overlap_training_images.py`](https://github.com/MRC-Harwell/cytometer/blob/main/scripts/klf14_b6ntac_exp_0077_generate_non_overlap_training_images.py)
* `klf14_b6ntac_training_augmented`: Directory with intermediate images used to train the networks (using augmentation to reduce overfitting). These images are generated by script [`klf14_b6ntac_exp_0078_generate_augmented_training_images.py`](https://github.com/MRC-Harwell/cytometer/blob/main/scripts/klf14_b6ntac_exp_0078_generate_augmented_training_images.py)
* `klf14_b6ntac_seg`: Deprecated. Directory to store whole slide coarse segmentations in old experiments (e.g. `klf14_b6ntac_exp_0076_generate_training_images.py`). Of little interest for most users
* `klf14_b6ntac_results`: Deprecated. Directory to store miscellaneous output from some experiments. Of little interest for most users

## Casero et al. (2021). "Klf14 mouse white adipose tissue histology DeepZoom files and AIDA annotations for visualisation of DeepCytometer white adipocyte segmentations" (doi: 10.5281/zenodo.5149005)

### `aida_data_Klf14_v8_images.7z` (16.9 GB)

Histology images converted to DeepZoom so that they can be visualised with [AIDA](https://github.com/alanaberdeen/AIDA). To use this, decompress this file and put the resulting `images` directory in your `AIDA/dist/data/` directory.

### `aida_data_Klf14_v8_annotations.7z` (18 GB)

White adipocyte segmentations in AIDA annotation `.json` files (one contour per cell, one file per whole slide). Each slide has the following files:

* `SLIDENAME.json`: Soft link to the annotations file that we want to associate with slide `SLIDENAME.ndpi`, e.g. `SLIDENAME` = `KLF14-B6NTAC-PAT-39.2d 454-16 B1 - 2016-03-17 12.16.06`
* `SLIDENAME.lock`: Empty file used to tell the pipeline that `SLIDENAME.ndpi` has already been processed or is currently being processed
* `SLIDENAME_coarse_mask.npz`: File with the coarse tissue segmentation of `SLIDENAME.ndpi` and the internal state of the pipeline (execution times, steps, etc.)
* `SLIDENAME_exp_0106_auto.json`: Annotations (all segmentations, without filtering, from the Auto algorithm, i.e. segmentation without object overlap). Contours are grouped by the tile they were processed in
* `SLIDENAME_exp_0106_auto_aggregated.json`: Filtered annotations (non-white adipocytes removed) of the Auto algorithm. All contours are aggregated into a single tile
* `SLIDENAME_exp_0106_corrected.json`: Annotations (all segmentations, without filtering, from the Corrected algorithm, i.e. segmentation with object overlap). Contours are grouped by the tile they were processed in
* `SLIDENAME_exp_0106_corrected_aggregated.json`: Filtered annotations (non-white adipocytes removed) of the Corrected algorithm. All contours are aggregated into a single tile

To use this, decompress this file and put the resulting `annotations` directory in your `AIDA/dist/data/` directory.
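The training-window file names follow the `ndpifile_row_YYYYYY_col_XXXXXX[.tif/.xcf/.svg]` pattern described above. A small sketch for recovering the slide name and window origin from such a name with a regular expression, assuming the fields are joined exactly as in that pattern (underscore separators, decimal pixel coordinates). `parse_window_name` and `Window` are hypothetical helpers, not part of the cytometer repository, and the coordinates in the usage check are made up.

```python
import re
from pathlib import Path
from typing import NamedTuple

# Assumed layout: <slide name>_row_<digits>_col_<digits>.<tif|xcf|svg>
# The greedy (?P<slide>.+) keeps any dots or spaces in the slide name.
_WINDOW_RE = re.compile(r"^(?P<slide>.+)_row_(?P<row>\d+)_col_(?P<col>\d+)$")


class Window(NamedTuple):
    slide: str  # whole-slide file name, without the .ndpi extension
    row: int    # y-coordinate of the window's top-left corner, in pixels
    col: int    # x-coordinate of the window's top-left corner, in pixels
    kind: str   # 'tif' (histology), 'xcf' (GIMP file) or 'svg' (contours)


def parse_window_name(filename: str) -> Window:
    """Split a training-window file name into its components."""
    path = Path(filename)
    kind = path.suffix.lstrip(".").lower()
    if kind not in {"tif", "xcf", "svg"}:
        raise ValueError(f"unexpected extension: {path.suffix!r}")
    # Path.stem drops only the final suffix, e.g. '.tif'.
    m = _WINDOW_RE.match(path.stem)
    if m is None:
        raise ValueError(f"name does not match the window pattern: {filename!r}")
    return Window(m["slide"], int(m["row"]), int(m["col"]), kind)
```

Grouping the `.tif`, `.xcf` and `.svg` files by `(slide, row, col)` then pairs each histology window with its hand-traced contours.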
Type Of Material	Database/Collection of data
Year Produced	2021
Provided To Others?	Yes
URL	https://zenodo.org/record/5137432


Title	Klf14 mouse white adipose tissue histology DeepZoom files and AIDA annotations for visualisation of DeepCytometer white adipocyte segmentations
Description	Latest description of this data set: Data.md at cytometer project # Publications related to the data The data associated to the DeepCytometer project (https://github.com/MRC-Harwell/cytometer) is available from Zenodo (doi: 10.5281/zenodo.5137433 and 10.5281/zenodo.5149005). The histology and mouse measures were generated as part of the Small et al. 2018 study: > Small et al. "Regulatory variants at KLF14 influence type 2 diabetes risk via a female-specific effect on adipocyte size and body composition". Nature Genetics, 50:572-580, 2018. The hand traced data set, colour maps, and automatic segmentations were generated for the Casero et al. 2021 paper: > Casero et al. "Phenotyping of Klf14 mouse white adipose tissue enabled by whole slide segmentation with deep neural networks". bioRxiv, 2021. doi: [10.1101/2021.06.03.444997](https://www.biorxiv.org/content/10.1101/2021.06.03.444997v1.full). # Data protocols ## Histology and laboratory measures To develop and evaluate our methods we used Klf14tm1(KOMP)Vlcg C57BL/6NTac (B6NTac) mice tissue samples and additional data generated as part of the Small et al. 2018 study(Small et al. 2018). It should be noted that the single exon Klf14 gene is imprinted and only expressed from the maternally inherited allele(Parker-Katiraee et al. 2007). This was taken into account by (Small et al. 2018) by crossing a Het parent with a WT parent, so that each offspring inherited a WT allele from the WT parent, and the Klf14 gene knockout or a WT allele from the other parent (from the father, PAT, or the mother, MAT). We also take Klf14 imprinting into account by using as controls the PAT mice and comparing them to the MAT WT and MAT Het (or functional KO, FKO) mice. We used a total of 76 Klf14-B6NTac mice (nfemale=nmale=38), of which 20 mice from the Control and FKO groups were used for training and testing the DeepCytometer pipeline, as well as the hand traced population experiment (summary in Table MICE). 
The histopathology screen involved fixing, processing and embedding in wax, sectioning and staining with Hematoxylin and Eosin (H&E) both inguinal subcutaneous and gonadal adipose depots. For paraffin-embedded sections, all samples were fixed in 10% neutral buffered formalin (Surgipath) for at least 48 hours at RT and processed using an Excelsior™ AS Tissue Processor (Thermo Scientific). Samples were embedded in molten paraffin wax and 8 µm sections were cut through the respective depots using a Finesse™ ME+ microtome (Thermo Scientific). Sampling was conducted at 2sxns per slide, 3 slides per depot block onto simultaneous charged slides, stained with haematoxylin Gill 3 and eosin (Thermo scientific) and scanned using an NDP NanoZoomer Digital pathology scanner (RS C10730 Series; Hamamatsu). Body weight (BW) and depot weight (DW) were measured with Satorius BAL7000 scales. ## White adipose tissue segmentation For cell area quantification, we applied DeepCytometer v8 to 75 inguinal subcutaneous and 72 gonadal whole histology slides with DeepCytometer (with the Corrected method), including the 20 slides sampled for the hand-traced data set, corresponding to 73 females and 74 males, to produce 2,560,067 subcutaneous and 2,467,686 gonadal cells (on average, 34,134 and 34,273 cells per slide, respectively). Full segmentation of all whole slides was performed with script [klf14_b6ntac_exp_0106_full_slide_pipeline_v8.py](https://github.com/MRC-Harwell/cytometer/blob/39358ed1d79df07d1d522b98728c7efd745513f7/scripts/klf14_b6ntac_exp_0106_full_slide_pipeline_v8.py). In this case, the segmentation contours were grouped by tiles in the output AIDA annotation `.json` file (one contour per cell, one file per slide). 
Non-white adipocyte contours were filtered out, and white adipocyte contours were aggregated into an AIDA annotation `.json` file with a single tile with script [klf14_b6ntac_exp_0106_annotations_postprocessing_v8.py](https://github.com/MRC-Harwell/cytometer/blob/39358ed1d79df07d1d522b98728c7efd745513f7/scripts/klf14_b6ntac_exp_0106_annotations_postprocessing_v8.py) (one contour per cell, one file per slide). # List of directories and files ## Casero et al. (2021) "DeepCytometer pipeline parameter files, Klf14 mouse white adipose tissue histology and hand-traced training contours" (doi: 10.5281/zenodo.5137433) ### `deepcytometer_pipeline_v8.zip` (60.6 MB) Weights, colourmaps, etc. necessary to run the pipeline (v8, with mode colour correction). This is the version of the pipeline described in the paper. There are 10 weight files per convolutional neural network (CNN), corresponding to 10-fold cross-validation * `klf14_b6ntac_exp_0086_cnn_dmap_model_fold_[0..9].h5`: Keras weights for the EDT CNN (Histology to Euclidean Distance Transform regression) * `klf14_b6ntac_exp_0089_cnn_segmentation_correction_overlapping_scaled_contours_model_fold_[0..9].h5`: Keras weights for the Correction CNN (Segmentation Correction regression) * `klf14_b6ntac_exp_0091_cnn_contour_after_dmap_model_fold_[0..9].h5`: Keras weights for the Contour CNN (EDT to Contour detection) * `klf14_b6ntac_exp_0095_cnn_tissue_classifier_fcn_model_fold_[0..9].h5`: Keras weights for the Tissue CNN (Pixel-wise tissue classifier) * `klf14_b6ntac_exp_0094_generate_extra_training_images.pickle`: training dataset description * 'file_list': list of SVG files with hand-traced contours for network training. 
    Each SVG file has a corresponding TIFF file with the histology used for segmentation.
  * 'idx_test': 10 lists with file indices for testing in 10-fold cross-validation
  * 'idx_train': 10 lists with file indices for training in 10-fold cross-validation
  * 'fold_seed': seed used for the random number generator that assigns file indices to folds
* `klf14_b6ntac_exp_0098_filename_area2quantile.npz`: quantile colour maps calculated in `klf14_b6ntac_exp_0098_full_slide_size_analysis_v7.py` using the whole Klf14 data set with v7 of the pipeline. These were used in earlier experiments, including some where v8 of the pipeline was used for segmentation.
* `klf14_b6ntac_exp_0106_filename_area2quantile_v8.npz`: quantile colour maps calculated in `klf14_b6ntac_exp_0106_full_slide_pipeline_v8.py` using the whole Klf14 data set with v8 of the pipeline. These were used in later experiments.
* `klf14_training_colour_histogram.npz`: statistics from Klf14 histology images, for use in colour correction
  * 'xbins_edge', 'xbins': edges and centres of the bins used for histogram calculations
  * 'hist_r_q1', 'hist_r_q2', 'hist_r_q3', 'hist_g_q1', 'hist_g_q2', 'hist_g_q3', 'hist_b_q1', 'hist_b_q2', 'hist_b_q3': density quartiles (Q1, Q2, Q3) of the R, G and B channels for each bin of the histogram
  * 'mode_r', 'mode_g', 'mode_b': modes of the RGB channels (these correspond to the most typical background colour in the histology images)
  * 'mean_l', 'mean_a', 'mean_b': mean intensity of the Lab channels of the image
  * 'std_l', 'std_a', 'std_b': intensity standard deviations of the Lab channels of the image
* `klf14_exp_0112_training_colour_histogram.npz`: other statistics from Klf14 histology images, for use in colour correction
  * 'p': vector of quantile values used in ECDF calculations
  * 'val_r_klf14', 'val_g_klf14', 'val_b_klf14': all intensity values for the RGB channels of Klf14 training images that contain at least one white adipocyte
  * 'f_ecdf_to_val_r_klf14', 'f_ecdf_to_val_g_klf14', 'f_ecdf_to_val_b_klf14': linear interpolation functions that map ECDF quantiles to intensity values in the Klf14 training data set. These functions can be combined with intensity-to-quantile interpolation functions calculated for a new histology image to perform histogram-matching colour correction.
  * 'mean_klf14', 'std_klf14': mean and standard deviation of the 'val_r_klf14', 'val_g_klf14', 'val_b_klf14' vectors

There are also weight files for the pipeline trained with all the data instead of the 10-fold cross-validation partition. These were not used for the paper, but could be useful for future experiments:

* `klf14_b6ntac_exp_0101_cnn_dmap_model.h5`: Keras weights for the EDT CNN (Histology to Euclidean Distance Transform regression)
* `klf14_b6ntac_exp_0104_cnn_segmentation_correction_overlapping_scaled_contours_model.h5`: Keras weights for the Correction CNN (Segmentation Correction regression)
* `klf14_b6ntac_exp_0102_cnn_contour_after_dmap_model.h5`: Keras weights for the Contour CNN (EDT to Contour detection)
* `klf14_b6ntac_exp_0103_cnn_tissue_classifier_fcn_model.h5`: Keras weights for the Tissue CNN (Pixel-wise tissue classifier)

### `histology.7z` (29.1 GB)

165 H&E histology whole slides from the Hamamatsu scanner (`.ndpi`).

### `klf14.7z` (2.3 GB)

Mouse metadata, training/testing data sets for the pipeline, intermediate files created during training, and neural network weights for multiple experiments.
* `klf14_b6ntac_meta_info.csv`: Klf14 mouse metadata
  * Animal Identifier, id: unique ID for each mouse
  * ko_parent: heterozygous parent of origin for the KO allele (father, PAT, or mother, MAT)
  * sex: female or male
  * genotype: wild type (KLF14-KO:WT) or heterozygous (KLF14-KO:Het)
  * BW: body weight (g)
  * SC: subcutaneous depot weight (g)
  * gWAT: gonadal depot weight (g)
  * Liver: liver weight (g)
  * cull_age: age at time of culling (days)
  * BW_alive: body weight measured before culling
  * BW_alive_date: age at time of the BW_alive measurement
  * mother: unique ID of the mouse's mother
  * mother_genotype: genotype of the mouse's mother
* `klf14_b6ntac_training`: Directory with hand-traced segmentations of training histology windows. It contains 131 windows sampled from 20 whole slides, plus hand-traced contours that were used for training DeepCytometer and computing population distributions. These segmentations were used for CNN training. Note, however, that a cleaned-up version of these data is listed below, and the cleaned-up version was the one used for the paper experiments.
  * `ndpifile_row_YYYYYY_col_XXXXXX[.tif/.xcf/.svg]`:
    * ndpifile: name of the whole slide file (e.g. `KLF14-B6NTAC 36.1c PAT 98-16 C1 - 2016-02-11 10.45.00`)
    * row_YYYYYY: Y-coordinate of the top-left corner of the sampling window, in pixels
    * col_XXXXXX: X-coordinate of the top-left corner of the sampling window, in pixels
    * .tif: TIFF file with the histology sampling window
    * .xcf: GIMP file with the histology and hand-traced contours (the contours were drawn in GIMP)
    * .svg: SVG (Scalable Vector Graphics) file that contains the hand-traced contours from the XCF file
* `klf14_b6ntac_training_v2`: Same as `klf14_b6ntac_training`, but the hand-traced data set was cleaned up to remove small contours of dubious cells and cells that are fully overlapped by others
* `klf14_b6ntac_training_non_overlap`: Directory with intermediate images used to train the networks.
  These images are generated by script [`klf14_b6ntac_exp_0077_generate_non_overlap_training_images.py`](https://github.com/MRC-Harwell/cytometer/blob/main/scripts/klf14_b6ntac_exp_0077_generate_non_overlap_training_images.py)
* `klf14_b6ntac_training_augmented`: Directory with intermediate images used to train the networks (using augmentation to reduce overfitting). These images are generated by script [`klf14_b6ntac_exp_0078_generate_augmented_training_images.py`](https://github.com/MRC-Harwell/cytometer/blob/main/scripts/klf14_b6ntac_exp_0078_generate_augmented_training_images.py)
* `klf14_b6ntac_seg`: Deprecated. Directory used to store whole slide coarse segmentations in old experiments (e.g. `klf14_b6ntac_exp_0076_generate_training_images.py`). Of little interest to most users
* `klf14_b6ntac_results`: Deprecated. Directory used to store miscellaneous output from some experiments. Of little interest to most users

## Casero et al. (2021) "Klf14 mouse white adipose tissue histology DeepZoom files and AIDA annotations for visualisation of DeepCytometer white adipocyte segmentations" (doi: 10.5281/zenodo.5149005)

### `aida_data_Klf14_v8_images.7z` (16.9 GB)

Histology images converted to DeepZoom format so that they can be visualised with [AIDA](https://github.com/alanaberdeen/AIDA). To use these, decompress this file and put the resulting `images` directory in your `AIDA/dist/data/` directory.

### `aida_data_Klf14_v8_annotations.7z` (18 GB)

White adipocyte segmentations in AIDA annotation `.json` files (one contour per cell, one file per whole slide). Each slide has the following files:

* `SLIDENAME.json`: Soft link to the annotations file that we want to associate with slide `SLIDENAME.ndpi`, e.g. `SLIDENAME` = `KLF14-B6NTAC-PAT-39.2d 454-16 B1 - 2016-03-17 12.16.06`
* `SLIDENAME.lock`: Empty file that tells the pipeline that `SLIDENAME.ndpi` has already been processed or is currently being processed
* `SLIDENAME_coarse_mask.npz`: File with the coarse tissue segmentation of `SLIDENAME.ndpi` and the internal state of the pipeline (execution times, steps, etc.)
* `SLIDENAME_exp_0106_auto.json`: Annotations from the Auto algorithm (all segmentations, unfiltered; segmentation without object overlap). Contours are grouped by the tile they were processed in
* `SLIDENAME_exp_0106_auto_aggregated.json`: Filtered annotations (non-white adipocytes removed) from the Auto algorithm. All contours are aggregated into a single tile
* `SLIDENAME_exp_0106_corrected.json`: Annotations from the Corrected algorithm (all segmentations, unfiltered; segmentation with object overlap). Contours are grouped by the tile they were processed in
* `SLIDENAME_exp_0106_corrected_aggregated.json`: Filtered annotations (non-white adipocytes removed) from the Corrected algorithm. All contours are aggregated into a single tile

To use these, decompress this file and put the resulting `annotations` directory in your `AIDA/dist/data/` directory.
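The `klf14_exp_0112_training_colour_histogram.npz` entry describes histogram-matching colour correction. Each channel of a new image is first mapped from intensity to ECDF quantile, and then to the Klf14 training intensity at the same quantile. The following is a minimal NumPy sketch of that idea, not the DeepCytometer implementation. The names `match_channel` and `match_rgb` are hypothetical. The sketch assumes the reference intensities are available as plain 1-D arrays (for example, the 'val_r_klf14', 'val_g_klf14', 'val_b_klf14' vectors), and it rebuilds both ECDFs from raw values instead of using the stored interpolation functions.

```python
import numpy as np


def match_channel(src: np.ndarray, ref_values: np.ndarray) -> np.ndarray:
    """Hypothetical helper: histogram-match one channel of `src` to reference intensities."""
    src = np.asarray(src)
    ref_values = np.asarray(ref_values)

    # ECDF of the new image: intensity -> quantile (one quantile per unique intensity)
    src_vals, src_idx, src_counts = np.unique(
        src.ravel(), return_inverse=True, return_counts=True)
    src_q = np.cumsum(src_counts) / src.size

    # ECDF of the reference data; np.interp below inverts it (quantile -> intensity)
    ref_vals, ref_counts = np.unique(ref_values.ravel(), return_counts=True)
    ref_q = np.cumsum(ref_counts) / ref_values.size

    # Each source intensity takes the reference intensity found at the same quantile
    matched = np.interp(src_q, ref_q, ref_vals)
    return matched[src_idx].reshape(src.shape)


def match_rgb(img: np.ndarray, ref_r, ref_g, ref_b) -> np.ndarray:
    """Hypothetical helper: correct each RGB channel of an (H, W, 3) image independently."""
    refs = (ref_r, ref_g, ref_b)
    return np.stack([match_channel(img[..., c], refs[c]) for c in range(3)], axis=-1)
```

Because each channel is matched independently, the output reproduces the reference intensity distribution channel by channel but not the joint RGB distribution.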
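Training windows in `klf14_b6ntac_training` follow the naming convention `ndpifile_row_YYYYYY_col_XXXXXX[.tif/.xcf/.svg]`, so a regular expression is enough to recover the slide name and the window's top-left corner. Below is a minimal sketch. `parse_window_filename` and `TrainingWindow` are hypothetical names, and the pattern is inferred from the convention described above. It accepts any number of coordinate digits rather than assuming exactly six, and the coordinates in the usage example are made up.

```python
import re
from typing import NamedTuple

# Hypothetical pattern inferred from the documented naming convention
_WINDOW_RE = re.compile(
    r"^(?P<ndpi>.+)_row_(?P<row>\d+)_col_(?P<col>\d+)\.(?P<ext>tif|xcf|svg)$")


class TrainingWindow(NamedTuple):
    ndpi_file: str  # whole slide name, without the .ndpi extension
    row: int        # Y-coordinate (pixels) of the window's top-left corner
    col: int        # X-coordinate (pixels) of the window's top-left corner
    ext: str        # 'tif', 'xcf' or 'svg'


def parse_window_filename(name: str) -> TrainingWindow:
    """Split a training-window filename into slide name, coordinates and extension."""
    m = _WINDOW_RE.match(name)
    if m is None:
        raise ValueError(f"not a training-window filename: {name!r}")
    return TrainingWindow(m["ndpi"], int(m["row"]), int(m["col"]), m["ext"])


# Usage with made-up coordinates:
w = parse_window_filename(
    "KLF14-B6NTAC 36.1c PAT 98-16 C1 - 2016-02-11 10.45.00_row_012345_col_067890.svg")
```

The greedy `.+` group backtracks to the last `_row_`, so slide names containing spaces, dots and dashes are kept intact.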
Type Of Material	Database/Collection of data
Year Produced	2021
Provided To Others?	Yes
URL	https://zenodo.org/record/5149004


Description	Adipocyte biology (Claussnitzer)
Organisation	Broad Institute
Country	United States
Sector	Charity/Non Profit
PI Contribution	T2D association data
Collaborator Contribution	Access to adipocyte genomic data
Impact	None to date
Start Year	2018


Description	Adipose tissue expression
Organisation	Icahn School of Medicine at Mount Sinai
Country	United States
Sector	Academic/University
PI Contribution	Collaboration on KLF14 knockout mice
Collaborator Contribution	Information from human and murine expression studies
Impact	Information from their proprietary database
Start Year	2012


Description	Bo Ahren collaboration
Organisation	Lund University
Country	Sweden
Sector	Academic/University
PI Contribution	This relates to responding to reviewer comments on our Nature submission
Collaborator Contribution	Sharing of data from adipose tissue biopsies obtained by Bo Ahren's group
Impact	Sharing of data on adipocyte size from adipose biopsies
Start Year	2015


Description	METSIM
Organisation	University of California, Los Angeles (UCLA)
Department	School of Medicine UCLA
Country	United States
Sector	Academic/University
PI Contribution	We collaborate on understanding the mechanisms underlying KLF14 effects on T2D
Collaborator Contribution	They have contributed data from their studies
Impact	Collaborative research
Start Year	2014


Description	METSIM
Organisation	University of Eastern Finland
Country	Finland
Sector	Academic/University
PI Contribution	We collaborate on understanding the mechanisms underlying KLF14 effects on T2D
Collaborator Contribution	They have contributed data from their studies
Impact	Collaborative research
Start Year	2014


Description	Nobrega collaboration
Organisation	University of Chicago
Country	United States
Sector	Academic/University
PI Contribution	This is a collaboration to furnish additional data supporting the revision of our main manuscript, which has been reviewed at Nature.
Collaborator Contribution	KLF14 ChIP-seq
Impact	Marcelo Nobrega is completing studies that will add to our response to reviewers
Start Year	2016


Description	STEMBANCC
Organisation	Eli Lilly & Company Ltd
Country	United Kingdom
Sector	Private
PI Contribution	This is a new IMI collaboration involving Oxford and 25 other academic institutions (total funding €53M from IMI and in-kind contributions).
Collaborator Contribution	We will provide clinical material (from patients), characterise stem-cell-derived tissues and manage the diabetes work package
Impact	Nil to date
Start Year	2012


Description	STEMBANCC
Organisation	F. Hoffmann-La Roche AG
Country	Global
Sector	Private
PI Contribution	This is a new IMI collaboration involving Oxford and 25 other academic institutions (total funding €53M from IMI and in-kind contributions).
Collaborator Contribution	We will provide clinical material (from patients), characterise stem-cell-derived tissues and manage the diabetes work package
Impact	Nil to date
Start Year	2012


Description	STEMBANCC
Organisation	Novo Nordisk
Country	Denmark
Sector	Private
PI Contribution	This is a new IMI collaboration involving Oxford and 25 other academic institutions (total funding €53M from IMI and in-kind contributions).
Collaborator Contribution	We will provide clinical material (from patients), characterise stem-cell-derived tissues and manage the diabetes work package
Impact	Nil to date
Start Year	2012


Description	STEMBANCC
Organisation	Sanofi
Department	Aventis
Country	France
Sector	Private
PI Contribution	This is a new IMI collaboration involving Oxford and 25 other academic institutions (total funding €53M from IMI and in-kind contributions).
Collaborator Contribution	We will provide clinical material (from patients), characterise stem-cell-derived tissues and manage the diabetes work package
Impact	Nil to date
Start Year	2012


Description	The MUTHER (Multiple Tissues for Human Expression Resource) Consortium
Organisation	King's College Hospital
Country	United Kingdom
Sector	Hospitals
PI Contribution	We have contributed to sample preparation, database design and to data analysis
Collaborator Contribution	It has supported methods development, and provided samples and a collaborative network for analysis of expression and methylation data relevant to multiple human traits.
Impact	We have assembled one of the largest tissue expression banks available and are instituting detailed analysis. Two papers have already been published in high-profile journals (Nica et al., PLoS Genetics 2010; Small et al., Nature Genetics 2011), and further publications were in preparation as of Oct 2011.
Start Year	2007


Description	The MUTHER (Multiple Tissues for Human Expression Resource) Consortium
Organisation	The Wellcome Trust Sanger Institute
Country	United Kingdom
Sector	Charity/Non Profit
PI Contribution	We have contributed to sample preparation, database design and to data analysis
Collaborator Contribution	It has supported methods development, and provided samples and a collaborative network for analysis of expression and methylation data relevant to multiple human traits.
Impact	We have assembled one of the largest tissue expression banks available and are instituting detailed analysis. Two papers have already been published in high-profile journals (Nica et al., PLoS Genetics 2010; Small et al., Nature Genetics 2011), and further publications were in preparation as of Oct 2011.
Start Year	2007


Description	The MUTHER (Multiple Tissues for Human Expression Resource) Consortium
Organisation	University of Geneva
Country	Switzerland
Sector	Academic/University
PI Contribution	We have contributed to sample preparation, database design and to data analysis
Collaborator Contribution	It has supported methods development, and provided samples and a collaborative network for analysis of expression and methylation data relevant to multiple human traits.
Impact	We have assembled one of the largest tissue expression banks available and are instituting detailed analysis. Two papers have already been published in high-profile journals (Nica et al., PLoS Genetics 2010; Small et al., Nature Genetics 2011), and further publications were in preparation as of Oct 2011.
Start Year	2007


Description	UCLA collaboration on KLF14
Organisation	University of California, Los Angeles (UCLA)
Department	School of Medicine UCLA
Country	United States
Sector	Academic/University
PI Contribution	Sharing of data regarding role of KLF14
Collaborator Contribution	Sharing of data regarding role of KLF14
Impact	None so far
Start Year	2013


Description	UPenn collaboration re KLF14
Organisation	University of Pennsylvania
Country	United States
Sector	Academic/University
PI Contribution	Sharing of data regarding role of KLF14
Collaborator Contribution	Sharing of data regarding role of KLF14
Impact	None as yet
Start Year	2015


Description	American Society of Nephrology meeting, New Orleans
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Workshop and talk at ASN 2017
Year(s) Of Engagement Activity	2017


Description	Conference on personalised nutrition, Shanghai, China
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	International conference on personalised nutrition organised by Chinese colleagues
Year(s) Of Engagement Activity	2017


Description	East Meets West Conference, Hong Kong
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Lectures, debates and discussions about diabetes genetics in East Asia and beyond
Year(s) Of Engagement Activity	2016


Description	Genomics for Clinicians meeting
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	I gave a presentation at this 4-day workshop on the value of diabetes genetics in genomic medicine
Year(s) Of Engagement Activity	2017


Description	International Diabetes Federation, Abu Dhabi
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Presentation at IDF meeting
Year(s) Of Engagement Activity	2017


Description	Leena Peltonen School of Human Genetics
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Type Of Presentation	Workshop Facilitator
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	I co-organise this meeting, which is attended by 20+ PhD students each year. They present to, and interact with, 20+ senior academics, providing a unique environment. Feedback from participants has been overwhelmingly positive.
Year(s) Of Engagement Activity	2009,2010,2011,2012,2013,2014,2015,2016


Description	Personalised genomics
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	Public debate; conference presentations. Improved understanding of the issues by the audience.
Year(s) Of Engagement Activity	2009,2012,2013,2015,2016


Description	Scientific presentations and seminars
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Type Of Presentation	Keynote/Invited Speaker
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Data from these two grants have been presented at a large number of international meetings (approx. 20 a year), including the American Diabetes Association, the American Society of Human Genetics and Genomics of Common Diseases. Large audiences.
Year(s) Of Engagement Activity	2012,2013,2015,2016


Description	Chinese Diabetes Society, Chongqing
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Professional Practitioners
Results and Impact	Presentation to several thousand people at the Chinese Diabetes Association
Year(s) Of Engagement Activity	2017