Kidney Cancer Staging using Deep Learning Neural Network: Comparing Models Trained on Whole Kidney with Cancer and Only the Cancer

N. Hadjiyski
Ann Arbor Pioneer High School,
United States

Keywords: kidney cancer staging, AI, deep learning, CT, whole kidney, only cancer


The American Cancer Society’s most recent estimates for kidney cancer in the United States for 2021 are that about 76,080 new cases of kidney cancer (48,780 in men and 27,300 in women) will occur and that about 13,780 people (8,790 men and 4,990 women) will die from this disease. In kidney cancer, Stage 1 is an important threshold for deciding between organ-preserving surgery and, for higher stages, chemotherapy and organ removal. Incorrect staging can therefore lead to under- or over-treatment. Previously, a Deep Learning Neural Network (DLNN) was developed to predict Stage 1 versus higher stages of kidney cancer from 3D computed tomography (CT) scans. The purpose of this project is to compare the classification accuracy of a DLNN trained on cropped CT images of kidneys containing cancer with that of a DLNN trained on cropped CT images of the kidney cancers alone. The National Cancer Institute’s publicly available cancer research database, The Cancer Imaging Archive (TCIA), provided anonymized 3D CT scans and clinical data from 227 patients with 231 kidney cancers of different stages. Both the whole kidneys containing cancer and the cancers alone were cropped from the 3D CT scans and used to train and test the DLNNs. Approximately 7,800 cropped CT images were obtained for the whole kidneys and 4,200 for the cancers alone. The Inception V3 deep learning network architecture (IV3-DLNN), implemented on the TensorFlow platform, was used. A transfer learning technique was used to train the IV3-DLNN for the task of staging kidney cancer: the IV3-DLNN was first trained on the ImageNet dataset, which consists of more than 1,000,000 natural images, and then only part of the network was retrained with the kidney cancer data. Two different IV3-DLNNs were trained. The input to the first was the cropped CT images of kidneys with cancer; the input to the second was the cropped CT images of the kidney cancers alone.
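The transfer-learning setup described above can be sketched in TensorFlow/Keras as follows. This is a minimal sketch under stated assumptions, not the project's actual code. The abstract does not say which part of the IV3-DLNN was retrained, so here the whole ImageNet-pretrained Inception V3 base is frozen and only a new single-output sigmoid layer (given the hypothetical name `p_stage1`) is trained. The function name `build_stage_classifier`, the 299×299 input size, the Adam learning rate, and the replication of each grayscale CT crop into three channels are also assumptions.

```python
import tensorflow as tf


def build_stage_classifier(input_shape=(299, 299, 3), weights="imagenet"):
    """Hypothetical transfer-learning model: an ImageNet-pretrained Inception V3
    base plus a new sigmoid head that outputs the likelihood of Stage 1."""
    # Pretrained convolutional base, without the 1000-class ImageNet classifier.
    base = tf.keras.applications.InceptionV3(
        include_top=False, weights=weights, input_shape=input_shape
    )
    # Assumption: freeze the entire base and retrain only the new head.
    base.trainable = False

    # Assumption: grayscale CT crops were already replicated to 3 channels
    # (e.g. with tf.image.grayscale_to_rgb) and scaled to [0, 255].
    inputs = tf.keras.Input(shape=input_shape)
    # Inception V3 expects pixel values in [-1, 1]: x / 127.5 - 1.
    x = tf.keras.layers.Rescaling(scale=1.0 / 127.5, offset=-1.0)(inputs)
    x = base(x, training=False)  # keep batch-norm layers in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid", name="p_stage1")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="binary_crossentropy",
        metrics=[tf.keras.metrics.AUC(name="auc")],
    )
    return model
```

Under this sketch, calling `build_stage_classifier()` once for the whole-kidney crops and once for the cancer-only crops yields the two networks being compared. Freezing the base keeps the number of trainable parameters small, which is one common reason to use transfer learning when only a few thousand training images are available.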
The output of both IV3-DLNNs was the likelihood that the kidney cancer is Stage 1. The CT datasets for the kidneys with cancer and for the corresponding cancers were each split into training (48%), validation (10%), and test (42%) sets. The IV3-DLNNs were trained on the training sets until the training error converged, the best IV3-DLNNs were selected using the validation sets, and the selected networks were then run on the test sets, which were used neither for training nor for model selection, to obtain an independent evaluation. Classification accuracy was estimated by the area under the receiver operating characteristic (ROC) curve (AUC). The IV3-DLNN trained on the whole kidney achieved an AUC of 0.96 on the training set, 0.88 on the validation set, and 0.87 on the test set. The IV3-DLNN trained on the cancer alone achieved an AUC of 0.97 on the training set, 0.91 on the validation set, and 0.90 on the test set. These results show that the IV3-DLNN trained on the kidney cancer alone is slightly more accurate than the IV3-DLNN trained on the whole kidney. Both AI systems show promise for potentially assisting physicians in kidney cancer staging.
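The evaluation protocol (split, validation-based model selection, AUC) can be sketched in plain Python as follows. Two points are assumptions not stated in the abstract. First, the split is done per patient, so crops from the same kidney never appear in more than one set; the 48/10/42 fractions are therefore applied to patients, and image counts per set will only approximately match. Second, AUC is computed from per-image scores using the rank (Mann-Whitney) formulation, which equals the area under the empirical ROC curve. The function names `split_by_patient`, `roc_auc`, and `pick_best` are hypothetical.

```python
import random


def split_by_patient(patient_ids, fractions=(0.48, 0.10, 0.42), seed=0):
    """Assign whole patients to train/val/test so no patient spans two sets.

    Assumption: the 48/10/42 fractions are applied to patients, not images.
    """
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)  # reproducible shuffle
    n_train = round(fractions[0] * len(patients))
    n_val = round(fractions[1] * len(patients))
    return {
        "train": set(patients[:n_train]),
        "val": set(patients[n_train:n_train + n_val]),
        "test": set(patients[n_train + n_val:]),  # remainder (about 42%)
    }


def roc_auc(labels, scores):
    """Area under the empirical ROC curve.

    labels: 1 for Stage 1, 0 for higher stages (hypothetical encoding).
    scores: predicted likelihood of Stage 1.
    Equals the probability that a random Stage 1 case scores higher than a
    random higher-stage case, with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pick_best(val_auc_by_checkpoint):
    """Return the checkpoint name with the highest validation AUC."""
    return max(val_auc_by_checkpoint, key=val_auc_by_checkpoint.get)
```

For example, `roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])` returns 0.75, because three of the four Stage 1 / higher-stage pairs are ranked correctly. Only the checkpoint returned by `pick_best` would then be scored once on the test set.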